chaotic and noisy time series. Here, the prediction accuracy is defined as the correlation between the observed and predicted
values of a particular time series. The dotted line shows that the correlation does not decline for additive
noise (here, white noise superimposed on a periodic signal) as one tries to forecast further into the future. In
contrast, the solid line for a time series generated from a chaotic Henon map shows the declining signature characteristic
of a chaotic sequence. For a detailed discussion of the Henon map, including its stability and phase-space
characteristics, we refer to Ref. 19. The correlation coefficient of the Henon map prediction drops abruptly from 0.95
for T_p = 1 to 0.16 for T_p = 3. Such a sharp drop in the
prediction accuracy is a characteristic signature of a chaotic signal. If the signal contains a periodicity shorter than
the maximum prediction lead time, the effects of that periodicity will show up in the prediction
accuracy. To avoid such an influence, a difference time series is usually used (Refs. 9, 10). On the other hand, the correlation coefficient for the noisy time series does not show such an
exponential loss of information with prediction lead time. In the following section, we will explore the utility of this
diagnostic tool to characterize the nature of daily streamflow.
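
The diagnostic described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-nearest-neighbour forecast in delay coordinates with unit lag, a Henon map with the standard parameters a = 1.4 and b = 0.3, and a sine wave of period 20 with Gaussian noise of standard deviation 0.1 as the periodic-plus-noise case. All function names, series lengths and parameter values are illustrative assumptions, so the correlations it prints will differ from the values quoted in the text.

```python
import math
import random


def henon_series(n, a=1.4, b=0.3, discard=100):
    """x-component of the Henon map (standard parameters assumed)."""
    x, y = 0.0, 0.0
    out = []
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:  # drop the initial transient
            out.append(x)
    return out


def noisy_periodic_series(n, period=20, noise_sd=0.1, seed=0):
    """Sine wave with additive Gaussian white noise (illustrative values)."""
    rng = random.Random(seed)
    return [math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, noise_sd)
            for t in range(n)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def delay_vector(series, t, m):
    """Delay vector (x_t, x_{t-1}, ..., x_{t-m+1})."""
    return tuple(series[t - j] for j in range(m))


def skill_vs_lead(series, m, max_lead):
    """Correlation of observed vs. forecast values for lead times 1..max_lead.

    The first half of the series is the library; forecasts are made for
    the second half by following the nearest library neighbour forward.
    """
    split = len(series) // 2
    # Keep only library points whose futures stay inside the first half.
    library = [(delay_vector(series, t, m), t)
               for t in range(m - 1, split - max_lead)]
    leads = range(1, max_lead + 1)
    obs = {tp: [] for tp in leads}
    pred = {tp: [] for tp in leads}
    for t in range(split, len(series) - max_lead):
        query = delay_vector(series, t, m)
        _, t_nn = min(library, key=lambda item: math.dist(item[0], query))
        for tp in leads:
            obs[tp].append(series[t + tp])
            pred[tp].append(series[t_nn + tp])
    return {tp: pearson(obs[tp], pred[tp]) for tp in leads}


henon_skill = skill_vs_lead(henon_series(800), m=2, max_lead=10)
noisy_skill = skill_vs_lead(noisy_periodic_series(800), m=2, max_lead=10)
for tp in range(1, 11):
    print(f"T_p={tp:2d}  Henon r={henon_skill[tp]:.2f}  "
          f"noisy periodic r={noisy_skill[tp]:.2f}")
```

Under these assumptions the Henon correlations should fall with lead time while those for the noisy periodic signal stay roughly level, which is the contrast the diagnostic relies on.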
3 PHASE-SPACE-BASED MODEL FOR STREAMFLOW PREDICTION
3.1 Analysis of daily streamflow from the southwestern United States
The dataset used in this study is described by Wallis et al. (Ref. 20). It consists of daily streamflow measurements from 1948 to
1988 for 1009 streamgages across the United States. All files are serially complete for 41 water years beginning in
October 1948 and ending in September 1988. Missing data in the raw data records are estimated using simple prorating
methods described in Wallis et al. (Ref. 20). First, eight stations are chosen from the southwestern
United States covering three states: Arizona, California and New Mexico. Relevant information for the eight selected
stations used in this study is summarized in Table 1. Here, basin yield is defined as the average flow rate per unit area.

Table 1. Characteristic attributes for streamgages from the southwestern United States

Number  Station   Area    Latitude  Longitude  Daily average  Coefficient of  Average yield
        identity  (km^2)                       flow rate      variation of    (10^-5 m day^-1)
                                               (m^3 s^-1)     daily flows
1       x102818       47  36.78     118.26      0.36          1.37             66.10
2       x112135     2465  36.86     118.97     42.86          1.69            150.20
3       q094710     3156  31.63     110.17      1.45          5.48              3.97
4       q094305     4826  33.06     108.54      4.45          2.69              7.97
5       q094975     7376  33.79     110.50     19.06          2.22             22.32
6       x094420    10381  32.97     109.31      5.21          2.97              4.33
7       q094985    11148  33.62     110.92     24.52          2.61             19.00
8       q094485    20442  32.87     109.51     13.16          3.51              5.56

Fig. 5. Daily time series of eight streamflow records from the southwestern United States described in Table 1, for 41 years (1948–1988). The vertical axis is flow rate (m^3 s^-1).

Phase-space analysis of daily streamflow 467

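
As a quick check on this definition, the average yield column of Table 1 can be recomputed from the area and mean flow columns. The sketch below assumes that yield is reported in units of 10^-5 m day^-1, and `basin_yield` is a hypothetical helper name. Small differences from the tabulated values reflect rounding of the published flow rates.

```python
SECONDS_PER_DAY = 86_400


def basin_yield(flow_m3_per_s, area_km2):
    """Mean daily runoff depth in units of 1e-5 m/day (hypothetical helper)."""
    area_m2 = area_km2 * 1.0e6
    depth_m_per_day = flow_m3_per_s * SECONDS_PER_DAY / area_m2
    return depth_m_per_day * 1.0e5


# (station number, area in km^2, mean flow in m^3/s, tabulated yield),
# copied from Table 1 for four of the eight stations.
stations = [
    (1, 47, 0.36, 66.10),
    (2, 2465, 42.86, 150.20),
    (3, 3156, 1.45, 3.97),
    (8, 20442, 13.16, 5.56),
]
for number, area, flow, tabulated in stations:
    computed = basin_yield(flow, area)
    print(f"station {number}: computed {computed:7.2f}, tabulated {tabulated:7.2f}")
```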
These stations are chosen to represent a wide range of basin areas from the same geographical region. Basin areas for the
selected stations range between 47 and 20 442 km^2. A station ID with the letter prefix ‘q’ indicates that there were no data
gaps for the station in the raw USGS data file, while a prefix ‘x’ indicates that there were periods of missing data that
were estimated by Wallis et al. (Ref. 20). Fig. 5 shows the variations of daily streamflow values for
the eight streamflow records over 41 years. The smaller basins (1 and 2) appear to have a pronounced
annual cycle, while the larger basins (3–8) do not show an appreciable annual cycle. This feature can be clearly seen
in Fig. 6, which shows the streamflows for the smallest and the largest basins during the first 5 years (1948–1953). The
autocorrelation functions for the eight selected basins are shown in Fig. 7. The two smallest basins (1
and 2) show apparent periodicity, while the other six do not show any clear periodic signature. This preliminary analysis
suggests that streamflow records 1 and 2 are dominated by a seasonal cycle with added noise. On the other hand, a sharp
decay in the autocorrelation function for the other six streamflow records suggests that their dynamics might be
controlled either by random processes or by deterministic chaos. We will use phase-space model-based predictions to
make a distinction between these two types of streamflow characteristics. Fig. 8a and b show three-dimensional
phase-space maps for q10140 with two different values of lag time t. If the dimension of the underlying attractor is
greater than three, a phase-space map in three or fewer dimensions would appear as a cluster of points with no
identifiable structure. It appears that the underlying dynamics for this time series (q10140) has a higher-dimensional
attractor, and consequently the underlying structure is hidden. However, a higher-dimensional phase-space
map, although difficult to visualize, is expected to show a structured pattern in the phase space.
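
Both diagnostics used in this paragraph are easy to sketch. The example below computes a sample autocorrelation at selected lags and builds three-dimensional delay vectors with a chosen lag. It uses synthetic series (a noisy annual sine wave and white noise) rather than the streamflow records, and the function names, the noise levels and the ordering of the delay coordinates are assumptions made for illustration.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation at one lag (biased estimator, lag-0 normalised)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def delay_vectors(series, dim, lag):
    """Delay vectors (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag})."""
    start = (dim - 1) * lag
    return [tuple(series[t - j * lag] for j in range(dim))
            for t in range(start, len(series))]


rng = random.Random(1)
n_days = 10 * 365
# Synthetic stand-ins: a seasonal cycle with added noise, and pure white noise.
seasonal = [math.sin(2.0 * math.pi * t / 365.0) + rng.gauss(0.0, 0.3)
            for t in range(n_days)]
white = [rng.gauss(0.0, 1.0) for _ in range(n_days)]

for lag in (1, 182, 365):
    print(f"lag {lag:3d}: seasonal {autocorrelation(seasonal, lag):+.2f}, "
          f"white noise {autocorrelation(white, lag):+.2f}")

# Points for a three-dimensional phase-space map with a lag of 10 days.
points = delay_vectors(seasonal, dim=3, lag=10)
print(len(points), points[0])
```

A seasonal record keeps a large autocorrelation at a lag of one year, whereas the white-noise series decorrelates immediately. A sharp decay alone does not separate noise from chaos, which is why the prediction-based test is needed.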
As discussed in Section 2, the first step in developing a phase-space model for streamflow signals involves the
determination of optimum embedding dimensions from the daily streamflow time series. This is done by plotting
the correlation coefficient between the observed and predicted streamflows for T_p = 1 (1-day-ahead prediction) as
a function of embedding dimension. Fig. 9 shows the correlation coefficient for the eight streamflow records as a function
of embedding dimension, M. We choose the optimum embedding dimension such that it produces the largest
correlation coefficient for 1-day-ahead prediction. For example, streamflow record 5 produces a maximum correlation
coefficient of 0.85 for M = 4, and hence four is chosen as the optimum embedding
dimension for this record. The optimum embedding dimensions found were 2, 3, 3, 4, 4, 7, 2 and 4 for streamflow records 1–8, respectively.
An estimate of the optimum embedding dimension provides an indication of the underlying complexity of the
system: in general, the larger the embedding dimension, the greater the underlying complexity. There is
no apparent trend between the optimal embedding dimension and basin area. With these embedding dimension
estimates, we are now set to make predictions.
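
The selection rule described above can be sketched as a simple scan over candidate embedding dimensions. This is a minimal illustration, not the authors' code: it assumes a single-nearest-neighbour forecast, unit lag between delay coordinates, and a synthetic Henon series in place of a streamflow record. The helper names `one_step_skill` and `best_embedding_dimension` are hypothetical.

```python
import math


def henon_series(n, a=1.4, b=0.3, discard=100):
    """x-component of the Henon map, used here as a stand-in test series."""
    x, y = 0.0, 0.0
    out = []
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out.append(x)
    return out


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def one_step_skill(series, m):
    """Correlation between observations and 1-step-ahead forecasts for dimension m."""
    split = len(series) // 2  # first half: library, second half: test set
    library = [(tuple(series[t - j] for j in range(m)), t)
               for t in range(m - 1, split - 1)]
    obs, pred = [], []
    for t in range(split, len(series) - 1):
        query = tuple(series[t - j] for j in range(m))
        _, t_nn = min(library, key=lambda item: math.dist(item[0], query))
        obs.append(series[t + 1])
        pred.append(series[t_nn + 1])  # the neighbour's next value is the forecast
    return pearson(obs, pred)


def best_embedding_dimension(series, candidates):
    """Return the candidate M with the largest 1-step-ahead correlation."""
    skill = {m: one_step_skill(series, m) for m in candidates}
    return max(skill, key=skill.get), skill


series = henon_series(600)
best_m, skill = best_embedding_dimension(series, range(1, 7))
for m in sorted(skill):
    print(f"M={m}: r={skill[m]:.3f}")
print("selected M:", best_m)
```

Taking the maximum is the simplest reading of the rule. When several dimensions give nearly equal correlations, the smallest of them may be a more parsimonious choice.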
Fig. 6. Daily time series of the smallest and the largest basin for the first five years (1948–1953).

468 Q. Liu et al.

Fig. 10 shows the prediction accuracy for the selected stations as a function of the prediction lead time. For each
of these streamflow records we have made 1-day to 20-day-ahead predictions. The two smallest basins show a very high
degree of correlation between the observed and the predicted sequences. This persistence in prediction accuracy
may be considered analogous to that of a periodic signal with
additive noise, as seen in the illustrative example of Fig. 4. Prediction accuracy for the other six stations shows an
exponential decay with the prediction lead time. Sugihara et al. (Ref. 10)
argued that such an exponential decline in prediction accuracy could arise from locally exponentially diverging
trajectories and could be taken as an operational definition of chaos. A sharp decay of correlation between the observed
and predicted streamflow records shown in Fig. 10 could thus suggest a possible presence of deterministic chaos. This
serves as preliminary evidence that the daily streamflow time series analyzed here show a change in dynamics as we
increase the basin area. It appears to show a tendency to go from noisy dynamics to chaotic dynamics for increasing
basin areas. As we explain below, the influence of other factors such as climate, topography, vegetation and soil
texture could complicate this apparent relationship between basin area and streamflow characteristics.
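
One simple way to quantify how quickly prediction accuracy is lost is to fit an exponential, r(T_p) ≈ r_0 exp(-k T_p), to the correlation values. The sketch below is an illustration only: the single-exponential form, the least-squares fit on log r and the function name `decay_rate` are assumptions, not the procedure of Sugihara et al. or of this paper. As input it uses the two Henon-map values quoted in Section 2 (0.95 at T_p = 1 and 0.16 at T_p = 3).

```python
import math


def decay_rate(lead_times, correlations):
    """Least-squares slope of log(r) against lead time, returned as a positive rate k.

    Assumes all correlations are positive, so that log(r) is defined.
    """
    logs = [math.log(r) for r in correlations]
    n = len(lead_times)
    mean_t = sum(lead_times) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(lead_times, logs))
    sxx = sum((t - mean_t) ** 2 for t in lead_times)
    return -sxy / sxx


# Henon-map correlations quoted in the text for T_p = 1 and T_p = 3.
k = decay_rate([1, 3], [0.95, 0.16])
print(f"decay rate k = {k:.2f} per time step")
print(f"correlation falls by a factor of e every {1.0 / k:.2f} steps")
```

A rate near zero would correspond to the flat curves of the two smallest basins, while a clearly positive rate corresponds to the decaying curves of the other six. With only a few lead times the estimate is rough and is best read as a descriptive summary.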
A direct implication of the results reported above is that increasing basin area does not necessarily imply increased
linearity or enhanced predictability. This is somewhat counterintuitive. One could argue that a larger basin
would spatially average small-scale fluctuations in forcing functions (e.g. rainfall) and basin attributes (e.g. spatial
variability in topography and soil texture). This averaging should reduce the dimension of the underlying dynamical
system and consequently lead to increased streamflow predictability. However, there does not appear to be any consistent
reduction in the optimum embedding dimension as we increase the basin area. Another feature to note for these
eight stations is that there appears to be a relationship between the yield (average flow rate per unit area, expressed
as depth per day) and basin dynamics. For higher yield, basin dynamics appear to be more predictable, whereas
for lower yield they become less predictable. If one looks at the geographical locations of these basins, basins
1 and 2 are located in the Sierra Nevada, while the other six basins appear to be in the Gila and Salt River drainages. The
Sierra Nevada area is dominated by winter storm fronts coming from the Pacific, and snow accumulation and
snowmelt play a strong role in the hydrology of streamflow records 1 and 2. The streamflow records 3–8, on the
other hand, are affected by more variable winter storms,
by small-scale convective events, and by occasional intense, large-area summer monsoons. Hence, there
tends to be less persistence in these streamflow signals. Based on these hydrometeorological explanations, one
could argue that the shift in streamflow characteristics from noisy dynamics to low-dimensional determinism is significantly
affected by the variability and timing of atmospheric processes in this region. This, however,
complicates the notion of increased linearity or enhanced predictability of streamflow with increasing area. As the
results and inferences presented above are based on the analysis of eight selected stations from a single geographical
region, further analysis with more streamgages from other regions is required before a generalized conclusion
can be attempted.

Fig. 7. Autocorrelation function for the eight streamflow records from the southwestern United States as a function of lag (days).
3.2 Analysis of daily streamflow from the continental United States