
chaotic and noisy time series. Here, prediction accuracy is defined as the correlation between the observed and predicted values of a time series. The dotted line shows that, for additive noise (here, white noise superimposed on a periodic signal), the correlation does not decline as one forecasts further into the future. In contrast, the solid line, for a time series generated from the chaotic Henon map, shows the declining signature characteristic of a chaotic sequence. For a detailed discussion of the Henon map, including its stability and phase-space characteristics, we refer to Ref. 19. The correlation coefficient of the Henon map prediction drops abruptly from 0.95 for T_p = 1 to 0.16 for T_p = 3. Such a sharp drop in prediction accuracy is a characteristic signature of a chaotic signal. If the signal contains a periodicity shorter than the maximum prediction lead time, that periodicity will show up in the prediction accuracy; to avoid such an influence, a differenced time series is usually used 9,10. The correlation coefficient for the noisy time series, on the other hand, shows no such exponential loss of information with prediction lead time. In the following section, we explore the utility of this diagnostic tool for characterizing the nature of daily streamflow.

3 PHASE-SPACE-BASED MODEL FOR STREAMFLOW PREDICTION
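The prediction-accuracy diagnostic that closes Section 2 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the predictor used for Fig. 4: the hypothetical helper `forecast_skill` is a simplex-style nearest-neighbour forecast (the M + 1 nearest delay vectors, weighted exponentially by relative distance) in the spirit of Sugihara et al. 10; the first half of each series is used as the library and the second half for prediction; the noise level, the 20-step period and the embedding parameters are illustrative choices.

```python
import numpy as np


def henon_x(n, a=1.4, b=0.3, discard=100):
    """x-component of the Henon map, with an initial transient discarded."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out


def forecast_skill(series, m, tau, lead, n_lib):
    """Correlation between observed and predicted values `lead` steps ahead
    (hypothetical simplex-style nearest-neighbour predictor)."""
    start = (m - 1) * tau
    t = np.arange(start, len(series) - lead)           # times whose target exists
    vecs = np.column_stack([series[t - k * tau] for k in range(m)])
    lib, prd = t + lead < n_lib, t >= n_lib            # library / prediction split
    d = np.linalg.norm(vecs[prd][:, None, :] - vecs[lib][None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, : m + 1]             # M + 1 nearest neighbours
    dk = np.take_along_axis(d, nn, axis=1)
    w = np.exp(-dk / np.maximum(dk[:, :1], 1e-12))     # weight by relative distance
    pred = (w * series[t[lib][nn] + lead]).sum(axis=1) / w.sum(axis=1)
    return np.corrcoef(pred, series[t[prd] + lead])[0, 1]


rng = np.random.default_rng(0)
n = 1000
henon = henon_x(n)                                      # chaotic series
steps = np.arange(n)
periodic = np.sin(2 * np.pi * steps / 20) + 0.3 * rng.standard_normal(n)  # noisy cycle

leads = range(1, 16)
rho_henon = [forecast_skill(henon, 2, 1, T, n // 2) for T in leads]
rho_noise = [forecast_skill(periodic, 3, 5, T, n // 2) for T in leads]
for T, rc, rn in zip(leads, rho_henon, rho_noise):
    print(f"T_p={T:2d}  chaotic={rc:5.2f}  noisy periodic={rn:5.2f}")
```

With these settings the chaotic series should lose most of its skill within roughly ten steps, while the noisy periodic series should keep a roughly constant correlation; exact values depend on library size, noise level and embedding.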

3.1 Analysis of daily streamflow from the southwestern United States

The dataset used in this study is described by Wallis et al. 20. It consists of daily streamflow measurements from 1948 to 1988 for 1009 streamgages across the United States. All files are serially complete for 41 water years, beginning in October 1948 and ending in September 1988. Missing data in the raw records were estimated using simple prorating methods described in Wallis et al. 20. First, eight stations are chosen from the southwestern United States, covering three states: Arizona, California and New Mexico. Relevant information for the eight stations used in this study is summarized in Table 1. Here, basin yield is defined as the average flow rate per unit area. These stations are chosen to represent a wide range of basin areas from the same geographical region; basin areas for the selected stations range from 47 to 20 442 km². Station IDs with the prefix 'q' indicate that there were no data gaps for the station in the raw USGS data file, while the prefix 'x' indicates periods of missing data that were estimated by Wallis et al. 20.

Table 1. Characteristic attributes for streamgages from the southwestern United States

No.  Station ID  Area (km²)  Lat. (°N)  Long. (°W)  Q (m³ s⁻¹)   CV    Yield (10⁻⁵ m day⁻¹)
1    x102818          47     36.78      118.26        0.36      1.37      66.10
2    x112135        2465     36.86      118.97       42.86      1.69     150.20
3    q094710        3156     31.63      110.17        1.45      5.48       3.97
4    q094305        4826     33.06      108.54        4.45      2.69       7.97
5    q094975        7376     33.79      110.50       19.06      2.22      22.32
6    x094420       10381     32.97      109.31        5.21      2.97       4.33
7    q094985       11148     33.62      110.92       24.52      2.61      19.00
8    q094485       20442     32.87      109.51       13.16      3.51       5.56

Q: daily average flow rate; CV: coefficient of variation of daily flows; Yield: average yield (average flow rate per unit area).

Fig. 5. Daily time series of the eight streamflow records from the southwestern United States described in Table 1, for the 41 years 1948–1988. The vertical axis is flow rate (m³ s⁻¹).

Phase-space analysis of daily streamflow 467

Fig. 5 shows the variations of daily streamflow for the eight records over 41 years. The smaller basins 1 and 2 appear to have a pronounced annual cycle, while the larger basins 3–8 show no appreciable annual cycle. This feature can be clearly seen in Fig. 6, which shows the streamflows for the smallest and the largest basins during the first 5 years (1948–1953). The autocorrelation functions for the eight selected basins are shown in Fig. 7. The two smallest basins, 1 and 2, show apparent periodicity, while the other six show no clear periodic signature. This preliminary analysis suggests that streamflow records 1 and 2 are dominated by a seasonal cycle with added noise. On the other hand, the sharp decay in the autocorrelation function for the other six streamflow records suggests that their dynamics might be controlled either by random processes or by deterministic chaos. We use phase-space-based predictions to distinguish between these two types of streamflow behavior. Fig. 8a and b show three-dimensional phase-space maps for q10140 with two different values of the lag time τ. If the dimension of the underlying attractor is greater than three, a phase-space map in three or fewer dimensions appears as a cluster of points with no identifiable structure. The dynamics of q10140 appear to have a higher-dimensional attractor, so the underlying structure is hidden in these maps. A higher-dimensional phase-space map, although difficult to visualize, is nevertheless expected to show a structured pattern. As discussed in Section 2, the first step in developing a phase-space model for streamflow involves determining the optimum embedding dimension from the daily streamflow time series.
This is done by plotting the correlation coefficient between the observed and predicted streamflows for T_p = 1 (1-day-ahead prediction) as a function of embedding dimension. Fig. 9 shows the correlation coefficient for the eight streamflows as a function of embedding dimension, M. We choose as optimum the embedding dimension that produces the largest correlation coefficient for 1-day-ahead prediction. For example, streamflow record 5 produces a maximum correlation coefficient of 0.85 for M = 4; hence, four is chosen as the optimum embedding dimension for this streamflow. The optimum embedding dimensions found were 2, 3, 3, 4, 4, 7, 2 and 4 for streamflow records 1–8, respectively. An estimate of the optimum embedding dimension indicates the underlying complexity of the system: in general, the larger the embedding dimension, the greater the underlying complexity. There is no apparent trend between optimum embedding dimension and basin area. With these embedding dimensions, we are now set to make predictions. Fig. 10 shows the prediction accuracy for the selected stations as a function of prediction lead time. For each streamflow we made 1-day to 20-day-ahead predictions. The two smallest basins show a very high degree of correlation between the observed and predicted sequences. This persistence in prediction accuracy may be considered analogous to a periodic signal with additive noise, as in the illustrative example of Fig. 4. Prediction accuracy for the other six stations shows an exponential decay with prediction lead time. Sugihara et al. 10 argued that such an exponential decline in prediction accuracy could arise from locally exponentially diverging trajectories and could be taken as an operational definition of chaos.

Fig. 6. Daily time series of the smallest and the largest basin for the first five years (1948–1953).

468 Q. Liu et al.
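The selection rule described above (choose the M that maximizes the 1-day-ahead correlation, then forecast 1 to 20 steps ahead with that M) can be sketched as follows. It is a minimal illustration under several assumptions: a chaotic Henon series stands in for a streamflow record, the lag is one step, the first half of the series serves as the library, and the hypothetical helper `forecast_skill` is a simplex-style nearest-neighbour predictor in the spirit of Sugihara et al. 10, not the authors' implementation.

```python
import numpy as np


def henon_x(n, a=1.4, b=0.3, discard=100):
    """Chaotic stand-in series: x-component of the Henon map."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out


def forecast_skill(series, m, tau, lead, n_lib):
    """Observed-vs-predicted correlation `lead` steps ahead using delay
    vectors of dimension m (hypothetical simplex-style predictor)."""
    start = (m - 1) * tau
    t = np.arange(start, len(series) - lead)
    vecs = np.column_stack([series[t - k * tau] for k in range(m)])
    lib, prd = t + lead < n_lib, t >= n_lib
    d = np.linalg.norm(vecs[prd][:, None, :] - vecs[lib][None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, : m + 1]
    dk = np.take_along_axis(d, nn, axis=1)
    w = np.exp(-dk / np.maximum(dk[:, :1], 1e-12))
    pred = (w * series[t[lib][nn] + lead]).sum(axis=1) / w.sum(axis=1)
    return np.corrcoef(pred, series[t[prd] + lead])[0, 1]


series = henon_x(1000)
dims = range(1, 9)

# Step 1: optimum embedding dimension = argmax of the 1-step-ahead correlation.
rho_1step = [forecast_skill(series, m, 1, 1, 500) for m in dims]
best_m = dims[int(np.argmax(rho_1step))]

# Step 2: prediction accuracy versus lead time T_p = 1..20 at that dimension.
rho_lead = [forecast_skill(series, best_m, 1, T, 500) for T in range(1, 21)]
print("optimum M:", best_m)
print(np.round(rho_lead, 2))
```

For a streamflow record, `series` would presumably be the daily flows (or their differences) and the lag would be chosen from the data; `rho_1step` and `rho_lead` then play the roles of the curves in Figs. 9 and 10.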
The sharp decay of correlation between the observed and predicted streamflow records in Fig. 10 could thus suggest the presence of deterministic chaos. This serves as preliminary evidence that the dynamics of the daily streamflow time series analyzed here change as basin area increases, with an apparent tendency to go from noisy dynamics to chaotic dynamics for increasing basin area. As we explain below, other factors such as climate, topography, vegetation and soil texture could complicate this apparent relationship between basin area and streamflow characteristics. A direct implication of these results is that increasing basin area does not necessarily imply increased linearity or enhanced predictability. This is somewhat counterintuitive. One could argue that a larger basin would spatially average small-scale fluctuations in forcing functions (e.g. rainfall) and in basin attributes (e.g. spatial variability in topography and soil texture). This averaging should reduce the dimension of the underlying dynamical system and consequently lead to increased streamflow predictability. However, there does not appear to be any consistent reduction in the optimum embedding dimension as basin area increases. Another feature to note for these eight stations is an apparent relationship between yield (average flow rate per unit area, expressed as depth per day) and basin dynamics: basins with higher yield appear more predictable, whereas those with lower yield appear less predictable. Geographically, basins 1 and 2 are located in the Sierra Nevada, while the other six lie in the Gila and Salt River drainages. The Sierra Nevada is dominated by winter storm fronts coming from the Pacific, and snow accumulation and snowmelt play a strong role in the hydrology of streamflow records 1 and 2.
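Because yield is simply the average flow rate divided by basin area, the last column of Table 1 can be checked directly. The sketch below converts m³ s⁻¹ over km² into depth per day (in units of 10⁻⁵ m day⁻¹), using only the values transcribed from Table 1; small differences from the tabulated yields should come only from rounding of the flow rates. The variable names are illustrative.

```python
import numpy as np

SECONDS_PER_DAY = 86_400

# Values transcribed from Table 1 (stations 1-8).
area_km2 = np.array([47, 2465, 3156, 4826, 7376, 10381, 11148, 20442], dtype=float)
flow_m3s = np.array([0.36, 42.86, 1.45, 4.45, 19.06, 5.21, 24.52, 13.16])
reported = np.array([66.10, 150.20, 3.97, 7.97, 22.32, 4.33, 19.00, 5.56])  # 1e-5 m/day

# Yield = average flow rate per unit area, expressed as depth per day.
yield_m_per_day = flow_m3s * SECONDS_PER_DAY / (area_km2 * 1e6)
yield_e5 = yield_m_per_day * 1e5

for i, (calc, rep) in enumerate(zip(yield_e5, reported), start=1):
    print(f"station {i}: computed {calc:7.2f}  Table 1 {rep:7.2f}")

# Sierra Nevada basins (1, 2) versus Gila/Salt River basins (3-8).
print(yield_e5[:2].min() > yield_e5[2:].max())  # True
```

The final comparison makes explicit that the two more predictable Sierra Nevada basins have yields several times higher than any of the six Gila and Salt River basins.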
The streamflow records 3–8, on the other hand, are affected by more variable winter storms, by small-scale convective events and by occasional intense, large-area summer monsoons. Hence, these streamflow signals tend to show less persistence. Based on these hydrometeorological explanations, one could argue that the shift in streamflow characteristics from noisy dynamics to low-dimensional determinism is significantly affected by the variability and timing of atmospheric processes in this region. This, however, complicates the notion of increased linearity or enhanced predictability of streamflow with increasing area. As the results and inferences presented above are based on the analysis of eight selected stations from one geographical region, further analysis with more streamgages from other regions is required before a general conclusion can be attempted.

Fig. 7. Autocorrelation function for the eight streamflow records from the southwestern United States as a function of lag (days).
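The autocorrelation functions of Fig. 7, and the delay vectors behind phase-space maps such as Fig. 8, can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' code: the estimator (normalized by the lag-0 sum of squares), the helper names `autocorrelation` and `delay_embed`, the synthetic annual-cycle-plus-noise series standing in for records 1 and 2, and the lag τ = 10 days are all hypothetical choices.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation r(k) for k = 0..max_lag, normalized by r(0)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: d.size - k], d[k:]) / denom
                     for k in range(max_lag + 1)])


def delay_embed(x, m, tau):
    """Delay vectors (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}), one per row."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    return np.column_stack([x[start - k * tau: x.size - k * tau]
                            for k in range(m)])


# Synthetic stand-in for a snowmelt-dominated record: annual cycle plus noise.
rng = np.random.default_rng(1)
days = np.arange(10 * 365)
seasonal = np.sin(2 * np.pi * days / 365) + 0.3 * rng.standard_normal(days.size)

r = autocorrelation(seasonal, 400)
print(round(r[182], 2), round(r[365], 2))   # negative at half a year, positive at one year
vecs = delay_embed(seasonal, m=3, tau=10)    # points of a 3-D phase-space map
print(vecs.shape)                            # (3630, 3)
```

A seasonal record produces the oscillating autocorrelation seen for records 1 and 2. The sharp initial decay seen for records 3–8, by contrast, cannot by itself separate noise from chaos, which is why the phase-space predictions are needed.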

3.2 Analysis of daily streamflow from the continental United States