Agricultural Systems 65 (2000) 43–72
www.elsevier.com/locate/agsy
Short survey
Scaling-up crop models for climate variability
applications$
J.W. Hansen a,*, J.W. Jones b
a International Research Institute for Climate Prediction, PO Box 1000, Palisades, NY 10964-8000, USA
b Agricultural and Biological Engineering Department, University of Florida, PO Box 110570, Gainesville, FL 32611-0570, USA
Received 9 February 2000; received in revised form 2 June 2000; accepted 6 June 2000
Abstract
Although most dynamic crop models have been developed and tested for the scale of a
homogeneous plot, applications related to climate variability are often at broader spatial
scales that can incorporate considerable heterogeneity. This study reviews issues and approaches related to applying crop models at scales larger than the plot. Perfect aggregate prediction at larger scales requires perfect integration of a perfect model across the range of
variability of perfect input data. Aggregation error results from imperfect integration of heterogeneous inputs, and includes distortions of either spatial mean values of predictions or
year-to-year variability of the spatial means. Approaches for reducing aggregation error
include sampling input variability in geographic or probability space, and calibration of
model inputs or outputs. Implications of scale and spatial interactions for model structure and
complexity are a matter of ongoing debate. Large-scale crop model applications must address
limitations of soil, weather and management data. Distortion of weather sequences from
spatial averaging is a particular danger. A case study of soybean in the state of Georgia, USA,
illustrates several crop model scaling approaches. © 2000 Elsevier Science Ltd. All rights
reserved.
Keywords: Aggregation; GIS; Simulation; Yield forecasting; Calibration
$
Florida Agricultural Experiment Station, Journal Series No. R-07058.
* Corresponding author. Tel.: +1-914-680-4410; fax: +1-914-680-4864.
E-mail address: [email protected] (J.W. Hansen).
0308-521X/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0308-521X(00)00025-1
Nomenclature. Definitions of variable and subscript symbols

Variables
$A$                                        Area of the region of interest
AWHC                                       Plant-available water-holding capacity in a soil profile or unit soil depth
$e$, $\mathbf{e}$, $\bar{\mathbf{e}}$      Particular input, vector of inputs at a point, and spatial average vector of model inputs, respectively
$e'$, $\mathbf{e}'$                        Calibrated, effective input value and vector of calibrated inputs, respectively
$f(\cdot)$                                 Model of response (e.g. crop yield) to the environment
$g(\cdot)$                                 Univariate or multivariate probability density function
$p$, $\mathbf{p}$                          Proportion(s) of the area of interest in an individual and vector, respectively, of crops or land uses
$R$                                        Spatial region
$\mathbf{V}$                               Variance–covariance matrix
$x$, $z$                                   Horizontal spatial coordinates
$y_{t,i}$, $\bar{y}_t$, $Y$                Crop yield or other response predicted at point $i$ in year $t$, predicted spatial average in year $t$, and predicted average over space and time, respectively
$\alpha$, $\beta$                          Intercept and slope of a least-squares linear regression
$\theta$                                   Volumetric soil water content
$\gamma$                                   Calibration multiplier for AWHC
$\rho$                                     Crop stand density
$\sigma$, $\hat{\sigma}$                   Actual and predicted standard deviation, respectively, among years of actual and predicted spatial average yields
$\upsilon_{t,i}$, $\bar{\upsilon}_t$, $U$  Actual crop yield or other response at point $i$ in year $t$, averaged over space in year $t$, and averaged over space and time, respectively

Subscripts
$m$, $i$                                   Number of spatial points or units, and an individual spatial point or unit
$n$, $t$                                   Number of years, and an individual year
L, U, S                                    Indicators of lower, drained upper, and saturated critical values of $\theta$, respectively
T                                          A time trend
C                                          Value corrected by a post-simulation adjustment
1. Introduction
Dynamic, process-level crop models are playing an increasing role in translating
information about climate variability into predictions and recommendations at a
range of scales, tailored to the needs of agricultural decision makers. Although such
models have generally been developed and tested at the scale of a homogeneous plot,
decision makers often need information at broader spatial scales where the
assumption of a homogeneous environment does not hold, and at higher system
levels where different constraints operate. Growing interest in precision agriculture
has brought awareness that farmers deal with spatial heterogeneity even at a field
scale. Policy makers are often concerned with climate impacts at district, watershed,
national or broader scales. Crop model applications related to climate prediction
depend critically on the assumption that the models can capture the year-to-year
pattern of response to climate variability.
In this paper, we review methods for scaling up crop model predictions, and
summarize challenges that arise when applying dynamic crop models developed for
the plot level to broader spatial scales and higher system levels. A case study of
soybean in Georgia, USA, illustrates several of these approaches. Our focus is on
applications for which response to interannual climate variability is important.
2. The spatial aggregation problem
2.1. Variability in space and time
Crops are produced in an environment that varies both in space and in time.
Scaling up entails applying models that assume a homogeneous environment (i.e. a
point in space) to larger areas that can encompass a considerable range of spatial
variability. Inputs to crop simulation models typically include daily weather, soil
properties (including topography and initial conditions) and management (including
cultivar characteristics); collectively these inputs define the environment of the
modeled system. All three types of environment inputs vary spatially. The spatial
heterogeneity of a given environment input can be represented by its distribution in
either geographic (e.g. in a geographical information system [GIS]) or probability
(e.g. as a probability distribution) space (Band et al., 1991; King, 1991). Data collection methods or formats of existing data bases are likely to dictate the method of
representing heterogeneity.
Geostatistics (Webster, 1985; Oliver and Webster, 1991) provides a useful perspective of variability in space. A variogram describes the variance of the stochastic
component of spatial variability of a property as a function of distance between
sample points (the lag). Variance that increases with lag expresses the spatial
dependence of regionalized variables. In other words, "places that are near to each
other are more alike than those that are further away" (Curran et al., 1997, p. 1).
Spatial dependence both permits and constrains inferences of properties based on
proximity to measurements. If central tendency and dispersion of a property do not
exhibit systematic spatial trends (i.e. are second-order stationary), then the variogram
will reach a maximum (the sill) at a finite lag (the range). The range marks the limits
of spatial dependence and, importantly, of spatial interpolation. The variogram
often approaches the ordinate at some positive variance (the nugget) that represents
small-scale, spatially independent random variation. Characteristics of a variogram
depend on sample area, the minimum and maximum lag sampled, and sometimes
the direction of sampling. The variogram is the basis for a family of optimal point
and areal interpolation techniques known collectively as kriging.
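The variogram described above can be estimated from sampled data as half the mean squared difference between values paired by separation distance. The following is a minimal sketch, not a method taken from the studies cited: the function name `empirical_variogram`, the transect layout and the synthetic autocorrelated series are all hypothetical, chosen only so that nearby points are more alike than distant ones.

```python
import math
import random

def empirical_variogram(points, values, lag_width, n_lags):
    """Return (lag, semivariance, pair count) for lag classes 1..n_lags."""
    sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            h = math.dist(points[a], points[b])   # separation distance (the lag)
            k = int(h / lag_width + 0.5)          # nearest lag class
            if 1 <= k <= n_lags:
                sums[k] += (values[a] - values[b]) ** 2
                counts[k] += 1
    # Semivariance is half the mean squared difference within each lag class
    return [(k * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(1, n_lags + 1) if counts[k] > 0]

# Hypothetical transect of 200 equally spaced sample points carrying a
# synthetic, spatially autocorrelated property (first-order autoregression).
random.seed(1)
points = [(float(x), 0.0) for x in range(200)]
values = []
level = 0.0
for _ in points:
    level = 0.9 * level + random.gauss(0.0, 1.0)
    values.append(level)

variogram = empirical_variogram(points, values, lag_width=1.0, n_lags=30)
for lag, semivariance, n_pairs in variogram[::5]:
    print(f"lag {lag:5.1f}  semivariance {semivariance:6.3f}  pairs {n_pairs}")
```

For this synthetic series the semivariance should rise with lag and then level off, which is the behavior the sill and range describe. A fitted variogram model, rather than the raw estimates, would be needed before kriging.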
Crop yields at a particular point in space vary from season to season primarily
because of the temporal variability of weather. Because of interactions between
plants, soil and weather, spatial patterns of yields can also vary between years. Some
aspects of variability are analogous in space and time (e.g. means, trends, autocorrelated and purely random variability). However, variability in time occurs in a
single dimension, and is characterized by directional dependence; current variability
can be influenced by past, but not future variability.
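Possible interactions of the spatial and temporal dimensions of crop model applications can complicate notation, statistical description and interpretation. For
example, root mean squared error (RMSE) of crop yield predictions ($y$) could be
calculated based on deviations from observed yields ($\upsilon$) accumulated over time
represented by $n$ years:
$$\mathrm{RMSE}_{\mathrm{time}} = \left[ n^{-1} \sum_{t=1}^{n} \left( y_{t,\cdot} - \upsilon_{t,\cdot} \right)^2 \right]^{1/2}, \tag{1}$$
over space represented by $m$ fields or measurement plots:
$$\mathrm{RMSE}_{\mathrm{space}} = \left[ m^{-1} \sum_{i=1}^{m} \left( y_{\cdot,i} - \upsilon_{\cdot,i} \right)^2 \right]^{1/2}, \tag{2}$$
or by a combination of time and space:
$$\mathrm{RMSE}_{\mathrm{combined}} = \left[ (m\,n)^{-1} \sum_{t=1}^{n} \sum_{i=1}^{m} \left( y_{t,i} - \upsilon_{t,i} \right)^2 \right]^{1/2}. \tag{3}$$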
Variables are defined in the Nomenclature. We assume that applications of crop
models related to climate variability and prediction are concerned primarily with
response to interannual variability of areal-average model results over some region
of interest (e.g. Eq. (1)), rather than spatial patterns of variability or combined
spatial and temporal statistics. However, valid prediction of yields for a particular
year, or of crop response to interannual climate variability averaged over a region,
depends on appropriate representation of variability of inputs in space.
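The three error statistics can give quite different pictures of the same predictions. The sketch below uses a small hypothetical table of yields (rows are years, columns are points) and assumes, as one reading of the dotted subscripts, that Eq. (1) compares spatial averages in each year and Eq. (2) compares temporal averages at each point. Neither the data nor the helper names come from the paper.

```python
import math

def rmse(pred, obs):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def row_means(table):
    """Average over points (columns) within each year (row)."""
    return [sum(row) / len(row) for row in table]

def col_means(table):
    """Average over years (rows) at each point (column)."""
    return [sum(col) / len(col) for col in zip(*table)]

def flatten(table):
    return [value for row in table for value in row]

# Hypothetical yields (Mg/ha): n = 2 years (rows) by m = 3 points (columns)
predicted = [[2.0, 3.0, 4.0],
             [3.0, 3.0, 3.0]]
observed = [[2.5, 2.5, 4.0],
            [2.0, 3.0, 4.0]]

rmse_time = rmse(row_means(predicted), row_means(observed))    # Eq. (1)
rmse_space = rmse(col_means(predicted), col_means(observed))   # Eq. (2)
rmse_combined = rmse(flatten(predicted), flatten(observed))    # Eq. (3)

print(rmse_time, round(rmse_space, 3), round(rmse_combined, 3))
```

In this contrived case the regional averages are predicted exactly in both years, so the time-based RMSE is zero even though the predictions at individual points are wrong.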
2.2. Perfect aggregation
"Perfect aggregation" (Iwasa et al., 1987) can be represented as integration of
some response (e.g. crop yield) across the range of variability of the environment
within some given region. Aggregation of an extensive variable (e.g. crop production)
yields a spatial total, whereas aggregation of an intensive variable (e.g. crop yield)
results in a spatial average. Although the discussion below focuses on crop yields, it
is applicable to any model response variable.
Let the function $\upsilon_t = f(\mathbf{e}_t)$ represent actual crop-yield response in year $t$ to a particular set of environmental inputs (i.e. weather, soil and management conditions)
represented by $\mathbf{e}_t$. Because $\mathbf{e}_t$ varies over $(x, z)$ space ($\mathbf{e}_t = \mathbf{e}_t(x, z)$), $f(\mathbf{e}_t)$ also varies
spatially. The average yield $\bar{\upsilon}_t$ over a two-dimensional region $R_t$ in a given year $t$ can
be obtained by dividing the aggregate yield, integrated over $(x, z)$ space, by the area
$A_t$ of the region (adapted from King, 1991):
$$\bar{\upsilon}_t = A_t^{-1} \iint_{R_t} f(\mathbf{e}_t(x, z))\, dx\, dz. \tag{4}$$
Assuming a perfect model $f(\mathbf{e}_t)$ and perfect characterization of $\mathbf{e}_t(x, z)$, integration of
Eq. (4) will give the true spatial mean yield. We discuss the implications of model
and input errors later.
Alternatively, the variability of $\mathbf{e}_t$ can be expressed as a multivariate probability
distribution represented by the density function, $g(\mathbf{e}_t)$. Average response can then be
expressed as a statistical expectation (adapted from Rastetter et al., 1992):
$$\bar{\upsilon}_t = E[f(\mathbf{e}_t)] = \int_{\mathbf{e}_t} f(\mathbf{e}_t)\, g(\mathbf{e}_t)\, d\mathbf{e}_t. \tag{5}$$
The probabilistic formulation (Eq. (5)) is equivalent to the spatial formulation (Eq.
(4)) only if f(et) is a spatially independent process, meaning that interactions between
neighboring spatial units do not alter response (Band et al., 1991). We will discuss
violations of this assumption later.
2.3. Bias of spatial average
Simulation studies have often inferred regional crop response to climate variability based on one or a few "representative locations". "Representative" suggests
that the environment $\mathbf{e}_t$ at that location approximates the average environment $\bar{\mathbf{e}}_t$ of
the region or subregion of interest. Even if the locations are truly representative in
this sense, yields simulated at representative locations will not generally represent
either the spatial average or the interannual variability of regional yields because of
aggregation error.
One form of aggregation error occurs when the mean of a nonlinear response
to a heterogeneous environment is estimated as the response to the mean environment $\bar{\mathbf{e}}_t$:
$$\bar{y}_t = f(\bar{\mathbf{e}}_t), \tag{6}$$
where:
$$\bar{\mathbf{e}}_t = A^{-1} \iint_{R} \mathbf{e}_t(x, z)\, dx\, dz, \tag{7}$$
and $\bar{y}_t$ is an estimate of $\bar{\upsilon}_t$. If $\mathbf{e}_t$ is heterogeneous and $f(\mathbf{e}_t)$ is either concave or convex through the range of variability of $\mathbf{e}_t$, then $\bar{y}_t$ will be a biased estimate of $\bar{\upsilon}_t$.
The degree of bias ($\bar{y}_t - \bar{\upsilon}_t$) is a function of the distribution of $\mathbf{e}_t$ and the curvature
of $f(\mathbf{e}_t)$.
The problem is easy to visualize for a simple case where heterogeneity of the
environment is represented by two equally probable scalar values, $e_1$ and $e_2$. Fig. 1
shows rainfed soybean ('Bragg') yield response, $f(\rho)$, to stand density $\rho$ simulated
by CROPGRO V.3.5 (Boote et al., 1998) using 1995 weather data for Gainesville,
FL, USA, parameters for a Millhopper fine sand, and typical planting date (15 June)
and row spacing (91 cm). To keep the example simple, we assume that stand density
matched a target of 22 m⁻² ($\rho_2$) on one-half of an otherwise homogeneous field, but
was only 2 m⁻² ($\rho_1$) on the other half due to poor germination, giving a mean density ($\bar{\rho}$) of 12 m⁻². The 'true' mean yield ($\bar{\upsilon}$) for the field is the mean of yield
responses to the high ($f(22) = 2.76$ Mg ha⁻¹) and low stand densities ($f(2) = 1.72$
Mg ha⁻¹), or 2.24 Mg ha⁻¹. If we ignored stand heterogeneity and simulated yield
response to average stand density of 12 m⁻² to obtain 2.59 Mg ha⁻¹, we would
overestimate $\bar{\upsilon}$ by 0.35 Mg ha⁻¹ (13% of the simulated value, or about 16% of the true mean). Such use of response to mean inputs to estimate mean
response to heterogeneous inputs is sometimes called the "fallacy of the averages"
(Templeton and Lawlor, 1981).
Fig. 1. Aggregation error of simulated soybean yield due to heterogeneity of stand density, Gainesville,
FL, 1995.
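The arithmetic of the stand-density example can be checked directly from the three simulated yields quoted in the text. The sketch below only restates those reported values; the variable names are ours.

```python
# Simulated yields (Mg/ha) quoted in the text for the stand-density example
yield_low = 1.72       # f(2 plants per m2)
yield_high = 2.76      # f(22 plants per m2)
yield_at_mean = 2.59   # f(12 plants per m2), the response to the mean density
p_low = p_high = 0.5   # each density occupies half of the field

mean_density = p_low * 2 + p_high * 22
true_mean_yield = p_low * yield_low + p_high * yield_high   # mean of the responses
bias = yield_at_mean - true_mean_yield                      # response to mean minus mean response

print(f"mean density     {mean_density:.0f} per m2")
print(f"true mean yield  {true_mean_yield:.2f} Mg/ha")
print(f"aggregation bias {bias:+.2f} Mg/ha")
```

Because the response curve is concave over this range, the response to the mean density exceeds the mean of the responses, so the bias is positive.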
2.4. Bias of temporal variability
A second and often neglected result of using one or a few representative points to
model response to a spatially heterogeneous environment is a tendency to overestimate interannual variability. We can envision a region as having a finite number
($m$) of spatially distributed time series of annual yields for, say, individual
plants or fields. The interannual standard deviation $\sigma$ of the regional average yield
can be expressed in matrix form as:
$$\sigma = \left( \mathbf{p}^{T} \mathbf{V} \mathbf{p} \right)^{1/2} \tag{8}$$
(Helstrom, 1991), where $\mathbf{p}$ is a vector of fractions of the area occupied by each of the
$m$ time series, and $\mathbf{V}$ is the $m \times m$ variance–covariance matrix of the individual fields
or plants. For the simple case of two spatially segregated time series, $\upsilon_1$ and $\upsilon_2$,
occurring in proportions $p_1$ and $p_2$, with standard deviations $\sigma_1$ and $\sigma_2$ and cross-correlation $r_{12}$, the mean standard deviation of the individual time series:
$$\bar{\sigma} = p_1 \sigma_1 + p_2 \sigma_2,$$
will overestimate the standard deviation of the aggregate time series:
$$\sigma = \left( p_1^2 \sigma_1^2 + p_2^2 \sigma_2^2 + 2 r_{12}\, p_1 p_2\, \sigma_1 \sigma_2 \right)^{1/2},$$
unless $\upsilon_1$ and $\upsilon_2$ are perfectly correlated (i.e. $r_{12} = 1$; van Noordwijk et al., 1994).
As a simple, hypothetical illustration of the problem, we assume that the 'true'
average yield of maize in North Florida, USA, in any given year is the simple average (i.e. $p_1 = p_2 = 0.5$) of yields simulated with weather data from Lake City and
Ocala (Fig. 2). CERES-Maize V.3.5 (Ritchie et al., 1998) simulated 1976–1995 yields
using observed precipitation and temperature data (EarthInfo, 1996), and solar
irradiance generated stochastically (Hansen, 1999) using parameters calculated from
observations at Jacksonville, FL (NREL, 1992). The soil (Millhopper fine sand),
cultivar ('McCurdy 84aa'), planting date (15 April) and planting arrangement (8 m⁻²
in 50 cm rows) are consistent with conditions and practices within the region.
Simulated yields showed a slight negative correlation ($r = -0.109$) between the two
locations. Although the 'true' regional mean through time (7.55 Mg ha⁻¹) is simply
the average of the means simulated for Lake City (7.61 Mg ha⁻¹) and Ocala (7.49
Mg ha⁻¹), the standard deviations simulated for either Lake City (1.35 Mg ha⁻¹) or
Ocala (1.81 Mg ha⁻¹) seriously overestimate the 'true' standard deviation for the
region (1.07 Mg ha⁻¹; Fig. 3).
Fig. 2. Weather station locations in North Florida, USA.
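The regional standard deviation reported for the two-station example follows from Eq. (8) and the statistics quoted in the text. The sketch below is an illustrative check that builds the variance–covariance matrix from the reported standard deviations and correlation. The function name `aggregate_sd` is ours.

```python
import math

def aggregate_sd(p, sd, corr):
    """Standard deviation of an area-weighted mean, sqrt(p' V p) as in Eq. (8).

    V[i][j] is built as corr[i][j] * sd[i] * sd[j].
    """
    m = len(p)
    variance = sum(p[i] * p[j] * corr[i][j] * sd[i] * sd[j]
                   for i in range(m) for j in range(m))
    return math.sqrt(variance)

p = [0.5, 0.5]        # equal weights for Lake City and Ocala
sd = [1.35, 1.81]     # simulated interannual standard deviations (Mg/ha), from the text
r12 = -0.109          # correlation between the two simulated series, from the text
corr = [[1.0, r12], [r12, 1.0]]

regional_sd = aggregate_sd(p, sd, corr)
mean_point_sd = sum(pi * si for pi, si in zip(p, sd))
perfectly_correlated_sd = aggregate_sd(p, sd, [[1.0, 1.0], [1.0, 1.0]])

print(f"regional SD            {regional_sd:.2f} Mg/ha")
print(f"mean of point SDs      {mean_point_sd:.2f} Mg/ha")
print(f"regional SD if r12 = 1 {perfectly_correlated_sd:.2f} Mg/ha")
```

The computed regional value rounds to the 1.07 Mg ha⁻¹ given in the text, and the mean of the point standard deviations matches the aggregate only in the perfectly correlated case.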
Many regional crop-simulation studies show a tendency to overpredict observed
interannual yield variability (Mearns et al., 1992; Rosenberg et al., 1992; Moen et
al., 1994; Meinke and Hammer, 1995; Chipanshi et al., 1998; Rosenthal et al., 1998).
Some (Mearns et al.; Meinke and Hammer; Rosenthal et al.) have attributed this
bias to aggregation error, while others (Chipanshi et al.) have explained it in terms
of model errors, such as excessive sensitivity to water deficit in dry years, and pest
and disease effects in wet years that the models do not capture.
Fig. 3. Simulated maize yields for (A) Lake City and (B) Ocala, FL, and (C) their mean, 1976–1995.
2.5. Emergent properties and processes
Moving from a homogeneous plot to a field, farm, regional landscape or the biosphere involves more than incorporation of additional environmental heterogeneity
associated with increasing spatial scale. Each of these represents a level in the hierarchy of agroecosystems. New properties and processes emerge at each system level
as a result of new components (e.g. human and economic subsystems) or interactions among neighboring components of the system (e.g. intercrop competition).
Interactions among neighboring components can violate the validity of the probabilistic representation of the aggregation problem (Eq. (5)). Lateral flow of water,
solutes and sediment emerges as a potential determinant of crop performance at
the field level. Interactions among intercropped species can also be important within
a field. Farm resource allocation, and human goals and decisions constrain crop
production at a farm scale. Water allocation and competing land uses are constraints that emerge at regional scales. Hierarchy theory (O'Neill et al., 1986; Müller,
1992) and Eq. (8) predict that agricultural systems at increasing scale should become
less sensitive to high-frequency disturbances (e.g. interannual climate variability) in
favor of lower-frequency signals (e.g. long-term climate change).
3. Aggregation approaches
Our understanding of the nature and sources of error associated with increasing spatial scale suggests several potential approaches for controlling or minimizing
the effects of those errors (King, 1991; Luxmoore et al., 1991; Rastetter et al.,
1992). These approaches fall under the broad categories of input sampling and
calibration.
3.1. Input sampling
Input sampling involves simulating a response repeatedly using different sets of
inputs sampled in a manner that captures enough of the heterogeneity of the environment to reduce aggregation error to an acceptable level. Perfect aggregation by
analytical integration over geographic (Eq. (4)) or probability space (Eq. (5)) is
intractable for most dynamic, process-oriented crop models. Input sampling methods can be viewed as numerical approximations of perfect aggregation. Averaging
simulation results across spatial grid cells approximates the solution of Eq. (4) by
Euler integration. Predictions based on stochastic sampling can approximate the
solution of Eq. (5) by Monte Carlo integration (King, 1991). Aggregation using
iterative input sampling and simulation is generally data- and computationally
intensive. Sensitivity analysis is a useful step to identify variables that are likely to
contribute to aggregation errors. Analyses of the linearity of response to the expected range of input values (Rastetter et al., 1992) and sensitivity of mean response to
the variance of inputs (Addiscott, 1993) provide an idea of the potential for aggregation bias due to heterogeneity of particular inputs.
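The two numerical views of aggregation can be compared on a toy problem. In the sketch below the concave function `response`, the assumed input range and the uniform spread of the input across the region are all hypothetical stand-ins for a crop model and its environment. Averaging over equal-area grid cells mimics Eq. (4), random draws from the input distribution mimic Eq. (5), and the response to the mean input shows the bias that both are meant to avoid.

```python
import math
import random

def response(e):
    """Hypothetical concave yield response (Mg/ha) to a scalar input e."""
    return 3.0 * (1.0 - math.exp(-0.2 * e))

LOW, HIGH = 2.0, 22.0   # assumed range of the input across the region

# Geographic view (Eq. 4): average the response over equal-area grid cells,
# each represented by the input value at its centre.
n_cells = 100
cell_inputs = [LOW + (HIGH - LOW) * (k + 0.5) / n_cells for k in range(n_cells)]
grid_mean = sum(response(e) for e in cell_inputs) / n_cells

# Probability view (Eq. 5): Monte Carlo average over draws from the input
# distribution, here uniform on [LOW, HIGH].
random.seed(0)
n_draws = 10_000
mc_mean = sum(response(random.uniform(LOW, HIGH)) for _ in range(n_draws)) / n_draws

# Response to the mean input, as in Eq. (6)
response_to_mean = response((LOW + HIGH) / 2)

print(f"grid average      {grid_mean:.3f}")
print(f"Monte Carlo       {mc_mean:.3f}")
print(f"response to mean  {response_to_mean:.3f}")
```

The grid and Monte Carlo averages should agree to within sampling error, while the response to the mean input is noticeably higher because the toy response is concave.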
3.1.1. Sampling in geographic space
Spatial dependence of variability of regionalized variables is the basis for partitioning a region into smaller, relatively homogeneous spatial units. Validity of the
common assumption that variability within spatial units is negligible (Burke et al.,
1991; Haskett et al., 1995b; de Jager et al., 1998), and therefore the utility of spatial
partitioning, depends on the nugget variance and the range of spatial dependence of
the particular variable relative to the size of the spatial partitions. Spatial patterns of
inputs can account for much of the dependence between jointly distributed input
variables. Resulting reduction of aggregation error will depend on the proportion of
variability that the spatial partitions account for, and on the accuracy of the estimates of mean values of inputs within each unit.
GIS automate the management, analysis and display of spatial information.
Vector-based GIS partition the environment into polygons of arbitrary shape
representing, for example, soil map units, crop reporting districts or nearest weather
station Thiessen polygons. Although boundaries of different input types (e.g. soil
maps, crop reporting districts, weather station Thiessen polygons) generally do not
coincide, GIS support overlaying of polygons of different input variables to create
new polygons of unique input combinations. Raster-based GIS partition the environment into regularly shaped and sized cells based more on convenience than on
natural boundaries. The one-to-one spatial correspondence of different variables in
a raster-based GIS simplifies analyses. Input data formats or sampling methods (e.g.
combine yield monitors, remotely sensed land use and climate data, output from
dynamic atmospheric models) sometimes favor raster representation. Regional
applications of crop models have used both vector (van Lanen et al., 1992; Thornton
et al., 1995; Rosenthal et al., 1998) and raster (Carbone et al., 1996; Thornton et al.,
1997b; de Jager et al., 1998) GIS for managing soil and weather inputs, automating
spatial averaging and visualizing spatial patterns of results. The widespread applicability and benefits of GIS have prompted development of generic tools that link
crop models and GIS packages (reviewed by Hartkamp et al., 1999).
3.1.2. Sampling in probability space
Spatial aggregation can, in principle, be accomplished by repeated simulations
using stochastically sampled inputs whose spatial heterogeneity is represented by
probability distributions. Once heterogeneity of inputs is measured and represented
by univariate or multivariate probability distributions, techniques collectively known
as Monte Carlo simulation are available to derive and analyze distributions of simulation results. Exhaustive sampling may be feasible for discrete distributions with
manageable numbers of values (e.g. planting dates). Independent random sampling is
the most common approach when several input distributions are involved or when
some input distributions are continuous. Independent random sampling from several
input distributions can require large numbers of simulation runs to achieve a given
level of confidence in output distribution parameters.
Latin hypercube sampling (McKay et al., 1979; Stein, 1987) offers a more efficient
alternative to independent random sampling. Each of $k$ input distributions is
stratified into $l$ equal-probability classes, which are sampled without replacement and
combined randomly into $l$ unique input vectors, requiring only $l$ simulation runs
compared to $l^k$ runs for the same intensity of independent sampling with replacement.
Methods are available to either eliminate spurious correlation among independent
variables or to impose correlation among jointly distributed variables (Iman and
Conover, 1982; Owen, 1994). Latin hypercube sampling has been applied to characterizing uncertainty of simulated crop yield and NO₃⁻ leaching (Bouma et al., 1996),
and has been proposed as a means of spatially aggregating crop and forestry yield
simulations under spatial heterogeneity (Luxmoore, 1988; Luxmoore et al., 1991).
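The stratify-and-pair procedure can be sketched in a few lines. This is a minimal illustration of basic Latin hypercube sampling on the unit hypercube, not the implementation used in the studies cited. The function name `latin_hypercube` is ours, and it does not include the correlation-control methods mentioned above.

```python
import random

def latin_hypercube(n_strata, n_inputs, rng):
    """Return n_strata points in the unit hypercube [0, 1)^n_inputs.

    Each input's probability range is split into n_strata equal-probability
    classes; every class is sampled exactly once, and classes are paired at
    random across inputs.
    """
    columns = []
    for _ in range(n_inputs):
        strata = list(range(n_strata))
        rng.shuffle(strata)   # random pairing of classes across inputs
        # One uniform draw inside each class, expressed as a cumulative probability
        columns.append([(s + rng.random()) / n_strata for s in strata])
    return [tuple(column[i] for column in columns) for i in range(n_strata)]

rng = random.Random(42)
sample = latin_hypercube(n_strata=10, n_inputs=3, rng=rng)
for point in sample[:3]:
    print(tuple(round(u, 3) for u in point))
```

Each coordinate is a cumulative probability. Passing it through the inverse cumulative distribution function of the corresponding input converts the sample to input values, giving one input vector per simulation run.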
3.2. Regional calibration
Crop model predictions usually benefit from local calibration. As discussed previously, even a perfect model that is calibrated at a plot scale will yield biased
aggregate predictions if the heterogeneity of the environment is not adequately
characterized and sampled. If historic data are available for the response variable
and region of interest, biases can be characterized and corrected through calibration of model inputs or outputs. Calibration can correct for multiple and hidden
sources of error (Rastetter et al., 1992). However, calibration precludes predictive
validation using the same data set (Addiscott, 1993). A common solution is to divide
the data, calibrate with one subset, then validate with a different subset of observed
data (Power, 1993).
3.2.1. Calibration of model inputs
The planting density response example that we used to illustrate the spatial averaging problem (Fig. 1) can also illustrate how input calibration can correct mean
aggregation bias. Observed mean aggregate yield response $U$ replaces the model-derived mean aggregate response. In this simple case, calibration represents the
entire region of interest with a single, derived value, $\rho' = f^{-1}(U)$, that has no direct
relationship to any measured values of $\rho$. The same assumptions and the
simulated aggregate yield of 2.24 Mg ha⁻¹ imply an effective stand density
($\rho' = 6.01$ m⁻²) that falls between the low ($\rho_1 = 2$ m⁻²) and mean ($\bar{\rho} = 12$ m⁻²) stand
densities. Simulated response to the single effective stand density will then match the
aggregate response to the heterogeneous densities observed in the field. Calibration
should involve the mean of several years of observed yields for the region of interest.
For applications involving climate variability, effective $\mathbf{e}'$ for multiple input variables can be obtained by minimizing interannual prediction error (e.g. RMSE) using
a nonlinear optimization algorithm.
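For a single input with a monotonic response, finding the effective value amounts to inverting the response function, which can be done numerically by bisection. The sketch below is illustrative only. Because the simulated CROPGRO response curve is not reproduced here, `response` is a hypothetical piecewise-linear stand-in drawn through the three (density, yield) pairs quoted in the text, so the effective density it returns differs from the 6.01 m⁻² obtained with the actual curve.

```python
def response(density):
    """Hypothetical stand-in for the simulated yield response (Mg/ha):
    straight lines through the three (density, yield) pairs quoted in the text."""
    knots = [(2.0, 1.72), (12.0, 2.59), (22.0, 2.76)]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= density <= x1:
            return y0 + (y1 - y0) * (density - x0) / (x1 - x0)
    raise ValueError("density outside the tabulated range")

def effective_input(target, low, high, tol=1e-6):
    """Invert a monotonically increasing response by bisection."""
    while high - low > tol:
        mid = 0.5 * (low + high)
        if response(mid) < target:
            low = mid    # effective value lies above mid
        else:
            high = mid   # effective value lies at or below mid
    return 0.5 * (low + high)

aggregate_yield = 2.24   # mean response to the two stand densities, from the text
rho_eff = effective_input(aggregate_yield, low=2.0, high=22.0)

print(f"effective density {rho_eff:.2f} per m2, "
      f"response {response(rho_eff):.2f} Mg/ha")
```

With several inputs and several years of observations, the same idea generalizes to minimizing an error measure such as RMSE over the input vector, for which a general-purpose optimizer would replace the bisection.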
Rastetter et al. (1992) recommended input calibration alone or in combination
with other aggregation methods whenever observed data are available and aggregation error is suspect. However, in the context of hydrological models, Beven (1989)
argued that the use of a single effective value to represent a heterogeneous parameter
is likely to invalidate both the physical interpretation of the parameter and the
structure of the fine-scale model.
3.2.2. Calibration of model outputs
Systematic prediction errors are easily corrected by calibration of model outputs.
The correction factor approach corrects mean bias by multiplying each yield prediction by $U/Y$ (Haskett et al., 1995b; Russell and van Gardingen, 1997). Although
the multiplicative adjustment results in a proportional change in the predicted standard deviation, it does not attempt to correct interannual variability. Adjusting simulated
yields by a least-squares linear correction:
$$y_{C,t} = \alpha + \beta\, \bar{y}_t \tag{9}$$
(Kunkel and Hollinger, 1991; Rosenthal et al., 1998) minimizes squared prediction
error (e.g. RMSE) by removing its systematic component, leaving only unsystematic
or random error. However, the standard deviation of the corrected series will
generally be lower than that of the observed time series. An alternative linear
correction:
$$y_{C,t} = (\bar{y}_t - Y)\, \sigma / \hat{\sigma} + U, \tag{10}$$
reproduces the mean and standard deviation of the observed series, but with higher
prediction error. This correction may be preferable to Eq. (9) for risk studies where
preserving interannual variability is more important than minimizing prediction
error.
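The three output corrections can be compared side by side on a short series. The sketch below uses hypothetical observed and simulated regional yields, not data from any study cited, and population standard deviations throughout. It applies the correction factor, the least-squares correction of Eq. (9) and the variance-preserving correction of Eq. (10).

```python
import statistics as st

# Hypothetical regional yield series (Mg/ha); not data from the study
observed = [2.1, 1.6, 2.4, 1.9, 2.6, 1.4, 2.2, 2.0]
simulated = [2.9, 1.5, 3.4, 2.6, 3.3, 1.2, 3.1, 2.4]

U, Y = st.mean(observed), st.mean(simulated)
sd_obs, sd_sim = st.pstdev(observed), st.pstdev(simulated)

def rmse(pred):
    """RMSE of a corrected series against the observed series."""
    return (sum((p - o) ** 2 for p, o in zip(pred, observed)) / len(observed)) ** 0.5

# Correction factor: rescale every prediction by U / Y (corrects the mean only)
factor_corrected = [y * U / Y for y in simulated]

# Eq. (9): least-squares linear correction, alpha + beta * y
cov = sum((y - Y) * (u - U) for y, u in zip(simulated, observed)) / len(observed)
beta = cov / sd_sim ** 2
alpha = U - beta * Y
ls_corrected = [alpha + beta * y for y in simulated]

# Eq. (10): match the observed mean and standard deviation
sd_matched = [(y - Y) * sd_obs / sd_sim + U for y in simulated]

for name, series in [("factor", factor_corrected),
                     ("Eq. (9)", ls_corrected),
                     ("Eq. (10)", sd_matched)]:
    print(f"{name:9s} mean {st.mean(series):.2f}  "
          f"SD {st.pstdev(series):.2f}  RMSE {rmse(series):.2f}")
```

All three corrections reproduce the observed mean. The least-squares correction gives the lowest RMSE but a standard deviation no larger than the observed one, whereas Eq. (10) reproduces the observed standard deviation at the cost of a higher RMSE.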
Historic crop yield data often display time trends in central tendency and sometimes interannual variability. Time trends are usually attributed to changes in technology or land-use patterns. The most frequent way to deal with yield time trends
is to derive a trend, $y_{T,t}$, as a parametric (Swanson and Nyankori, 1979; Carlson et
al., 1996; Mjelde and Keplinger, 1998) or smoothing (Nicholls, 1985; Hansen et al.,
1998) function of the observed time series, $\bar{\upsilon}_t$. Smoothing techniques separate the
relatively high-frequency response to weather variability from the lower-frequency
response to technology and other factors (Hansen et al.). If $\sigma$ appears stationary,
then $\bar{\upsilon}_t$ can be detrended to a year $b$ basis by an additive adjustment:
$$y_{C,t} = \bar{\upsilon}_t + y_{T,b} - y_{T,t}. \tag{11}$$
If $\sigma$ changes in proportion to $y_T$, a multiplicative adjustment:
$$y_{C,t} = \bar{\upsilon}_t\, y_{T,b} / y_{T,t}, \tag{12}$$
will simultaneously correct the trends of $\bar{\upsilon}$ and $\sigma$. Alternatively, the fitted trend can
be imposed on the simulated time series (Kunkel and Hollinger, 1991; Supit, 1997;
Fagerberg et al., 1998).
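Both detrending adjustments are simple once a trend has been fitted. The sketch below assumes a straight-line (parametric) trend and a hypothetical observed series, and takes the final year as the base year $b$. A smoothing function could be substituted for `linear_trend` without changing the adjustment steps.

```python
def linear_trend(series):
    """Fitted values of a least-squares straight line through (t, series[t])."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y_mean + slope * (t - t_mean) for t in range(n)]

# Hypothetical observed regional yields (Mg/ha) with an upward technology trend
observed = [1.5, 1.9, 1.6, 2.2, 2.0, 2.6, 2.3, 2.9]
trend = linear_trend(observed)
base = len(observed) - 1   # base year b: here, the final year of the record

# Eq. (11): additive adjustment to the base-year trend level
additive = [u + trend[base] - trend[t] for t, u in enumerate(observed)]

# Eq. (12): multiplicative adjustment, which also rescales the deviations
multiplicative = [u * trend[base] / trend[t] for t, u in enumerate(observed)]

for t, (u, a, m) in enumerate(zip(observed, additive, multiplicative)):
    print(f"t={t}  observed {u:.2f}  additive {a:.2f}  multiplicative {m:.2f}")
```

The additive adjustment leaves deviations from the trend unchanged, while the multiplicative adjustment inflates deviations in early, low-yield years in proportion to the trend ratio.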
Although weather is often regarded as stationary, changes in land use and crop
production have been linked to time trends in precipitation (Viglizzo et al., 1995)
and temperatures (Bell and Fischer, 1994). Crop models will presumably capture the
effects of such climatic trends. The difference or ratio of mean observed yields and
yields simulated with fixed management represents the component of yield trends
due to factors other than weather, and may provide a superior technology trend
adjustment (Bell and Fischer).
4. Dealing with imperfect models
The discussion of aggregation error holds true even when models and their inputs
are perfect. However, crop models are not perfect. Increasing the spatial domain of
an analysis often introduces new constraints that are controllable in plot-scale studies, and new processes that result from spatial interactions or the emergence of new
system components, that can invalidate regional predictions from plot-scale models.
Scaling up may, at some point, involve modifying available models ("phenomenon-added modeling" [Luxmoore et al., 1991]) to incorporate these new constraints and
processes. Relevant model modifications could include changing model structure or
embedding model code or output into a model of larger-scale processes.
4.1. Model complexity
Two opposing schools of thought seem to exist regarding the implications of scale
for model structure. The first suggests that appropriate model complexity should
increase with spatial scale because moving from plot to larger scales usually introduces additional determinants of crop production. Rabbinge (1993) classified levels
of crop production by the factors that limit production: potential production limited
only by irradiance, temperature and CO₂; attainable production limited also by water
and macronutrients; and actual production limited also by pests and toxic factors.
The evolution of crop models has paralleled these levels of production. Models
capable of simulating potential production processes (i.e. photosynthesis, respiration, partitioning, phenology) were developed first, then modified by the addition of
models of the soil water balance, and later N dynamics and use. Recent or ongoing
attempts to incorporate additional determinants of actual production include models of P dynamics (Gerakis et al., 1998), drainage (Shen et al., 1998), and response to
pests, diseases (Teng et al., 1998) and various soil factors that constrain root growth
(Lizaso and Ritchie, 1997; Calmon et al., 1999). As models incorporate additional
processes, they tend to grow in complexity and data requirements.
An opposing school of thought argues that simpler models are more appropriate
for higher system levels and broader spatial scales (Beven, 1989; Addiscott, 1998;
Heuvelink, 1998; Jansen, 1998). The first argument is that simpler models tend to
have more modest input requirements, reducing errors associated with the uncertainties of input values. Second, simple empirical models often reduce nonlinearities
that cause aggregation bias. Finally, filtering of high-frequency signals at large
spatial scales eliminates the need for detailed models of fine-scale (e.g. cellular)
processes with a small time constant.
In our opinion, a hybrid approach may often be appropriate. The physiological
detail in existing process-level models captures much of the mechanism of crop
response to weather variability and its interaction with management, and should not
be discarded without clear justification. On the other hand, incomplete understanding of processes, and excessive model complexity and data requirements would
seem to preclude the development and use of models for regional applications that
simulate physiological mechanisms of all important yield determinants. As an
example of the hybrid approach, R.A.C. Mitchell (IACR, Hertfordshire, UK, personal communication, 1999) used dynamic models to simulate 1980–1993 winter
wheat yields for 48 variety trials in the UK. Simulations explained a small portion of
the interannual pattern of mean yields (Table 1). A simple linear regression function
of rainfall during grain-fill, and minimum temperatures during the coldest three
consecutive days gave better predictions (r=0.59). However, simulations corrected
with regression results improved predictions relative to simulations or regression
alone. The post-simulation adjustment accounted for processes (presumably diseases
and winter kill) that the crop models did not capture.
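The post-simulation adjustment described above can be sketched as a regression of simulation residuals on climatic covariates. This is only a minimal illustration under stated assumptions: the function names, the synthetic data and the choice of ordinary least squares on the residuals are ours, and are not the procedure used in the personal communication cited above.

```python
import numpy as np

def fit_correction(simulated, observed, covariates):
    """Least-squares fit of (observed - simulated) on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(simulated)), covariates])
    coef, *_ = np.linalg.lstsq(X, observed - simulated, rcond=None)
    return coef

def apply_correction(simulated, covariates, coef):
    """Add the fitted residual back onto the simulated yields."""
    X = np.column_stack([np.ones(len(simulated)), covariates])
    return simulated + X @ coef

def rmse(pred, obs):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

# Synthetic, hypothetical data: 14 years of mean yields (Mg/ha) and two covariates.
rng = np.random.default_rng(1)
n_years = 14
rain = rng.normal(60.0, 20.0, n_years)   # grain-fill rainfall (mm), assumed
tmin = rng.normal(-5.0, 3.0, n_years)    # coldest 3-day minimum temperature (C), assumed
simulated = rng.normal(8.0, 0.5, n_years)
observed = simulated - 0.02 * (rain - 60.0) + 0.1 * tmin + rng.normal(0.0, 0.2, n_years)

covariates = np.column_stack([rain, tmin])
coef = fit_correction(simulated, observed, covariates)
corrected = apply_correction(simulated, covariates, coef)
print(rmse(simulated, observed), rmse(corrected, observed))
```

Because the correction is fitted and evaluated on the same years, the corrected RMSE cannot exceed the uncorrected one here; an independent evaluation would need cross-validation.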
4.2. Spatial interactions
We can envision three processes (surface and subsurface hydrology, intercrop
competition, and farm resource allocation) in which dynamic interactions in space
can modify crop yields. One obvious approach to simulating each of these processes
is to embed existing crop models within a model of the higher-level system. This
would require the ability to iteratively model, on a daily time step, the processes (e.g.
lateral water movement, intercrop competition, or farm resource allocation) of the
higher-level system, then simulate resource uptake and physiological processes for
the crops in each spatial unit. In the intercrop and farm examples, the overall system
model must be able to handle different crop species with possibly dissimilar model
structures (Caldwell and Hansen, 1993). Current crop models are typically structured to simulate an entire growing season for one crop species. Embedding these
models into models of three-dimensional hydrology, intercrop competition or farm
operations would require restructuring the models so different crops can be simulated in parallel (Caldwell and Hansen; Jones et al., 1997; Sadler and Russell, 1997;
Thornton et al., 1997a). Although our experience in modeling multiple cropping
systems proves that it is possible, the difficulty of reorganizing model code and the
need to repeat the exercise for each model revision suggest that embedded models
of these higher-level systems will not be sustainable without a commitment on the
part of the crop modeling community to develop and maintain an appropriate
modular structure.
Table 1
Predictability of 1980–1993 winter wheat yields from UK variety trials without and with an empirical
climatic correction (a)
           Without correction      With correction
Model      %RMSE (b)    r          %RMSE (b)    r
CERES      32           0.05       7            0.68
MAFF       9            0.31       6            0.76
(a) Source: R.A.C. Mitchell, IACR, Hertfordshire, UK, personal communication, 1999.
(b) RMSE as percent of mean; r, linear cross-correlation.
5. Dealing with imperfect data
The availability of input data of adequate quality and spatial coverage is perhaps
the most serious practical constraint to applications of crop models at regional or
larger scales (de Wit and van Keulen, 1987; Russell and van Gardingen, 1997;
Heuvelink, 1998). King et al. (1997, p. 143) argued that ``upscaling to larger areas
invariably means a loss in the precision and observation density of data used to
parameterize a model. It also raises questions about the suitability of applying the
model at a scale different from the one for which it was developed.'' Although we
hold a more optimistic view, each type of input (soil, weather and management)
presents difficult challenges. The high cost of physical measurement at the desired
density for regional model applications generally necessitates interpolation from
sparse measurements or estimation from more readily available surrogate data.
Where existing spatial data bases are available, inconsistent spatial coverage and
boundaries between soils, weather stations or climate model grid cells, and crop
reporting districts present additional challenges.
5.1. Soil
Applications of agricultural and environmental simulation models at regional and
larger scales rely heavily on spatial soil data bases to account for heterogeneity of
important soil properties. Although regional model applications typically treat soil
map units as homogeneous regions described by a single set of soil parameter values,
soil properties within a map unit can vary considerably. For example, the median
CV of plant-available water-holding capacity (AWHC) within soil associations in
the STATSGO (Reybold and TeSelle, 1989) soil data base ranged from 40 to 60%
for five states in the northeast USA (Lathrop et al., 1995). For two counties in New
Jersey, the mean CV (coefficient of variation) was about 25% within soil series in the
more detailed SSURGO (Reybold and TeSelle) data base. Twelve soil properties
showed a wide range of variability and generally non-normal distributions within a
single map unit in Missouri, USA (Young et al., 1998). Warrick (1998) grouped soil
properties by their typical ranges of within-field CV: 50% (i.e. saturated and unsaturated hydraulic conductivity, infiltration rate, solute concentrations).
Soil heterogeneity within map units has important implications for agricultural
simulation applications. In a 350 km² study area in New Jersey, USA, the forest
production model, PnET, underpredicted mean evapotranspiration by 16% and
mean primary production by 17% using rasterized soil inputs from STATSGO
relative to predictions using inputs derived from the higher resolution SSURGO
spatial soil data base (Lathrop et al., 1995). In a simulation study of the hydrologic
response of a grassland watershed to precipitation, mean soil parameter values generated only 14% of the mean runoff obtained from partitioning the watershed into
14 soil texture classes (Sharma and Luxmoore, 1979). However, using mean soil
properties had little effect on simulated evapotranspiration. Other studies have
shown insensitivity of mean simulated soil evaporation (Lewan and Jansson, 1996)
and rice yield (Wopereis et al., 1996) to spatial heterogeneity of soil properties.
Luxmoore et al. (1991, p. 286) therefore cautioned that ``in some special situations
mean behavior may be representative of the whole, but this cannot be assumed.''
Fortunately, growing appreciation of the importance of soil heterogeneity for model
applications and improving data-storage capabilities are prompting calls and efforts
to include information about the variability of parameters within map units in soil
data bases (Arnold and Wilding, 1991; Burrough, 1993; Lathrop et al.; Finke et al.,
1996; Young et al., 1998).
Although higher-resolution soil data can potentially reduce aggregation bias
associated with heterogeneity of soil properties, the higher-resolution data do
not account for all important variability, and are often not available. When
soil data bases include profile properties and areal proportions corresponding to
multiple pedons within each map unit, parameter values for the individual soils
can be used iteratively as model input. Simulation results are then aggregated
by areal weighting. Alternatively, the properties can be fit to theoretical distributions for stochastic sampling (Shaffer, 1988; Haskett et al., 1995a; Bouma et al.,
1996).
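The areal-weighting step can be sketched as follows. This is a minimal illustration, assuming a hypothetical saturating yield response to AWHC in place of a real crop model; the pedon values and area fractions are invented for the example.

```python
def toy_yield(awhc_mm):
    """Hypothetical saturating response of yield (Mg/ha) to AWHC (mm); not a crop model."""
    return 8.0 * awhc_mm / (awhc_mm + 100.0)

def area_weighted_mean(values, area_fractions):
    """Aggregate per-pedon values by their areal proportions within a map unit."""
    total = float(sum(area_fractions))
    if total <= 0.0:
        raise ValueError("area fractions must sum to a positive value")
    return sum(v * a for v, a in zip(values, area_fractions)) / total

# Hypothetical map unit with three pedons.
pedon_awhc = [60.0, 120.0, 180.0]   # mm
pedon_area = [0.5, 0.3, 0.2]        # areal proportions

# Run the model once per pedon, then aggregate the results by area.
unit_yield = area_weighted_mean([toy_yield(a) for a in pedon_awhc], pedon_area)

# For comparison: a single run with the area-weighted mean AWHC.
yield_at_mean_awhc = toy_yield(area_weighted_mean(pedon_awhc, pedon_area))
print(unit_yield, yield_at_mean_awhc)
```

With a concave response such as this one, the single run at mean AWHC gives a higher yield than the area-weighted aggregate of the per-pedon runs.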
Because of the generally uncharacterized variability of soil parameters within an
association or series map unit, the prospect of calibrating effective values of soil
parameters is appealing. AWHC is an important determinant of crop response to
climate variability. Although it can be measured, it is usually estimated from soil
physical properties (Ritchie and Crumb, 1989; Tietje and Tapkenhinrichs, 1993;
Ritchie et al., 2000) and the depth and vertical distribution of the root system.
Kunkel and Hollinger (1991) achieved good interannual (1979–1990) predictions of
regional maize and soybean yields for nine states in the Midwestern USA by calibrating maximum rooting depth (one means of adjusting AWHC) for a representative soil in each crop reporting district. To simulate regional yields of barley in
Sweden, Fagerberg et al. (1998) simulated yields on a clay soil with 196 mm and a
sandy soil with 70 mm AWHC. They calibrated mean effective AWHC by adjusting
the areal proportions of the two soils for each reporting district. Paz et al. (1998)
demonstrated that CROPGRO could account for much of the spatial and temporal
variability of soybean response to water stress within a field when spatially varying
maximum rooting depth and saturated hydraulic conductivity were calibrated from
yield maps.
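Calibrating the areal proportions of two soils against reported yields can be sketched as a one-parameter least-squares problem. The closed-form estimator, the function name and the example numbers below are our assumptions for illustration; the source does not describe the calibration procedure that Fagerberg et al. (1998) actually used.

```python
import numpy as np

def calibrate_clay_fraction(y_clay, y_sand, observed):
    """Least-squares areal fraction p of the clay soil, so that
    p * y_clay + (1 - p) * y_sand best matches observed yields across years.
    The result is clipped to the physically meaningful range [0, 1]."""
    y_clay, y_sand, observed = (np.asarray(a, dtype=float)
                                for a in (y_clay, y_sand, observed))
    diff = y_clay - y_sand
    denom = float(diff @ diff)
    if denom == 0.0:
        raise ValueError("the two soils give identical simulated yields")
    p = float(diff @ (observed - y_sand)) / denom
    return min(1.0, max(0.0, p))

# Hypothetical simulated yields (Mg/ha) for three years on each soil.
y_clay = np.array([5.0, 6.0, 4.0])
y_sand = np.array([3.0, 2.0, 3.5])
# Synthetic district yields built from a known 30% clay share.
observed = 0.3 * y_clay + 0.7 * y_sand

p_clay = calibrate_clay_fraction(y_clay, y_sand, observed)
print(p_clay)
```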
5.2. Weather
Weather data (observed, estimated or predicted) are central to crop model
applications related to climate variability and prediction. Simulations at locations
far from measured data, or where essential variables or periods are missing, must
rely on estimated data. The most common estimate is to simply use the nearest
weather station as a proxy for unmeasured weather at the location of interest. For
regional applications using spatial partitioning, Thiessen polygons provide an automatic method for identifying the nearest station to any geographical point and for
obtaining areal weighting factors for each station (Carbone et al., 1996; Rosenthal et
al., 1998). Remote sensing offers some promise for filling gaps in surface-measured
weather data. Because solar irradiance data has been a problem for crop model
applications due to the cost and calibration problems of sensors, the prospect of
using global data bases of satellite-derived solar irradiance (Whitlock et al., 1995)
directly or to parameterize stochastic generators is appealing.
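The use of Thiessen polygons to obtain areal weights can be approximated numerically by assigning each cell of a fine grid to its nearest station. The sketch below assumes a rectangular region and planar coordinates; the function name and station locations are hypothetical, and a GIS would normally compute the polygons exactly.

```python
import numpy as np

def thiessen_weights(stations, xmin, xmax, zmin, zmax, n=200):
    """Approximate Thiessen areal weights on an n-by-n grid over a rectangle.
    Each grid point is assigned to its nearest station; a station's weight is
    the fraction of grid points assigned to it."""
    stations = np.asarray(stations, dtype=float)
    gx, gz = np.meshgrid(np.linspace(xmin, xmax, n), np.linspace(zmin, zmax, n))
    points = np.column_stack([gx.ravel(), gz.ravel()])
    # Squared distance from every grid point to every station.
    d2 = ((points[:, None, :] - stations[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(stations))
    return counts / counts.sum()

# Two hypothetical stations placed symmetrically in a unit square.
weights = thiessen_weights([(0.25, 0.5), (0.75, 0.5)], 0.0, 1.0, 0.0, 1.0)
print(weights)
```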
Although spatial averaging and interpolation are sometimes used to estimate daily
weather, spatial averaging biases the variability of daily time-series data. Because of
the many nonlinear processes that they embody, crop models are sensitive not only
to mean climate, but also to its variability within and between seasons (Semenov and
Porter, 1995; Mearns et al., 1996; Riha et al., 1996). This is particularly important
for precipitation because of its influences on processes, such as solute leaching, soil
erosion and crop water stress response, that depend on soil water balance dynamics.
A simple example illustrates the potential problem. We estimated 1976–1995 daily
weather data for Gainesville, FL, using inverse-distance-weighted averages from four surrounding stations (Fig. 2). The inputs were observed temperatures and precipitation (EarthInfo, 1996), and solar irradiance generated (Hansen, 1999) using parameters derived from Jacksonville data (NREL, 1992). Although the interpolation procedure
produced reasonable estimates of monthly total rainfall, it seriously under-predicted
mean wet-day intensity and over-predicted relative frequency of wet days for all
calendar months (Fig. 4).
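The distortion can be reproduced with synthetic data. The sketch below assumes four stations with statistically independent daily rainfall, which exaggerates the effect because real neighboring stations are correlated; the weights, wet-day probability and mean intensity are hypothetical.

```python
import numpy as np

def wet_day_stats(rain, threshold=0.0):
    """Relative frequency of wet days and mean rainfall on wet days."""
    wet = rain > threshold
    frequency = float(wet.mean())
    intensity = float(rain[wet].mean()) if wet.any() else 0.0
    return frequency, intensity

rng = np.random.default_rng(0)
n_days, n_stations = 7300, 4          # about 20 years of daily data
occurs = rng.random((n_days, n_stations)) < 0.3                # wet-day occurrence
amounts = rng.exponential(10.0, size=(n_days, n_stations))     # mm on wet days
stations = np.where(occurs, amounts, 0.0)

weights = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical inverse-distance weights
interpolated = stations @ weights          # weighted spatial average, day by day

station_stats = [wet_day_stats(stations[:, k]) for k in range(n_stations)]
interp_freq, interp_intensity = wet_day_stats(interpolated)
print(station_stats, interp_freq, interp_intensity)
```

The averaged series keeps the weighted mean rainfall total, but it is wet whenever any contributing station is wet, so wet-day frequency rises and wet-day intensity falls.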
An artificial increase in rainfall frequency and decrease in mean intensity due to
spatial aggregation may have two contrasting effects on soil water availability and
crop yield response. On the one hand, frequent low-intensity showers do not
recharge soil water reserves in deeper layers, but favor increased evaporation from
the soil surface, thereby increasing water stress (de Wit and van Keulen, 1987).
On the other hand, increasing the frequency of rainfall events tends to reduce the
duration of dry periods between rain events, thereby decreasing the probability of
water stress. Simulation studies under artificially imposed changes in climate variability suggest that conditions of low mean rainfall, high potential evaporation and
high AWHC favor the first mechanism (increasing soil evaporation), resulting in
negative yield bias, whereas higher mean rainfall and lower AWHC favor positive
bias (Carbone, 1993; Mearns et al., 1996; Riha et al., 1996).
Returning to our previous example, we examined simulated maize yield response
to interpolated weather data for Gainesville, and to observed weather for each of the
five North Florida stations. Using parameters for a Millhopper fine sand, a 15 April
planting date, and realistic cultivar and management inputs, the CERES-Maize V.3.5
model predicted higher mean grain yields with less interannual variability using
interpolated weather than using observed weather for any of the five weather stations (Table 2). Applying inverse distance interpolation to yields simulated for each
station other than Gainesville resulted in more realistic mean yields, but with a low
standard deviation. As our previous discussion suggests, the lower standard deviation is probably more representative of regional yields.
Fig. 4. Monthly mean (A) rainfall total, (B) wet-day intensity, and (C) relative frequency of wet days for
Gainesville, FL, 1976–1995, observed and interpolated from surrounding stations.
In spite of ongoing debate about whether atmospheric circulation models simulate
processes at grid points or averaged over grid cells (Skelly and Henderson-Sellers,
1996), such models have shown rather consistent tendencies to over-predict rainfall
occurrence and under-predict intensity (Mearns et al., 1990, 1995). Other studies
have considered the implications of averaging observed weather data into rectangular grid cells at resolutions that are typical of atmospheric models used to predict
long-term change and seasonal variability. In a 1.6 million km² region in the
Central USA, average (1953–1975) soybean yields simulated by SOYGRO
using grid cell-averaged weather showed an average bias ranging from 18.5 (2°×2°
grid) to 28.0% (5°×5° grid) relative to the simple average of yields simulated for
each of >500 stations in the region (Carbone, 1993). Larger errors in individual
grid cells or individual years tended to cancel each other. Easterling et al. (1998)
found that errors in simulating reported (1984–1992) maize and wheat yields in a
portion of the US Great Plains decreased as resolution increased from 2.8°×2.8° to
1°×1°. Increasing resolution further to 0.5°×0.5° did not further improve predictions.
These illustrations highlight the importance of spatial and temporal downscaling of
climate model output in a manner that preserves both the meaningful features
of the model predictions and the statistical properties of the historical daily
sequences.
Table 2
Simulated maize yield statistics for locations and interpolation schemes, North Florida, USA, 1976–1995 (a)
                                                             Linear cross-correlation (r)
Source                  Y (Mg ha⁻¹)   s (Mg ha⁻¹)   CV (%)   OC       LC       CC       JA       GA
Ocala (OC)              7.49          1.81          24.2     1.000
Lake City (LC)          7.61          1.35          17.7     −0.109   1.000
Cross City (CC)         7.38          1.90          25.8     0.222    −0.110   1.000
Jacksonville (JA)       5.76          2.10          36.4     0.310    0.208    0.452    1.000
Gainesville (GA)        6.71          1.89          28.2     0.356    0.569    0.277    0.384    1.000
Interpolated weather    8.19          0.95          11.6
Interpolated yields     7.18          1.08          15.1
(a) Weather and simulated yields were interpolated for Gainesville from the other locations by inverse-distance weighted averaging; Y, s, CV: mean, standard deviation and coefficient of variation across years
of spatial average yield.
5.3. Management
Crop management inputs typically considered include crop species and cultivar;
planting date and spatial arrangement; irrigation, fertilizer and sometimes biocide
applications; and sometimes land preparation and tillage. Spatial heterogeneity of
management can contribute to aggregation bias. Because management is seldom
consistent from year to year, spatial representations of management variables are
not generally available. Typical or recommended practices are therefore often
applied uniformly within a region.
If a region includes a mixture of irrigated and rainfed production of a particular
crop, knowing the relative areas of each will be important. Areas of irrigated and
rainfed production are sometimes, but not always, available. In rainfed production,
small changes in the timing of water shortage relative to critical periods of crop
growth sometimes have profound effects on yields. Farmers therefore often diversify
planting date and cultivar to reduce risk. The use of one or more representative
cultivars is often a reasonable approximation. Alternatively, effective cultivars can
be derived by calibration against observed development and yield data at the spatial
scale of interest. Hodges et al. (1987) used this approach quite successfully to calibrate nine effective maize cultivars for CERES-Maize from crop reporting district
data. They then selected the best of the nine effective cultivars for each of 51 weather
locations to simulate regional 1982–1985 maize yields in 14 states in the US northern
Midwest. Studies have shown that using several planting dates within the reported
range can improve regional yield predictions relative t
$ Florida Agricultural Experiment Station, Journal Series No. R-07058.
* Corresponding author. Tel.: +1-914-680-4410; fax: +1-914-680-4864.
E-mail address: [email protected] (J.W. Hansen).
PII: S0308-521X(00)00025-1
Nomenclature. Definitions of variable and subscript symbols
Variables
$A$    Area of the region of interest
AWHC    Plant-available water-holding capacity in a soil profile or unit soil depth
$e$, $\mathbf{e}$, $\bar{\mathbf{e}}$    Particular input, vector of inputs at a point, and spatial average vector of model inputs, respectively
$\hat{e}$, $\hat{\mathbf{e}}$    Calibrated, effective input value and vector, respectively, of calibrated inputs
$f(\cdot)$    Model of response (e.g. crop yield) to the environment
$g(\cdot)$    Univariate or multivariate probability density function
$p$, $\mathbf{p}$    Proportion(s) of the area of interest in an individual and vector, respectively, of crops or land uses
$R$    Spatial region
$\mathbf{V}$    Variance–covariance matrix
$x$, $z$    Horizontal spatial coordinates
$y_{t,i}$, $\bar{y}_t$, $Y$    Crop yield or other response predicted at point i in year t, predicted spatial average in year t, and predicted average over space and time, respectively
$\alpha$, $\beta$    Intercept and slope of a least-squares linear regression
$\theta$    Volumetric soil water content
$\gamma$    Calibration multiplier for AWHC
$\rho$    Crop stand density
$\sigma$, $\hat{\sigma}$    Actual and predicted standard deviation, respectively, among years of actual and predicted spatial average yields
$\upsilon_{t,i}$, $\bar{\upsilon}_t$, $\Upsilon$    Actual crop yield or other response at point i in year t, averaged over space in year t, and averaged over space and time, respectively
Subscripts
$m$, $i$    Number of spatial points or units, and an individual spatial point or unit
$n$, $t$    Number of years, and an individual year
L, U, S    Indicators of lower, drained upper, and saturated critical values of $\theta$, respectively
T    A time trend
C    Value corrected by a post-simulation adjustment
1. Introduction
Dynamic, process-level crop models are playing an increasing role in translating
information about climate variability into predictions and recommendations at a
range of scales, tailored to the needs of agricultural decision makers. Although such
models have generally been developed and tested at the scale of a homogeneous plot,
decision makers often need information at broader spatial scales where the
assumption of a homogeneous environment does not hold, and at higher system
levels where different constraints operate. Growing interest in precision agriculture
has brought awareness that farmers deal with spatial heterogeneity even at a field
scale. Policy makers are often concerned with climate impacts at district, watershed,
national or broader scales. Crop model applications related to climate prediction
depend critically on the assumption that the models can capture the year-to-year
pattern of response to climate variability.
In this paper, we review methods for scaling up crop model predictions, and
summarize challenges that arise when applying dynamic crop models developed for
the plot level to broader spatial scales and higher system levels. A case study of
soybean in Georgia, USA, illustrates several of these approaches. Our focus is on
applications for which response to interannual climate variability is important.
2. The spatial aggregation problem
2.1. Variability in space and time
Crops are produced in an environment that varies both in space and in time.
Scaling up entails applying models that assume a homogeneous environment (i.e. a
point in space) to larger areas that can encompass a considerable range of spatial
variability. Inputs to crop simulation models typically include daily weather, soil
properties (including topography and initial conditions) and management (including
cultivar characteristics); collectively these inputs define the environment of the
modeled system. All three types of environment inputs vary spatially. The spatial
heterogeneity of a given environment input can be represented by its distribution in
either geographic (e.g. in a geographical information system [GIS]) or probability
(e.g. as a probability distribution) space (Band et al., 1991; King, 1991). Data collection methods or formats of existing data bases are likely to dictate the method of
representing heterogeneity.
Geostatistics (Webster, 1985; Oliver and Webster, 1991) provides a useful perspective of variability in space. A variogram describes the variance of the stochastic
component of spatial variability of a property as a function of distance between
sample points (the lag). Variance that increases with lag expresses the spatial
dependence of regionalized variables. In other words, ``places that are near to each
other are more alike than those that are further away'' (Curran et al., 1997, p. 1).
Spatial dependence both permits and constrains inferences of properties based on
proximity to measurements. If central tendency and dispersion of a property do not
exhibit systematic spatial trends (i.e. are second-order stationary), then the variogram
will reach a maximum (the sill) at a finite lag (the range). The range marks the limits
of spatial dependence and, importantly, of spatial interpolation. The variogram
often approaches the ordinate at some positive variance (the nugget) that represents
small-scale, spatially independent random variation. Characteristics of a variogram
depend on sample area, the minimum and maximum lag sampled, and sometimes
the direction of sampling. The variogram is the basis for a family of optimal point
and areal interpolation techniques known collectively as kriging.
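An empirical variogram can be estimated from point samples by averaging half the squared differences between pairs of values within lag classes. The sketch below uses this classical semivariance estimator, which is a standard geostatistical choice rather than something specified in the text; the function name, binning scheme and transect data are our assumptions.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Mean of 0.5 * (v_i - v_j)**2 over all point pairs whose separation
    distance (lag) falls in each bin; NaN where a bin holds no pairs."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # each pair counted once
    lags = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (lags >= bin_edges[b]) & (lags < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = half_sq_diff[in_bin].mean()
    return gamma

# Hypothetical transect of 10 points with a linear trend in the property.
coords = [(float(x), 0.0) for x in range(10)]
values = [float(x) for x in range(10)]
gamma = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5])
print(gamma)
```

For this trending transect the semivariance keeps growing with lag and shows no sill, which is the non-stationary case described above.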
Crop yields at a particular point in space vary from season to season primarily
because of the temporal variability of weather. Because of interactions between
plants, soil and weather, spatial patterns of yields can also vary between years. Some
aspects of variability are analogous in space and time (e.g. means, trends, autocorrelated and purely random variability). However, variability in time occurs in a
single dimension, and is characterized by directional dependence; current variability
can be influenced by past, but not future variability.
Possible interactions of the spatial and temporal dimensions of crop model applications can complicate notation, statistical description and interpretation. For
example, root mean squared error (RMSE) of crop yield predictions ($y$) could be
calculated based on deviations from observed yields ($\upsilon$) accumulated over time
represented by n years:
$$\mathrm{RMSE}_{\mathrm{time}} = \left[ n^{-1} \sum_{t=1}^{n} \left( y_{t,\cdot} - \upsilon_{t,\cdot} \right)^2 \right]^{1/2}, \tag{1}$$
over space represented by m fields or measurement plots:
$$\mathrm{RMSE}_{\mathrm{space}} = \left[ m^{-1} \sum_{i=1}^{m} \left( y_{\cdot,i} - \upsilon_{\cdot,i} \right)^2 \right]^{1/2}, \tag{2}$$
or by a combination of time and space:
$$\mathrm{RMSE}_{\mathrm{combined}} = \left[ (m\,n)^{-1} \sum_{t=1}^{n} \sum_{i=1}^{m} \left( y_{t,i} - \upsilon_{t,i} \right)^2 \right]^{1/2}. \tag{3}$$
Variables are defined in the Nomenclature. We assume that applications of crop
models related to climate variability and prediction are concerned primarily with
response to interannual variability of areal-average model results over some region
of interest (e.g. Eq. (1)), rather than spatial patterns of variability or combined
spatial and temporal statistics. However, valid predictions of yields for a particular
year or crop response to interannual climate variability averaged over a region
depend on appropriate representation of variability of inputs in space.
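The three RMSE statistics in Eqs. (1)-(3) can be compared on a small array of yields. The sketch below assumes yields stored as a years-by-points array and equal-area spatial units, so that spatial and temporal averages are simple means; the function names and numbers are illustrative only.

```python
import numpy as np

def rmse_time(pred, obs):
    """Eq. (1): errors of the spatial-average yield, accumulated over years."""
    diff = pred.mean(axis=1) - obs.mean(axis=1)
    return float(np.sqrt(np.mean(diff ** 2)))

def rmse_space(pred, obs):
    """Eq. (2): errors of the time-average yield, accumulated over points."""
    diff = pred.mean(axis=0) - obs.mean(axis=0)
    return float(np.sqrt(np.mean(diff ** 2)))

def rmse_combined(pred, obs):
    """Eq. (3): errors accumulated over every year-point combination."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Hypothetical yields (Mg/ha): rows are years, columns are spatial points.
pred = np.array([[1.0, 3.0], [2.0, 2.0]])
obs = np.array([[2.0, 2.0], [2.0, 2.0]])
print(rmse_time(pred, obs), rmse_space(pred, obs), rmse_combined(pred, obs))
```

In this example the point errors cancel within each year, so the areal-average statistic is zero even though the point-level statistics are not.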
2.2. Perfect aggregation
``Perfect aggregation'' (Iwasa et al., 1987) can be represented as integration of
some response (e.g. crop yield) across the range of variability of the environment
within some given region. Aggregation of an extensive variable (e.g. crop production)
yields a spatial total, whereas aggregation of an intensive variable (e.g. crop yield)
results in a spatial average. Although the discussion below focuses on crop yields, it
is applicable to any model response variable.
Let the function $\upsilon_t = f(\mathbf{e}_t)$ represent actual crop-yield response in year t to a particular set of environmental inputs (i.e. weather, soil and management conditions)
represented by $\mathbf{e}_t$. Because $\mathbf{e}_t$ varies over (x, z) space ($\mathbf{e}_t = \mathbf{e}_t(x, z)$), $f(\mathbf{e}_t)$ also varies
spatially. The average yield $\bar{\upsilon}_t$ over a two-dimensional region $R_t$ in a given year t can
be obtained by dividing the aggregate yield, integrated over (x, z) space, by the area
$A_t$ of the region (adapted from King, 1991):
$$\bar{\upsilon}_t = A_t^{-1} \iint_{R_t} f\left(\mathbf{e}_t(x, z)\right)\,dx\,dz. \tag{4}$$
Assuming a perfect model $f(\mathbf{e}_t)$ and perfect characterization of $\mathbf{e}_t(x, z)$, integration of
Eq. (4) will give the true spatial mean yield. We discuss the implications of model
and input errors later.
Alternatively, the variability of $\mathbf{e}_t$ can be expressed as a multivariate probability
distribution represented by the density function, $g(\mathbf{e}_t)$. Average response can then be
expressed as a statistical expectation (adapted from Rastetter et al., 1992):
$$\bar{\upsilon}_t = E\left[f(\mathbf{e}_t)\right] = \int_{\mathbf{e}_t} f(\mathbf{e}_t)\, g(\mathbf{e}_t)\, d\mathbf{e}_t. \tag{5}$$
The probabilistic formulation (Eq. (5)) is equivalent to the spatial formulation (Eq.
(4)) only if f(et) is a spatially independent process, meaning that interactions between
neighboring spatial units do not alter response (Band et al., 1991). We will discuss
violations of this assumption later.
2.3. Bias of spatial average
Simulation studies have often inferred regional crop response to climate variability based on one or a few ``representative locations''. ``Representative'' suggests
that the environment $\mathbf{e}_t$ at that location approximates the average environment $\bar{\mathbf{e}}_t$ of
the region or subregion of interest. Even if the locations are truly representative in
this sense, yields simulated at representative locations will not generally represent
either the spatial average or the interannual variability of regional yields because of
aggregation error.
One form of aggregation error occurs when the mean of a nonlinear response
to a heterogeneous environment is estimated as the response to the mean environment $\bar{\mathbf{e}}_t$:
$$\bar{y}_t = f(\bar{\mathbf{e}}_t), \tag{6}$$
where:
$$\bar{\mathbf{e}}_t = A^{-1} \iint_{R} \mathbf{e}_t(x, z)\,dx\,dz, \tag{7}$$
and $\bar{y}_t$ is an estimate of $\bar{\upsilon}_t$. If $\mathbf{e}_t$ is heterogeneous and $f(\mathbf{e}_t)$ is either concave or convex through the range of variability of $\mathbf{e}_t$, then $\bar{y}_t$ will be a biased estimate of $\bar{\upsilon}_t$.
The degree of bias ($\bar{y}_t - \bar{\upsilon}_t$) is a function of the distribution of $\mathbf{e}_t$ and the curvature
of $f(\mathbf{e}_t)$.
The problem is easy to visualize for a simple case where heterogeneity of the
environment is represented by two equally probable scalar values, e1 and e2. Fig. 1
shows rainfed soybean (`Bragg') yield response, f(ρ), to stand density simulated
by CROPGRO V.3.5 (Boote et al., 1998) using 1995 weather data for Gainesville,
FL, USA, parameters for a Millhopper fine sand, and typical planting date (15 June)
and row spacing (91 cm). To keep the example simple, we assume that stand density
matched a target of 22 m⁻² (ρ2) on one-half of an otherwise homogeneous field, but
was only 2 m⁻² (ρ1) on the other half due to poor germination, giving a mean density of 12 m⁻². The `true' mean yield for the field is the mean of yield
responses to the high (f(22)=2.76 Mg ha⁻¹) and low stand densities (f(2)=1.72
Mg ha⁻¹), or 2.24 Mg ha⁻¹. If we ignored stand heterogeneity and simulated yield
response to average stand density of 12 m⁻² to obtain 2.59 Mg ha⁻¹, we would
overestimate the true mean yield by 0.35 Mg ha⁻¹ (about 16% of the true mean). Such use of response to mean inputs to estimate mean
response to heterogeneous inputs is sometimes called the ``fallacy of the averages''
(Templeton and Lawlor, 1981).
Fig. 1. Aggregation error of simulated soybean yield due to heterogeneity of stand density, Gainesville,
FL, 1995.
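The arithmetic of the stand-density example can be checked with a short script. Because CROPGRO is not run here, the sketch substitutes a quadratic curve passed through the three yields quoted in the text (at 2, 12 and 22 plants per square meter); that curve is purely a stand-in for the simulated response in Fig. 1.

```python
import numpy as np

# Yields quoted in the text (Mg/ha) at three stand densities (plants per m^2).
densities = np.array([2.0, 12.0, 22.0])
yields = np.array([1.72, 2.59, 2.76])

# Stand-in response curve: the quadratic through the three quoted points.
coeffs = np.polyfit(densities, yields, 2)

def response(density):
    """Hypothetical yield response to stand density (not the crop model)."""
    return float(np.polyval(coeffs, density))

field_densities = [2.0, 22.0]   # the two halves of the field

# Mean response to the heterogeneous input, versus response to the mean input.
mean_of_responses = float(np.mean([response(d) for d in field_densities]))
response_at_mean = response(float(np.mean(field_densities)))
aggregation_error = response_at_mean - mean_of_responses
print(mean_of_responses, response_at_mean, aggregation_error)
```

The response at the mean density exceeds the mean of the two responses because the curve is concave over this range.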
2.4. Bias of temporal variability
A second and often neglected result of using one or a few representative points to
model response to a spatially heterogeneous environment is a tendency to overestimate interannual variability. We can envision a region as having a finite number
(m) of spatially distributed time series of annual yields (e.g. for individual
plants or fields). The interannual standard deviation ($\sigma$) of the regional average yield
can be expressed in matrix form as:
$$\sigma = \left( \mathbf{p}^{T} \mathbf{V} \mathbf{p} \right)^{1/2} \tag{8}$$
(Helstrom, 1991), where $\mathbf{p}$ is a vector of fractions of the area occupied by each of the
m time series, and $\mathbf{V}$ is the m×m variance–covariance matrix of the individual fields
or plants. For the simple case of two spatially segregated time series, $\upsilon_1$ and $\upsilon_2$,
occurring in proportions $p_1$ and $p_2$, with standard deviations $\sigma_1$ and $\sigma_2$ and cross-correlation $r_{12}$, the mean standard deviation of the individual time series:
$$\bar{\sigma} = p_1 \sigma_1 + p_2 \sigma_2,$$
will overestimate the standard deviation of the aggregate time series:
$$\sigma = \left( p_1^2 \sigma_1^2 + p_2^2 \sigma_2^2 + 2 r_{12}\, p_1 p_2\, \sigma_1 \sigma_2 \right)^{1/2},$$
unless $\upsilon_1$ and $\upsilon_2$ are perfectly correlated (i.e. $r_{12}=1$; van Noordwijk et al., 1994).
As a simple, hypothetical illustration of the problem, we assume that the 'true' average yield of maize in North Florida, USA, in any given year is the simple average (i.e. $p_1 = p_2 = 0.5$) of yields simulated with weather data from Lake City and Ocala (Fig. 2). CERES-Maize V.3.5 (Ritchie et al., 1998) simulated 1976–1995 yields using observed precipitation and temperature data (EarthInfo, 1996), and solar irradiance generated stochastically (Hansen, 1999) using parameters calculated from observations at Jacksonville, FL (NREL, 1992). The soil (Millhopper fine sand), cultivar ('McCurdy 84aa'), planting date (15 April) and planting arrangement (8 m⁻² in 50 cm rows) are consistent with conditions and practices within the region. Simulated yields showed a slight negative correlation ($r$ = −0.109) between the two locations. Although the 'true' regional mean through time (7.55 Mg ha⁻¹) is simply the average of the means simulated for Lake City (7.61 Mg ha⁻¹) and Ocala (7.49 Mg ha⁻¹), the standard deviations simulated for either Lake City (1.35 Mg ha⁻¹) or Ocala (1.81 Mg ha⁻¹) seriously overestimate the 'true' standard deviation for the region (1.07 Mg ha⁻¹; Fig. 3).
Fig. 2. Weather station locations in North Florida, USA.
Many regional crop-simulation studies show a tendency to overpredict observed interannual yield variability (Mearns et al., 1992; Rosenberg et al., 1992; Moen et al., 1994; Meinke and Hammer, 1995; Chipanshi et al., 1998; Rosenthal et al., 1998). Some (Mearns et al.; Meinke and Hammer; Rosenthal et al.) have attributed this bias to aggregation error, while others (Chipanshi et al.) have explained it in terms of model errors, such as excessive sensitivity to water deficit in dry years and pest and disease effects in wet years, that the models do not capture.
Fig. 3. Simulated maize yields for (A) Lake City and (B) Ocala, FL, and (C) their mean, 1976–1995.
2.5. Emergent properties and processes
Moving from a homogeneous plot to a field, farm, regional landscape or the biosphere involves more than incorporation of additional environmental heterogeneity associated with increasing spatial scale. Each of these represents a level in the hierarchy of agroecosystems. New properties and processes emerge at each system level as a result of new components (e.g. human and economic subsystems) or interactions among neighboring components of the system (e.g. intercrop competition). Interactions among neighboring components can violate the validity of the probabilistic representation of the aggregation problem (Eq. (5)). Lateral flow of water, solutes and sediment emerges as a potential determinant of crop performance at the field level. Interactions among intercropped species can also be important within a field. Farm resource allocation, and human goals and decisions constrain crop production at a farm scale. Water allocation and competing land uses are constraints that emerge at regional scales. Hierarchy theory (O'Neill et al., 1986; Müller, 1992) and Eq. (8) predict that agricultural systems at increasing scale should become less sensitive to high-frequency disturbances (e.g. interannual climate variability) in favor of lower-frequency signals (e.g. long-term climate change).
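The damping of interannual variability with scale that Eq. (8) implies can be made concrete under a simplifying assumption of ours: $m$ equally weighted yield series that share a common standard deviation and a common pairwise correlation $r$. In that special case Eq. (8) reduces to the point standard deviation multiplied by $\sqrt{(1 + (m-1)r)/m}$. The function name and the example values below are hypothetical.

```python
import math

def aggregate_sd(sd_point, r, m):
    """SD of the mean of m equally weighted series with common sd and pairwise correlation r.

    Special case of Eq. (8) with p_i = 1/m; requires r >= -1/(m - 1) for m > 1.
    """
    return sd_point * math.sqrt((1.0 + (m - 1) * r) / m)

# Invented example: point sd of 1.0 Mg/ha and a pairwise correlation of 0.3.
for m in (1, 2, 10, 100):
    print(f"m = {m:3d}: aggregate sd = {aggregate_sd(1.0, 0.3, m):.3f}")
```

As $m$ grows the aggregate standard deviation approaches $\sqrt{r}$ times the point value, so the reduction is bounded by how strongly the individual series are correlated.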
3. Aggregation approaches
Our understanding of the nature and sources of error associated with increasing spatial scale suggests several potential approaches for controlling or minimizing the effects of those errors (King, 1991; Luxmoore et al., 1991; Rastetter et al., 1992). These approaches fall under the broad categories of input sampling and calibration.
3.1. Input sampling
Input sampling involves simulating a response repeatedly using different sets of inputs sampled in a manner that captures enough of the heterogeneity of the environment to reduce aggregation error to an acceptable level. Perfect aggregation by analytical integration over geographic (Eq. (4)) or probability space (Eq. (5)) is intractable for most dynamic, process-oriented crop models. Input sampling methods can be viewed as numerical approximations of perfect aggregation. Averaging simulation results across spatial grid cells approximates the solution of Eq. (4) by Euler integration. Predictions based on stochastic sampling can approximate the solution of Eq. (5) by Monte Carlo integration (King, 1991). Aggregation using iterative input sampling and simulation is generally data- and computationally intensive. Sensitivity analysis is a useful step to identify variables that are likely to contribute to aggregation errors. Analyses of the linearity of response to the expected range of input values (Rastetter et al., 1992) and sensitivity of mean response to the variance of inputs (Addiscott, 1993) provide an idea of the potential for aggregation bias due to heterogeneity of particular inputs.
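The two numerical approximations can be compared on a toy problem. The sketch below assumes a hypothetical saturating yield response to stand density (the function `response` and its parameters are invented for illustration and are not a crop model) and a density that varies uniformly across the region between 2 and 22 m⁻². It contrasts the response to the mean input with grid-cell averaging (a rectangle-rule approximation) and with Monte Carlo averaging.

```python
import math
import random

def response(density):
    """Hypothetical concave yield response (Mg/ha) to stand density (plants per m^2)."""
    return 2.8 * (1.0 - math.exp(-0.25 * density))

low, high = 2.0, 22.0     # assumed range of density across the region

# Response to the mean input: ignores heterogeneity altogether.
yield_at_mean = response((low + high) / 2.0)

# Grid-cell averaging: evaluate the response at the midpoint of n equal-area cells.
n_cells = 20
width = (high - low) / n_cells
grid_mean = sum(response(low + (i + 0.5) * width) for i in range(n_cells)) / n_cells

# Monte Carlo averaging: evaluate the response at randomly sampled densities.
rng = random.Random(0)
n_draws = 5000
mc_mean = sum(response(rng.uniform(low, high)) for _ in range(n_draws)) / n_draws

print(f"response to mean input: {yield_at_mean:.2f} Mg/ha")
print(f"grid-cell average:      {grid_mean:.2f} Mg/ha")
print(f"Monte Carlo average:    {mc_mean:.2f} Mg/ha")
```

With a real crop model each call to `response` is a full simulation run, which is why the number of cells or draws needed for a given accuracy dominates the cost of aggregation.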
3.1.1. Sampling in geographic space
Spatial dependence of variability of regionalized variables is the basis for partitioning a region into smaller, relatively homogeneous spatial units. Validity of the common assumption that variability within spatial units is negligible (Burke et al., 1991; Haskett et al., 1995b; de Jager et al., 1998), and therefore the utility of spatial partitioning, depends on the nugget variance and the range of spatial dependence of the particular variable relative to the size of the spatial partitions. Spatial patterns of inputs can account for much of the dependence between jointly distributed input variables. Resulting reduction of aggregation error will depend on the proportion of variability that the spatial partitions account for, and on the accuracy of the estimates of mean values of inputs within each unit.
GIS automate the management, analysis and display of spatial information. Vector-based GIS partition the environment into polygons of arbitrary shape representing, for example, soil map units, crop reporting districts or nearest weather station Thiessen polygons. Although boundaries of different input types (e.g. soil maps, crop reporting districts, weather station Thiessen polygons) generally do not coincide, GIS support overlaying of polygons of different input variables to create new polygons of unique input combinations. Raster-based GIS partition the environment into regularly shaped and sized cells based more on convenience than on natural boundaries. The one-to-one spatial correspondence of different variables in a raster-based GIS simplifies analyses. Input data formats or sampling methods (e.g. combine yield monitors, remotely sensed land use and climate data, output from dynamic atmospheric models) sometimes favor raster representation. Regional applications of crop models have used both vector (van Lanen et al., 1992; Thornton et al., 1995; Rosenthal et al., 1998) and raster (Carbone et al., 1996; Thornton et al., 1997b; de Jager et al., 1998) GIS for managing soil and weather inputs, automating spatial averaging and visualizing spatial patterns of results. The widespread applicability and benefits of GIS have prompted development of generic tools that link crop models and GIS packages (reviewed by Hartkamp et al., 1999).
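The overlay idea can be sketched without a GIS package. In the minimal example below, two small rasters (soil class and nearest weather station) are overlaid cell by cell, the model is run once for each unique input combination, and results are weighted by the number of cells each combination occupies. The rasters, the `simulate` placeholder and its yields are all invented for illustration.

```python
from collections import Counter

# Hypothetical 3 x 4 rasters: soil class and nearest weather station for each cell.
soil = [["sand", "sand", "loam", "loam"],
        ["sand", "loam", "loam", "clay"],
        ["sand", "sand", "clay", "clay"]]
station = [["A", "A", "A", "B"],
           ["A", "A", "B", "B"],
           ["A", "B", "B", "B"]]

def simulate(soil_class, station_id):
    """Placeholder for a crop model run; returns an invented yield (Mg/ha)."""
    base = {"sand": 5.0, "loam": 7.0, "clay": 6.0}[soil_class]
    return base + (0.5 if station_id == "B" else 0.0)

# Overlay: count the cells that share each unique (soil, station) combination.
combos = Counter((s, w)
                 for soil_row, station_row in zip(soil, station)
                 for s, w in zip(soil_row, station_row))
n_cells = sum(combos.values())

# One simulation per unique combination, weighted by the area it occupies.
regional_yield = sum(n * simulate(s, w) for (s, w), n in combos.items()) / n_cells

print(f"{len(combos)} unique combinations across {n_cells} cells")
print(f"area-weighted regional yield = {regional_yield:.2f} Mg/ha")
```

Running the model per unique combination rather than per cell gives the same area-weighted mean with fewer simulations whenever combinations repeat.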
3.1.2. Sampling in probability space
Spatial aggregation can, in principle, be accomplished by repeated simulations using stochastically sampled inputs whose spatial heterogeneity is represented by probability distributions. Once heterogeneity of inputs is measured and represented by univariate or multivariate probability distributions, techniques collectively known as Monte Carlo simulation are available to derive and analyze distributions of simulation results. Exhaustive sampling may be feasible for discrete distributions with manageable numbers of values (e.g. planting dates). Independent random sampling is the most common approach when several input distributions are involved or when some input distributions are continuous. Independent random sampling from several input distributions can require large numbers of simulation runs to achieve a given level of confidence in output distribution parameters.
Latin hypercube sampling (McKay et al., 1979; Stein, 1987) offers a more efficient alternative to independent random sampling. Each of $k$ input distributions is stratified into $l$ equal-probability classes, which are sampled without replacement and combined randomly into $l$ unique input vectors, requiring only $l$ simulation runs compared to $l^k$ runs for the same intensity of independent sampling with replacement. Methods are available to either eliminate spurious correlation among independent variables or to impose correlation among jointly distributed variables (Iman and Conover, 1982; Owen, 1994). Latin hypercube sampling has been applied to characterizing uncertainty of simulated crop yield and NO₃⁻ leaching (Bouma et al., 1996), and has been proposed as a means of spatially aggregating crop and forestry yield simulations under spatial heterogeneity (Luxmoore, 1988; Luxmoore et al., 1991).
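A basic Latin hypercube design is short enough to write out. The sketch below is a minimal illustration with a function name of our choosing, not the procedure of any of the cited studies: it generates $l$ points in the unit hypercube so that every one of the $l$ equal-probability strata of each of the $k$ inputs is sampled exactly once. Mapping each coordinate through the inverse cumulative distribution function of the corresponding input (not shown) would give the model input vectors. It does not control spurious correlation among inputs.

```python
import random

def latin_hypercube(k, l, rng):
    """Return l points in the unit hypercube [0, 1)^k, one per stratum in every dimension."""
    columns = []
    for _ in range(k):
        strata = list(range(l))
        rng.shuffle(strata)                      # random pairing of strata across inputs
        # One uniform draw inside each stratum [s/l, (s+1)/l).
        columns.append([(s + rng.random()) / l for s in strata])
    # Row i combines the i-th value of every input into one input vector.
    return [tuple(column[i] for column in columns) for i in range(l)]

rng = random.Random(42)
sample = latin_hypercube(k=3, l=10, rng=rng)
for point in sample[:3]:
    print(tuple(round(v, 3) for v in point))
```

Ten model runs here cover all ten strata of each of the three inputs, whereas a full factorial combination of the same strata would need 10³ runs.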
3.2. Regional calibration
Crop model predictions usually benefit from local calibration. As discussed previously, even a perfect model that is calibrated at a plot scale will yield biased aggregate predictions if the heterogeneity of the environment is not adequately characterized and sampled. If historic data are available for the response variable and region of interest, biases can be characterized and corrected through calibration of model inputs or outputs. Calibration can correct for multiple and hidden sources of error (Rastetter et al., 1992). However, calibration precludes predictive validation using the same data set (Addiscott, 1993). A common solution is to divide the data, calibrate with one subset, then validate with a different subset of observed data (Power, 1993).
3.2.1. Calibration of model inputs
The planting density response example that we used to illustrate the spatial averaging problem (Fig. 1) can also illustrate how input calibration can correct mean aggregation bias. Observed mean aggregate yield response $\bar{U}$ replaces the model-derived mean aggregate response. In this simple case, calibration represents the entire region of interest with a single, derived value, $x_e = f^{-1}(\bar{U})$, that has no direct relationship to any measured values of $x$. Under the same assumptions, the simulated aggregate yield of 2.24 Mg ha⁻¹ implies an effective stand density ($x_e$ = 6.01 m⁻²) that falls between the low ($x_1$ = 2 m⁻²) and mean ($\bar{x}$ = 12 m⁻²) stand densities. Simulated response to the single effective stand density will then match the aggregate response to the heterogeneous densities observed in the field. Calibration should involve the mean of several years of observed yields for the region of interest. For applications involving climate variability, effective values $x_e$ for multiple input variables can be obtained by minimizing interannual prediction error (e.g. RMSE) using a nonlinear optimization algorithm.
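For a single input and a monotonic response, finding the effective value amounts to inverting the response function numerically. The sketch below uses bisection on a hypothetical saturating density response (the `response` function is invented and will not reproduce the 6.01 m⁻² obtained with the crop model); with several inputs and several years, a general-purpose optimizer minimizing RMSE would take the place of the bisection.

```python
import math

def response(density):
    """Hypothetical monotonic yield response (Mg/ha) to stand density (plants per m^2)."""
    return 2.8 * (1.0 - math.exp(-0.25 * density))

def effective_input(target, lo, hi, tol=1e-8):
    """Solve response(x) = target by bisection; assumes response increases on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if response(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Aggregate yield of a field that is half at 2 and half at 22 plants per m^2.
aggregate_yield = 0.5 * response(2.0) + 0.5 * response(22.0)
x_eff = effective_input(aggregate_yield, lo=2.0, hi=22.0)

print(f"aggregate yield   = {aggregate_yield:.2f} Mg/ha")
print(f"effective density = {x_eff:.2f} m^-2")
```

As in the text, the effective density for this concave response falls between the low density and the mean density rather than at the mean.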
Rastetter et al. (1992) recommended input calibration alone or in combination with other aggregation methods whenever observed data are available and aggregation error is suspected. However, in the context of hydrological models, Beven (1989) argued that the use of a single effective value to represent a heterogeneous parameter is likely to invalidate both the physical interpretation of the parameter and the structure of the fine-scale model.
3.2.2. Calibration of model outputs
Systematic prediction errors are easily corrected by calibration of model outputs. The correction factor approach corrects mean bias by multiplying each yield prediction by $\bar{U}/\bar{Y}$ (Haskett et al., 1995b; Russell and van Gardingen, 1997). Although the multiplicative adjustment results in a proportional change in the predicted standard deviation, it does not attempt to correct interannual variability. Adjusting simulated yields, $y_t$, by a least-squares linear correction:
$$y_{C,t} = \alpha + \beta\,y_t \qquad (9)$$
(Kunkel and Hollinger, 1991; Rosenthal et al., 1998), where $\alpha$ and $\beta$ are the intercept and slope of the regression of observed on simulated yields, minimizes squared prediction error (e.g. RMSE) by removing its systematic component, leaving only unsystematic or random error. However, the standard deviation of the corrected series will generally be lower than that of the observed time series. An alternative linear correction:
$$y_{C,t} = \left(y_t - \bar{Y}\right)\frac{s_U}{s_Y} + \bar{U}, \qquad (10)$$
where $s_U$ and $s_Y$ are the interannual standard deviations of the observed and simulated series, reproduces the mean and standard deviation of the observed series, but with higher prediction error. This correction may be preferable to Eq. (9) for risk studies where preserving interannual variability is more important than minimizing prediction error.
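The trade-off between the two linear corrections can be seen on a small example. The sketch below applies a least-squares correction and a mean- and variance-matching correction to invented observed and simulated yield series (neither series comes from the studies cited); the symbols `u` and `y` for observed and simulated yields are our notation.

```python
import statistics as st

# Hypothetical observed (u) and simulated (y) regional yields, Mg/ha; invented values.
u = [2.1, 2.6, 1.8, 2.9, 2.4, 2.0, 2.7, 2.3]
y = [2.6, 3.4, 1.5, 3.9, 2.5, 2.4, 3.1, 3.2]

u_bar, y_bar = st.mean(u), st.mean(y)
s_u, s_y = st.stdev(u), st.stdev(y)

# Least-squares linear correction: regression of observed on simulated yields.
beta = (sum((yi - y_bar) * (ui - u_bar) for yi, ui in zip(y, u))
        / sum((yi - y_bar) ** 2 for yi in y))
alpha = u_bar - beta * y_bar
y_regression = [alpha + beta * yi for yi in y]

# Variance-preserving correction: match the observed mean and standard deviation.
y_rescaled = [(yi - y_bar) * s_u / s_y + u_bar for yi in y]

def rmse(pred, obs):
    """Root mean squared error between two equal-length series."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

for label, series in [("regression", y_regression), ("rescaled", y_rescaled)]:
    print(f"{label:10s} mean = {st.mean(series):.2f}, sd = {st.stdev(series):.2f}, "
          f"RMSE = {rmse(series, u):.2f}")
print(f"{'observed':10s} mean = {u_bar:.2f}, sd = {s_u:.2f}")
```

Both corrections reproduce the observed mean; the regression correction has the lower RMSE but a shrunken standard deviation, whereas the rescaled series matches the observed standard deviation at the cost of a higher RMSE.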
Historic crop yield data often display time trends in central tendency and sometimes interannual variability. Time trends are usually attributed to changes in technology or land-use patterns. The most frequent way to deal with yield time trends is to derive a trend, $y_{T,t}$, as a parametric (Swanson and Nyankori, 1979; Carlson et al., 1996; Mjelde and Keplinger, 1998) or smoothing (Nicholls, 1985; Hansen et al., 1998) function of the observed time series, $u_t$. Smoothing techniques separate the relatively high-frequency response to weather variability from the lower-frequency response to technology and other factors (Hansen et al.). If the interannual standard deviation, $\sigma$, appears stationary, then $u_t$ can be detrended to a year $b$ basis by an additive adjustment:
$$y_{C,t} = u_t + y_{T,b} - y_{T,t}. \qquad (11)$$
If $\sigma$ changes in proportion to $y_T$, a multiplicative adjustment:
$$y_{C,t} = u_t\,y_{T,b}/y_{T,t}, \qquad (12)$$
will simultaneously correct the trends of the mean and of $\sigma$. Alternatively, the fitted trend can be imposed on the simulated time series (Kunkel and Hollinger, 1991; Supit, 1997; Fagerberg et al., 1998).
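The additive and multiplicative adjustments can be sketched for the simplest parametric trend, a straight line fitted by least squares. The yield series below is invented for illustration, and the symbol `u` for observed yields is our notation; a smoothing function could replace the linear fit without changing the two adjustment steps.

```python
# Hypothetical observed regional yields (Mg/ha) with an upward technology trend; invented values.
years = list(range(1980, 1990))
u = [2.0, 2.3, 1.9, 2.6, 2.4, 2.9, 2.5, 3.1, 2.8, 3.3]

# Parametric trend: ordinary least-squares straight line through the observations.
n = len(years)
t_bar = sum(years) / n
u_bar = sum(u) / n
slope = (sum((t - t_bar) * (v - u_bar) for t, v in zip(years, u))
         / sum((t - t_bar) ** 2 for t in years))
trend = [u_bar + slope * (t - t_bar) for t in years]

b = n - 1    # index of the base year (here the last year, 1989)

# Additive adjustment: appropriate when interannual variability is stationary.
additive = [v + trend[b] - tr for v, tr in zip(u, trend)]

# Multiplicative adjustment: appropriate when variability scales with the trend.
multiplicative = [v * trend[b] / tr for v, tr in zip(u, trend)]

for t, v, a, m in zip(years, u, additive, multiplicative):
    print(f"{t}: observed {v:.2f}, additive {a:.2f}, multiplicative {m:.2f}")
```

Both adjusted series are expressed at the technology level of the base year; they differ in that the multiplicative form inflates early-year anomalies in proportion to the trend ratio.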
Although weather is often regarded as stationary, changes in land use and crop production have been linked to time trends in precipitation (Viglizzo et al., 1995) and temperatures (Bell and Fischer, 1994). Crop models will presumably capture the effects of such climatic trends. The difference or ratio of mean observed yields and yields simulated with fixed management represents the component of yield trends due to factors other than weather, and may provide a superior technology trend adjustment (Bell and Fischer).
4. Dealing with imperfect models
The discussion of aggregation error holds true even when models and their inputs are perfect. However, crop models are not perfect. Increasing the spatial domain of an analysis often introduces new constraints that are controllable in plot-scale studies, and new processes that result from spatial interactions or the emergence of new system components, that can invalidate regional predictions from plot-scale models. Scaling up may, at some point, involve modifying available models ("phenomenon-added modeling" [Luxmoore et al., 1991]) to incorporate these new constraints and processes. Relevant model modifications could include changing model structure or embedding model code or output into a model of larger-scale processes.
4.1. Model complexity
Two opposing schools of thought seem to exist regarding the implications of scale for model structure. The first suggests that appropriate model complexity should increase with spatial scale because moving from plot to larger scales usually introduces additional determinants of crop production. Rabbinge (1993) classified levels of crop production by the factors that limit production: potential production limited only by irradiance, temperature and CO₂; attainable production limited also by water and macronutrients; and actual production limited also by pests and toxic factors. The evolution of crop models has paralleled these levels of production. Models capable of simulating potential production processes (i.e. photosynthesis, respiration, partitioning, phenology) were developed first, then modified by the addition of models of the soil water balance, and later N dynamics and use. Recent or ongoing attempts to incorporate additional determinants of actual production include models of P dynamics (Gerakis et al., 1998), drainage (Shen et al., 1998), and response to pests, diseases (Teng et al., 1998) and various soil factors that constrain root growth (Lizaso and Ritchie, 1997; Calmon et al., 1999). As models incorporate additional processes, they tend to grow in complexity and data requirements.
An opposing school of thought argues that simpler models are more appropriate for higher system levels and broader spatial scales (Beven, 1989; Addiscott, 1998; Heuvelink, 1998; Jansen, 1998). The first argument is that simpler models tend to have more modest input requirements, reducing errors associated with the uncertainties of input values. Second, simple empirical models often reduce nonlinearities that cause aggregation bias. Finally, filtering of high-frequency signals at large spatial scales eliminates the need for detailed models of fine-scale (e.g. cellular) processes with a small time constant.
In our opinion, a hybrid approach may often be appropriate. The physiological detail in existing process-level models captures much of the mechanism of crop response to weather variability and its interaction with management, and should not be discarded without clear justification. On the other hand, incomplete understanding of processes, and excessive model complexity and data requirements would seem to preclude the development and use of models for regional applications that simulate physiological mechanisms of all important yield determinants. As an example of the hybrid approach, R.A.C. Mitchell (IACR, Hertfordshire, UK, personal communication, 1999) used dynamic models to simulate 1980–1993 winter wheat yields for 48 variety trials in the UK. Simulations explained a small portion of the interannual pattern of mean yields (Table 1). A simple linear regression function of rainfall during grain-fill, and minimum temperatures during the coldest three consecutive days gave better predictions ($r$ = 0.59). However, simulations corrected with regression results improved predictions relative to simulations or regression alone. The post-simulation adjustment accounted for processes (presumably diseases and winter kill) that the crop models did not capture.
4.2. Spatial interactions
We can envision three processes (surface and subsurface hydrology, intercrop competition, and farm resource allocation) in which dynamic interactions in space can modify crop yields. One obvious approach to simulating each of these processes is to embed existing crop models within a model of the higher-level system. This would require the ability to iteratively model, on a daily time step, the processes (e.g. lateral water movement, intercrop competition, or farm resource allocation) of the higher-level system, then simulate resource uptake and physiological processes for the crops in each spatial unit. In the intercrop and farm examples, the overall system model must be able to handle different crop species with possibly dissimilar model structures (Caldwell and Hansen, 1993). Current crop models are typically structured to simulate an entire growing season for one crop species. Embedding these models into models of three-dimensional hydrology, intercrop competition or farm operations would require restructuring the models so different crops can be simulated in parallel (Caldwell and Hansen; Jones et al., 1997; Sadler and Russell, 1997; Thornton et al., 1997a). Although our experience in modeling multiple cropping systems proves that it is possible, the difficulty of reorganizing model code and the need to repeat the exercise for each model revision suggest that embedded models of these higher-level systems will not be sustainable without a commitment on the part of the crop modeling community to develop and maintain an appropriate modular structure.
Table 1
Predictability of 1980–1993 winter wheat yields from UK variety trials without and with an empirical climatic correction (a)
Model    Without correction        With correction
         %RMSE (b)    r            %RMSE (b)    r
CERES    32           0.05         7            0.68
MAFF     9            0.31         6            0.76
(a) Source: R.A.C. Mitchell, IACR, Hertfordshire, UK, personal communication, 1999.
(b) RMSE as percent of mean; r, linear cross-correlation.
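The restructuring described above can be pictured as giving each crop model a one-day `step` interface that a higher-level model calls inside its own daily loop. The sketch below is purely schematic: the `Crop` class, its toy growth rule and the equal-share water allocation are hypothetical placeholders, not the structure of any existing crop model.

```python
class Crop:
    """Hypothetical crop module that advances one day at a time."""

    def __init__(self, name, growth_rate):
        self.name = name
        self.growth_rate = growth_rate    # potential daily growth (arbitrary units)
        self.biomass = 0.0

    def step(self, water_supply):
        # Toy growth rule: daily growth scales with the fraction of demand met (0 to 1).
        self.biomass += self.growth_rate * min(1.0, water_supply)

def run_season(crops, daily_water, n_days):
    """Higher-level model: allocate each day's water among crops, then step every crop."""
    for day in range(n_days):
        share = daily_water[day] / len(crops)    # placeholder allocation rule
        for crop in crops:
            crop.step(share)
    return {crop.name: crop.biomass for crop in crops}

result = run_season([Crop("maize", 0.2), Crop("bean", 0.1)], [1.0] * 10, n_days=10)
print(result)
```

The point of the design is that the daily loop belongs to the higher-level model, so competition or allocation rules can be evaluated between crop steps instead of after a whole season has been simulated for one species.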
5. Dealing with imperfect data
The availability of input data of adequate quality and spatial coverage is perhaps the most serious practical constraint to applications of crop models at regional or larger scales (de Wit and van Keulen, 1987; Russell and van Gardingen, 1997; Heuvelink, 1998). King et al. (1997, p. 143) argued that "upscaling to larger areas invariably means a loss in the precision and observation density of data used to parameterize a model. It also raises questions about the suitability of applying the model at a scale different from the one for which it was developed." Although we hold a more optimistic view, each type of input (soil, weather and management) presents difficult challenges. The high cost of physical measurement at the desired density for regional model applications generally necessitates interpolation from sparse measurements or estimation from more readily available surrogate data. Where existing spatial data bases are available, inconsistent spatial coverage and boundaries between soils, weather stations or climate model grid cells, and crop reporting districts present additional challenges.
5.1. Soil
Applications of agricultural and environmental simulation models at regional and larger scales rely heavily on spatial soil data bases to account for heterogeneity of important soil properties. Although regional model applications typically treat soil map units as homogeneous regions described by a single set of soil parameter values, soil properties within a map unit can vary considerably. For example, the median coefficient of variation (CV) of plant-available water-holding capacity (AWHC) within soil associations in the STATSGO (Reybold and TeSelle, 1989) soil data base ranged from 40 to 60% for five states in the northeast USA (Lathrop et al., 1995). For two counties in New Jersey, the mean CV was about 25% within soil series in the more detailed SSURGO (Reybold and TeSelle) data base. Twelve soil properties showed a wide range of variability and generally non-normal distributions within a single map unit in Missouri, USA (Young et al., 1998). Warrick (1998) grouped soil properties by their typical ranges of within-field CV; the most variable group (CV >50%) comprises saturated and unsaturated hydraulic conductivity, infiltration rate and solute concentrations.
Soil heterogeneity within map units has important implications for agricultural simulation applications. In a 350 km² study area in New Jersey, USA, the forest production model, PnET, underpredicted mean evapotranspiration by 16% and mean primary production by 17% using rasterized soil inputs from STATSGO relative to predictions using inputs derived from the higher resolution SSURGO spatial soil data base (Lathrop et al., 1995). In a simulation study of the hydrologic response of a grassland watershed to precipitation, mean soil parameter values generated only 14% of the mean runoff obtained from partitioning the watershed into 14 soil texture classes (Sharma and Luxmoore, 1979). However, using mean soil properties had little effect on simulated evapotranspiration. Other studies have shown insensitivity of mean simulated soil evaporation (Lewan and Jansson, 1996) and rice yield (Wopereis et al., 1996) to spatial heterogeneity of soil properties. Luxmoore et al. (1991, p. 286) therefore cautioned that "in some special situations mean behavior may be representative of the whole, but this cannot be assumed." Fortunately, growing appreciation of the importance of soil heterogeneity for model applications and improving data-storage capabilities are prompting calls and efforts to include information about the variability of parameters within map units in soil data bases (Arnold and Wilding, 1991; Burrough, 1993; Lathrop et al.; Finke et al., 1996; Young et al., 1998).
Although higher-resolution soil data can potentially reduce aggregation bias associated with heterogeneity of soil properties, the higher-resolution data do not account for all important variability, and are often not available. When soil data bases include profile properties and areal proportions corresponding to multiple pedons within each map unit, parameter values for the individual soils can be used iteratively as model input. Simulation results are then aggregated by areal weighting. Alternatively, the properties can be fit to theoretical distributions for stochastic sampling (Shaffer, 1988; Haskett et al., 1995a; Bouma et al., 1996).
Because of the generally uncharacterized variability of soil parameters within an association or series map unit, the prospect of calibrating effective values of soil parameters is appealing. AWHC is an important determinant of crop response to climate variability. Although it can be measured, it is usually estimated from soil physical properties (Ritchie and Crumb, 1989; Tietje and Tapkenhinrichs, 1993; Ritchie et al., 2000) and the depth and vertical distribution of the root system. Kunkel and Hollinger (1991) achieved good interannual (1979–1990) predictions of regional maize and soybean yields for nine states in the Midwestern USA by calibrating maximum rooting depth, one means of adjusting AWHC, for a representative soil in each crop reporting district. To simulate regional yields of barley in Sweden, Fagerberg et al. (1998) simulated yields on a clay soil with 196 mm and a sandy soil with 70 mm AWHC. They calibrated mean effective AWHC by adjusting the areal proportions of the two soils for each reporting district. Paz et al. (1998) demonstrated that CROPGRO could account for much of the spatial and temporal variability of soybean response to water stress within a field when spatially varying maximum rooting depth and saturated hydraulic conductivity were calibrated from yield maps.
5.2. Weather
Weather data (observed, estimated or predicted) are central to crop model applications related to climate variability and prediction. Simulations at locations far from measured data, or where essential variables or periods are missing, must rely on estimated data. The most common approach is simply to use the nearest weather station as a proxy for unmeasured weather at the location of interest. For regional applications using spatial partitioning, Thiessen polygons provide an automatic method for identifying the nearest station to any geographical point and for obtaining areal weighting factors for each station (Carbone et al., 1996; Rosenthal et al., 1998). Remote sensing offers some promise for filling gaps in surface-measured weather data. Because solar irradiance data have been a problem for crop model applications due to the cost and calibration problems of sensors, the prospect of using global data bases of satellite-derived solar irradiance (Whitlock et al., 1995) directly or to parameterize stochastic generators is appealing.
Although spatial averaging and interpolation are sometimes used to estimate daily weather, spatial averaging biases the variability of daily time-series data. Because of the many nonlinear processes that they embody, crop models are sensitive not only to mean climate, but also to its variability within and between seasons (Semenov and Porter, 1995; Mearns et al., 1996; Riha et al., 1996). This is particularly important for precipitation because of its influences on processes, such as solute leaching, soil erosion and crop water stress response, that depend on soil water balance dynamics. A simple example illustrates the potential problem. We estimated 1976–1995 daily weather data for Gainesville, FL, using inverse-distance-weighted averages from four surrounding stations (Fig. 2). The station data were observed temperatures and precipitation (EarthInfo, 1996), and solar irradiance generated (Hansen, 1999) using parameters derived from Jacksonville data (NREL, 1992). Although the interpolation procedure produced reasonable estimates of monthly total rainfall, it seriously under-predicted mean wet-day intensity and over-predicted relative frequency of wet days for all calendar months (Fig. 4).
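The distortion of rainfall frequency and intensity by spatial averaging is easy to reproduce with synthetic data. The sketch below generates daily rainfall at four hypothetical stations and interpolates it with inverse-distance weights. All parameters are invented, and the stations are assumed to be statistically independent, which exaggerates the effect relative to real, spatially correlated rainfall.

```python
import random

rng = random.Random(1)
n_days = 3650
p_wet, mean_amount = 0.3, 10.0              # invented wet-day probability and mean amount (mm)
distances_km = [40.0, 60.0, 80.0, 100.0]    # hypothetical distances to four stations

# Inverse-distance weights, normalized to sum to one.
inverse = [1.0 / d for d in distances_km]
weights = [v / sum(inverse) for v in inverse]

def station_series():
    """Daily rainfall: independent wet/dry occurrence with exponential wet-day amounts."""
    return [rng.expovariate(1.0 / mean_amount) if rng.random() < p_wet else 0.0
            for _ in range(n_days)]

stations = [station_series() for _ in weights]
interpolated = [sum(w * s[day] for w, s in zip(weights, stations)) for day in range(n_days)]

def wet_day_stats(series):
    """Return (relative frequency of wet days, mean wet-day amount in mm)."""
    wet = [x for x in series if x > 0.0]
    return len(wet) / len(series), sum(wet) / len(wet)

for label, series in [("station 1", stations[0]), ("interpolated", interpolated)]:
    freq, amount = wet_day_stats(series)
    print(f"{label:12s} total = {sum(series):7.0f} mm, wet-day frequency = {freq:.2f}, "
          f"mean wet-day amount = {amount:.1f} mm")
```

The interpolated series is wet whenever any contributing station is wet, so its wet-day frequency can only rise and its mean wet-day amount can only fall relative to the stations, even though totals are preserved as a weighted average.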
An artificial increase in rainfall frequency and decrease in mean intensity due to spatial aggregation may have two contrasting effects on soil water availability and crop yield response. On the one hand, frequent low-intensity showers do not recharge soil water reserves in deeper layers, but favor increased evaporation from the soil surface, thereby increasing water stress (de Wit and van Keulen, 1987). On the other hand, increasing the frequency of rainfall events tends to reduce the duration of dry periods between rain events, thereby decreasing the probability of water stress. Simulation studies under artificially imposed changes in climate variability suggest that conditions of low mean rainfall, high potential evaporation and high AWHC favor the first mechanism (increasing soil evaporation), resulting in negative yield bias, whereas higher mean rainfall and lower AWHC favor positive bias (Carbone, 1993; Mearns et al., 1996; Riha et al., 1996).
Returning to our previous example, we examined simulated maize yield response to interpolated weather data for Gainesville, and to observed weather for each of the five North Florida stations. Using parameters for a Millhopper fine sand, a 15 April planting date, and realistic cultivar and management inputs, the CERES-Maize V.3.5 model predicted higher mean grain yields with less interannual variability using interpolated weather than using observed weather for any of the five weather stations (Table 2). Applying inverse distance interpolation to yields simulated for each station other than Gainesville resulted in more realistic mean yields, but with a low standard deviation. As our previous discussion suggests, the lower standard deviation is probably more representative of regional yields.
Fig. 4. Monthly mean (A) rainfall total, (B) wet-day intensity, and (C) relative frequency of wet days for Gainesville, FL, 1976–1995, observed and interpolated from surrounding stations.
In spite of ongoing debate about whether atmospheric circulation models simulate
processes at grid points or averaged over grid cells (Skelly and Henderson-Sellers,
1996), such models have shown rather consistent tendencies to over-predict rainfall
occurrence and under-predict intensity (Mearns et al., 1990, 1995). Other studies
have considered the implications of averaging observed weather data into rectangular grid cells at resolutions that are typical of atmospheric models used to predict
61
J.W. Hansen, J.W. Jones / Agricultural Systems 65 (2000) 43±72
Table 2
Simulated maize yield statistics for locations and interpolation schemes, North Florida, USA, 1976±1995a
Source
Ocala (OC)
Lake City (LC)
Cross City (CC)
Jacksonville (JA)
Gainesville (GA)
Interpolated weather
Interpolated yields
Y (Mg haÿ1)
7.49
7.61
7.38
5.76
6.71
8.19
7.18
s (Mg haÿ1)
1.81
1.35
1.90
2.10
1.89
0.95
1.08
CV (%)
24.2
17.7
25.8
36.4
28.2
11.6
15.1
Linear cross-correlation (r)
OC
LC
CC
JA
GA
1.000
ÿ0.109
0.222
0.310
0.356
1.000
ÿ0.110
0.208
0.569
1.000
0.452
0.277
1.000
0.384
1.000
a
Weather and simulated yields were interpolated for Gainesville from the other locations by inversedistance weighted averaging; Y,s, CV, mean, standard deviation and coecient of variation across years
of spatial average yield.
long-term change and seasonal variability. In a 1.6 million km2 region in the
Central USA, average (1953–1975) soybean yields simulated by SOYGRO
using grid cell-averaged weather showed an average bias ranging from 18.5 (2° × 2°
grid) to 28.0% (5° × 5° grid) relative to the simple average of yields simulated for
each of >500 stations in the region (Carbone, 1993). Larger errors in individual
grid cells or individual years tended to cancel each other. Easterling et al. (1998)
found that errors in simulating reported (1984–1992) maize and wheat yields in a
portion of the US Great Plains decreased as resolution increased from 2.8° × 2.8° to
1° × 1°. Increasing resolution further to 0.5° × 0.5° did not further improve predictions.
These illustrations highlight the importance of spatial and temporal downscaling of
climate model output in a manner that preserves both the meaningful features
of the model predictions and the statistical properties of the historical daily
sequences.
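The way spatial averaging distorts daily rainfall sequences can be illustrated with a small numerical experiment. The sketch below generates synthetic daily rainfall for several stations, averages them into a single "grid cell" series, and compares wet-day frequency and mean wet-day intensity. All data are synthetic and all names and parameter values are assumptions chosen for illustration: stations are treated as statistically independent, wet days occur with a fixed probability, and rainfall depths follow an exponential distribution. Real stations within a grid cell are spatially correlated, so the distortion in practice would be smaller than shown here.

```python
import random


def wet_day_stats(series, threshold=0.0):
    """Return (wet-day frequency, mean rainfall on wet days) for a daily series."""
    wet = [x for x in series if x > threshold]
    return len(wet) / len(series), sum(wet) / len(wet)


def synthetic_station(n_days, p_wet, mean_depth, rng):
    """Synthetic daily rainfall (mm): Bernoulli occurrence, exponential depth.

    ASSUMPTION: a deliberately crude generator, not a calibrated weather model.
    """
    return [
        rng.expovariate(1.0 / mean_depth) if rng.random() < p_wet else 0.0
        for _ in range(n_days)
    ]


rng = random.Random(42)  # fixed seed so the experiment is repeatable
N_STATIONS, N_DAYS = 5, 3650

# ASSUMPTION: independent stations with identical rainfall climatology.
stations = [synthetic_station(N_DAYS, 0.3, 10.0, rng) for _ in range(N_STATIONS)]

# Grid-cell series: simple arithmetic mean of the stations on each day.
cell_mean = [sum(day) / N_STATIONS for day in zip(*stations)]

station_stats = [wet_day_stats(s) for s in stations]
cell_freq, cell_intensity = wet_day_stats(cell_mean)

mean_station_freq = sum(f for f, _ in station_stats) / N_STATIONS
mean_station_intensity = sum(i for _, i in station_stats) / N_STATIONS

print(f"Wet-day frequency:  stations {mean_station_freq:.2f}, cell mean {cell_freq:.2f}")
print(f"Wet-day intensity:  stations {mean_station_intensity:.1f} mm, "
      f"cell mean {cell_intensity:.1f} mm")
```

Total rainfall is preserved by the averaging, but the averaged series is wet whenever any station is wet, so it rains more often and less intensely than at any single station. A crop model driven by such a sequence would experience a different pattern of water stress than one driven by station data, which is one reason downscaling needs to preserve the statistical properties of historical daily sequences.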
5.3. Management
Crop management inputs typically considered include crop species and cultivar;
planting date and spatial arrangement; irrigation, fertilizer and sometimes biocide
applications; and sometimes land preparation and tillage. Spatial heterogeneity of
management can contribute to aggregation bias. Because management is seldom
consistent from year to year, spatial representations of management variables are
not generally available. Typical or recommended practices are therefore often
applied uniformly within a region.
If a region includes a mixture of irrigated and rainfed production of a particular
crop, knowing the relative areas of each will be important. Areas of irrigated and
rainfed production are sometimes, but not always, available. In rainfed production,
small changes in the timing of water shortage relative to critical periods of crop
growth sometimes have profound effects on yields. Farmers therefore often diversify
planting date and cultivar to reduce risk. The use of one or more representative
cultivars is often a reasonable approximation. Alternatively, effective cultivars can
be derived by calibration against observed development and yield data at the spatial
scale of interest. Hodges et al. (1987) used this approach quite successfully to calibrate nine effective maize cultivars for CERES-Maize from crop reporting district
data. They then selected the best of the nine effective cultivars for each of 51 weather
locations to simulate regional 1982–1985 maize yields in 14 states in the US northern
Midwest. Studies have shown that using several planting dates within the reported
range can improve regional yield predictions relative t