S. Landau et al. Agricultural and Forest Meteorology 101 2000 151–166 157
2.5. The model fitting process
At the end of the variable selection process the best definitions of the phenological phases were identified. The next step was the translation of these definitions into a phenology sub-model and the generation of the climate input required in the maximum model using this sub-routine. Then, during the model fitting process, formal inference was used to assess the significance of terms included in the maximum model on the basis of both development samples. A parsimonious yield response sub-model was determined by step-wise dropping of variables from the maximum model. At each step, the explanatory variable which tested as insignificant at the 5% level using the F-test and gave the smallest variance ratio, or a variable which showed an unexpected sign for its coefficient estimate, was dropped. All dummy variables were kept in the model as adjustment factors. Variables reflecting main effects were only dropped once any interaction term involving them had disappeared. Also, non-linear threshold parameters were retained as they were believed to be needed to ensure physiological meaningfulness. The significance testing assumed independently distributed normal errors with expectation zero and unknown but constant variance. Residual diagnostics were employed throughout to check the distributional assumptions. In contrast to the variable selection process, the original non-aggregated yields were used during the fitting process.
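The step-wise dropping rule described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the `protected` argument (standing in for terms that are always kept, such as dummy variables) and the data are assumptions. The critical value 3.84 is the one quoted in the footnote to Table 6. The additional rules in the text (dropping terms with an unexpected sign, keeping main effects while an interaction remains) are not implemented here.

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def backward_eliminate(X, y, names, protected=(), f_crit=3.84):
    """Drop, one at a time, the unprotected term with the smallest variance
    ratio (VR) until every remaining unprotected term has VR >= f_crit.

    X must contain a constant column; list its name in `protected`.
    """
    active = list(range(X.shape[1]))
    while True:
        rss_full = residual_ss(X[:, active], y)
        s2 = rss_full / (len(y) - len(active))  # residual mean square
        ratios = {}
        for j in active:
            if names[j] in protected:
                continue
            reduced = [k for k in active if k != j]
            # One-d.f. variance ratio for removing column j from the current model.
            ratios[j] = (residual_ss(X[:, reduced], y) - rss_full) / s2
        if not ratios:
            break
        worst = min(ratios, key=ratios.get)
        if ratios[worst] >= f_crit:
            break
        active.remove(worst)
    return [names[j] for j in active]

# Synthetic illustration (not the paper's data): x2 has no true effect on y.
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 8.0 + 1.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
selected = backward_eliminate(X, y, ["const", "x1", "x2"], protected=("const",))
print(selected)
```

Each candidate is tested against the current model's residual mean square, so the ratios are recomputed after every removal.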
The relative importance of each term in the final climate response sub-model was assessed by decomposing the model’s regression sum of squares into components due to each term. Because the explanatory variables were not orthogonal to each other, the order in which the terms were added into (or dropped out of) the model affected the part of the sum of squares that was attributed to them. Therefore, for comparison purposes, both a forward and a backward selection procedure were employed to achieve a decomposition.
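The order dependence noted above can be illustrated with a small sketch. It uses two deliberately correlated synthetic predictors and shows only that sequential sums of squares change with the order of entry while their total does not. The function name and data are assumptions, and the sketch does not reproduce the selection procedures used in the paper.

```python
import numpy as np

def sequential_ss(X, y, order):
    """Sequential (order-dependent) sums of squares.

    X holds explanatory variables without the constant; `order` lists column
    indices in the order they are added. Returns, per column, the reduction
    in residual sum of squares achieved when that column enters the model.
    """
    n = len(y)
    design = np.ones((n, 1))                      # start from the null model
    rss_prev = float(((y - y.mean()) ** 2).sum())
    contributions = {}
    for j in order:
        design = np.column_stack([design, X[:, j]])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(((y - design @ beta) ** 2).sum())
        contributions[j] = rss_prev - rss
        rss_prev = rss
    return contributions

# Synthetic, deliberately correlated predictors (not the paper's data).
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 0.8 * a + 0.6 * rng.normal(size=200)
y = a + b + rng.normal(size=200)
X = np.column_stack([a, b])

ss_ab = sequential_ss(X, y, order=[0, 1])   # a entered first
ss_ba = sequential_ss(X, y, order=[1, 0])   # b entered first
print(ss_ab, ss_ba)
```

In both orders the contributions sum to the regression sum of squares of the full model; only the split between the two terms differs.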
2.6. Testing the new hybrid-model
The new parsimonious hybrid-model was tested with independent observed yields in the test sample to provide an assessment of its predictive accuracy in practice. This also allowed comparison of the predictive accuracy of the new hybrid-model with that of the mechanistic crop models, which had already undergone independent testing for UK well-managed yields (Landau et al., 1998).
As in Landau et al. (1998), observed yields were averaged within 1 km squares within each year to match the precision of the interpolated weather variables. The root mean square error (RMSE) of differences between observed and predicted yields, and the correlation between them, were employed to measure the accuracy of the new hybrid-model for predicting temporally and spatially distributed UK yields. The new model’s accuracy for predicting annual average yields was also measured in order to assess the model’s ability to predict purely temporal variation in UK well-managed yields. To take account of the fact that the variance of average yields is inversely proportional to the number of originally available yields, all accuracy measures were weighted accordingly.
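A minimal sketch of such weighted accuracy measures is given below, assuming that each averaged yield is weighted by the number of original yields it was computed from. The function names and the example values are hypothetical, and the exact weighting formulae used in the paper are not stated in this section.

```python
import numpy as np

def weighted_rmse(obs, pred, w):
    """Root mean square error with observation weights w."""
    obs, pred, w = (np.asarray(v, dtype=float) for v in (obs, pred, w))
    return float(np.sqrt(np.sum(w * (obs - pred) ** 2) / np.sum(w)))

def weighted_corr(obs, pred, w):
    """Pearson correlation with observation weights w."""
    obs, pred, w = (np.asarray(v, dtype=float) for v in (obs, pred, w))
    mo = np.average(obs, weights=w)
    mp = np.average(pred, weights=w)
    cov = np.sum(w * (obs - mo) * (pred - mp))
    var_o = np.sum(w * (obs - mo) ** 2)
    var_p = np.sum(w * (pred - mp) ** 2)
    return float(cov / np.sqrt(var_o * var_p))

# Hypothetical averaged yields (t/ha) and the number of plots behind each average.
obs = [8.0, 9.0, 7.5]
pred = [8.5, 8.5, 7.5]
counts = [4, 1, 2]
print(weighted_rmse(obs, pred, counts), weighted_corr(obs, pred, counts))
```

With equal weights both functions reduce to the ordinary RMSE and Pearson correlation.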
3. Results
3.1. Simplified CERES-wheat phenology sub-model
A parsimonious phenology model, which splits the crop year into the five phases during which climate data are aggregated, was defined by simplifying the CERES-wheat phenology sub-routine (Hodges and Ritchie, 1991). The structure and parameters of this sub-routine were retained for two reasons. Firstly, this phenology model had performed best in explaining the variation in a set of observed anthesis dates (Table 2) and it was used to define the initial anthesis phase (Fig. 2). Secondly, during the variable selection process the CERES-wheat formula for the duration of grain-filling ($E_{IV}^{b}$, Fig. 2) was identified as the one which explained most yield variation by aggregated climate input during a grain-filling phase. The definition identified to aggregate climate input during an early-reproductive phase ($S_{II}^{c}$, Fig. 2) had not originated from CERES-wheat but was easily redefined within the framework of this sub-routine (Table 3).
The phenological sub-model requires specification of the cultivar-specific parameters sensitivity to vernalisation (υ), sensitivity to photoperiod (ρ), phyllochron interval (φ) and crop genetic coefficient (ψ). Parameter settings were taken from the CERES-wheat model
Table 3
Phases defined by parsimonious phenology sub-model and their approximate interpretation according to CERES-wheat$^{a}$

| Phase | Start day of crop year | End day of crop year | Interpretation |
|-------|------------------------|----------------------|----------------|
| I     | $S$                    | $S_{II} - 1$         | Sowing to early terminal spikelet |
| II    | $S_{II}$               | $A - 11$             | Early terminal spikelet to start of ear growth |
| III   | $A - 10$               | $A + 10$             | Start of ear growth to start of grain-filling |
| IV    | $A + 11$               | $E_{IV}$             | Grain-filling |
| V     | $E_{IV} + 1$           | $H$                  | End of grain-filling to harvest |

$^{a}$ The simplified CERES-wheat phenology subroutine is utilised to predict the terminal spikelet stage $S_{II}$, the date of anthesis $A$ and the end of grain-filling $E_{IV}$. The crop year is defined as the period between 1st September and 31st August the following year.
as those supplied for cultivar Avalon (υ = 0.033, ρ = 0.008, φ = 95 degree days, ψ = 470 degree days).
The parsimonious phenology sub-model requires fewer inputs and is less complex than the CERES-wheat phenology sub-routine. Simplifications of the CERES-wheat routine were achieved by omitting adjustments believed to be of little effect in the UK and by using simpler formulations. Specifically, the simplified routine (i) bases all degree day criteria on mean daily temperatures within a range rather than employing a complicated mechanism involving minimum and maximum temperatures within different temperature ranges, (ii) omits an adjustment for snow cover, (iii) does not model a delay in germination due to insufficient soil water content, (iv) uses simpler formulae for calculating daily vernalisation units and (v) does not allow for vernalisation to be reversed under warm conditions.
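Once the sub-model has predicted the terminal spikelet stage, anthesis and the end of grain-filling, the five phase windows of Table 3 follow by simple date arithmetic. The sketch below illustrates this. The function name and the example dates are hypothetical, and days are counted within the crop year as in Table 3.

```python
from typing import Dict, Tuple

def phase_windows(S: int, S_II: int, A: int, E_IV: int, H: int) -> Dict[str, Tuple[int, int]]:
    """Inclusive (start, end) day of crop year for Phases I to V, following Table 3.

    S: sowing, S_II: early terminal spikelet, A: anthesis,
    E_IV: end of grain-filling, H: harvest.
    """
    return {
        "I":   (S, S_II - 1),
        "II":  (S_II, A - 11),
        "III": (A - 10, A + 10),
        "IV":  (A + 11, E_IV),
        "V":   (E_IV + 1, H),
    }

# Hypothetical dates chosen only for illustration (day of crop year).
windows = phase_windows(S=40, S_II=210, A=285, E_IV=328, H=345)
for phase, (start, end) in windows.items():
    print(phase, start, end, "duration:", end - start + 1)
```

By construction the windows are contiguous, and Phase III always spans the 21 days centred on anthesis.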
The simplified phenology model was tested against the set of observed anthesis dates. The predictive power of the simpler model (r = 0.62, bias 2.9 days, RMSE 7 days) was almost identical to that of CERES-wheat (cf. Table 2), with the anthesis date predictions of the two models differing by a maximum of 2 days (correlation between predictions, r = 0.995). To ensure that the simpler phenology model had not omitted features important under wider climatic conditions than were present in the 57 trials where anthesis dates were observed, the model’s predictions were also compared with CERES-wheat predictions for the first development sample. Again, all differences between predictions were found to be less than 3 days.
3.2. Parsimonious yield response sub-model
During the variable selection process the maximum model was established from the pool of climatic explanatory variables based on the n = 303 aggregated yield observations in development Sample 1. The pool of climatic explanatory variables reflected direct and indirect effects (i.e. via effects on pests, diseases, agronomy) of climate on well-managed yields. Emphasis was placed on climatic effects likely to dominate in the UK. A summary of the climate effects considered during the variable selection process is given in Table 4. More details can be found in Landau (1998).
Table 5 lists the 22 explanatory variables contained in the maximum model for climate effects on grain yield. The model was described by the basic linear relationship
$$Y_i = \beta_1 + \sum_{j=2}^{23} \beta_j X_{ij} + \varepsilon_i$$
where $Y_i$, $i = 1, \ldots, n$, denotes the i-th grain yield observation, $X_{ij}$ is the respective value of the j-th explanatory variable $X_j$ as defined in Table 5, and $\beta_j$, $j = 1, \ldots, 23$, denote the linear parameters. The model also contained two non-linear parameters ($\alpha$, $\gamma$) employed in the definitions of $X_{15}$ and $X_{16}$, respectively. To ease interpretation of the model’s constant $\beta_1$, explanatory variables were centred (Table 5). Finally, the error term $\varepsilon_i$ reflects factors not accounted for in the model, for example management factors or mere random intrinsic variability.
During the variable selection process, the relatively few yields observed during the early harvest years 1976–1980 (14 out of 303 aggregated values) were consistently over-estimated, presumably due to major changes in technology in the late 1970s. Hence, yields from these years were excluded from all subsequent analyses so that the data reflected the more recent technology.
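To make the centring and the dummy-variable interactions concrete, the sketch below builds a few terms of the kind listed in Table 5. It is an illustration only: the function names and input values are hypothetical, and the threshold function reflects one reading of the $X_{16}$ entry (a term that is active only when the temperature summary lies at or below γ), with γ = −3.21 taken from Table 7.

```python
import numpy as np

def centred_log_rain(p_mean, centre):
    """ln(P_mean + 0.1) minus a centring constant (form of X2, X7 and X22 in Table 5)."""
    return np.log(np.asarray(p_mean, dtype=float) + 0.1) - centre

def with_variety_interaction(x, v):
    """Product of a centred variable and the variety-trial dummy V (0 or 1)."""
    return np.asarray(x, dtype=float) * np.asarray(v, dtype=float)

def below_threshold(t_min3, gamma, offset):
    """(T - gamma) where T <= gamma, else 0, plus an offset (assumed reading of X16)."""
    t = np.asarray(t_min3, dtype=float)
    return np.where(t <= gamma, t - gamma, 0.0) + offset

p_mean = np.array([1.0, 2.5, 0.4])   # hypothetical mean daily rainfall in Phase IV (mm)
v = np.array([0, 1, 1])              # hypothetical trial types (1 = variety trial)

x2 = centred_log_rain(p_mean, centre=0.33)   # main effect
x5 = with_variety_interaction(x2, v)         # interaction with the variety-trial factor
x16 = below_threshold(np.array([-10.0, -2.0]), gamma=-3.21, offset=3.63)
print(x2, x5, x16)
```

Because the interaction term is zero for non-variety trials, the main-effect coefficient describes non-variety trials and the interaction coefficient the additional effect in variety trials.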
Table 4
Summary of climate effects assessed during the variable selection process$^{a}$

| Climate effect category | Number of steps in cumulative procedure | Phase | Climate effect | Expected direction of climate effect on yield |
|---|---|---|---|---|
| Effects of rainfall | 1 | IV | Disease | − |
| | | | Drought | + |
| | | | Lodging | − |
| | | | Sprouting | − |
| | 2 | II | Disease | − |
| | | | Drought | + |
| | 3 | III | Disease | − |
| | | | Drought | + |
| Effect of radiation interception | 4 | III | Carbon assimilation: amount of light energy available | + |
| | 5 | IV | Carbon assimilation: amount of light energy available | + |
| | | | and temperature-driven duration of phase | − |
| | 6 | II | Carbon assimilation: amount of light energy available | + |
| | | | and temperature-driven duration of phase | − |
| | | | Radiation damage | − |
| Interaction between water and radiation levels | 7 | II | Radiation damage under drought conditions | − |
| Yield loss under extreme temperatures | 8 | I–II | Frost damage | + |
| | 9 | III–IV | Heat damage | − |
| | 10 | Meiosis | Cold damage | + |
| | 11 | Anthesis | Cold damage | + |
| Yield loss due to harvest conditions | 12 | V | Shedding of over-ripe grain | − |
| | | | Wetness at harvest | − |
| Yield loss due to drilling conditions | 13 | I | Delay in sowing date | − |
| | | | Wetness at sowing | − |
| Effects during vegetative growth | 14 | I | Varied temperature effects (early canopy development, encouragement of aphid population growth) | ? |
| | 15 | I | Carbon assimilation | + |
| | 16 | I | Disease, nitrogen leaching... | − |

$^{a}$ In each step of the cumulative procedure a set of alternative expressions for the respective climate effects was investigated. The phases refer to the vegetative Phase I, the early-reproductive Phase II, the anthesis Phase III, the grain-filling Phase IV and the pre-harvest Phase V.
After the end of the selection process, climatic explanatory variables could be defined over phases returned by the phenology sub-model. The first and second development samples of grain yields were then used during the model fitting process to select empirically important variables from this maximum model and to estimate their effects. A total of n = 1242 grain yield observations during the period 1981–1993 was analysed (see Table 1). The total variance of these observed grain yields amounted to 1.88 t$^{2}$ ha$^{-2}$, of which $R^2_{ad}$ = 26.3% was accounted for by fitting the maximum model.
Table 6 demonstrates the effect of the step-wise dropping of the unwanted explanatory variables. The effect of total radiation during Phase IV ($X_{11}$) remained negative after dropping the interaction between radiation and trial type ($X_{13}$). Therefore, $X_{11}$ was excluded from the model to ensure interpretability of model terms. Table 7 lists the constant term and the 17 explanatory variables included in the
Table 5
Set of explanatory variables constituting the maximum model$^{a}$

| Explanatory variable | Model term | Interpretation indicating the period over which summaries were calculated |
|---|---|---|
| $X_1$ | $\beta_1$ | Mean yield for non-variety trials for which sowing date and harvest date were known and harvest took place before 31st August |
| $X_2$ | $\beta_2\,[\ln(P_{mean} + 0.1) - 0.33]$ | Main effect of mean and maximum daily rainfall in Phase IV |
| $X_3$ | $\beta_3\,[\ln(P_{max} - P_{mean} + 0.1) - 2.22]$ | Main effect of mean and maximum daily rainfall in Phase IV |
| $X_4$ | $\beta_4\,V$ | Main effect of variety trials |
| $X_5$ | $\beta_5\,[\ln(P_{mean} + 0.1) - 0.33]\,V$ | Interaction effect of mean daily rainfall in Phase IV and the variety trial factor |
| $X_6$ | $\beta_6\,[\ln(P_{prop}/(1 - P_{prop})) - 0.69]$ | Main effect of proportion of days in Phase II when rain occurred |
| $X_7$ | $\beta_7\,[\ln(P_{mean} + 0.1) - 0.12]$ | Effect of mean and maximum daily rainfall in Phase III |
| $X_8$ | $\beta_8\,[\ln(P_{max} - P_{mean} + 0.1) - 1.87]$ | Effect of mean and maximum daily rainfall in Phase III |
| $X_9$ | $\beta_9\,(R_{tot} - 351)$ | Effect of total radiation in Phase III |
| $X_{10}$ | $\beta_{10}\,(D - 31.98)$ | Main effect of duration of Phase IV |
| $X_{11}$ | $\beta_{11}\,(R_{tot} - 517.6)$ | Main effect of total radiation in Phase IV |
| $X_{12}$ | $\beta_{12}\,(D - 31.98)\,V$ | Interaction effect of duration of Phase IV and the variety trial factor |
| $X_{13}$ | $\beta_{13}\,(R_{tot} - 517.6)\,V$ | Interaction effect of total radiation in Phase IV and the variety trial factor |
| $X_{14}$ | $\beta_{14}\,(R_{mean} - 15.49)$ | Main effect of mean daily radiation in Phase II |
| $X_{15}$ | $\beta_{15}\,(1 - P_{prop} - \alpha)\,I_{[\alpha,1]}(1 - P_{prop})$ 12 $(R_{mean} - 6.26)$ | Interaction effect of the proportion of days in Phase II when rain occurred and the mean daily radiation in Phase II |
| $X_{16}$ | $\beta_{16}\,[(T_{min3} - \gamma)\,I_{(-\infty,\gamma]}(T_{min3}) + 3.63]$ | Effect of minimum daily temperatures throughout the crop year |
| $X_{17}$ | $\beta_{17}\,[\ln(P_{mean} + 0.1) + 0.304]\,(1 - B)(1 - C)$ | Effect of mean daily rainfall in the week before harvest |
| $X_{18}$ | $\beta_{18}\,B$ | Effect of unknown harvest date |
| $X_{19}$ | $\beta_{19}\,C$ | Effect of harvest later than 31st August |
| $X_{20}$ | $\beta_{20}\,(S - 43.87)(1 - F)$ | Effect of delay in sowing date |
| $X_{21}$ | $\beta_{21}\,F$ | Effect of unknown sowing dates |
| $X_{22}$ | $\beta_{22}\,[\ln(P_{mean} + 0.1) - 0.298]$ | Effect of rainfall during February/March |
| $X_{23}$ | $\beta_{23}\,[\ln(P_{mean} + 0.1) - 0.267]\,(1 - F)$ | Effect of rainfall during the first 2 weeks after sowing |

$^{a}$ All explanatory variables are defined over phases returned by the phenology sub-model in Section 3.1 (Table 3). Linear parameters are denoted by $\beta_i$; non-linear parameters by $\alpha$ and $\gamma$. For specified periods, the mean ($P_{mean}$) and maximum daily total rainfall ($P_{max}$) measured in mm, the proportion of days when rain occurred ($P_{prop}$), the total radiation ($R_{tot}$) in MJ m$^{-2}$, the mean daily radiation ($R_{mean}$) in MJ m$^{-2}$ per day, the minimum mean daily minimum temperature of three consecutive days ($T_{min3}$) in °C and the duration of the period itself ($D$, measured in days) are employed as climate input summaries. The dummy variable $V$ reflects a variety trial factor. It takes the value one if the respective yield value has been obtained from a variety trial and the value zero otherwise. Variables $B$, $C$ and $F$ define dummy variables employed to take account of conditions at harvest and sowing. Variable $B$ takes the value one when the harvest date is unknown; $C$ takes the value one when the harvest date was known but harvest occurred after the 31st August. Variable $F$ takes the value one when the sowing date was unknown. $S$ is the sowing date measured in days since 1st September. $I_{[a,b]}(x)$ denotes the indicator function which takes the value one when $x$ lies within the interval $[a, b]$ and takes the value zero otherwise.
selected yield response sub-model and their estimated effects. This final parsimonious yield response model was able to explain $R^2_{ad}$ = 26.1% of the variance in grain yields. It predicted grain yields in development Samples 1 and 2 with an RMSE of 1.17 t ha$^{-1}$ and achieved a correlation between yield observations and predictions of r = 0.52.
The relative importance of each term in the yield response sub-model was assessed by decomposing the model’s regression sum of squares into components
Table 6
Sums of squares (SS) and variance ratios (VR) for terms dropped from the maximum model$^{a}$

| Model | d.f. | Explanatory variables | SS | VR |
|---|---|---|---|---|
| Null model | 1239 | $X_1$ | 2328 | – |
| Maximum model | 22 | $X_1, \ldots, X_{23}$ | 640 | 20.96 |
| Drop variable | 1 | $X_{23}$ | 0.004 | 0.001 |
| Drop variable | 1 | $X_{13}$ | 0.66 | 0.48 |
| Drop variable | 1 | $X_{11}$ | 2.97 | 2.15 |
| Drop variable | 1 | $X_{12}$ | 1.38 | 1.00 |
| Drop variable | 1 | $X_8$ | 2.86 | 2.06 |
| Selected model | 17 | $X_1, \ldots, X_7$, $X_9$, $X_{10}$, $X_{14}, \ldots, X_{22}$ | 632 | 26.78 |

$^{a}$ A single term significantly improved the model fit at the 5% level if its variance ratio exceeded the 95% quantile of an F-distribution with 1 and 1200 degrees of freedom ($F_{1,1200;0.95} = 3.84$).
due to each term using a forward and a backward selection procedure. Table 8 shows that both decompositions attributed similar relative importance to the model terms. A major change in ranking was only found for the sowing date ($X_{20}$), the effect of the duration of Phase IV ($X_{10}$) and the mean rainfall effect during Phase IV ($X_2$).
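As a cross-check, the summary figures in Table 6 can be reproduced approximately from its sums of squares and degrees of freedom. The sketch below assumes that each variance ratio is a term's mean square divided by the residual mean square and that the adjusted R² is one minus the ratio of residual to total mean square. These are the standard definitions, and the paper's exact computation is not stated in this section.

```python
# Values transcribed from Table 6.
ss_total, df_total = 2328.0, 1239        # null model
ss_selected, df_selected = 632.0, 17     # selected model

def variance_ratio(ss_term, df_term, ss_resid, df_resid):
    """Mean square of a term divided by the residual mean square."""
    return (ss_term / df_term) / (ss_resid / df_resid)

ss_resid = ss_total - ss_selected        # residual SS of the selected model
df_resid = df_total - df_selected
resid_ms = ss_resid / df_resid
total_var = ss_total / df_total          # total variance of observed yields

vr_selected = variance_ratio(ss_selected, df_selected, ss_resid, df_resid)
adj_r2 = 1.0 - resid_ms / total_var

print(f"total variance    = {total_var:.2f} t^2 ha^-2")
print(f"VR selected model = {vr_selected:.2f}")
print(f"adjusted R^2      = {100 * adj_r2:.1f}%")
```

Under these assumptions the computed values agree, to rounding, with the reported total variance (1.88 t$^{2}$ ha$^{-2}$), the variance ratio of the selected model (26.78) and its adjusted R² (26.1%).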
Table 7
Parameter estimates and their standard errors (s.e.) for the parsimonious yield response sub-model (n = 1240)

| Explanatory variable | Estimate of linear parameter $\beta_j$ (s.e.) | Estimate of non-linear parameter (s.e.) |
|---|---|---|
| $X_1$ | 8.649 (0.164) | – |
| $X_2$ | −1.346 (0.14) | – |
| $X_3$ | 0.5731 (0.0893) | – |
| $X_4$ | −0.1466 (0.0991) | – |
| $X_5$ | 0.518 (0.139) | – |
| $X_6$ | −2.06 (0.29) | – |
| $X_7$ | −0.2237 (0.0703) | – |
| $X_9$ | 0.002021 (0.000851) | – |
| $X_{10}$ | 0.0622 (0.0121) | – |
| $X_{14}$ | 0.1483 (0.0407) | – |
| $X_{15}$ | −0.3766 (0.0674) | 0.155 (0.0065) |
| $X_{16}$ | 0.0636 (0.0142) | −3.21 (2.29) |
| $X_{17}$ | −0.1360 (0.0564) | – |
| $X_{18}$ | −0.1080 (0.0938) | – |
| $X_{19}$ | −0.522 (0.130) | – |
| $X_{20}$ | −0.00822 (0.00392) | – |
| $X_{21}$ | 0.10 (0.117) | – |
| $X_{22}$ | −0.6628 (0.0984) | – |

Table 8
Sums of squares (SS) and variance ratios (VR) for each term included in the parsimonious yield response sub-model$^{a}$

| Step | Forward selection: variable | SS | VR | Backward selection: variable | SS | VR |
|---|---|---|---|---|---|---|
| 1 | $X_{16}$ | 107.54 | 59.98 | $X_{18}$ | 1.88 | 1.35 |
| 2 | $X_6$ | 109.26 | 64.04 | $X_{21}$ | 1.47 | 1.06 |
| 3 | $X_{22}$ | 83.67 | 51.02 | $X_{20}$ | 5.86 | 4.21 |
| 4 | $X_7$ | 62.43 | 39.24 | $X_9$ | 8.16 | 5.84 |
| 5 | $X_{19}$ | 32.24 | 20.59 | $X_{17}$ | 7.62 | 5.43 |
| 6 | $X_{20}$ | 28.13 | 18.21 | $X_4$ | 16.18 | 11.44 |
| 7 | $X_2$ | 21.02 | 13.75 | $X_{14}$ | 22.73 | 15.88 |
| 8 | $X_3$ | 56.45 | 38.04 | $X_{15}$ | 17.20 | 11.91 |
| 9 | $X_5$ | 36.11 | 24.81 | $X_{19}$ | 24.94 | 17.04 |
| 10 | $X_{10}$ | 26.16 | 18.22 | $X_5$ | 30.17 | 20.29 |
| 11 | $X_{15}$ | 16.06 | 11.28 | $X_{10}$ | 44.97 | 29.54 |
| 12 | $X_{14}$ | 22.30 | 15.85 | $X_7$ | 49.05 | 31.43 |
| 13 | $X_4$ | 12.81 | 9.17 | $X_3$ | 58.73 | 36.54 |
| 14 | $X_{17}$ | 6.87 | 4.94 | $X_2$ | 42.25 | 25.76 |
| 15 | $X_9$ | 7.27 | 5.24 | $X_{22}$ | 83.67 | 49.04 |
| 16 | $X_{21}$ | 1.47 | 1.06 | $X_6$ | 109.26 | 60.93 |
| 17 | $X_{18}$ | 1.88 | 1.35 | $X_{16}$ | 107.54 | 57.25 |

$^{a}$ A step-wise forward and a step-wise backward variable selection procedure were employed. A term significantly improved the model fit at the 5% level if its variance ratio exceeded 3.84.
3.3. Independent testing
Evaluation of the predictive accuracy of the parsimonious hybrid-model on the basis of development Samples 1 and 2 was expected to be over-optimistic, since these yield data represent the empirical basis from which the model was developed. Independent testing of the new model was therefore based on yield data from the test sample. This sample contained n = 246 1 km square yield values during the time period for which the new model was applicable (1981–1993, see Table 1).
Fig. 3 demonstrates the performance of the new parsimonious hybrid-model for predicting the temporally and spatially distributed UK well-managed yields in the test sample. The parsimonious hybrid-model was almost unbiased (bias = 0.078 t ha$^{-1}$), the correlation between observed and predicted yields was r = 0.41 (significantly different from zero at the 5% level) and the RMSE when predicting yields with the new model was 1.21 t ha$^{-1}$.
Fig. 4 shows the temporal trend in annual average UK well-managed yields and their predictions from the hybrid-model. The model predictions followed the
Fig. 3. Yields predicted by the parsimonious hybrid-model plotted against observed grain yields in the independent test sample (n = 246). The 1:1 line is shown, representing perfect agreement.
observed annual average yields well. The correlation between annual average predicted and observed yields was r = 0.77 (again significant at the 5% level). This
Fig. 4. Annual average observed yields in the independent test sample (closed symbols) and annual average predictions from the parsimonious hybrid-model (open symbols) plotted against year.
indicated that purely temporal variation in annual average yields was more easily accounted for by climatic differences according to the new hybrid-model than the combined spatial-temporal variation in UK well-managed yields.
4. Discussion