G. Russell et al. / European Journal of Agronomy 11 (1999) 187–206
Intelligence, was used to represent relationships and rules in the knowledge base and for programming the inference engine of the operating shell. Its features, such as pattern matching, tree-based data structuring and automatic backtracking, provide a powerful and flexible framework for solving problems that involve data items and the relationships between them. The version used was Logic Programming Associates WinPro Version 3.1, which provided interfacing with the Microsoft Windows environment.

3. Results

3.1. Task specification

The potential users identified four main tasks to be carried out by the Crop Knowledge Base System:
1. to specify where (location or environment) a particular crop species is or could be grown;
2. to identify situations (soil, weather) in which crop yields are significantly reduced below the water-limited potential;
3. to generate the input files needed to run crop models;
4. to act as an encyclopaedia to store information about crops.
All these goals are achievable, although not all were fully implemented in the current version of the Crop Knowledge Base System.

3.2. Application model

An example of the output from this process is shown in Fig. 2. Although several hundred statements were examined, it was found that almost all could be allocated to one of four classes: statements about the values of attributes of an object; statements about position in a taxonomic hierarchy; tables of values; and references to time. These four basic unitary knowledge statement types were found to be sufficient to represent the relationships and rules defining the knowledge within the system. The table templates in particular were found to be a versatile format for representing the data, as many of the sources favoured tabular formats. Even when this was not the case, it was often possible to manipulate the data to fit the templates. However, care had to be taken to ensure that the sense of the actual facts was not altered when doing this.

3.3. Knowledge model

3.3.1. Attribute-value statements

This general-purpose statement was able to represent a large proportion of the facts within the knowledge base:

att_val(Attribute, Value, Place, Time)

The argument, Attribute, is used to specify the attribute of a crop, an object, or a process to which the statement can be applied. The Value argument gives the value for the attribute appropriate for the combination of circumstances given by the final two arguments. Place and Time specify the location and the year, season or phenological stage to which the information applies. The value of an attribute can be given as a word, lines of text, a number or a list of items. Facts can be turned into rules by making their truth conditional on the truth of other att_val statements. In the example given below, the Value argument specifies which particular object of the category ‘crop’ the information applies to at location ‘Scotland’, at ‘any time’. This type of formulation would be used as a conditional clause attached to another att_val statement, for example giving the earliest date of harvest:

att_val(crop, ‘common winter wheat’, ‘Scotland’, ‘any time’).

3.3.2. Hierarchical statements

Attribute-value statements are linked to crop, location and soil taxonomies included in the Crop Knowledge Base. If there is no direct match to a query, the system is able to use its hierarchical structures to find solutions at other levels of aggregation. A simplified crop hierarchy is given in Fig. 3. Similar hierarchical structures are employed
Fig. 2. Example showing (a) an extract from Narciso et al. (1992), page 33, and (b) part of the results of the knowledge extraction process. The indices in the remarks section of the document refer to entries in a legend that has not been presented here. QU: qualifying information; CT: contextual information; DD: data dictionary information; RF: the reference from which the item of knowledge was extracted; SO: the source of knowledge, if different from the reference.
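The attribute-value statements of Section 3.3.1, and the way a fact becomes a rule when it is made conditional on other att_val statements, can be illustrated with a minimal sketch. It is written in Python purely for illustration (the system itself was programmed in Prolog), and it is not the authors' implementation. The names `AttVal`, `holds` and `query`, and the placeholder harvest-date value, are assumptions introduced here; only the ‘crop’ statement for Scotland comes from the text.

```python
from typing import NamedTuple


class AttVal(NamedTuple):
    """One att_val(Attribute, Value, Place, Time) statement."""
    attribute: str
    value: object
    place: str
    time: str


# Unconditional fact, taken from the example in Section 3.3.1.
FACTS = [
    AttVal("crop", "common winter wheat", "Scotland", "any time"),
]

# A rule is a statement plus the att_val conditions it depends on.
# The value below is a placeholder, not data from the knowledge base.
RULES = [
    (
        AttVal("earliest date of harvest", "<hypothetical date>",
               "Scotland", "any time"),
        [AttVal("crop", "common winter wheat", "Scotland", "any time")],
    ),
]


def holds(statement, facts):
    """A condition holds if it is present as a fact."""
    return statement in facts


def query(attribute, place, time, facts=FACTS, rules=RULES):
    """Return all values for an attribute at a given place and time.

    Facts match directly; a rule contributes its value only when every
    one of its conditions holds.
    """
    key = (attribute, place, time)
    results = [f.value for f in facts
               if (f.attribute, f.place, f.time) == key]
    for head, conditions in rules:
        if (head.attribute, head.place, head.time) == key and all(
                holds(c, facts) for c in conditions):
            results.append(head.value)
    return results
```

Calling `query("earliest date of harvest", "Scotland", "any time")` returns the placeholder value only because the conditional ‘crop’ statement is present among the facts; with an empty fact list, the same query returns nothing.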
for the location and soil taxonomies. If the user queries the system about wheat, and the knowledge base does not contain the information specifically related to it, the system will search for information relating either to common wheat or durum wheat and so on down to the lowest level of the hierarchy. However, if information is still not found, the system will search upwards for generalised statements about first cereals then arable crops. The crop hierarchy is essentially botanical and is broadly compatible with the classification adopted by EUROSTAT for the collection of agricultural statistics. The division between winter and spring crops is actually between autumn and spring sowings since biological spring types can be sown in autumn, and the classes are defined in terms of the date of sowing. This classification implies that winter and spring wheat have more in common than winter wheat and winter barley.

Fig. 3. Part of the crop hierarchy showing cereal and wheat types.

Location is specified in terms of administrative regions in the NUTS (Nomenclature des Unités Territoriales Statistiques) scheme of EUROSTAT expanded with primary administrative regions for countries outside the EU. Although each NUTS region has a code number, this has been revised more than once, and it was felt that it would be better to use the name of each region as the identifier. Unfortunately, in several cases, the same region name is used for more than one country. To overcome this and to improve the search efficiency, each region name was linked directly to the top region of its hierarchy.

The system can take account of the disparity that there often is between the boundaries of administrative regions and of agro-ecological zones. A number of searchable physiographic regions, such as the Po valley of northern Italy, were defined as aggregations of NUTS III regions. Although the NUTS region boundaries often crossed the boundaries of the physiographic region, the agricultural part of the former could usually be allocated entirely to one of the latter regions. The division of Europe into grouped areas was only partially implemented since there is no general agreement about their appropriate boundaries, although river basins would make a good starting point. The system also contained a table of latitudes and longitudes of the centroid of the agricultural area of each NUTS I region or country, for places outside the EU.

Soils were classified according to the modified FAO classification adopted for the 1:1 000 000 Soil Map of the European Communities (CEC, 1985).

3.3.3. Table references entries

An example of a general template used in the knowledge base for the table entries is

table(CropType, AgriculturalPractice, LocationList, ListofRelatedInformation).

The Knowledge Model representation of the data given in Fig. 2 is shown as an instance of a table entry in Fig. 4. This table entry only covers the optimal conditions for a clay soil type. Similar table entries give the data for loam and sandy loam soils, which are also mentioned in the extracted information. Other sets of tables in the knowledge base cover the information relating to unacceptable agricultural practices, for instance
Fig. 4. Example of a knowledge base table entry for optimal agricultural practices relating to skim ploughing for a common wheat crop in Italy, Spain or Greece using information from Narciso et al. (1992).
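As a rough illustration of the table template of Section 3.3.3, the sketch below holds one entry shaped like the Fig. 4 example and looks it up by crop, practice and location. It is a Python sketch under stated assumptions, not the Prolog representation used in the system: the class `TableEntry`, the function `lookup` and the contents of the related-information list are hypothetical, and only the crop, the practice, the three countries and the clay soil type are taken from the text and the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """table(CropType, AgriculturalPractice, LocationList, ListofRelatedInformation)."""
    crop_type: str
    agricultural_practice: str
    location_list: tuple
    related_information: tuple


# One entry in the spirit of Fig. 4. The related information is a
# placeholder; the real entry holds the optimal conditions for clay soil.
ENTRIES = [
    TableEntry(
        crop_type="common wheat",
        agricultural_practice="skim ploughing",
        location_list=("Italy", "Spain", "Greece"),
        related_information=(("soil type", "clay"),
                             ("conditions", "<not reproduced here>")),
    ),
]


def lookup(crop, practice, location, entries=ENTRIES):
    """Return entries that match directly on all three query arguments."""
    return [
        e for e in entries
        if e.crop_type == crop
        and e.agricultural_practice == practice
        and location in e.location_list
    ]
```

Because the valid locations are stored as a list inside the entry, a single entry answers the same query for Italy, Spain and Greece, while a query for any other country finds no direct match.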
those that result in crop failure, and crop calendars.

3.3.4. Reference to time

Time refers to any temporal indication including day, month, year or phenological stage and is represented by the statement:

temp_reason(TimeOfInterest, ListOfValidTimes).

This predicate should perhaps be considered the temporal equivalent of the spatial hierarchies. It is used to signify whether a given time occurs within a longer period and is called upon to perform a check on the temporal applicability of the information requested by the user.

3.3.5. Knowledge acquisition

Crop data relevant to Europe, i.e. west of the Ural Mountains, and North Africa were extracted from 30 documents covering the major crops, although the extent of the coverage varied with crop. Data acquisition was carried out by an operator without specialised agronomic knowledge, and, although agronomists were always available to assist, this help was rarely required. It was sometimes difficult to correctly interpret statements expressed in natural language. Meaning can depend on context, and this was not always stated explicitly. For example, since the list of factors significantly affecting wheat yield varies with climate and soil, it is important to specify exactly which region an author’s statements refer to.

Some publications contained useful compilations of data from primary sources although inadvertent or deliberate changes had occasionally been made in the process. No data were entered into a data file without checking the accuracy of content and spelling. The latter is particularly important when it is used as a search term. Typographical errors were often found in the source literature. For example, the phrase ‘forbed and sprangled roots’ appeared in one description of sugar beet in place of ‘forked and strangled roots’. Dubious terms like this can be spotted by an alert operator and referred to a dictionary or an agronomist for resolution. More difficult to spot are errors in numbers unless the error is in the position of a decimal point. In other cases, data sources were found to quote different values for the same attribute. This problem was overcome by first seeking expert advice to determine whether one should be preferred or, if not, giving both with qualifying comments. Problems were encountered where terms were inadequately defined in the source literature. For example, the Leaf Area Index of
cereals may exclude the area of the leaf sheaths, and soil water content can be expressed volumetrically or gravimetrically. Where terms were defined, it was usually possible to convert to a single scale. This was often the case with phenological stages, for which several descriptions of development are currently in use (Landes and Porter, 1989).

The system includes information about the parameters required to run the WOFOST (Van Diepen et al., 1989) crop model. Parameter values for other models can be derived manually from these or from other information in the system.

The system was interfaced successfully with an external database. However, this could only be done reliably when there was a formal link between the field names in the database and the attributes known to the system, even though the Crop Knowledge Base System could read the database field names. This was partly because database management systems have restrictive rules for naming fields and partly because there is no guarantee that the definition of field names is the same in both systems. Thus, the completed prototype did not include a facility for interfacing with external systems.

By the end of the project, enough information had been included to demonstrate the potential of the system.

3.4. Data Dictionary

Construction of the Data Dictionary was generally straightforward, although there are terms such as Leaf Area Index that can be defined in more than one way. This problem was overcome by including a warning in the entry. The main difficulty experienced was ensuring completeness. This task could be made easier by automatically generating a pro forma every time a new term is added to the system. Two types of entry can be distinguished, those referring to attribute names and those referring to terms used in qualifying information or in the data dictionary definitions themselves. Ideally, a restricted and consistent vocabulary would be used for all entries. Compound terms were treated as single entries although it would be better to index the root term as well as all the occurrences of the compounds.

3.5. Inference engine

Prolog itself includes an inference engine that can be used for reasoning with a knowledge base. However, it has to be expanded, as described below, to allow more complex queries to be addressed.

3.5.1. Data matching

Fig. 5 shows how the system attempts to match a query to the information in a table of type:

table_name(CropType, PhenologicalStage, LocationList, ListofRelatedInformation).

The user inputs crop, phenological phase and location, and the system works from left to right to attempt a direct match with the knowledge in the knowledge base. Problems were encountered in reasoning with phenological stages, and these are explained later.

3.5.2. Hierarchical searches

One of the important features provided by the Crop Knowledge Base system is its ability to infer a solution if there is no direct match with information in the knowledge base. In a conventional database, if any one of the first three arguments (i.e. fields in a database) of the table did not match the input query, the system would fail to produce a solution. However, in the Crop Knowledge Base System, the inference engine searches for solutions to related queries. Thus, an unsatisfied query about ‘common wheat’ would automatically be directed to ‘common winter wheat’ and ‘common spring wheat’, and, failing this, ‘wheat’ itself. If only one of the three arguments finds no direct match, then the appropriate hierarchical search is carried out until a solution is found. However, if more than one argument fails to match, a decision has to be made about the order in which the matches are sought. If the crop of interest can be matched to data in the knowledge base, whether directly or hierarchically, then the system tries to match the phenological stage. If no match is found, then a solution will be sought for any phenological stage. In either case, a search is then made to match the location of interest. In the table entries, the locations for which the information is valid are formatted into a list of locations. The list is searched for
Fig. 5. Matching an input query to information held in a table. PHENO: phenological stage.
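The left-to-right matching order described in Sections 3.5.1 and 3.5.2 (crop first, then phenological stage, then location) can be sketched as follows. This is an illustrative Python sketch, not the authors' Prolog inference engine. The crop hierarchy is limited to the names mentioned in the text, the region links use the Gelderland and Oost-Nederland example of Section 3.7.4, and the single table row, the function names and the choice to stop at the first crop that yields a location match are assumptions.

```python
# Child -> parent links. Crop names are those mentioned in the text.
CROP_PARENT = {
    "cereals": "arable crops",
    "wheat": "cereals",
    "common wheat": "wheat",
    "durum wheat": "wheat",
    "common winter wheat": "common wheat",
    "common spring wheat": "common wheat",
}
REGION_PARENT = {
    "Gelderland": "Oost-Nederland",
    "Oost-Nederland": "Netherlands",
}


def below(node, parent):
    """All nodes beneath `node`, nearest level first."""
    found, frontier = [], [node]
    while frontier:
        frontier = [child for child, p in parent.items() if p in frontier]
        found.extend(frontier)
    return found


def above(node, parent):
    """All nodes above `node`, nearest level first."""
    found = []
    while node in parent:
        node = parent[node]
        found.append(node)
    return found


def search_order(node, parent):
    """Direct match first, then down the hierarchy, then up."""
    return [node] + below(node, parent) + above(node, parent)


def match(crop, stage, location, table):
    """Match crop, then stage (falling back to any stage), then location."""
    for candidate in search_order(crop, CROP_PARENT):
        rows = [r for r in table if r["crop"] == candidate]
        if not rows:
            continue
        # If the requested stage is absent, accept any phenological stage.
        rows = [r for r in rows if r["stage"] == stage] or rows
        for place in search_order(location, REGION_PARENT):
            hits = [r for r in rows if place in r["locations"]]
            if hits:
                return hits
    return []


# Hypothetical table row, used only to exercise the search.
TABLE = [
    {"crop": "common winter wheat", "stage": "flowering",
     "locations": ["Oost-Nederland"], "info": "<hypothetical>"},
]
```

With this table, a query for ‘common wheat’ at emergence in Gelderland finds nothing directly, moves down to ‘common winter wheat’, accepts the entry for another stage, and reaches it through Oost-Nederland in the location hierarchy. The proximity rules of Section 3.5.3 are deliberately left out of this sketch.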
a direct match, and if none is found, the system attempts to link the query to a location for which information is held using the location hierarchy. Should this be unsuccessful, further inferences are made with regard to nearby regions, as described in Section 3.5.3.

It is important to recognise that these search paths and criteria are themselves rules developed from expert knowledge and that other formulations could equally well have been included.

3.5.3. Inferring alternative locations

Users querying the knowledge base specify the geographical region of interest. If there is no exact match, the system first tries to find a solution for a region above or below the area of interest in the location hierarchy up to the level of the primary administrative region of a country. If this procedure does not produce a solution, a proximity search is invoked to find regions that are considered similar to the region of interest. The first constraint is that the region of interest and the candidate region must both be either inside or outside the Mediterranean zone, which is defined in a rule as a set of NUTS regions and countries. The Mediterranean zone was identified for special treatment because the climate imposes constraints on agriculture that have led to the development of similar farming systems across the whole area and which differ from those elsewhere. The next step is to identify NUTS I regions whose centroid lies within a rectangle centred on the centroid of the region for which information is required. For regions outside the Mediterranean zone, the search is initially restricted to one of four quadrants (north or south of latitude 50° N; east or west of longitude 20° E). These divisions were chosen subjectively to separate northern and southern farming systems and eastern and western farming systems. Latitude is not used in the search rules for the Mediterranean zone because the seasonal cycle of photoperiod is less marked at these southerly latitudes, and millennia of selection have produced crop cultivars with a life cycle that is closely
tied to a growing season defined in terms of water availability rather than temperature and solar radiation. Finally, if no solution is found, the search is extended to the whole knowledge base.

3.6. User interface

The main screen presents the user with a menu bar with file, tasks and help menus. The file and help facilities include standard utilities, and the task menu provides options related to the tasks specified in Section 3.1. Task 1 is divided into three, thus giving a total of six options. Two of these tasks are sub-divided, and further drop-down menus are displayed. The encyclopaedia query task accesses the data dictionary through a browser. If one of the other tasks is selected, the query screen appears. This has slots for entering the query attribute, value, and location and time of interest. Associated with each task is a list of relevant attributes (Fig. 6), which are selected from a drop-down list-box. The user then fills the remaining slots. The time slot is usually filled by ‘any time’, but exists for the few cases where information is only relevant for a particular period of time. The screen also allows the help and encyclopaedia functions to be accessed so that descriptions of the attributes can be obtained. If further information is required from the user, an additional information screen is displayed, which has the same format as the query screen.

3.7. Results of testing the knowledge base

During development, the knowledge base was tested thoroughly for accuracy in its retrieval and
Fig. 6. Stratification of query attributes by Crop Knowledge Base task.
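Returning briefly to the proximity search of Section 3.5.3, its filtering rules can be sketched as below. This is a Python illustration under explicit assumptions rather than the rules coded in the system: the rectangle half-widths are invented defaults, the text does not state the rectangle size, and the reading that only the longitude window applies inside the Mediterranean zone is an interpretation of "latitude is not used". Only the Mediterranean constraint, the centroid rectangle and the 50° N and 20° E divisions come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    lat: float           # centroid latitude, degrees N
    lon: float           # centroid longitude, degrees E
    mediterranean: bool  # member of the Mediterranean zone rule


def quadrant(region):
    """Quadrant relative to latitude 50 N and longitude 20 E."""
    return (region.lat >= 50.0, region.lon >= 20.0)


def similar_regions(target, regions, half_lat=5.0, half_lon=10.0):
    """Names of candidate regions considered similar to `target`.

    half_lat and half_lon are hypothetical half-widths, in degrees, of
    the rectangle centred on the target centroid.
    """
    names = []
    for r in regions:
        if r.name == target.name:
            continue
        # First constraint: both inside or both outside the zone.
        if r.mediterranean != target.mediterranean:
            continue
        if abs(r.lon - target.lon) > half_lon:
            continue
        if not target.mediterranean:
            # Latitude and quadrant apply only outside the zone.
            if abs(r.lat - target.lat) > half_lat:
                continue
            if quadrant(r) != quadrant(target):
                continue
        names.append(r.name)
    return names
```

If this filter returns nothing, the text indicates that the search is widened, ultimately to the whole knowledge base; that final step is not shown here.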
reasoning. The knowledge base presently occupies approximately 2 Mb, and there are gaps in its coverage because the information was not available, or because it was not considered a high priority by the potential users. The purpose of the work was to build a prototype, so it was more important to concentrate on testing the utility of the system rather than to aim for completeness. In some cases, multiple answers are given, and the user has to exercise judgement as to which is the most appropriate. In general, it was found that terms were used consistently in the system in spite of a wide range of sources used. A selection of representative queries is given below to demonstrate the usefulness of the system. Note that the answer to a query can be a single direct match, a series of direct matches where the value varies within the location of interest or where there are conflicting views in the literature, an inferred solution or a related solution.

3.7.1. Crop queries: site attributes

The first example query is “how much winter wheat yield is lost to pathogens in France?”. This is a crop query to which the answer varies with site, i.e. location. The task menu offers four attributes relevant to yield loss (Fig. 6): reasons for yield loss; yield variation factors; yield loss by disease; yield loss by waterlogging. Fig. 7 shows an extract from the results for a query on yield loss by disease. The full display gives information on all the diseases known to the system that can affect winter wheat in France. Although there was nothing relating directly to winter wheat, information was found for common wheat, which occurs at the next level in the crop hierarchy. Both the binomial and common name are given for the pathogen along with the recorded yield loss associated with it. Additional information associated with the disease and the yield loss is given as a comment. In this case, the information does not come from a primary source, but the reference allows the user to carry out a further investigation. In some cases, further information would be available using the alarms and hazards task. Although there is a logical distinction between the two tasks, they are clearly related. However, although rules for finding additional solutions could be developed, it was felt that it would be better to give the operator guidance about search strategies rather than to produce a large number of solutions that might not be relevant. There is an apparent inconsistency in the system in that disease is noted as causing yield loss although it is not marked as a yield variation factor. This is because the data came from separate sources. It would be possible to include a rule that defined a yield variation factor as one that resulted in yield falling below the potential.

3.7.2. Model parameters

The system includes all the information needed to produce the input files needed to run the WOFOST crop model. This model requires two sets of parameters, those that are constant for a particular crop and those that vary geographically. Fig. 8 shows the results of a query about the initial crop specifications for emergence of winter wheat. The query produces values for the parameters TBASEM (the threshold temperature for calculating thermal time to emergence), TEFFMX (the maximum daily increment in thermal time), and TSUMEM (the thermal time from sowing to emergence) and gives the reference to the source material. These results were checked to ensure that they accurately reflected the original material. Although this example involves a direct match, the same procedure would be used to derive values for a crop for which information was unknown. Parameters that vary geographically can also be found on a region-by-region basis. In many cases, the information is incomplete or has previously been derived by interpolation. The current version of the knowledge base can only reason hierarchically or by proximity, although it would be possible to develop a form of intelligent interpolation based on the degree of similarity between regions (see Section 4.2). There is currently no facility to automatically output the model parameter files themselves. Although this could be done, some mechanism would be needed to reconcile any conflicting values.

3.7.3. Location queries: site attributes

The next query is about the form of agriculture practised on eutric regosols in the Pinhal Litoral
Fig. 7. Results of a query about yield loss due to disease in winter wheat in France.
Fig. 8. Results of a query about the WOFOST parameters relating to the initial crop specifications.
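To show how the three emergence parameters described in Section 3.7.2 relate to one another, the sketch below accumulates thermal time day by day until the required sum is reached. It follows the definitions given in the text (TBASEM as a threshold temperature, TEFFMX as the maximum daily increment, TSUMEM as the thermal time from sowing to emergence) and is not taken from the WOFOST source code, which may define the cap differently. The function name and all numerical values in the usage note are hypothetical.

```python
def days_to_emergence(daily_mean_temps, tbasem, teffmx, tsumem):
    """Number of days from sowing until thermal time reaches TSUMEM.

    daily_mean_temps: mean air temperature for each day after sowing (deg C)
    tbasem: threshold temperature below which no thermal time accrues
    teffmx: maximum daily increment in thermal time, as defined in the text
    tsumem: thermal time required from sowing to emergence (deg C days)

    Returns None if the series ends before emergence is reached.
    """
    thermal_time = 0.0
    for day, temp in enumerate(daily_mean_temps, start=1):
        # Daily increment: degrees above the threshold, never negative,
        # and capped at the maximum daily increment.
        increment = min(max(temp - tbasem, 0.0), teffmx)
        thermal_time += increment
        if thermal_time >= tsumem:
            return day
    return None
```

For example, with hypothetical values `tbasem=0`, `teffmx=12` and `tsumem=30`, the temperature series 5, 10, 15, 20 accrues 5, 10, 12 and 12 degree-days, so emergence is reached on day 4.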
region of Portugal. The solution is given in Fig. 9 together with some qualifying information. The proof tree that is used to show the chain of reasoning involved is given in Fig. 10. It can be seen that the solution has been derived from the rules that limited arable farming takes place on these soils in Italy and Portugal and that Pinhal Litoral is a part of Portugal. From the associated comment, it can be seen that the type of farming suggested does not imply that other types of farming are not possible. This distinction between a null and a missing value is a recurring problem in any knowledge base or database. If the system were connected to an external regional statistical database, it would be possible to develop more complex rules for allocating land use to pedoclimatic mapping unit.

3.7.4. Crop queries: crop calendar

Fig. 11 gives three solutions to the query “when is winter wheat sown in Oost-Nederland?” The second solution is for Gelderland, which is part of Oost-Nederland. Thus, queries about a region also retrieve solutions for the constituent regions. If the query had been posed for the Netherlands, solutions would also have been found for Oost-Groningen (NUTS III) and Overig-Zeeland (NUTS III) but not for the Netherlands in this case. Queries at country level can sometimes produce very large numbers of solutions, and so the system is currently set up to limit the solutions to ten. If ten solutions are produced, the user should restrict the search.

A comparison of the first two queries also shows an apparent inconsistency, with sowing starting
Fig. 9. Results of a query about the type of agriculture carried out on Eutric Regosols in Pinhal Litoral, Portugal.
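The chain of reasoning behind the Fig. 9 solution (a land-use rule that holds for Portugal, combined with the fact that Pinhal Litoral is a part of Portugal) can be sketched as a small derivation that also records its proof steps. This is an illustrative Python sketch, not the system's Prolog rules or its proof-tree display: the names `PART_OF`, `LAND_USE` and `land_use` are assumptions, and any intermediate administrative levels between Pinhal Litoral and Portugal are omitted.

```python
# Region -> containing region, as stated in the text.
PART_OF = {"Pinhal Litoral": "Portugal"}

# (soil, places where the rule holds, land use), as stated in the text.
LAND_USE = [
    ("eutric regosols", ("Italy", "Portugal"), "limited arable farming"),
]


def land_use(soil, region):
    """Return (land use, proof steps) for a soil in a region.

    The search climbs the part-of links until a land-use rule applies,
    recording each step so that the reasoning can be displayed.
    """
    proof = []
    place = region
    while place is not None:
        for rule_soil, places, use in LAND_USE:
            if rule_soil == soil and place in places:
                proof.append(f"{use} takes place on {soil} in {place}")
                return use, proof
        parent = PART_OF.get(place)
        if parent is not None:
            proof.append(f"{place} is a part of {parent}")
        place = parent
    return None, proof
```

A result of `None` here means only that no rule was found. As the text notes, it should be read as a missing value and not as evidence that no farming takes place.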
later in the large region than in one of its constituents. This may be a genuine difference of opinion due to differences in the primary sources. However, the term ‘earliest’ does not refer to the absolute earliest date ever recorded but to the tenth percentile value averaged over a 5-year period. This is an expert interpretation of the data given in the original source. It is thus theoretically possible, although unlikely in this case, for both statements to be true. The first and third solutions also differ. These come from independent sources, and the user must decide which to accept.

[...] to the specification right up to the end of the project. Although the project has now finished, subsequent experience in using the system will prove valuable in assessing its functionality in practice and in drawing up a specification for further developments.

The tests showed that the system could perform the tasks specified in Section 3.1 and highlighted some of the benefits of using this approach rather than normal database operations. For example:
1. The system can infer solutions to a query where there is no direct match.
2. Conditional information, qualifying the validity of the information found, is given with each solution.
4. Discussion