
European Journal of Agronomy 11 (1999) 187–206
www.elsevier.com/locate/eja

Development of a crop knowledge base for Europe

G. Russell a,*, R.I. Muetzelfeldt a, K. Taylor a, J.-M. Terres b

a Institute of Ecology and Resource Management, the University of Edinburgh, West Mains Road, Edinburgh EH9 3JG, UK
b Joint Research Centre, Space Applications Institute, I-21020, Ispra (VA), Italy

Accepted 19 April 1999

Abstract

This paper describes the development of a Crop Knowledge Base System for use in crop modelling and yield forecasting. The inherent limitations of databases can cause problems when dealing with geo-referenced data sets. The system described here includes all the normal database operations but also handles richer types of data. Values of queried attributes can be deduced through the use of heritability within hierarchies and by the application of rules that summarise sets of empirical observations. The system is able to reason about similar locations, so that where no information exists in the knowledge base for a particular location, an alternative location, subject to similar meteorological and agricultural conditions, will be sought. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Database; Knowledge base; Model; Phenology; Yield forecasting

1. Introduction

The aim of the project entitled 'Establishment of computerised crop knowledge base' was 'to design and implement a working, computer-based crop knowledge base'. The work was funded by the Agriculture Information System group of the Joint Research Centre of the European Commission, Ispra, Italy, as a means of rationalising existing holdings of information in reports and databases within the MARS (Monitoring Agriculture with Remote Sensing) project (Genovese, 1998). A detailed account of the work is given in Russell et al. (1997).

Since 1993, the European Commission has operated, through the MARS project, a European yield-forecasting system at the level of countries and large sub-national regions, in order to manage the cereal market and to adjust the Common Agricultural Policy. Yield forecasts and information on crop condition are provided from deterministic Agrometeorological Crop Growth Simulation models (Supit, 1997) using meteorological data, soil characteristics from the European soil map (King et al., 1995) and crop-specific parameters (Boons-Prins et al., 1993). During the monitoring period, which runs from February to October, the model is run monthly to provide a yield forecast for the main annual crops. The model outputs and auxiliary information are then analysed and integrated by a multi-disciplinary team, including agronomists and statisticians, before the final figures are released to the customers. During the analysis process, the model predictions are sometimes refined using expert agronomic knowledge to take account of factors not included in the models. The Crop Knowledge Base System was seen as a tool that could aid this activity by integrating, in a systematic and objective manner, pieces of information that are difficult to store in a conventional database.

Crop models for making regional scale predictions are now routinely run using data stored in the database facilities of Geographic Information Systems. Although this approach is very powerful, limitations are imposed by the nature of conventional databases which, in the case of geo-referenced data, include: a rigid and pre-determined structure; a need to have an entry for each regional entity; the practical difficulties of tagging entries with qualifying information; an inability to cope with conflicting information except by flagging it as an error; and the need for extensive modifications when component maps and associated databases are updated by changing boundaries or adding new fields.

Knowledge bases can include all the functions of a relational database (Black, 1986) but are able to handle information in a more flexible manner. They can deduce values for an attribute using the concept of heritability within a hierarchy, by applying a rule that summarises empirical observations, or by using transfer rules (King et al., 1995; Batjes, 1996) that relate the required attribute to a known attribute. The extended knowledge database for interpreting the 1:1 000 000 EU soil map (Van Ranst et al., 1995) is a type of knowledge base that uses rules to deduce soil properties from other information held in the system. Although this rule-based approach for estimating unknown values could be incorporated as a function within a crop model, there is a strong argument for separating models from their data sources, not least because the derived data may be required by several models.

* Corresponding author. Tel.: +44-131-535 4063; fax: +44-131-229-2601. E-mail address: g.russell@ed.ac.uk (G. Russell).
PII: S1161-0301(99)00030-1

2. Methodology

2.1. Components of the system

The components of the Crop Knowledge Base System and their relationships are shown in the bottom part of Fig. 1. The system consists of two parts. The first is a generic operating shell analogous to a database management system and consisting of the inference engine and the user interface. The second, which is specific to the crop knowledge domain, is the full knowledge base. It is made up of the core knowledge base, which contains the data files and the rules that formulate the knowledge about the data, and the data dictionary, which contains the definitions, terminology, units and synonyms used in the Crop Knowledge Base System and which is itself a specialised form of knowledge base.

Fig. 1. Project procedures, goals and communications links. Thin arrows show the procedures linking different stages. Thick arrows show the flow of information linking the different components in the completed Knowledge Base System.

2.2. Construction of the crop knowledge base

The knowledge base was constructed according to the guidelines of Debenham (1989). This involved a series of stages (see below) corresponding to the procedures in Fig. 1, each of which was characterised by the production of a document, data files, or software. In practice, these stages are not independent, and an element of iteration was required. Broadly, the knowledge base was developed in two phases: (1) development of the operating shell using a small knowledge base, which included a representative set of information and associated data dictionary, and (2) fine-tuning the shell and expanding the amount of information held in the knowledge base and data dictionary.

2.2.1. Task specification

Potential users from a range of disciplines and organisations were consulted by questionnaire and personal interview to identify the tasks that the system should be able to perform. The questionnaire was structured so that the respondent was offered a choice of options but could add additional options.

2.2.2. Application Model

The Application Model is the formal specification of the knowledge to be included in the knowledge base. In the present case, it consists of a small number of generic templates in stylised or quasi-natural language, representing the different types of knowledge to be incorporated into the system.

Sentences and tables from a set of representative publications (Narciso et al., 1992; Boons-Prins et al., 1993; Russell and Wilson, 1994) were analysed by dividing them into atomic statements and re-writing them in simple English. Atomic statements are simplified and unambiguous sentences consisting of a subject and object that are recognised entities, a verb that describes the relationship between them, and any conditional clauses. The procedure followed was to:

1. identify tables and sentences from the sample texts which contain relevant information;
2. select within these tables and sentences items of information that were either considered typical of the agronomy domain or represented particular challenges;
3. translate the information into simplified English;
4. add ancillary information (meta-information), such as qualifying, contextual and source details, so that each piece of information was understandable on its own;
5. classify and group types of information to produce a small number of generic templates.

2.2.3. Knowledge Model

Knowledge modelling is the formal representation in a computer language of the statements derived from the process of knowledge acquisition such that the semantics are made clear and accessible to the automated inference mechanism whose reasoning will utilise the knowledge content of these formal representations. Each element of the Knowledge Model is associated with one of the Application Model templates. Construction of the Knowledge Model depends on the process of knowledge acquisition.

The types of information required for the system (e.g. crop calendars, factors limiting yield, and crop model parameters for a range of crops and locations) were inferred from the objectives of the project. If the data source is unreliable, then any data entered as facts could potentially undermine the whole system, particularly if these 'facts' are then used to deduce new information. In all cases, an attempt was therefore made to establish the reliability of the data either in terms of its origin, e.g. papers in refereed journals that were considered reliable, or consistency with other information. No attempt was made to resolve inconsistencies between different but reliable sources in case the discrepancy was due to real but unidentified factors.

One way of rapidly expanding the content of the knowledge base would be to connect the system to external databases using knowledge base rules that include the field names. Such links would be particularly advantageous for databases that have to be updated regularly, such as those of agricultural statistics, and for those specifying the regional distribution of climates and soil types. The system was therefore connected to a small external test database, and an investigation was carried out into whether a reliable link could also be made to externally maintained databases.

2.2.4. Data Dictionary

The Data Dictionary defines the attributes used in the Knowledge Model. The definitions resemble encyclopaedia entries more than dictionary entries since they include synonyms, units, maximum and minimum values and other information required to establish the full meaning of the knowledge statements. All items in the Data Dictionary were designed to be accessible to the reasoning process, e.g. to allow the identification of values outside the permitted range and to convert from one set of units to another. As the dictionary was compiled from a wide range of sources, it was important to check that all the definitions were applicable throughout Europe.

2.2.5. Inference engine

The inference engine infers solutions to user queries by applying rules contained in the knowledge base in sequence until it finds a matching solution. Sometimes, the query is a simple one where the desired information is contained in the system and can be looked up. Often, however, the inference engine has to use a sequence of rules to infer a solution.

2.2.6. User interface

The user interface was developed within Microsoft Windows and was menu-driven. Thus, all tasks, user queries, and requirements for additional information from the user are selectable from drop-down lists of items.

2.3. Procedures for testing

Two aspects of the Knowledge Base System were tested for accuracy and functionality. These were the software shell (i.e. the inference engine plus the user interface) and the knowledge base. Unit testing of the software shell was carried out in phase 1 of the development and a workshop was held in phase 2 to test the system and identify any errors.

At all stages of development of the system, the information in the knowledge base was routinely tested against the four criteria suggested by Walker and Sinclair (1995): validity of representation, relevance, ambiguity and utility associated with individual statements; repetition of, and contradictions between, statements; completeness of the knowledge base as a whole; and consistency in the use of terms.

The results of the tests were compared with the appropriate data files in the knowledge base and any anomalies were assessed from the aspect of both system operation and the representation and validity of the data. Error tracing was facilitated by the inclusion of an option, a proof tree, for listing the rules used to satisfy each query. The continuous assessment and correction procedure was designed to minimise the risk of errors affecting the further development of the system.

2.4. Software issues

Prolog (Bratko, 1990), which is one of the two major languages used in the field of Artificial Intelligence, was used to represent relationships and rules in the knowledge base and for programming the inference engine of the operating shell. Its features, such as pattern matching, tree-based data structuring and automatic backtracking, provide a powerful and flexible framework for solving problems that involve data items and the relationships between them. The version used was Logic Programming Associates WinPro Version 3.1, which provided interfacing with the Microsoft Windows environment.

... within the system. The table templates in particular were found to be a versatile format for representing the data, as many of the sources favoured tabular formats. Even when this was not the case, it was often possible to manipulate the data to fit the templates. However, care had to be taken to ensure that the sense of the actual facts was not altered when doing this.

3.3. Knowledge model

3.3.1. Attribute-value statements

This general-purpose statement was able to
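The Introduction states that a knowledge base can deduce a value for an attribute through heritability within a hierarchy. The following minimal Python sketch illustrates that idea only; the system described in the paper was written in Prolog, and the hierarchy, attribute names, values and the function `lookup` are hypothetical and are not taken from the Crop Knowledge Base.

```python
# Hypothetical crop hierarchy: each entity points to its parent (None at the root).
PARENT = {
    "cereal": None,
    "wheat": "cereal",
    "winter wheat": "wheat",
    "durum wheat": "wheat",
}

# Invented example facts stored at different levels of the hierarchy
# (illustrative values, not data from the paper).
FACTS = {
    ("cereal", "photosynthetic_pathway"): "C3",
    ("wheat", "base_temperature_C"): 0.0,
    ("durum wheat", "base_temperature_C"): 2.0,
}


def lookup(entity, attribute):
    """Return (value, source) from the nearest level at or above entity.

    A more specific entry overrides an inherited one; (None, None) means
    that no level of the hierarchy holds a value for the attribute.
    """
    node = entity
    while node is not None:
        if (node, attribute) in FACTS:
            return FACTS[(node, attribute)], node
        node = PARENT.get(node)  # climb one level; None ends the search
    return None, None
```

In this sketch `lookup("winter wheat", "base_temperature_C")` finds nothing stored for winter wheat itself and so inherits the value held for wheat, returning `(0.0, "wheat")`. Returning the source level alongside the value is a design choice made here so that an inherited value can be distinguished from a directly stored one.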
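Section 2.2.5 describes an inference engine that either looks up a stored value or applies rules in sequence until one yields a solution, and Section 2.3 mentions a proof tree listing the rules used to satisfy a query. The sketch below shows one simple way such behaviour could be arranged in Python. It is an assumption-laden illustration, not the authors' Prolog implementation: the facts, rule names, attribute names and the function `solve` are all hypothetical, and the proof is recorded as a flat list of steps rather than a tree.

```python
# Invented example facts (illustrative values, not data from the paper).
FACTS = {"sowing_day": 280, "days_sowing_to_emergence": 12}

# Each hypothetical rule: (name, attribute concluded, attributes needed, function).
RULES = [
    ("r1_emergence_from_survey", "emergence_day",
     ["survey_emergence_day"], lambda v: v),
    ("r2_emergence_from_sowing", "emergence_day",
     ["sowing_day", "days_sowing_to_emergence"], lambda s, d: s + d),
]


def solve(attribute, proof):
    """Return a value for attribute, appending every step used to proof.

    Stored facts are looked up first. Otherwise the rules are tried in
    order and the first one whose premises can all be solved is applied.
    Cyclic rule sets are not handled in this sketch.
    """
    if attribute in FACTS:
        proof.append(f"fact {attribute}")
        return FACTS[attribute]
    for name, concludes, needs, fn in RULES:
        if concludes != attribute:
            continue
        trial = []  # steps are kept only if the whole rule succeeds
        values = [solve(need, trial) for need in needs]
        if all(v is not None for v in values):
            proof.extend(trial)
            proof.append(f"rule {name}")
            return fn(*values)
    return None  # no fact and no applicable rule
```

Calling `solve("emergence_day", proof)` with an empty list `proof` first tries rule `r1_emergence_from_survey`, abandons it because its premise cannot be established, and then succeeds with `r2_emergence_from_sowing`, returning 292 and leaving the two facts and the rule name in `proof`. Discarding the steps of a failed rule is a crude stand-in for the automatic backtracking that Prolog provides.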
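Section 2.2.4 says that Data Dictionary entries hold synonyms, units and maximum and minimum values so that the reasoning process can identify out-of-range values and convert between units. A minimal Python sketch of such an entry is given below. The class `Entry`, the functions `find_entry` and `normalise`, the attribute name and the permitted range are hypothetical illustrations and do not reproduce the structure or content of the actual Data Dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One hypothetical Data Dictionary entry."""
    name: str
    unit: str
    minimum: float
    maximum: float
    synonyms: tuple = ()
    # Factor that converts a value in another unit into this entry's unit.
    to_unit: dict = field(default_factory=dict)


# Invented example entry; the range is illustrative, not taken from the paper.
DICTIONARY = [
    Entry("grain_yield", "t/ha", 0.0, 20.0,
          synonyms=("yield", "grain yield"),
          to_unit={"kg/ha": 0.001}),
]


def find_entry(term):
    """Resolve a term or one of its synonyms to its dictionary entry."""
    for entry in DICTIONARY:
        if term == entry.name or term in entry.synonyms:
            return entry
    raise KeyError(term)


def normalise(term, value, unit):
    """Convert value to the dictionary unit and check the permitted range.

    Returns (canonical name, converted value, unit, in_range flag).
    """
    entry = find_entry(term)
    if unit != entry.unit:
        value = value * entry.to_unit[unit]  # KeyError if the unit is unknown
    in_range = entry.minimum <= value <= entry.maximum
    return entry.name, value, entry.unit, in_range
```

For example, `normalise("yield", 7500, "kg/ha")` resolves the synonym to `grain_yield`, converts the value to t/ha and reports it as within range, whereas a value of 25 t/ha would be flagged as outside the permitted range rather than rejected, leaving the decision to the caller.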

3. Results