
G. Russell et al. / European Journal of Agronomy 11 (1999) 187–206

Intelligence, was used to represent relationships and rules in the knowledge base and for programming the inference engine of the operating shell. Its features, such as pattern matching, tree-based data structuring and automatic backtracking, provide a powerful and flexible framework for solving problems that involve data items and the relationships between them. The version used was Logic Programming Associates WinPro Version 3.1, which provided interfacing with the Microsoft Windows environment.

3. Results

3.1. Task specification

The potential users identified four main tasks to be carried out by the Crop Knowledge Base System:

1. to specify where (location or environment) a particular crop species is or could be grown;
2. to identify situations (soil, weather) in which crop yields are significantly reduced below the water-limited potential;
3. to generate the input files needed to run crop models;
4. to act as an encyclopaedia to store information about crops.

All these goals are achievable, although not all were fully implemented in the current version of the Crop Knowledge Base System.

3.2. Application model

An example of the output from this process is shown in Fig. 2. Although several hundred statements were examined, it was found that almost all could be allocated to one of four classes: statements about the values of attributes of an object; statements about position in a taxonomic hierarchy; tables of values; and references to time.

These four basic unitary knowledge statement types were found to be sufficient to represent the relationships and rules defining the knowledge within the system. The table templates in particular were found to be a versatile format for representing the data, as many of the sources favoured tabular formats. Even when this was not the case, it was often possible to manipulate the data to fit the templates. However, care had to be taken to ensure that the sense of the actual facts was not altered when doing this.

Fig. 2. Example showing (a) an extract from Narciso et al. (1992, page 33) and (b) part of the results of the knowledge extraction process. The indices in the remarks section of the document refer to entries in a legend that has not been presented here. QU: qualifying information; CT: contextual information; DD: data dictionary information; RF: the reference from which the item of knowledge was extracted; SO: the source of knowledge, if different from the reference.

3.3. Knowledge model

3.3.1. Attribute-value statements

This general-purpose statement was able to represent a large proportion of the facts within the knowledge base:

att_val(Attribute, Value, Place, Time)

The argument, Attribute, is used to specify the attribute of a crop, an object, or a process to which the statement can be applied. The Value argument gives the value for the attribute appropriate for the combination of circumstances given by the final two arguments. Place and Time specify the location and the year, season or phenological stage to which the information applies. The value of an attribute can be given as a word, lines of text, a number or a list of items. Facts can be turned into rules by making their truth conditional on the truth of other att_val statements. In the example given below, the Value argument specifies which particular object of the category ‘crop’ the information applies to at location ‘Scotland’, at ‘any time’. This type of formulation would be used as a conditional clause attached to another att_val statement, for example giving the earliest date of harvest:

att_val(crop, 'common winter wheat', 'Scotland', 'any time').

3.3.2. Hierarchical statements

Attribute-value statements are linked to crop, location and soil taxonomies included in the Crop Knowledge Base. If there is no direct match to a query, the system is able to use its hierarchical structures to find solutions at other levels of aggregation. A simplified crop hierarchy is given in Fig. 3. Similar hierarchical structures are employed for the location and soil taxonomies. If the user queries the system about wheat, and the knowledge base does not contain the information specifically related to it, the system will search for information relating either to common wheat or durum wheat and so on down to the lowest level of the hierarchy. However, if information is still not found, the system will search upwards for generalised statements about first cereals then arable crops.

Fig. 3. Part of the crop hierarchy showing cereal and wheat types.

The crop hierarchy is essentially botanical and is broadly compatible with the classification adopted by EUROSTAT for the collection of agricultural statistics. The division between winter and spring crops is actually between autumn and spring sowings since biological spring types can be sown in autumn, and the classes are defined in terms of the date of sowing. This classification implies that winter and spring wheat have more in common than winter wheat and winter barley.

Location is specified in terms of administrative regions in the NUTS (Nomenclature des Unités Territoriales Statistiques) scheme of EUROSTAT expanded with primary administrative regions for countries outside the EU. Although each NUTS region has a code number, this has been revised more than once, and it was felt that it would be better to use the name of each region as the identifier. Unfortunately, in several cases, the same region name is used for more than one country. To overcome this and to improve the search efficiency, each region name was linked directly to the top region of its hierarchy.

The system can take account of the disparity that there often is between the boundaries of administrative regions and of agro-ecological zones. A number of searchable physiographic regions, such as the Po valley of northern Italy, were defined as aggregations of NUTS III regions. Although the NUTS region boundaries often crossed the boundaries of the physiographic region, the agricultural part of the former could usually be allocated entirely to one of the latter regions. The division of Europe into grouped areas was only partially implemented since there is no general agreement about their appropriate boundaries, although river basins would make a good starting point. The system also contained a table of latitudes and longitudes of the centroid of the agricultural area of each NUTS I region or country, for places outside the EU.

Soils were classified according to the modified FAO classification adopted for the 1:1 000 000 Soil Map of the European Communities (CEC, 1985).

3.3.3. Table references entries

An example of a general template used in the knowledge base for the table entries is

table(CropType, AgriculturalPractice, LocationList, ListofRelatedInformation).

The Knowledge Model representation of the data given in Fig. 2 is shown as an instance of a table entry in Fig. 4. This table entry only covers the optimal conditions for a clay soil type. Similar table entries give the data for loam and sandy loam soils, which are also mentioned in the extracted information. Other sets of tables in the knowledge base cover the information relating to unacceptable agricultural practices, for instance those that result in crop failure, and crop calendars.

Fig. 4. Example of a knowledge base table entry for optimal agricultural practices relating to skim ploughing for a common wheat crop in Italy, Spain or Greece using information from Narciso et al. (1992).

3.3.4. Reference to time

Time refers to any temporal indication including day, month, year or phenological stage and is represented by the statement:

temp_reason(TimeOfInterest, ListOfValidTimes).

This predicate should perhaps be considered the temporal equivalent of the spatial hierarchies. It is used to signify whether a given time occurs within a longer period and is called upon to perform a check on the temporal applicability of the information requested by the user.

3.3.5. Knowledge acquisition

Crop data relevant to Europe, i.e. west of the Ural Mountains, and North Africa were extracted from 30 documents covering the major crops, although the extent of the coverage varied with crop. Data acquisition was carried out by an operator without specialised agronomic knowledge, and, although agronomists were always available to assist, this help was rarely required. It was sometimes difficult to correctly interpret statements expressed in natural language. Meaning can depend on context, and this was not always stated explicitly. For example, since the list of factors significantly affecting wheat yield varies with climate and soil, it is important to specify exactly which region an author’s statements refer to.

Some publications contained useful compilations of data from primary sources although inadvertent or deliberate changes had occasionally been made in the process. No data were entered into a data file without checking the accuracy of content and spelling. The latter is particularly important when it is used as a search term. Typographical errors were often found in the source literature. For example, the phrase ‘forbed and sprangled roots’ appeared in one description of sugar beet in place of ‘forked and strangled roots’. Dubious terms like this can be spotted by an alert operator and referred to a dictionary or an agronomist for resolution. More difficult to spot are errors in numbers unless the error is in the position of a decimal point. In other cases, data sources were found to quote different values for the same attribute. This problem was overcome by first seeking expert advice to determine whether one should be preferred or, if not, giving both with qualifying comments.

Problems were encountered where terms were inadequately defined in the source literature. For example, the Leaf Area Index of cereals may exclude the area of the leaf sheaths, and soil water content can be expressed volumetrically or gravimetrically. Where terms were defined, it was usually possible to convert to a single scale. This was often the case with phenological stages, for which several descriptions of development are currently in use (Landes and Porter, 1989).

The system includes information about the parameters required to run the WOFOST (Van Diepen et al., 1989) crop model. Parameter values for other models can be derived manually from these or from other information in the system.

The system was interfaced successfully with an external database. However, this could only be done reliably when there was a formal link between the field names in the database and the attributes known to the system, even though the Crop Knowledge Base System could read the database field names. This was partly because database management systems have restrictive rules for naming fields and partly because there is no guarantee that the definition of field names is the same in both systems. Thus, the completed prototype did not include a facility for interfacing with external systems.

By the end of the project, enough information had been included to demonstrate the potential of the system.

3.4. Data Dictionary

Construction of the Data Dictionary was generally straightforward, although there are terms such as Leaf Area Index that can be defined in more than one way. This problem was overcome by including a warning in the entry. The main difficulty experienced was ensuring completeness. This task could be made easier by automatically generating a pro forma every time a new term is added to the system. Two types of entry can be distinguished, those referring to attribute names and those referring to terms used in qualifying information or in the data dictionary definitions themselves. Ideally, a restricted and consistent vocabulary would be used for all entries. Compound terms were treated as single entries although it would be better to index the root term as well as all the occurrences of the compounds.

3.5. Inference engine

Prolog itself includes an inference engine that can be used for reasoning with a knowledge base. However, it has to be expanded, as described below, to allow more complex queries to be addressed.

3.5.1. Data matching

Fig. 5 shows how the system attempts to match a query to the information in a table of type:

table_name(CropType, PhenologicalStage, LocationList, ListofRelatedInformation).

The user inputs crop, phenological phase and location, and the system works from left to right to attempt a direct match with the knowledge in the knowledge base. Problems were encountered in reasoning with phenological stages, and these are explained later.

Fig. 5. Matching an input query to information held in a table. PHENO: phenological stage.

3.5.2. Hierarchical searches

One of the important features provided by the Crop Knowledge Base system is its ability to infer a solution if there is no direct match with information in the knowledge base. In a conventional database, if any one of the first three arguments (i.e. fields in a database) of the table did not match the input query, the system would fail to produce a solution. However, in the Crop Knowledge Base System, the inference engine searches for solutions to related queries. Thus, an unsatisfied query about ‘common wheat’ would automatically be directed to ‘common winter wheat’ and ‘common spring wheat’, and, failing this, ‘wheat’ itself. If only one of the three arguments finds no direct match, then the appropriate hierarchical search is carried out until a solution is found. However, if more than one argument fails to match, a decision has to be made about the order in which the matches are sought. If the crop of interest can be matched to data in the knowledge base, whether directly or hierarchically, then the system tries to match the phenological stage. If no match is found, then a solution will be sought for any phenological stage. In either case, a search is then made to match the location of interest. In the table entries, the locations for which the information is valid are formatted into a list of locations. The list is searched for
a direct match, and if none is found, the system attempts to link the query to a location for which information is held using the location hierarchy. Should this be unsuccessful, further inferences are made with regard to nearby regions, as described in Section 3.5.3.

It is important to recognise that these search paths and criteria are themselves rules developed from expert knowledge and that other formulations could equally well have been included.

3.5.3. Inferring alternative locations

Users querying the knowledge base specify the geographical region of interest. If there is no exact match, the system first tries to find a solution for a region above or below the area of interest in the location hierarchy up to the level of the primary administrative region of a country. If this procedure does not produce a solution, a proximity search is invoked to find regions that are considered similar to the region of interest. The first constraint is that the region of interest and the candidate region must both be either inside or outside the Mediterranean zone, which is defined in a rule as a set of NUTS regions and countries. The Mediterranean zone was identified for special treatment because the climate imposes constraints on agriculture that have led to the development of similar farming systems across the whole area and which differ from those elsewhere. The next step is to identify NUTS I regions whose centroid lies within a rectangle centred on the centroid of the region for which information is required. For regions outside the Mediterranean zone, the search is initially restricted to one of four quadrants (north or south of latitude 50° N; east or west of longitude 20° E). These divisions were chosen subjectively to separate northern and southern farming systems and eastern and western farming systems. Latitude is not used in the search rules for the Mediterranean zone because the seasonal cycle of photoperiod is less marked at these southerly latitudes, and millennia of selection have produced crop cultivars with a life cycle that is closely tied to a growing season defined in terms of water availability rather than temperature and solar radiation. Finally, if no solution is found, the search is extended to the whole knowledge base.

3.6. User interface

The main screen presents the user with a menu bar with file, tasks and help menus. The file and help facilities include standard utilities, and the task menu provides options related to the tasks specified in Section 3.1. Task 1 is divided into three, thus giving a total of six options. Two of these tasks are sub-divided, and further drop-down menus are displayed. The encyclopaedia query task accesses the data dictionary through a browser. If one of the other tasks is selected, the query screen appears. This has slots for entering the query attribute, value, and location and time of interest. Associated with each task is a list of relevant attributes (Fig. 6), which are selected from a drop-down list-box. The user then fills the remaining slots. The time slot is usually filled by ‘any time’, but exists for the few cases where information is only relevant for a particular period of time. The screen also allows the help and encyclopaedia functions to be accessed so that descriptions of the attributes can be obtained. If further information is required from the user, an additional information screen is displayed, which has the same format as the query screen.

Fig. 6. Stratification of query attributes by Crop Knowledge Base task.

3.7. Results of testing the knowledge base

During development, the knowledge base was tested thoroughly for accuracy in its retrieval and reasoning.
The knowledge base presently occupies approximately 2 Mb, and there are gaps in its coverage because the information was not available, or because it was not considered a high priority by the potential users. The purpose of the work was to build a prototype, so it was more important to concentrate on testing the utility of the system rather than to aim for completeness. In some cases, multiple answers are given, and the user has to exercise judgement as to which is the most appropriate. In general, it was found that terms were used consistently in the system in spite of a wide range of sources used. A selection of representative queries is given below to demonstrate the usefulness of the system. Note that the answer to a query can be a single direct match, a series of direct matches where the value varies within the location of interest or where there are conflicting views in the literature, an inferred solution or a related solution.

3.7.1. Crop queries: site attributes

The first example query is “how much winter wheat yield is lost to pathogens in France?”. This is a crop query to which the answer varies with site, i.e. location. The task menu offers four attributes relevant to yield loss (Fig. 6): reasons for yield loss; yield variation factors; yield loss by disease; yield loss by waterlogging. Fig. 7 shows an extract from the results for a query on yield loss by disease. The full display gives information on all the diseases known to the system that can affect winter wheat in France. Although there was nothing relating directly to winter wheat, information was found for common wheat, which occurs at the next level in the crop hierarchy. Both the binomial and common name are given for the pathogen along with the recorded yield loss associated with it. Additional information associated with the disease and the yield loss are given as a comment. In this case, the information does not come from a primary source, but the reference allows the user to carry out a further investigation.

In some cases, further information would be available using the alarms and hazards task. Although there is a logical distinction between the two tasks, they are clearly related. However, although rules for finding additional solutions could be developed, it was felt that it would be better to give the operator guidance about search strategies rather than to produce a large number of solutions that might not be relevant. There is an apparent inconsistency in the system in that disease is noted as causing yield loss although it is not marked as a yield variation factor. This is because the data came from separate sources. It would be possible to include a rule that defined a yield variation factor as one that resulted in yield falling below the potential.

Fig. 7. Results of a query about yield loss due to disease in winter wheat in France.

3.7.2. Model parameters

The system includes all the information needed to produce the input files needed to run the WOFOST crop model. This model requires two sets of parameters, those that are constant for a particular crop and those that vary geographically. Fig. 8 shows the results of a query about the initial crop specifications for emergence of winter wheat. The query produces values for the parameters TBASEM (the threshold temperature for calculating thermal time to emergence), TEFFMX (the maximum daily increment in thermal time), and TSUMEM (the thermal time from sowing to emergence) and gives the reference to the source material. These results were checked to ensure that they accurately reflected the original material. Although this example involves a direct match, the same procedure would be used to derive values for a crop for which information was unknown.

Parameters that vary geographically can also be found on a region-by-region basis. In many cases, the information is incomplete or has previously been derived by interpolation. The current version of the knowledge base can only reason hierarchically or by proximity, although it would be possible to develop a form of intelligent interpolation based on the degree of similarity between regions (see Section 4.2). There is currently no facility to automatically output the model parameter files themselves. Although this could be done, some mechanism would be needed to reconcile any conflicting values.

Fig. 8. Results of a query about the WOFOST parameters relating to the initial crop specifications.

3.7.3. Location queries: site attributes

The next query is about the form of agriculture practised on eutric regosols in the Pinhal Litoral region of Portugal. The solution is given in Fig. 9 together with some qualifying information. The proof tree that is used to show the chain of reasoning involved is given in Fig. 10. It can be seen that the solution has been derived from the rules that limited arable farming takes place on these soils in Italy and Portugal and that Pinhal Litoral is a part of Portugal. From the associated comment, it can be seen that the type of farming suggested does not imply that other types of farming are not possible. This distinction between a null and a missing value is a recurring problem in any knowledge base or database. If the system were connected to an external regional statistical database, it would be possible to develop more complex rules for allocating land use to pedoclimatic mapping unit.

Fig. 9. Results of a query about the type of agriculture carried out on Eutric Regosols in Pinhal Litoral, Portugal.

3.7.4. Crop queries: crop calendar

Fig. 11 gives three solutions to the query “when is winter wheat sown in Oost-Nederland?” The second solution is for Gelderland, which is part of Oost-Nederland. Thus, queries about a region also retrieve solutions for the constituent regions. If the query had been posed for the Netherlands, solutions would also have been found for Oost-Groningen (NUTS III) and Overig-Zeeland (NUTS III) but not for the Netherlands in this case. Queries at country level can sometimes produce very large numbers of solutions, and so the system is currently set up to limit the solutions to ten. If ten solutions are produced, the user should restrict the search.

A comparison of the first two queries also shows an apparent inconsistency, with sowing starting later in the large region than in one of its constituents. This may be a genuine difference of opinion due to differences in the primary sources. However, the term ‘earliest’ does not refer to the absolute earliest date ever recorded but to the tenth percentile value averaged over a 5-year period. This is an expert interpretation of the data given in the original source. It is thus theoretically possible, although unlikely in this case, for both statements to be true. The first and third solutions also differ. These come from independent sources, and the user must decide which to accept.

[...] to the specification right up to the end of the project. Although the project has now finished, subsequent experience in using the system will prove valuable in assessing its functionality in practice and in drawing up a specification for further developments.

The tests showed that the system could perform the tasks specified in Section 3.1 and highlighted some of the benefits of using this approach rather than normal database operations. For example:

1. The system can infer solutions to a query where there is no direct match.
2. Conditional information, qualifying the validity of the information found, is given with each solution.
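The attribute-value statement of Section 3.3.1, and the way a fact becomes a rule when it is made conditional on another att_val statement, can be illustrated with a minimal Prolog sketch. Only the four-argument att_val form and the ‘common winter wheat’ clause are taken from the text. The attribute name earliest_harvest_date and its value are hypothetical placeholders, not entries from the knowledge base.

```prolog
% att_val(Attribute, Value, Place, Time), as described in Section 3.3.1.

% The clause quoted in the text: the object of category 'crop' that
% applies at location 'Scotland', at 'any time'.
att_val(crop, 'common winter wheat', 'Scotland', 'any time').

% A fact turned into a rule: it holds only if the att_val statement in
% the body holds. Attribute name and value are invented placeholders.
att_val(earliest_harvest_date, 'placeholder date', 'Scotland', 'any time') :-
    att_val(crop, 'common winter wheat', 'Scotland', 'any time').
```

Under these assumptions, the goal att_val(earliest_harvest_date, Value, 'Scotland', Time) succeeds only while the conditional clause about the crop remains true. This sketch treats the condition as a stored fact; how the real system ties it to the crop named in the user's query is not described in the text.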
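The hierarchical search of Sections 3.3.2 and 3.5.2 (direct match first, then downwards through the crop hierarchy, then upwards to more general classes) might be sketched as follows. This is not the authors' inference engine. The predicate names is_a, info, descendant, ancestor and lookup are assumptions made for illustration, the hierarchy is only a fragment in the spirit of Fig. 3, and the stored item is invented.

```prolog
% is_a(Child, Parent): hypothetical fragment of the crop hierarchy.
is_a('common winter wheat', 'common wheat').
is_a('common spring wheat', 'common wheat').
is_a('common wheat', wheat).
is_a('durum wheat', wheat).
is_a(wheat, cereals).
is_a(cereals, 'arable crops').

% info(Crop, Item): one invented knowledge item, held for common wheat.
info('common wheat', example_item).

% descendant(Crop, D): D lies somewhere below Crop in the hierarchy.
descendant(Crop, D) :- is_a(D, Crop).
descendant(Crop, D) :- is_a(C, Crop), descendant(C, D).

% ancestor(Crop, A): A lies above Crop, nearest level first.
ancestor(Crop, A) :- is_a(Crop, A).
ancestor(Crop, A) :- is_a(Crop, P), ancestor(P, A).

% lookup(Crop, Source, Item): Source is the level that supplied Item.
% 1. Direct match.
lookup(Crop, Crop, Item) :-
    info(Crop, Item).
% 2. No direct match: search downwards.
lookup(Crop, D, Item) :-
    \+ info(Crop, _),
    descendant(Crop, D),
    info(D, Item).
% 3. Nothing at or below Crop: search upwards for generalised statements.
lookup(Crop, A, Item) :-
    \+ info(Crop, _),
    \+ ( descendant(Crop, D), info(D, _) ),
    ancestor(Crop, A),
    info(A, Item).
```

With these facts, a query about 'common winter wheat' is answered from 'common wheat', one level up, much as in the yield-loss example of Section 3.7.1, and a query about wheat is answered from the same entry by searching downwards. Reporting the supplying level alongside the item is a design choice of this sketch, so that the user can see that a solution was inferred rather than matched directly.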
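The proximity search of Section 3.5.3 combines three constraints: the same Mediterranean status, a rectangle centred on the centroid of the region of interest, and, outside the Mediterranean zone, the same quadrant relative to latitude 50° N and longitude 20° E. The sketch below is one possible reading of that description. The region names, centroids and rectangle half-widths are all invented, and the treatment of the Mediterranean zone as a longitude-only test is an assumption based on the statement that latitude is not used there.

```prolog
% centroid(Region, LatitudeN, LongitudeE): invented example values.
centroid(region_a, 52.0, 5.0).
centroid(region_b, 53.0, 8.0).
centroid(region_c, 48.0, 4.0).
centroid(region_d, 41.0, 13.0).
centroid(region_e, 40.0, 16.0).

% Invented membership of the Mediterranean zone.
mediterranean(region_d).
mediterranean(region_e).

% Both regions must be inside, or both outside, the Mediterranean zone.
same_zone(R1, R2) :- mediterranean(R1), mediterranean(R2).
same_zone(R1, R2) :- \+ mediterranean(R1), \+ mediterranean(R2).

% Quadrant relative to the 50 N and 20 E divisions named in the text.
quadrant(Lat, Lon, NS-EW) :-
    ( Lat >= 50 -> NS = north ; NS = south ),
    ( Lon >= 20 -> EW = east ; EW = west ).

% similar(Region, Candidate, HalfLat, HalfLon): Candidate is considered
% similar to Region. HalfLat and HalfLon are assumed half-widths of the
% search rectangle, in degrees.
similar(Region, Candidate, HalfLat, HalfLon) :-
    centroid(Region, Lat0, Lon0),
    centroid(Candidate, Lat, Lon),
    Candidate \== Region,
    same_zone(Region, Candidate),
    abs(Lon - Lon0) =< HalfLon,
    (   mediterranean(Region)
    ->  true                          % latitude not used in this zone
    ;   abs(Lat - Lat0) =< HalfLat,
        quadrant(Lat0, Lon0, Q),
        quadrant(Lat, Lon, Q)
    ).
```

With a rectangle of 3 degrees of latitude by 5 degrees of longitude either side of the centroid, region_a matches only region_b: region_c lies too far south and in a different quadrant, and the two Mediterranean regions are excluded by the zone test. As the text notes for the search paths in general, these criteria are themselves rules, and other formulations could equally well be written.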

4. Discussion