
The remainder of the paper presents a survey of artificial intelligence techniques in remote sensing. Knowledge is defined in Section 2, as well as approaches to knowledge representation, control issues and approaches to the modelling of features in machine vision. Feature representation and the feature recognition process are covered in Section 3, while examples of the application of the methods of knowledge representation in both photogrammetry and remote sensing are presented in Section 4.

2. Knowledge, representation and models

2.1. Definitions of knowledge

The Merriam Webster Dictionary (1999) defines knowledge as the fact or condition of knowing something with familiarity gained through experience or association, the range of one's information or understanding, the sum of what is known. Representation is the act or action of representing, the state of being represented, the act of delineating; to represent is to serve as a sign or symbol of, to serve as the counterpart or image of, to describe as having a specified character or quality, to correspond to in essence. A model is a miniature representation of something, an example for imitation or emulation. Within computer vision and artificial intelligence, these terms are used loosely in conformance with their dictionary meanings, and technical definitions are hard to come by.

Computer vision must produce a "useful" description of a scene depicted in an image, whose initial representation is an array of image intensity values. At the low-level vision stage, the early processing of the image takes place. Domain-independent image processing algorithms extract, characterise and label components at the middle-level vision stage. This stage then delivers more generalised image representations to the higher-level vision stage (Ballard and Brown, 1982; Marr, 1982), which attempts to emulate cognition.

To cope with the changes in lighting and viewpoint, the effects of shape and shading, variations in the imaging process such as in camera angle and position, and noise at the lower-level image processing stage, we need knowledge of the world in which the images are recorded and of the specific application, via rich representations at the higher level, which in computer vision are usually called models. These models explain, describe, or abstract the image information. The gap between the image and the models is bridged via a range of representations which connect the input image to the output interpretation. The representations are categorised as (Ballard and Brown, 1982):

1. Generalised images, which are iconic and analogue representations of the input data; binary images/silhouettes are examples.
2. Segmented images, consisting of sets of pixels likely to correspond to real objects; for example, the outputs of segmentation algorithms.
3. Geometric representations, which deal with shape information; many object models in computer vision, for example, are shape-based.
4. Relational models, which encode knowledge that is used in high-level reasoning and interpretation; Artificial Intelligence tools are often used for representation and modelling. Many semantic network models in the literature, for example, fall into this category, some of which are mentioned later in the paper.

Each method of representation has limited applications on its own. Hence, all four types of representations are vital in the image interpretation task. We shall concentrate on the fourth category, namely relational models, which bring together knowledge representation and models for the purpose of image understanding.
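To make the fourth category more concrete, the following is a minimal, hypothetical sketch of a relational model: scene objects are nodes carrying attribute descriptions, and binary relations between objects are labelled edges that high-level reasoning can query. The object classes, attribute names and relations are illustrative assumptions, not drawn from any particular system.

    # Minimal sketch of a relational model (category 4), assuming a simple
    # aerial-scene vocabulary; all names and relations are illustrative.

    # Nodes: scene objects with attribute descriptions.
    objects = {
        "region_1": {"class": "road", "elongation": 0.9, "homogeneity": 0.8},
        "region_2": {"class": "building", "area": 450.0},
        "region_3": {"class": "vegetation", "area": 1200.0},
    }

    # Edges: binary relations between objects, used in high-level reasoning.
    relations = [
        ("region_2", "adjacent_to", "region_1"),   # building adjacent to road
        ("region_3", "surrounds", "region_2"),     # vegetation surrounds building
    ]

    def related(source, relation):
        """Return all objects linked to 'source' by 'relation'."""
        return [t for s, r, t in relations if s == source and r == relation]

    print(related("region_2", "adjacent_to"))  # ['region_1']

Graph structures of this kind underlie both the frame and semantic network formalisms discussed in the next subsection.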
2.2. Knowledge representation

The objective of knowledge representation is to express knowledge in computer-tractable form (Russell and Norvig, 1995). A good knowledge representation language should be expressive, concise, unambiguous, and context-independent. First Order Logic (FOL) is the basis of many representation schemes in artificial intelligence. FOL has a formal syntax and semantics, and the interpretation of a sentence in the language is the fact to which it refers. Inference procedures for FOL permit one to derive new sentences from old ones. Such a formal inference procedure may be used to automatically derive valid conclusions from known facts. Both logic programming languages and production systems are based on FOL.

Logic programming languages such as Prolog permit the representation of knowledge in a restricted form of FOL; they also implement inference procedures and allow the derivation of new information from current knowledge. They usually use backward chaining for control, which applies the logical inference rule backwards: to prove something, they find logical implications in the database that would allow the conclusion of the desired statement. Thus, when a goal is to be established, backward chaining is the preferred mode of inference.

Production systems consist of a knowledge base of facts and a set of rules or productions, represented using logical implication. The following is an example of a production: IF a region is an elongated and homogeneous object THEN it belongs to a road object. The production system applies the rules to the knowledge base and obtains new assertions, in an endless cycle called the match-select-act cycle. In the match phase, the system finds all rules whose antecedent is satisfied by current data values. In the select phase, the system decides on one rule to execute, out of the rules matched in the first phase. The selected rule is then executed in the act phase, where the execution of a rule might involve insertions into and deletions from the knowledge base as well as input and output of data values.

Frames and semantic networks are popular knowledge representation schemes in artificial intelligence and, recently, in photogrammetry and remote sensing. They use the metaphor that objects are nodes in a graph, that these nodes are organised in a taxonomic structure, and that links between nodes represent binary relations. In frame systems, the binary relations are thought of as slots in one frame that are filled by another frame; in semantic networks, they are thought of as arrows between nodes. The meaning and implementations of these two types of systems can be identical.

Description logic systems evolved from semantic networks; the basic idea is to express and reason with complex definitions of, and relations among, objects and classes. Description logics provide three kinds of reasoning services (Nebel, 1990):

1. Classification of concept descriptions, by automatic arrangement of concepts in a specialisation hierarchy.
2. Classification of individual objects, given a description of their properties.
3. Maintenance of the overall consistency of the knowledge base.

The languages provided by these logics are rather inexpressive and it is difficult to specify complex constraints. Their advantages are that they have formal semantics on which the reasoning services are based, as well as simple logical operations.

In summary, a logic programming language such as Prolog has an execution model that is simple enough for a programmer to deal with. Recently, the introduction of Prolog compilers has served to boost the desirability of Prolog for prototyping small-scale artificial intelligence projects, in preference to C. Production systems are popular in modelling human reasoning; unlike Prolog, production systems are not query-based, and they are good for implementing open-ended, non-terminating systems which are in operation continuously. Semantic networks provide a graphical interface which is easier to comprehend than text-based formalisms. They can be as expressive as FOL, though most are not, since they impose severe constraints on what may be expressed. Their advantages include the ability to express hierarchical connections in a modular fashion, and their relative simplicity. Description logics combine clear semantics with simple logical operations. Therefore, while all these schemes are based on FOL, there are trade-offs to using one or the other.
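As an illustration of the match-select-act cycle described above, the following is a minimal sketch of a production system built around the road production quoted earlier. The fact base, attribute names and conflict-resolution strategy (simply taking the first matched instantiation) are illustrative assumptions.

    # Minimal sketch of a production system's match-select-act cycle, using the
    # road rule quoted in the text. Attribute names and the conflict-resolution
    # strategy (first match) are illustrative assumptions.

    facts = [
        {"id": "region_1", "elongated": True, "homogeneous": True},
        {"id": "region_2", "elongated": False, "homogeneous": True},
    ]

    def road_rule(fact):
        # IF a region is elongated and homogeneous THEN it belongs to a road object.
        if fact.get("elongated") and fact.get("homogeneous") and "label" not in fact:
            return {"id": fact["id"], "label": "road"}
        return None

    rules = [road_rule]

    def run(facts, rules):
        while True:
            # Match: find all (rule, fact) pairs whose antecedent is satisfied.
            conflict_set = [(r, f) for r in rules for f in facts if r(f)]
            if not conflict_set:
                break
            # Select: choose one instantiation (here, simply the first).
            rule, fact = conflict_set[0]
            # Act: execute the rule by asserting the new label into the fact base.
            fact.update(rule(fact))

    run(facts, rules)
    print(facts[0])  # {'id': 'region_1', 'elongated': True, 'homogeneous': True, 'label': 'road'}

Here the cycle terminates only because the rule refuses to fire twice on the same fact; in an open-ended system the cycle would continue as new data arrive.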
2.3. Control issues

Whatever the image representation chosen, the processing of the image data and its representations may be either image data-driven, called bottom-up control, or internal model-driven, also called top-down control (Ballard and Brown, 1982; Sonka et al., 1993). Bottom-up, data-driven control progresses through image preprocessing and segmentation to descriptions, with each stage producing data for the next. Bottom-up control is useful if domain-independent image processing is cheap, and the input data is accurate and reliable. Marr (1982) and Ullman (1984) advocated the bottom-up approach on the basis that bottom-up processing of data occurs invariably in human vision. Marr saw this leading to an intermediate representation called the 2 1/2-D sketch, containing surface orientations and distances in a viewer-centred frame of reference, as well as discontinuities in surface distances and orientations. In addition, Ullman hypothesised higher-level processes called visual routines, which detect features of interest in the intermediate representation.

Top-down, model-driven control is driven by expectations and predictions generated in the knowledge base. Thus, model-driven control attempts to perform internal model verification in a goal-directed manner. A common top-down technique is hypothesise-and-verify, which can normally control low-level processing. There appears to be support for the view that some aspects of human vision are not bottom-up, and the model-driven approach is motivated by this observation, as well as by the desire to minimise low-level processing.

In practice, computer vision systems tend to favour mixed top-down and bottom-up control that focuses attention efficiently and makes the computation practical. Either parallel or serial computation may be performed within any of these schemes.

Both top-down and bottom-up controls imply a hierarchy of processes. In heterarchical control, processes are viewed as a collection of cooperating and competing experts, and at any time, the 'expert' which can 'help the most' is chosen. Blackboard architectures are an example of this approach, in which modular knowledge sources communicate via a common blackboard memory, to which they can write and from which they can read.
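The blackboard idea can be sketched as follows: independent knowledge sources inspect a shared memory, report how much they can 'help' given its current state, and the most helpful one is activated on each cycle. The knowledge sources, their scores and the blackboard entries below are illustrative assumptions, not a definitive architecture.

    # Minimal sketch of heterarchical (blackboard) control: knowledge sources
    # read and write a shared blackboard, and on each cycle the source that
    # claims it can "help the most" is activated. All sources, scores and
    # blackboard entries are illustrative assumptions.

    blackboard = {"edges": None, "regions": None, "labels": None}

    class KnowledgeSource:
        def __init__(self, name, needs, produces, score):
            self.name, self.needs, self.produces, self.score = name, needs, produces, score

        def can_help(self, bb):
            # A source can help if its inputs exist and its output is still missing.
            ready = all(bb[k] is not None for k in self.needs)
            done = bb[self.produces] is not None
            return self.score if ready and not done else 0

        def act(self, bb):
            bb[self.produces] = f"output of {self.name}"

    sources = [
        KnowledgeSource("edge detector", [], "edges", score=1),
        KnowledgeSource("region grower", ["edges"], "regions", score=2),
        KnowledgeSource("region labeller", ["regions"], "labels", score=3),
    ]

    while True:
        best = max(sources, key=lambda s: s.can_help(blackboard))
        if best.can_help(blackboard) == 0:
            break
        best.act(blackboard)

    print(blackboard)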
2.4. Modelling issues

In the model-based approach to computer vision, a priori models of possible objects in a class of images are defined and utilised for object recognition. The models encode external knowledge of the world and the application. Object models may be appearance models, shape models, physical models, etc. Each of these should capture the range of variation in the presentation of the object due to changes in viewpoint, in lighting, and even changes in shape in the case of flexible objects (Pope, 1995). In addition, variations due to the image acquisition process itself, as well as variations among individual members of the object class, should be accounted for.

Objects of interest may be 2-D or 3-D; they may also be rigid, articulated, or flexible. The images themselves may be range images or intensity images. Recognition is accomplished by determining a correspondence between some attributes of the image and comparable attributes of the model in a matching phase. The relevant attributes of the model and of the image are represented using one of the representation schemes discussed earlier. Recognising a 3-D object in an intensity image of an unrestricted scene is the most difficult form of the problem, and aerial and space images fall into this category. Loss of depth due to projection, occlusion and cluttering of details are some of the problems that occur; further, image intensity is only indirectly related to object shape.
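A minimal sketch of such a matching phase, under the assumption that both the image feature and the object models are described by a small set of comparable numeric attributes, might look as follows; the models, attribute names and similarity measure are illustrative, not a definitive recognition scheme.

    # Minimal sketch of the matching phase: attributes extracted from the image
    # are compared with comparable attributes of each a priori object model,
    # and the closest model is selected. Models, attributes and the similarity
    # measure are illustrative assumptions.

    models = {
        "road":     {"elongation": 0.9, "mean_intensity": 0.4},
        "building": {"elongation": 0.3, "mean_intensity": 0.7},
    }

    def match(image_feature, models):
        """Return the model whose attributes are closest to the image feature."""
        def distance(model_attrs):
            return sum((image_feature[k] - v) ** 2 for k, v in model_attrs.items())
        return min(models, key=lambda name: distance(models[name]))

    feature = {"elongation": 0.85, "mean_intensity": 0.45}
    print(match(feature, models))  # 'road'

In practice the compared attributes are rarely simple scalars; relational models compare whole attributed graphs, which is what makes recognition in unrestricted aerial and space imagery so demanding.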

3. Automated feature extraction