
ISPRS Journal of Photogrammetry & Remote Sensing 55 (2000) 189–200
www.elsevier.nl/locate/isprsjprs

Location similarity of regions *

Stephan Winter

Department of Geoinformation, Technical University Vienna, Gusshausstr. 27-29, A-1040 Vienna, Austria

Received 25 February 1999; accepted 4 March 2000

Abstract

This article gives a systematic investigation of location-based similarity between regular regions. Starting from reasonable conditions for such measures, it is shown that there is only a finite number of location properties to be compared. The complete set of combinations is presented, and their behaviour and interpretability are discussed. Similarity measures are needed for all kinds of matching problems, including merging spatial data sets, change detection, and generalization. However, the measures are empirical measures. Therefore, measures found in literature seem to be chosen at random. With this synopsis, I show the differences of behaviour for all available choices. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: similarity; topological relations; quality assessment; distance

1. Introduction

1.1. Motivation

Similarity is a widely used concept in connection with space and geographic information systems (GIS); it provides the basis for handling positional uncertainty and imprecision, for matching spatial entities, for merging spatial data sets, for change detection, and for generalization. Since similarity is the basis, a measure is needed to make it quantifiable. Additionally, similarity is the central notion for any abstraction and has been discussed in the categorization controversy as an undecidable problem for 2000 years (Flasch, 1986).

The fundamental question of similarity is to find a common reference frame for measuring: there are so many aspects of physical, linguistic or semantic similarity that a statement 'A is similar to B' contains no information as long as the aspects referred to are not specified. For the present investigation, the reference frame is location, and the subject is location-based similarity.

Spatial entities in databases, assumed in this article to be regions, are models of real-world objects. The comparison of the location of two regions from different data sets is based on the hypothesis that both model the same object. The grade of similarity allows an assessment of that hypothesis. Two models are most likely never identical, because of different concepts applied to the real world, context-dependent levels of detail, changes in a dynamic world, and errors in data capture.

* This article is a significantly revised and extended version of: Location-based similarity measures of regions. In: Fritsch, D., Englich, M., Sester, M. (Eds.): GIS Between Visions and Applications. International Archives of Photogrammetry and Remote Sensing, Vol. 32/4 (1998), pp. 669–676.
E-mail address: winter@geoinfo.tuwien.ac.at (S. Winter).
0924-2716/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0924-2716(00)00019-8

Introducing similarity at this point makes it possible to characterize the grade of coincidence by common location as well as the grade of distance by distinct location (Tversky, 1977). As the two aspects of similarity are not redundant, they need to be characterized more precisely. Thus, we speak of similarity (common location) and dissimilarity (distinct location). Obviously, the concept of similarity is broader than a pure distance measure. Furthermore, specific similarity measures could yield selective information about different aspects contributing to similarity, such as the grade of equality or of overlap.

Similarity measures are empirical measures. Therefore, measures found in the literature often seem to be chosen at random. The choice is based on properties of the specific measure, without comparison to other possible location-based similarity measures. I will present a synopsis of all choices that differ significantly in behaviour, and I will characterize each measure by its specific behaviour.

1.2. Focus

This paper presents a systematic investigation of similarity measures between two discrete regions from different data sets (Fig. 1). The only aspect considered is the location of regions, which is a function of coordinates in a given geometry. I exclude all thematic attributes of regions as well as relations between objects; each has its own problems and literature. Finally, I will not treat the matching problem of two regions from different data sources.

Fig. 1. Given two regions, A and B, from two independent data sets: to what extent are they similar?

It will be shown that the number of location properties to be compared is finite. A complete list of possible combinations will be presented and discussed. Other similarity measures can only be given with higher orders of normalization (e.g., the L_p-norm with p > 1).
It can be expected that such measures cannot generate new information, because the combinatorial complexity of possible properties is exhausted by the measures given here.

Given the preconditions that a measure should be symmetric, normalized, and dimensionless, area ratios will be set up. Only some of all possible ratios fulfill these preconditions. These ratios are useful similarity measures; hence, their behaviour and semantic interpretation will be discussed. Other conditions will also be investigated, especially reflexivity and the triangle inequality. It will be shown that they may not be postulated for similarity measures.

Different measures characterize different properties or interrelations between the position and size of two regions. None of the measures can be a measure of overall similarity. Consequently, at least two of the listed measures are necessary to describe similarity as well as dissimilarity. In the literature, one or another pair of these measures is used. Typically, a practical approach leads to the choice of measures, without reference to alternatives. It will be shown at which point and to what extent alternative measures exist.

1.3. Structure

This article starts by investigating similarity as a concept and introduces location as a reference frame for the similarity of regions (Section 2). Then location-based measures based on intersection sets will be introduced (Section 3). The sizes of the intersection sets are normalized by setting them into ratios. These ratios will be investigated and discussed in Section 4. The behaviour of these measures will be demonstrated in simple test situations (Section 5). Finally, the conclusion (Section 6) will discuss this approach and its results in a wider context.
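
To make the idea of an area ratio from Section 1.2 concrete before the formal treatment, the following minimal Python sketch computes one ratio that is symmetric, normalized to [0, 1], and dimensionless: the area of the intersection divided by the area of the union. This is only an illustration under stated assumptions. Regions are represented here as sets of unit raster cells, the helper names `rectangle` and `overlap_ratio` are hypothetical, and this introduction does not say whether this particular ratio is among the measures retained later in the article.

```python
def rectangle(row0, col0, height, width):
    """Hypothetical helper: a region as the set of unit raster cells it covers."""
    return {(r, c)
            for r in range(row0, row0 + height)
            for c in range(col0, col0 + width)}


def overlap_ratio(a, b):
    """Area of the intersection divided by area of the union (assumed example ratio).

    Symmetric in a and b, bounded by 0 and 1, and free of dimension
    because it divides one area by another.
    """
    union_area = len(a | b)
    if union_area == 0:
        raise ValueError("both regions are empty")
    return len(a & b) / union_area


# Two 4 x 4 regions; B is A shifted by two cells in each direction.
A = rectangle(0, 0, 4, 4)
B = rectangle(2, 2, 4, 4)

print(overlap_ratio(A, B))  # 4 shared cells out of 28 covered cells
print(overlap_ratio(A, A))  # identical regions
```

A single ratio of this kind reports only how much common location two regions have. It does not say whether the remainder belongs to A, to B, or to both, which is in line with the remark above that at least two measures are needed to describe similarity as well as dissimilarity.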

2. Similarity and location