Similarity and location (Photogrammetry & Remote Sensing, Vol. 55, Issue 3, Sept. 2000)

capture. Introducing similarity at this point allows one to characterize the grade of coincidence by common location as well as the grade of distance by distinct location (Tversky, 1977). As the two aspects of similarity are not redundant, it is necessary to characterize them more precisely. Thus, we speak of similarity (common location) and dissimilarity (distinct location). Obviously, the concept of similarity is broader than a pure distance measure. Furthermore, specific similarity measures could yield selective information about different aspects contributing to similarity, like grade of equality or overlap.

Similarity measures are empirical measures. Therefore, measures found in the literature often seem to be chosen at random. The choice is based on properties of the specific measure, without comparison to other possible location-based similarity measures. I will give a synopsis of all choices that differ significantly in behaviour, and I will characterize each measure by its specific behaviour.

1.2. Focus

This paper presents a systematic investigation of similarity measures between two discrete regions from different data sets (Fig. 1). The only aspect considered is the location of regions, which is a function of coordinates in a given geometry. I exclude all thematic attributes of regions as well as relations between objects, both of which have their own problems and literature. Finally, I will not treat the problem of matching two regions from different data sources.

Fig. 1. Given two regions, A and B, from two independent data sets: to what extent are they similar?

It will be shown that the number of location properties to be compared is finite. A complete list of possible combinations will be presented and discussed. Other similarity measures can only be given with higher orders of normalization (e.g., the $L_p$-norm with $p > 1$).
It can be expected that such measures cannot generate new information because the combinatorial complexity of possible properties is exhausted with the measures given here.

Given the preconditions that a measure should be symmetric, normalized, and dimensionless, area ratios will be set up. Only some of all possible ratios fulfill these preconditions. These ratios are useful similarity measures. Hence, their behaviour and semantic interpretations will be discussed. Other conditions will also be investigated, especially reflexivity and the triangle inequality. It will be shown that they may not be postulated for similarity measures.

Different measures characterize different properties or interrelations between position and size of two regions. None of the measures can be a measure of overall similarity. Consequently, at least two of the listed measures are necessary to describe similarity as well as dissimilarity. In the literature, one or the other pair of these measures is used. Typically, the choice of measures follows from a practical approach, without reference to alternatives. It will be shown at which point and to what extent alternative measures exist.

1.3. Structure

This article starts by investigating similarity as a concept and introduces location as a reference frame for the similarity of regions (Section 2). Then location-based measures based on intersection sets will be introduced (Section 3). The sizes of the intersection sets are normalized by setting them into ratios. These ratios will be investigated and discussed in Section 4. The behaviour of these measures will be demonstrated in simple test situations (Section 5). Finally, the conclusion (Section 6) will discuss this approach and its results in a wider context.

2. Similarity and location

Similarity is a concept that varies between disciplines: in mathematics, it describes a type of transformation (Edgar, 1990); in statistics, it means that two similar signals are correlated (Jähne, 1995); in cognition, it means that similar things belong to the same category (Lakoff, 1987); in visual perception, similarity is based on the laws of Gestalt theory (Metzger, 1936); and, in computer vision, it is related to topological and geometric properties, like Euler number, area, compactness (Haralick and Shapiro, 1992). Not all of these concepts of similarity have a ratio scale (Stevens, 1946), i.e., not all concepts can be measured.

This variety of aspects of physical, linguistic or semantic similarity requires a specification of the way in which two objects shall be similar. In this paper, the location of two spatially extended objects shall be investigated for similarity.

Tversky (1977) postulated that similarity of objects increases with the number of common features, and decreases with the number of distinct features. He proposed a contrast model, which expresses similarity between objects A and B as a weighted difference of the measures of their common and distinct features. "Similarity is expressed as a function h of three arguments: ... the features that are common in both A and B; ... the features that belong to A but not to B; ... the features that belong to B but not to A." (p. 330):

$$
s(A, B) = h(A \cap B,\; A - B,\; B - A) = \alpha\, h(A \cap B) - \beta\, h(A - B) - \gamma\, h(B - A) \tag{1}
$$

Consequently, similarity is more than an inverse of distance or difference. Differences between spatially extended objects cause costs in mapping one onto the other, whereas common features of the objects represent benefits, which also have to be considered in the mapping (Vosselman, 1992).
For that reason, Tversky's model provides the basis of the argumentation in this paper: similarity is considered as a combined measure of similar parts and dissimilar parts. With regard to location: similarity is a combination of the parts sharing a location (similarity in a narrow sense) and the parts that are different (dissimilarity).

In the following, it is generally assumed that all treated areal objects exist and are not empty. Location is represented simply by the location function:

$$
f(x, y) =
\begin{cases}
0 & \text{if } (x, y) \notin A \\
1 & \text{if } (x, y) \in A
\end{cases} \tag{2}
$$

with $(x, y) \in \mathbb{R}^2$ (vector representation) or $(x, y) \in \mathbb{Z}^2$ (raster representation), respectively. In the following, only $\mathbb{R}^2$ will be referred to, without loss of generality. All the formulas can also be applied to $\mathbb{Z}^2$ by replacing integrals with sums.

At this point, it makes sense to distinguish between contexts of similarity of regions. As a binary relation, similarity may concern:

(1) Two different objects in the real world: In this case, similarity concerns shape only, with the basic assumption that two physical objects cannot exist at the same location at the same time.

(2) Two different abstractions of the same object: In this case, similarity concerns the two contexts. Location-based measures can be used to specify one type of context by describing indicators.
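The location function of Eq. (2) and the contrast model of Eq. (1) can be illustrated together on a small raster. The following is only a minimal sketch under stated assumptions, not the method developed in this paper: regions are sets of integer cells in Z^2, the measure h is taken to be the cell count of a part, and the weights alpha, beta, gamma as well as the helper names (`rectangle`, `location_function`, `part_sizes`, `contrast_similarity`) are hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, Set, Tuple

Cell = Tuple[int, int]  # one raster cell (x, y) in Z^2


def rectangle(x0: int, y0: int, width: int, height: int) -> Set[Cell]:
    """Hypothetical helper: the cells of an axis-parallel rectangle."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def location_function(region: Set[Cell]) -> Callable[[int, int], int]:
    """Eq. (2): f(x, y) = 1 if (x, y) lies in the region, otherwise 0."""
    def f(x: int, y: int) -> int:
        return 1 if (x, y) in region else 0
    return f


def part_sizes(
    f_a: Callable[[int, int], int],
    f_b: Callable[[int, int], int],
    window: Iterable[Cell],
) -> Tuple[int, int, int]:
    """Sum products of location functions over a window covering both regions.

    Returns the cell counts of the common part, of A without B, and of B without A.
    In Z^2 the integrals of the vector case become these sums.
    """
    common = only_a = only_b = 0
    for x, y in window:
        a, b = f_a(x, y), f_b(x, y)
        common += a * b          # cell belongs to both regions
        only_a += a * (1 - b)    # cell belongs to A only
        only_b += (1 - a) * b    # cell belongs to B only
    return common, only_a, only_b


def contrast_similarity(
    common: int, only_a: int, only_b: int,
    alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.5,
) -> float:
    """Eq. (1) with h assumed to be the cell count; the weights are illustrative."""
    return alpha * common - beta * only_a - gamma * only_b


# Two overlapping 4 x 4 squares inside a 6 x 6 window.
region_a = rectangle(0, 0, 4, 4)
region_b = rectangle(2, 2, 4, 4)
window = rectangle(0, 0, 6, 6)

f_a = location_function(region_a)
f_b = location_function(region_b)
common, only_a, only_b = part_sizes(f_a, f_b, window)
score = contrast_similarity(common, only_a, only_b)
print(common, only_a, only_b, score)  # 4 12 12 -8.0
```

The raw value of this weighted difference depends on the absolute size of the regions and on the chosen weights, which is one reason to normalize the sizes of the parts into ratios instead.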