Photogrammetry & Remote Sensing, Vol. 55, Issue 3, Sept. 2000

similarity is based on the laws of Gestalt theory (Metzger, 1936); in computer vision, it is related to topological and geometric properties such as Euler number, area and compactness (Haralick and Shapiro, 1992). Not all of these concepts of similarity have a ratio scale (Stevens, 1946), i.e., not all of them can be measured. This variety of physical, linguistic and semantic aspects of similarity requires specifying in which respect two objects are to be similar. In this paper, the location of two spatially extended objects is investigated for similarity.

Tversky (1977) postulated that the similarity of objects increases with the number of common features and decreases with the number of distinct features. He proposed a contrast model, which expresses the similarity between objects A and B as a weighted difference of the measures of their common and distinct features. Similarity "is expressed as a function h of three arguments: ... the features that are common in both A and B; ... the features that belong to A but not to B; ... the features that belong to B but not to A" (p. 330):

$$
s(A, B) = h(A \cap B,\; A - B,\; B - A) = \alpha\, h(A \cap B) - \beta\, h(A - B) - \gamma\, h(B - A)
\tag{1}
$$

Consequently, similarity is more than an inverse of distance or difference. Differences between spatially extended objects cause costs in mapping one onto the other, whereas common features of the objects represent benefits, which also have to be considered in the mapping (Vosselman, 1992). For that reason, Tversky's model provides the basis of the argumentation in this paper: similarity is considered as a combined measure of similar parts and dissimilar parts. With regard to location, similarity is a combination of the parts sharing a location (similarity in a narrow sense) and the parts that are different (dissimilarity). In the following, it is generally assumed that all treated areal objects exist and are not empty.
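As an illustration of Eq. (1), the following Python sketch evaluates the contrast model for finite feature sets. It assumes, purely for the example, that the measure h is the cardinality of a set and that the weights α, β, γ (here `alpha`, `beta`, `gamma`, with arbitrary default values) are free parameters; Tversky's model itself does not fix either choice.

```python
def contrast_similarity(a: set, b: set,
                        alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.5) -> float:
    """Contrast model s(A, B) = alpha*h(A∩B) - beta*h(A-B) - gamma*h(B-A).

    Assumption for illustration: h is the cardinality of a finite feature set.
    """
    common = len(a & b)   # features shared by A and B (benefits)
    a_only = len(a - b)   # features of A missing in B (costs)
    b_only = len(b - a)   # features of B missing in A (costs)
    return alpha * common - beta * a_only - gamma * b_only


a = {"red", "round", "small"}
b = {"red", "round", "large"}
print(contrast_similarity(a, b))  # 1.0

# With beta != gamma the measure is asymmetric, which the contrast model allows.
x, y = {"p", "q"}, {"p"}
print(contrast_similarity(x, y, beta=1.0, gamma=0.0))  # 0.0
print(contrast_similarity(y, x, beta=1.0, gamma=0.0))  # 1.0
```

The example shows why similarity is more than an inverse distance: the common features enter as a positive term of their own, not merely as the absence of differences.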
Location is represented simply by the location function:

$$
f(x, y) =
\begin{cases}
0 & \text{if } (x, y) \notin A \\
1 & \text{if } (x, y) \in A
\end{cases}
\tag{2}
$$

with $(x, y) \in \mathbb{R}^2$ (vector representation) or $(x, y) \in \mathbb{Z}^2$ (raster representation), respectively. In the following, only $\mathbb{R}^2$ will be referred to, without loss of generality; all formulas also apply to $\mathbb{Z}^2$ when integrals are replaced by sums.

At this point, it makes sense to distinguish between contexts of similarity of regions. As a binary relation, similarity may concern:

(1) Two different objects in the real world: in this case, similarity concerns shape only, with the basic assumption that two physical objects cannot exist at the same location at the same time.
(2) Two different abstractions of the same object: in this case, similarity concerns the two contexts. Location-based measures can be used to specify one type of context by describing indicators.
(3) Two different representations of the same object: in this case, similarity of location concerns identity, or at least part-of relations. Similarity can be used to match regions, to detect differences between data sets (e.g., changes), and so on.

In special circumstances, the third case can be treated as the problem of estimating a shift between two correlated spatial signals. That is common practice in image matching (Ackermann, 1984). But all differences between the two signals that cannot be described by the shift violate the estimation model; hence, they have to be small. In this article, no restriction is placed on the shape of, or the correlation between, the two regions considered. For that reason, statistical matching techniques will not be considered further. This concept of location shows that matching of data sets is an ill-posed problem requiring empirical approaches, and that similarity measures for location are, thus, empirical measures.
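To make the remark about replacing integrals by sums concrete, the following Python sketch implements Eq. (2) on $\mathbb{Z}^2$. It assumes (as a choice for this example only) that a raster region is a finite set of integer cell coordinates; the helpers `rectangle` and `area` are hypothetical names. Because cells outside the region contribute zero, the sum over any finite window containing the region gives its size.

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

Cell = Tuple[int, int]  # integer raster coordinates (x, y) in Z^2


def location_function(region: Set[Cell]) -> Callable[[int, int], int]:
    """Eq. (2) on Z^2: f(x, y) = 1 if the cell lies in the region, else 0."""
    def f(x: int, y: int) -> int:
        return 1 if (x, y) in region else 0
    return f


def rectangle(x0: int, y0: int, width: int, height: int) -> Set[Cell]:
    """Hypothetical helper: the cells of an axis-parallel raster rectangle."""
    return set(product(range(x0, x0 + width), range(y0, y0 + height)))


def area(f: Callable[[int, int], int], window: Iterable[Cell]) -> int:
    """Raster counterpart of the integral of f: a sum over a finite window of cells."""
    return sum(f(x, y) for x, y in window)


A = rectangle(0, 0, 3, 2)
f = location_function(A)
window = rectangle(-5, -5, 20, 20)  # any finite window that contains A
print(f(1, 1), f(3, 0))  # 1 0
print(area(f, window))   # 6
```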

3. Location-based measures

In this chapter, we derive location-based similarity measures with special attention to completeness. They are based on the sizes of intersection sets, which are strongly interrelated with weighted topological relations.

Location of a region was defined as the space covered by this region (Eq. (2)). Measures based on location count or integrate atomic elements of space; these are points in $\mathbb{R}^2$ and raster cells in $\mathbb{Z}^2$. For similarity measures, one counts the atoms that are covered by both regions, or the atoms that are covered by either one or the other region. In terms of topology, the sizes of the intersection sets between the interiors and the exteriors of the considered regions come into focus. Thus, the strict mathematical formulation of topological relations by the emptiness/nonemptiness of intersection sets (Egenhofer and Herring, 1990) is softened to graded, fuzzy or uncertain topological relations (Winter, 1996). An example is the relation equal(A, B): if two regions A and B fulfill the strict relation, they share all covered space without any distinct space, which means they are perfectly similar. All other configurations of location are treated as more or less equal, corresponding to more or less similar.

The number of possible combinations of such intersection sets is finite. In the following, all location-based measures will be collected. Then the ratios of those measures will be investigated for their use as location-based similarity measures.

3.1. Intersection sets

The intersection sets between the interiors and exteriors will first be investigated to characterize strict topological relations. Then the qualitative relations will be graded by the sizes of the sets.

The location function (Eq. (2)) distinguishes two sets of a region A: the interior ($f(x, y) = 1$) and the exterior ($f(x, y) = 0$). The function needs no concept of neighborhood. Therefore, open and closed sets cannot be distinguished in the functional representation.
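The counting of atoms described at the start of this chapter can be sketched in Python for raster regions. The sketch assumes that regions are finite, non-empty sets of integer cells, and it reads "covered by either one or the other region" as covered by exactly one of them (the symmetric difference); the function names are hypothetical. The strict relation equal(A, B) then holds exactly when no distinct atoms remain, while the two counts grade all other configurations.

```python
from itertools import product
from typing import Set, Tuple

Cell = Tuple[int, int]


def rectangle(x0: int, y0: int, width: int, height: int) -> Set[Cell]:
    """Hypothetical helper: the cells of an axis-parallel raster rectangle."""
    return set(product(range(x0, x0 + width), range(y0, y0 + height)))


def shared_and_distinct(a: Set[Cell], b: Set[Cell]) -> Tuple[int, int]:
    """Count atoms covered by both regions and atoms covered by only one of them."""
    shared = len(a & b)
    distinct = len(a ^ b)  # symmetric difference: in A or in B, but not in both
    return shared, distinct


def strictly_equal(a: Set[Cell], b: Set[Cell]) -> bool:
    """Strict equal(A, B) for non-empty regions: no distinct space at all."""
    _, distinct = shared_and_distinct(a, b)
    return distinct == 0


A = rectangle(0, 0, 4, 4)
B = rectangle(1, 0, 4, 4)  # A shifted by one raster column
print(shared_and_distinct(A, B))            # (12, 8)
print(strictly_equal(A, A), strictly_equal(A, B))  # True False
```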
The inverse of the function f, written $f^{-1}$ (i.e., $1 - f$), yields the location function of the complement of A, i.e., $\bar{A}$. Thus, for two regions, A and B, a set of four intersection sets in total can be derived. Consider Fig. 2. Region A from a data set $\mathcal{A}$ and region B from a data set $\mathcal{B}$ have an arbitrary position relative to each other; in the figure, the rectangle A is top-left of the rectangle B, and both overlap. Their intersection sets form a partition of the plane with, in general, at most four sets:

$$
A \cap B, \qquad A \cap \bar{B}, \qquad \bar{A} \cap B, \qquad \bar{A} \cap \bar{B}
\tag{3}
$$

Fig. 2. The intersection sets between two rectangular regions, A and B, form a partition of the plane. The background is assumed to be unlimited.

All other sets are unions of those intersection sets, for example:

$$
\begin{aligned}
A &= (A \cap B) \cup (A \cap \bar{B}) \\
B &= (A \cap B) \cup (\bar{A} \cap B) \\
A \cup B &= (A \cap B) \cup (A \cap \bar{B}) \cup (\bar{A} \cap B)
\end{aligned}
\tag{4}
$$

First of all, the size $m$ of the sets is of interest. An elementary operation size-of is introduced here, written in short mathematical notation as $|\cdot|$. For the location functions $f(x, y)$ of A and $g(x, y)$ of B in $\mathbb{R}^2$, it can be defined as:

$$
\begin{aligned}
m_1 &= |A \cap B| = \iint_{x,y} f(x, y)\, g(x, y)\, dx\, dy \\
m_2 &= |A \cap \bar{B}| = \iint_{x,y} f(x, y)\, g^{-1}(x, y)\, dx\, dy \\
m_3 &= |\bar{A} \cap B| = \iint_{x,y} f^{-1}(x, y)\, g(x, y)\, dx\, dy \\
m_4 &= |\bar{A} \cap \bar{B}| = \iint_{x,y} f^{-1}(x, y)\, g^{-1}(x, y)\, dx\, dy
\end{aligned}
\tag{5}
$$

Since the plane $\mathbb{R}^2$ is unbounded while A and B are finite, the size $m_4$ is always $\infty$. Hence, $m_4$ contributes no information and can be excluded from further consideration.

Once the sizes $m_i$ are known, they can be mapped to binary measures $\hat{m}_i \in \{0, 1\}$, for $i \in \{1, \dots, 4\}$:

$$
m_i \mapsto \hat{m}_i =
\begin{cases}
1 & \text{if } m_i \neq 0 \\
0 & \text{if } m_i = 0
\end{cases}
\tag{6}
$$

For the binary measures, the following dependencies exist:

(1) $\hat{m}_4$ is never 0 for finite A and B; it contributes no qualitative information. Consequently, the situation between any two regions A and B can be described qualitatively by combinations of the triple $(\hat{m}_1, \hat{m}_2, \hat{m}_3)$.
That yields $2^3 = 8$ theoretically possible combinations.
(2) Neither of the pairs $(\hat{m}_1, \hat{m}_2)$ and $(\hat{m}_1, \hat{m}_3)$ can be $(0, 0)$, based on the presumption that neither A nor B is empty. With $|A| = m_1 + m_2$ and $|B| = m_1 + m_3$, at least one term in each sum must be nonzero (since the sets of a partition are pairwise disjoint, the size of a union of intersection sets can be written as a sum). This dependency excludes three of the eight triples $(\hat{m}_1, \hat{m}_2, \hat{m}_3)$: $(0, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$ are impossible.

The remaining five triples correspond to the following separable topological relations:

1. $(0, 1, 1)$ disjunct/touching: A and B have no part in common;
2. $(1, 1, 1)$ overlap: A and B have parts in common and parts not in common;
3. $(1, 0, 0)$ equal: all parts of A are parts of B and vice versa;
4. $(1, 1, 0)$ contains/covers: all parts of B are part of A, and A has additional parts;
5. $(1, 0, 1)$ contained by/covered by: all parts of A are part of B, and B has additional parts.

Using intersection sets seems similar to the work of Egenhofer and Franzosa (1991), who first determined the topological relation between A and B by intersection sets. They investigated the intersection sets of interiors and boundaries, with the restriction to simple regions, and could separate eight families of topological relations. Our classification also works for complex regions, i.e., multiply connected regions or regions with several components. Both approaches, however, require regular closed regions. The proposed topological relations represent a subset of the Egenhofer relations; Fig. 3 is a generalization of his conceptual neighborhood graph (Egenhofer and Al-Taha, 1992).

Fig. 3. The topological relations representable by the two-dimensional intersection sets, related by conceptual neighborhood.

The topological relation can serve as a similarity measure on an ordinal scale (Egenhofer, 1997): equal is the highest degree of similarity, and each relation directly neighboring equal in the graph of Fig. 3 guarantees higher similarity than the relation disjunct, which is farthest from equal, at a distance of two graph edges. The next step has to be the quantitative description of similarity.

3.2. Combinations of intersection sets

The size measures $m_i$ of Eq. (5) will now be investigated numerically for all possible combinations of intersection sets.

With a partition into at most four intersection sets, in principle 16 combinations of intersection sets are possible. This number follows from the binomial coefficients $\binom{n}{k}$, with $n = 4$ the number of intersection sets and $k \in \{0, \dots, 4\}$ the number of combined elementary sets: $\sum_{k=0}^{4} \binom{4}{k} = 2^4 = 16$. Concerning the size of these combinations, all combinations containing $m_4$ as a term of the sum are constantly $\infty$. By excluding $m_4$ and restricting to the triple $(m_1, m_2, m_3)$, the number of relevant unions of intersection sets decreases to the binomial coefficients $\binom{3}{k}$, with $n = 3$ and $k \in \{0, \dots, 3\}$. Their sizes are:

$$
\begin{aligned}
k = 0&: \text{excluded, for reasons similar to } m_4 \\
k = 1&: m_1,\; m_2,\; m_3 \quad \text{(the elementary sets)} \\
k = 2&: m_1 + m_2,\; m_1 + m_3,\; m_2 + m_3 \quad \text{(all 2-tuples)} \\
k = 3&: m_1 + m_2 + m_3 \quad \text{(the single 3-tuple)}
\end{aligned}
\tag{7}
$$

The domain of values for the size of an arbitrary set X in $\mathbb{R}^2$ is $\operatorname{dom}(|X|) = [0, \infty]$. The case $k = 0$ is trivial:

$$
\operatorname{dom}(|\emptyset|) = \{0\}.
\tag{8}
$$

Regions A and B shall be limited to finite sets, which may not be empty. Then $0 < |A|, |B| < \infty$ holds. It follows for the sizes of the three considered intersection sets ($k = 1$):

$$
\begin{aligned}
\operatorname{dom}(m_1)&: 0 \le |A \cap B| \le \min(|A|, |B|) \\
\operatorname{dom}(m_2)&: 0 \le |A \cap \bar{B}| \le |A| \\
\operatorname{dom}(m_3)&: 0 \le |\bar{A} \cap B| \le |B|
\end{aligned}
\tag{9}
$$

For $k = 2$, it follows (cf. Eq. (4)):

$$
\begin{aligned}
m_1 + m_2 &= |A \cap B| + |A \cap \bar{B}| = |A| \\
m_1 + m_3 &= |A \cap B| + |\bar{A} \cap B| = |B| \\
m_2 + m_3 &= |A \cap \bar{B}| + |\bar{A} \cap B|
\end{aligned}
\tag{10}
$$

with the domains:

$$
\begin{aligned}
\operatorname{dom}(m_1 + m_2) &= \{|A|\} \\
\operatorname{dom}(m_1 + m_3) &= \{|B|\} \\
\operatorname{dom}(m_2 + m_3) &= [0,\; |A| + |B|]
\end{aligned}
\tag{11}
$$

Finally, for $k = 3$, it follows (cf. Eq. (4)):

$$
m_1 + m_2 + m_3 = |A \cap B| + |A \cap \bar{B}| + |\bar{A} \cap B| = |A \cup B|
\tag{12}
$$

with the domain:

$$
\operatorname{dom}(m_1 + m_2 + m_3) = [\max(|A|, |B|),\; |A| + |B|].
\tag{13}
$$

In the following, the union sets will be used as short forms for the combinations of elementary intersection sets: $m_8 = m_1 + m_2 + m_3$.

3.3. Other size measures

Obviously, other set size measures exist. Some of them already occur in the domain limits of the location-based measures (Eqs. (9), (11) and (13)). In linear form, they are:

$$
m_5 = \min(|A|, |B|), \qquad m_6 = \max(|A|, |B|), \qquad m_7 = |A| + |B|.
\tag{14}
$$

These three measures are linked by the dependency:

$$
\min(|A|, |B|) + \max(|A|, |B|) = |A| + |B|.
\tag{15}
$$

All three measures are independent of the relative location of the two regions; for that reason, they are not considered as candidates for location-based similarity measures. As the consideration of the domains has shown, however, they are needed to normalize the location-based measures. Moreover, these measures are symmetric.

Nonlinear measures can also be set up. Some of them are of higher dimension, e.g., $|A| \cdot |B|$, which disqualifies them for normalization. The others are norms of higher-dimensional measures. A prototype is the $L_p$-norm; e.g., $p = 2$ produces $\sqrt{|A| \cdot |B|}$ (Molenaar and Cheng, 1998). Such norms necessarily use the same combinations of sets as the measures already given; they contribute no new information. Therefore, the given set of size measures is sufficiently complete.
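The following Python sketch gathers Sections 3.1 to 3.3 for raster regions. Under the same assumption as before (regions are finite, non-empty sets of integer cells; `rectangle`, `size_measures`, `relation` and `domains_hold` are hypothetical names), it computes $m_1, m_2, m_3$ as sums of products of location functions with $g^{-1} = 1 - g$ (Eq. (5)), maps them to binary measures (Eq. (6)), looks up the qualitative relation, and checks the domain statements of Eqs. (9) to (15). It also shows that $m_5$, $m_6$, $m_7$ do not change when B is moved.

```python
from itertools import product
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]

# Binary triple (m1^, m2^, m3^) -> qualitative relation, as listed in Section 3.1.
RELATIONS: Dict[Tuple[int, ...], str] = {
    (0, 1, 1): "disjunct/touching",
    (1, 1, 1): "overlap",
    (1, 0, 0): "equal",
    (1, 1, 0): "contains/covers",
    (1, 0, 1): "contained by/covered by",
}


def rectangle(x0: int, y0: int, width: int, height: int) -> Set[Cell]:
    """Hypothetical helper: the cells of an axis-parallel raster rectangle."""
    return set(product(range(x0, x0 + width), range(y0, y0 + height)))


def size_measures(a: Set[Cell], b: Set[Cell]) -> Dict[str, int]:
    """Raster m1..m3 (Eq. 5), m5..m7 (Eq. 14) and m8 = m1 + m2 + m3.

    m4 is omitted: the common exterior is unbounded in the plane.
    """
    if not a or not b:
        raise ValueError("A and B must be non-empty")
    m1 = m2 = m3 = 0
    for c in a | b:                       # cells outside A ∪ B add nothing to m1..m3
        f, g = int(c in a), int(c in b)   # location functions of A and B
        m1 += f * g                       # A ∩ B
        m2 += f * (1 - g)                 # A ∩ complement(B), using g^-1 = 1 - g
        m3 += (1 - f) * g                 # complement(A) ∩ B
    size_a, size_b = len(a), len(b)
    return {"m1": m1, "m2": m2, "m3": m3,
            "m5": min(size_a, size_b), "m6": max(size_a, size_b),
            "m7": size_a + size_b, "m8": m1 + m2 + m3}


def relation(m: Dict[str, int]) -> str:
    """Binary measures (Eq. 6) -> relation; a KeyError would mean an impossible triple."""
    triple = tuple(1 if m[k] != 0 else 0 for k in ("m1", "m2", "m3"))
    return RELATIONS[triple]


def domains_hold(a: Set[Cell], b: Set[Cell], m: Dict[str, int]) -> bool:
    """Check Eqs. (9), (10), (11), (13) and (15) for one configuration."""
    size_a, size_b = len(a), len(b)
    return all([
        0 <= m["m1"] <= m["m5"],
        0 <= m["m2"] <= size_a and 0 <= m["m3"] <= size_b,
        m["m1"] + m["m2"] == size_a and m["m1"] + m["m3"] == size_b,
        0 <= m["m2"] + m["m3"] <= m["m7"],
        m["m6"] <= m["m8"] <= m["m7"],
        m["m5"] + m["m6"] == m["m7"],
    ])


A = rectangle(0, 2, 3, 3)  # above and left of B (y axis pointing up), loosely as in Fig. 2
cases = {
    "B overlaps A": rectangle(2, 0, 3, 3),
    "B equals A": rectangle(0, 2, 3, 3),
    "B inside A": rectangle(1, 3, 1, 1),
    "B far from A": rectangle(10, 10, 2, 2),
}
for name, B in cases.items():
    m = size_measures(A, B)
    print(name, "|", relation(m), "| m8 =", m["m8"], "| domains ok:", domains_hold(A, B, m))
```

The lookup table makes the reduction of Section 3.1 explicit: only five of the eight binary triples have an entry, so any impossible triple would surface as an error rather than a silent misclassification.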

4. Location-based similarity measures