Photogrammetry & Remote Sensing, Vol. 55, Issue 3, Sept. 2000

The domain of values for the size of an arbitrary set $X$ in $\mathbb{R}^2$ is $\mathrm{dom}(|X|) = [0, \infty]$. The case $k = 0$ is trivial:

$$\mathrm{dom}(|\emptyset|) = [0]. \tag{8}$$

Regions $A$ and $B$ shall be limited to finite sets which may not be empty. Then $0 < |A|, |B| < \infty$ holds. It follows for the sizes of the three considered intersection sets ($k = 1$):

$$\begin{aligned}
\mathrm{dom}(m_1) &: \; 0 \le |A \cap B| \le \min(|A|, |B|) \\
\mathrm{dom}(m_2) &: \; 0 \le |A \cap \overline{B}| \le |A| \\
\mathrm{dom}(m_3) &: \; 0 \le |\overline{A} \cap B| \le |B|
\end{aligned} \tag{9}$$

For $k = 2$, it follows (cf. Eq. (4)):

$$\begin{aligned}
m_1 + m_2 &= |A \cap B| + |A \cap \overline{B}| = |A| \\
m_1 + m_3 &= |A \cap B| + |\overline{A} \cap B| = |B| \\
m_2 + m_3 &= |A \cap \overline{B}| + |\overline{A} \cap B|
\end{aligned} \tag{10}$$

with the domains:

$$\begin{aligned}
\mathrm{dom}(m_1 + m_2) &= [\,|A|\,] \\
\mathrm{dom}(m_1 + m_3) &= [\,|B|\,] \\
\mathrm{dom}(m_2 + m_3) &= [\,0, |A| + |B|\,]
\end{aligned} \tag{11}$$

Finally, for $k = 3$, it follows (cf. Eq. (4)):

$$m_1 + m_2 + m_3 = |A \cap B| + |A \cap \overline{B}| + |\overline{A} \cap B| = |A \cup B| \tag{12}$$

with the domain:

$$\mathrm{dom}(m_1 + m_2 + m_3) = [\,\max(|A|, |B|), |A| + |B|\,]. \tag{13}$$

In the following, the union sets will be used as short forms for the combination of elementary intersection sets: $m_8 = m_1 + m_2 + m_3$.

3.3. Other size measures

Obviously other set size measures exist. Some of them already occur in the domain limitations of the location-based measures (Eqs. (9), (11) and (13)). In the linear form, they are:

$$m_5 = \min(|A|, |B|), \quad m_6 = \max(|A|, |B|), \quad m_7 = |A| + |B|. \tag{14}$$

These three measures are dependent by:

$$\min(|A|, |B|) + \max(|A|, |B|) = |A| + |B|. \tag{15}$$

All three of these measures are independent of the relative location of the two regions, and for that reason, they are not considered as candidates for location-based similarity measures. But they are needed for normalization of the location-based measures, as the consideration of domains has shown. Besides, these measures are symmetric.

Nonlinear measures can also be set up. Some of them are of higher dimension, e.g., $|A| \cdot |B|$, which disqualifies them for normalization. The others are norms of higher-dimensional measures. A prototype is the $L_p$-norm; e.g., $p = 2$ produces $\sqrt{|A| \cdot |B|}$ (Molenaar and Cheng, 1998). Such norms necessarily use the same combinations of sets as the measures already given; they contribute no new information. Therefore, the given set of size measures is sufficiently complete.
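To make the bookkeeping of the size measures concrete, the following minimal sketch evaluates them for two small raster regions. It assumes that a region is represented as a Python set of grid cells, that the size of a set is its cell count, and that $m_2$ and $m_3$ denote $|A \cap \overline{B}|$ and $|\overline{A} \cap B|$, respectively. The function name `size_measures` and the two example regions are hypothetical and serve illustration only.

```python
def size_measures(a: set, b: set) -> dict:
    """Size measures m1..m3 and m5..m8 for two finite, non-empty cell sets."""
    if not a or not b:
        raise ValueError("regions must be finite and non-empty")
    m1 = len(a & b)   # |A n B|
    m2 = len(a - b)   # |A n not-B| (assumed meaning of m2)
    m3 = len(b - a)   # |not-A n B| (assumed meaning of m3)
    return {
        "m1": m1,
        "m2": m2,
        "m3": m3,
        "m5": min(len(a), len(b)),
        "m6": max(len(a), len(b)),
        "m7": len(a) + len(b),
        "m8": m1 + m2 + m3,  # equals |A u B|, cf. Eq. (12)
    }


# Hypothetical example regions on an integer grid.
A = {(x, y) for x in range(4) for y in range(4)}        # 16 cells
B = {(x, y) for x in range(2, 7) for y in range(2, 5)}  # 15 cells
m = size_measures(A, B)

# Relations stated in Eqs. (10), (12), (13) and (15).
assert m["m1"] + m["m2"] == len(A)
assert m["m1"] + m["m3"] == len(B)
assert m["m8"] == len(A | B)
assert m["m6"] <= m["m8"] <= m["m7"]
assert m["m5"] + m["m6"] == m["m7"]
```

Only $m_1$, $m_2$, $m_3$ and $m_8$ change when one region is shifted against the other; $m_5$, $m_6$ and $m_7$ depend on the region sizes alone, which is why they serve as norm factors rather than as location-based measures.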

4. Location-based similarity measures

In this section, we will derive location-based similarity measures with special attention to completeness. The size measures of Section 3 are used and coupled with three criteria for similarity measures: symmetry, normalization and freedom of dimension. It will be possible to set up lists of such measures and to describe their properties.

4.1. Criteria for similarity measures

Three criteria will be established to specify similarity measures. With these criteria, it will be possible to derive such measures from the size measures. As already discussed, similarity is an empirical concept which is not identical to distance. Nevertheless, special conditions for distances will be investigated too. The considered criteria for similarity measures are:

(1) Symmetry: Without explicit reasons from prior knowledge, the situation between $A$ and $B$ is symmetric; no region is preferred as a reference or a prototype. In such neutral situations, a measure must be independent of the order of the considered regions $A$ and $B$:

$$\mathrm{similar}(A, B) = \mathrm{similar}(B, A). \tag{16}$$

(2) Domain limitation: It is useful to have normalized measures. This property eases interpretation and comparison of measures:

$$0 \le \mathrm{similar}(A, B) \le 1. \tag{17}$$

For this reason, suited ratios of size measures are introduced as similarity measures.

(3) Freedom of dimension: Similarity measures shall be free of dimension because similarity is no physical concept or property. That can be reached by building ratios of measures with the same dimension.

4.2. Symmetry

First, we consider symmetry in the size measures. The case $k = 0$ is meaningless in the context of similarity. From all other tuples only a few are symmetric (taking advantage of abbreviations by unions):

$$\begin{aligned}
k = 0 &: \; \emptyset \quad \text{(excluded above)} \\
k = 1 &: \; m_1 \quad \text{based on } A \cap B \\
k = 2 &: \; m_2 + m_3 \quad \text{based on } (A \cap \overline{B}) \cup (\overline{A} \cap B) \\
k = 3 &: \; m_8 \quad \text{based on } A \cup B
\end{aligned} \tag{18}$$

In the following, it is sufficient to investigate this reduced set of size measures as the only symmetric ones. They have to be normalized now.

4.3. Normalization to dimension-less ratios

The symmetric size measures of Eq. (18) will be normalized. For that purpose, the domains of size measures are used (Eqs. (9), (11) and (13)). Normalization must not destroy the symmetry property; for that reason, the norm factors must be symmetric too. Further, norm factors may never take the value 0. This argument excludes the measures $m_1 = |A \cap B|$ and $m_2 + m_3 = |(A \cap \overline{B}) \cup (\overline{A} \cap B)|$ from the list of possible norm factors. To keep the third criterion, only the linear size measures are considered as norm factors. The remaining candidates for norm factors are the measures:

$$m_8, \; m_5, \; m_6, \; m_7. \tag{19}$$

Table 1 shows the matrix of all $4 \times 3$ ratios. Not all of the ratios are normalized to $[0, 1]$. The ratios will be discussed individually in the next section.

4.4. Similarity measures

In this section, the ratios of Table 1 will be investigated. Normal ratios are considered as similarity measures, or as dissimilarity measures, if their behaviour meets the following idea of location-based similarity. The meaning of a location-based similarity measure shall be that of a fuzzy membership value (Zadeh, 1965) to a topological relation; a value of 1 refers to total correspondence with the discrete relation and a value of 0 refers to total disagreement with the discrete relation. Such a relation can be equal. There will also be similarity measures registering correspondence to any containment. The meaning of the individual measures will be discussed with regard to the actual topological relation referred to. The ratios in detail are as follows.

$s_{11}$: Domain of values is $[0, 1]$. 0 stands for totally disjoint regions ($A \cap B = \emptyset$), and 1 stands for identical regions ($A \cap B = A \cup B$).
This ratio is a prototypical example of a location-based similarity measure increasing with the grade of similarity up to equality.

$s_{12}$: Domain of values is $[0, 1]$. 0 occurs only if $A = B$, and 1 occurs if $A$ and $B$ are totally disjoint. With this behaviour, the ratio complements $s_{11}$, which corresponds to the complementing sets in the numerators with regard to the denominator. This ratio is a typical dissimilarity measure decreasing with the grade of similarity.

Table 1. Combination of all possible ratios of size measures. Not all of the ratios are normal.

| Denominator \ Numerator | $m_1 = \lvert A \cap B \rvert$ | $m_2 + m_3 = \lvert A \cap \overline{B} \rvert + \lvert \overline{A} \cap B \rvert$ | $m_8 = \lvert A \cup B \rvert$ |
|---|---|---|---|
| $m_8 = \lvert A \cup B \rvert$ | $s_{11}$ | $s_{12}$ | $s_{13}$ |
| $m_5 = \min(\lvert A \rvert, \lvert B \rvert)$ | $s_{21}$ | $s_{22}$ | $s_{23}$ |
| $m_6 = \max(\lvert A \rvert, \lvert B \rvert)$ | $s_{31}$ | $s_{32}$ | $s_{33}$ |
| $m_7 = \lvert A \rvert + \lvert B \rvert$ | $s_{41}$ | $s_{42}$ | $s_{43}$ |

$s_{13}$: Domain of values is $[1]$, trivially.

$s_{21}$: Domain of values is $[0, 1]$. 0 stands for totally disjoint regions, and 1 stands for complete coverage, containment, or identity ($\subseteq$). The ratio does not recognize the proportion in size between $A$ and $B$ and, therefore, it is not suited as a similarity measure. Nevertheless, this ratio could be used as a measure for the grade of (symmetric) overlap.

$s_{22}$: Domain of values is $[0, \infty)$. Again, 0 occurs only if $A = B$. But the denominator is not sufficient to normalize the numerator. That property excludes this ratio from the list of similarity measures. Additionally, values different from 0 are difficult to interpret, because numerator and denominator are not correlated.

$s_{23}$: Domain of values is $[1, \infty)$. 1 occurs if $A = B$, and the ratio increases in all other cases. Without being normalized, this ratio is excluded from the list of similarity measures.

$s_{31}$: Domain of values is $[0, 1]$. 0 occurs if both regions are disjoint, and 1 occurs only if $A = B$, in contrast to $s_{21}$. With its sensitivity for proportions between $A$ and $B$, this ratio is a suited similarity measure.

$s_{32}$: Domain of values is $[0, 2]$. 0 occurs if $A = B$, and 2 occurs if $A$ is disjoint from $B$ and $|A| = |B|$. As long as one region is covered by or contained in the other region, the value of the ratio is limited by an upper bound of 1. As long as both regions are disjoint, the value of the ratio is limited by a lower bound of 1. In any case of overlap, no prediction can be made. This ratio could be normalized by division by 2 (then it represents a dissimilarity measure decreasing with growing similarity).

$s_{33}$: Domain of values is $[1, 2]$. The value 1 stands for all cases of coverage, containment or identity. The value 2 occurs for disjoint regions if $|A| = |B|$. Neither the domain nor the behaviour recommends this ratio as a similarity measure.

$s_{41}$: Domain of values is $[0, 1/2]$. 0 stands for disjoint regions, and $1/2$ stands for $A = B$. If we normalized the ratio (by multiplication with 2), the result would have the mean size of $A$ and $B$ as denominator (cf. Eq. (15)). Then the behaviour of the (normalized) ratio $s_{41}$ would lie between that of $s_{31}$ and $s_{21}$. This yields no new information.

$s_{42}$: Domain of values is $[0, 1]$. 0 occurs if $A = B$, and 1 occurs if $A$ and $B$ are disjoint. Again, this is a mean ratio of $s_{22}$ and $s_{32}$, but it fulfills the conditions of a (dis-)similarity measure.

$s_{43}$: Domain of values is $[1/2, 1]$. The lower bound occurs if $A = B$. 1 occurs in all cases of disjoint regions, but is also approached in all other topological relations if $A$ and $B$ differ by orders of magnitude. This ratio represents an extraordinary dissimilarity measure.

In summary, given all possible ratios of size measures, the following are similarity measures:

$$\text{similarity measures: } \{\, s_{11}, \; s_{31}, \; 2 s_{41} \,\}. \tag{20}$$

Another list contains dissimilarity measures:

$$\text{dissimilarity measures: } \left\{\, s_{12}, \; \frac{s_{32}}{2}, \; s_{42}, \; 2 \left( s_{43} - \frac{1}{2} \right) \right\}. \tag{21}$$

Both lists are complete regarding the given criteria.

4.5. Combination of similarity measures

In this section, different combinations of similarity measures will be discussed. Evidence will be given that both lists from above are needed, which will be supported by some examples of recent applications.

With Tversky's contrast model in mind (Eq. (1)), our lists of similarity and dissimilarity measures become more transparent. All similarity measures are based on the numerator $|A \cap B|$, which represents the common features of $A$ and $B$. All dissimilarity measures, with one exception, are based on the numerator $|A \cap \overline{B}| + |\overline{A} \cap B|$, which represents the distinctive features of $A$ and $B$. The exception, $s_{43}$, treats topological relations combined with orders of magnitude, mixing different kinds of features, metric and topological ones. These considerations lead to the expectation that in practical applications one measure from each list is required to assess similarity completely.

Consider the following example (Harvey et al., 1998): To evaluate a match of two regions, two measures are introduced: an inclusion function, which is in fact identical to $s_{21}$ and yields the grade of overlap instead of similarity (nevertheless: the common features), and a surface distance, which is identical to $s_{12}$ and which measures dissimilarity (distinctive features). Thus, the hypothesis is supported that two measures are needed. The question remains interesting whether other pairs of measures would also have been useful. The authors do not discuss their choice.

Another example is mentioned in Ragia and Winter (1998): The authors match two buildings from two data sets with special requirements regarding the aggregation levels of the data sets. Part-of relations are accepted as a match. Similarity is replaced by weighted topological relations, e.g., by $s_{21}$ and $s_{31}$. With this choice, only common features are considered, but not the distinctive ones.
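The ratios of Table 1 and the two lists of Eqs. (20) and (21) can be evaluated directly for rasterized regions. The following is a minimal sketch, assuming that regions are Python sets of cell identifiers and that set size is the cell count; the function names `ratios`, `similarity` and `dissimilarity`, as well as the dictionary keys, are hypothetical.

```python
from fractions import Fraction


def ratios(a: set, b: set) -> dict:
    """All 4 x 3 ratios s_ij: row i selects the denominator, column j the numerator."""
    numerators = (len(a & b), len(a ^ b), len(a | b))  # m1, m2 + m3, m8
    denominators = (
        len(a | b),             # m8
        min(len(a), len(b)),    # m5
        max(len(a), len(b)),    # m6
        len(a) + len(b),        # m7
    )
    return {
        f"s{i}{j}": Fraction(num, den)
        for i, den in enumerate(denominators, start=1)
        for j, num in enumerate(numerators, start=1)
    }


def similarity(a: set, b: set) -> dict:
    """Normalized similarity measures as listed in Eq. (20)."""
    s = ratios(a, b)
    return {"s11": s["s11"], "s31": s["s31"], "2*s41": 2 * s["s41"]}


def dissimilarity(a: set, b: set) -> dict:
    """Normalized dissimilarity measures as listed in Eq. (21)."""
    s = ratios(a, b)
    return {
        "s12": s["s12"],
        "s32/2": s["s32"] / 2,
        "s42": s["s42"],
        "2*(s43-1/2)": 2 * (s["s43"] - Fraction(1, 2)),
    }


# Hypothetical example: two overlapping runs of cells.
A = set(range(0, 8))    # 8 cells
B = set(range(4, 10))   # 6 cells, 4 of them shared with A
print(similarity(A, B))
print(dissimilarity(A, B))
```

Exact fractions are used instead of floats so that the bounds 0 and 1 are hit exactly for disjoint and identical regions. Reporting one value from each list, for instance `s11` together with `s12`, follows the expectation stated above that common and distinctive features should both be assessed.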
Similarity of regions has to be handled in a different way from similarity of lower-dimensional entities. Recently, Walter (1997) matched lines and points of street networks. He works only with distance measures (costs), neglecting the weight of common features. That is justified for one-dimensional data sets because the probability is very small that two lines coincide by chance (the probability for two points is even zero).

Similarity of spatial relations cannot be treated by sizes of sets (the single exception being topological relations). For example, Bruns and Egenhofer (1996) and Egenhofer (1997) investigate spatial scenes. Though they involve metric refinements of topological relations (cf. Eq. (5)), they need an additional concept of similarity for other spatial relations. They also work with distance measures, which they derive from conceptual neighborhood graphs.

Metric properties would require additional conditions, especially reflexivity and the triangle inequality. Now it will be investigated how far the location-based similarity measures follow such rules.

A symmetric, normalized similarity measure allows the introduction of its inverse:

$$\mathrm{dissimilar}(A, B) = 1 - \mathrm{similar}(A, B). \tag{22}$$

The inverse topological relation is always disjunct, which is supported by the interpretation of the similarity and dissimilarity measures in Section 4.4. The found measures will not complement each other; therefore, the formal introduction of an inverse is useful.

Reflexivity:

$$\mathrm{similar}(A, \overline{A}) = 0, \quad \mathrm{similar}(A, A) = 1. \tag{23}$$

If $B$ is assigned to $\overline{A}$, the first rule is fulfilled by all three similarity measures, with $m_1 = 0$ for disjunct regions. If $B$ is assigned to $A$, the second rule is fulfilled by all three similarity measures.

Reflexivity put on dissimilarity requires an exchange of the rules, applying the inverse property (Eq. (22)) to Eq. (23):

$$\mathrm{dissimilar}(A, \overline{A}) = 1, \quad \mathrm{dissimilar}(A, A) = 0. \tag{24}$$
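As a small worked check, the inverse of Eq. (22) can be applied to $s_{11}$, the one ratio that Section 4.4 describes as being complemented by $s_{12}$. The derivation below is a sketch under the assumption that $|A \cup B|$ decomposes into $|A \cap B|$ plus the sizes of the two difference sets, as in Eq. (12).

```latex
\begin{align*}
1 - s_{11}
  &= 1 - \frac{|A \cap B|}{|A \cup B|}
   = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} \\
  &= \frac{|A \cap \overline{B}| + |\overline{A} \cap B|}{|A \cup B|}
   = s_{12}
\end{align*}
```

For $B = A$ both difference sets are empty, so $s_{12} = 0$; for disjunct regions $|A \cap B| = 0$, so $s_{12} = 1$. Both values agree with the reflexivity rules for dissimilarity.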
A triangle inequality, e.g., in the form:

$$\mathrm{similar}(A, B) \cdot \mathrm{similar}(B, C) \le \mathrm{similar}(A, C)$$

does not hold. Multiplication is required to keep the norm, and the relation sign has to be converted for multiplication factors $< 1$. But neither do disjunct regions $A$ and $C$ require that $A$ and $B$ or $B$ and $C$ are disjunct (i.e., that their similarity is 0), nor do equal regions $A$ and $C$ require that $A$ and $B$ or $B$ and $C$ are equal too. Location-based similarity is not metric.

Fig. 4. The three test scenarios for similarity measures: left (a), center (b), right (c). Explanation: see text.

Fig. 5. The behavior of the measures in the three tests (see text).
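The reflexivity, symmetry and triangle rules discussed in this section can be probed numerically. The sketch below assumes regions given as Python sets of cells and uses three hypothetical regions in which $B$ overlaps both $A$ and $C$ while $A$ and $C$ are disjunct; the function names are illustrative only.

```python
from fractions import Fraction


def s11(a: set, b: set) -> Fraction:
    """|A n B| / |A u B|"""
    return Fraction(len(a & b), len(a | b))


def s31(a: set, b: set) -> Fraction:
    """|A n B| / max(|A|, |B|)"""
    return Fraction(len(a & b), max(len(a), len(b)))


def s41_normalized(a: set, b: set) -> Fraction:
    """2 |A n B| / (|A| + |B|)"""
    return Fraction(2 * len(a & b), len(a) + len(b))


def triangle_holds(similar, a: set, b: set, c: set) -> bool:
    """Test similar(A,B) * similar(B,C) <= similar(A,C) for one triple."""
    return similar(a, b) * similar(b, c) <= similar(a, c)


# Hypothetical regions: B bridges the disjunct regions A and C.
A = set(range(0, 4))
B = set(range(2, 6))
C = set(range(4, 8))

for similar in (s11, s31, s41_normalized):
    assert similar(A, A) == 1                  # second reflexivity rule
    assert similar(A, B) == similar(B, A)      # symmetry
    print(similar.__name__, triangle_holds(similar, A, B, C))
```

For this triple the product on the left-hand side is positive for every measure, whereas the similarity of the disjunct regions $A$ and $C$ is 0, so the triangle rule fails in each case.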

5. Testing the behaviour of the found measures