Appendix G Formal Models for Artificial Intelligence Methods: Mathematical Similarity Measures for Pattern Recognition

As discussed in Chap. 10, the similarity of two objects (phenomena) is a fundamental notion in pattern recognition and cluster analysis. We first introduce the mathematical foundations for defining similarity measures and then survey the most popular measures.

G.1 Metric and Topological Spaces

Let us introduce basic notions concerning metric and topological spaces [294].

Definition G.1 Let X be a nonempty set. A metric on a set X is any function²¹

ρ : X × X ⟶ R_+

fulfilling the following conditions.

1. ∀x, y ∈ X : ρ(x, y) = 0 iff x = y.

2. ∀x, y ∈ X : ρ(x, y) = ρ(y, x).

3. ∀x, y, z ∈ X : ρ(x, z) ≤ ρ(x, y) + ρ(y, z).

Definition G.2 If ρ is a metric on a set X, then the pair (X, ρ) is called a metric space. Elements of a metric space (X, ρ) are called points. For any x, y ∈ X, the value ρ(x, y) is called the distance between the points x and y.

Definition G.3 Let (X, ρ) be a metric space. A ball (open ball) of radius r > 0 centered at a point a ∈ X is the set

K(a, r) = {x ∈ X : ρ(a, x) < r}.

Definition G.4 Let (X, ρ) be a metric space. A set U ⊂ X is called an open set iff every point of a set U is included in a set U together with some ball centered at this point, i.e.

∀ x ∈ U ∃ r > 0 : K(x, r) ⊂ U.
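The conditions of Definition G.1 and the ball of Definition G.3 can be checked mechanically on a finite sample of points. The following is only a minimal sketch: the helper names is_metric and ball, the tolerance tol and the sample data are assumptions made for illustration, and passing the check on a finite sample does not prove that a function is a metric on the whole set X.

```python
from itertools import product

def is_metric(points, rho, tol=1e-12):
    """Check the three conditions of Definition G.1 on a finite list of points."""
    for x, y in product(points, repeat=2):
        d = rho(x, y)
        if d < 0:                          # values must lie in R_+ = [0, +inf)
            return False
        if (d <= tol) != (x == y):         # condition 1: rho(x, y) = 0 iff x = y
            return False
        if abs(d - rho(y, x)) > tol:       # condition 2: symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:   # condition 3: triangle inequality
            return False
    return True

def ball(points, rho, a, r):
    """Open ball K(a, r), restricted to the finite list `points`."""
    return {x for x in points if rho(a, x) < r}

# Hypothetical sample data, used only for illustration.
points = [0.0, 1.0, 2.5, 4.0]
abs_diff = lambda x, y: abs(x - y)      # the usual distance on the real line
squared = lambda x, y: (x - y) ** 2     # violates the triangle inequality

print(is_metric(points, abs_diff))      # True
print(is_metric(points, squared))       # False
print(ball(points, abs_diff, 1.0, 1.6))
```

The squared difference fails because, for the points 0, 1 and 4, we get 16 > 1 + 9, which contradicts condition 3.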

²¹ R_+ = [0, +∞).

© Springer International Publishing Switzerland 2016
M. Flasiński, Introduction to Artificial Intelligence, DOI 10.1007/978-3-319-40022-8

Definition G.5 Let X be a nonempty set and let T be a family of subsets of X. The family T is called a topology for X if it fulfills the following conditions.

• ∅, X ∈ T.

• A finite intersection of elements of T is an element of T.

• An arbitrary union of elements of T is an element of T.

Definition G.6 If T is a topology for a set X, then the pair (X, T) is called a topological space. The members of T are called open sets in (X, T).
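For a finite set X the conditions of Definition G.5 can be verified directly. The sketch below is an illustration under that finiteness assumption only; the function name is_topology and the example families are hypothetical. For a finite family, closure under pairwise intersections and pairwise unions already gives closure under all finite intersections and arbitrary unions, so only pairs are tested.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the conditions of Definition G.5 for a finite set X and a finite family T."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if frozenset() not in T or X not in T:       # the empty set and X must belong to T
        return False
    if any(not U <= X for U in T):               # members of T must be subsets of X
        return False
    for U, V in combinations(T, 2):
        if (U & V) not in T:                     # closure under intersection
            return False
        if (U | V) not in T:                     # closure under union
            return False
    return True

X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]         # a chain of sets: a topology
broken = [set(), {1}, {2}, {1, 2, 3}]            # {1} ∪ {2} = {1, 2} is missing

print(is_topology(X, nested))   # True
print(is_topology(X, broken))   # False
```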

G.2 Metrics Used in Pattern Recognition

In pattern recognition and cluster analysis selection of an adequate metric is essential for the effectiveness of the method constructed. Now we present the most popular metrics in this area.

Definition G.7 Let x = (x_1, …, x_n) and y = (y_1, …, y_n) be points of R^n and let p ≥ 1. The Minkowski metric ρ_p is given by the formula:

ρ_p(x, y) = ( Σ_{i=1}^{n} |x_i − y_i|^p )^{1/p}.

For the cases p = 2 and p = 1 of the Minkowski metric, the following metrics are defined.

Definition G.8 The Euclidean metric ρ_2 is given by the formula:

ρ_2(x, y) = ( Σ_{i=1}^{n} (x_i − y_i)^2 )^{1/2}.

Definition G.9 The Manhattan metric ρ_1 is given by the formula:

ρ_1(x, y) = Σ_{i=1}^{n} |x_i − y_i|.

If p → ∞, then the following metric is obtained.

Definition G.10 The Chebyshev metric ρ_∞ is given by the formula:

ρ_∞(x, y) = max_{1≤i≤n} |x_i − y_i|.
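The Minkowski family can be sketched with a single function in which the Euclidean, Manhattan and Chebyshev metrics appear as the cases p = 2, p = 1 and p = ∞. This is a minimal illustration, not a reference implementation: the function name minkowski and the sample vectors are assumptions, and the vectors are assumed to be nonempty sequences of real numbers of equal length.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance rho_p between equal-length real vectors; p = math.inf gives Chebyshev."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)                        # Chebyshev metric (limit p -> infinity)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Hypothetical sample points, used only for illustration.
x, y = (0.0, 0.0), (3.0, 4.0)

print(minkowski(x, y, 1))          # Manhattan: 3 + 4 = 7.0
print(minkowski(x, y, 2))          # Euclidean: sqrt(9 + 16) = 5.0
print(minkowski(x, y, math.inf))   # Chebyshev: max(3, 4) = 4.0
print(minkowski(x, y, 50))         # already very close to the Chebyshev value
```

For growing p the largest coordinate difference dominates the sum, which is why ρ_p approaches the Chebyshev value.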

In order to illustrate the differences among the metrics (Euclidean, Manhattan, Chebyshev), balls of radius 1 centered at the point with coordinates (0, 0) (unit balls) are shown in Fig. G.1.

Fig. G.1 Unit balls constructed with various metrics: (a) Euclidean, (b) Manhattan, (c) Chebyshev
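As a small worked check of how the three unit balls differ, consider the point x = (0.6, 0.6) and the center 0 = (0, 0). The point is chosen here only for illustration and does not come from the figure.

```latex
\begin{align*}
\rho_1(x, 0)      &= |0.6| + |0.6| = 1.2 \ge 1, \\
\rho_2(x, 0)      &= \sqrt{0.6^2 + 0.6^2} = \sqrt{0.72} \approx 0.849 < 1, \\
\rho_\infty(x, 0) &= \max\{|0.6|, |0.6|\} = 0.6 < 1.
\end{align*}
```

Hence x belongs to the Euclidean and Chebyshev unit balls but not to the Manhattan one. More generally, ρ_∞(x, y) ≤ ρ_2(x, y) ≤ ρ_1(x, y) for all x, y, so the Manhattan unit ball is contained in the Euclidean one, which in turn is contained in the Chebyshev one.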

The metrics introduced above are used primarily for the recognition of patterns that are represented by vectors of continuous features. If patterns are represented by binary feature vectors or by structural/syntactic descriptions, then metrics of a different nature are applied. Let us present such metrics.

In computer science and artificial intelligence the Hamming metric [124] plays an important role. For example, we introduced this metric when discussing Hamming neural networks in Chap. 11. Let Σ_T be a set of terminal symbols (alphabet).²²

Definition G.11 Let there be given two strings of characters (symbols): x = x_1 x_2 … x_n, y = y_1 y_2 … y_n ∈ Σ_T^*. Let H = {x_i, i = 1, …, n : x_i ≠ y_i}. A distance ρ_H between strings x and y in the sense of the Hamming metric equals ρ_H(x, y) = |H|, where |H| is the number of elements of the set H.

In other words, the Hamming metric counts the positions at which two strings differ.
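Counting the differing positions takes one line of code. The sketch below assumes that both strings have the same length n, as in Definition G.11; the function name hamming and the sample strings are chosen only for illustration.

```python
def hamming(x, y):
    """Hamming distance: the number of positions i at which x[i] != y[i]."""
    if len(x) != len(y):
        raise ValueError("the Hamming distance requires strings of equal length")
    return sum(1 for a, b in zip(x, y) if a != b)

print(hamming("karolin", "kathrin"))   # 3: the strings differ at positions 3, 4 and 5
print(hamming("10110", "11100"))       # 2: the binary vectors differ at positions 2 and 4
```

Note that positions are counted, not distinct symbols: two strings that differ by the same symbol at several positions still contribute one unit per position.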

The Levenshtein metrics [ 104 , 179 ] are generalizations of the Hamming metric. Let us introduce them.

Definition G.12 Let x, y ∈ Σ_T^*. A transformation F : Σ_T^* ⟶ Σ_T^* such that y ∈ F(x) is called a string transformation. Let us introduce the following string transformations.

1. A substitution error transformation F_S : η_1 a η_2 ⟼ η_1 b η_2, where a, b ∈ Σ_T, a ≠ b, η_1, η_2 ∈ Σ_T^*.

2. A deletion error transformation F_D : η_1 a η_2 ⟼ η_1 η_2, where a ∈ Σ_T, η_1, η_2 ∈ Σ_T^*.

3. An insertion error transformation F_I : η_1 η_2 ⟼ η_1 a η_2, where a ∈ Σ_T, η_1, η_2 ∈ Σ_T^*.

Definition G.13 A distance ρ_L between strings x and y in the sense of the (simple) Levenshtein metric is defined as the smallest number of string transformations F_S, F_D, F_I required to obtain the string y from the string x.
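The smallest number of substitutions, deletions and insertions can be computed without enumerating all sequences of transformations. The sketch below uses the standard dynamic-programming scheme (often called the Wagner-Fischer algorithm), which is a technique brought in here for illustration and is not described in the text above; the function name levenshtein and the sample strings are likewise assumptions.

```python
def levenshtein(x, y):
    """Simple Levenshtein distance: minimal number of substitutions, deletions, insertions."""
    # prev[j] holds the distance between the prefix of x processed so far and y[:j].
    prev = list(range(len(y) + 1))          # empty prefix of x: j insertions
    for i, a in enumerate(x, start=1):
        curr = [i]                          # empty prefix of y: i deletions
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                # delete a from x
                curr[j - 1] + 1,            # insert b into x
                prev[j - 1] + (a != b),     # substitute a by b (free if a == b)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("flaw", "lawn"))        # 2
```

Unlike the Hamming metric, the strings may have different lengths: levenshtein("", "abc") is 3, the cost of three insertions.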

²² Notions of formal language theory are introduced in Appendix E.

Before we introduce generalizations of the simple Levenshtein metric, let us note that in computer science we sometimes do not want to preserve all the properties of a metric formulated in Definition G.1. Therefore, we use some modified versions of the notion of a metric.

• A pseudometric need not fulfill the first condition of Definition G.1. Instead, the following weaker condition holds: ∀x ∈ X : ρ(x, x) = 0, but it is possible that ρ(x, y) = 0 for some x ≠ y.

• A quasimetric need not fulfill the second condition (symmetry) of Definition G.1.

• A semimetric need not fulfill the third condition (the triangle inequality) of Definition G.1.

In the following definitions we call all these modified versions, briefly, a metric [104].

Definition G.14 Let x, y ∈ Σ_T^*. Let us ascribe weights α, β, γ to the string transformations F_S, F_D, F_I, respectively. Let M be a sequence of string transformations applied to obtain the string y from the string x such that we have used s_M substitution error transformations, d_M deletion error transformations and i_M insertion error transformations.

Then, a distance ρ_LWTE between strings x and y in the sense of the Levenshtein metric weighted according to the type of an error is given by the following formula:

ρ_LWTE(x, y) = min_M {α · s_M + β · d_M + γ · i_M}.

Let us note that if the weight β of a deletion error transformation differs from the weight γ of an insertion error transformation, then the Levenshtein metric weighted according to the type of an error is a quasimetric.
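The loss of symmetry for β ≠ γ is easy to observe numerically. The following sketch extends the usual dynamic-programming computation of the edit distance with the weights α, β, γ; it assumes nonnegative weights, and the function name weighted_levenshtein, the default values and the sample strings are hypothetical choices made for illustration.

```python
def weighted_levenshtein(x, y, alpha=1.0, beta=1.0, gamma=1.0):
    """Minimal total cost of turning x into y with weights:
    alpha per substitution, beta per deletion, gamma per insertion (all assumed >= 0)."""
    prev = [j * gamma for j in range(len(y) + 1)]    # build y[:j] from the empty string
    for i, a in enumerate(x, start=1):
        curr = [i * beta]                            # delete the first i symbols of x
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + beta,                              # delete a
                curr[j - 1] + gamma,                         # insert b
                prev[j - 1] + (0.0 if a == b else alpha),    # substitute a by b
            ))
        prev = curr
    return prev[-1]

# Deletion is made twice as expensive as insertion (illustrative values).
print(weighted_levenshtein("abc", "ab", alpha=1.5, beta=2.0, gamma=1.0))   # 2.0: one deletion
print(weighted_levenshtein("ab", "abc", alpha=1.5, beta=2.0, gamma=1.0))   # 1.0: one insertion
```

Obtaining "ab" from "abc" requires a deletion, while the opposite direction requires an insertion, so the two distances differ exactly when β ≠ γ. With α = β = γ = 1 the function reduces to the simple Levenshtein distance.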

Definition G.15 Let S(a, b) denote the cost of a substitution error transformation described as in point 1 of Definition G.12, with S(a, a) = 0, and let D(a) denote the cost of a deletion error transformation described as in point 2 of Definition G.12.

Let I(a, b) denote the cost of the insertion of a symbol b before a symbol a, i.e.

η_1 a η_2 ⟼ η_1 b a η_2, a, b ∈ Σ_T, η_1, η_2 ∈ Σ_T^*,

and, additionally, let I′(b) denote the cost of the insertion of a symbol b at the end
