Appendix G Formal Models for Artificial Intelligence Methods: Mathematical Similarity Measures for Pattern Recognition

As discussed in Chap. 10, the similarity of two objects (phenomena) is a fundamental notion in pattern recognition and cluster analysis. First we introduce the mathematical foundations for defining similarity measures, then we survey the most popular measures.

G.1 Metric and Topological Spaces

Let us introduce basic notions concerning metric and topological spaces [294].

Definition G.1 Let X be a nonempty set. A metric on a set X is any function²¹ ρ : X × X → R₊ fulfilling the following conditions.

1. ∀x, y ∈ X : ρ(x, y) = 0 iff x = y.

2. ∀x, y ∈ X : ρ(x, y) = ρ(y, x).

3. ∀x, y, z ∈ X : ρ(x, z) ≤ ρ(x, y) + ρ(y, z).

Definition G.2 If ρ is a metric on a set X, then a pair (X, ρ) is called a metric space. Elements of a metric space (X, ρ) are called points. For any x, y ∈ X, a value ρ(x, y) is called a distance between points x and y.

Definition G.3 Let (X, ρ) be a metric space. A ball (open ball) of a radius r > 0 and centered at a point a ∈ X is a set:

K(a, r) = {x ∈ X : ρ(a, x) < r}.
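The conditions of Definition G.1 and the ball of Definition G.3 can be illustrated with a small sketch. It only tests the conditions on a finite sample of points, so a positive result is evidence rather than a proof. The function names `check_metric_axioms`, `in_open_ball` and `abs_diff` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def check_metric_axioms(points, rho, tol=1e-12):
    """Test the three conditions of Definition G.1 on a finite sample."""
    for x, y in product(points, repeat=2):
        # Condition 1: rho(x, y) = 0 iff x = y.
        if (rho(x, y) <= tol) != (x == y):
            return False
        # Condition 2: symmetry.
        if abs(rho(x, y) - rho(y, x)) > tol:
            return False
    for x, y, z in product(points, repeat=3):
        # Condition 3: the triangle inequality.
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:
            return False
    return True


def in_open_ball(x, a, r, rho):
    """True iff x belongs to K(a, r) = {x : rho(a, x) < r}."""
    return rho(a, x) < r


def abs_diff(x, y):
    """The usual distance on the real line (assumed example metric)."""
    return abs(x - y)


sample = [-2.0, 0.0, 0.5, 3.0]
print(check_metric_axioms(sample, abs_diff))
print(in_open_ball(0.5, 0.0, 1.0, abs_diff))
print(in_open_ball(1.0, 0.0, 1.0, abs_diff))
```

The last call returns False because the inequality in the definition of K(a, r) is strict: a point at distance exactly r from the center does not belong to the open ball.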

Definition G.4 Let (X, ρ) be a metric space. A set U ⊂ X is called an open set iff every point of U is contained in U together with some ball centered at that point, i.e.

∀ x ∈ U ∃ r > 0 : K(x, r) ⊂ U.

²¹ R₊ = [0, +∞).

© Springer International Publishing Switzerland 2016. M. Flasiński, Introduction to Artificial Intelligence, DOI 10.1007/978-3-319-40022-8

Definition G.5 Let X be a nonempty set, T be a family of subsets of X. The family T is called a topology for X, if it fulfills the following conditions.

• ∅, X ∈ T.

• A finite intersection of elements of T is an element of T.

• An arbitrary union of elements of T is an element of T.

Definition G.6 If T is a topology for a set X, then a pair (X, T) is called a topological space. The members of T are called open sets in (X, T).

G.2 Metrics Used in Pattern Recognition

In pattern recognition and cluster analysis selection of an adequate metric is essential for the effectiveness of the method constructed. Now we present the most popular metrics in this area.

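Definition G.7 Let $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n) \in \mathbb{R}^n$ and $p \geq 1$. The Minkowski metric $\rho_p$ is given by the formula:

$$\rho_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}.$$

For cases p = 2 and p = 1 of the Minkowski metric, the following metrics are defined.

Definition G.8 The Euclidean metric $\rho_2$ is given by the formula:

$$\rho_2(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$

Definition G.9 The Manhattan metric $\rho_1$ is given by the formula:

$$\rho_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|.$$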

If p → ∞, then the following metric is obtained.

Definition G.10 The Chebyshev metric $\rho_\infty$ is given by the formula:

$$\rho_\infty(x, y) = \max_{1 \le i \le n} |x_i - y_i|.$$
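The Minkowski family can be sketched in a few lines. This is a minimal illustration and not an optimized implementation. The function name `minkowski` is hypothetical, vectors are assumed to be plain Python sequences of numbers with at least one coordinate, and passing `math.inf` as `p` is an assumed convention for selecting the Chebyshev case.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance between equal-length numeric vectors x and y.

    p = 1 gives the Manhattan metric, p = 2 the Euclidean metric,
    and p = math.inf the Chebyshev metric (the limit case).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit p -> infinity: the largest coordinate difference.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))         # Manhattan: 3 + 4
print(minkowski(x, y, 2))         # Euclidean: sqrt(9 + 16)
print(minkowski(x, y, math.inf))  # Chebyshev: max(3, 4)
```

For the same pair of points the three distances are 7, 5 and 4, which shows that the value does not increase as p grows.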

In order to illustrate the differences among metrics (Euclidean, Manhattan, Chebyshev), balls of radius 1 centered at a point having coordinates (0, 0) (unit balls) are shown in Fig. G.1.

Fig. G.1 Unit balls constructed with various metrics: (a) Euclidean, (b) Manhattan, (c) Chebyshev

The metrics introduced above are used primarily for the recognition of patterns that are represented by vectors of continuous features. If patterns are represented with binary feature vectors or with structural/syntactic descriptions, then metrics of a different nature are applied. Let us present such metrics.

In computer science and artificial intelligence the Hamming metric [124] plays an important role. For example, we introduced this metric discussing Hamming neural networks in Chap. 11. Let $\Sigma_T$ be a set of terminal symbols (alphabet).²²

Definition G.11 Let there be given two strings of characters (symbols): $x = x_1 x_2 \ldots x_n$, $y = y_1 y_2 \ldots y_n \in \Sigma_T^{*}$. Let $H = \{x_i,\ i = 1, \ldots, n : x_i \neq y_i\}$. A distance $\rho_H$ between strings x and y in the sense of the Hamming metric equals $\rho_H(x, y) = |H|$, where |H| is the number of elements of a set H.

In other words, the Hamming metric gives the number of positions at which two strings differ.
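As a minimal sketch, the Hamming distance can be computed by comparing the two strings position by position. The function name `hamming` is hypothetical. Raising an error for strings of different lengths is an assumption of this sketch, made because the definition only covers strings of the same length n.

```python
def hamming(x, y):
    """Number of positions at which equal-length strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("the Hamming distance requires strings of equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


print(hamming("1011101", "1001001"))
print(hamming("karolin", "kathrin"))
```

The binary strings differ at the third and fifth positions, so their distance is 2. The second pair differs at three positions.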

The Levenshtein metrics [ 104 , 179 ] are generalizations of the Hamming metric. Let us introduce them.

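Definition G.12 Let $x, y \in \Sigma_T^{*}$. A transformation $F : \Sigma_T^{*} \longrightarrow \Sigma_T^{*}$ such that $y \in F(x)$ is called a string transformation. Let us introduce the following string transformations.

1. A substitution error transformation $F_S$: $\eta_1 a \eta_2 \overset{F_S}{\longmapsto} \eta_1 b \eta_2$, $a, b \in \Sigma_T$, $a \neq b$, $\eta_1, \eta_2 \in \Sigma_T^{*}$.

2. A deletion error transformation $F_D$: $\eta_1 a \eta_2 \overset{F_D}{\longmapsto} \eta_1 \eta_2$, $a \in \Sigma_T$, $\eta_1, \eta_2 \in \Sigma_T^{*}$.

3. An insertion error transformation $F_I$: $\eta_1 \eta_2 \overset{F_I}{\longmapsto} \eta_1 a \eta_2$, $a \in \Sigma_T$, $\eta_1, \eta_2 \in \Sigma_T^{*}$.

Definition G.13 A distance $\rho_L$ between strings $x, y \in \Sigma_T^{*}$ in the sense of the (simple) Levenshtein metric is defined as the smallest number of string transformations $F_S$, $F_D$, $F_I$ required to obtain the string y from the string x.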
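The smallest number of transformations can be found with dynamic programming. The sketch below uses the well-known row-by-row scheme, which is an assumption of this illustration and is not prescribed by the definition itself. The function name `levenshtein` is hypothetical.

```python
def levenshtein(x, y):
    """Smallest number of substitutions, deletions and insertions
    needed to obtain the string y from the string x."""
    # prev[j] is the distance between the current prefix of x and y[:j];
    # for the empty prefix of x it takes j insertions to build y[:j].
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]  # i deletions turn x[:i] into the empty string
        for j, b in enumerate(y, start=1):
            substitution = prev[j - 1] + (a != b)  # free if symbols match
            deletion = prev[j] + 1                 # drop symbol a of x
            insertion = curr[j - 1] + 1            # add symbol b of y
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("flaw", "lawn"))
```

For "kitten" and "sitting" the result is 3 (two substitutions and one insertion). For "flaw" and "lawn" it is 2 (one deletion and one insertion), although the Hamming distance of these equal-length strings is 4.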

²² Notions of formal language theory are introduced in Appendix E.

Before we introduce generalizations of the simple Levenshtein metric, let us notice that in computer science sometimes we do not want to preserve all the properties of a metric formulated in Definition G.1. Therefore, we use some modified versions of the notion of metric.

• A pseudometric does not fulfill the first condition of Definition G.1. Instead, the following condition holds: ∀x ∈ X : ρ(x, x) = 0, but it is possible that ρ(x, y) = 0 for some x ≠ y.

• A quasimetric does not fulfill the second condition (symmetry) of Definition G.1.

• A semimetric does not fulfill the third condition (the triangle inequality) of Definition G.1.

In the following definitions we call all these modified versions, briefly, a metric [104].

Definition G.14 Let $x, y \in \Sigma_T^{*}$. Let us ascribe weights α, β, γ to string transformations $F_S$, $F_D$, $F_I$, respectively. Let M be a sequence of string transformations applied to obtain the string y from the string x such that we have used $s_M$ substitution error transformations, $d_M$ deletion error transformations and $i_M$ insertion error transformations.

Then, a distance $\rho_{LWTE}$ between strings x and y in the sense of the Levenshtein metric weighted according to a type of an error is given by the following formula:

$$\rho_{LWTE}(x, y) = \min_{M} \{ \alpha \cdot s_M + \beta \cdot d_M + \gamma \cdot i_M \}.$$

Let us note that if the weight β of a deletion error transformation differs from the weight γ of an insertion error transformation, then the Levenshtein metric weighted according to the type of an error is a quasimetric.
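The loss of symmetry can be observed with a small sketch of the weighted distance. It assumes nonnegative weights and uses a standard dynamic-programming scheme, which is a choice of this illustration rather than part of the definition. The function name `weighted_levenshtein` and its keyword parameters `alpha`, `beta`, `gamma` are hypothetical.

```python
def weighted_levenshtein(x, y, alpha=1.0, beta=1.0, gamma=1.0):
    """Minimal total weight of substitutions (alpha), deletions (beta)
    and insertions (gamma) needed to obtain y from x.
    Weights are assumed to be nonnegative."""
    # Building y[:j] from the empty string takes j insertions.
    prev = [j * gamma for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * beta]  # removing x[:i] takes i deletions
        for j, b in enumerate(y, start=1):
            substitution = prev[j - 1] + (alpha if a != b else 0.0)
            deletion = prev[j] + beta
            insertion = curr[j - 1] + gamma
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Deletion costs twice as much as insertion, so the distance is asymmetric.
print(weighted_levenshtein("abc", "ab", alpha=1.0, beta=2.0, gamma=1.0))
print(weighted_levenshtein("ab", "abc", alpha=1.0, beta=2.0, gamma=1.0))
```

Obtaining "ab" from "abc" requires one deletion (cost 2), whereas obtaining "abc" from "ab" requires one insertion (cost 1). With α = β = γ = 1 the function reduces to the simple Levenshtein distance.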

Definition G.15 Let S(a, b) denote the cost of a substitution error transformation described as in point 1 of Definition G.12, with S(a, a) = 0, and let D(a) denote the cost of a deletion error transformation described as in point 2 of Definition G.12.

Let I(a, b) denote the cost of the insertion of a symbol b before a symbol a, i.e.

$$\eta_1 a \eta_2 \overset{F_I}{\longmapsto} \eta_1 b a \eta_2, \quad a, b \in \Sigma_T, \ \eta_1, \eta_2 \in \Sigma_T^{*},$$

and, additionally, let I′(b) denote the cost of the insertion of a symbol b at the end
