
10.5 Statistical Pattern Recognition

In statistical pattern recognition (the Bayesian approach), presented by Richard O. Duda 18 and Peter E. Hart in [78], the probability 19 of assigning a pattern to a class is taken into consideration. These methods are based on the Bayesian model. 20


18 Richard O. Duda—a professor of electrical engineering at San Jose State University. His achievements concern pattern recognition. He defined the Hough transform. He is a co-author of the excellent monograph "Pattern Classification and Scene Analysis".

19 The basic notions of probability theory are introduced in Appendices I.1, B.1 and I.2.

20 Thomas Bayes—an eminent English mathematician and a Presbyterian minister. The "father" of statistics.


Fig. 10.3 a Examples of probability density functions for two classes, b a reconstruction of a probability density function on the basis of a histogram, c an example of a decision rule for the Bayesian classifier

In order to simplify our considerations, let us assume that all patterns belong to one of two classes in a one-dimensional feature space. Let us assume that for classes ω1 and ω2 we know the a priori probabilities P(ω1) and P(ω2). The a priori probability P(ω1) gives the probability that a pattern belongs to the class ω1 in general. 21 Then, let us assume that for this (single) feature X1 characterizing our patterns we know the probability density functions p(X1 | ω1) and p(X1 | ω2) for both classes. The probability density function p(X1 | ω1) is a function which, for a given value of the variable X1, assigns the probability of its occurrence, assuming that the pattern considered belongs to the class ω1. Let us return to our "fish example". Now, ω1 means the class of sprats, ω2 means the class of anchovies, and the feature X1 means the length of a fish. Then, p(9 | sprat) = 0.42 means that the probability that a fish is 9 cm long, assuming that it is a sprat, is equal to 0.42. Example probability density functions for the two classes are shown in Fig. 10.3a. As we can see, values of the feature X1 of patterns belonging to the class ω1 belong to the interval [5, 13]. Values around 9 are the most probable (the probability is greater than 0.4). Values of the feature X1 of patterns belonging to the class ω2 belong to the interval [10, 24]. Values around 17 are the most probable.

If we use the Bayesian approach, the question is how to determine the probability density function for a given class. The simplest technique consists of using a histogram, which shows the empirical distribution of a feature. This is shown in Fig. 10.3b. 22

21 We may know, for example, that there are four times more patterns belonging to the class ω1 than patterns belonging to the class ω2 in nature. Then, P(ω1) = 4/5 and P(ω2) = 1/5.

22 The height of the bar for an interval [a, b] should be h = p/w, where p is the number of elements of the learning set which belong to the given class and are in the interval [a, b], and w is the number of all elements of the learning set which belong to the given class.
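As an illustration, here is a minimal Python sketch that reconstructs such an empirical distribution from a learning set, following the bar-height rule h = p/w from footnote 22. The function name, the bin edges, and the sample lengths are invented for the example.

```python
# A minimal sketch of estimating a class-conditional distribution from a
# histogram, following the bar-height rule h = p/w of footnote 22.
# All names and sample data below are illustrative, not from the book.

def histogram_estimate(samples, bin_edges):
    """Return a function mapping a feature value to the height of its bin."""
    counts = [0] * (len(bin_edges) - 1)
    for x in samples:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    w = len(samples)  # all elements of the class in the learning set
    heights = [p / w for p in counts]  # h = p/w for each interval [a, b]

    def estimate(x):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                return heights[i]
        return 0.0  # outside the observed range of the feature

    return estimate

# Hypothetical lengths (in cm) of sprats from a learning set.
sprat_lengths = [7.2, 8.5, 8.8, 8.9, 9.1, 9.2, 9.4, 9.8, 10.3, 11.0]
p_sprat = histogram_estimate(sprat_lengths, bin_edges=[5, 7, 9, 11, 13])
print(p_sprat(9.0))  # empirical estimate of p(9 | sprat)
```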

If we have defined a priori probabilities for classes ω1 and ω2, and probability density functions for these classes, we can define the a posteriori probabilities P(ω1 | X1) and P(ω2 | X1) according to Bayes' rule:

P(ωj | X1) = p(X1 | ωj) · P(ωj) / [ p(X1 | ω1) · P(ω1) + p(X1 | ω2) · P(ω2) ],  j = 1, 2.  (10.12)

P(ω1 | X1) is the probability that an unknown pattern belongs to the class ω1, depending on the value of its feature X1. Thus, for our example, P(sprat | 7) = 0.35 is interpreted in the following way: the probability that a fish of length 7 cm is a sprat equals 0.35. Let us notice that we can omit the denominator of formula (10.12), because it is the same for both classes. Example graphs of a posteriori probability functions for both classes, depending on the feature X1, are shown in Fig. 10.3c. These graphs cross at the value xB. This means that for values of the feature X1 greater than xB the probability that a pattern belongs to the class ω2 is greater than the probability that the pattern belongs to the class ω1 (and vice versa). As we can see, there are values of X1 for which the probabilities of belonging to both classes are non-zero. There are also values of X1 for which the probability of belonging to a given class is equal to zero.
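The sketch below computes the a posteriori probabilities according to formula (10.12) and locates the boundary value xB numerically. The Gaussian class-conditional densities and their parameters are invented stand-ins for the curves of Fig. 10.3a, and the priors follow footnote 21; none of these values come from the book.

```python
import math

# A sketch of the two-class a posteriori computation of formula (10.12).
# The Gaussian densities below merely stand in for the curves of
# Fig. 10.3a; all parameters are invented for illustration.

def gaussian(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

P_prior = {"sprat": 0.8, "anchovy": 0.2}                   # a priori probabilities
density = {
    "sprat":   lambda x: gaussian(x, mean=9.0,  std=1.3),  # p(X1 | sprat)
    "anchovy": lambda x: gaussian(x, mean=17.0, std=2.3),  # p(X1 | anchovy)
}

def posterior(cls, x):
    """P(cls | x) according to Bayes' rule (10.12)."""
    evidence = sum(density[c](x) * P_prior[c] for c in P_prior)
    return density[cls](x) * P_prior[cls] / evidence

# The posterior curves cross at the boundary value x_B: below it the
# sprat posterior dominates, above it the anchovy posterior does.
x = 5.0
while posterior("sprat", x) > posterior("anchovy", x):
    x += 0.01
print(f"x_B is roughly {x:.2f}")
```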

We can generalize our considerations to the case of more than two classes. For classes ω1, ω2, ..., ωC formula (10.12) is of the following form:

P(ωj | X1) = p(X1 | ωj) · P(ωj) / [ p(X1 | ω1) · P(ω1) + ... + p(X1 | ωC) · P(ωC) ],  j = 1, ..., C.  (10.13)

Now, we can formulate a rule for recognizing an unknown pattern characterized by one feature X = (X1) with the help of the Bayes classifier. The classifier assigns the pattern to the class for which the a posteriori probability is the greatest. Thus, X is assigned to the class ωL if

P(ωL | X1) > P(ωj | X1), for each j = 1, ..., C, j ≠ L.  (10.14)
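Continuing the previous sketch, the decision rule is then a one-liner. Since the denominator of (10.13) is identical for every class, it suffices to compare the products p(X1 | ωj) · P(ωj); the dictionaries P_prior and density are the illustrative ones defined above.

```python
# The Bayes decision rule: assign X to the class with the largest
# a posteriori probability. Because the denominator of (10.13) is the
# same for every class, comparing p(X1 | ωj) * P(ωj) is sufficient.
# P_prior and density are the illustrative objects from the previous sketch.

def bayes_classify(x, priors, densities):
    """Return the class ωL maximizing P(ωL | x)."""
    return max(priors, key=lambda c: densities[c](x) * priors[c])

print(bayes_classify(7.0, P_prior, density))   # -> 'sprat'
print(bayes_classify(20.0, P_prior, density))  # -> 'anchovy'
```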


In case we would like to recognize patterns in an n-dimensional feature space, i.e., X = (X1, X2, ..., Xn), we can use the so-called naive Bayes classifier. We make the assumption that all the features are independent. Then, the probability density function for an n-dimensional feature vector is defined in the following way:

p(X | ωj) = p(X1 | ωj) · p(X2 | ωj) · ... · p(Xn | ωj).  (10.15)
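A minimal sketch of the naive Bayes classifier follows. It implements the product (10.15), with a one-dimensional Gaussian density per feature estimated from a learning set; the two-feature fish data are invented for illustration.

```python
import math

# A sketch of the naive Bayes classifier: the n-dimensional density (10.15)
# is the product of one-dimensional densities of the features. A Gaussian
# density per feature is one common choice; the fish data are invented.

def gaussian(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fit_feature(values):
    """Estimate the mean and standard deviation of one feature for one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

def naive_density(x_vec, feature_params):
    """p(X | ωj) as the product of p(Xi | ωj), assuming independent features."""
    result = 1.0
    for x_i, (mean, std) in zip(x_vec, feature_params):
        result *= gaussian(x_i, mean, std)
    return result

# Hypothetical learning set: (length in cm, weight in g) of two fish classes.
learning_set = {
    "sprat":   [(8.5, 9.0), (9.2, 10.1), (9.8, 11.0), (8.9, 9.5)],
    "anchovy": [(16.0, 25.0), (17.5, 28.0), (18.2, 30.5), (16.8, 27.0)],
}
params = {cls: [fit_feature(col) for col in zip(*data)]
          for cls, data in learning_set.items()}
print(naive_density((9.0, 10.0), params["sprat"]))  # p(X | sprat)
```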

We can generalize the Bayes classifier even more. We can assume that erroneous decisions concerning the recognition of an unknown pattern can have various costs, i.e., they have various consequences. Then, we introduce a function of the cost (of errors). This function and the a posteriori probabilities are the basis for defining the risk function for the classifier. In this approach we try to minimize the risk function.
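As a sketch of this idea, the fragment below assigns a cost to each (decision, true class) pair and chooses the decision with the smallest conditional risk, i.e., the cost-weighted sum of the a posteriori probabilities. The cost values are invented, and posterior is the illustrative function from the sketch after formula (10.12).

```python
# A sketch of risk-based recognition: the conditional risk of deciding ωi is
#   R(ωi | x) = sum over j of cost(ωi, ωj) * P(ωj | x),
# and the classifier picks the decision with the smallest risk.
# The costs are invented; `posterior` comes from the earlier sketch.

cost = {
    ("sprat", "sprat"):   0.0,  ("sprat", "anchovy"):   10.0,  # costly mistake
    ("anchovy", "sprat"): 1.0,  ("anchovy", "anchovy"):  0.0,
}

def minimum_risk_decision(x, classes=("sprat", "anchovy")):
    def risk(decided):
        return sum(cost[(decided, true)] * posterior(true, x) for true in classes)
    return min(classes, key=risk)

# At x = 12.0 the plain Bayes classifier (with the illustrative densities)
# would answer 'sprat', but the high cost of mislabeling an anchovy as a
# sprat shifts the decision to 'anchovy'.
print(minimum_risk_decision(12.0))
```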
