
6.2.1 Minimum Euclidian Distance Discriminant

The minimum Euclidian distance discriminant classifies cases according to their distance to class prototypes, represented by vectors m_k. Usually, these prototypes are class means. We consider the distance taken in the “natural” Euclidian sense. For any d-dimensional feature vector x and any number of classes, ω_k (k = 1, …, c), represented by their prototypes m_k, the square of the Euclidian distance between the feature vector x and a prototype m_k is expressed as follows:

d_k^2(\mathbf{x}) = \sum_{i=1}^{d} (x_i - m_{ik})^2 .

This can be written compactly in vector form, using the vector dot product:

d_k^2(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_k)'(\mathbf{x} - \mathbf{m}_k) = \mathbf{x}'\mathbf{x} - \mathbf{m}_k'\mathbf{x} - \mathbf{x}'\mathbf{m}_k + \mathbf{m}_k'\mathbf{m}_k .        6.5

Grouping together the terms dependent on m_k, we obtain:

d_k^2(\mathbf{x}) = -2(\mathbf{m}_k'\mathbf{x} - 0.5\,\mathbf{m}_k'\mathbf{m}_k) + \mathbf{x}'\mathbf{x} .        6.6a
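
As a quick numerical check of this expansion, the following minimal Python sketch (with an arbitrary feature vector and prototype, not taken from any dataset) computes the squared distance both directly and through the form 6.6a:

```python
import numpy as np

x = np.array([70.0, 100.0])      # arbitrary feature vector (d = 2)
m_k = np.array([55.3, 80.0])     # arbitrary class prototype

d2_direct = np.sum((x - m_k) ** 2)                          # direct sum of squared differences
d2_expanded = -2.0 * (m_k @ x - 0.5 * (m_k @ m_k)) + x @ x  # equation 6.6a
print(d2_direct, d2_expanded)    # same value (approx. 616.09) from both forms
```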

We choose the class ω_k, and therefore the m_k, which minimises d_k^2(x). Let us assume c = 2. The decision boundary between the two classes corresponds to:

d_1^2(\mathbf{x}) = d_2^2(\mathbf{x}) .        6.6b

Thus, using 6.6a, one obtains:

(\mathbf{m}_1 - \mathbf{m}_2)'[\mathbf{x} - 0.5(\mathbf{m}_1 + \mathbf{m}_2)] = 0 .        6.6c
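
A minimal Python sketch, using two arbitrary prototypes, confirms that a point satisfying equation 6.6c is equidistant from m_1 and m_2:

```python
import numpy as np

m1 = np.array([1.0, 2.0])                 # arbitrary prototypes
m2 = np.array([4.0, 6.0])

# A boundary point: the midpoint plus any vector perpendicular to (m1 - m2).
x = 0.5 * (m1 + m2) + 2.0 * np.array([4.0, -3.0])

print((m1 - m2) @ (x - 0.5 * (m1 + m2)))              # 0.0, so x satisfies 6.6c
print(np.sum((x - m1) ** 2), np.sum((x - m2) ** 2))   # equal squared distances
```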


Equation 6.6c, linear in x, represents a hyperplane perpendicular to (m_1 − m_2) and passing through the point 0.5(m_1 + m_2) halfway between the means, as illustrated in Figure 6.1 for d = 2 (the hyperplane is then a straight line). For c classes, the minimum distance discriminant is piecewise linear, composed of segments of hyperplanes, as illustrated in Figure 6.3 with an example of a decision region for class ω_1 in a situation of c = 4.

Figure 6.3. Decision region for ω 1 (hatched area) showing linear discriminants relative to three other classes.
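
A minimal Python sketch of such a minimum Euclidian distance classifier is shown below; the prototypes and cases are arbitrary illustrative values:

```python
import numpy as np

def min_dist_classify(X, prototypes):
    """X: (n, d) array of cases; prototypes: (c, d) array of class means.
    Returns, for each case, the index of the nearest prototype."""
    # Squared Euclidian distances of every case to every prototype, shape (n, c).
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Arbitrary two-dimensional illustration with c = 4 prototypes.
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
cases = np.array([[0.5, 0.2], [3.8, 3.9], [2.1, 0.1]])
print(min_dist_classify(cases, means))   # -> [0 3 1]
```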

Example 6.1

Q: Consider the Cork Stoppers’ dataset (see Appendix E). Design and evaluate a minimum Euclidian distance classifier for classes 1 (ω_1) and 2 (ω_2), using only feature N (number of defects).

A: In this case, a feature vector with only one element represents each case: x = [N]. Let us first inspect the case distributions in the feature space (d = 1) represented by the histograms of Figure 6.4. The distributions have a similar shape with some amount of overlap. The sample means are m_1 = 55.3 for ω_1 and m_2 = 79.7 for ω_2. Using equation 6.6c, the linear discriminant is the point at half distance from the means, i.e., the classification rule is:

If x < (m_1 + m_2)/2 = 67.5 then x ∈ ω_1, else x ∈ ω_2 .        6.7

The separating “hyperplane” is simply the point 68 (see the note on feature N below). Note that in the equality case (x = 68), the class assignment is arbitrary.

The classifier performance evaluated on the whole dataset can be computed by counting the wrongly classified cases, i.e., those falling into the wrong decision region (a half-line in this case). This amounts to 23% of the cases.

(Note on feature N: we assume an underlying real domain for the ordinal feature N; conversion to an ordinal is performed when needed.)
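
A minimal Python sketch of this classification rule is shown below. The arrays n1 and n2 stand for the feature N values of the ω_1 and ω_2 cases; since the real Cork Stoppers’ data are not reproduced here, placeholder values are used, so the printed threshold and error differ from the 67.5 and 23% reported above.

```python
import numpy as np

# Placeholder feature N values for the two classes (not the real dataset).
n1 = np.array([40, 52, 60, 55, 63])   # hypothetical omega_1 cases
n2 = np.array([70, 85, 66, 90, 78])   # hypothetical omega_2 cases

threshold = (n1.mean() + n2.mean()) / 2            # midpoint of the two sample means
# Rule 6.7: x < threshold -> omega_1, otherwise omega_2.
errors = np.sum(n1 >= threshold) + np.sum(n2 < threshold)
error_rate = errors / (len(n1) + len(n2))
print(threshold, error_rate)
```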


Figure 6.4. Feature N histograms obtained with STATISTICA for the first two classes of the cork-stopper data.

Figure 6.5. Scatter diagram, obtained with STATISTICA, for two classes of cork stoppers (features N, PRT10) with the linear discriminant (solid line) at half distance from the means (solid marks).

Example 6.2

Q: Redo the previous example, using one more feature: PRT10 = PRT/10.

A: The feature vector is:

\mathbf{x} = \begin{bmatrix} N \\ \mathrm{PRT10} \end{bmatrix} \quad \text{or} \quad \mathbf{x} = [\,N \;\; \mathrm{PRT10}\,]' .        6.8

In this two-dimensional feature space, the minimum Euclidian distance classifier is implemented as follows (see Figure 6.5):

1. Draw the straight line (decision surface) equidistant from the sample means, i.e., perpendicular to the segment linking the means and passing through its midpoint.

2. Any case above the straight line is assigned to ω_2. Any sample below is assigned to ω_1. The assignment is arbitrary if the case falls on the straight-line boundary.

Note that using PRT10 instead of PRT in the scatter plot of Figure 6.5 eases the comparison of feature contribution, since the feature ranges are practically the same.

Counting the number of wrongly classified cases, we notice that the overall error falls to 18%. The addition of PRT10 to the classifier seems beneficial.
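
A minimal Python sketch of this two-feature classifier, implementing equation 6.6c directly, is shown below; X1 and X2 are placeholders for the real [N, PRT10] data, so the computed error will not match the 18% reported above.

```python
import numpy as np

# Placeholder [N, PRT10] vectors for the two classes (not the real dataset).
X1 = np.array([[40.0, 36.0], [55.0, 50.0], [60.0, 58.0]])   # hypothetical omega_1 cases
X2 = np.array([[80.0, 75.0], [90.0, 88.0], [70.0, 66.0]])   # hypothetical omega_2 cases

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

def assign(x):
    # Equation 6.6c: g(x) > 0 means x is closer to m1, g(x) < 0 closer to m2.
    g = (m1 - m2) @ (x - 0.5 * (m1 + m2))
    return 1 if g > 0 else 2

errors = sum(assign(x) != 1 for x in X1) + sum(assign(x) != 2 for x in X2)
print(errors / (len(X1) + len(X2)))   # overall error on the placeholder data
```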