6.1 Decision Regions and Functions
Consider a data sample consisting of n cases, each described by d features. The central idea in statistical classification is to use the data sample, represented by vectors in an ℜᵈ feature space, in order to derive a decision rule that partitions the feature space into regions assigned to the classification classes. These regions are called decision regions. If a feature vector falls into a certain decision region, the associated case is assigned to the corresponding class.
Let us assume two classes, ω1 and ω2, of cases described by two-dimensional feature vectors (coordinates x1 and x2) as shown in Figure 6.1. The features are random variables, X1 and X2, respectively.
Each case is represented by a vector x = [x1 x2]’ ∈ ℜ². In Figure 6.1, we used “o” to denote class ω1 cases and “×” to denote class ω2 cases. In general, the cases of each class will be characterised by random distributions of the corresponding feature vectors, as illustrated in Figure 6.1, where the ellipses represent equal-probability density curves that enclose most of the cases.
Figure 6.1 also shows a straight line separating the two classes. We can easily
write the equation of the straight line in terms of the features X1, X2, using coefficients or weights w1, w2 and a bias term w0, as shown in Equation 6.1. The weights determine the slope of the straight line; the bias determines where the straight line intersects the axes.
$d_{X_1,X_2}(\mathbf{x}) \equiv d(\mathbf{x}) = w_1 x_1 + w_2 x_2 + w_0 = 0 .$   (6.1)
Equation 6.1 also allows interpretation of the straight line as the root set of a linear function d(x). We say that d(x) is a linear decision function that divides (categorises) ℜ² into two decision regions: the upper half-plane, corresponding to d(x) > 0, where feature vectors are assigned to ω1; and the lower half-plane, corresponding to d(x) < 0, where feature vectors are assigned to ω2. The classification is arbitrary for d(x) = 0.
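As a concrete illustration of this rule, the short Python sketch below assigns a few two-dimensional feature vectors to ω1 or ω2 according to the sign of d(x); the weight values are assumed purely for illustration and are not the ones underlying Figure 6.1.

```python
# Minimal sketch of the decision rule of Equation 6.1.
# The weights below are assumed values, chosen only for illustration.
w1, w2, w0 = 1.0, -1.0, 0.5

def classify(x1, x2):
    """Assign a 2-D feature vector to omega_1 or omega_2 by the sign of d(x)."""
    d = w1 * x1 + w2 * x2 + w0
    if d > 0:
        return "omega_1"      # upper half-plane
    if d < 0:
        return "omega_2"      # lower half-plane
    return "arbitrary"        # the vector lies exactly on the decision line

print(classify(2.0, 1.0))     # d = 1.5  -> omega_1
print(classify(0.0, 2.0))     # d = -1.5 -> omega_2
```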
Figure 6.1. Two classes of cases described by two-dimensional feature vectors (random variables X1 and X2). The black dots are class means.
The generalisation of the linear decision function to a d-dimensional feature space ℜᵈ is straightforward:
$d(\mathbf{x}) = \mathbf{w}'\mathbf{x} + w_0 ,$   (6.2)
where w’x represents the dot product¹ of the weight vector and the d-dimensional feature vector.
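A direct implementation of Equation 6.2 is equally simple. The sketch below uses NumPy’s dot product so that it works for any number of features d; the weight vector and bias are, again, assumed values used only for illustration.

```python
import numpy as np

# Sketch of the d-dimensional linear decision function of Equation 6.2.
# Weight vector and bias are assumed illustration values (here d = 3).
w = np.array([0.8, -0.3, 1.2])
w0 = -0.5

def d_linear(x, w, w0):
    """Linear decision function d(x) = w'x + w0."""
    return np.dot(w, x) + w0      # dot product of weights and features, plus bias

x = np.array([1.0, 2.0, 0.5])
print(d_linear(x, w, w0))         # 0.3 > 0, so x would be assigned to omega_1
```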
The root set of d(x), i.e., the set of points where d(x) = 0, is the decision surface, or discriminant; in ℜᵈ it is a linear surface called a linear discriminant or hyperplane. Besides the simple linear discriminants, one can also consider more complex decision functions. For instance, Figure 6.2 illustrates an example of two-dimensional classes separated by a decision boundary obtained with a quadratic decision function:
$d(\mathbf{x}) = w_5 x_1^2 + w_4 x_2^2 + w_3 x_1 x_2 + w_2 x_2 + w_1 x_1 + w_0 .$   (6.3)
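To make the contrast with Equation 6.2 explicit, the following sketch evaluates the quadratic decision function of Equation 6.3 for a two-dimensional feature vector; the six coefficients are hypothetical and would, in practice, have to be estimated from the data.

```python
# Sketch of the quadratic decision function of Equation 6.3.
# The coefficients (w0, ..., w5) are hypothetical illustration values.
def d_quadratic(x1, x2, w):
    w0, w1, w2, w3, w4, w5 = w
    return (w5 * x1 ** 2 + w4 * x2 ** 2 + w3 * x1 * x2
            + w2 * x2 + w1 * x1 + w0)

w = (-1.0, 0.2, 0.1, 0.0, 0.5, 0.5)
d = d_quadratic(1.0, 1.5, w)          # 0.975
print("omega_1" if d > 0 else "omega_2")
```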
Linear decision functions are quite popular, as they are easier to compute and their statistical analysis is simpler. For this reason, in what follows we will deal only with linear discriminants.
¹ The dot product x’y is obtained by adding the products of corresponding elements of the two vectors x and y.
Figure 6.2. Decision regions and boundary for a quadratic decision function.

6.2 Linear Discriminants