SCENE MATCHING AND DETECTION

9.12 SCENE MATCHING AND DETECTION

A problem of much significance in image analysis is the detection of a change in a scene or of the presence of an object in it. Such problems occur in remote sensing for monitoring growth patterns of urban areas, weather prediction from satellite images, diagnosis of disease from medical images, target detection from radar images, automation using robot vision, and the like. Change detection is also useful in alignment or spatial registration of two scenes imaged at different instants or using different sensors. For example, a large object photographed in small overlapping sections can be reconstructed by matching the overlapping parts.

Image Subtraction

Changes in a dynamic scene observed as $u_i(m, n)$, $i = 1, 2, \ldots$, are given by

$$e_i(m, n) = u_i(m, n) - u_{i-1}(m, n) \tag{9.121}$$

Although elementary, this image-subtraction technique is quite powerful in carefully controlled imaging situations. Figure 9.44 shows an example from digital radiology. The images $u_1$ and $u_2$ represent, respectively, the X-ray images before and after injection of a radio-opaque dye in a renal study. The change, not visible in $u_2$, can be easily detected as renal arteries after $u_1$ has been subtracted out. Image subtraction is also useful in motion-detection-based security monitoring systems, segmentation of parts from a complex assembly, and so on.

Template Matching and Area Correlation

The presence of a known object in a scene can be detected by searching for the location of match between the object template u(m, n) and the scene v(m, n). Template matching can be conducted by searching for the displacement of u(m, n) at which the mismatch energy is minimum. For a displacement (p, q), we define the mismatch energy

$$\begin{aligned}
\sigma_\varepsilon^2(p, q) &\triangleq \sum_m \sum_n \left[v(m, n) - u(m - p, n - q)\right]^2 \\
&= \sum_m \sum_n |v(m, n)|^2 + \sum_m \sum_n |u(m, n)|^2 - 2 \sum_m \sum_n v(m, n)\, u(m - p, n - q)
\end{aligned} \tag{9.122}$$

400 Image Analysis and Computer Vision Chap. 9

[Figure 9.44 Change detection in digital radiography. (c) Difference.]

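Since the first two terms of (9.122) do not depend on (p, q), for $\sigma_\varepsilon^2(p, q)$ to achieve a minimum it is sufficient to maximize the cross-correlation

$$c_{vu}(p, q) \triangleq \sum_m \sum_n v(m, n)\, u(m - p, n - q), \qquad \forall (p, q) \tag{9.123}$$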
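That minimizing the mismatch energy (9.122) and maximizing the cross-correlation (9.123) select the same displacement can be checked numerically. The NumPy sketch below is illustrative only: it assumes circular (wrap-around) displacements through `np.roll`, so that every sum runs over the same full array and the first two terms of (9.122) are exactly constant in (p, q). The helper names (`shifted`, `mismatch_energy`, `cross_correlation`, `best_displacement`) are hypothetical.

```python
import numpy as np

def shifted(u, p, q):
    """Circularly displaced template: result[m, n] = u[(m - p) % M, (n - q) % N]."""
    return np.roll(u, shift=(p, q), axis=(0, 1))

def mismatch_energy(v, u, p, q):
    """sigma_eps^2(p, q) of (9.122), with wrap-around indexing."""
    return float(np.sum((v - shifted(u, p, q)) ** 2))

def cross_correlation(v, u, p, q):
    """c_vu(p, q) of (9.123), with the same wrap-around indexing."""
    return float(np.sum(v * shifted(u, p, q)))

def best_displacement(v, u, score, maximize):
    """Evaluate score(v, u, p, q) at every displacement and return the best (p, q)."""
    M, N = v.shape
    grid = [(p, q) for p in range(M) for q in range(N)]
    values = [score(v, u, p, q) for p, q in grid]
    k = int(np.argmax(values)) if maximize else int(np.argmin(values))
    return grid[k]

# A 4 x 4 random template, zero-padded into a 16 x 16 array, appears displaced by (5, 9).
rng = np.random.default_rng(0)
u = np.zeros((16, 16))
u[:4, :4] = rng.standard_normal((4, 4))
v = shifted(u, 5, 9)
print(best_displacement(v, u, mismatch_energy, maximize=False))   # (5, 9)
print(best_displacement(v, u, cross_correlation, maximize=True))  # (5, 9)
```

With wrap-around indexing the expansion in (9.122) holds exactly, so the two searches can only disagree through floating-point ties.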

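From the Cauchy-Schwarz inequality, we have

$$|c_{vu}(p, q)| = \left|\sum_m \sum_n v(m, n)\, u(m - p, n - q)\right| \le \left[\sum_m \sum_n |v(m, n)|^2\right]^{1/2} \left[\sum_m \sum_n |u(m, n)|^2\right]^{1/2} \tag{9.124}$$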


where equality occurs if and only if $v(m, n) = \alpha\, u(m - p, n - q)$, where α is an arbitrary constant and can be set equal to 1. This means the cross-correlation $c_{vu}(p, q)$ attains its maximum value when the displaced position of the template coincides with the observed image. Then we obtain

$$c_{vu}(p, q) = \sum_m \sum_n |v(m, n)|^2 > 0 \tag{9.125}$$

and the desired maximum occurs when the observed image and the template are spatially registered. Therefore, a given object u(m, n) can be located in the scene by searching for the peaks of the cross-correlation function (Fig. 9.45).

[Figure 9.45 Template matching by area correlation. Labels: u(m - p, n - q); location(s).]

Often the given template and the observed image are not only spatially translated but also relatively scaled and rotated. For example,

$$v(m, n) = \alpha\, u\!\left(\frac{m - p'}{\gamma_1}, \frac{n - q'}{\gamma_2}; \theta\right) \tag{9.126}$$

where $\gamma_1$ and $\gamma_2$ are the scale factors, (p', q') are the displacement coordinates, and θ is the rotation angle of the observed image with respect to the template. In such cases the cross-correlation function maxima have to be searched in the parameter space $(p', q', \gamma_1, \gamma_2, \theta)$. This can become quite impractical unless reasonable estimates of $\gamma_1$, $\gamma_2$, and θ are given.

The cross-correlation $c_{vu}(p, q)$ is also called the area correlation. It can be evaluated either directly or as the inverse Fourier transform of

$$C_{vu}(\omega_1, \omega_2) \triangleq \mathcal{F}\{c_{vu}(p, q)\} = V(\omega_1, \omega_2)\, U^*(\omega_1, \omega_2) \tag{9.127}$$

The direct computation of the area correlation is useful when the template is small. Otherwise, a suitable-size FFT is employed to perform the Fourier transform calculations. Template matching is particularly efficient when the data is binary. In that case, it is sufficient to search for the minima of the total binary difference

$$\gamma_{vu}(p, q) \triangleq \sum_m \sum_n \left[v(m, n) \oplus u(m - p, n - q)\right] \tag{9.128}$$

which requires only simple logical exclusive-OR operations. The quantity $\gamma_{vu}(p, q)$ gives the number of pixels in the image that do not match the template at location (p, q). This algorithm is useful in the recognition of printed characters or of objects characterized by known boundaries, as in the inspection of printed circuit boards.
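Both ways of evaluating a match can be sketched briefly: the area correlation through the transform route of (9.127), checked against direct evaluation, and the binary difference (9.128) through exclusive-OR. This is a minimal NumPy illustration under assumptions the text does not state: the FFT route computes circular (wrap-around) correlation with the template zero-padded to the image size, and the binary count is taken only over the template footprint at placements lying entirely inside the image. All function names are hypothetical.

```python
import numpy as np

def area_correlation_fft(v, u):
    """Circular c_vu(p, q) = sum v(m, n) u(m - p, n - q) as the inverse DFT of V U*, cf. (9.127).

    The template u is zero-padded to the size of v.
    """
    M, N = v.shape
    V = np.fft.fft2(v)
    U = np.fft.fft2(u, s=(M, N))
    return np.real(np.fft.ifft2(V * np.conj(U)))

def area_correlation_direct(v, u):
    """The same circular correlation evaluated directly, for comparison."""
    M, N = v.shape
    up = np.zeros((M, N))
    up[:u.shape[0], :u.shape[1]] = u
    c = np.empty((M, N))
    for p in range(M):
        for q in range(N):
            c[p, q] = np.sum(v * np.roll(up, shift=(p, q), axis=(0, 1)))
    return c

def binary_mismatch(v, u):
    """gamma_vu(p, q) of (9.128) for every placement of u fully inside v, counted with XOR."""
    v, u = v.astype(bool), u.astype(bool)
    Mu, Nu = u.shape
    gamma = np.empty((v.shape[0] - Mu + 1, v.shape[1] - Nu + 1), dtype=int)
    for p in range(gamma.shape[0]):
        for q in range(gamma.shape[1]):
            gamma[p, q] = np.count_nonzero(np.logical_xor(v[p:p + Mu, q:q + Nu], u))
    return gamma

def peak(a, use_max=True):
    """(row, col) of the maximum (or minimum) of a 2-D array."""
    k = np.argmax(a) if use_max else np.argmin(a)
    return tuple(int(x) for x in np.unravel_index(k, a.shape))

rng = np.random.default_rng(1)
u = rng.random((4, 5))
v = np.zeros((20, 20))
v[6:10, 3:8] = u                                                       # template placed at (6, 3)
print(np.allclose(area_correlation_fft(v, u), area_correlation_direct(v, u)))  # True
print(peak(area_correlation_fft(v, u)))                                # (6, 3)

L = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]])                       # a small binary "character"
b = np.zeros((10, 12), dtype=int)
b[4:7, 6:9] = L
print(peak(binary_mismatch(b, L), use_max=False))                      # (4, 6)
```

Linear (non-wrapping) correlation can be obtained from the same FFT route by zero-padding both arrays to at least the sum of their sizes minus one.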


Matched Filtering [56-57]

Suppose a deterministic object u(m, n), displaced by $(m_0, n_0)$, is observed in the presence of a surround (for example, other objects) that is modeled as a colored noise field η(m, n) with power spectral density $S_\eta(\omega_1, \omega_2)$. The observations are

$$v(m, n) = u(m - m_0, n - n_0) + \eta(m, n) \tag{9.129}$$

The matched filtering problem is to find a linear filter g(m, n) that maximizes the output signal-to-noise ratio (SNR)

$$\text{SNR} \triangleq \frac{|s(0, 0)|^2}{E\left[|g(m, n) \circledast \eta(m, n)|^2\right]}, \qquad s(m, n) \triangleq g(m, n) \circledast u(m - m_0, n - n_0) \tag{9.130}$$

Here s(m, n) represents the signal content in the filtered output $g(m, n) \circledast v(m, n)$.

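Following Problem 9.16, the matched filter frequency response is found to be

$$G(\omega_1, \omega_2) = \frac{U^*(\omega_1, \omega_2)\, \exp[\,j(\omega_1 m_0 + \omega_2 n_0)\,]}{S_\eta(\omega_1, \omega_2)} \tag{9.131}$$

which gives its impulse response as

$$g(m, n) = r_\eta^-(m, n) \circledast u(-m - m_0, -n - n_0) \tag{9.132}$$

where

$$r_\eta^-(m, n) \triangleq \mathcal{F}^{-1}\left[\frac{1}{S_\eta(\omega_1, \omega_2)}\right] \tag{9.133}$$

Defining

$$\tilde{v}(m, n) \triangleq v(m, n) \circledast r_\eta^-(m, n)$$

the matched filter output can be written as

$$g(m, n) \circledast v(m, n) = u(-m - m_0, -n - n_0) \circledast \tilde{v}(m, n) = \sum_i \sum_j \tilde{v}(i, j)\, u(i - m - m_0, j - n - n_0) \tag{9.134}$$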

which, according to (9.123), is $c_{\tilde{v}u}(m + m_0, n + n_0)$, the area correlation of $\tilde{v}(m, n)$ with $u(m + m_0, n + n_0)$. If $(m_0, n_0)$ were known, then the SNR would be maximized at (m, n) = (0, 0), as desired in (9.130) (show!). In practice these displacement values are unknown. Therefore, we compute the correlation $c_{\tilde{v}u}(m, n)$ and search for the location of maxima that gives $(m_0, n_0)$. Thus the matched filter can be implemented as an area correlator with a preprocessing filter (Fig. 9.46a). Recall from Section 6.7 (Eq. (6.91)) that $r_\eta^-(m, n)$ would be proportional to the impulse response of the minimum variance noncausal prediction error filter for a random field with power spectral density $S_\eta(\omega_1, \omega_2)$. For highly correlated random fields (for instance, the usual monochrome images), $r_\eta^-(m, n)$ represents a high-pass filter. For example, if the background has an object-like power spectrum [see Section 2.11]


$$S_\eta(\omega_1, \omega_2) = \frac{\beta^2}{(1 - 2\alpha\cos\omega_1)(1 - 2\alpha\cos\omega_2)} \tag{9.135}$$

then $r_\eta^-(m, n)$ is a high-pass filter whose impulse response is shown in Fig. 9.46b (see Example 6.10). This suggests that template matching is more effective if the edges (and other high frequencies) are matched whenever a given object has to be detected in the presence of a correlated background.

[Figure 9.46 Matched filtering in the presence of colored noise. (a) Overall filter: v(m, n) → minimum variance noncausal prediction error filter $r_\eta^-(m, n)$ → u(−m, −n). (b) Example of $r_\eta^-(m, n)$: a 3 × 3 pattern with 1 at the center, −α at the four nearest neighbors, and α² at the corners. For the white noise case, $r_\eta^-(m, n) = \delta(m, n)$.]

If η(m, n) is white noise, then $S_\eta$ is constant and $r_\eta^-(m, n) = \delta(m, n)$ up to a scale factor. The matched filter then reduces to the area correlator of Fig. 9.45.
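A minimal sketch of the structure in Fig. 9.46a follows: prewhiten v(m, n) with a filter proportional to $r_\eta^-(m, n)$, then apply the area correlator. It assumes the separable spectrum (9.135), for which the filter is proportional to the outer product of [−α, 1, −α] with itself (the dropped scale factor does not move the peak), and circular boundary handling. The value α = 0.45, the synthetic scene, and all function names are illustrative assumptions.

```python
import numpy as np

def prewhitening_kernel(alpha):
    """3 x 3 kernel proportional to r_eta^-(m, n) for the separable spectrum (9.135)."""
    h = np.array([-alpha, 1.0, -alpha])   # 1-D taps whose DTFT is 1 - 2 alpha cos(w)
    return np.outer(h, h)

def circular_filter(x, kernel):
    """Circular 2-D convolution of x with a 3 x 3 kernel whose centre tap is kernel[1, 1]."""
    y = np.zeros(x.shape)
    for a in range(3):
        for b in range(3):
            # np.roll(x, (da, db))[m, n] = x[m - da, n - db]
            y += kernel[a, b] * np.roll(x, shift=(a - 1, b - 1), axis=(0, 1))
    return y

def locate(v, u, alpha=None):
    """Peak of the circular area correlation of v with u.

    With alpha given, v is prewhitened first (matched filter up to a constant factor);
    with alpha=None this is the plain area correlator.
    """
    if alpha is not None:
        v = circular_filter(v, prewhitening_kernel(alpha))
    M, N = v.shape
    c = np.real(np.fft.ifft2(np.fft.fft2(v) * np.conj(np.fft.fft2(u, s=(M, N)))))
    p, q = np.unravel_index(np.argmax(c), c.shape)
    return int(p), int(q)

# A 6 x 6 checkerboard-textured object at (4, 20) on a smooth, bright background.
i, j = np.indices((6, 6))
u = 2.0 * ((i + j) % 2 == 0)
rows = np.arange(32).reshape(-1, 1)
v = np.tile(10.0 * (1 + np.cos(2 * np.pi * (rows - 20) / 32)), (1, 32))
v[4:10, 20:26] += u
print(locate(v, u))              # plain correlator: pulled toward the bright rows
print(locate(v, u, alpha=0.45))  # prewhitened: (4, 20)
```

On this scene the plain correlator is drawn to the bright, slowly varying part of the background, while the prewhitened version recovers the object, which is the edge-matching effect described above.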

Direct Search Methods [58-59]

Direct methods of searching for an object in a scene are useful when the template size is small compared to the region of search. We discuss some efficient direct search techniques next.

Two-dimensional logarithmic search. This method reduces the search iterations to about log n for an n × n area. Consider the mean distortion function

$$D(i, j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} f\big(v(m, n) - u(m + i, n + j)\big), \qquad -p \le i, j \le p \tag{9.136}$$

where f(x) is a given nonnegative function that increases with |x|, u(m, n) is an M × N template, and v(m, n) is the observed image. The template match is restricted to a preselected [−p, p] × [−p, p] region. Some useful choices for f(x) are |x| and x².
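Equation (9.136) and the exhaustive search over all $(2p+1)^2$ displacements can be sketched as follows. This is an illustrative NumPy version, not the book's code: it assumes 0-based indices and stores u in an array padded by p on every side, so that entry [m + p, n + p] holds u(m, n) and every displaced index stays valid. The names `mean_distortion`, `exhaustive_dmd`, and `u_pad` are hypothetical.

```python
import numpy as np

def mean_distortion(v_block, u_pad, i, j, p, f=np.abs):
    """D(i, j) of (9.136) for an M x N block v and displacement (i, j), |i|, |j| <= p.

    Assumption: u_pad[m + p, n + p] holds u(m, n) (0-based), so u(m + i, n + j)
    over the whole block is the slice starting at row p + i, column p + j.
    """
    M, N = v_block.shape
    window = u_pad[p + i:p + i + M, p + j:p + j + N]
    return float(np.mean(f(v_block - window)))

def exhaustive_dmd(v_block, u_pad, p, f=np.abs):
    """Evaluate D(i, j) at all (2p + 1)^2 displacements and return the minimizing (i, j)."""
    candidates = [(i, j) for i in range(-p, p + 1) for j in range(-p, p + 1)]
    return min(candidates, key=lambda d: mean_distortion(v_block, u_pad, d[0], d[1], p, f))

rng = np.random.default_rng(2)
p = 5
u_pad = rng.random((8 + 2 * p, 8 + 2 * p))
v_block = u_pad[p - 3:p - 3 + 8, p + 1:p + 1 + 8]           # true displacement (-3, 1)
print(exhaustive_dmd(v_block, u_pad, p))                    # (-3, 1)
print(exhaustive_dmd(v_block, u_pad, p, f=np.square))       # (-3, 1)
```

For p = 5 this evaluates D at 121 displacements, the cost that the faster searches try to avoid.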

We define the direction of minimum distortion (DMD) as the direction vector (i, j) that minimizes D(i, j). Template match occurs when the DMD has been found within the search region. Exhaustive search for the DMD would require evaluation of D(i, j) for $(2p + 1)^2$ directions. If D(i, j) increases monotonically as we move away from the DMD along any direction, then the search can be sped up by successively reducing the area of search. Figure 9.47a illustrates the procedure for p = 5. The algorithm consists of searching five locations (marked ○), which contain the center of the search area and the midpoints between the center and the four boundaries of the area. The locations searched at the initial step are marked 1. The optimum direction (circled numbers) gives the location of the new center for the next step. This procedure continues until the plane of search reduces to a 3 × 3 size. In the final step all nine locations are searched, and the location corresponding to the minimum gives the DMD.

[Figure 9.47 Two-dimensional logarithmic search. (a) The algorithm: a 2-D logarithmic search procedure for the direction of minimum distortion. The figure shows the concept of the 2-D logarithmic search to find a pixel in another frame that is registered with respect to the pixel (i, j) of a given frame, such that the mean square error over a block defined around (i, j) is minimized. The search is done step by step, with ○ indicating the directions searched at the step number marked. The circled numbers show the optimum directions for that search step, and • shows the final optimum direction, (i − 3, j + 1), in this example. This procedure requires searching only 13 to 21 locations for the given grid, as opposed to 121 total possibilities. (b) Example. Courtesy Stuart Wells, Heriot-Watt Univ., U.K.]

Algorithm [58]. For any integer m > 0, define

$$\mathcal{N}(m) = \{(i, j);\ -m \le i, j \le m\}, \qquad \mathcal{A}(m) = \{(0, 0), (m, 0), (0, m), (-m, 0), (0, -m)\}$$

Step 1 (Initialization): $D(i, j) = \infty$ for $(i, j) \notin \mathcal{N}(p)$; $n' = \operatorname{Integer}[\log_2 p]$; $n = \max\{2, 2^{n'-1}\}$; $q = l = 0$ (or an initial guess for the DMD).

Step 2: $\mathcal{A}'(n) \leftarrow \mathcal{A}(n)$.

Step 3: Find $(i, j) \in \mathcal{A}'(n)$ such that $D(i + q, j + l)$ is minimum. If $i = 0$ and $j = 0$, go to Step 5; otherwise go to Step 4.

Step 4: $q \leftarrow q + i$, $l \leftarrow l + j$; $\mathcal{A}'(n) \leftarrow \mathcal{A}'(n) - \{(-i, -j)\}$; go to Step 3.

Step 5: $n \leftarrow n/2$. If $n = 1$, go to Step 6; otherwise go to Step 2.

Step 6: Find $(i, j) \in \mathcal{N}(1)$ such that $D(i + q, j + l)$ is minimum. Set $q \leftarrow q + i$, $l \leftarrow l + j$; then $(q, l)$ gives the DMD.

If the direction of minimum distortion lies outside $\mathcal{N}(p)$, the algorithm converges to a point on the boundary that is closest to the DMD. This algorithm has been found useful for estimating planar motion of objects by measuring displacements of local regions from one frame to another. Figure 9.47b shows the motion vectors detected in an underwater scene involving a diver and turbulent water flow.
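Steps 1 to 6 can be sketched in Python as follows. The sketch takes D as an arbitrary callable (for example, a block distortion such as (9.136) or a closed-form test function), treats directions outside the search region as D = ∞ as in Step 1, and caches evaluations so the number of distinct directions examined can be compared with $(2p+1)^2$. Two details are assumptions rather than part of the stated algorithm: ties in Steps 3 and 6 are resolved in favor of the current center, and n ← n/2 uses integer halving. The name `log_search` is hypothetical.

```python
import math

def log_search(D, p, start=(0, 0)):
    """2-D logarithmic search for the DMD (Steps 1-6). Returns ((q, l), evaluations)."""
    cache = {}

    def Dc(i, j):
        if max(abs(i), abs(j)) > p:              # Step 1: D = infinity outside N(p)
            return math.inf
        if (i, j) not in cache:
            cache[(i, j)] = D(i, j)
        return cache[(i, j)]

    q, l = start
    n = max(2, 2 ** (int(math.log2(p)) - 1))     # Step 1: n' = Integer[log2 p]
    while n > 1:
        A = [(0, 0), (n, 0), (0, n), (-n, 0), (0, -n)]             # Step 2 (centre first)
        while True:
            i, j = min(A, key=lambda d: Dc(q + d[0], l + d[1]))    # Step 3
            if (i, j) == (0, 0):
                break
            q, l = q + i, l + j                                     # Step 4: move the centre
            A = [d for d in A if d != (-i, -j)]                     # drop the way back
        n //= 2                                                     # Step 5
    N1 = [(0, 0)] + [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1) if (a, b) != (0, 0)]
    i, j = min(N1, key=lambda d: Dc(q + d[0], l + d[1]))            # Step 6
    return (q + i, l + j), len(cache)

def bowl(i, j):
    return (i + 3) ** 2 + (j - 1) ** 2           # minimum at (-3, 1), as in Fig. 9.47a

print(log_search(bowl, p=5))                                      # ((-3, 1), 16)
print(log_search(lambda i, j: (i - 7) ** 2 + j ** 2, p=5)[0])     # (5, 0): boundary point nearest (7, 0)
```

For the bowl-shaped test function the search examines 16 distinct directions instead of 121; the second call illustrates convergence to the nearest boundary point when the true minimum lies outside the search region.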

[Figure 9.48 Hierarchical search. Shaded area shows the region where match occurs. Dotted lines show regions searched. Labels: images v(m, n), v1(m, n), v2(m, n), v3(m, n); templates u(m, n), u1(m, n), u2(m, n), u3(m, n).]


Sequential search. Another way of speeding up search is to compute the cumulative error

$$e_{p,q}(i, j) \triangleq \sum_{m=1}^{p} \sum_{n=1}^{q} |v(m, n) - u(m + i, n + j)|, \qquad p \le M,\ q \le N \tag{9.137}$$

and terminate the search at (i, j) if $e_{p,q}(i, j)$ exceeds some predetermined threshold. The search may then be continued only in those directions where $e_{p,q}(i, j)$ is below a threshold.
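The early-termination idea can be sketched as follows. It is one possible reading of (9.137), not a prescribed implementation: the cumulative error is grown one full row at a time (that is, $e_{p,N}$ for p = 1, ..., M), a direction is abandoned once the running error exceeds a fixed threshold, and among the surviving directions the one with the smallest total error is kept. The padded layout of u (entry [m + radius, n + radius] holds u(m, n)), the parameter `radius` (playing the role of p in (9.136)), the threshold value, and the function name are assumptions.

```python
import numpy as np

def sequential_search(v_block, u_pad, radius, threshold):
    """Search |i|, |j| <= radius, abandoning a direction once its cumulative error
    exceeds threshold. Returns (best direction or None, pixel differences computed)."""
    M, N = v_block.shape
    best, best_err, work = None, np.inf, 0
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            window = u_pad[radius + i:radius + i + M, radius + j:radius + j + N]
            err = 0.0
            for m in range(M):                  # add one row: e_{m+1, N}(i, j)
                err += float(np.sum(np.abs(v_block[m] - window[m])))
                work += N
                if err > threshold:             # terminate the search at (i, j)
                    break
            else:                               # survived all M rows
                if err < best_err:
                    best, best_err = (i, j), err
    return best, work

rng = np.random.default_rng(3)
radius = 5
u_pad = rng.random((8 + 2 * radius, 8 + 2 * radius))
v_block = u_pad[radius + 2:radius + 2 + 8, radius - 4:radius - 4 + 8]   # true direction (2, -4)
best, work = sequential_search(v_block, u_pad, radius, threshold=10.0)
print(best, work, (2 * radius + 1) ** 2 * 64)   # best is (2, -4); work is below the exhaustive count
```

The threshold trades speed for safety: too small a value can discard the true direction when the match is noisy, too large a value removes the savings.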

Another possibility is to search in the i direction until a minimum is found and then switch the search to the j direction. This search in alternating conjugate directions is continued until the location of the minimum remains unchanged.
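A minimal sketch of the alternating search follows. It assumes a callable D(i, j) restricted to |i|, |j| ≤ p (with p ≥ 1) and a unit-step line search that keeps moving while D strictly decreases, which is one simple way to "search until a minimum is found"; the text does not fix the step rule. The names `line_min` and `alternating_search` and the test function are illustrative.

```python
def line_min(g, x, lo, hi):
    """Step x by +/-1 within [lo, hi] while g strictly decreases; return the local minimum."""
    while True:
        y = min((c for c in (x - 1, x + 1) if lo <= c <= hi), key=g)
        if g(y) < g(x):
            x = y
        else:
            return x

def alternating_search(D, p, start=(0, 0)):
    """Minimize along i (j fixed), then along j (i fixed), until (i, j) stops changing."""
    i, j = start
    while True:
        old = (i, j)
        i = line_min(lambda a: D(a, j), i, -p, p)
        j = line_min(lambda b: D(i, b), j, -p, p)
        if (i, j) == old:
            return i, j

def coupled(i, j):
    return (i + 3) ** 2 + (j - 1) ** 2 + 0.5 * (i + 3) * (j - 1)   # minimum at (-3, 1)

print(alternating_search(coupled, 5))   # (-3, 1)
```

Because each coordinate is searched separately, this method can stall at a point that is not the minimum when D couples i and j strongly.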

Hierarchical search. If the observed image is very large, we may first search a reduced-resolution copy using a likewise reduced copy of the template. If multiple matches occur (Fig. 9.48), then the regions represented by these locations are searched using higher-resolution copies to further refine and reduce the search area. Thus the full-resolution region searched can be a small fraction of the total area. This method of coarse-to-fine search is also logarithmically efficient.
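The coarse-to-fine idea can be sketched as below. It is an illustration under stated assumptions, not the book's procedure: resolution is halved by averaging 2 × 2 blocks, matching uses the sum of squared differences over the template footprint (a local form of the mismatch energy (9.122)), the `keep` best candidates at each level stand in for the multiple matches of Fig. 9.48, and each candidate is refined over a small neighborhood at the next finer level. All names and parameter values are hypothetical.

```python
import numpy as np

def reduce2(x):
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks (odd edges cropped)."""
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def ssd_scores(v, u, rows, cols):
    """Sum of squared differences for placements (r, c) with u fully inside v."""
    Mu, Nu = u.shape
    return {(r, c): float(np.sum((v[r:r + Mu, c:c + Nu] - u) ** 2))
            for r in rows for c in cols
            if 0 <= r <= v.shape[0] - Mu and 0 <= c <= v.shape[1] - Nu}

def hierarchical_search(v, u, levels=2, keep=3):
    """Coarse-to-fine search; returns the best (row, col) of u in v at full resolution."""
    pv, pu = [v], [u]
    for _ in range(levels):                        # build the reduced copies
        pv.append(reduce2(pv[-1]))
        pu.append(reduce2(pu[-1]))
    scores = ssd_scores(pv[-1], pu[-1], range(pv[-1].shape[0]), range(pv[-1].shape[1]))
    cands = sorted(scores, key=scores.get)[:keep]   # exhaustive only at the coarsest level
    for lev in range(levels - 1, -1, -1):           # refine around each candidate
        scores = {}
        for r, c in cands:
            scores.update(ssd_scores(pv[lev], pu[lev],
                                     range(2 * r - 1, 2 * r + 3), range(2 * c - 1, 2 * c + 3)))
        cands = sorted(scores, key=scores.get)[:keep]
    return cands[0]

rng = np.random.default_rng(4)
v = rng.random((64, 64))
u = v[20:36, 8:24].copy()          # 16 x 16 template cut from the image at (20, 8)
print(hierarchical_search(v, u))   # (20, 8)
```

Keeping several candidates per level guards against the coarse level ranking the true match below a false one, which can happen for textures with little low-frequency content.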