Averaging Line Images

Tristan Lewis†
Robyn Owens†
Adrian Baddeley‡

† Department of Computer Science, The University of Western Australia
‡ Department of Mathematics, The University of Western Australia
Abstract

We describe a method for combining several images of line features to obtain an 'average' line image. Applications are shown to radar maps of aircraft flight paths, and to line sketches of human faces. The technique is an adaptation of the 'distance average' of random sets introduced by Baddeley and Molchanov. It involves computing the distance transform of each input image, calculating the pixelwise mean (or median or trimmed mean) of the distance values, and applying a threshold or ridge-tracing operation to obtain a binary image.
Keywords: Average face; average flight path; distance average; distance transform; image combination; median; random sets; Vorob'ev mean.

CR Classification: G.3 [Probability and Statistics]: Statistical Computing. I.4.m [Image Processing]: Miscellaneous.

AMS Subject Classification (1991 revision): 68U10, 60D05, 60-07.
1 Introduction

It is often desired to combine multiple images into a single 'average' image. The purpose may be to fuse several different images of a single scene into one 'best' image of that scene, or to collate images of many similar but different objects into an image of the 'typical' object. Applications of the first kind are common in remote sensing and video image enhancement. Applications of the second kind include the characterisation of particle size and shape in the microscopic analysis of sand and powders [15] and diverse studies of biological shape [2, 7].

This paper investigates methods for averaging a set of line images. We apply the techniques to line sketches of human faces and to radar images of aircraft flight paths. A line image is defined here as a binary image consisting of one or more curvilinear features. We do not impose any constraints on the straightness, connectivity or thickness of the line features. Line images are ubiquitous in image analysis, particularly in object recognition and document processing, and as the output of edge detection algorithms. It is therefore important to have effective techniques for computing an average of several line images.

There is scant literature on methods for image averaging. In practice it seems that when a need for image averaging arises, it is generally performed using ad hoc methods, and the output is adjusted until it is acceptable. From an industrial or scientific viewpoint it would be preferable to have a systematic solution which is widely applicable and whose potential weaknesses are well understood.

When the data are greyscale images, it is conventional to compute the ordinary pixelwise average of the data images, possibly after standardising the images by warping and equalisation. However, when the data are binary images, it is not sensible to take the pixelwise average: the resulting average image is no longer binary, and may not retain essential features of the input images.

An ideal averaging procedure would preserve the key geometric and topological properties of the input images, while suppressing unwanted 'noise' or variability in the input. Properties that should be preserved include area, length, thickness, position, orientation, and connectivity of the components within the image. In the special case of line images, the output of the averaging procedure should ideally be a line image too. Examples of unwanted noise or variability include 'false positive' and 'false negative' pixels (individual pixels erroneously classified as object instead of background, and vice versa) and misregistration or position errors (features or pixels erroneously shifted or rotated). It is desirable that image averaging procedures be insensitive to misregistration of the input images, since there are many applications (particularly in remote sensing) where perfect registration cannot be attained.

This highly nonlinear problem can be conveniently formulated in the language of random set theory, as several writers have shown.
A binary image can be regarded as a subset $X$ of two-dimensional space (discrete or continuous) by treating the 'foreground' pixels as constituting the subset. Our task is then to compute an average of several subsets $X_1, \ldots, X_n$, which may be regarded as $n$ different outcomes of the same random set $X$.
Averaging methods which have been developed in the literature of random sets include the Frechet mean, Aumann mean, radius vector mean, Vorob'ev mean, and distance mean [1, 7, 15, 16, 18]. Of these, only the distance mean [1, 7] is appropriate for averaging line images. The Frechet mean is difficult to implement, since it is defined only as the solution of a certain optimization problem. The Aumann mean and radius vector mean are only appropriate to convex features and star-shaped features, respectively. The Vorob'ev mean [16, 18] is obtained by simply thresholding the pixelwise average of the binary input images. While the Vorob'ev mean is applicable to all binary images and often performs well, in the case of line images it is extremely sensitive to displacement errors and effectively unusable.

In this paper we compute averages of line images using the distance mean [1, 7] and develop some extensions and modifications of the technique. The distance average is defined as a threshold of the pixelwise average of the distance transforms of the input images. Our modifications include replacing the pixelwise average by the pixelwise median, and replacing the threshold operation by ridge tracing.

The plan of the paper is as follows. Section 2 summarises relevant definitions and background. Section 3 describes the distance average technique of Baddeley & Molchanov [1] that is the basis of our work. Our modifications and extensions of this technique are introduced in section 4. The new techniques are tried on simple synthetic images in section 5. Applications to real data are shown in sections 6 and 7, which deal with aircraft flight paths and human faces, respectively. Finally, section 8 presents some conclusions.
2 Background

2.1 Mathematical formulation

We assume that all images are defined on the same pixel raster $S$; usually $S$ is a subset of a square or hexagonal grid. An image is a function $f : S \to \mathbb{R}$ assigning to each pixel $s \in S$ a numerical value $f(s)$. A binary image is one which takes only the values 0 and 1. We shall interpret pixels with the value 1 as 'foreground', 'object' or 'feature' pixels, and pixels with value 0 as 'background'. If $b$ is a binary image then the set of pixels with value 1,

$$B = \{ s \in S : b(s) = 1 \},$$

will be called the 'foreground' of $b$. In this way, binary images $b$ can be identified with subsets $B$ of the pixel raster $S$, and we do this implicitly without further comment.
2.2 Pixelwise mean and Vorob'ev mean

If $f_1, f_2, \ldots, f_n$ are images, their pixelwise mean is the image $\bar f$ defined by

$$\bar f(s) = \frac{1}{n} \sum_{i=1}^{n} f_i(s), \qquad (1)$$

that is, the value of $\bar f$ at pixel $s$ is the arithmetic mean of the values $f_1(s), \ldots, f_n(s)$ of the $n$ input images at the same pixel. Note that if $b_1, \ldots, b_n$ are binary images, their pixelwise mean $\bar b$ gives, for each pixel $s$, the proportion of the $n$ images which cover $s$. This is often termed the coverage function; a coverage value of 0.5 means that in exactly half of the data images, the pixel in question is in the foreground.
The Vorob'ev mean of several binary images $b_1, \ldots, b_n$ at level $p$, where $0 \le p \le 1$, is the binary image obtained by thresholding the coverage function at level $p$:

$$\mathbb{E}_{V,p}\, b = \{ s \in S : \bar b(s) \ge p \}. \qquad (2)$$
In practical terms, the Vorob'ev mean is computed by obtaining the average pixel value at every pixel of the input images and then thresholding this at the value $p$, so that each pixel becomes either an object or a background pixel in the final combined image. With the threshold set at 50%, this technique is known as the Vorob'ev median. Automatic thresholding, that is, algorithm-driven selection of the threshold $p$ for the Vorob'ev mean, is desirable in some applications, for computational efficiency and to avoid subjective influences. Vorob'ev [18] proposed that $p$ be chosen so as to minimise the difference between the area of the Vorob'ev mean and the average area of the input sets $B_i$. The average area of the input sets equals $\sum_{s \in S} \bar b(s)$, the sum of the coverage function over the raster, while the area of $\mathbb{E}_{V,p}\, b$ is the number of pixels with $\bar b(s) \ge p$; the optimal $p$ can therefore be read directly from the histogram of the coverage function.
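As an illustration of equation (2) and of Vorob'ev's area-matching rule, the following is a minimal pure-Python sketch. It assumes binary images are stored as equal-sized lists of lists of 0/1 values; the function names `coverage`, `vorobev_mean` and `vorobev_auto_level` are hypothetical and chosen for this example only.

```python
def coverage(images):
    """Pixelwise mean (1) of binary images, i.e. the coverage function."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]


def vorobev_mean(images, p):
    """Vorob'ev mean (2): keep the pixels whose coverage is at least p."""
    return [[1 if v >= p else 0 for v in row] for row in coverage(images)]


def area(img):
    """Number of foreground pixels."""
    return sum(map(sum, img))


def vorobev_auto_level(images):
    """Choose p so that the area of the Vorob'ev mean best matches the mean input area.

    The coverage only takes the values k/n, so only the levels 1/n, ..., 1 need
    to be tried (level 0 would select the whole raster).
    """
    n = len(images)
    target = sum(area(img) for img in images) / n
    levels = [k / n for k in range(1, n + 1)]
    return min(levels, key=lambda p: abs(area(vorobev_mean(images, p)) - target))


# Three 1 x 5 images with areas 2, 3 and 1 (mean area 2).
imgs = [[[1, 1, 0, 0, 0]], [[1, 1, 1, 0, 0]], [[1, 0, 0, 0, 0]]]
p = vorobev_auto_level(imgs)
print(p, vorobev_mean(imgs, p))  # p == 2/3; the result is [[1, 1, 0, 0, 0]]
```

Levels $1/n$ and $1$ give the union and the intersection of the inputs respectively, which is the behaviour stated in Theorem 1 below.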
More generally, these concepts may be extended to (potentially) infinite populations or distributions of images. An infinite population of binary images may be formulated as a random subset $X$ of $S$ [9, 14]. For a random set $X$ define the coverage probability function $p_X : S \to \mathbb{R}$ by

$$p_X(s) = \mathbb{P}\{ s \in X \}, \qquad (3)$$

so $p_X(s)$ is the probability that $X$ contains $s$, in other words, the probability that $s$ is a foreground pixel of $X$. Then the Vorob'ev mean [18] of $X$ at level $p$ is

$$\mathbb{E}_{V,p}\, X = \{ s \in S : p_X(s) \ge p \}. \qquad (4)$$
The following is trivial to establish.
Theorem 1. Let $V_t$ denote the Vorob'ev mean of $n$ images $B_1, \ldots, B_n$ at threshold $t$. Then $s \le t$ implies $V_s \supseteq V_t$; $V_0 = S$ is the whole raster; $V_{1/n}$ is the union of the input images; $V_1$ is the intersection of the input images.
Thus we can view the Vorob'ev mean as an operator intermediate between intersection and union. Indeed, the collection of images $V_t$ for all $t \in [0, 1]$ is a complete lattice.
2.3 Sensitivity to displacement

A disadvantage of the Vorob'ev mean is its sensitivity to misregistration or displacement of thin features. Equation (1) averages the corresponding pixel values $f_i(s)$ across the images $f_i$. Slight misregistration of the pixel rasters in the different images, or slight displacements of a feature between one image and another, may have a large effect on the Vorob'ev mean. The effect is large if the features are thin in the direction of displacement. An extreme example is shown in Figure 1. The two input images each contain a horizontal line; in the second image (Figure 1 (b)) the line is shifted down by 2 pixels. The Vorob'ev median (that is, the Vorob'ev mean at threshold 50%) in Figure 1 (c) is blank. This extreme sensitivity makes the Vorob'ev mean unsuitable for applications to line images.
Figure 1: An example to show a disadvantage of the Vorob'ev mean for two images that are not exactly aligned. Image (b) is the same as image (a) except that the line has been shifted down two pixels. The Vorob'ev mean of (a) and (b) is the blank image (c).
For this research, efforts were made to ensure that all images were accurately registered before the averaging process.
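The failure mode can be reproduced in a few lines. The sketch below uses illustrative data of our own rather than the images of Figure 1: three copies of a horizontal line, each shifted by two rows, so that no pixel is covered by exactly half of the images and the 50% level is unambiguous. Images are lists of lists of 0/1 values; `hline` and `vorobev_median` are hypothetical helper names.

```python
def hline(row, size=7):
    """A size x size binary image containing one full-width horizontal line."""
    return [[1 if r == row else 0 for _ in range(size)] for r in range(size)]


def vorobev_median(images):
    """Vorob'ev mean at level 0.5: pixels covered by at least half of the images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[1 if sum(img[r][c] for img in images) / n >= 0.5 else 0
             for c in range(cols)] for r in range(rows)]


# Lines displaced by 2 pixels: every line pixel has coverage 1/3 < 0.5.
shifted = [hline(1), hline(3), hline(5)]
print(sum(map(sum, vorobev_median(shifted))))  # 0: the median is blank

# With perfect registration the line survives unchanged.
print(vorobev_median([hline(3)] * 3) == hline(3))  # True
```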
3 Distance average

Baddeley and Molchanov [1] introduced an alternative averaging method for binary images. Briefly, this consists in computing the distance transform of each input image, calculating the pixelwise average of the distance transforms, then thresholding this average distance transform at a suitable value. In this section we summarise the original distance average technique presented by Baddeley & Molchanov [1] and some of its properties. Section 3.1 summarises the well-known distance transform. Section 3.2 describes the distance average procedure. Sections 3.3 and 3.4 draw attention to some of the good and bad properties of this procedure.
3.1 Distance transform

The distance transform technique was originated by Rosenfeld & Pfaltz [12, 13] and extended by Borgefors [3, 4]. See also [5, 17]. Suppose that the distance between any two pixels $s, t$ in the raster $S$ is measured according to some metric $d(s, t)$. This may be the Euclidean distance between $s$ and $t$, the number of horizontal or vertical steps needed to move from $s$ to $t$, or some other measure. Given a subset $B$ of $S$, we may then define the 'distance' from a given pixel $s$ to the subset $B$ as the smallest distance from $s$ to any pixel $t$ belonging to $B$:

$$d(s, B) = \min \{ d(s, t) : t \in B \}.$$

The distance transform of a binary image $B$ is the image $f$ with pixel values

$$f(s) = d(s, B), \quad \text{for all } s \in S.$$

That is, the value of the distance transform at pixel $s$ is equal to $d(s, B)$, the shortest distance from $s$ to $B$. The distance transform of any binary image can be computed in just two passes over the image [3, 4, 13]. To further reduce computational effort it is usual to adopt a discrete, rational approximation to the exact Euclidean distance, so that the distance transform can be calculated in integer arithmetic. Borgefors [4] investigated several discrete approximations and cautioned that different approximations can lead to quite different results. She recommended two optimal choices, known as the chamfer (3,4) and chamfer (5,7,11) masks, which have maximum relative errors of 8.09% and 2.02% respectively. We adopt these approximations in the sequel.
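The two-pass computation can be sketched for the chamfer (3,4) mask as follows. This is a minimal illustration in plain Python, assuming images are lists of lists of 0/1 values; `chamfer_34` is a hypothetical name, and the chamfer (5,7,11) variant, which needs a 5 × 5 mask with knight-move weight 11, is not shown.

```python
INF = float("inf")


def chamfer_34(img):
    """Two-pass chamfer (3,4) distance transform of a binary image.

    Each output value approximates 3 x the Euclidean distance (in pixel steps)
    from that pixel to the nearest foreground (value 1) pixel: axial steps cost
    3 and diagonal steps cost 4. All pixels stay INF if there is no foreground.
    """
    rows, cols = len(img), len(img[0])
    d = [[0 if img[r][c] else INF for c in range(cols)] for r in range(rows)]
    # Forward pass (top-left to bottom-right): neighbours already visited.
    forward = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)]
    # Backward pass (bottom-right to top-left): the mirror-image neighbours.
    backward = [(1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)]
    passes = [
        (forward, range(rows), range(cols)),
        (backward, range(rows - 1, -1, -1), range(cols - 1, -1, -1)),
    ]
    for mask, row_order, col_order in passes:
        for r in row_order:
            for c in col_order:
                for dr, dc, w in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and d[rr][cc] + w < d[r][c]:
                        d[r][c] = d[rr][cc] + w
    return d


img = [[0] * 5 for _ in range(5)]
img[2][2] = 1  # a single foreground pixel in the centre
for row in chamfer_34(img):
    print(row)  # first row: [8, 7, 6, 7, 8]
```

Because every weight is an integer, the transform stays in integer arithmetic, which is the computational advantage mentioned above.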
3.2 Distance average

Baddeley & Molchanov [1] (see also [7]) introduced an alternative to the Vorob'ev averaging procedure for binary images, calling it the distance average. The procedure has three steps:

1. Compute the distance transform of each input image;
2. Combine the distance transforms via the pixelwise mean to produce an average distance image;
3. Threshold to produce the final binary image.

In formal terms, for binary images $B_1, \ldots, B_n$ define their average distance image $\bar d$ to be the pixelwise mean of the individual distance transforms,

$$\bar d(s) = \frac{1}{n} \sum_{i=1}^{n} d(s, B_i), \quad \text{for all } s \in S. \qquad (5)$$

Then define the distance average of $B_1, \ldots, B_n$ to be the binary image obtained by thresholding $\bar d$,

$$\mathbb{E}_{d,t}\, B = \{ s \in S : \bar d(s) \le t \}, \qquad (6)$$

where $t \ge 0$ is a chosen threshold value. Note that the direction of the inequality in the threshold is reversed with respect to (2) and (4). Similarly, for a random set $X$ in the pixel raster $S$, define the expected distance to $X$ from a fixed pixel $s$ by

$$d_X(s) = \mathbb{E}[\, d(s, X) \,] \qquad (7)$$

and define the distance mean $\mathbb{E}_{d,t}\, X$ of $X$ as a threshold of $d_X$ analogous to (6).
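Equations (5) and (6) can be sketched directly. For brevity this sketch swaps in an exact Euclidean distance transform computed by brute force (practical only for tiny images) instead of a chamfer transform; the helper names `distance_transform`, `average_distance`, `distance_average` and `hline` are hypothetical. Applied to two horizontal lines two rows apart, as in Figure 1 (a) and (b), our choice of threshold $t = 1$ gives a band three pixels thick, as described for Figure 2 in section 3.3.

```python
import math


def distance_transform(img):
    """Exact Euclidean distance transform by brute force (tiny examples only)."""
    fg = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    return [[min(math.dist((r, c), q) for q in fg) for c in range(len(img[0]))]
            for r in range(len(img))]


def average_distance(images):
    """Equation (5): pixelwise mean of the distance transforms."""
    dts = [distance_transform(img) for img in images]
    n = len(dts)
    rows, cols = len(dts[0]), len(dts[0][0])
    return [[sum(dt[r][c] for dt in dts) / n for c in range(cols)] for r in range(rows)]


def distance_average(images, t):
    """Equation (6): pixels whose average distance is at most t."""
    return [[1 if v <= t else 0 for v in row] for row in average_distance(images)]


def hline(row, size=7):
    """A size x size image with one full-width horizontal line (test data)."""
    return [[1 if r == row else 0 for _ in range(size)] for r in range(size)]


# Two lines two rows apart: rows 1, 2 and 3 all have average distance 1.
result = distance_average([hline(1), hline(3)], t=1)
print([sum(row) for row in result])  # [0, 7, 7, 7, 0, 0, 0]
```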
Implementation of the distance average requires decisions about the choice of distance transform (for step 1) and threshold (for step 3). The distance transform may be implemented with various discrete approximations, as noted in the previous section. Other permissible variations of the distance function (discussed for example in [1]) include the square distance function and the signed distance function. Baddeley & Molchanov proposed a range of automatic thresholding procedures. The average of the distance transforms should be thresholded at various levels $t$, creating binary images

$$B^{(t)} = \{ s \in S : \bar d(s) \le t \}.$$

[Note again that the collection of images $B^{(t)}$ for all $t \ge 0$ is a complete lattice.] Then $t$ should be selected to minimise the 'average error'

$$\frac{1}{n} \sum_{i=1}^{n} \Delta\big(B^{(t)}, B_i\big),$$

where $B_i$ are the input images and $\Delta(\cdot, \cdot)$ is some measure of the discrepancy between two images.
Implementation of this automatic thresholding procedure depends on the choice of error measure $\Delta$. A simple example is to take $\Delta(A, B)$ equal to the absolute value of the difference in area between binary images $A$ and $B$. The optimal threshold $t$ is then the value for which the area of $B^{(t)}$ is most nearly equal to the average of the areas of the input images $B_i$. This value can be determined from the histogram of $\bar d$. Hence the automatic thresholding is analogous to histogram matching in this simple case. However, for a general $\Delta$, this approach to automatic thresholding is complex and will unavoidably require computation of the thresholded images $B^{(t)}$ for many values of $t$ [1].
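The area-matching rule can be sketched from the sorted values of $\bar d$ (a cumulative histogram), assuming the average distance image is given as a list of lists of numbers together with the areas of the input images. The function name `area_matched_threshold` is hypothetical, and ties are broken towards the smaller threshold, which is our own choice.

```python
from bisect import bisect_right


def area_matched_threshold(avg_dist, input_areas):
    """Pick t so that the area of {s : avg_dist(s) <= t} is closest to the mean input area.

    The area of B^(t) only changes at values taken by avg_dist, so it is enough
    to try those values; the area at each one is read off the sorted values.
    Ties are broken in favour of the smaller t.
    """
    values = sorted(v for row in avg_dist for v in row)
    target = sum(input_areas) / len(input_areas)
    best_t, best_err = None, float("inf")
    for t in sorted(set(values)):
        err = abs(bisect_right(values, t) - target)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Average distance image of two 7-pixel lines two rows apart (rows 0..6).
avg = [[v] * 7 for v in (2, 1, 1, 1, 2, 3, 4)]
print(area_matched_threshold(avg, [7, 7]))  # 1
```

In this toy case no threshold reproduces the input area of 7: the best achievable area is 21, which is the thickening discussed in section 3.3.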
3.3 Insensitivity to misregistration

The extreme sensitivity of the Vorob'ev mean to misregistration or displacement of thin features (described in section 2.3) is not shared by the distance average. For example, Figure 2 shows the result of applying the distance average to the two line images of Figure 1 (a) and (b). The distance average image, Figure 2 (c), is a horizontal line three pixels thick.

Figure 2: The result of combining the distance transforms of images (a) and (b) (identical to Figure 1 (a) and (b)) and then thresholding. The result is image (c).

The next result shows that a linear or nonlinear displacement of a feature perturbs the distance transform by an amount less than or equal to the distance over which the feature is displaced.
Theorem 2. Let $b$ and $b'$ be two binary images on the same raster $S$ which are equivalent under a mapping $w : S \to S$, that is, $B' = \{ w(t) : t \in B \}$. Let

$$h = \max_{s \in S} d(s, w(s))$$

be the maximum distance over which any pixel is displaced by $w$. Then the distance transforms of $b$ and $b'$ differ by at most $h$:

$$\max_{s \in S} \big| d(s, B) - d(s, B') \big| \le h.$$

Proof: Since $B' = \{ w(t) : t \in B \}$ we have, for any $s \in S$, $d(s, B') = \min\{ d(s, w(t)) : t \in B \}$. By the triangle inequality, $d(s, w(t)) \le d(s, t) + d(t, w(t)) \le d(s, t) + h$. Thus

$$d(s, B') \le \min\{ d(s, t) + h : t \in B \} = d(s, B) + h.$$

The triangle inequality also gives $d(s, t) \le d(s, w(t)) + d(w(t), t) \le d(s, w(t)) + h$, so that $d(s, B) \le d(s, B') + h$, and the result follows.
Corollary 3. Let $B_1, \ldots, B_n$ be deformations of an original image $B$, each with maximum displacement $h$, in the sense of the Theorem above. Then the distance average at threshold $h$ will contain the original image:

$$\mathbb{E}_{d,h}\, B \supseteq B.$$

Proof: Since $d(s, B) = 0$ if and only if $s \in B$, the previous Theorem gives, for any $s \in B$, $d(s, B_i) \le h$ for each $i$, hence $\bar d(s) \le h$ and so $s \in \mathbb{E}_{d,h}\, B$.
Thus the distance average is insensitive to displacement, at the cost of producing results which are generally thicker than the original.
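Theorem 2 can be checked numerically on a small example. The sketch below assumes Euclidean distance for both the metric and the displacement, a random foreground set, and a hypothetical non-rigid deformation `w` that shifts row $r$ to the right by $r \bmod 3$ pixels (clamped at the border). It forms $B' = w(B)$ exactly as in the Theorem. The names `dist_to_set` and `displacement_check` are ours.

```python
import math
import random


def dist_to_set(s, points):
    """d(s, B): Euclidean distance from pixel s to the nearest pixel of the set."""
    return min(math.dist(s, q) for q in points)


def displacement_check(B, w, raster):
    """Return (max_s |d(s, B) - d(s, B')|, h) for B' = w(B), as in Theorem 2."""
    B_prime = {w(t) for t in B}
    h = max(math.dist(s, w(s)) for s in raster)
    change = max(abs(dist_to_set(s, B) - dist_to_set(s, B_prime)) for s in raster)
    return change, h


size = 12
raster = [(r, c) for r in range(size) for c in range(size)]


def w(s):
    """Hypothetical deformation: shift row r right by r % 3 pixels, clamped."""
    r, c = s
    return (r, min(c + r % 3, size - 1))


rng = random.Random(0)
B = set(rng.sample(raster, 15))
change, h = displacement_check(B, w, raster)
print(change <= h + 1e-9, h)  # True 2.0
```

The small tolerance only absorbs floating-point rounding in the square roots; the bound itself is exact.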
Figure 3: Two synthetic line images (1) and (2) with a small angular difference, and their distance average.

Figure 4: Two synthetic line images (1) and (3) with a large angular difference, and their distance average.
3.4 Dependence on angle

The distance average technique responds to differences in angular orientation in a somewhat subtle fashion. Figure 3 shows two synthetic line images which differ by a small acute angle. Their distance average is a thickened line segment, with position and orientation intermediate between the two lines. This behaviour is appropriate; a similar example was presented in [1]. However, Figure 4 shows two lines which are inclined at a larger, obtuse angle; their distance average is a rather nonsensical blob near the intersection point of the lines. It is unclear what one would like the average of two such lines to be.
4 Extensions of the distance average

4.1 Pixelwise median and trimmed mean

As we have seen, a problem with combining distance transformed images by the pixelwise mean is that lines in the final binary image appear thicker than they are in the input images. This occurs when the input images are slightly misregistered or features are displaced. Thresholding the mean distance image at a lower value will reduce this thickness, but at the cost of removing other parts of the image.
We propose instead that the distance transforms of the input images be combined by using the pixelwise median. Recall that the median of a set of numbers $x_1, \ldots, x_n$ is a value $m = \operatorname{med}\{x_1, \ldots, x_n\}$ such that at least half of the numbers $x_i$ are less than or equal to $m$ and at least half are greater than or equal to $m$. The median can be computed by sorting the $x_i$ into ascending order and taking the middle value: if $n$ is odd, the median is the $\frac{n+1}{2}$th largest value, while if $n$ is even we take the average of the $\frac{n}{2}$th and $(\frac{n}{2}+1)$th largest values. The pixelwise median of images $f_1, \ldots, f_n$ is

$$(\operatorname{med} f)(s) = \operatorname{med}\{ f_1(s), \ldots, f_n(s) \}, \quad \text{for all } s \in S.$$
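A pixelwise median can be sketched with Python's standard `statistics.median`, which for even $n$ averages the two middle values, matching the convention above. The input here is an assumption for illustration: hand-written one-column distance transforms of three horizontal lines at rows 1, 3 and 5. `pixelwise_median` is a hypothetical name.

```python
import statistics


def pixelwise_median(images):
    """(med f)(s) = med{f_1(s), ..., f_n(s)} for every pixel s."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[statistics.median(f[r][c] for f in images) for c in range(cols)]
            for r in range(rows)]


# One-column distance transforms (rows 0..6) of horizontal lines at rows 1, 3, 5:
# the distance at row r to the line at row k is |r - k|.
dts = [[[abs(r - k)] for r in range(7)] for k in (1, 3, 5)]
print([row[0] for row in pixelwise_median(dts)])  # [3, 2, 1, 2, 1, 2, 3]
```

In this toy case the pixelwise mean has a single minimum (4/3, at the centre row), whereas the median transect is 'W'-shaped with two minima either side of it, the same kind of behaviour reported for the shifted-line experiment in section 5.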
Applying the pixelwise median to the distance transforms, we find that the thickening of images is reduced. This is illustrated in Figure 5 for the toy example of Figures 1 and 2. The average image produced by using pixelwise median combination is a line one pixel thick, lying between the two lines present in the input images.
Figure 5: Example showing reduction of the thickening artefact by using the pixelwise median to combine the distance transforms. Images (a) and (b) are the same input images as in Figures 1 and 2. Image (c) is the average image obtained using pixelwise median combination.
The following result expresses a connection between the Vorob'ev mean and the above-mentioned modification of the distance average.
Theorem 4. For an odd number $n$ of input images, the Vorob'ev median is equivalent to applying the distance transform to the input images, combining these distance transformed images via the pixelwise median and then thresholding the combined distance image at 0.

Proof: For odd $n$, the median of a set of $n$ nonnegative numbers is equal to 0 if, and only if, at least half of the numbers are equal to 0. Thus the pixelwise median of distance transforms, $\operatorname{med}\{ d(s, B_i) \}$, equals 0 if and only if at least half of the values $d(s, B_i)$ equal 0, that is, if and only if at least half of the sets $B_i$ contain $s$. The result follows.
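The equivalence can be checked on random data. This sketch assumes a brute-force city-block distance transform (any metric with $d(s, B) = 0$ exactly when $s \in B$ would do), five random 6 × 6 images, and hypothetical helper names. An odd number of images is used because, with an even $n$ and the averaging convention for the median, the median distance is zero only when more than half of the distances vanish. The last lines show such a disagreement for two images.

```python
import random
import statistics


def city_block_dt(img):
    """Brute-force city-block distance transform (assumes a nonempty foreground)."""
    fg = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    return [[min(abs(r - a) + abs(c - b) for a, b in fg) for c in range(len(img[0]))]
            for r in range(len(img))]


def median_distance_at_zero(images):
    """Pixelwise median of the distance transforms, thresholded at 0."""
    dts = [city_block_dt(img) for img in images]
    rows, cols = len(images[0]), len(images[0][0])
    return [[1 if statistics.median(dt[r][c] for dt in dts) == 0 else 0
             for c in range(cols)] for r in range(rows)]


def vorobev_median(images):
    """Pixels covered by at least half of the images (integer arithmetic)."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[1 if 2 * sum(img[r][c] for img in images) >= n else 0
             for c in range(cols)] for r in range(rows)]


def random_image(rng, size=6, density=0.3):
    img = [[1 if rng.random() < density else 0 for _ in range(size)] for _ in range(size)]
    img[rng.randrange(size)][rng.randrange(size)] = 1  # ensure a nonempty foreground
    return img


rng = random.Random(0)
for trial in range(20):
    imgs = [random_image(rng) for _ in range(5)]  # odd n
    assert median_distance_at_zero(imgs) == vorobev_median(imgs)

# Even n: the two rules can disagree.
a, b = [[1, 0]], [[0, 1]]
print(vorobev_median([a, b]), median_distance_at_zero([a, b]))  # [[1, 1]] [[0, 0]]
```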
Since the Vorob'ev median is sensitive to misregistration and displacement, this indicates that the distance average with pixelwise median combination also has this sensitivity for small values of the threshold.

Another alternative which we shall explore in experiments is to replace the mean by the trimmed mean. The 5% trimmed mean of a set of numbers is obtained by discarding the largest 5% and the smallest 5% of the data and computing the arithmetic mean of the remaining values. This strategy removes the influence of extreme data (distance) values on the mean, but is only of value when there are many data images: with fewer than 20 images, 5% of the data amounts to less than one value at each end.
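A trimmed mean can be sketched as follows. The sketch assumes that $\lfloor 0.05\,n \rfloor$ values are discarded from each end, a common conservative rounding convention that the text does not specify. `trimmed_mean` and `pixelwise_trimmed_mean` are hypothetical names, and `proportion` must be below 0.5 so that some values remain.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after discarding floor(proportion * n) values from each end.

    With proportion = 0.05 nothing is discarded until n >= 20.
    """
    xs = sorted(values)
    k = int(proportion * len(xs))  # number of values trimmed from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def pixelwise_trimmed_mean(images, proportion=0.05):
    """Trimmed mean of f_1(s), ..., f_n(s) at every pixel s."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[trimmed_mean([f[r][c] for f in images], proportion) for c in range(cols)]
            for r in range(rows)]


# 30 distance values at one pixel, one of which is an outlier.
values = [2] * 29 + [60]
print(sum(values) / len(values), trimmed_mean(values))  # mean 3.93..., trimmed mean 2.0
```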
4.2 Ridge tracing

An alternative to obtaining the final binary image via thresholding is to use ridge tracing [6] or watershed algorithms [14]. These attempt to locate ridges in the topography of the input greyscale image. We have adopted the ridge tracing procedure known as ordered homotopic thinning [10]. In brief, pixels in the combined distance image are visited in order of increasing distance value, and deleted if this will not alter the homotopy of the image. The algorithm stops when no more pixels can be deleted. This procedure ensures that the result is a connected set of lines. It is only suitable for computing a distance average when the inputs are line images.
5 Experiments with synthetic images

To assess the performance of the distance average technique and our various modifications of it, we conducted some simple experiments involving images of straight lines. In our first experiment, we generated 30 binary images containing a single straight line in the 135° diagonal direction, displaced by random distances following a normal distribution with standard deviation σ equal to 7 pixel steps in the 45° direction. Figure 6 shows three of these images and the superposition of all 30 images. Figure 7 shows the distance averages obtained using the chamfer (5,7,11) distance transform, combining the distance transforms using the pixelwise mean, median or 5% trimmed mean, and thresholding. All three methods produce a line or lines in the correct orientation, all with the characteristic thickening. Interestingly, the result using the pixelwise median consists of two lines; this may be attributed to the existence of a gap in the spread of lines in the input data set, visible in the superposition image in Figure 6. The pixelwise median of the distance transforms has a 'W'-shaped transect in this example. It is arguable that this is the correct behaviour in such circumstances. Note also that there is variation in the lengths of the lines in the data set, due to cropping infinite lines at the boundary of the pixel raster. Consequently, the distance averages of the shifted lines are shorter than the maximum length of the lines in the data set. The pixelwise median technique is the least affected. Similar behaviour was observed when this experiment was repeated with σ = 1, 2, 3, 5 and 10.
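Data of the kind used in the first experiment might be generated as below. This is only a sketch under our own assumptions, not the authors' generator. Directions are measured with the vertical axis pointing up, so the 135° line is $\{c - r = \text{const}\}$ in (row, column) coordinates. One pixel step in the 45° direction, $(\Delta r, \Delta c) = (-1, +1)$, changes $c - r$ by 2. The normal offset is rounded to a whole number of such steps. `shifted_diagonal` is a hypothetical name.

```python
import random


def shifted_diagonal(size, sigma, rng):
    """One 135-degree line {c - r = 2k}, displaced by k pixel steps in the 45-degree direction.

    k is a normal(0, sigma) offset rounded to an integer. Parts of the line
    outside the raster are cropped, so lines far from the diagonal are shorter.
    """
    k = round(rng.gauss(0.0, sigma))
    return [[1 if c - r == 2 * k else 0 for c in range(size)] for r in range(size)]


rng = random.Random(1)
size, sigma = 64, 7
data = [shifted_diagonal(size, sigma, rng) for _ in range(30)]
# Superposition (pixelwise maximum) of all 30 images, as in the right panel of Figure 6.
superposition = [[max(img[r][c] for img in data) for c in range(size)] for r in range(size)]
print([sum(map(sum, img)) for img in data[:3]])  # line lengths after cropping
```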
Figure 6: Three examples from a set of 30 images consisting of diagonal lines randomly displaced from their original position, and (right) superposition of all 30 images.

Our second experiment took the same diagonal line and applied 30 different rotations by random angles which followed the von Mises distribution [8] with concentration parameter κ = 10. Figures 8 and 9 show the data and the distance averages respectively. All three methods produce an ellipse in the correct orientation and concentric with the centre of rotation of the lines. The pixelwise median seems to perform best in this case. Similar behaviour was observed when this experiment was repeated with κ = 0.5, 1, 2, 5 and 50.
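Rotated lines with von Mises angles can be generated with the standard library's `random.vonmisesvariate(mu, kappa)`. The rasterisation below is a simple assumption of ours: dense sampling along the segment with rounding, standing in for a proper line-drawing algorithm such as Bresenham's. The raster size (65, odd, so that it has a central pixel) and the hypothetical name `rotated_line` are also ours, not the authors'.

```python
import math
import random


def rotated_line(size, angle):
    """A line segment through the centre of an odd-sized raster at the given angle.

    Angles are measured anticlockwise from the column axis with the vertical
    axis pointing up, so the row coordinate uses -sin(angle).
    """
    centre = (size - 1) // 2
    img = [[0] * size for _ in range(size)]
    steps = 4 * size
    for i in range(steps + 1):
        u = -size / 2 + size * i / steps  # position along the segment
        r = round(centre - u * math.sin(angle))
        c = round(centre + u * math.cos(angle))
        if 0 <= r < size and 0 <= c < size:
            img[r][c] = 1
    return img


rng = random.Random(2)
base = math.radians(135)
# vonmisesvariate returns an angle in [0, 2*pi); rotating by theta or by
# theta - 2*pi gives the same line, so no wrapping is needed.
angles = [base + rng.vonmisesvariate(0.0, 10.0) for _ in range(30)]
data = [rotated_line(65, a) for a in angles]
```

Every generated line passes through the central pixel, which acts as the common centre of rotation.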
Figure 7: Average of the 30 shifted line images obtained by three variants of the distance average technique. In each case the chamfer (5,7,11) distance transform of each image was taken. The three images (left to right) are the result of thresholding the pixelwise mean, median, and 5% trimmed mean, respectively, of the distance transforms.

Figure 8: Three examples from a set of 30 input images consisting of lines randomly rotated from their original diagonal direction, and (right) superposition of all 30 images.

Figure 9: Average of the 30 rotated line images obtained by three variants of the distance average technique. In each case the chamfer (5,7,11) distance transform of each image was taken. The three images (left to right) are the result of thresholding the pixelwise mean, median, and 5% trimmed mean, respectively, of the distance transforms.
6 Aircraft flight paths

Our first application is to the analysis of aircraft flight paths. Air traffic control radar and monitoring systems record the flight paths taken by all aircraft in the vicinity of an airport. It may be desirable to calculate an 'average' flight path for the purposes of surveying or monitoring air traffic, for use in airport planning or town planning, or for the assessment of aircraft noise complaints.
6.1 Data

Figure 10 shows examples from a set of 76 images of the flight paths of aircraft on landing approach to Perth International and Domestic Airports (which share the same runway). These were scanned from printed records kindly supplied by Mr Graham Moyle, Airservices Australia, Perth Domestic Airport, Perth, Western Australia. In the original data and in Figure 10 the flight path is overlaid on a map of the underlying suburbs. For the averaging process we extracted only the flight paths, after first registering the images using the suburb maps. The data images separate naturally into five groups according to the direction of approach to the airport. One example from each of the five groups is shown in Figure 10. The groups were identified as SW, SE, N, NW and NE, containing 3, 27, 13, 6 and 27 images respectively. The relative sizes of the groups are typical for traffic at these airports. In what follows, averages are computed for each group separately.
6.2 Choice of technique

We have seen that the Vorob'ev mean is hypersensitive to misregistration and displacement of line features. Even though our data images are accurately registered, there is substantial variation in the aircraft tracks. Hence the Vorob'ev mean is not appropriate here. We chose to adopt the distance median technique. This computes the distance transform of the input images, combines the distance transformed images via the pixelwise median, and then thresholds to produce the final binary image. Combination via the pixelwise median is chosen over combination via the pixelwise mean or trimmed mean because the median reduces the thickness of lines in the final image. We use both the chamfer (3,4) and chamfer (5,7,11) versions of the distance transform. Since the data separate naturally into five groups of paths taking quite different directions, and in view of the artefacts that may be expected if such distinct groups are combined (section 3.4), we chose to compute the distance median of each group separately and then superimpose the five results to obtain a final average.
Figure 10: Examples of aircraft track images (panels: SW approach, SE approach, NW approach, N approach, NE approach). The five examples show aircraft approaching the airport from the south west, south east, north west, north and north east of the image respectively. In each image, the aircraft track is shown in black while the background is in grey. Data reproduced by kind permission of Graham Moyle, Airservices Australia, Perth Domestic Airport.

6.3 Results

Figure 11 shows the result of computing the distance averages of each of the five groups of flight paths (pixelwise median combination, chamfer (5,7,11) distance transform, manual threshold) and finally superimposing these averages. The NW, NE, N, SE and SW groups were thresholded at levels 31, 21, 5, 20 and 15 respectively. These threshold levels were obtained by manual adjustment of levels originally computed by the automatic thresholding procedure. Identical results were obtained with the chamfer (3,4) distance transform, although the threshold values were correspondingly different. A slight thickening of the average flight path is evident, but it is not as severe as for the pixelwise mean and trimmed mean combinations (not shown here). While in general the choice of discrete approximation to the distance transform can affect the result, this does not seem to be occurring in this case. The fact that we obtain the same output using different distance transforms is useful for computational efficiency: on a Sun Sparc 5, the chamfer (5,7,11) distance transform took on average 30% more CPU time than the chamfer (3,4) distance transform.
Figure 11: Left: combined distance average of the flight paths. Right: same image superimposed on the suburb map. The combined distance average was obtained by superimposing the distance averages of the five groups of images. For each group of input images, the distance average was computed by thresholding the pixelwise median of the distance transforms of the images in that group. Results were identical for the chamfer (3,4) and chamfer (5,7,11) distance transforms.

To apply ridge detection, we constructed a combined average distance image. Figure 12 shows the chamfer (3,4) and chamfer (5,7,11) variants. This combined image is the pixelwise minimum of the five group averages. For each group of images, the average distance image is the pixelwise median of the distance transforms of the images in that group. The combined average distance image may be used for various purposes in its own right, in this case for example to assess the concentration of flights over the air traffic control area. It is also an indication of the difficulty of the averaging problem.

The results in Figure 11 show the characteristic thickening of the distance average. Thresholding takes place at a value greater than or equal to the step length between horizontally neighbouring pixels, which (for the chamfer (3,4) distance transform) implies a threshold value of 3 or greater. This means that at least one pixel on either side of the zero-valued pixels in the combined distance image will be preserved in the final image, causing thickening. More importantly, the distance averages are not connected, whereas all the input images consist of a single connected line. This occurs because the thresholding operation ignores topological properties. As foreshadowed above, we may avoid this effect by applying ridge detection to the average distance image. Figure 13 shows the result of applying ridge detection to the combined average distance image in Figure 12 (a). Note that we are still left with spurious lines on the image that correspond to localised ridges. This is particularly evident in the bottom right of the image. There is a need here for averaging techniques that preserve the topological properties of the input images.

Figure 12: Combined average distance transforms for the flight path images. (a) Left: chamfer (3,4) distance transform; (b) right: chamfer (5,7,11) distance transform.

Figure 13: The ridge detected version of the combined mean distance image shown in Figure 12, chamfer (3,4) variant.
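The group-wise combination described above can be sketched on toy data. The sketch assumes that each group's distance transforms are already available as lists of lists; the groups "A" and "B" and all function names are hypothetical. It computes both the combined average distance image (pixelwise minimum of the group medians) and a superimposed average with a separate threshold per group.

```python
import statistics


def group_median(dts):
    """Pixelwise median of the distance transforms in one group."""
    rows, cols = len(dts[0]), len(dts[0][0])
    return [[statistics.median(dt[r][c] for dt in dts) for c in range(cols)]
            for r in range(rows)]


def combined_average_distance(groups):
    """Pixelwise minimum over groups of the group median distance images."""
    medians = [group_median(dts) for dts in groups.values()]
    rows, cols = len(medians[0]), len(medians[0][0])
    return [[min(m[r][c] for m in medians) for c in range(cols)] for r in range(rows)]


def superimposed_average(groups, thresholds):
    """Threshold each group median at its own level, then take the union."""
    out = None
    for name, dts in groups.items():
        binary = [[1 if v <= thresholds[name] else 0 for v in row]
                  for row in group_median(dts)]
        out = binary if out is None else [[a | b for a, b in zip(ra, rb)]
                                          for ra, rb in zip(out, binary)]
    return out


# Toy 1 x 4 distance transforms for two hypothetical groups.
groups = {
    "A": [[[0, 1, 2, 3]], [[0, 1, 2, 3]], [[2, 1, 0, 1]]],
    "B": [[[3, 2, 1, 0]]],
}
print(combined_average_distance(groups))               # [[0, 1, 1, 0]]
print(superimposed_average(groups, {"A": 0, "B": 1}))  # [[1, 0, 1, 1]]
```

With a common threshold $t$ for every group, thresholding the pixelwise minimum gives the same result as the union of the thresholded group medians, since $\min_g m_g(s) \le t$ exactly when some $m_g(s) \le t$. Per-group thresholds, as used for Figure 11, need the union form.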
7 Human faces

Our second application is related to psychological research into the perception of human beauty [11]. It is theorised that faces appear beautiful if they exhibit symmetry and 'averageness'. Studies suggest that artificially created 'average' faces are more beautiful than the component faces that are used to create them. Techniques used previously in the psychological literature to obtain average faces include

I. pixelwise averaging of greyscale images (after suitable deformation), and
II. manually identifying certain fiducial (reference) points on the faces, averaging the two-dimensional vector locations of the fiducial points, and joining the averaged points by line segments to yield a line sketch of the average face [11].

An example of the implementation of method II is shown in Figure 14. The original source is a greyscale image of a head-and-shoulders portrait. The image is marked by hand with 169 predefined fiducial points. Given the coordinates of these points, certain of them are linked by line segments (by computer graphics) to produce a line image as shown in Figure 14. This method guarantees that the average face will have all the features of a normal face. Reasons for seeking an alternative to method II are to investigate the perceptual effect of other kinds of averaging, and to avoid the labour-intensive analysis of individual images. If the distance average is used, there is no essential need to identify fiducial points, and we could replace the manual procedure by an edge detector. This would also avoid subjectivity and error in the identification of fiducial points.
7.1 Data
Figure 15 shows a set of 10 line images of faces of male subjects, kindly provided by Professor G. Rhodes of the Department of Psychology at the University of Western Australia. There are some similarities between the face image data and the aircraft flight paths, notably the substantial variation in the position or presence of some lines. However, the face images also differ in that all faces contain certain standard features, such as the eyes and nose, which vary less in position although they do vary in shape. The images exhibit close registration of the nose, eyes, mouth and eyebrows, whereas other facial features, such as the chin area and the hairline, show greater variation in position.
Figure 14: Greyscale portrait of a human subject showing 169 reference points identified manually (left), and the resulting binary face image (right) after deformation to register inter-pupil distance. Data reproduced by kind permission of Professor G. Rhodes.
7.2 Results
We adopt the distance average technique, with several choices for the distance transform (chamfer (3,4) and chamfer (5,7,11) distance transforms), for the combination of the distance images (pixelwise mean, median, and 5% trimmed mean), and for computing the final binary image (thresholding and ridge tracing). Figure 16 shows the average faces produced using the chamfer (3,4) distance transform; similar results were obtained with the chamfer (5,7,11) distance transform. The three images are the results of combining the component images via the pixelwise mean, median, and trimmed mean. Thresholding was performed automatically by matching the area of the thresholded image to the average area of the input images. It is evident from Figure 16 that combining distance transforms via the pixelwise median produces a more detailed binary face. The distance images obtained by combining via the pixelwise mean and trimmed mean are smoother than the distance image produced by the pixelwise median, which is why the thresholded version of the median-combined distance image shows greater detail. The distance averaging procedures used to produce the output images do not guarantee a complete facial binary image. One aspect of this is that the components of the average face image are not necessarily connected. Figure 17 shows the average faces resulting from ridge tracing the combined distance images, again combining via the pixelwise mean, median, and trimmed mean. All outputs are now connected; however, the separate facial components are also joined to each other. This could be alleviated by analysing the connected components separately, as was done with the aircraft flight path data. The `cracked' look of the ridge-traced face based on pixelwise median combination is further evidence that the distance image produced by this combination method is less smooth than those produced by the other combination methods: a less smooth distance image has more ridges, and hence more line segments in the ridge-traced image.
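The first two steps of the procedure described above (distance transform of each input, then pixelwise combination) can be sketched as follows. This is a minimal Python sketch, not the VIP implementation used for the experiments: `chamfer34` is a standard two-pass chamfer (3,4) transform in the style of Borgefors [4], `combine` applies any pixelwise rule, images are assumed to be lists of rows of 0/1 values, and the vertical-line inputs are hypothetical toy data rather than face images.

```python
from statistics import median

INF = float("inf")


def chamfer34(img):
    """Two-pass chamfer (3,4) distance transform of a binary image.

    Line pixels (1) get distance 0; every other pixel gets roughly 3 times
    its Euclidean distance to the nearest line pixel (orthogonal step 3,
    diagonal step 4). An image with no line pixels stays at infinity."""
    h, w = len(img), len(img[0])
    d = [[0.0 if img[r][c] else INF for c in range(w)] for r in range(h)]
    forward = ((-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3))
    backward = ((1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3))
    # Forward raster scan, then reverse raster scan.
    for mask, rows, cols in ((forward, range(h), range(w)),
                             (backward, range(h - 1, -1, -1), range(w - 1, -1, -1))):
        for r in rows:
            for c in cols:
                for dr, dc, cost in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        d[r][c] = min(d[r][c], d[rr][cc] + cost)
    return d


def combine(dist_images, rule):
    """Apply rule (e.g. mean or median) to the stack of values at each pixel."""
    h, w = len(dist_images[0]), len(dist_images[0][0])
    return [[rule([d[r][c] for d in dist_images]) for c in range(w)]
            for r in range(h)]


def vertical_line(h, w, col):
    """Hypothetical toy input: a full-height line in column col."""
    return [[1 if c == col else 0 for c in range(w)] for _ in range(h)]


images = [vertical_line(3, 5, 1), vertical_line(3, 5, 2), vertical_line(3, 5, 2)]
dists = [chamfer34(im) for im in images]
print(combine(dists, lambda v: sum(v) / len(v))[0])  # [5.0, 2.0, 1.0, 4.0, 7.0]
print(combine(dists, median)[0])                     # [6.0, 3.0, 0.0, 3.0, 6.0]
```

Because the toy lines span the full height, each distance row is simply 3 times the horizontal offset from the line. The median-combined row keeps a zero (a line pixel) at column 2, whereas the mean-combined row has no zero anywhere, so a threshold near zero would recover a line only from the median.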
Figure 15: The full set of 10 binary face images used in the study. Data reproduced by kind permission of Professor G. Rhodes.
Figure 16: The average faces produced using the chamfer (3,4) distance transform, with automatic thresholding by matching areas. (a) Pixelwise mean combination; (b) pixelwise median combination; (c) pixelwise trimmed mean combination.
Figure 17: The average faces produced by ridge tracing the combined distance image instead of thresholding, using the chamfer (3,4) distance transform. (a) Combination via the pixelwise mean; (b) via the pixelwise median; (c) via the pixelwise trimmed mean.
The face produced by trimmed mean combination is the best of the three faces shown in Figure 17. Like the image produced by pixelwise mean combination, it shows all the facial features (mouth, eyes, nose, eyebrows), but there are fewer spurious lines in the face. This shows that trimmed mean combination produces a smoother combined distance image, that is, one with fewer ridges, and hence fewer spurious lines in the ridge-traced image. The smoother distance image results from the ability of the trimmed mean to discard outliers. Unlike the face averaging techniques used by Rhodes [11], the average faces produced by the distance average techniques are not guaranteed to represent a full face. Among the thresholded outputs, the best (and most detailed) approximation to a full face is obtained by combining the component distance-transformed images via the pixelwise median; even so, these average faces are still missing the forehead and chin areas. Using ridge tracing instead of thresholding to produce the final binary image gives a better approximation to a full face. However, the appearance of spurious lines in the final binary image is a problem, as it was when ridge tracing was used to produce the average aircraft flight path. Again, these lines can be reduced by smoothing the distance image before ridge tracing or by pruning the ridge-traced image. When pixelwise trimmed mean combination is used, however, the average face produced by ridge tracing is a very good approximation to the faces used as inputs to this experiment, and it contains fewer spurious lines than the other ridge-traced images.
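To illustrate why trimmed-mean combination suppresses outliers, the sketch below compares the three combination rules at a single pixel across 10 input distance images, one of which (a hypothetical face with no line near this pixel) contributes an outlying value. The distance values are invented, and the trimming rule, dropping ceil(alpha * n) values from each end, is our assumption: with n = 10 and alpha = 0.05, rounding down instead would trim nothing and the trimmed mean would equal the ordinary mean.

```python
import math
from statistics import median


def trimmed_mean(values, alpha=0.05):
    """Mean after dropping the ceil(alpha * n) smallest and largest values.

    The rounding rule is an assumption made for this sketch."""
    v = sorted(values)
    k = math.ceil(alpha * len(v))
    if 2 * k >= len(v):
        raise ValueError("trimming would discard every value")
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)


# Hypothetical distance values at one pixel, one per input face.
pixel = [3, 4, 6, 3, 5, 4, 60, 4, 6, 5]
print(sum(pixel) / len(pixel))  # 10.0  (mean, pulled up by the outlier 60)
print(median(pixel))            # 4.5
print(trimmed_mean(pixel))      # 4.625 (3 and 60 dropped)
```

Both the median and the trimmed mean resist the outlier. The trimmed mean still averages the eight remaining values rather than selecting the middle one or two, which is one plausible reason why its combined distance image varies more smoothly from pixel to pixel than the median's.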
8 Discussion and conclusions
Image averaging is still a largely unexplored area of image processing. Averaging techniques have previously been developed for binary images in general, but these techniques are not well suited to the special case of line images. This paper has presented the application of distance averaging to line images, together with modifications that improve its behaviour on this class of images: the use of the pixelwise median and trimmed mean to combine the distance transforms of the input images, and the use of ridge tracing instead of thresholding of the combined distance image. In the application to flight paths, it was essential to separate the input data into groups of similar images, which were then averaged separately.
While the modified technique performs creditably, it must be admitted that the resulting average images do not preserve some of the desirable topological or geometric properties of the input images. Conservation of area or length can be guaranteed by adopting an automatic thresholding procedure which matches the area or length of the thresholded image to the average area or length of the input images. However, if area is preserved, the resulting line images tend to be shorter and thicker than they should be. A more significant problem is the preservation of connectivity. Thresholding generally does not preserve connectivity. Ridge tracing is guaranteed to produce a completely connected image, but this image may be more connected than the input images and may contain spurious line segments. This can be corrected to some degree by smoothing the distance image before ridge tracing, by analysing the connected components of the images separately, or by pruning the ridge-traced image. Future work could try to improve the topological properties by investigating alternatives to thresholding and ridge tracing, such as watershed algorithms [14]. The aircraft flight path example suggests that the pixelwise mean or median of the distance transforms may have intrinsic worth as an indicator of the concentration of flight paths over each location. Performance on the `faces' example might have been improved by separating out the connected components of each input image and averaging these components separately; for example, one would compute an average nose separately from the average hairline. This leads naturally to the problem of averaging multi-valued images, such as the labelled or classified images common in remote sensing.
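The two points made above, that area matching conserves area but that thresholding need not preserve connectivity, can be checked on a toy example. This is a minimal Python sketch under stated assumptions: the hypothetical helper `area_matched_threshold` takes the k-th smallest combined distance value as the threshold, where k is the target area (for instance the average number of line pixels in the inputs), so ties at the threshold can make the area slightly larger than k; `count_components` counts 8-connected components. The combined distance values are invented.

```python
from collections import deque


def area_matched_threshold(dist, target_area):
    """Binary image {d <= t}, with t chosen so the area is close to target_area."""
    values = sorted(v for row in dist for v in row)
    k = max(1, min(len(values), round(target_area)))
    t = values[k - 1]  # ties at t can push the area slightly above k
    return [[1 if v <= t else 0 for v in row] for row in dist], t


def count_components(img):
    """Number of 8-connected components of foreground (1) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


# Hypothetical combined distance image: the inputs were single connected
# lines along the top row, but they disagree near column 2.
combined = [
    [0.0, 1.0, 5.0, 1.0, 0.0],
    [2.0, 3.0, 6.0, 3.0, 2.0],
]
binary, t = area_matched_threshold(combined, target_area=4)
print(t, binary[0], count_components(binary))  # 1.0 [1, 1, 0, 1, 1] 2
```

The thresholded image has exactly the target area of 4 pixels, yet it splits into two components even though each input line was connected.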
Acknowledgements
We warmly thank Graham Moyle of Airservices Australia, Perth Domestic Airport, for granting access to his data on aircraft flight paths, and Gillian Rhodes of the Department of Psychology at The University of Western Australia (UWA) for kindly providing the binary images of faces. We also gratefully acknowledge Chris Pudney of UWA, who applied his ridge tracing techniques to the distance images, and Ilya Molchanov and Nial Friel at the University of Glasgow for helpful comments on early drafts of this paper. The image manipulation procedures were implemented as C-language additions to the VIP image processing system developed by the Robotics and Vision Research Group in the Department of Computer Science, UWA. The advice of Peter Kovesi is gratefully acknowledged. Adrian Baddeley was partly supported by a grant from the Australian Research Council.
References
[1] A. J. Baddeley and I. S. Molchanov. Averaging of random sets based on their distance functions. Journal of Mathematical Imaging and Vision, 8:79–92, 1998.
[2] F. L. Bookstein. Morphometric Tools for Landmark Data: Geometry and Biology. Cambridge University Press, 1991.
[3] Gunilla Borgefors. Distance transformations in arbitrary dimensions. Computer Vision, Graphics, and Image Processing, 27:321–345, 1984.
[4] Gunilla Borgefors. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34:344–371, 1986.
[5] Kenneth R. Castleman. Digital Image Processing, pages 470–477. Prentice-Hall, New Jersey, 1996.
[6] Robert M. Haralick. Ridges and valleys in digital images. Computer Vision, Graphics, and Image Processing, 22:28–38, 1983.
[7] A. Mancham and I. S. Molchanov. Stochastic model of randomly perturbed images and related estimation problems. In K. V. Mardia, C. A. Gill, and I. L. Dryden, editors, Image Fusion and Shape Techniques, pages 44–49. Leeds University Press, 1996.
[8] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis, chapter 15.3, page 429. Academic Press, 1979.
[9] G. Matheron. Random Sets and Integral Geometry. John Wiley and Sons, New York, 1975.
[10] C. J. Pudney. Distance-based skeletonization of 3D images. In Proceedings of TENCON'96: IEEE Region 10 Conference on Digital Signal Processing Applications, pages 209–214, Perth, Western Australia, November 1996.
[11] Gillian Rhodes and Tanya Tremewan. Averageness, exaggeration, and facial attractiveness. Psychological Science, 7(2):105–110, 1996.
[12] A. Rosenfeld and J. L. Pfaltz. Sequential operations in digital picture processing. Journal of the Association for Computing Machinery, 13(4):471–494, October 1966.
[13] A. Rosenfeld and J. L. Pfaltz. Distance functions on digital pictures. Pattern Recognition, 1:33–61, 1968.
[14] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, London, 1982.
[15] D. Stoyan and I. S. Molchanov. Set-valued means of random particles. Journal of Mathematical Imaging and Vision, 7:111–121, 1997.
[16] D. Stoyan and M. Stoyan. Fractals, Random Shapes and Point Fields, pages 108–116. Wiley, Chichester, 1995.
[17] B. J. H. Verwer. Local Distances for Distance Transformations in Two and Three Dimensions. PhD thesis, Technical University of Delft, Delft, The Netherlands, 1991. Delft University Press.
[18] O. Yu. Vorob'ev. Srednemernoje Modelirovanie (Mean-Measure Modelling). Nauka, Moscow, 1984. In Russian.