Pattern Analysis & Applications (2002) 5: 369–384
© 2002 Springer-Verlag London Limited

Sensor Geometry and Sampling Methods for Space-Variant Image Processing

Cheyne Gaw Ho, Rupert C. D. Young and Chris R. Chatwin
School of Engineering and Information Technology, University of Sussex, Brighton, East Sussex, UK

Abstract: Space-variant imaging sensors have many advantages over conventional raster imaging sensors. They provide a large field of view for a given pixel count while maintaining a high resolution at the centre of the field of view and, in addition, produce a mapping that is scale and rotation invariant. The effectiveness of the sensor depends greatly upon the geometry used and the sampling methods employed. In this paper, we define a sensor geometry and introduce an ideal weighted sampling method, where the pixels in the image lying at the intersection of sensor cells are subdivided into smaller sub-pixels, and an interpolation method using a variable width interpolation mask, whose size varies exponentially with the size and shape of the cells in the sensor array. We compare the computational requirements of these methods, and show that they are scale and rotation invariant when the image is scaled or rotated about its centre, giving the sensor a functionality similar to that provided by the retinal mapping present in the mammalian retina. These results illustrate the advantages that can be obtained in real-time tracking applications in computer vision, where computational and memory requirements need to be kept to a minimum.

Keywords: Artificial retina; Complex logarithmic mapping; Computer vision; Interpolation; Retinal mapping; Space-variant sampling

Received: 29 August 2001; Received in revised form: 15 January 2002; Accepted: 15 January 2002

1. INTRODUCTION

Space-variant imaging sensors offer a good compromise between a large visual field, acceptable resolution and data reduction, whilst using a minimum amount of memory and computational resources. The sensor array consists of concentric, exponentially spaced rings of pixels, which increase in size from the centre to the edge. The complex logarithmic mapping is defined as w = (log r, θ), where r and θ are the polar co-ordinates of the original image. This produces a log-polar mapping, or logmap, which is scale, rotation and projection invariant about its centre or focus of expansion. Thus, these geometric transformations can be simplified by transforming them into the mapped space, where they become shifts in memory.

Computational speed and available memory are of fundamental importance in real-time applications. Hardware realisation of a space-variant sensor has been attempted by several research groups [1–3], producing varied results due to the difficulty of manufacturing exponentially spaced arrays of photodiodes. An easier, but more computationally expensive, method of implementing this sensor geometry is to take an image from a conventional x-y raster imaging sensor and sample the pixels contained in the image at various points, corresponding to the location of each of the pixels in the sensor array. In this case, the sampling process and the interpolations required to produce a logmap become of fundamental importance for real-time applications.

In this paper we propose several methods to quickly and efficiently produce the logmap of an x-y raster image whilst still retaining the most important data in the image. Data is sampled using a sensor geometry based upon a Weiman polar exponential grid [4–6]. We introduce an ideal weighted sampling method, where the pixels are divided into smaller sub-pixels, and an interpolated sampling method, using a variable width interpolation mask, whose size varies exponentially with the size and shape of the cells in the sensor array. We compare the computational requirements of these methods and also show that they are scale and rotation invariant when the image is scaled or rotated about its centre. We then show the effect of varying our map parameters on the overall resolution. Finally, we introduce a blind spot into the centre of our mapping, and discuss several ways in which the resolution at the periphery can be increased by sacrificing some of the data in the centre of the mapping.

2. COMPLEX LOGARITHMIC MAPPING

2.1. Description

The complex logarithmic mapping is a conformal mapping from a circular retinal region into a rectangular region. Concentric circles in the z-plane are mapped into vertical lines in the w-plane, and radial lines in the z-plane are mapped into horizontal lines in the w-plane (Fig. 1). The complex logarithmic mapping can be written as

w = log z   (1)

where

z = x + iy = r e^{iθ}   (2)

r = √(x² + y²)   (3)

θ = arctan(y/x)   (4)

Hence,

w = log r + iθ = u + iv   (5)

where

u = log r and v = θ   (6, 7)

Thus an image in the z-plane with coordinates x and y is mapped to the w-plane with coordinates u and v. The image produced in the w-plane is generally referred to as a log-polar mapping or logmap.

Fig. 1. Complex logarithmic mapping.

2.2. Properties

There are many attractive properties of this mapping which are useful for image processing. For a given pixel count the mapping provides a wide field of view but with a highly focused central area. These properties are discussed in Section 2.3. In addition, the mapping also provides some invariances, which are described below.

2.2.1. Rotation Invariance. If an image in the z-plane is rotated by an angle α about the origin, then

z = r e^{i(θ + α)}   (8)

Hence,

w = log r + iθ + iα = u + iv + iα   (9)

This results in a vertical shift in the mapped image by the rotation angle.

2.2.2. Scale Invariance. If an image in the z-plane is scaled by a factor β, then

z = rβ e^{iθ}   (10)

Hence,

w = log r + log β + iθ = u + log β + iv   (11)

This results in a horizontal shift in the mapped image by the log of the scale change.

2.2.3. Projection Invariance. If an observer or camera translates towards a fixed point in the z-plane, then the mapped image will shift horizontally in the w-plane. The size and shape of this image will remain unchanged. This is similar to progressively scaling the z-plane image, but with a different change in the perspective ratio. These invariances are useful for object recognition, optic flow and template matching, since complex multiplications in the z-plane become additions and subtractions in the mapped space. However, these properties only hold with respect to the origin, or focus of expansion, since the mapping is not shift invariant. In the next section we analyse the geometry of the polar exponential sensor array, and show how it can be used to generate the logmap of an x-y raster image at different resolutions. Different sampling interpolations of the raster image will also be discussed.
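The invariances of Eqs (8)–(11) are easy to verify numerically. The following sketch (Python/NumPy, our own illustrative code rather than anything from the paper) maps a point through w = log z and checks that rotation shifts only v and scaling shifts only u:

    import numpy as np

    def logmap_coords(x, y):
        """Forward complex-log mapping w = log z = u + iv (Eqs 2-7)."""
        w = np.log(complex(x, y))   # principal branch: u = log|z|, v = arg(z)
        return w.real, w.imag       # (u, v)

    u0, v0 = logmap_coords(3.0, 4.0)

    # Rotation by alpha about the origin shifts v only (Eqs 8, 9)
    alpha = 0.5
    z = complex(3.0, 4.0) * np.exp(1j * alpha)
    u1, v1 = logmap_coords(z.real, z.imag)
    assert np.isclose(u1, u0) and np.isclose(v1, v0 + alpha)

    # Scaling by beta shifts u only, by log(beta) (Eqs 10, 11)
    beta = 2.0
    u2, v2 = logmap_coords(3.0 * beta, 4.0 * beta)
    assert np.isclose(u2, u0 + np.log(beta)) and np.isclose(v2, v0)

Note the wrap-around: for rotations that push arg(z) past ±π the shift in v is modulo 2π, which is exactly the circular behaviour of the logmap discussed in Section 5.1.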

2.3. Inverse Mapping

The inverse mapping of the log-polar mapping is simply a polar (r ∠ θ) view of the Cartesian (u,v) co-ordinates of the logmap, since z = e^w. From Eq. (5), w = u + iv, hence

z = e^{u+iv} = e^u [cos(v) + i sin(v)] = x + iy   (12)

where

x = e^u cos(v)   (13)

and

y = e^u sin(v)   (14)

From Eq. (3),

r = √(x² + y²) = √([e^u cos(v)]² + [e^u sin(v)]²) = e^u   (15)

Thus, u = log r. Similarly,

θ = arctan[e^u sin(v) / e^u cos(v)] = v   (16)
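The inverse step can be sketched in the same illustrative style (again our own code, not the authors' implementation); together with u = log r and v = θ from Eqs (6) and (7) it round-trips exactly:

    import numpy as np

    def cartesian_coords(u, v):
        """Inverse mapping z = e^w = e^u (cos v + i sin v) (Eqs 12-14)."""
        return np.exp(u) * np.cos(v), np.exp(u) * np.sin(v)

    x0, y0 = 3.0, 4.0
    u, v = np.log(np.hypot(x0, y0)), np.arctan2(y0, x0)   # Eqs 3, 4
    x, y = cartesian_coords(u, v)
    assert np.isclose(x, x0) and np.isclose(y, y0)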

The rows of the logmap represent the spokes of the inverse mapping and the columns of the logmap represent the rings of the inverse mapping. There are an equal number of pixels in each of the rings of the inverse mapping. These pixels increase in size, from the centre to the edge of the mapping, in concentric exponential circles. This results in a huge reduction in the number of pixels required, since the pixels of the original image are averaged together at the edges of the mapping to produce a blurred peripheral view. However, they are not averaged together at the centre, resulting in a highly focused central area that retains all the useful information in the image.

This sampling structure is similar to the one found in the mammalian retina. The centre of the retina consists of a very high concentration of uniformly spaced photoreceptors. Outside the fovea, the receptive fields increase in size in concentric exponential circles, causing resolution to vary exponentially from the centre of the retina. Real-time space-variant tracking systems for remote vehicle teleoperation and autonomous robot navigation can be derived using this principle [1,7–12]. The sensor geometry provides high acuity vision at the centre of the field of view, where the attention is directed, but allows some vision towards the periphery. In the mammalian eye, the periphery is more sensitive to movement, which serves to re-direct the gaze to possible areas of interest.

It may be possible to exploit the space-variant nature of human vision to provide reduced bandwidth video-imagery with little apparent degradation in visual quality. It has been pointed out by one of the referees that the mapping is particularly suitable for a videophone image, in which high resolution is required at the centre of the videophone screen where an individual's face is positioned. The reduction of resolution at the periphery of the screen, where there is little useful information, would provide a valuable reduction in the bandwidth required to transmit the image. In all such systems, the memory and computational requirements need to be optimised to provide an acceptable performance with finite computational resources.

3. POLAR-EXPONENTIAL SENSOR GEOMETRY

For real-time applications, computational speed and available memory are of fundamental importance, since images can only be stored with a finite number of pixels. Therefore, to achieve a wide visual field and a high resolution, the data in the input image needs to be sampled using a sensor geometry based upon the log-polar mapping. We found that a conformal mapping in which the angularity was proportional to cell size produced a sensor array where the sensor cells are roughly square and increase exponentially in size with distance from the centre of the mapping. This sampling geometry is similar to one first proposed by Weiman [4–6].

3.1. Definition of Sensor Parameters

The sensor array produced consists of u concentric, exponentially spaced rings of n pixels. The indices of the logmap, which is a rectangular grid of square cells, correspond to the spokes and rings in the sensor array. Thus, the polar (r ∠ θ) co-ordinates in the sensor array will be mapped to the Cartesian (u,v) co-ordinates in the logmap, as shown in Fig. 2.

Fig. 2. Polar-exponential sensor geometry for a sensor with 12 spokes and 5 rings, resulting in a logmap with 12 rows and 5 columns.

We define the grain of the array, n, as the number of sensor cells in each of the rings of the array. Hence there are n spokes in the sensor array. These n spokes cover a range of 2π radians, and so the angular resolution of each sensor cell from the centre of the array is 2π/n. This is known as the granularity of the array, g, and has a considerable effect on its resolution:

granularity = g = 2π/n   (17)

We define the gain of the array, k, as the ratio of pixel sizes between any two successive rings of pixels in the sensor:

gain = k = r1/rmin = exp(u1·2π/n) / exp(umin·2π/n) = exp(2π/n) = exp(g),   (umin = 0, u1 = 1)   (18)

This represents the resolution of the array, which decreases from the centre to the periphery. At lower gains the sensor has a finer overall resolution than at higher gains. Thus, we can compute the position of each ring and spoke in the sensor array in terms of the granularity:

r(u) = exp(u·g)   (19)

θ(v) = v·g   (20)

where u and v are integers, varying from umin to umax and from 0 to (n − 1) respectively, which represent the indices of the logmap. We define the field width of the sensor, R, as the distance

from the centre to the edge of the mapping. This is the width of field that the sensor covers. If we assume that the fixation point is in the centre of the image and the field width is set to half the size of the image, then a circular visual field will be produced using most of the data in the original, square format, image. This omits the data at the corners of the original image, however. If this information is required, then the field width, R, needs to be set to a value that is √2 times larger. However, this results in the inclusion in the mapping of the (zero padded) regions outside the original boundaries of the square image, and also complicates operations when the image is rotated about its centre. This extended mapping range is only justified when there is an absolute requirement that the information contained in the corners of the original image be included in the mapping. In many instances, the videophone application being a good one, this will not be the case: the area of interest lies in the centre of the image, and the information lost at the corners is likely to be of little importance. The smaller mapping area would thus be perfectly adequate under such circumstances.
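The quantities defined so far reduce to a few lines of code. A minimal sketch (our own function and variable names, following the paper's n, R, rmin, dr notation):

    import numpy as np

    def sensor_geometry(n):
        """Granularity and gain of an n-spoke sensor (Eqs 17, 18)."""
        g = 2 * np.pi / n        # granularity, Eq. (17)
        k = np.exp(g)            # gain between successive rings, Eq. (18)
        return g, k

    def ring_radius(u, g):
        return np.exp(u * g)     # r(u), Eq. (19)

    def spoke_angle(v, g):
        return v * g             # theta(v), Eq. (20)

    # Example: a 64-spoke sensor
    g, k = sensor_geometry(64)   # g ~ 0.0982, k ~ 1.103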

3.2. Pixel Ranges in the Mapping

Equations (19) and (20) specify the locations of the sensor cells in the sensor array. These cells are roughly square and increase exponentially in size from the centre to the edge of the mapping. Thus different numbers of pixels in the x-y image space map to each pixel in the log-polar space, and so the range of each sensor cell needs to be computed. We define the range of a logmap pixel at array location (u,v) to be situated between r(u) and r(u + 1) in the radial direction and between θ(v) and θ(v + 1) in the angular direction, and centred at [rc(u) ∠ θc(v)] from the centre of the mapping, as shown in Fig. 3. Thus,

u(r) = floor[(1/g)·log r]   (21)

and

v(θ) = floor[(1/g)·θ],        θ ≥ 0
     = n + floor[(1/g)·θ],    θ < 0   (22)

where r and θ are the distance and angle of each ring and spoke from the centre of the sensor array. Thus, the centre of a sensor cell at (u,v) will be located at [rc(u) ∠ θc(v)] from the centre of the array, where

rc(u) = [r(u) + r(u + 1)]/2 = [exp(u·g) + exp((u + 1)·g)]/2 = exp(u·g)·[(1 + exp(g))/2] = r(u)·[(1 + k)/2]   (23)

and

θc(v) = [θ(v) + θ(v + 1)]/2 = [v·g + (v + 1)·g]/2 = v·g + g/2 = θ(v) + π/n   (24)

Hence the centre of each sensor cell will be scaled by a factor of (1 + k)/2 radially and shifted by π/n angularly from its computed mathematical position.

Fig. 3. Pixel ranges in the inverse mapping.

3.3. Sensor Cells in the Mapping

The number of sensor cells in the mapping is determined by the grain of the sensor array and by the range of u values in the mapping, i.e. n·u, where u = umax − umin + 1. umin can be determined from Eq. (21), given the grain, n, of the sensor array:

umin = floor[(1/g)·log(rmin)],   rmin > 0   (25)

where rmin is the radius of the first ring from the centre of the mapping. Due to the singularity at the origin of the mapping, there is an infinite number of rings in the sensor array. Hence, rmin is usually set to 1 so that umin = 0. The last column of pixels in the logmap, at umax, maps between r(umax) and r(umax + 1) in the inverse mapping. Therefore, we can compute the field width of the sensor, R, using Eq. (19):

R = r(umax + 1) = exp((umax + 1)·g)   (26)

Hence, if the field width and the granularity of the sensor are known, umax can be computed:

umax = floor[(1/g)·log(R) − 1]   (27)

Thus the number of sensor cells in the mapping and the range of pixels which they cover can be determined by the field width of the sensor, R, the radius of the first ring from the centre of the mapping, rmin, and the grain, n. By varying the radius of the innermost ring, rmin, the information at the centre of the mapping can be disregarded and a blind spot can be introduced into the mapping. This blind spot can be filled with a high resolution array of uniformly spaced rectangular cells, as in the retina, or with an r-θ mapping. We can also vary the size of the blind spot and affect the overall resolution of the mapping by introducing an offset, dr, into the mapping. The radius of each of the rings is then set with respect to the offset from the centre of the mapping, rather than with respect to the centre of the mapping itself, as shown in Fig. 4. Hence,

Size of Blind Spot = rmin + dr   (28)
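A sketch of these index and extent computations (hypothetical helper names; the offset dr anticipates Eqs (29)–(31) below):

    import numpy as np

    def u_of_r(r, g):
        """Ring index of a point at radius r (Eq. 21)."""
        return int(np.floor(np.log(r) / g))

    def v_of_theta(theta, g, n):
        """Spoke index of a point at angle theta (Eq. 22)."""
        v = int(np.floor(theta / g))
        return v if theta >= 0 else v + n

    def map_extent(n, R, rmin=1.0, dr=0.0):
        """umin, umax and blind spot size for an (n, R, rmin, dr) sensor."""
        g = 2 * np.pi / n
        umin = int(np.floor(np.log(rmin) / g))         # Eq. (25)
        umax = int(np.floor(np.log(R - dr) / g - 1))   # Eq. (27); Eq. (31) when dr > 0
        return umin, umax, rmin + dr                   # blind spot size, Eq. (28)

    # Example: the (64,128,1,0) sensor used in Figs 8 and 10
    umin, umax, blind = map_extent(64, 128)            # umin = 0, umax = 48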


Fig. 4. Sensor cell geometry with offset, dr.

rc(u) = r(u)·[(1 + k)/2] + dr   (29)

R = exp((umax + 1)·g) + dr   (30)

Therefore,

umax = floor[(1/g)·log(R − dr) − 1]   (31)

3.4. Notation

Given the sensor geometry defined above, we can now define any such sensor as a combination of the following input parameters:

1. The grain of the sensor array, n. This represents the number of sensor cells in any ring about the centre of the mapping.
2. The field width of the sensor, R. This can be used together with the grain to calculate the value of umax.
3. The radius of the first ring of the mapping, rmin, from the offset, dr. This can be used to calculate umin, and hence the number of sensor cells in the mapping and the range of pixels which they cover.
4. The offset, dr, from the centre of the mapping. Hence the blind spot in the mapping will be of size rmin + dr.

Therefore, we characterise all the following examples using this (n, R, rmin, dr) notation. The four numbers given in brackets within the captions to Figs 6, 10, 13 and 15–17 are the values of these mapping parameters used to generate the image to which they refer. In the next section we examine different sampling methods which can be used to interpolate the intensities of the pixels falling within the range covered by each of the pixels in the sensor array.

4. LOGMAP OF AN X-Y RASTER IMAGE

The complex logarithmic mapping can be implemented from a conventional x-y raster image by assigning the grey-scale intensities of each image pixel to the corresponding logmap pixels. Alternatively, the pixels in the image may be sampled at the coordinates of each logmap pixel. The image can then be reconstructed to form the inverse mapping from the data stored in the logmap using Eqs (21) and (22).

Figure 5 shows the two basic mapping methods used to produce the logmap of an x-y raster image. Using weighted methods, the intensity of each image pixel is assigned directly to the proportion of the corresponding sensor cells that it covers. Each logmap pixel is then computed by dividing the total intensity by the total weight of the pixels within the range of its sensor cell. If the total percentage of each pixel falling within each sensor cell is computed, then the mapping looks like the one in Fig. 5(a). If each image pixel only contributes once to each sensor cell, i.e. a many-to-one mapping, then the mapping looks like the one in Fig. 5(b). This can be computed very quickly using look-up tables, but is not fully scale and rotation invariant about the centre of the mapping. Interpolation methods, like the one shown in Fig. 5(c), attempt to produce a mapping which is fully scale and rotation invariant by interpolating the intensities of the surrounding image pixels at each sensor cell location.

Fig. 5. Ideal sensor geometry (a), sensor cell geometry for many-to-one mapping (b) and variable width interpolation mask (c).

Fig. 6. Many-to-one mapping of image (a), with (b) and without (c) remapping in the pixels at the centre of the mapping. Map parameters are (256,128,1,0).

4.1. Weighted Methods

4.1.1. Many-to-One Pixel Remapping. For many-to-one mapping each image pixel contributes only once to each logmap pixel. This can be achieved quickly using a look-up table to compute the location of each image pixel in the mapped space, so that the pixels in the image are read in raster order and immediately assigned to the corresponding logmap pixel. The intensity of each logmap pixel is then equal to the average of the intensities of the image pixels falling within its range. At the periphery of the mapping, many image pixels contribute to one logmap pixel, producing an averaged peripheral sensor view. However, at the centre of the mapping, each image pixel contributes to several logmap pixels. If each image pixel is read only once, some logmap pixels located at the centre of the mapping will not have any intensities assigned to them. This results in the sensor geometry shown in Fig. 5(b) and gives rise to a very broken mapping which is not fully scale and rotation invariant, as shown in Fig. 6(c). Bederson et al. [12–14] avoided this problem by removing the central part of the mapping and varying the map parameters so that each sensor pixel only contributes towards one logmap pixel. Bederson et al. [15–17] state that this is a more accurate model of the striate cortex and retina, and thus a more accurate reflection of what actually happens in real mammalian visual systems. To produce an unbroken mapping, the data at the centre of the mapping would either have to be remapped back into the mapping or removed by varying the values of rmin and dr. Figure 6(b) shows the results of remapping the missing pixels into the broken mapping (Fig. 6(c)).

4.1.2. Ideal Mapping. The many-to-one mapping does not compensate for pixels that fall between sensor locations, since each pixel is mapped to one logmap pixel only. A more ideal mapping can be produced by assigning the percentage of the intensity of each overlapping image pixel

to the logmap pixels that it covers. This was achieved by subdividing each pixel into smaller sub-pixels and assigning a proportion of the intensity of each pixel to the sensor cell that each sub-pixel falls within. This is shown in Fig. 7, where each overlapping image pixel is subdivided into 16 smaller sub-pixels, each with 1/16th of its intensity. This increases the computation time, which can be reduced with look-up tables so that non-overlapping pixels are not unnecessarily subdivided.

Figure 8 shows logmaps of Fig. 6(a) produced when the image pixels are expanded into one (a), four (b) and sixteen (c) sub-pixels, using map parameters (64,128,1,0). The many-to-one mapping produces a very coarse image, since the overlapping pixels contribute only to the nearest logmap pixel, and so not all of the pixels surrounding each logmap pixel are used. The coarseness of the image decreases as the number of sub-pixels increases, since more of the overlapping pixels contribute to each logmap pixel. With 16 sub-pixels, only a few of the overlapping image pixels do not contribute to each logmap pixel, as shown in Fig. 7. Increasing the number of sub-pixels beyond sixteen gives a more accurate percentage for each pixel, but increases the number of computations required. If the image is large and the mapping contains a large number of rings and spokes, each image pixel will contribute to several logmap pixels. It can then become more computationally efficient to sample the image pixels for every logmap pixel, instead of assigning each image pixel to several logmap pixels.
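As a concrete reference point, a compact version of the look-up-table approach might look as follows (Python/NumPy sketch under our own naming; the authors' Matlab implementation is not reproduced here). It performs the many-to-one remapping; the ideal mapping would additionally split boundary pixels into sub-pixels before accumulation:

    import numpy as np

    def build_lut(size, n, R, rmin=1.0):
        """Per-pixel (u, v) cell indices for a square size x size image."""
        g = 2 * np.pi / n
        umax = int(np.floor(np.log(R) / g - 1))        # Eq. (27)
        R_eff = np.exp((umax + 1) * g)                 # field width actually covered, Eq. (26)
        c = size / 2.0
        ys, xs = np.mgrid[0:size, 0:size]
        r = np.hypot(xs - c, ys - c)
        theta = np.mod(np.arctan2(ys - c, xs - c), 2 * np.pi)
        valid = (r >= rmin) & (r < R_eff)
        u = np.zeros((size, size), dtype=int)
        u[valid] = np.floor(np.log(r[valid]) / g).astype(int)   # Eq. (21)
        v = np.floor(theta / g).astype(int) % n                 # Eq. (22)
        return u, v, valid, umax

    def many_to_one_logmap(img, u, v, valid, n, umax):
        """Average image intensities into each (v, u) logmap cell."""
        total = np.zeros((n, umax + 1))
        count = np.zeros((n, umax + 1))
        np.add.at(total, (v[valid], u[valid]), img[valid])
        np.add.at(count, (v[valid], u[valid]), 1)
        return total / np.maximum(count, 1)   # cells receiving no pixel stay at zero

    # Example with the (64,128,1,0) parameters of Fig. 8(a)
    img = np.random.rand(256, 256)
    u, v, valid, umax = build_lut(256, 64, 128)
    logmap = many_to_one_logmap(img, u, v, valid, 64, umax)   # shape (64, 49): spokes x rings

Cells near the centre that receive no pixel are left at zero here, which is precisely the broken mapping of Fig. 6(c); remapping or removing them, as described above, fills the gap.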

4.2. Interpolation Methods

The logmap can also be produced by computing the location of each logmap pixel, instead of each image pixel, in the inverse mapping, and sampling the surrounding pixels in the image at each logmap pixel location. Several different sampling methods were experimented with to optimise the mapping without loss of data. Since the logmap pixel locations vary in exponential concentric circles, a nearest neighbour interpolation method worked well at the centre of the mapping but not further out, where it produced very choppy results, since data which needed to be averaged together to produce a blurred peripheral view was lost. Alternatively, a bilinear interpolation of the four nearest pixels could be sufficient at the edges of the mapping, depending on the grain, but would produce a very blurred result near the centre, where essential data would be interpolated together.

Fig. 7. Ideal mapping.

Fig. 8. Comparison of weighted methods: (a) many-to-one mapping, (b) ideal mapping expanded to four sub-pixels, (c) expanded to 16 sub-pixels. Map parameters are (64,128,1,0).

The interpolation was therefore varied so that several pixels were interpolated together at the periphery of the mapping, reducing to one pixel at the centre, so that no data was lost or incorrectly interpolated together. This was achieved by applying a mask to the image at each location, to interpolate the intensities of the surrounding pixels. The size of the mask was varied exponentially with each ring of sensor cells to closely approximate the size and shape of each sensor cell in the inverse mapping (Fig. 9). In Section 3.2 we defined the range of each logmap pixel at array location (u,v) to be between r(u) and r(u + 1) in the radial direction and between θ(v) and θ(v + 1) in the angular direction. We therefore applied a square mask of size [r(u + 1) − r(u)] to the centre of each sensor cell, which was located at [rc(u) ∠ θc(v)] from the centre of the mapping. Therefore,

Mask Width = X(u) = r(u + 1) − r(u) = exp((u + 1)·g) − exp(u·g) = exp(u·g)·[exp(g) − 1] = r(u)·[k − 1]   (32)

This mask, K, varies exponentially in size with the distance from the centre of the mapping. It is filled with a rectangular array of ones to average the data, producing a mask of resolution h and size X(u)/h. For an averaging mask of width X(u) = 0.5 and resolution h = 0.1:

K = (1/25) ·
    [ 1 1 1 1 1
      1 1 1 1 1
      1 1 1 1 1
      1 1 1 1 1
      1 1 1 1 1 ]

Thus the intensity of a logmap pixel at L(u,v) can be calculated from the original grey level intensity image I:

L(u,v) = Σ_{i=1}^{X(u)/h} Σ_{j=1}^{X(u)/h} I(xc(u,v) + i·h − [X(u) + h]/2, yc(u,v) + j·h − [X(u) + h]/2) · K(i,j)   (33)

where xc(u,v) and yc(u,v) are the coordinates of the logmap pixel locations in the image:

xc(u,v) = rc(u)·cos θc(v)   (34)

yc(u,v) = rc(u)·sin θc(v)   (35)

Figure 5(c) shows the sensor geometry resulting from this sampling method. As can be seen, the masks vary exponentially in size with distance from the centre of the mapping. The pixels are effectively divided into 1/h sub-pixels, depending on the resolution of the mask, h. Unlike the weighted methods, the mask does not depend on the location of each image pixel, and can be computed with knowledge only of its size and position. If the image is large and the number of rings and spokes is very great, then interpolation methods can be less computationally expensive than the ideal mapping. As with the ideal mapping, if the mask is used to average the intensity of the pixels which it covers, then the image pixels which fall completely within the mask can be averaged without being divided into sub-pixels. However, unlike the ideal mapping, this does not need to be precomputed into look-up tables, as it is determined only by the size and position of the mask.

Fig. 9. Variable width interpolation mask.
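A sketch of this sampling scheme (Eqs (29) and (32)–(35)) in the same illustrative style; for brevity it samples the image at nearest-pixel positions, so the sub-pixel area weighting of the ideal mapping is only approximated:

    import numpy as np

    def variable_mask_logmap(img, n, umax, h=0.1, dr=0.0):
        """Sample a square grey-level image with exponentially growing masks."""
        g = 2 * np.pi / n
        k = np.exp(g)
        cy, cx = img.shape[0] / 2.0, img.shape[1] / 2.0
        logmap = np.zeros((n, umax + 1))
        for u in range(umax + 1):
            X = np.exp(u * g) * (k - 1)                    # mask width, Eq. (32)
            rc = np.exp(u * g) * (1 + k) / 2 + dr          # cell centre radius, Eqs (23), (29)
            m = max(int(round(X / h)), 1)                  # m x m samples per mask
            offs = np.arange(1, m + 1) * h - (X + h) / 2   # sample offsets of Eq. (33)
            for v in range(n):
                thc = v * g + g / 2                        # cell centre angle, Eq. (24)
                xc, yc = rc * np.cos(thc), rc * np.sin(thc)     # Eqs (34), (35)
                xs = np.clip(np.round(cx + xc + offs).astype(int), 0, img.shape[1] - 1)
                ys = np.clip(np.round(cy + yc + offs).astype(int), 0, img.shape[0] - 1)
                logmap[v, u] = img[np.ix_(ys, xs)].mean()       # Eq. (33) with K = 1/m^2
        return logmap

Because the mask is filled with ones and normalised, the inner loop is just a local average, which is why pixels falling wholly inside the mask never need to be subdivided.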

The actual range of pixels covered is greater than that shown, depending upon the resolution of the mask. However, due to the Cartesian nature of the mask, it does not fully conform to the ideal sensor cell geometry shown in Fig. 5(a). This results in some oversampling of the data, which can cause problems when tracking moving objects within the logmap using a rectangular averaging mask.

Figure 10 shows logmaps of Fig. 6(a) produced using nearest neighbour (a), bilinear (b) and variable width interpolation mask (h = 0.1) (c) interpolation methods. The nearest neighbour mapping does not sufficiently sample the data at the edges of the mapping, where the sensor cells are large, resulting in a very coarse logmap (Fig. 10(a)). This produces an inverse mapping (Fig. 10(d)) with insufficient data at the edges of the mapping. The bilinear interpolation also does not sufficiently sample the data at the edges of the mapping, where there are many pixels per sensor cell. It also oversamples the data at the centre of the mapping, resulting in blurred data in the logmap (Fig. 10(b)) and in the centre of the inverse mapping (Fig. 10(e)). Increasing the grain of the array reduces the number of pixels which are averaged together at the edges of the mapping, but increases the size of the logmap and the number of computations required. The variable width interpolation mask varies exponentially in size with the geometry of the sensor cell. This produces a smoother logmap (Fig. 10(c)) and inverse mapping (Fig. 10(f)), which is blurred at the edges, where several pixels are averaged together, and highly focused at the centre, where there is a one-to-one mapping.

The singularity at the origin of the mapping occurs since log(0) = −∞. This is a blind spot in the mapping, which varies with the map parameters rmin and dr. Weiman suggested replacing this with an r-θ mapping [6] or with an array of rectangular pixels [5,6,18]. Schwartz [12–14] avoided this problem by removing the central part of the mapping, as noted in the previous section. Other research groups have developed their own custom sensors with logarithmic spiral geometries [1–3,8].

Fig. 10. Comparison of interpolation methods: (a) nearest neighbour, (b) bilinear interpolation, (c) variable width interpolation mask; (d), (e), (f) inverse mapping of (a), (b), (c). Map parameters are (64,128,1,0).

4.3. Computational Requirements

The logmap was formed using the various methods described above, with all necessary variables precomputed in look-up tables, for images of different sizes with a grain, n, of 64. The average execution time and number of floating point operations were computed on a 200 MHz Pentium Pro running Matlab 4.2c.1 under Windows NT 4.0, and are shown in Table 1. The nearest neighbour and bilinear interpolation methods produced the logmap very quickly. However, with bilinear interpolation, the data is insufficiently sampled at the edges of the mapping and oversampled at the centre. Increasing the size of the image results in an increase in the computation time with the size of the logmap, not with the size of the image. However, this results in very sparsely spaced sample points throughout the mapping, which cannot be sufficiently sampled using bilinear interpolation.

As shown previously, the many-to-one and ideal mapping with four sub-pixels produce a coarse logmap, since not all of the overlapping pixels contribute to each logmap pixel. Increasing the number of sub-pixels increases the computation time proportionally. This is reduced by precomputing in the look-up table the image pixels which do not overlap and do not need to be subdivided. By using look-up tables, the number of flops used in the ideal mapping with 16 sub-pixels can be reduced from 16 times to between four and six times as many flops as the many-to-one mapping. The number of flops varies with the grain of the array and the image size. If the grain is very large with respect to the image size, then there are more overlapping pixels and fewer pixels which do not need to be subdivided, resulting in a greater number of flops.

The variable width interpolation mask with an averaging mask of resolution h = 0.25 is approximately equal to subdividing each pixel by four. This produces a coarse mapping, and so higher resolutions are required to produce a smooth logmap without undersampling. Doubling the size of the image, with the grain kept constant, doubles the width of the mask and so results in a fourfold increase in the computation time required. This can be reduced by precomputing, in a look-up table, those pixels which do not overlap the edges of the mask.

5. COMPARISON OF SAMPLING METHODS

To compare the methods described above, a 256 × 256 image (Fig. 6(a)) was taken and transformed into the mapped space with parameters (64,128,1,0), using the many-to-one mapping, ideal mapping (16 sub-pixels) and variable width interpolation mask (h = 0.1). The resulting logmap was then two-dimensionally correlated, using a phase-only filter [19], with a mapped version of itself which had been rotated and scaled prior to being transformed. This was repeated for several scale factors and rotation angles, until sufficient data had been collected for analysis.
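The phase-only correlation itself is straightforward to express; a minimal sketch of the filter of Horner and Gianino [19] (our own code, with a small epsilon added to avoid division by zero):

    import numpy as np

    def phase_only_correlation(scene, ref):
        """Circularly correlate scene with a phase-only filter built from ref [19]."""
        S = np.fft.fft2(scene)
        Rf = np.fft.fft2(ref)
        pof = np.conj(Rf) / np.maximum(np.abs(Rf), 1e-12)   # keep phase, discard magnitude
        return np.abs(np.fft.ifft2(S * pof))

    # The peak location gives the circular (v, u) shift between the two logmaps,
    # and hence the rotation and scale of the input (Sections 5.1-5.3):
    # corr = phase_only_correlation(logmap_rotated, logmap)
    # v_shift, u_shift = np.unravel_index(np.argmax(corr), corr.shape)

Note that FFT-based correlation is circular, matching the periodic wrap-around of the logmap noted in Section 5.3.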

5.1. Rotation Invariance

Equations (8) and (9) show that the rotation of an image about its focal point results in a vertical shift in the mapped image by the rotation angle. Since our mapping is periodic, the logmap also wraps around when the image is rotated. Thus if the image is rotated to the left, the rows of pixels in the logmap that are shifted vertically out of range at the top reappear at the bottom of the logmap. The test image was digitally rotated using a bicubic interpolation method. To aid with rotation, a circular visual field was chosen (R = 128) so that the data present at the edges of the image was ignored. The grain was set to 64 to enable better discrimination between the different sampling methods, a higher grain resulting in a higher overall resolution. The input image was digitally rotated to the left in increments equal to the granularity (2π/64 radians, or 5.625°), so that the logmap would shift vertically from bottom to top by one row of pixels each time the image was rotated. Thus,

Rotation angle = (2π/n)·r = g·r,   r = 0, 1, 2, ..., (n − 1)   (36)
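In other words, rotating the input by r granularity steps should simply roll the logmap rows; a one-line check in the same sketch style (assuming a logmap array with spokes along axis 0, as in the earlier sketches):

    import numpy as np

    def predicted_logmap_after_rotation(logmap, steps):
        """Rotation by steps*g radians = circular shift of the spoke axis (Eq. 36)."""
        return np.roll(logmap, shift=steps, axis=0)

Scaling by k^s (Eq. (37) below) would analogously shift columns along the u axis, but without wrap-around, since data is lost at the edge of the field.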

Figure 11 shows the correlation peak intensity at different rotation angles for many-to-one mapping, ideal mapping (16 sub-pixels), and variable width interpolation mask (h = 0.1).

Table 1. Average computation time and number of flops for different logmap formation methods

                               64 × 64            128 × 128           256 × 256
                               (64,32,1,0)        (64,64,1,0)         (64,128,1,0)
Method                         Time (s)  MFlops   Time (s)  MFlops    Time (s)  MFlops

Weighted
  Many-to-one                  0.05  0.033641     0.16  0.109182      0.71  0.405907
  Ideal (4 sub-pixels)         0.06  0.080545     0.22  0.217078      0.82  0.637627
  Ideal (16 sub-pixels)        0.16  0.320746     0.61  0.808863      2.20  1.964212

Interpolation
  Nearest neighbour            0.05  0.026880     0.17  0.032256      0.61  0.037632
  Bilinear interpolation       0.05  0.067200     0.22  0.080640      0.66  0.094080
  Variable (h = 0.25)          0.11  0.127828     0.33  0.408633      1.32  1.527164
  Variable (h = 0.125)         0.21  0.408790     0.87  1.539060      3.41  6.050033
  Variable (h = 0.1)           0.33  0.619705     1.27  2.396523      5.05  9.458084


Fig. 11. Correlation peak intensity at different rotation angles for many-to-one mapping (Series 1), ideal mapping (16 sub-pixels) (Series 2), and variable width interpolation mask (h = 0.1) (Series 3).

The many-to-one mapping showed the greatest ripple, since it produced the coarsest images. At 90°, 180°, 270° and 360° the correlation peak intensity was the same as the autocorrelation value, due to the Cartesian nature of the input image. At other angles it was lower, decreasing with distance from the correlation position and reaching a minimum at angles corresponding to π/64 (2.8125°). Increasing the granularity decreases the distance offset for each pixel and improves the overall resolution. Other errors arose from the rotation method and from the sampling errors previously described. The ideal and variable width interpolation methods produced identical results with a lower autocorrelation value, which was the same at 90°, 180°, 270° and 360°. In between, the values were lower and varied with little ripple. Thus, an image could be determined to be rotation invariant if its correlation peak energy fell within a certain threshold of the autocorrelation peak energy of the input image. This value changes with different input images for a given grain.

5.2. Scale and Projection Invariance

Equations (10) and (11) show that scaling an image about its focal point results in a horizontal shift in the mapped image by the log of the scale change. The test image was digitally zoomed in increments equal to the gain of the array (Eq. (37)), so that the logmap would shift horizontally from left to right by one column of pixels with each increment. However, unlike the rotation invariance, this results in a loss of data. As the columns of pixels in the logmap are shifted from right to left when the image is scaled, the data shifted out of the mapping does not wrap around to the left. Instead, this data is lost and the data at the centre of the mapping becomes more highly focused:

Scale Factor = exp(s·g) = k^s,   s = 0, 1, 2, ...   (37)

Furthermore, since the image was digitally zoomed, there is less data available in the scene, as the data in the image has already been sampled. Other errors also arise from the interpolation methods used in the digital scaling algorithm.

Fig. 12. Correlation peak intensity at different scale factors for many-to-one mapping (Series 1), ideal mapping (16 sub-pixels) (Series 2), and variable width interpolation mask (h = 0.1) (Series 3).

Figure 12 shows the correlation peak height at different scale factors using many-to-one mapping, ideal mapping (16 sub-pixels), and variable width interpolation mask (h = 0.1). The results produced were identical. The initial decrease from the autocorrelation value is probably due to the high sensitivity of the phase-only filter to the small disruptions in the image produced by re-sampling; it must be remembered that the use of a phase-only filter to correlate the images is equivalent to edge enhancing the images prior to matched filtering [19]. For scale factors between 1.1 and 1.48 the correlation peak height remains approximately constant before dropping off linearly. Since the image was digitally zoomed, less data was available in the centre of the mapping as it was scaled, and so the correlation values decrease linearly with scale factor from 1.48 onwards.

5.3. Primary Peak Location

Fig. 13. Effect of rotation and scaling on logmap of Fig. 8(a) with mapping parameters (64,128,1,0).

Figure 13 shows the pixel coordinate of the primary peak in the correlation plane of the logmap (many-to-one mapping) of the image as it is scaled and rotated. Correlating the logmap with a logmap of the rotated scene produced a

single correlation peak in the correlation plane. This shifted vertically for scale changes, in proportion to the amount of scaling, and horizontally for rotation, in proportion to the amount of rotation. Additionally, the correlation peak wraps around the edges of the correlation plane for rotation past 180°, due to the periodic nature of the mapping, i.e. it is a circular correlation. As the image is zoomed, the location of the peak moves upwards by constant amounts. However, the peaks do not wrap around, since data is lost as the image is zoomed in and the data at the centre becomes more focused. This results in a drop in the correlation peak height as the image is scaled. (However, some of this loss must also be due to sampling effects and the scaling algorithm used on the finite resolution image.)

In summary, the results show that the ideal and variable width interpolation methods produce better results than the many-to-one mapping when the image is scaled or rotated about the centre of the image. Rotation of the input image results in a translation in the mapped image, and a corresponding displacement of the correlation peak in the correlation plane. Scaling of an image towards the focus of expansion results in an orthogonal movement in the mapped image and a corresponding translation of the correlation peak. Therefore, by observing the location of the correlation peak in the correlation plane, the amount of scaling or rotation of the input image can be determined.

However, sampling errors still occur due to the Cartesian nature of the input image and to the methods used to scale and rotate the image. Higher values of grain will reduce the errors and lead to finer overall resolution, although at an increased computational cost.

5.4. Resolution

In this section we discuss the effect of varying the map parameters on the resolution, depth of field and field of view of the mapped image, using the variable width interpolation mask described in Section 4.2. We also investigate the effect of varying the size of the blind spot in the centre of the mapping and its effect on resolution.

5.4.1. Effect of Varying the Granularity

Fig. 14. Inverse mapping of a 512 × 512 image (a) at different granularities: (b) n = 1024, (c) n = 512, (d) n = 256, (e) n = 128, (f) n = 64.

Figure 14 shows images reconstructed from logmaps of a 512 × 512 image at different granularities. As can be seen, the overall resolution is determined by the grain of the array. When n = 512, there is a one-to-one mapping at the periphery, resulting in a sharp peripheral area. However, there is no extra information at the centre of the mapping, since the logmap is the same size as the input image. At lower values of grain the resolution at the periphery becomes more blurred, but there is still enough information available at the centre to recognise the image. The image at n = 64 (Fig. 14(f)) is 64 × 56 pixels in size, but retains enough information to track the image and determine its orientation. This represents a compression ratio of 73.1:1, facilitating real-time image processing operations.

5.4.2. Effect of Different Resolution Inputs

Fig. 15. Input image at (a) 512 × 512, (b) 256 × 256 and (c) 128 × 128 pixel resolution. The corresponding logmaps are shown in (d), (e) and (f). The corresponding inverse mappings are (g) with mapping parameters (128,256,1,0), (h) with mapping parameters (128,128,1,0) and (i) with mapping parameters (128,64,1,0). Grain is fixed at 128.

Figure 15 shows the effect of varying the field width by mapping an image sampled at different resolutions using a grain of 128. These mappings represent different compression ratios, ranging from 18.3:1 to 1.5:1. At the periphery of the mappings the required information is still retained, since the data there is interpolated by the mapping to produce a blurred peripheral area. However, the size of the input image ultimately determines the available resolution at the centre of the

mapping. For smaller input images the mapping is less focused at the centre, where there is less data present. Therefore the information in the scene can be more effectively retained if it is sampled to a higher resolution before being mapped. This also represents a greater compression ratio, but occurs at the expense of computation time and available memory.

5.4.3. Effect of the Blind Spot

Due to the geometry of the log-polar mapping, the image pixels at the centre of the mapping are sampled several times, and so most of the data stored in the logmap corresponds to this area. By varying the size of the blind spot, some of the data at the centre of the mapping can be removed, and the available storage space can be reassigned to the peripheral areas of the mapping. Equation (28) shows that the blind spot in the mapping is dependent upon the radius of the inner ring, rmin, and its offset from the centre of the mapping, dr. In this section we discuss the effects of varying these parameters on the size of the blind spot and on the overall resolution.

Fig. 16. Effect of varying rmin on a 256 × 256 image. Offset dr is set to zero. The mapping parameters are (a) (128,128,1,0), (b) (128,128,25,0) and (c) (220,128,25,0). The corresponding logmaps are (d), (e), (f).

Effect of varying rmin: we can change the value of the inner ring of the sensor, rmin, so that the information at the centre of the array is not used. This reduces the overall size of the resulting image, and therefore a larger grain can be used to produce a finer resolution image without any extra storage requirements. From Figs 16(d) and 16(e), it can be seen that changing the value of rmin from 1 to 25 changes the compression ratio from 5.2:1 to 15.5:1, at the expense of the data at the centre of the mapping. Figure 16(f) shows that the available storage space can instead be allocated to an array with a higher grain to achieve a better peripheral resolution, again at the expense of the data at the centre of the mapping. This blind spot can also be filled with an array of rectangular pixels

or with an r-θ mapping, to increase the resolution at the centre of the mapping.

Fig. 17. Effect of varying offset dr on a 256 × 256 image. rmin is set to 1. The mapping parameters are (a) (128,128,1,0), (b) (128,128,1,15) and (c) (128,128,1,25).

Effect of varying the offset, dr: the gain of the array determines the change in depth that occurs between successive rings of pixels, and has a greater effect at larger ring distances. Therefore, by increasing the offset, dr, so that the radius of each ring is set from the offset rather than from the centre of the mapping, the resolution at the periphery can be increased. However, this will increase the size of the blind spot and reduce some of the information available at the centre of the array. Figure 17 shows that, although the overall size of the logmap has remained constant, the resolution at the periphery has been increased at the expense of the data in the centre of the mapping. We can also increase the value of rmin in the mapping to remove the stretched parts of the logmap, where the data is difficult to recognise. Therefore, by adjusting the values of the inner radius and the offset, together with the grain and field width of the sensor array, we can increase the peripheral area and reduce the size of the image, at the expense of the data present at the centre of the mapping.

6. CONCLUSION

In this paper, we have outlined the important features of the complex logarithmic mapping and the advantages of using log-polar sensors over conventional raster sensors for space-variant image processing applications. The effectiveness of our sensor model depends greatly upon the geometry used and the sampling methods employed to interpolate the pixel intensities together. We introduce an ideal weighted sampling method, where the pixels in the raster image lying at the intersection of sensor cells are subdivided into smaller sub-pixels, to produce a very smooth mapping which is scale and rotation invariant. Instead of computing the location of every image pixel in the logmap, interpolation methods are used to interpolate the surrounding image pixels at each sample point. Similar results to the ideal mapping can be obtained using a variable width interpolation mask, whose size varies exponentially with the size and shape of the cells in the sensor array. We compare the computational requirements of these methods and show that they are scale and rotation invariant when the image is scaled or rotated about its centre. Due to the Cartesian nature of our mask, there exists a small margin of error, where pixels outside the range are inadvertently interpolated together and pixels inside the range are not used. However, these results very effectively illustrate the advantages that could be obtained by the use of this sampling geometry in real-time tracking applications, where computational and memory requirements need to be kept to a minimum.

We have also shown the benefit of varying the map parameters and the effect of introducing a blind spot into the centre of the mapping. This has the effect of increasing the resolution of the peripheral data at the expense of the data in the centre of the mapping.

References

1. Sandini G, Dario P. Active vision based on space-variant sensing. 5th International Symposium on Robotics Research 1989, pp 75–83
2. Etienne-Cummings R, Van der Spiegel J, Mueller P, Zhang MZ. A foveated silicon retina for two-dimensional tracking. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 2000; 47(6): 504–517
3. Van der Spiegel J, Born I, Claeys C, Debusschere I, Sandini G, Dario P, Fantini F. A retina-like space variant CCD sensor. SPIE Charge-Coupled Devices and Solid State Optical Sensors 1242, 1990, pp 133–140
4. Weiman CFR, Chaikin G. Logarithmic spiral grids for image processing and display. Computer Graphics and Image Processing 1979; 11: 197–226
5. Weiman CFR. 3-D sensing with polar exponential sensor arrays. Digital and Optical Shape Representation and Pattern Recognition: Proc SPIE Conf on Pattern Recognition and Signal Processing 938, 1988, pp 78–87
6. Weiman CFR. Exponential sensor array geometry and simulation. Digital and Optical Shape Representation and Pattern Recognition: Proc SPIE Conf on Pattern Recognition and Signal Processing 938, 1988, pp 129–137
7. Weiman CFR. Video compression via log-polar mapping. Real-Time Image Processing II: SPIE Symposium on OE/Aerospace Sensing 1295, 1990, pp 266–277
8. Tistarelli M, Sandini G. On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 1993; 15(4): 401–410
9. Schwartz EL, Greve D, Bonmassar G. Space-variant active vision: definition, overview and examples. Neural Networks 1995; 8(7/8): 1297–1308
10. Jain R, Bartlett SL, O'Brien N. Motion stereo using ego-motion complex logarithmic mapping. IEEE Transactions on Pattern Analysis and Machine Intelligence 1987; 9(3): 356–359
11. Frazier J, Nevatia R. Detecting moving objects from a moving platform. Proc 1992 IEEE International Conference on Robotics and Automation 1992, pp 1627–1633
12. Bederson B, Wallace R, Schwartz EL. A miniaturised space-variant active vision system: Cortex-I. Machine Vision and Applications 1995; 8: 101–109
13. Wallace R, Ong P-W, Bederson B, Schwartz EL. Space variant image processing. International Journal of Computer Vision 1994; 13(1): 71–90
14. Yeshurun Y, Schwartz EL. Shape description with a space-variant sensor: algorithms for scan-path, fusion and convergence over multiple scans. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989; 11: 1217–1222
15. Rojer AS, Schwartz EL. Design considerations for a space-variant visual sensor with complex-logarithmic geometry. 10th International Conference on Pattern Recognition 1990; 2: 278–285
16. Schwartz EL. Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Research 1980; 20: 645–669
17. Schwartz EL, Merker B, Wolfson E, Shaw A. Computational neuroscience: applications of computer graphics and image processing to 2D and 3D modelling of the functional architecture of the visual cortex. IEEE Computer Graphics & Applications 1988; 8(4): 13–23
18. Weiman CFR, Juday R. Tracking algorithms using log-polar mapped image co-ordinates. Intelligent Robots and Computer Vision VIII: SPIE Proceedings on Intelligent Robots and Computer Vision 1192, 1990, pp 843–853
19. Horner JL, Gianino PD. Phase-only matched filtering. Applied Optics 1984; 23: 812–816

Originality and Contribution

Computational speed and available memory are of fundamental importance in real-time vision applications. Hardware realisation of a space-variant sensor has been attempted by several research groups, producing varied results due to the difficulty of producing exponentially spaced arrays of photodiodes. An easier, but more computationally expensive, method of implementing this sensor geometry is to take an image from a conventional x-y raster imaging sensor and sample the pixels contained in the image at various points, corresponding to the location of each of the pixels in the sensor array. In this case the sampling process and the interpolations required to produce a logmap become of fundamental importance for real-time applications. In this paper we propose several methods to quickly and efficiently produce the logmap of an x-y raster image whilst still retaining the most important data in the image. Data is sampled using a sensor geometry based upon a Weiman polar exponential grid. We introduce an ideal weighted sampling method, where the pixels are divided into smaller sub-pixels, and an interpolated sampling method, using a variable width interpolation mask, whose size varies exponentially with the size and shape of the cells in the sensor array. We compare the computational requirements of these methods and also show that they are scale and rotation invariant when the image is scaled or rotated about its centre. We then show the effect of varying our map parameters on the overall resolution. Finally, we introduce a blind spot into the centre of our mapping, and discuss several ways in which the resolution at the periphery can be increased by sacrificing some of the data in the centre of the mapping.

Cheyne Gaw Ho graduated from Sussex University with a first class Honours degree in Mechatronics in 1996, and obtained a DPhil in 2000 for his research on sampling methods in image processing. He is currently employed by Sensatech Ltd, where he is working on the application of tomographic reconstruction techniques for capacitive sensor arrays applied to mine detection. His current interests are in capacitive sensor arrays and associated signal processing techniques for signal conditioning and reconstruction.

Rupert Young obtained both his undergraduate and PhD degrees from Glasgow University Engineering Faculty. Until 1993 he was employed within the Laser and Optical Systems Engineering Research Centre at Glasgow, during which time he gained wide experience in optical systems engineering and image/signal processing techniques. He participated in two European funded electro-optical projects involving pan-European collaboration between leading European universities and industry, the second of which was proposed and led by Glasgow University. In 1995 he was appointed a Lecturer in the School of Engineering at the University of Sussex, a Senior Lecturer in 1998 and a Reader in 1999. There, he is continuing research into various aspects of optical pattern recognition, digital image processing and electro-optic system design, applying this to a wide range of problems of industrial relevance. He has over 70 publications in peer reviewed academic journals and international conferences, many of them invited papers to special issues, and he has been invited as a keynote speaker to several conference sessions. He chairs sessions in the conference on Optical Pattern Recognition held each year by SPIE in Orlando, Florida. He is a member of the Society of Photo-Optical Instrumentation Engineers (SPIE), the Optical Society of America and the IEEE.

Chris Chatwin holds the Chair of Industrial Informatics and Manufacturing Systems (IIMS) at the University of Sussex, UK, where, inter alia, he is Director of the South East Advanced Technology Hub (SEATH), the IIMS Research Centre and the Laser and Photonic Systems Research Group. Before moving to Sussex, Professor Chatwin spent 15 years at the University of Glasgow, Engineering Faculty, Scotland, where as a Reader he was head of the Laser and Optical Systems Engineering Centre and the Industrial Informatics Research Group. He has published two research-level books, one on numerical methods and the other on hybrid optical/digital computing, and more than 150 international papers which focus on optics, optical computing, signal processing, optical filtering, holography, laser materials processing, laser systems and power supply design, laser physics beam/target interactions, computational numerical methods, robotics, instrumentation, digital image processing, intelligent digital control systems and digital electronics.

Correspondence and offprint requests to: R. Young, School of Engineering and Information Technology, University of Sussex, Brighton, Sussex BN1 9QT, UK. Email: r.c.d.young@sussex.ac.uk