FPGA-based Pipeline Architecture to Transform Cartesian Images into Foveal Images by Using a New Foveation Approach

Jose Martinez, Leopoldo Altamirano
National Institute of Astrophysics, Optics and Electronics, Computer Science Department
Luis Enrique Erro # 1, Sta. Ma. Tonantzintla, Puebla, 72840, México
{josemcr, robles}@inaoep.mx

Abstract. In vision systems, image processing represents a bottleneck because of the large amount of information that must be analyzed. Working with variant spaces over the visual field has been widely proposed as a way to reduce this information. Foveal vision is one such proposal: it transforms the visual field obtained with conventional cameras into a sampling with high resolution at the center and decreasing resolution toward the periphery, as in mammalian vision systems. In this paper, an FPGA-based architecture to transform conventional images into foveal images is presented. The hardware algorithm is taken from a new proposal for foveating images. Parallelism and pipelining are exploited to obtain high performance. With both the reduction of the visual field and the real-time transformation of digital images into foveal images, a vision system can accelerate its performance and meet real-time constraints.

1. Introduction

Foveal vision has emerged as an alternative approach in computer vision for several vision tasks. Initially inspired by the biological vision system of primates, foveal vision has been modeled through mathematical formulations that allow changing from the invariant space implied by conventional digital images to a variant space based on a non-uniform sub-sampling of the visual field, with high resolution at the center and sparser sampling along the periphery. This sampling reduces the amount of information and therefore the processing time, which is very useful in vision systems with real-time constraints. The Log Polar Transform (LPT) [1] is one of the most representative approaches to working with foveal space, and several works have been based on it to solve

many problems, such as image registration [2], steganography [3], control and visual servoing [4], and mainly object recognition in indoor scenes [5] and object tracking [6,7]. The last of these has received much attention because object tracking algorithms are complex and demand considerable processing time. Foveal vision thus emerged as a way to reduce the amount of information the tracker must process, by reducing the size of the images without losing significant information. The premise is to fixate the high-resolution area on the object of interest while the periphery, with its decreasing resolution, keeps details of the background. The advantage of non-uniform resolution in this case lies in assigning importance to objects according to their proximity to the center, or foveal area (the area with high resolution). The decreasing sub-sampling implied by the LPT helps to reduce the noise produced by unimportant objects in the scene while still preserving information about them. These ideas motivate the challenge of incorporating foveal vision into trackers, but two issues arise as major problems for researchers. (1) The foveation process, that is, the process of transforming conventional Cartesian images into foveal images, still consumes processing time that affects the real-time constraint. (2) The LPT produces non-linear images, which makes the task of locating and recognizing the object of interest even more complex. Moreover, many algorithms designed for object tracking rely on the linear relation between pixels, which is broken when the shape of the object becomes abrupt or split in the foveal image [8]; see Figure 1.e. These two concerns have led to the proposal of (1) hardware solutions for the LPT-based foveation process, such as hardware algorithms [9] or even the log-polar camera [10], and (2) new foveation proposals to solve the problem of non-linearity of the

1-4244-0690-0/06/$20.00 ©2006 IEEE. Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on May 25, 2009 at 08:23 from IEEE Xplore. Restrictions apply.

LPT, such as the Exponential Cartesian Geometries (ECG) [11], and of course their corresponding hardware solutions to save processing time in the foveation process. In fact, despite the advantages of ECG, a big disadvantage is that ECG is a pyramidal solution that involves generating a set of images to build the pyramid; to keep the real-time constraint, ECG must be implemented with hardware algorithms, either in VLSI [12], on a MIMD architecture [13], or with FPGA solutions [14]. Furthermore, beyond the processing time, a problem of the image pyramid is the change of resolution between its levels, which can introduce ambiguity when determining the position of the object of interest [8]. In this paper, an FPGA-based architecture for a foveation algorithm is presented. This foveation algorithm is part of a new proposal that we are using for an object tracking application. The proposal overcomes the problems of non-linearity by using a non-uniform sub-sampling that preserves the linearity and adjacency relations of pixels. The paper is organized as follows. Section 2 describes the new foveation approach. Section 3 describes the FPGA architecture based on this new approach. Section 4 shows experiments and results related to the architecture, and finally, conclusions are discussed in Section 5.

2. New Foveal Cartesian Approach

Figure 1 shows the basis of the Log Polar Transform and how a grid of points, divided into rings and sectors, models the retina. These points can be mapped to a log-polar coordinate system, and this mapping is used to transform images, as shown in Figure 1. As can be seen, a point (x, y) in the Cartesian plane is mapped to the log-polar plane by calculating its angle with respect to the center and its radius, estimated as a logarithmic distance from the center. However, such modeling deforms the image (Figure 1.e), and although the size of the foveal image is reduced considerably, the non-linearity makes it very hard to apply

conventional algorithms such as correlation to detect and recognize the object of interest. In designing a new foveation method, the solution was to attack the problem in the inverse way. The foveal map is assumed to be a square image in which a square center corresponds to the foveal area with high resolution (green area in Figure 2.a), and the square rings around it correspond to the periphery rings (brown and red squares in Figure 2.a). This is similar to a compaction of the original model of Schwartz [1], with the difference that instead of circular rings and equispaced angular divisions for photoreceptors, the rings are square and the photoreceptors vary on each square ring to allow the compaction. In this way, to locate each pixel of the foveal image in the Cartesian image, it is only necessary to expand the squares at increasing distances from the center, leaving the square area over the center of the image (Figure 2.b). To transform foveal coordinates into Cartesian coordinates, it is necessary to know which ring corresponds to which coordinate and what the distance of this ring is in the Cartesian image. Here a proportion Δρ(s) arises, where s denotes the ring number. Thus, the foveal coordinate (once the foveal image center fcx is subtracted) should be multiplied by this factor and added to the center (x0, y0) of the fovea in the Cartesian image. It is important to note that Δρ is denoted as a vector because the factor differs for each ring: the proportion at which rings are located is not linear but is assumed to be logarithmic. A similar procedure is used to convert Cartesian coordinates to foveal coordinates, with the difference that coordinates are subtracted from the center (x0, y0), then divided by the factor Δρ(s) and added to the center value fcx. Clearly, in both cases, mapping and inverse mapping, if the coordinate belongs to the fovea then it does not undergo any transformation.

Cartesian coordinates: (x, y)

ρ = log √((x − x0)² + (y − y0)²)

θ = arctan((y − y0) / (x − x0))

Foveal coordinates: (ρ, θ)
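As a concrete illustration of the two equations above, the following minimal Python sketch maps a single Cartesian point to log-polar coordinates (the function name is ours, not from the paper):

```python
import math

def log_polar(x, y, x0, y0):
    """Map a Cartesian point (x, y) to log-polar coordinates (rho, theta)
    with respect to the fixation center (x0, y0)."""
    r = math.hypot(x - x0, y - y0)       # Euclidean distance to the center
    rho = math.log(r)                    # logarithmic radial coordinate
    theta = math.atan2(y - y0, x - x0)   # angular coordinate
    return rho, theta

# A point at Euclidean distance e from the center, on the positive x-axis:
print(log_polar(10 + math.e, 10, 10, 10))   # -> approximately (1.0, 0.0)
```

Note that the radial coordinate is undefined at the center itself (r = 0), which is one reason practical samplings keep a uniformly sampled fovea around the fixation point.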

(a) (b) (c) (d) (e) (f) Figure 1. (a) Distribution of the photoreceptors according to the Log Polar Transform. (b) The log-polar grid positioned over the Cartesian plane. (c) The log-polar plane resulting from the mapping from Cartesian to foveal coordinates (angular and log-radial positions). (d) A Cartesian image on which the log-polar sampling is shown. (e) The log-polar image obtained by foveation with the Log Polar Transform. (f) Mapping from Cartesian coordinates to log-polar coordinates.


In Table 1, both procedures, mapping (column 2) and inverse mapping (column 3), are defined, as well as the way in which the ring number s can be found in each case. An example of this transform is shown in Figure 2. Finally, the distances at which square rings should be located in the Cartesian image are defined by the vector ρ(s) given in Table 1 (column 1), where s denotes the ring number. This relation is taken from the model of Kruger in [1], with slight modifications. The distance at which each ring is positioned follows an exponential proportion whose base a is defined by the number of desired rings (Nr), the fovea radius (ρ0), and the maximum distance (ρmax) where the last ring should be positioned. An advantage of this transform is that, to obtain a moving fovea, it is only necessary to change the (x0, y0) values. The computation of the transform using either the mapping or the inverse mapping is fast. If the inverse mapping is used as the foveation method, a one-to-one relation between the foveal image and the Cartesian image is established, so a look-up table can be built to save the calculation of the transform. If the mapping procedure is used, an averaging step is required because the relation is many-to-one.
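Because the inverse mapping is one-to-one, the look-up table mentioned above can be built once and then reused for every frame, turning foveation into a single gather operation per pixel. A minimal Python sketch of this idea (the function names are ours; `inverse_map` stands for any foveal-to-Cartesian transform, such as the one in Table 1, column 3):

```python
import numpy as np

def build_lut(fov_h, fov_w, inverse_map):
    """Precompute, for every foveal pixel, the Cartesian pixel it samples.
    inverse_map(xp, yp) -> (x, y) is the foveal-to-Cartesian transform."""
    lut = np.empty((fov_h, fov_w, 2), dtype=np.int32)
    for yp in range(fov_h):
        for xp in range(fov_w):
            lut[yp, xp] = inverse_map(xp, yp)
    return lut

def foveate(image, lut):
    """Foveation as a pure gather: one memory read per foveal pixel."""
    xs, ys = lut[..., 0], lut[..., 1]
    return image[ys, xs]

# Toy usage with a plain shift as the "transform": a 2x2 foveal image
# sampling the interior of a 4x4 Cartesian image.
img = np.arange(16).reshape(4, 4)
lut = build_lut(2, 2, lambda xp, yp: (xp + 1, yp + 1))
print(foveate(img, lut))   # samples img[1:3, 1:3]
```

This table-driven structure is also what makes the transform attractive for hardware: the per-pixel arithmetic disappears into a precomputed address stream.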

3. Hardware Implementation

Hardware algorithms have become practical because nowadays there are many tools that make their development possible. The Very High Speed Integrated Circuit Hardware Description Language (VHDL) gives the designer an easy way to program digital circuits. In addition, software tools such as ISE or the Xilinx Toolbox for Matlab allow an easy design of digital circuits using VHDL, as well as the downloading of the design to FPGA boards for fast simulation and testing. For this work, the Xilinx Toolbox for Matlab was used to design, co-simulate, and test the hardware architecture of the foveation process described in the previous section, specifically the hardware implementation of the inverse mapping shown in Table 1, third column.

Table 1. Algorithms for the Foveal Cartesian Geometry

Column 1: factor a and square-ring distance definition
  Nr: number of rings
  ρ0: fovea radius
  ρmax: maximum distance where the last ring should be positioned
  a = e^((ln ρmax − ln ρ0) / Nr)
  ρ(s) = ⌊ρ0 · a^s⌋,  s = 1, ..., Nr

Column 2: Mapping(x, y)
  1. x = x − x0
  2. y = y − y0
  3. ρ = max(|x|, |y|)
  4. if ρ < ρ0 then
  5.    x' = x + fcx
  6.    y' = y + fcy
  7. else
  8.    s = ⌈log_a(ρ / ρ0)⌉
  9.    Δρ(s) = ρ(s) / (ρ0 + s)
  10.   x' = ⌊x / Δρ(s)⌋ + fcx,  y' = ⌊y / Δρ(s)⌋ + fcy

Column 3: InverseMapping(x', y')
  1. x' = x' − fcx
  2. y' = y' − fcy
  3. r = max(|x'|, |y'|)
  4. if r < ρ0 then
  5.    x = x' + x0
  6.    y = y' + y0
  7. else
  8.    s = r − ρ0 + 1
  9.    Δρ(s) = ρ(s) / (ρ0 + s)
  10.   x = x' · Δρ(s) + x0,  y = y' · Δρ(s) + y0
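The ring-distance definition and the inverse mapping described in Section 2 can be sketched in software as follows. This is a sketch under two assumptions of ours, since the printed table is only partially legible here: each square ring occupies one pixel of width in the foveal image, and the proportion Δρ(s) is taken as ρ(s)/(ρ0 + s).

```python
import math

def make_rings(rho0, rho_max, nr):
    """Base a and ring distances rho(s): an exponential progression
    from the fovea radius rho0 out to rho_max over nr rings."""
    a = (rho_max / rho0) ** (1.0 / nr)   # a = e^((ln rho_max - ln rho0)/Nr)
    return a, [rho0 * a ** s for s in range(1, nr + 1)]

def inverse_mapping(xp, yp, fcx, fcy, x0, y0, rho0, rho):
    """Foveal (xp, yp) -> Cartesian (x, y).
    Assumes each square ring is one foveal pixel wide."""
    xp, yp = xp - fcx, yp - fcy
    r = max(abs(xp), abs(yp))
    if r < rho0:                         # inside the fovea: identity shift
        return xp + x0, yp + y0
    s = int(r - rho0) + 1                # ring number of this foveal pixel
    dp = rho[s - 1] / (rho0 + s)         # proportion delta_rho(s)
    return round(xp * dp) + x0, round(yp * dp) + y0

# Example: 4 rings from a fovea of radius 4 out to distance 64 gives a = 2,
# so the rings sit at Cartesian distances 8, 16, 32, 64.
a, rho = make_rings(4.0, 64.0, 4)
print(inverse_mapping(2, 1, 0, 0, 0, 0, 4, rho))   # fovea pixel: (2, 1)
print(inverse_mapping(4, 0, 0, 0, 0, 0, 4, rho))   # first ring: (6, 0)
```

Note that the one-pixel-wide rings make ring lookup in the inverse direction a simple subtraction, which is precisely what keeps the hardware pipeline cheap.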
