LUT-based Image Rectification Module Implemented in FPGA
Cristian Vancea, Sergiu Nedevschi
Computer Science Department, Technical University of Cluj-Napoca, Gh. Baritiu 28, RO-400027, Cluj-Napoca, Romania
Cristian.Vancea@cs.utcluj.ro, Sergiu.Nedevschi@cs.utcluj.ro
Abstract
This paper presents a real-time hardware architecture able to perform image rectification and image distortion removal simultaneously. The entire process is based on Look-Up Tables (LUTs) relating pixels from the rectified image and the original image with sub-pixel precision. For increased flexibility, we created a parameterized VHDL version of the design, which allows us to generate different hardware configurations based on adjustable parameters: image resolution and the number of bits used to store the sub-pixel precision. Other advantages of the proposed solution worth mentioning are scalability (it can be replicated for any number of rectified images) and portability on many FPGA-based hardware platforms. We analyze the performance of different configurations on a VirtexE600 FPGA. As rectifying an image based on bilinear interpolation has a blurring effect, with negative consequences on 3D reconstruction, we increased the sub-pixel precision and studied the impact on 3D lane detection accuracy, processing time and resource usage inside the chip.
1. Introduction
The goal of stereovision applications for Autonomous Cruise Control (ACC) systems is to compute properties of the 3D world using a pair of two or more images. Crowded urban traffic environments are more difficult to handle if sufficient 3D information is not available. Enough information can be extracted from dense disparity/depth maps; however, generating them represents a computationally intensive task, because it involves finding, for each pixel in one image, the corresponding pixel in the other image(s). The correct corresponding point is defined as the pixel representing the same physical point in the scene. We deal with a 1D search along the epipolar line. If the images were obtained using a canonical stereovision system, the search is performed on the same row or column. Such a system requires the image planes to be parallel and the epipolar lines to be collinear and parallel to one of the image axes. Even if obtaining a canonical system is almost impossible, it is possible to apply a rectification process over the images, which will make them appear as if they were captured using a canonical configuration. The rectified images can be thought of as captured by a new stereo system, obtained by rotating the original cameras around their optical centers. Most algorithms for image rectification calculate the new rectifying Perspective Projection Matrix (PPM) used to project 3D points onto the rectified image planes [1, 3, 4, 8]. The rectification matrix is obtained using the original PPM and the new PPM. The original PPM is computed using the intrinsic and extrinsic parameters, which can be calculated through a calibration process [5, 6]. Other methods exploit the fundamental matrix [2, 7] in order to extract the epipolar line geometric information.

1-4244-1491-1/07/$25.00 ©2007 IEEE

Several integrated stereovision systems are provided with different real-time hardware-based implementations. They perform image rectification in the early stages of the stereo reconstruction phase. The team from Carnegie Mellon University [9] developed a five-camera stereovision machine capable of generating 200x200 dense depth maps at a speed of 30 fps. The images were processed through a series of consecutive steps consisting of Laplacian of Gaussian (LOG) filtering, histogram equalization and image rectification. The design was built using an array of C40-DSP modules, high-speed ROMs, RAMs, pipeline registers, convolution units, digitizers and ALUs.
The PARTS (Programmable And Reconfigurable Tool Set) engine consists of 16 Xilinx 4025 FPGAs and 16 one-megabyte SRAMs. The FPGAs are connected in a partial toroidal topology, each being connected with two adjacent SRAMs. The interconnection strategy adopted in [10] allows a high communication capability between all elements distributed on the board. This system achieves a performance of 42 fps for 320x240 disparity images.
SRI International's Small Vision Module [11] was based entirely on an ADSP 2181 running at 33MHz.
With a theoretical performance of 8 fps on 160x120 images, it achieved a practical performance of only 6 fps due to communication overhead with the host computer. Prior to disparity computation, images were transformed by applying image rectification and LOG filtering.
TYZX developed a stereo depth computation module [12]. The design, based on a highly parallel pipelined architecture, was implemented in ASIC. The pair of stereo cameras connects directly to the board, so the PCI bus and memory were not burdened with image data. They developed three families of devices having a baseline of 5cm, 22cm, and 33cm, respectively. The images were rectified in order to obtain a canonical configuration.
Recently, graphics processing units (GPUs) on graphics boards have become increasingly programmable. A multi-view stereo algorithm running completely on the GPU is presented in [14]. Several matching methods are compared; however, images are rectified first. The proposed method was implemented in OpenGL. Tests on an ATI Radeon 9800 graphics card revealed a processing time of 45ms for a pair of 512x512 images.
A miniature stereo-vision system was introduced in [13]. It was capable of generating high-resolution disparity maps using a trinocular system attached to an FPGA-based hardware module. Running at a frequency of 60MHz, it reached a speed of up to 30 fps. Before disparity computation, captured images are rectified based on the PPM, and filtered using LOG.
In conclusion, many previous hardware-based stereovision machines aim at dense 3D reconstruction of the environment, which requires image rectification as a pre-processing step. Unfortunately, most of them do not mention any details about the rectification method used, nor give a specific description of the hardware design performing this step. In this paper we present a pipeline hardware architecture capable of performing real-time image rectification. A previous work on this matter was presented in [15].
However, in the present case the rectification is based on Look-Up Tables (LUTs), therefore we are able to combine image rectification with image distortion removal in one unique step, with insignificant cost on the processing time. The architecture is suitable only for rigid stereo-heads, which are calibrated only once. The LUTs are computed offline, based on the camera parameters, and loaded into the chip at system startup. The hardware architecture is described in VHDL. The image resolution can be dynamically set using VHDL generics (configuration parameters), and the entire module is technology independent, thus it can be ported on different FPGA devices. In this context we are able to take advantage of faster technology as it appears on the market, the processing time being strictly dependent on it. Tests performed on 640x512 images with an old-class FPGA board (VirtexE family) revealed a small processing time, which makes our proposed solution suitable for real-time applications.
2. Image rectification and image distortion removal
Considering the pinhole model, it is possible to compute, for each pixel in the original image, its corresponding position in the rectified image, if only a rotation of the camera is considered. According to the mathematical model presented in [8], the relation between the two pixel positions is:

i_r = A_c * R_c * R_o^(-1) * A_o^(-1) * i_o = A_c * [x'/z'  y'/z'  1]^T    (1)

where:
[x'  y'  z']^T = R_c * R_o^(-1) * A_o^(-1) * i_o    (2)
i_r, i_o - position of the pixel in the rectified image and original image;
A_c, A_o - matrix of intrinsic parameters for the canonical camera and original camera;
R_c, R_o - rotation matrix for the canonical camera and original camera.

Radial distortion of a point p = [x, y, 1]^T (where x = x'/z', y = y'/z' are normalized coordinates expressed in the Camera Reference Frame) may be estimated using the following set of equations [16]:

[x_r]   [x * (k1*r^2 + k2*r^4 + ...)]
[y_r] = [y * (k1*r^2 + k2*r^4 + ...)]    (3)

where:
r^2 = x^2 + y^2
k1, k2, ... - radial distortion parameters (one or two parameters are enough to compensate this type of distortion).

Another type of distortion appears when the lens curvature centers of the optical system are not collinear [17]. This distortion has a tangential component:

[x_t]   [2*p1*x*y + p2*(r^2 + 2*x^2)]
[y_t] = [p1*(r^2 + 2*y^2) + 2*p2*x*y]    (4)

where:
r^2 = x^2 + y^2
p1, p2 - tangential distortion parameters.

Inserting these distortions into (1) we obtain the following equation relating the original and rectified pixel positions:

i_r = A_c * [x + x_r + x_t,  y + y_r + y_t,  1]^T    (5)

where:
[x, y, 1]^T = [x'/z', y'/z', 1]^T.

Based on (5) we are able to generate the LUT
relating pixels from the rectified image and the original image, with sub-pixel precision. The strategy is to parse the rectified image and use the LUT to find the corresponding position of each pixel in the original image. As described in [15], the rectified pixel intensity is computed using bilinear interpolation of the four neighboring pixels in the original image.
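As an illustration of how such a LUT may be generated offline, the sketch below follows the common inverse-mapping formulation (back-rotate the rectified ray, apply the distortion of (3)-(4), project with the original intrinsics). All calibration values are placeholders, not the parameters of our stereo head, and the fixed-point conversion mirrors the left-shift described in Section 3.

```python
import numpy as np

# Placeholder calibration (assumptions, NOT the paper's stereo head).
A_o = np.array([[800.0, 0, 320], [0, 800.0, 256], [0, 0, 1]])  # original intrinsics
A_c = np.array([[800.0, 0, 320], [0, 800.0, 256], [0, 0, 1]])  # canonical intrinsics
R_o = np.eye(3)               # original camera rotation
R_c = np.eye(3)               # canonical (rectifying) rotation
k1, k2 = -0.25, 0.05          # radial distortion parameters, eq. (3)
p1, p2 = 1e-3, -5e-4          # tangential distortion parameters, eq. (4)
FRAC_BITS = 8                 # sub-pixel precision bits stored in the LUT

def lut_entry(u_r, v_r):
    """Map a rectified pixel (u_r, v_r) back to the original image with
    sub-pixel precision, returning fixed-point (u, v) LUT values."""
    # Invert the rotation chain of eq. (1): back-rotate the rectified ray.
    ray = R_o @ np.linalg.inv(R_c) @ np.linalg.inv(A_c) @ np.array([u_r, v_r, 1.0])
    x, y = ray[0] / ray[2], ray[1] / ray[2]        # normalized coordinates
    r2 = x * x + y * y
    # Radial displacement, eq. (3), and tangential displacement, eq. (4).
    radial = k1 * r2 + k2 * r2 * r2
    x_r, y_r = x * radial, y * radial
    x_t = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_t = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Project the distorted point with the original intrinsics.
    u, v, w = A_o @ np.array([x + x_r + x_t, y + y_r + y_t, 1.0])
    # Fixed point: integer part plus FRAC_BITS bits of sub-pixel precision.
    return int(round((u / w) * (1 << FRAC_BITS))), int(round((v / w) * (1 << FRAC_BITS)))
```

With identity rotations and equal intrinsics, only the distortion terms move a pixel, so the principal point maps exactly onto itself.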
3. Hardware implementation

Image rectification can be divided into two major steps:
1. offline computation of the LUT for each camera (performed once for a given calibrated stereo system);
2. online rectification based on the LUT and bilinear interpolation.
The second phase is time consuming, therefore we propose a fast hardware-based solution, implemented in FPGA. The design requires the presence of only two SDRAMs, one for the LUT and one for the image to be rectified. The input of the architecture is the LUT and the original image, and the output is the rectified image. However, the LUT is loaded only once, when the system is started, while the images are loaded sequence by sequence. The hardware architecture can be divided into several major parts (Fig. 1):
* Data Splitter;
* Image Reader;
* FIFO;
* Join Module;
* Bilinear Interpolator.
The image to be rectified and the LUT are stored in two different memories, in order to avoid bottlenecks. The communication with these memories is handled through memory controllers. All the above-mentioned modules represent independent pipeline structures, therefore we used the one-way DFLOW (Data FLOW) communication protocol introduced in [15]. Its architecture contains a DATA bus and two handshaking signals, RDY and WEN: when the source has valid data, it outputs the data onto the DATA bus and enables the WEN signal; when the destination is ready to receive data, it sets the RDY signal high; when both RDY and WEN are set, data transfers take place on each clock cycle.
Data Splitter is responsible for receiving LUT data containing the position in the original image of the current pixel in the rectified image. This data encapsulates the integer part and the fractional part of the horizontal (u) and vertical (v) image coordinates, which means they have to be split first, in order to be sent to the other modules separately. It must be mentioned that the hardware architecture does not work with real numbers; consequently, the fractional part is converted to an integer by left-shifting with the desired number of precision bits.
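The splitting of a packed LUT word can be sketched as below. The field widths match the one-word-per-pixel configuration of Section 4 (10+9 = 19 integer bits for a 640x512 image, 7+6 = 13 fractional bits); the exact field order inside the word is an assumption, since Fig. 2 is only partially recoverable.

```python
# Assumed field widths (one-word-per-pixel configuration, 640x512 image).
U_INT, V_INT, U_FRAC, V_FRAC = 10, 9, 7, 6

def pack(u_int, v_int, u_frac, v_frac):
    """Pack one LUT entry into a single 32-bit word (assumed field order)."""
    word = u_int
    word = (word << V_INT) | v_int
    word = (word << U_FRAC) | u_frac
    word = (word << V_FRAC) | v_frac
    return word

def split(word):
    """Data Splitter behavior: recover the four fields from a LUT word."""
    v_frac = word & ((1 << V_FRAC) - 1); word >>= V_FRAC
    u_frac = word & ((1 << U_FRAC) - 1); word >>= U_FRAC
    v_int = word & ((1 << V_INT) - 1); word >>= V_INT
    return word, v_int, u_frac, v_frac   # remaining bits are u_int
```

The hardware version performs the same operation with wiring alone; no arithmetic is needed to separate the fields.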
Figure 1. Image rectification module (for simplicity reasons we did not represent the clock signals attached to each module).

Sometimes the number of bits needed to store the integer part and the fractional part exceeds the number of bits in one word of memory. In such a case, each pixel position is stored on two memory data words (Fig. 2). We have developed two separate hardware designs (Fig. 3 and 4) in order to handle each case separately. Both solutions use a counter which registers the number of received data words, in order to signal when the entire LUT has been completely scanned. In case the input data is stored on two separate data words, an extra one-bit register is needed, in order to decide whether the current input data word is the first or the second; this register toggles its output between 0 and 1. An important observation is that the splitter de-asserts the ready signal on the input port once the LUT has been completely loaded, until a reset command is received.
Image Reader receives the integer parts of the u and v coordinates inside the original image and outputs the four neighboring pixels found at positions (u, v), (u+1, v), (u, v+1), (u+1, v+1). As the original image is stored inside the memory, this module has to compute the corresponding memory addresses for the required pixels and send them to the memory controller. In response, the memory controller replies with the required data, which is further sent to the output in pairs of four neighboring pixels. Considering that data is stored inside the SDRAM in 32-bit words, four successive pixels are accessible by performing one read operation. As a consequence, the Image Reader module was built as an efficient cache (described in [15]) that can both provide pixels at a much higher rate and significantly reduce the memory load.
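The address computation performed by the Image Reader can be sketched as follows, assuming a row-major image of 8-bit pixels with four pixels per 32-bit SDRAM word (the layout itself is an assumption consistent with the text):

```python
IMG_WIDTH = 640                   # pixels per row (assumed row-major, 8-bit pixels)

def neighbor_words(u, v):
    """Return the sorted set of 32-bit word addresses covering the four
    neighbors (u,v), (u+1,v), (u,v+1), (u+1,v+1)."""
    addrs = set()
    for dv in (0, 1):
        for du in (0, 1):
            pixel = (v + dv) * IMG_WIDTH + (u + du)
            addrs.add(pixel // 4)   # one SDRAM word holds four successive pixels
    return sorted(addrs)
```

Whenever u mod 4 != 3, the four neighbors fit in just two words (one per row), and because consecutive LUT entries tend to reference overlapping neighborhoods, a small cache of recently read words avoids most re-fetches.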
Figure 4. Data Splitter architecture in the case when one memory data word is stored per pixel.
The FIFO module receives data from the input, when available, and sends it to the output, whenever requested, based on the DFLOW protocol. There are cases when data propagated through separate paths needs to be synchronized under one unique flow. This is the case for the pipelined branches containing the neighboring pixels and the fractional parts of the u and v coordinates (Fig. 1). At one point these flows have to be joined, in order to supply the input data for the Bilinear Interpolator. As a consequence, we have developed a Join Module (Fig. 5) containing two separate input ports and one output port. All input/output ports communicate using the DFLOW protocol.
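The behavior of the Join Module can be modeled with generators: an output is produced only when both input flows have data available, which is what the paired DFLOW handshakes enforce in hardware. The data values below are illustrative only.

```python
def join(flow_a, flow_b):
    """Join Module behavior: emit one output element only when both
    input flows can supply data (mirrors the two DFLOW input ports)."""
    for a, b in zip(flow_a, flow_b):
        yield a + b   # concatenate neighboring pixels with sub-pixel offsets

# Illustrative streams: four neighboring pixels, and (u_frac, v_frac) offsets.
pixels = iter([(10, 20, 30, 40), (11, 21, 31, 41)])
offsets = iter([(128, 64), (130, 60)])
joined = list(join(pixels, offsets))
```

If one flow stalls, `zip` (like the hardware handshake) simply waits; no element is consumed from the faster flow until its partner arrives.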
Figure 2. LUT storage in memory: a) when using two memory data words (one word holds the fractional parts for the u and v coordinates, the other the integer parts); b) when using one memory data word (unused bits, fractional parts and integer parts packed together).
Figure 3. Data Splitter architecture in the case when two memory data words are stored per pixel.

Figure 5. Join Module architecture.

Bilinear Interpolator receives the four neighboring pixels along with the sub-pixel offsets (fractional values converted to integers) on the u and v axes, interpolates the
values according to (6), and sends the resulting pixel intensity to the output:

P = P1*(1 - v_frac) + P2*v_frac = P1 + (P2 - P1)*v_frac    (6)

where:
P1 = pix1*(1 - u_frac) + pix2*u_frac = pix1 + (pix2 - pix1)*u_frac    (7)
P2 = pix3*(1 - u_frac) + pix4*u_frac = pix3 + (pix4 - pix3)*u_frac    (8)
pix1, pix2, pix3, pix4 - neighboring pixel intensities.
As (7) and (8) can be performed in parallel, the interpolation process was optimized to only a pair of consecutive multiplication stages. In addition, the number of multiplication units was reduced to 3.
From a top-level view (Fig. 6), the time diagram of the rectification process encounters three major phases (Fig. 7):
1. LUT load into memory;
2. original image load into memory;
3. rectified image computation.
Once the first phase is completed, the process enters a continuous cycle containing phase two and phase three.
Figure 6. Top-level view of the architecture (for simplicity reasons we did not represent the clock signals attached to each module).

The first phase consists in loading the LUT from the software module into one of the memories, through the PCI bus. We use Memory Controller units which are specifically designed to control these memories. They contain three input ports and one output port, modified to support communication using the DFLOW protocol:
* write address input port;
* write data input port;
* read address input port;
* read data output port.
Raw communication with the memory is handled through a set of specific signals and buses, according to the type of memory used (e.g. for SDRAM we may encounter RAS, CAS, bank address, cell address, data, mask and chip select). Addresses to the Memory Controllers are generated by Counter Dflow units (Fig. 8). These are counters which increment themselves when data is requested on the DFLOW protocol. Once they reach a given value (the size of the LUT or the size of the image), they start counting again from 0.

Figure 7. Top-level Control Unit designed to handle the 3 states.

The Top-level Control Unit handles phases 1, 2 and 3, as shown in Fig. 7. As a consequence, the original image is loaded during the second phase, while in the last phase the Image Rectification Module (Fig. 1) is enabled, in order to generate the rectified image. This cycle containing phases 2 and 3 is repeated frame by frame, until a global reset is encountered.
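The factored interpolation of equations (6)-(8) can be sketched in integer arithmetic as below; 8 fractional bits per axis are assumed, matching the first hardware configuration.

```python
FRAC_BITS = 8                 # sub-pixel precision bits per axis (assumption)

def interpolate(pix1, pix2, pix3, pix4, u_frac, v_frac):
    """Bilinear interpolation factored as in eqs. (6)-(8): three multipliers
    arranged in two consecutive multiplication stages, integers only."""
    # Stage 1: eqs. (7) and (8), evaluated in parallel in hardware.
    p1 = (pix1 << FRAC_BITS) + (pix2 - pix1) * u_frac
    p2 = (pix3 << FRAC_BITS) + (pix4 - pix3) * u_frac
    # Stage 2: eq. (6); the final shift drops the accumulated fixed-point scale.
    return ((p1 << FRAC_BITS) + (p2 - p1) * v_frac) >> (2 * FRAC_BITS)
```

The factored form P1 + (P2 - P1)*v_frac needs one multiplier per equation, which is how the design reaches three multipliers instead of the eight a naive weighted sum would require.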
4. Experiments

The proposed LUT-based hardware rectification architecture was tested using a Strathnuey board equipped with a Ballyderl DIME module containing a VirtexE FPGA (model V600EFG680) and 2x64MB SDRAMs. The LUT and the images were sent from the PC through the PCI bus, and the rectified images were read back into the PC. The LUT contents and the rectification matrix were computed offline. The steps used were:
1. move the LUT through the PCI bus to one of the FPGA-board memories;
2. move the image through the PCI bus to the other FPGA-board memory;
3. process the image inside the FPGA;
4. load the rectified image back to the PC through the PCI bus and jump to step 2.
The resulting images were tested against those obtained using a software implementation. For reasons of space inside the VirtexE FPGA chip, we implemented hardware rectification only for images captured with the left camera. A comparison between the results obtained with both solutions can be seen in Fig. 9. The pixels in the rectified image (Fig. 9.a) which had corresponding pixels outside the original image were initialized to zero, therefore they appear as black borders on the margins of the rectified image.
When using a hardware implementation we have the advantage of defining any number of precision bits when performing the bilinear interpolation. The more precision bits used, the better the results. On the other hand, the software implementation uses a fixed number of precision bits, by shifting all rectification parameters to the left with 8 bits. In this way, the computations involved are performed using only integers, which can be easily implemented using the MMX instruction set. The hardware solution works in parallel with the processor, allowing it to perform other tasks.
The software results were obtained using 8 bits for sub-pixel precision. For the hardware setup we used two different configurations. The width of one LUT data word is 32 bits, out of which 19 bits are needed to represent the integer parts of the u and v coordinates. Therefore only 13 bits remain to represent the fractional part. The first configuration used 8 precision bits for both axes, making a total of 16 precision bits, which means we had to use two LUT data words for each pixel in the image (see Fig. 2). The second configuration used 7 precision bits for the u axis and 6 precision bits for the v axis, which makes a total of 13 bits; thus we needed only one LUT data word for each pixel of the rectified image. As a consequence, the LUT in the second case became two times smaller.
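The storage implication of the two configurations can be checked with a short sketch; the image size and word width are those of our tests, while the packing arithmetic is the straightforward reading of Fig. 2.

```python
IMG_W, IMG_H = 640, 512       # test image resolution
WORD_BITS = 32                # LUT data word width

def lut_bytes(u_int, v_int, u_frac, v_frac):
    """Bytes of LUT storage per image, given per-axis integer and
    fractional field widths; one or two 32-bit words per pixel."""
    bits = u_int + v_int + u_frac + v_frac
    words = 1 if bits <= WORD_BITS else 2
    return IMG_W * IMG_H * words * (WORD_BITS // 8)

two_word = lut_bytes(10, 9, 8, 8)   # 19 + 16 = 35 bits -> two words per pixel
one_word = lut_bytes(10, 9, 7, 6)   # 19 + 13 = 32 bits -> one word per pixel
```

Dropping from 8+8 to 7+6 fractional bits is exactly what lets the entry cross the 32-bit boundary, halving the LUT memory footprint.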
Figure 9. a) Rectified image with hardware module; b) difference between hardware and software rectification (intensities were multiplied for clarity reasons); c) average difference image on 40 consecutive frames.
As seen in Figures 10 and 11, point reconstruction and object detection gave very similar results when performing software and hardware rectification. All tracked objects were estimated at the same distance, in front of the car, in all cases. However, differences appear in the accuracy of lane detection. The borders of the detected lanes are re-projected at slightly different positions inside the rectified image, due to small differences in the 3D coordinates of the reconstructed points. These are caused by the different pixel intensities obtained after rectification in the tested cases. Moreover, even if the current lane was detected clearly, the neighboring lane appears only when using image rectification with increased sub-pixel precision. In the other case - decreased sub-pixel precision - the image is a little more blurred and some points situated in the far left side of the image are ignored in the lane detection phase, due to improper 3D reconstruction.
Figure 10. Lane detection and object detection using rectified images with: a) software module; b) hardware module with 8 precision bits on both axes; c) hardware module with 7 precision bits on u axis and 6 precision bits on v axis (bright grids - current lane; dark grids - side lane; points inside boxes - tracked object points; boxes delimit tracked objects).
Figure 11. Top view of the reconstructed points using: a) software rectification; b) hardware rectification with 8 precision bits on both axes; c) hardware rectification with 7 precision bits on u axis and 6 precision bits on v axis (bright grids - current lane; dark grids - side lane; points inside boxes - tracked object points; boxes delimit tracked objects; points around lane borders - road points).
The size of the images accepted by our hardware design can vary by simply changing a few configuration parameters of the VHDL description. In our tests we used 640x512 images. We were able to run our architecture at 90MHz. We registered a total time of 11.8ms (85fps) when using two LUT data words for each pixel (8 precision bits on both axes) and 10.6ms (94fps) when using one LUT data word for each pixel (7 precision bits on u axis and 6 precision bits on v axis). The time required to transfer an image to/from the FPGA is between 2.5 and 3ms. On a Virtex4-LX160 FPGA, running at 500MHz, we estimate a processing time of 1.2ms, without considering the image transfer
between the FPGA and the PC. Regarding the resource usage inside the chip, Table 1 contains a short overview for the different configurations.

Table 1. Resource usage inside the VirtexE FPGA.

                       Slices   FlipFlops   LUTs    BRAMs
Total                  6912     13824       13824   72
2 LUT words/pixel      1705     2075        2459    24
  Percentage of total  24%      15%         17%     33%
1 LUT word/pixel       1670     2020        2380    22
  Percentage of total  24%      15%         17%     31%
5. Conclusion

In this paper we proposed a hardware solution for the problem of image rectification. It was built based on LUTs, therefore we were able to combine image rectification with image distortion removal in one single warping process. For greater flexibility, the hardware architecture was described in VHDL, with configurable parameters such as the image resolution and the number of precision bits to be used in the computations. Tests on a VirtexE family FPGA proved real-time performance of the system. Regarding accuracy, the results were very close to those obtained with an already existing software module.
6. References
[1] A. Fusiello, E. Trucco, A. Verri, "Rectification with unconstrained stereo geometry", British Machine Vision Conference, pp. 400-409, 1997.
[2] C. Loop, Z. Zhang, "Computing Rectifying Homographies for Stereo Vision", IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, Vol. 1, pp. 125-131, June 23-25, 1999.
[3] N. Ayache, C. Hansen, "Rectification of Images for Binocular and Trinocular Stereovision", International Conference on Pattern Recognition, Vol. 1, pp. 11-16, Rome, Italy, 1988.
[4] A. Fusiello, E. Trucco, A. Verri, "A compact algorithm for rectification of stereo pairs", Machine Vision and Applications, Vol. 12, No. 1, pp. 16-22, 2000.
[5] J.Y. Bouguet, Camera Calibration Toolbox for Matlab. Available: http://www.vision.caltech.edu/bouguetj/calib_doc
[6] S. Nedevschi, C. Vancea, T. Marita, T. Graf, "On-line calibration method for stereovision systems used in vehicle applications", IEEE Intelligent Transportation Systems Conference, pp. 957-962, Toronto, Canada, September, 2006.
[7] M. Pollefeys, R. Koch, L. Van Gool, "A simple and efficient rectification method for general motion", International Conference on Computer Vision, pp. 496-501, Corfu, Greece, 1999.
[8] C. Vancea, S. Nedevschi, "Analysis of different image rectification approaches for binocular stereovision systems", IEEE International Conference on Intelligent Computer Communication and Processing, Vol. 1, pp. 135-142, Cluj-Napoca, Romania, September, 2006.
[9] T. Kanade, A. Yoshida, K. Oda, H. Kano, M. Tanaka, "A Stereo Machine for Video-rate Dense Depth Mapping and Its New Applications", International Conference on Computer Vision and Pattern Recognition, pp. 196-202, San Francisco, USA, June, 1996.
[10] J. Woodfill, B. von Herzen, "Real-Time Stereo Vision on the PARTS Reconfigurable Computer", IEEE Symposium on FPGAs for Custom Computing Machines, pp. 201-210, Napa Valley, CA, USA, April, 1997.
[11] K. Konolige, "Small Vision Systems: Hardware and Implementation", 8th International Symposium on Robotics Research, Hayama, Japan, October, 1997.
[12] J. I. Woodfill, G. Gordon, R. Buck, "Tyzx DeepSea High Speed Stereo Vision System", IEEE Computer Society Workshop on Real Time 3D Sensors and Their Use, Conference on Computer Vision and Pattern Recognition, Washington D.C., June, 2004.
[13] Y. Jia, X. Zhang, M. Li, L. An, "A Miniature Stereo-Vision Machine (MSVM-III) for Dense Disparity Mapping", 17th International Conference on Pattern Recognition, Vol. 1, pp. 728-731, August, 2004.
[14] R. Yang, M. Pollefeys, "A Versatile Stereo Implementation on Commodity Graphics Hardware", Journal of Real-Time Imaging, Vol. 11, pp. 7-18, 2005.
[15] C. Vancea, S. Nedevschi, M. Negru, S. Mathe, "Real-Time FPGA-based Image Rectification System", International Conference on Computer Vision Theory and Applications, Vol. 1, pp. 93-100, Setubal, Portugal, February, 2006.
[16] J. Heikkila, O. Silven, "A Four-step Camera Calibration Procedure with Implicit Image Correction", Conference on Computer Vision and Pattern Recognition (CVPR), p. 1106, 1997.
[17] C. Slama, Manual of Photogrammetry, American Society of Photogrammetry and Remote Sensing, Falls Church, Virginia, USA, 1980.