Real-time, 3-D-multi object position estimation and tracking

J. Kaszubiak, M. Tornow, R. W. Kuhn and B. Michaelis
Otto-von-Guericke University Magdeburg
Faculty of Electrical Engineering and Information Technology
Institute for Electronics, Signal Processing and Communication (IESK)
PO Box 4120, D-39106 Magdeburg, Germany
[email protected]

Abstract

For autonomously acting robots and driver assistance systems, powerful optical stereo sensor systems are required. Object positions and environmental conditions have to be acquired in real-time. In this paper a hardware-software co-design is applied within the presented stereophotogrammetric system. For the calculation of the depth map, an optimized algorithm is implemented as a hierarchical, parallel hardware solution. By adapting the image resolution to the distance, real-time processing becomes possible. The object clustering and the tracking are realized in a processor. The density distribution of the disparity values in the depth map (disparity histogram) is used for object detection. A Kalman filter stabilizes the resulting parameters.

1. Introduction

Autonomously acting cars and robots require fast environment-recognition systems. Real-time tasks in particular, such as driver assistance systems ([3], [4]), need short response times. This can be achieved by employing massively parallel hardware as well as fast, reliable software. Our application is the online 3-D object detection, position estimation and tracking of vehicles in a captured video stream. For this purpose a stereophotogrammetric camera system is mounted on a vehicle. For the real-time measurement an online-calculated depth map is necessary. In our case the real-time requirement for the rear-view measurement is the continuous online processing of a 25 Hz video stream. The processing chain consists of algorithms with different demands:

• real-time calculation of the depth map
• object detection and position estimation
• tracking of relevant objects within the image sequence

The detailed structure of the objects can be ignored, as only the mean positions of the objects need to be calculated. These positions have to be determined with a mean relative accuracy over a large measuring range (approximately 10-120 m). This has to be achieved within a short period (one image capture time), also for newly appearing objects. These requirements cannot be met by simple tracking algorithms without an exact starting position. Image pairs captured by two cameras are compared using area correlation (KKFMF [2]). The objects are segmented from the distance map by a cluster algorithm and tracked over several images (e.g. with a Kalman filter). Afterwards the results are presented to the operator in a convenient format. The system response time must be optimized for continuous real-time processing, so that the driver or a higher-ranking system can react in time.

2. Hardware-software co-design

The aim of using a hardware-software co-design is to guarantee the short and deterministic calculation times needed for real-time processing. Thus the algorithms have to be partitioned with respect to their functionality. Dataflow-oriented algorithms with opportunities for parallel computing should be realized in programmable hardware. Control-flow-oriented algorithms with a sequential working scheme should be realized in processor-based software. Fig. 1 shows the hardware-software co-design of the presented system. The platform used is the Altera FPGA EPXA10 with approximately 40,000 logic cells and a hard-core ARM9 processor with a 166 MHz clock frequency. The correlation search can be performed by a single hardware correlator when the algorithm is subdivided and a uniform processing method is chosen. Because of the constant data flow, the hardware correlator, including the chosen robust quality criterion (the squared KKFMF), can be realized as a fully parallel and synchronous design in this hardware device (see section 4).


Figure 1. Hardware-software co-design (hardware: edge detection, correlation, subpixel interpolation; software: clustering & 3-D, Kalman filter)

After the correlation a subpixel interpolation is applied to increase the accuracy of the distance measurement. In the hardware implementation the subpixel resolution is limited to 1/8 pixel. The most important advantage of the hardware implementation is the possibility of computing while the image is still being captured. Using the hardware at maximum performance, the shortest possible system response times for calculating the depth map can be reached. The data is then transmitted to the ARM9 processor for further, more complex processing steps such as clustering (section 3) and Kalman filtering [5] for tracking. Furthermore, section 5 describes a method for calculating the 3-D points with simultaneous correction of systematic errors that is suitable for embedded software. Because the captured imagery depicts rapidly changing scenes, namely fast-moving vehicles in road traffic, the incoming data stream needs to be processed as fast as possible. The hardware-software co-design described above achieves this objective.
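The subpixel step can be illustrated with a minimal sketch of quadratic (parabolic) interpolation around a correlation maximum, quantized to the 1/8-pixel resolution mentioned above. The function name, the argument layout and the rejection of non-peaks are illustrative assumptions, not the hardware implementation.

```python
def subpixel_peak(q_left: float, q_peak: float, q_right: float,
                  step: float = 1.0 / 8.0) -> float:
    """Parabolic interpolation of the correlation maximum.

    q_left, q_peak, q_right are the quality-criterion values at the integer
    disparities d-1, d, d+1.  Returns the fractional offset relative to d,
    quantized to 'step' (1/8 pixel, mirroring the hardware limit).
    """
    denom = q_left - 2.0 * q_peak + q_right
    if denom >= 0.0:                # not a proper maximum; keep integer position
        return 0.0
    offset = 0.5 * (q_left - q_right) / denom
    offset = max(-0.5, min(0.5, offset))      # stay between the neighbours
    return round(offset / step) * step        # quantize to 1/8 pixel
```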

3. Software – Clustering and Tracking

The aim of the clustering is to detect raised moving objects such as cars in the depth map (Fig. 2a) and to track them over time. The depth map is the interface between the hardware and the software. A car causes an accumulation of similar disparity values in the depth map; these disparity values depend on the distance (depth) of the vehicle. Evaluating the columns of the depth map yields a depth histogram of one image. In this histogram cars appear as local maxima (Fig. 2b). By analyzing these histograms over time it is possible to track cars. A second map (the accumulated histogram), similar to the depth histogram, is generated to accomplish this task (Fig. 2c). The values in the accumulated histogram correspond to the age of the maxima in Fig. 2b. If a value in the histogram of one image is above a certain threshold, it belongs to a raised object; at these positions the age in the accumulated histogram is increased by one. The age is decreased if there is no raised object at this position. If the age reaches a certain threshold, the position is part of a raised moving object. All regionally connected parts form a single cluster and are labelled with a unique cluster number. For the center of gravity of each cluster the 3-D coordinates are calculated. To save computation costs, only these points are handed over to the Kalman filters; for each cluster one filter exists. The Kalman filter is used for the tracking and for the estimation of the distance, the lateral offset and the speed of the clusters. Through the low-pass characteristic of a Kalman filter, fast changes such as jumps are smoothed and a better accuracy is achieved. The result of the clustering is shown in Fig. 2d: each cluster has a unique number and a velocity relative to the camera vehicle. The result of analyzing the distance of a moving object is shown in Fig. 3.
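The following NumPy/SciPy sketch illustrates one update of the accumulated disparity histogram and the subsequent labelling of connected regions. The depth-map layout (disparity per pixel, 0 meaning "no measurement"), the threshold values and all function names are assumptions made for illustration; they are not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import label

def cluster_step(depth_map, age, count_thresh=5, age_thresh=3, max_age=15):
    """One clustering update over the (column, disparity) plane.

    depth_map : 2-D array of integer disparities per pixel (0 = no measurement)
    age       : accumulated histogram (image columns x disparity bins), updated in place
    Returns a label image over the (column, disparity) bins and the cluster count.
    """
    n_rows, n_cols = depth_map.shape
    n_disp = age.shape[1]

    # column-wise disparity histogram of the current image
    hist = np.zeros((n_cols, n_disp), dtype=np.int32)
    for col in range(n_cols):
        d = depth_map[:, col].astype(int)
        d = d[(d > 0) & (d < n_disp)]
        np.add.at(hist[col], d, 1)

    # raise the age where the histogram exceeds the threshold, lower it elsewhere
    raised = hist >= count_thresh
    age[raised] = np.minimum(age[raised] + 1, max_age)
    age[~raised] = np.maximum(age[~raised] - 1, 0)

    # regionally connected bins above the age threshold form one cluster each
    labels, n_clusters = label(age >= age_thresh)
    return labels, n_clusters
```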

Figure 3. Filtered distances and distance error for an example
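To make the tracking step concrete, the sketch below shows a constant-velocity Kalman filter per cluster of the kind that could produce filtered distances such as those in Fig. 3. The state layout, the 25 Hz time step and all noise parameters are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

class ClusterTracker:
    """Constant-velocity Kalman filter for one cluster (illustrative sketch).

    State: [Z, X, vZ, vX] -- distance, lateral offset and their rates.
    Measurement: [Z, X] from the cluster's centre of gravity.
    """

    def __init__(self, z0, x0, dt=1.0 / 25.0):
        self.x = np.array([z0, x0, 0.0, 0.0])        # initial state
        self.P = np.diag([4.0, 4.0, 25.0, 25.0])     # initial uncertainty (placeholder)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.diag([0.01, 0.01, 0.5, 0.5])     # process noise (placeholder)
        self.R = np.diag([1.0, 0.5])                 # measurement noise (placeholder)

    def step(self, z_meas, x_meas):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with the new 3-D position of the cluster
        y = np.array([z_meas, x_meas]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x                                # smoothed Z, X, vZ, vX
```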

Figure 2. Typical example for 3-D-analysis

4. Hardware – Depth map

The measuring range in this application extends from a few meters up to 120 meters. 3-D points have to be calculated in real-time over this entire range. Given the timing requirements, a hardware implementation [6] is the most suitable solution for this task. Therefore a hardware-optimized correlation method for 3-D measurement [7] had to be developed that meets this timing budget. The basic idea is that far-away objects are very small (a few pixels) with correspondingly low disparity values, whereas objects at close range are relatively large with high disparity values. A high resolution is unnecessary for determining the position of close-range objects. Therefore we propose to reduce the image resolution for measuring the distance to objects at close range, so that the resulting smaller disparities reduce the number of correlation values in comparison with a full search. The benefits of the considerable formalization gained by introducing hierarchical layers are discussed below.

4.1. Methods of area correlation in stereo photogrammetry

A system of two digital cameras arranged in the normal case of stereophotogrammetry serves for image capture. The cameras must be aligned very accurately with respect to one another, or the images have to be rectified, so that the image lines are located sufficiently close to the epipolar lines. The fields of view of both cameras still overlap at large distances. Thus a very large measuring range for the distance Z is achieved, which depends to a great extent on the resolution of the cameras. Selected areas of an image pair are compared to determine the 3-D position of an object with respect to the camera positions. The distance to the object can be characterized by the horizontal displacement (disparity) of the best-fitting areas in the two images. The disparity in the x-direction is called ∆u in this paper. Applying the theorem of intersecting lines in the normal case of stereo photogrammetry, the 3-D position X, Y, Z is expressed as

Z = f · B / ∆u;   X = x · B / ∆u;   Y = y · B / ∆u   (1)

Under the specified conditions only the length of the base B, the focal length f, the x, y coordinates of the measured object in the left image and the disparity ∆u are needed for the calculation. For more details on calculating the 3-D points in our case see section 5. A quality criterion Q for finding the corresponding image parts by area correlation is introduced. The unnormalized cross-correlation function and some other quality criteria are sensitive to additive and multiplicative interference. The zero-mean normalized cross-correlation function (KKFMF, [2]) is a good and robust solution to these problems.
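A small worked example of Eq. 1 (all numbers are illustrative and not taken from the paper's camera setup):

```python
# base B = 0.3 m, focal length f = 800 px, measured disparity du = 2.4 px,
# object at image coordinates x = 120 px, y = -15 px (relative to the principal point)
B, f, du = 0.3, 800.0, 2.4
x, y = 120.0, -15.0
Z = f * B / du          # 100.0 m  distance
X = x * B / du          #  15.0 m  lateral offset
Y = y * B / du          #  -1.875 m height offset
```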

The zero-mean normalized cross-correlation function is therefore used for the further calculations, as it is most suitable for the application at hand. Because the hardware implementation of the square-root function is very costly, the squared zero-mean normalized cross-correlation function is used instead. The edge detection uses the denominator of the KKFMF, which is proportional to the variance, as a criterion for the relevance of the reference block.
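The following sketch shows the squared zero-mean normalized cross-correlation for two equally sized blocks, with the reference-block energy (proportional to its variance) doubling as the edge/relevance check. The function name and the variance threshold are assumptions for illustration; the hardware realization is structured differently.

```python
import numpy as np

def squared_kkfmf(ref: np.ndarray, cand: np.ndarray, var_thresh: float = 1e-3) -> float:
    """Squared zero-mean normalized cross-correlation of two equally sized blocks.

    Returns a quality value in [0, 1].  Blocks without sufficient structure
    (small reference energy, i.e. no usable edge) are rejected.
    """
    r = ref.astype(float) - ref.mean()
    c = cand.astype(float) - cand.mean()
    ref_energy = np.sum(r * r)             # proportional to the reference variance
    cand_energy = np.sum(c * c)
    if ref_energy < var_thresh or cand_energy < var_thresh:
        return 0.0                          # block carries no usable edge information
    num = np.sum(r * c)
    return (num * num) / (ref_energy * cand_energy)   # squaring avoids the square root
```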

4.2. Hierarchical position measurement

Layers with dedicated distance ranges of different resolutions are introduced. They are generated by reducing the line resolution by a fixed factor (e.g. 1/2) for each subsequent layer (see Fig. 4): two neighboring pixels are replaced by one using a simple low-pass filter.

Figure 4. Generated layers

The search area for the correlation is doubled for each higher layer while the number of correlations per block is kept constant in all layers. To determine the disparity, epipolar lines are correlated with their counterparts in the image pair as described in section 4.1. In every layer rectangular blocks (16x1 pixels) from the reference image and the search image are chosen and compared with each other. Non-overlapping reference blocks are arranged regularly, covering the whole image. For the search, the reference block is shifted pixel by pixel over the search region (here 32 pixels). For each shift along the line, an affinity value is calculated for the resulting pair of patterns. By combining the correlation results of all layers, the algorithm covers the full measuring range. The greatest benefit for the hardware implementation is gained when the number of pixels shifted equals the size of the reference block. The correlation data from all layers are then assembled for evaluation. The locations of the corresponding blocks are determined by taking the maxima above a given threshold; these correspond to the disparities ∆u. Only maxima corresponding to object features (object edges in this case) are used. The disparities ∆u for the maxima of the quality criterion are then calculated with subpixel accuracy using quadratic interpolation, and the 3-D coordinates can be calculated with Eq. 1. Each layer in the hierarchical structure represents a definite measuring range. Fig. 2a shows a depth map (gray values indicate distance) generated by the process described above.
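A minimal software sketch of the layer generation and the per-line block search follows. It assumes one rectified epipolar line per call and a one-sided search direction; the quality threshold and all names are illustrative, and the fully parallel hardware correlator is of course organized differently.

```python
import numpy as np

def build_layers(line: np.ndarray, n_layers: int = 3):
    """Halve the line resolution per layer with a simple low-pass filter
    (mean of two neighbouring pixels), as sketched in Fig. 4."""
    layers = [np.asarray(line, dtype=float)]
    for _ in range(1, n_layers):
        prev = layers[-1]
        prev = prev[: (len(prev) // 2) * 2]          # drop an odd trailing pixel
        layers.append(0.5 * (prev[0::2] + prev[1::2]))
    return layers

def correlate_line(ref_line, search_line, block=16, search=32, q_thresh=0.7):
    """Block-wise disparity search on one epipolar line of one layer.

    Non-overlapping 16x1 reference blocks are shifted pixel by pixel over a
    32-pixel search region; the shift with the best squared-KKFMF value above
    the threshold is kept.  Returns (block start, integer disparity, quality).
    """
    ref_line = np.asarray(ref_line, dtype=float)
    search_line = np.asarray(search_line, dtype=float)
    results = []
    for start in range(0, len(ref_line) - block + 1, block):
        ref = ref_line[start:start + block]
        best_q, best_d = 0.0, None
        max_d = min(search, len(search_line) - (start + block))
        for d in range(max_d + 1):
            cand = search_line[start + d:start + d + block]
            # squared zero-mean normalized cross-correlation (section 4.1)
            r, c = ref - ref.mean(), cand - cand.mean()
            er, ec = np.dot(r, r), np.dot(c, c)
            q = (np.dot(r, c) ** 2) / (er * ec) if er > 1e-3 and ec > 1e-3 else 0.0
            if q > best_q:
                best_q, best_d = q, d
        if best_d is not None and best_q >= q_thresh:
            results.append((start, best_d, best_q))
    return results
```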


5. Accuracy, errors and calibration

Measurements have shown that the systematic errors have to be corrected. It can be assumed that objects at close range are scanned at lower resolutions and that the objects furthest away (at the maximum distance for useful measurements) are captured at the initial resolution. Because images of close-range objects are analyzed at low resolution, the absolute error of the 3-D position calculation increases there; the resulting relative error within the detection range is nearly constant. Using a standard camera model [1], the systematic errors of the camera system can be compensated. The corrections ∆Z and ∆X can be calculated as ∆Z = d · Z² + e · Z + f and ∆X = g · Z + h · X + i (d, …, i ∈ ℝ). Combining ∆Z and ∆X with Eq. 1 gives

Z = k / ∆u² + l / ∆u + m   (k, l, m ∈ ℝ)   (2)

With Eq. 2 only the coefficients k, l and m have to be determined during a calibration process; there is no need to determine the base and the focal length separately. The derivations for X and Y are similar. A typical example of the effect of the calibration is shown in Fig. 5. The results for the Z-direction (the coordinate of primary interest) with a calibrated system are shown in Fig. 6.
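One way to obtain k, l and m is a least-squares fit of Eq. 2 against reference measurements (measured disparity, known distance). The paper does not specify the fitting procedure, so the following sketch is an assumption; the reference data would come from a calibration drive or test field.

```python
import numpy as np

def fit_distance_model(disparities, true_distances):
    """Least-squares fit of the coefficients k, l, m of Eq. 2 (illustrative)."""
    du = np.asarray(disparities, dtype=float)
    A = np.column_stack([1.0 / du**2, 1.0 / du, np.ones_like(du)])
    (k, l, m), *_ = np.linalg.lstsq(A, np.asarray(true_distances, dtype=float), rcond=None)
    return k, l, m

def distance_from_disparity(du, k, l, m):
    """Calibrated distance according to Eq. 2."""
    return k / du**2 + l / du + m
```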

Figure 5. Errors before/after calibration

Figure 6. Results of field tests (left: static measurement, right: taken in a moving vehicle)

6. Conclusion and experimental results

The proposed system was tested under laboratory conditions with a reduced base B as well as in real road scenarios. Under real-time conditions the hardware is limited by a pixel clock of 28 MHz, while the software is limited by a frame rate of 25 Hz and a camera ROI (region of interest) of 1024x500 pixels. Moving objects are detected and tracked stably. A hierarchical method is applied to adapt the resolution to the distance. The generated depth map is very robust, so that clustering with a histogram is possible. The proposed method is very well suited for real-time applications with low percentage errors where variations in distance are very large. Furthermore, this method yields robust measurements for images containing common image disturbances. Trials have confirmed the specified attributes. In comparison with a pure software implementation, the realization as a hardware-software co-design is more suitable for real-time applications. Using hardware and embedded software gives a speed-up factor of 12 compared with a Pentium-III-800 implementation (excluding image capture time).

Acknowledgement: This research project was supported by BMBF grant FKZ 03i1226a, BMBF/LSA grant 0028IF0000, AiF grant KF0452101KSS2 and EU grant 0046KE0000.

References

[1] J. Albertz and W. Kreiling. Photogrammetric Guide. Herbert Wichmann Verlag GmbH, 4th edition, 1989.
[2] P. F. Aschwanden. Experimenteller Vergleich von Korrelationskriterien in der Bildanalyse. PhD thesis, ETH Zürich, 1993.
[3] M. Bertozzi, A. Broggi, A. Fascioli, and S. Nichele. Stereo vision-based vehicle detection, 2000.
[4] C. Knoeppel, A. Schanz, and B. Michaelis. Robust vehicle detection at large distance using low resolution cameras. In Proceedings of the International Conference on Intelligent Vehicles, pages 36–41. IEEE Industrial Electronics Society, 2000.
[5] S. Lee and Y. Kay. A Kalman filter approach for accurate 3-D motion estimation from a sequence of stereo images. In 10th International Conference on Pattern Recognition, pages 104–108, 1990.
[6] K. Saneyoshi, K. Hanawa, K. Kise, and Y. Sogawa. 3-D image recognition system for drive assist. In Proceedings of the Intelligent Vehicles 1993 Symposium, pages 60–64, 1993.
[7] M. Tornow, B. Michaelis, R. W. Kuhn, R. Calow, and R. Mecke. Hierarchical method for stereophotogrammetric multi-object-position measurement. In Pattern Recognition, DAGM Symposium, pages 164–171, 2003.

