AIRBORNE TRAFFIC MONITORING SUPPORTED BY FAST CALCULATED DIGITAL SURFACE MODELS

Sebastian Tuermer, Franz Kurz, Peter Reinartz
German Aerospace Center (DLR), Institute of Remote Sensing Technology (IMF), Oberpfaffenhofen, Germany

Uwe Stilla
Technische Universitaet Muenchen (TUM), Photogrammetry and Remote Sensing (PF), 80290 Munich, Germany
ABSTRACT

Vehicle detection in dense urban areas is often complicated by car-like objects on rooftops, which result in false positive detections. This can be avoided by using a digital surface model (DSM), calculated from two consecutive images, to exclude those regions. In the real-time case, however, traffic information has to be gathered rapidly, and calculating the DSM for the whole image takes considerable time. The presented approach therefore calculates the disparity image only for areas of interest. These areas are selected by projecting the road segments from a road database into the original image using the collinearity equation. The local coordinates of the detected vehicles are then transformed back into the UTM coordinate system, again using the collinearity equation. It can be shown that the search area for the detector is significantly reduced, which also leads to improved detection results.

Index Terms— vehicle detection, real-time, aerial imagery, disparity image, road database

1. INTRODUCTION

In recent years, traffic monitoring from remote sensors has gained more and more attention [1], especially in the real-time case, where several valuable applications exist, such as the support of mass events [2] or disaster response [3]. A reasonable explanation for choosing remote sensors is the limitation of the standard traffic monitoring sensors in these situations. Induction loops and stationary ground-based video cameras only deliver data from local positions and could be destroyed in the case of hazards or earthquakes. The floating car principle likewise gives no information on whether the road is destroyed or on how much free space remains on the road for emergency vehicles to pass through. More precise information can instead be obtained by employing airborne optical camera systems.
Furthermore, when the positions of all cars, and not only of moving cars [4], are sought, the most important step in the processing chain of traffic monitoring approaches based on optical sensors is the detection of the vehicles. Once this task is done, analysis of parking spaces or tracking of vehicles [5] can follow. Various methods can be used to extract cars from aerial imagery [6, 7, 8, 9], but problems with car-like objects such as dormers or certain trees are common. A widespread solution is to limit the search area using road databases [10, 11], assuming that cars are located on roads. Unfortunately, the accuracy of most databases is often poor. One reason is the limited number of visible GPS satellites in areas with high buildings (especially in urban canyons running in north-south direction). In addition, road databases intended for navigation are deliberately simplified so that positions can be matched more easily (e.g., multi-lane roads are represented by a single axis rather than individual lanes).

Alternatively, a DSM can be used to support object detection from aerial imagery [12], for instance by utilizing a previously generated DSM, sometimes derived from airborne laser scanning (ALS) data [13]. Our goal, however, is to adapt this idea to the task of real-time car detection, and we therefore use the disparity image only at the approximate positions where roads are located.

In this paper, we present an approach to quickly limit the search space for the detection of vehicles in aerial imagery. In order to select only the regions where cars are likely to appear, we suggest projecting the imprecise road segments from the database into the original image using the collinearity equation. In a further step, the disparity image is calculated for these regions from two consecutive images. First results show that the detection quality can be significantly enhanced.

978-1-4673-1159-5/12/$31.00 ©2012 IEEE

2. METHOD

The proposed technique is designed for an aerial camera system with an attached global positioning system (GPS) and inertial measurement unit (IMU). The algorithm used to detect cars in aerial images has been explained in another publication [14] and is not discussed further in this paper.
A detailed description of the generation of the DSM and an evaluation of its impact on car detection accuracy can be found in the following sections. First, an overview of the strategy is given in Fig. 1.
IGARSS 2012
Fig. 2. Road axes from the Navteq database are marked in red. The green rectangle shows the patch that is cut out of the original image within a certain distance of the road axis. This area is used in the subsequent processing step to calculate the disparity image.
Fig. 1. Workflow

2.1. Extraction of candidate regions

The candidate regions are extracted from the original image, i.e. from the image without geocoding. This is done by projecting road axes from a database into the image. A position in the image (x, y) can be calculated with the collinearity equations (Eqs. 1, 2):

x = x0 − c · [r11(X − X0) + r12(Y − Y0) + r13(Z − Z0)] / [r31(X − X0) + r32(Y − Y0) + r33(Z − Z0)]   (1)

y = y0 − c · [r21(X − X0) + r22(Y − Y0) + r23(Z − Z0)] / [r31(X − X0) + r32(Y − Y0) + r33(Z − Z0)]   (2)

where x0, y0, c are the parameters of the interior orientation and X0, Y0, Z0, rij are the parameters of the exterior orientation. The road segment coordinates (X, Y) are taken from the road database, and an approximate Z can be extracted from a global DEM (e.g. from the Shuttle Radar Topography Mission (SRTM)). Then a certain area around the axis is cut out; its width and length are related to the road category stored in the database. An example is shown in Fig. 2, where the road axes projected into the image are marked in red and the area cut out around one axis is marked with a green rectangle.

2.2. Calculation of the disparity image

This step explains how the disparity images are calculated from the extracted road segments. First, we improve the exterior orientation by a bundle adjustment; for this purpose, two consecutively imaged road segments are matched with the SURF operator [15]. Afterwards we calculate the disparity images utilizing the semi-global matching (SGM) algorithm [16]. The basic steps of the stereo vision method have the following properties [17]:

2.2.1. Matching cost computation

We use the Census transform to compute the similarity value of two matched pixels. It is based on small windows and is considered robust in the case of discontinuities [18]. The matching cost is computed using the Hamming distance.

2.2.2. Aggregation of cost and disparity computation
Due to the global algorithm we optimize an energy function. Here the energy E(D) is defined as [16]:

E(D) = Σ_p ( C(p, Dp) + Σ_{q∈Np} P1 · T[|Dp − Dq| = 1] + Σ_{q∈Np} P2 · T[|Dp − Dq| > 1] )   (3)

where D is the disparity map, the first term aggregates the pixel matching costs, and the two following terms add penalties P1 and P2 for small and large disparity changes, respectively. T[·] equals 1 if its argument is true and 0 otherwise, and Np denotes the neighborhood of pixel p.

2.2.3. Refinement of disparities

We obtain subpixel accuracy by fitting a local parabola to the aggregated costs around the minimum. Additionally, to remove outliers, we match pixels of image one to pixels of image two and vice versa; a disparity is rejected if the two results are inconsistent. Small holes are then filled using a morphological closing. In a final step, the ground area is extracted using a technique similar to minimum error thresholding [19].
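The two refinement steps described above, the parabola fit for subpixel accuracy and the left-right consistency check, can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the disparity convention (d = x_left − x_right) and the tolerance max_diff are assumptions.

```python
import numpy as np

def subpixel_minimum(cost_left, cost_min, cost_right):
    """Fit a parabola through the aggregated costs at disparities d-1, d, d+1
    and return the subpixel offset of its minimum (within [-0.5, 0.5])."""
    denom = cost_left - 2.0 * cost_min + cost_right
    if denom <= 0:  # degenerate fit: keep the integer disparity
        return 0.0
    return 0.5 * (cost_left - cost_right) / denom

def left_right_check(disp_lr, disp_rl, max_diff=1.0):
    """Mark a disparity as valid only if matching left->right and
    right->left yields a consistent result (outlier removal)."""
    h, w = disp_lr.shape
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disp_lr[y, x]
            xr = int(round(x - d))  # corresponding column in the right image
            if 0 <= xr < w and abs(d - disp_rl[y, xr]) <= max_diff:
                valid[y, x] = True
    return valid
```

A symmetric cost triple (equal left and right neighbors) yields an offset of zero, i.e. the integer disparity is already the minimum of the fitted parabola.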
6838
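The collinearity projection of Sec. 2.1, which is also used to transform the detected vehicles back to UTM coordinates, amounts to only a few lines. Below is a minimal sketch of Eqs. (1) and (2); the rotation matrix, projection centre, and interior orientation values used in any concrete call are hypothetical placeholders, not values from the 3K+ system.

```python
import numpy as np

def world_to_image(X, R, X0, x0, y0, c):
    """Project a world point X (e.g. a road-axis vertex in UTM plus an
    approximate SRTM height) into image coordinates via the collinearity
    equations.  R is the 3x3 rotation matrix with entries r11..r33,
    X0 the projection centre, (x0, y0) the principal point and c the
    camera constant (focal length)."""
    d = R @ (np.asarray(X, dtype=float) - np.asarray(X0, dtype=float))
    x = x0 - c * d[0] / d[2]
    y = y0 - c * d[1] / d[2]
    return x, y
```

With a nadir-looking camera (R = identity) the formula reduces to a simple central projection, which makes the sign conventions easy to verify by hand.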
3. RESULTS

The examples are based on aerial imagery with a ground resolution of 15 cm, taken with the 3K+ camera system [20], which has an imaging frequency of 3 Hz. The original image overlaid with the ground area calculated from the disparity image is shown in Fig. 3. Fig. 4 compares the resulting detections: Fig. 4 (a) shows the result with determination of the ground area, and Fig. 4 (b) the result calculated without it.

Fig. 3. Regions colored in turquoise show the extracted ground areas from the disparity image.

(a) with previous extraction of ground area

(b) without extraction of ground area

Fig. 4. Comparison of two car detection results. Hypotheses are marked with a red cross. (Sample from Fig. 2 but rotated)

4. DISCUSSION

On the one hand, the extracted ground region in Fig. 3 shows that not all parts of the roads have been detected as ground (yellow A). This error might be due to trees in the close surroundings, which complicate finding corresponding intensity values. On the other hand, the extraction of the ground area near façades worked quite well, as the place marked with a yellow B in Fig. 3 shows. Especially façades with windows of a similar size as a car pose difficulties for the car detector, and these could be successfully excluded.

5. CONCLUSION

The following achievements save calculation time and contribute to the real-time capability:

• No need to calculate an ortho-image → all work is done using the original image
• No need to calculate the disparity image for the whole scene → just roads from the database are used
• No need to calculate the original DSM, the disparity image without geocoding is enough → only the local coordinates of detected vehicles are transformed to global UTM coordinates
• No need to examine regions classified as non-ground by the vehicle detector → the runtime of the car detector is reduced

6. REFERENCES

[1] S. Hinz, R. Bamler, and U. Stilla, “Editorial theme issue: Airborne and spaceborne traffic monitoring,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 61, no. 3-4, pp. 135–136, December 2006.

[2] D. Rosenbaum, B. Charmette, F. Kurz, S. Suri, U. Thomas, and P. Reinartz, “Automatic traffic monitoring from an airborne wide angle camera system,” in XXI ISPRS Congress, Beijing, 2008.

[3] S. Sasa, Y. Matsuda, M. Nakadate, and K. Ishikawa, “Ongoing research on disaster monitoring UAV at JAXA’s aviation program group,” in SICE Annual Conference, 2008, pp. 978–981.

[4] V. Reilly, H. Idrees, and M. Shah, “Detection and tracking of large number of targets in wide area surveillance,” in ECCV, 2010.

[5] I. Szottka and M. Butenuth, “An adaptive particle filter method for tracking multiple interacting targets,” in IAPR Conference on Machine Vision Applications, 2011, pp. 6–9.

[6] A. Kembhavi, D. Harwood, and L. S. Davis, “Vehicle detection using partial least squares,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 33, no. 6, pp. 1250–1265, June 2011.

[7] T. Mauthner, S. Kluckner, P. M. Roth, and H. Bischof, “Efficient object detection using orthogonal NMF descriptor hierarchies,” in Annual Symposium of the German Association for Pattern Recognition, 2010.

[8] H. Grabner, T. T. Nguyen, B. Gruber, and H. Bischof, “On-line boosting-based car detection from aerial images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 63, no. 3, pp. 382–396, 2008.

[9] S. Hinz, “Detection of vehicles and vehicle queues in high resolution aerial images,” Photogrammetrie – Fernerkundung – Geoinformation (PFG), vol. 3/04, pp. 201–213, 2004.

[10] D. Lenhart, S. Hinz, J. Leitloff, and U. Stilla, “Automatic traffic monitoring based on aerial image sequences,” Pattern Recognition and Image Analysis, vol. 18, no. 3, pp. 400–405, 2008.

[11] K. Kozempel and R. Reulke, “Fast vehicle detection and tracking in aerial image bursts,” in CMRT09, U. Stilla, F. Rottensteiner, and N. Paparoditis, Eds., ISPRS, 2009, vol. 38, pp. 175–180.

[12] G. Pacher, S. Kluckner, and H. Bischof, “An improved car detection using street layer extraction,” in Computer Vision Winter Workshop, 2008.

[13] X. Huang, L. Zhang, and W. Gong, “Information fusion of aerial images and lidar data in urban areas: vector-stacking, re-classification and post-processing approaches,” International Journal of Remote Sensing, vol. 32, no. 1, pp. 69–84, 2011.

[14] S. Tuermer, J. Leitloff, P. Reinartz, and U. Stilla, “Automatic vehicle detection in aerial image sequences of urban areas using 3D HOG features,” in Int. Arch. of Photogrammetry, Remote Sensing and Spatial Information Sciences, Paris, France, 2010, vol. XXXVIII.

[15] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346–359, 2008.

[16] H. Hirschmueller, “Stereo processing by semi-global matching and mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.

[17] P. d’Angelo and P. Reinartz, “Semiglobal matching results on the ISPRS stereo matching benchmark,” in ISPRS Hannover Workshop, Germany, 2011.

[18] H. Hirschmueller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1582–1599, 2009.

[19] J. Kittler and J. Illingworth, “Minimum error thresholding,” Pattern Recognition, vol. 19, pp. 41–47, 1986.

[20] F. Kurz, S. Tuermer, O. Meynberg, D. Rosenbaum, J. Leitloff, H. Runge, and P. Reinartz, “Low-cost optical camera systems for real time mapping applications,” Photogrammetrie – Fernerkundung – Geoinformation (PFG), vol. 2012, no. 2, pp. 159–176, 2012.