An Integrated Framework of Vision-based Vehicle Detection with Knowledge Fusion

Ying Zhu, Dorin Comaniciu, Visvanathan Ramesh
Siemens Corporate Research, Princeton, NJ, USA
{ying.zhu; dorin.comaniciu; visvanathan.ramesh}@scr.siemens.com

Martin Pellkofer, Thorsten Koehler
Siemens VDO Automotive AG, Regensburg, Germany
{martin.pellkofer; thorsten.koehler}@siemens.com
Abstract—This paper describes an integrated framework of on-road vehicle detection through knowledge fusion. In contrast to appearance-based detectors that make instantaneous decisions, the proposed detection framework fuses appearance, geometry and motion information over multiple image frames. The knowledge of vehicle/non-vehicle appearance, scene geometry and vehicle motion is utilized through prior models obtained by learning, modeling and estimation algorithms. It is shown that knowledge fusion largely improves the robustness and reliability of the detection system.
I. INTRODUCTION

With the decreasing cost of optical sensors and the increasing computing power of microprocessors, vision-based systems have been widely accepted as an integral part of feasible solutions to driver assistance. In this work, we address one of the central tasks in driver assistance, i.e. on-road vehicle detection in a monocular vision system. The ability to detect other vehicles on the road is essential to sensing and interpreting the driving environment, which enables important functions like adaptive cruise control and pre-crash warning. In our system, an on-board CCD camera faces forward to capture image sequences of the road, and vision algorithms are developed to detect preceding vehicles driving in the same direction as the host car.

Vehicle detection requires effective vision algorithms that can distinguish vehicles from complex road scenes with good accuracy. A big challenge comes from the large variety of vehicle appearance as well as the varying scenes of different driving environments. Vehicles vary in size, shape and appearance, which leads to a considerable amount of variance in the image appearance of the vehicle class. Illumination changes in outdoor environments introduce additional variation in vehicle appearance. Meanwhile, unpredictable traffic situations create a wide range of non-stationary backgrounds with complex clutter. Moreover, high reliability and fast processing are required in driver assistance applications, which further increases the difficulty of this task.

A. Previous work
Various vision algorithms have been proposed to solve the problem of vehicle detection [19]. Empirical knowledge about vehicle appearance, such as symmetry and the horizontal and vertical occluding edges around the vehicle boundary, has been extensively used by several methods [2], [3], [4], [13], [17]
to detect the rear-view appearance of vehicles. These methods are computationally efficient but lack robustness, because the parameters (e.g. thresholds) involved in edge detection and hypothesis generation are sensitive to lighting conditions and the dynamic range of image acquisition. To achieve reliable vehicle detection, several appearance-based methods have been proposed [1], [8], [11], [14], [15], [18], where machine learning and pattern classification techniques are exploited to obtain elaborate classifiers that separate the vehicle class from other image patterns. Bayesian classifiers were used for classification in [11], [15], where mixtures of Gaussians and histograms were used respectively to model the class distributions of vehicles and non-vehicles. In [8], [9], neural network classifiers were trained on image features obtained from local orientation coding. Support vector machines (SVMs) were trained on wavelet features in [1], [14], [18]. Besides vehicle appearance, another useful cue is image motion. Motion-based detectors include those discussed in [7], [12], [20].

B. Proposed approach
Many methods proposed in previous work use partial knowledge for vehicle detection. For example, appearance-based methods mainly utilize the knowledge about vehicle and non-vehicle appearance, while motion-based detectors focus on the knowledge about relative vehicle motion. We believe that to make the detection system reliable, all the available knowledge should be utilized in a principled way. In this work, we propose an integrated framework to fuse the knowledge about appearance, motion and geometry for reliable detection of vehicles in an image sequence. The prior knowledge of appearance is utilized through classifiers trained by the AdaBoost algorithm [16] to distinguish vehicles from non-vehicles. The knowledge about scene and vehicle geometry is utilized through constraints imposed on the location and size of vehicle appearance on the image plane, given that camera parameters are available and vehicles are moving on a road plane. The knowledge of motion is utilized in associating vehicle appearance over successive image frames. A Bayesian framework is developed to fuse the information of appearance, geometry and motion over time for reliable detection of consistent vehicle appearance in image sequences.
The rest of the paper is organized as follows. In section II, we present a probabilistic formulation of vehicle detection in image sequences and introduce an integrated framework of knowledge fusion. In section III, we describe our approaches to obtaining the prior models of appearance, geometry and motion, as well as to fusing the different information online. System performance is reported in section IV and conclusions are drawn in section V.

Fig. 1. Trajectories of vehicle appearance over consecutive image frames (Frame 1, ..., Frame m), which knowledge fusion accumulates into a detection decision.
II. INTEGRATED DETECTION FRAMEWORK

On-road vehicle detection is different from detecting vehicles in still images. In an on-board vision system, preceding vehicles appear in multiple image frames consistently. The information of vehicle appearance, vehicle motion and scene geometry can be exploited jointly to ensure robust and reliable detection. Appearance information provides strong discrimination for distinguishing vehicles from non-vehicles. Motion information has the ability of associating vehicle appearance over time. With temporal data association, detection becomes more robust against isolated errors made by appearance detectors. The knowledge about scene geometry induces strong constraints on where a vehicle on the road would appear on the image plane. Incorporating geometry information into detection can reduce certain errors such as detecting vehicles in the sky or on trees.

We formulate the problem as detecting consistent vehicle appearance over multiple frames. Denote {I_1, I_2, ..., I_m} as m consecutive image frames, (x_k, s_k) as the vehicle location (x_k) and size (s_k) in the k-th frame, and I_k(x_k, s_k) as the image patch of size s_k at location x_k of the k-th frame (k = 1, ..., m). Essentially, {(x_1, s_1), ..., (x_m, s_m)} defines a trajectory of vehicle appearance on the image plane (figure 1). Given the observation of m consecutive image frames {I_k}_{k=1}^{m} and the knowledge of scene geometry, the likelihood of consistent appearance of an on-road vehicle on the image plane is expressed as

\ell_m = p_m\big((x_1,s_1),\ldots,(x_m,s_m) \mid I_1,\ldots,I_m\big) \cdot \prod_{k=1}^{m} p_g\big((x_k,s_k) \mid \text{scene geometry}\big) \cdot \prod_{k=1}^{m} P_a\big(I_k(x_k,s_k) \in \text{vehicle}\big) \quad (1)

The first term p_m((x_1, s_1), ..., (x_m, s_m) | I_1, ..., I_m) defines the likelihood of the appearance on the trajectory {(x_1, s_1), ..., (x_m, s_m)} being consistent. We use subscript m in the notation because this term incorporates motion information to determine the temporal association of object appearance. The second term p_g((x_k, s_k) | scene geometry) defines the likelihood of an on-road vehicle appearing on the trajectory {(x_1, s_1), ..., (x_m, s_m)} given the knowledge of scene geometry. The subscript g is used in the notation to indicate geometry information being exploited. The third term P_a(I_k(x_k, s_k) ∈ vehicle) defines the probability that the image patches I_k(x_k, s_k) (k = 1, ..., m) belong to the vehicle class, where the subscript a in the notation indicates the use of appearance information.

With the above probabilistic formulation, the proposed integrated framework of knowledge fusion is shown in figure 2. The prior models of appearance P_a, geometry p_g and motion p_m are used to fuse and propagate information over time. To detect on-road vehicles in an image sequence, we start with the appearance and geometry models P_a, p_g to generate initial hypotheses of vehicle appearance. Using the motion model p_m, we track the initial hypotheses over successive frames. Consequently, the initial hypotheses evolve into hypotheses of vehicle appearance trajectories. After a number of image frames, the likelihood of the consistent appearance being a vehicle is compared with a threshold to decide whether the appearance trajectory represents a vehicle or a non-vehicle. Compared to the various appearance-based detectors developed in previous work, strong geometry and motion constraints are exploited in the proposed framework to improve the reliability of the overall detection system. Because the information about appearance, geometry and motion is accumulated over a short period of time before making a decision about vehicle existence, there is a slight delay in the initial confirmation of a vehicle trajectory. In practice, to avoid significant delay, we can choose to wait a small number of frames (e.g. < 10 frames) before confirming a vehicle trajectory. Once a vehicle is confirmed, the integrated framework consistently tracks it over time.

Fig. 2. An integrated framework of knowledge fusion.
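To make the formulation concrete, the following minimal sketch (our illustration, not the authors' implementation) evaluates the fused likelihood (1) in log space for one candidate trajectory. The three callables are hypothetical stand-ins for the prior models P_a, p_g and p_m developed in section III; working in log space avoids numerical underflow when the products in (1) run over many frames.

```python
def fused_log_likelihood(traj, frames, log_p_motion, log_p_geometry, log_p_appearance):
    """Evaluate eq. (1) in log space for one appearance trajectory.

    traj   : list of (x, s) pairs -- location and size in each frame
    frames : list of m images I_1, ..., I_m
    The three callables stand in for the prior models p_m, p_g and P_a.
    """
    # Motion term: temporal consistency of the whole trajectory.
    ll = log_p_motion(traj, frames)
    for (x, s), frame in zip(traj, frames):
        # Geometry term: plausibility of a vehicle of size s at location x.
        ll += log_p_geometry(x, s)
        # Appearance term: probability that the patch I_k(x, s) is a vehicle.
        ll += log_p_appearance(frame, x, s)
    return ll
```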
III. PRIOR MODELS

In this section, we present the algorithms adopted in our work to acquire the prior models of appearance P_a, geometry p_g and motion p_m in (1).

A. Appearance prior
The prior knowledge of vehicle and non-vehicle appearance provides discriminant information for separating the vehicle class from the non-vehicle class. A machine learning algorithm, AdaBoost [16], is adopted to learn appearance priors from vehicle and non-vehicle examples. The boosting technique has been shown to be very effective in learning binary classifiers for object detection [21]. Denote an image sample by I and its class label by l (l ∈ {+1, -1}). The method finds a highly accurate classifier H(I) by combining many classifiers {h_j(I)} with weak performance,

H(I) = \mathrm{sign}\Big(\sum_j \alpha_j h_j(I)\Big), \qquad h_j(I) \in \{+1, -1\} \quad (2)

Given a set of labeled training samples {(I_i, l_i)}, the AdaBoost algorithm chooses {α_j} by minimizing the exponential loss function \sum_i \exp(-l_i \sum_j \alpha_j h_j(I_i)), which is determined by the classification error on the training set. Following the approach adopted in [21], we use simple image features to define weak classifiers. Feature values are thresholded to produce weak hypotheses, and the optimal thresholds are automatically determined by the boosting algorithm. An additional procedure of joint optimization on {α_j} is performed to further reduce the error rate of the final classifier. In our system, separate classifiers are used to classify cars and trucks against the non-vehicle class. Vehicle samples were collected from traffic videos captured in various driving scenarios. Non-vehicle samples were collected from image regions containing background clutter and extended through bootstrapping (figure 3).

Fig. 3. Vehicle and non-vehicle training samples.

It has been shown in [6] that the posterior probability can be derived from the classifier response f(I) = \sum_j \alpha_j h_j(I),

P(l = +1 \mid I) = \frac{e^{f(I)}}{e^{f(I)} + e^{-f(I)}} \quad (3)

Using +1 and -1 as the class labels of vehicles and non-vehicles respectively, the probability term P_a in (1) can be evaluated as

\prod_{k=1}^{m} P_a\big(I_k(x_k,s_k) \in \text{vehicle}\big) = \prod_{k=1}^{m} P\big(l_k = +1 \mid I_k(x_k,s_k)\big) \quad (4)
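As an illustration of how the learned appearance prior could be evaluated at test time, the sketch below assumes discrete AdaBoost with threshold-type weak learners as described above; the feature functions, thresholds, polarities and weights are hypothetical inputs that would come from training.

```python
import numpy as np

class BoostedAppearancePrior:
    """Sketch of a trained classifier H(I) = sign(sum_j alpha_j h_j(I)) with
    threshold-type weak learners h_j(I) = sign(p_j * (feature_j(I) - t_j))."""

    def __init__(self, feature_fns, thresholds, polarities, alphas):
        self.feature_fns = feature_fns                    # patch -> scalar feature
        self.thresholds = np.asarray(thresholds, float)   # t_j
        self.polarities = np.asarray(polarities, float)   # p_j in {+1, -1}
        self.alphas = np.asarray(alphas, float)           # weak-learner weights

    def response(self, patch):
        # Real-valued response f(I) = sum_j alpha_j h_j(I).
        feats = np.array([fn(patch) for fn in self.feature_fns])
        weak = np.sign(self.polarities * (feats - self.thresholds))
        weak[weak == 0] = 1.0                             # break ties toward +1
        return float(self.alphas @ weak)

    def p_vehicle(self, patch):
        # Posterior from the response, following [6]:
        #   P(l = +1 | I) = e^f / (e^f + e^-f) = 1 / (1 + e^{-2f})
        return 1.0 / (1.0 + np.exp(-2.0 * self.response(patch)))
```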
B. Geometry prior

Scene context plays an important role in improving the reliability of a vehicle detection system. Strong constraints on where vehicles are likely to appear can be inferred from the knowledge of scene geometry. Through perspective projection, points in the 3D world p_w are mapped to points on the 2D image plane p_im,

p_{im} = T\,p_w, \qquad T = T_{int}\,T_p\,T_{ext} \quad (5)

The entire image formation process consists of the perspective projection (T_p) and the transformations induced by the internal and external camera parameters (T_int, T_ext). Assume a vehicle is on a flat road plane, the vehicle size in the world s_w is known, and the internal and external camera parameters Θ_int, Θ_ext are available. Given the location x_im of the vehicle appearance on the image plane, the size s_im of the vehicle appearance on the image plane can be easily determined as a function of x_im and Θ = {s_w, Θ_int, Θ_ext},

s_{im} = g(x_{im}, s_w, \Theta_{int}, \Theta_{ext}) \quad (6)

In practice, the flat road assumption may be violated, vehicles vary in size, and the camera calibration may not be very accurate. To address such variance of the parameters, we use a probability model to characterize the geometry constraint. The conditional distribution of the vehicle size s_im on the image plane given its location is modeled by a normal distribution,

p(s_{im} \mid x_{im}) = N\big(s_{im};\, \mu, \sigma^2\big), \qquad \mu = g(x_{im}, s_w, \Theta_{int}, \Theta_{ext}) \quad (7)

with variance σ² determined by the variance of the parameter set Θ = {s_w, Θ_int, Θ_ext} and the deviation of the road surface from the planar assumption. Given the geometry constraint, the likelihood of a vehicle being present at location x_im with size s_im on the image plane is given by

p_g\big((x_{im}, s_{im}) \mid \text{scene geometry}\big) = N\big(s_{im};\, g(x_{im}, s_w, \Theta_{int}, \Theta_{ext}), \sigma^2\big) \cdot p(x_{im}) \quad (8)

A uniform distribution is assumed for the prior probability p(x_im) of the vehicle location x_im. Consequently, the geometry model p_g in (1) is formulated as

\prod_{k=1}^{m} p_g\big((x_k, s_k) \mid \text{scene geometry}\big) \propto \prod_{k=1}^{m} N\big(s_k;\, g(x_k, s_w, \Theta_{int}, \Theta_{ext}), \sigma^2\big) \quad (9)

Information about the road geometry can be used to refine the distribution model of x_im. As figure 4 shows, due to perspective projection there is a strong correlation between the vehicle position and the vehicle size in the image plane. Such prior knowledge about vehicle and scene geometry is represented through a probabilistic model. The geometry prior is very useful in constraining the search space of vehicle detectors and in ruling out false detections that violate the geometric constraint but are misclassified by vehicle detectors.

Fig. 4. Geometry constraint on appearance size and location.
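For intuition, here is a minimal sketch of the geometry prior under a simplified pinhole ground-plane model (our simplification; the paper's g in (6) uses the full calibration Θ_int, Θ_ext). For a forward-looking camera at height h over a flat road, the distance of a vehicle whose bottom edge projects to image row y is Z = f·h/(y - y_horizon), so its expected image width f·s_w/Z grows linearly with (y - y_horizon) and the focal length cancels; (7)-(8) then reduce to a Gaussian around that expectation.

```python
import numpy as np

def expected_size(y_bottom, y_horizon, cam_height, vehicle_width):
    # Simplified g(x_im, s_w, Theta): expected image width of a vehicle whose
    # bottom edge sits at row y_bottom (flat road, pinhole camera assumed).
    # From Z = f*h/(y - y0) and s_im = f*s_w/Z, the focal length f cancels:
    return vehicle_width * (y_bottom - y_horizon) / cam_height

def log_p_geometry(y_bottom, s_im, y_horizon, cam_height, vehicle_width, sigma):
    # Gaussian geometry prior of (7)-(8); the uniform location prior p(x_im)
    # only contributes a constant and is dropped.
    mu = expected_size(y_bottom, y_horizon, cam_height, vehicle_width)
    return -0.5 * ((s_im - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
```

For example, with the camera 1.2 m above the road and a nominal vehicle width of 1.8 m, a bottom edge 60 rows below the horizon yields an expected width of 1.8 · 60 / 1.2 = 90 pixels; hypotheses far from this width receive a low geometry likelihood.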
C. Motion model

To derive the motion model p_m in (1), we assume the Markov property of vehicle dynamics in the image plane, i.e. given the vehicle location and size (x_t, s_t) at time t, the future location and size (x_{t+k}, s_{t+k}) (k ≥ 1) are independent of the past observations {I_1, ..., I_{t-1}}. With this assumption, the motion model p_m used in the fusion framework (1) can be written as

p_m\big((x_1,s_1),\ldots,(x_m,s_m) \mid I_1,\ldots,I_m\big) = p_m\big((x_1,s_1) \mid I_1\big) \prod_{k=1}^{m-1} p_m\big((x_{k+1},s_{k+1}) \mid (x_k,s_k), I_{k+1}, I_k\big) \quad (10)

The product term p_m((x_{k+1}, s_{k+1}) | (x_k, s_k), I_{k+1}, I_k) represents the likelihood of a vehicle moving from location x_k, size s_k in frame I_k to location x_{k+1}, size s_{k+1} in frame I_{k+1}, given that {I_k, I_{k+1}} are observed. To solve this likelihood term, we extend the motion estimation algorithm in [5] to estimate affine motion with translation u = [u_x, u_y]' and scaling a. Under the brightness constancy assumption

I_k(x) = I_{k+1}(a\,x + u), \qquad x = [x, y]', \quad u = [u_x, u_y]' \quad (11)

the optical flow equation is generalized to

(a - 1)\,\nabla_x I_k(x)'\,x + \nabla_x I_k(x)'\,u = -\partial_t I_k(x) \quad (12)

where \nabla_x I_k(x) = [\partial_x I(x), \partial_y I(x)]' and \partial_t I(x) are the spatial and temporal derivatives at image location x. An unbiased estimate of the scaling and translation vector is obtained by solving the least-squares problem defined by stacking (12) over the local image region,

[\hat{a}, \hat{u}_x, \hat{u}_y]_k' = E\{[a, u_x, u_y]_k' \mid I_k, I_{k+1}\} = (A_k' A_k)^{-1} A_k' B_k \quad (13)

where the rows of A_k and B_k are obtained from (12) at each of the N pixels of the local image region used for estimation. The uncertainty in the parameter estimate can be described by its covariance. The covariance of the unbiased estimate [\hat{a}, \hat{u}_x, \hat{u}_y]_k' can be derived [10] as

\mathrm{Cov}\{[\hat{a}, \hat{u}_x, \hat{u}_y]_k'\} = \hat{\sigma}^2 (A_k' A_k)^{-1} \quad (14)

where \hat{\sigma}^2 is the variance of the residual fitting error. Given the vehicle location x_k = [x_k, y_k]' and size s_k in the k-th frame as well as the observed image frames {I_k, I_{k+1}}, we can estimate the vehicle location and size in the (k+1)-th frame through the following affine transform,

E\{[x_{k+1}, y_{k+1}, s_{k+1}]'\} = C_k\,[\hat{a}, \hat{u}_x, \hat{u}_y]_k', \qquad C_k = \begin{bmatrix} x_k & 1 & 0 \\ y_k & 0 & 1 \\ s_k & 0 & 0 \end{bmatrix} \quad (15)

\mathrm{Cov}\{[x_{k+1}, y_{k+1}, s_{k+1}]'\} = C_k \cdot \mathrm{Cov}\{[\hat{a}, \hat{u}_x, \hat{u}_y]_k'\} \cdot C_k' \quad (16)

Given the unbiased estimate [\hat{a}, \hat{u}_x, \hat{u}_y]_k' and its covariance obtained by the motion estimation algorithm, the likelihood term p_m((x_{k+1}, s_{k+1}) | (x_k, s_k), I_{k+1}, I_k) in (10) can be modeled as a multivariate normal distribution,

p_m\big((x_{k+1},s_{k+1}) \mid (x_k,s_k), I_{k+1}, I_k\big) = N\big([x_{k+1}, y_{k+1}, s_{k+1}]';\; C_k [\hat{a}, \hat{u}_x, \hat{u}_y]_k',\; C_k\,\mathrm{Cov}\{[\hat{a}, \hat{u}_x, \hat{u}_y]_k'\}\,C_k'\big) \quad (17)
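The least-squares step (13)-(14) can be sketched as follows, assuming the spatial and temporal derivatives of the N pixels in the local region are already available (a plain least-squares version; the kernel-based estimator of [5] that the paper extends is omitted).

```python
import numpy as np

def estimate_affine_motion(Ix, Iy, It, xs, ys):
    """Estimate scaling a and translation (ux, uy) from eq. (12).

    Ix, Iy, It : spatial and temporal derivatives at the N sample pixels
    xs, ys     : pixel coordinates (e.g. relative to the region center)
    Returns theta = [a, ux, uy] and its covariance sigma^2 (A'A)^{-1}.
    """
    Ix, Iy, It, xs, ys = map(np.ravel, (Ix, Iy, It, xs, ys))
    # Eq. (12) per pixel: (a-1)(Ix*x + Iy*y) + Ix*ux + Iy*uy = -It.
    # Rearranged for theta = [a, ux, uy]':  A theta = B.
    g = Ix * xs + Iy * ys
    A = np.column_stack([g, Ix, Iy])
    B = g - It
    theta, *_ = np.linalg.lstsq(A, B, rcond=None)
    # Residual variance (3 estimated parameters), then Cov = sigma^2 (A'A)^{-1}.
    n = A.shape[0]
    sigma2 = np.sum((A @ theta - B) ** 2) / max(n - 3, 1)
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return theta, cov
```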
So far, we have discussed how to obtain the prior models of appearance, geometry and motion. Using these prior models, we perform knowledge fusion on the frame level. Initially, the appearance and geometry models are used to generate hypotheses of vehicle appearance. From (4) and (9), the likelihood of a vehicle appearance, i.e. a length-1 trajectory, is given by

\ell_1 \propto p_g\big((x_1,s_1) \mid \text{scene geometry}\big) \cdot P_a\big(I_1(x_1,s_1) \in \text{vehicle}\big) \quad (19)

We prune the initial hypotheses and keep the trajectories with high likelihoods. Hypotheses are updated sequentially over time using the appearance, geometry and motion information (4), (9), (17),

\ell_{k+1} \propto \ell_k \cdot p_m\big((x_{k+1},s_{k+1}) \mid (x_k,s_k), I_{k+1}, I_k\big) \cdot p_g\big((x_{k+1},s_{k+1}) \mid \text{scene geometry}\big) \cdot P_a\big(I_{k+1}(x_{k+1},s_{k+1}) \in \text{vehicle}\big) \quad (20)

where the trajectories are extended into the new image frame I_{k+1} through

(\hat{x}_{k+1}, \hat{s}_{k+1}) = \arg\max_{(x,s)}\; p_m\big((x,s) \mid (\hat{x}_k,\hat{s}_k), I_{k+1}, I_k\big) \cdot p_g\big((x,s) \mid \text{scene geometry}\big) \cdot P_a\big(I_{k+1}(x,s) \in \text{vehicle}\big) \quad (21)

For computational efficiency, trajectories with low likelihood values can be terminated during the fusion process. After the information has been accumulated over a number of frames, decisions are made by thresholding the likelihood values,

\text{trajectory} \in \begin{cases} \text{vehicle} & \ell_m > \tau \\ \text{non-vehicle} & \text{otherwise} \end{cases} \quad (22)
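Putting the pieces together, here is a rough sketch of the sequential fusion loop (19)-(22). The detector, geometry and motion objects and their methods are hypothetical stand-ins for the prior models above, and the pruning threshold is an arbitrary illustrative value.

```python
def fuse_and_decide(frames, detector, geometry, motion, tau, max_frames=10):
    """Initialize with (19), extend with (21), update with (20), decide with (22)."""
    # (19): initial hypotheses from appearance and geometry alone.
    trajectories = [
        {"states": [(x, s)],
         "ll": geometry.log_p(x, s) + detector.log_p(frames[0], x, s)}
        for (x, s) in detector.candidates(frames[0])
    ]

    for k in range(1, min(max_frames, len(frames))):
        for t in trajectories:
            xk, sk = t["states"][-1]

            def score(c):  # log of the product inside (21)
                return (motion.log_p(c, (xk, sk), frames[k], frames[k - 1])
                        + geometry.log_p(*c)
                        + detector.log_p(frames[k], *c))

            # (21): extend the trajectory to the most likely state in frame k.
            best = max(detector.candidates(frames[k]), key=score)
            # (20): accumulate motion, geometry and appearance evidence.
            t["ll"] += score(best)
            t["states"].append(best)
        # Terminate trajectories with low likelihood (illustrative threshold).
        trajectories = [t for t in trajectories if t["ll"] > -50.0]

    # (22): threshold the accumulated log-likelihood.
    return [t for t in trajectories if t["ll"] > tau]
```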
IV. SYSTEM PERFORMANCE

In our system, the camera is calibrated. We collected 950 examples of rear-view images of cars and 700 examples of rear-view images of trucks. Separate classifiers were trained to detect cars and trucks. Classifier performance on the training data is shown in figure 5. With 150 simple features, the composite error rates (the average of the miss detection rate and the false alarm rate) of the two classifiers show that truck appearance is more difficult to classify than car appearance, due to the different degrees of within-class variance.

Fig. 5. Classifier performance. (a) Car classifier; (b) truck classifier. (Horizontal axis: number of features used; vertical axis: composite error rate, the average of the miss detection rate and the false alarm rate.)

The number of frames used in fusion was set below 10. During our testing, we observed a large degree of performance improvement from fusing appearance, geometry and motion information. Figure 6 shows examples of false detections eliminated by the fusion approach. Note that some false alarms have a higher response from the appearance-based detectors than some true vehicle instances. By incorporating geometry and motion information and fusing appearance information over time, we were able to eliminate the sporadic false alarms in detection and the misses in tracking generated by appearance-based detectors.

Fig. 6. Examples of false detection eliminated by information fusion. (Eliminated false detections are shown by green boxes. Correct detections are shown by red boxes. The two numbers shown under a box are the normalized detector response in the current frame and the index of the corresponding vehicle trajectory.)

Overall, the system performs sufficiently well under moderate lighting conditions. In 20 videos captured in different driving scenarios, including inner-city and highway driving, the system correctly detected 89 out of 94 vehicles without any false detection. The 5 misses include 4 car appearances under dark lighting (under bridges) and 1 truck appearance. Examples of detection results are shown in figure 7.

V. CONCLUSIONS
In this paper, we proposed an integrated framework for vision-based on-road vehicle detection. Reliability of vision-based systems is a critical factor in driver assistance and safety applications. The main contribution of this work is a probabilistic fusion approach for building reliable vehicle detection systems. Knowledge fusion has been shown to be very effective in improving system performance and reliability. By exploiting the knowledge of scene geometry and object motion, the false alarm rate of appearance-based vehicle detectors is considerably reduced. Furthermore, the fusion framework extends naturally to vehicle tracking.
Fig. 7. Examples of vehicle detection results. (Stopped vehicles are also detected if not occluded.)

REFERENCES

[1] S. Avidan, "Support Vector Tracking", IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[2] A. Bensrhair, M. Bertozzi, A. Broggi, P. Miche, S. Mousset and G. Toulminet, "A Cooperative Approach to Vision-based Vehicle Detection", IEEE Conference on Intelligent Transportation Systems, 2001.
[3] M. Betke, E. Haritaoglu and L. Davis, "Multiple Vehicle Detection and Tracking in Hard Real Time", IEEE Symposium on Intelligent Vehicles, pp. 351-356, 1996.
[4] A. Broggi and P. Cerri, "Multi-Resolution Vehicle Detection Using Artificial Vision", IEEE Intelligent Vehicles Symposium, pp. 310-314, 2004.
[5] D. Comaniciu, "Nonparametric Information Fusion for Motion Estimation", IEEE Conf. Computer Vision and Pattern Recognition, pp. 59-66, 2003.
[6] J. Friedman, T. Hastie and R. Tibshirani, "Additive Logistic Regression: a Statistical View of Boosting", Annals of Statistics, vol. 28, no. 2, pp. 337-407, 2000.
[7] A. Giachetti, M. Campani and V. Torre, "The Use of Optical Flow for Road Navigation", IEEE Trans. Robotics and Automation, vol. 14, no. 1, pp. 34-48, 1998.
[8] C. Goerick, D. Noll and M. Werner, "Artificial Neural Networks in Real-Time Car Detection and Tracking Applications", Pattern Recognition Letters, vol. 17, pp. 335-343, 1996.
[9] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner and W. von Seelen, "An Image Processing System for Driver Assistance", Image and Vision Computing, vol. 18, pp. 367-376, 2000.
[10] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer-Verlag, 2001.
[11] T. Kato, Y. Ninomiya and I. Masaki, "Preceding Vehicle Recognition Based on Learning From Sample Images", IEEE Trans. Intelligent Transportation Systems, vol. 3, no. 4, pp. 252-260, Dec. 2002.
[12] W. Kruger, W. Enkelmann and S. Rossle, "Real-Time Estimation and Tracking of Optical Flow Vectors for Obstacle Detection", IEEE Intelligent Vehicles Symposium, pp. 304-309, 1995.
[13] N. Matthews, P. An, D. Charnley and C. Harris, "Vehicle Detection and Recognition in Greyscale Imagery", Control Engineering Practice, vol. 4, pp. 473-479, 1996.
[14] C. Papageorgiou and T. Poggio, "A Trainable System for Object Detection", International Journal of Computer Vision, vol. 38, no. 1, pp. 15-33, 2000.
[15] H. Schneiderman and T. Kanade, "A Statistical Method for 3D Object Detection Applied to Faces and Cars", IEEE Conf. Computer Vision and Pattern Recognition, pp. 746-751, 2000.
[16] R. E. Schapire and Y. Singer, "Improved Boosting Algorithms Using Confidence-rated Predictions", Machine Learning, vol. 37, no. 3, pp. 297-336, 1999.
[17] N. Srinivasa, "Vision-based Vehicle Detection and Tracking Method for Forward Collision Warning", IEEE Intelligent Vehicles Symposium, pp. 626-631, 2002.
[18] Z. Sun, R. Miller, G. Bebis and D. DiMeo, "A Real-Time Precrash Vehicle Detection System", IEEE Workshop on Applications of Computer Vision, pp. 171-176, 2002.
[19] Z. Sun, G. Bebis and R. Miller, "On-Road Vehicle Detection Using Optical Sensors: A Review", IEEE Conf. Intelligent Transportation Systems, pp. 585-590, 2004.
[20] D. Willersinn and W. Enkelmann, "Robust Obstacle Detection and Tracking by Motion Analysis", IEEE Conf. Intelligent Transportation Systems, 1997.
[21] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", IEEE Conf. Computer Vision and Pattern Recognition, 2001.