Automatic camera calibration for driver assistance systems
Anderson A. G. A. Ribeiro, Leandro Dihl and Cláudio R. Jung
UNISINOS - Universidade do Vale do Rio dos Sinos
PIPCA - Graduate School on Applied Computing
Av. Unisinos, 950. São Leopoldo, RS, Brasil, 93022-000
E-mails: [email protected], [email protected], [email protected]
A. Ribeiro is also at Universidade Federal do Rio Grande do Sul, Instituto de Física, Caixa Postal 15051, 91501-970, Porto Alegre, RS, Brazil.

Abstract- This article describes an algorithm for automatic calibration of embedded cameras used in driver assistance systems. The proposed method employs a lane detection algorithm to obtain road boundaries, and relies on the expected projection of the road in image coordinates to determine the parameters for the projective mapping. Such parameters are obtained automatically, and the only user-informed parameter is the lane width.

1. INTRODUCTION
Calibration of cameras installed inside vehicles is essential to obtain information about the real world, such as distances and speeds, from images. This issue has received considerable attention in recent years, due to the demand from industry to increase vehicle safety and to develop autonomous navigation systems. Optical cameras are among the most advantageous sensors for equipping a vehicle to obtain information about the surrounding environment, since they provide rich information, are software configurable, and can be acquired at low cost. For some applications, such as lane departure warning systems [1], the analysis of image coordinates alone can be sufficient. However, camera calibration is required to determine the distance and relative speed to other vehicles or obstacles in world coordinates. Several calibration methods are found in the literature, using different approaches and techniques. Some authors use more than one camera to obtain 3D information about the world, or allow the camera to move to exploit projective geometry constraints [2]. Other techniques rely on a single fixed camera. Guiducci [3] uses information about the road itself, such as the road width and the length of and spacing between longitudinal discontinuous stripes, allied with camera parameters. The method described in [4] makes use of geometrical structures (a calibration pattern) with a motionless vehicle. This calibration can also be performed on the production line, but it is very sensitive: any camera displacement or change in the vehicle, such as a tire change, jeopardizes the calibration, and it must be redone. Other techniques include the correction of geometrical lens distortion [5], increasing precision at the cost of a more complicated analysis. The algorithm described in this article is suited for a single fixed camera, which should be installed under the rear-view mirror, aligned with the central axis of the vehicle. The procedure is performed in real time while the vehicle is moving. No camera parameters are needed, but the algorithm requires two external pieces of information: the road width and the vehicle speed (which can be obtained directly from the vehicle's embedded electronics).

As will be explained in Section 4, the lane width accounts only for horizontal distances, and it does not interfere in the computation of vertical distances (which are more relevant in most obstacle detection applications). It should be noticed that geometrical lens distortion is not considered in this work. In Section 2 we present a short review of the lane detection method employed in this work. Section 3 explains the proposed method, providing the projection equations from image coordinates to world coordinates, and Section 3.1 describes the procedure used to detect the parameters needed to obtain the transformation. Section 4 provides an error analysis, some experimental results are presented in Section 5, and the last section is devoted to our conclusions.

2. LANE DETECTION
In this work, we adopted the linear-parabolic model for lane boundaries described in [6]. The model considers a coordinate system matching image coordinates and a threshold x_m that separates the near and far fields. It combines a linear function for the near field and a parabolic function for the far field, with a continuous and smooth connection between them. The linear portion provides robust information on the local orientation of the vehicle, while the parabolic portion is flexible enough to follow curved portions of the road. The lane boundary model is given by:

f^k(x) = \begin{cases} a^k + b^k (x - x_m), & x > x_m, \\ a^k + b^k (x - x_m) + c^k (x - x_m)^2, & x \le x_m, \end{cases} \qquad (1)

where k \in \{l, r\} denotes the lane boundary (left or right). Each lane boundary is therefore characterized by three coefficients a^k, b^k, c^k, where b^k represents the local orientation of the lane and c^k is related to its curvature in the far field. Once these parameters are determined for a given frame, image edges are computed in a neighborhood of the detected lane boundaries in the subsequent frame. The parameters a^k, b^k, c^k for this subsequent frame are obtained by minimizing the following weighted squared error:

E^k = \sum_{i=1}^{m} \left[ M^k_{n_i} \left( y^k_{n_i} - f^k(x^k_{n_i}) \right) \right]^2 + \sum_{j=1}^{n} \left[ M^k_{f_j} \left( y^k_{f_j} - f^k(x^k_{f_j}) \right) \right]^2, \qquad (2)

where (x^k_{n_i}, y^k_{n_i}), for i = 1, ..., m, are the coordinates of the non-zero pixels of the thresholded edge image in the near field, and M^k_{n_i} the respective edge magnitudes, for k \in \{l, r\}.

In the same way, (x^k_{f_j}, y^k_{f_j}) and M^k_{f_j}, for j = 1, ..., n, denote, respectively, those quantities for the j-th edge pixel in the far field. More details on the lane detection algorithm can be found in [6]. In this work we are interested mostly in the linear part of the lane model, but we use the information of the parabolic part to know when the vehicle is on a straight portion of the road, and perform the calibration at that moment.
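To illustrate how the fit behind Equations (1) and (2) can be carried out, the sketch below evaluates the linear-parabolic model and estimates (a^k, b^k, c^k) by weighted least squares. This is only a minimal sketch of the fitting step, assuming the edge pixel coordinates, their magnitudes and the threshold x_m are already available as arrays; the function names are ours, not from [6].

```python
import numpy as np

def lane_model(x, a, b, c, xm):
    """Linear-parabolic boundary of Eq. (1): linear for x > xm, parabolic for x <= xm."""
    x = np.asarray(x, dtype=float)
    d = x - xm
    return np.where(x > xm, a + b * d, a + b * d + c * d ** 2)

def fit_lane_boundary(xn, yn, Mn, xf, yf, Mf, xm):
    """Weighted least-squares estimate of (a, b, c) minimizing Eq. (2).

    (xn, yn, Mn): near-field edge pixels (x > xm) and their magnitudes.
    (xf, yf, Mf): far-field edge pixels (x <= xm) and their magnitudes.
    """
    dn, df = np.asarray(xn) - xm, np.asarray(xf) - xm
    # Near-field rows use the linear part only; far-field rows include the quadratic term.
    An = np.column_stack([np.ones_like(dn), dn, np.zeros_like(dn)])
    Af = np.column_stack([np.ones_like(df), df, df ** 2])
    A = np.vstack([An, Af])
    y = np.concatenate([yn, yf])
    w = np.concatenate([Mn, Mf])                # edge magnitudes act as weights
    a, b, c = np.linalg.lstsq(A * w[:, None], w * y, rcond=None)[0]
    return a, b, c
```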

Fig. 1. Projection of world coordinates (u, v) to image coordinates (x, y). The diagram shows the trapezoid defined by the image points (x0, -y0), (x1, y1), (x2, -y2) and (x3, y3), the lane width W, and the distances L and R.

3. THE PROPOSED MODEL
The general problem of finding 3D world coordinates from an image demands stereo vision or knowledge of the intrinsic camera parameters. For driver assistance systems, we can focus our attention on the road itself. Hence, we need the mapping from image to world coordinates only for points on the road, and not for all image points. Also, for flat roads the problem reduces to a plane-to-plane mapping, achieved by the following equation [7]:

x = \frac{au + bv + c}{gu + hv + 1}, \quad y = \frac{du + ev + f}{gu + hv + 1}, \qquad (3)

where (u, v) are the coordinates of a point in the first plane (in our case, on the road), and (x, y) are the corresponding coordinates in the second plane (the image). To define a unique projection we need four image points and the four respective points in world coordinates. Let us consider that the camera is placed at the center of the vehicle and aligned with the vehicle's central axis. Then, if the vehicle is moving along a straight portion of the road and parallel to the road's central axis, the rectangular shape of the road is mapped into a trapezoid in image coordinates, as illustrated in Figure 1. In this case, simple algebraic manipulations show that b = c = f = h = 0, and the general projective mapping given by Equation (3) is simplified to:

u = \frac{x}{a - gx}, \quad v = -\frac{dx - ay}{e(a - gx)}, \qquad (4)

where

a = \frac{(y_0 + y_1) L}{(y_2 + y_3) R}, \quad d = \frac{y_0 y_3 - y_1 y_2}{(y_2 + y_3) R}, \quad e = \frac{y_0 + y_1}{W}, \quad g = \frac{y_0 + y_1 - (y_2 + y_3)}{(y_2 + y_3) R}. \qquad (5)
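As a concrete illustration of Equations (4) and (5), the following sketch computes the projection parameters from the trapezoid measurements and maps an image point to road-plane coordinates. It is a minimal sketch assuming y0, y1, y2, y3, L, R and W have already been obtained (as described next); the function names are ours.

```python
def projection_parameters(y0, y1, y2, y3, L, R, W):
    """Projection parameters of Eq. (5), computed from the projected trapezoid."""
    a = (y0 + y1) * L / ((y2 + y3) * R)
    d = (y0 * y3 - y1 * y2) / ((y2 + y3) * R)
    e = (y0 + y1) / W
    g = (y0 + y1 - (y2 + y3)) / ((y2 + y3) * R)
    return a, d, e, g

def image_to_world(x, y, params):
    """Inverse mapping of Eq. (4): image point (x, y) -> road-plane point (u, v)."""
    a, d, e, g = params
    u = x / (a - g * x)
    v = -(d * x - a * y) / (e * (a - g * x))
    return u, v
```

With the parameters reported in Section 5 for the first test sequence (a = 17.0586, d = 1.7753, e = 84.2838, g = 0.1973), image_to_world would return distances in meters for image points on the road region.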

Image points (x0, -y0), (x1, y1), (x2, -y2) and (x3, y3) can be obtained directly using the linear portion of the lane boundary model, and W is the expected lane width (which must be informed by the user). The parameter L can be trivially computed from the image points, and next we describe an automatic procedure to obtain the remaining parameter R.
3.1. Detecting parameters
As explained before, it is assumed that the vehicle is moving along a straight portion of the road. In fact, such a hypothesis can be verified from the linear-parabolic lane model.

The parameter c^k of Equation (1) is related to the road curvature, and in straight portions of the road it is expected that c^r + c^l \approx 0. As defined in [8], the current frame corresponds to a straight portion of the road if:

|c^r + c^l| < T_c, \qquad (6)

where T_c = 0.1 is a threshold. Also, the relative lateral offset l_o(t) of the vehicle is computed at each frame t, according to the procedure described in [9]. If the vehicle is moving parallel to the central road axis during a certain time period, then l_o(t) \approx constant. Hence, we consider that the vehicle presents such parallel movement during a set of frames t - M, ..., t if:

\max\{l_o\} - \min\{l_o\} < T_p, \qquad (7)

where T_p = 0.1 is a threshold, and \max\{l_o\}, \min\{l_o\} are, respectively, the maximum and minimum values of l_o(t) within frames t - M, ..., t. From all the frames of a given video sequence, we retrieve a subset of contiguous frames in which conditions (6) and (7) are both satisfied, to ensure that the vehicle is moving along an approximately straight portion of the road and parallel to it.
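A minimal sketch of this frame-selection step is shown below, assuming that the per-frame curvature coefficients c^l, c^r and the lateral offsets l_o(t) are already provided by the lane tracker; the thresholds follow Equations (6) and (7), and the function name is ours.

```python
import numpy as np

def is_calibration_window(cl, cr, lo, Tc=0.1, Tp=0.1):
    """True if a window of frames satisfies conditions (6) and (7).

    cl, cr: left/right curvature coefficients c^l, c^r over the window (arrays).
    lo:     lateral offsets l_o(t) over the same window (array).
    """
    cl, cr, lo = np.asarray(cl), np.asarray(cr), np.asarray(lo)
    straight = np.all(np.abs(cl + cr) < Tc)    # condition (6) holds for every frame
    parallel = (lo.max() - lo.min()) < Tp      # condition (7) over the window
    return bool(straight and parallel)
```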

In this subset of frames, we obtain the parameter R by counting the number of discontinuous stripes along the detected boundaries, as explained next. Let us assume that at least one of the boundaries has discontinuous lane markings, and that the length of each stripe (and the spacing between adjacent stripes) is constant in a given portion of the road. If l_s denotes the length of a pair stripe + spacing (in world coordinates), then the distance from the vehicle to an image point spanning n pairs stripe + spacing is given by R = n l_s. In theory, l_s could be known a priori, since it should be defined in traffic legislation. However, this length may not be the same for all kinds of roads (highways, federal roads, state roads, etc.), so we compute l_s on-the-fly by counting the number of stripes in a given period of time. Let m_r(t) and m_l(t) denote the sums of edge magnitudes within rectangular regions placed at the bottom of the image, centered at the right and left lane boundaries, as shown in Figure 2(a). If one of these boundaries presents discontinuous markings (let us assume that it happens on the left boundary, without loss of generality), then the corresponding plot of m_l(t) oscillates as the vehicle passes over painted and unpainted regions. An example of such behavior is shown in Figure 2(b). The distance between two adjacent local maxima corresponds to the duration of a pair stripe + spacing. Hence, we compute the average duration t_m of a pair in this time period, and then compute l_s = V t_m, where V is the speed of the vehicle. In general, 5-7 seconds at a speed of 80 km/h are enough to capture around 20-25 stripes plus spacings, for video sequences acquired at 15 frames per second (FPS).

Fig. 2. (a) One frame of the video sequence. (b) Plot of m_l(t). (c) Detected lane markings.

The last step of the calibration is to take one frame from the interval where the counting was made and relate this length to world coordinates. For that purpose, we compute the edge magnitudes in a neighborhood around the lane boundary and project them horizontally, as illustrated in Figure 2(a). The projected signal oscillates (as shown in Figure 2(c)), and each plateau corresponds to a lane marking. As expected, the length of each plateau decreases as the distance from the vehicle increases (due to the camera projection), but in general three pairs of stripes + spacings can be detected. Finally, the parameter R is obtained by multiplying the number of detected pairs by the pair length l_s (see the sketch below). With this, the world coordinate system (u, v) is related to the image coordinates (x, y) through the parameters in Equation (5).
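The sketch below summarizes this R-estimation step under the assumptions above: the average peak-to-peak period of m_l(t) gives the pair length l_s = V t_m, and R is the number of pairs visible in the reference frame times l_s. The peak detection is deliberately simplified, and the function and argument names are ours.

```python
import numpy as np

def estimate_R(ml, fps, speed_kmh, n_pairs_in_frame):
    """Estimate the distance R of Eq. (5) from the m_l(t) signal and the vehicle speed.

    ml:               samples of m_l(t) over the selected frames.
    fps:              frame rate of the video sequence (e.g. 15).
    speed_kmh:        vehicle speed, e.g. read from the embedded electronics.
    n_pairs_in_frame: stripe + spacing pairs counted in the reference frame.
    """
    ml = np.asarray(ml, dtype=float)
    # Local maxima of m_l(t): samples strictly larger than both neighbours.
    peaks = np.where((ml[1:-1] > ml[:-2]) & (ml[1:-1] > ml[2:]))[0] + 1
    tm = np.mean(np.diff(peaks)) / fps     # average duration of one pair, in seconds
    ls = (speed_kmh / 3.6) * tm            # pair length l_s = V * t_m, in meters
    return n_pairs_in_frame * ls           # R = n * l_s
```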

4. ERROR ANALYSIS
When dealing with an algorithm that estimates some quantities, it is indispensable to determine its performance, that is, the accuracy with which the parameters are computed. In this section, algebraic expressions for the expected errors are presented and analyzed. As explained at the end of Section 3, the image points (x_i, y_i) can be directly obtained from the lane boundary model, and the parameter L can also be trivially computed from the image, so we focus our error analysis on the external parameters R and W. First of all, let us express the world coordinates in terms of these parameters. Equations (4) and (5) imply that:

u = \frac{Rx}{\frac{y_0 + y_1}{y_2 + y_3} L - \frac{y_0 + y_1 - (y_2 + y_3)}{y_2 + y_3} x}, \qquad (8)

v = \frac{W}{y_0 + y_1} \cdot \frac{\frac{y_0 + y_1}{y_2 + y_3} L y - \frac{y_0 y_3 - y_1 y_2}{y_2 + y_3} x}{\frac{y_0 + y_1}{y_2 + y_3} L - \frac{y_0 + y_1 - (y_2 + y_3)}{y_2 + y_3} x}. \qquad (9)

It is clear from Equations (8) and (9) that u depends linearly on R (and does not depend on W at all), while v depends linearly on W (and does not depend on R at all). Hence, imprecisions in W do not affect vertical distances, and imprecisions in R do not affect horizontal distances. Furthermore, it is trivial to show that:

\frac{\Delta u}{u} = \frac{\Delta R}{R} \quad \text{and} \quad \frac{\Delta v}{v} = \frac{\Delta W}{W}, \qquad (10)

meaning that the relative errors in u and v are equal to the relative measurement errors in R and W, respectively. Possible sources of error for R are speed imprecision, a missing stripe on the road, or an error in the count of stripes. An error of 5% in the speed implies an error of 5% in u. One missing stripe during the time counting, over the typical 20-25 pairs, generates an error below 5%. On the other hand, W is informed by the user, and typical values are around 3 m. It is important to notice that, for several features of lane assistance systems (such as obstacle detection), it is more important to achieve higher accuracy in vertical distances (e.g., to estimate distances to other vehicles), so that inaccuracies in the user-provided parameter W would not have a significant impact.
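As a brief justification of the 5% figure above (a worked step we added, assuming the pair duration t_m and the pair count n are measured without error), note that l_s = V t_m and R = n l_s, so Equation (10) gives

\frac{\Delta u}{u} = \frac{\Delta R}{R} = \frac{\Delta (n V t_m)}{n V t_m} = \frac{\Delta V}{V},

and a 5% error in the speed V therefore translates directly into a 5% error in the longitudinal distance u.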

5. EXPERIMENTAL RESULTS
In this section we show some examples of the calibration. All video sequences were captured with a resolution of 240 x 320 pixels at 15 FPS. For the video sequence illustrated in Figure 2, the range of analyzed frames was 430-520, totaling 91 frames, or 6.07 seconds at a speed of 80 km/h. This sequence contains 18 pairs of stripes plus spacing, that is, each pair corresponds to 6.42 m. The reference frame 444, illustrated in Figure 2(a), was used to automatically obtain the value of R. Using the expected lane width W = 3 m, the obtained projection parameters are a = 17.0586, d = 1.7753, e = 84.2838, and g = 0.1973. The second example considers a range of 102 frames in a video sequence of 1694 frames. Figure 3(a) illustrates a key frame of this video sequence, along with the trapezoid detected for camera calibration. Figures 3(b) and 3(c) show, respectively, the plot of m_l(t) and the count of three pairs of stripes + spacing. In this sequence each pair corresponds to 7.55 m, and the obtained projection parameters are a = 12.0848, d = 1.3572, e = 67.2656, and g = 0.1508.

Fig. 3. (a) One frame of the video sequence, and the trapezoid detected for camera calibration. (b) Plot of m_l(t). (c) Detected lane markings.

6. CONCLUSIONS
In this work we presented an automatic procedure for camera calibration using a linear-parabolic model for the lane boundaries, assuming a flat road condition. The proposed algorithm automatically detects whether the vehicle is moving along a straight portion of the road and approximately parallel to the central road axis. The expected relation between the rectangular road and the projected trapezoid is used to devise simple expressions for the plane-to-plane projection, based on the lane width W (user-defined) and the distance R, in world coordinates, from the vehicle to a certain image point. Such distance R is obtained automatically, based on the counting of lane markings and the speed of the vehicle. Finally, a theoretical error analysis was presented. The algorithm is very simple and fast, and can be run again on-the-fly if the camera is displaced while the vehicle is moving. Future work will concentrate on a thorough experimental validation of the obtained parameters, on improving the robustness of lane counting under adverse weather/painting conditions, and on estimating the parameter W automatically by exploring the vehicle dynamics during lane changes.

Acknowledgements: The authors would like to thank the Brazilian research agency CNPq for financial support.

REFERENCES


[1] J. W. Lee, “A machine vision system for lane-departure detection,” Computer Vision and Image Understanding, vol. 86, no. 1, 2002.

[2] L-L. Wang and W-H. Tsai, "Camera calibration by vanishing lines for 3-d computer vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 370–376, 1991.

[3] A. Guiducci, "Camera calibration for road applications," Computer Vision and Image Understanding, vol. 79, no. 2, 2000.
[4] M. Bellino, T. Merenda, and S. Kolski, "Calibration of an embedded camera for driver-assistant systems," in Proceedings of IEEE International Conference on Intelligent Transportation Systems, 2005, pp. 354–359.



[5] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000.


[6] C. R. Jung and C. R. Kelber, “Lane following and lane departure using a linear-parabolic model,” Image and Vision Computing, vol. 23, no. 13, 2005.




[7] E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities, Third Edition, Morgan Kaufmann, 2005.
[8] C. R. Jung and C. R. Kelber, "An improved linear-parabolic model for lane following and curve detection," in Proceedings of SIBGRAPI, Natal, RN, October 2005, pp. 131–138, IEEE Press.
[9] C. R. Jung and C. R. Kelber, in Proceedings of IEEE International Conference on Intelligent Transportation Systems, October 2005, pp. 348–353.
