Copyright © 2002 IFAC 15th Triennial World Congress, Barcelona, Spain

3D PERCEPTION FOR NEW AIRBAG GENERATIONS

S. Boverie (a), M. Devy (b), F. Lerasle (b)

(a) Siemens VDO Automotive SAS - B.P. 1149, av. P. Ourliac, 31036 Toulouse Cedex, France.
(b) Laboratoire L.A.A.S./C.N.R.S. - 7, av. du Colonel Roche, 31077 Toulouse, France.

Abstract: Providing new generations of airbags with reliable information about the vehicle inner space occupancy, in order to minimize inappropriate inflation, is a real challenge. Within this paper, two techniques to rebuild the 3D cockpit scene are presented: beside the well-known stereoscopic vision principles, a 3D reconstruction based on a matrix sensor combined with an infrared structured light emitter is described. From the 3D points acquired by these sensors, a 3D description of the seat area is built, pertinent attributes are extracted, and, using a previously learnt database, the current seat situation is identified amongst the learnt situations (empty seat, baby seat in normal or rear position, occupant, …). First promising results are presented.

Keywords: Video sensing, airbag, classification, 3D reconstruction

1. INTRODUCTION

In recent years, a special effort has been devoted to the improvement of passive safety, both by car manufacturers and by suppliers all over the world. Frontal airbags for the driver and the passenger are now mounted in almost every new car, and lateral airbags will soon be widely installed. Most of these airbag systems operate in open loop: when an impact is detected, the airbag is automatically inflated without any information about the nature of the seat occupant. This operating mode is not without problems and has caused some dramatic situations: babies installed in a baby seat in rear position, thrown to the back of the vehicle when the airbag inflates, or passengers in an "out of position" situation who are injured by the airbag. In the US, 1997 statistics show that airbags had saved about 1600 lives; it is also acknowledged that they had killed 32 children and 20 small adults. New generations of airbags need to be more "intelligent", to provide a reaction optimized with respect to the vehicle inner space situation and thus to minimize the injury risk. The improvement of this function is linked to the introduction of new sensor concepts into the automotive field. These "intelligent" sensors will be able to provide synthetic information, either to the decision algorithms or directly to the driver. This implies adding, to the basic sensing devices integrated up to now in the cockpit, higher signal processing levels that extract the pertinent information from a set of measurements, fuse the information provided by several sensors, classify it, etc. The objective of this project was to study and develop perception means and techniques able to provide reliable information about:
• The cockpit occupancy.
• The occupant nature or the presence of a child seat.
• The evaluation of the distance between the passenger and the dashboard.
• The evaluation of the occupant morphology and of its volumetric distribution in the cockpit.
• The determination of critical situations for the passenger, including out of position (legs on the dashboard, …); see figure 1.

Figure 1: Scenarios for occupant detection

This information will then be used by airbag Electronic Control Units (ECUs) in order to take the appropriate decisions (see figure 2). To deal with the complexity of the problem, the introduction of a 3D reconstruction of the scene is necessary. Two different solutions based on 3D camera principles have been explored: beside the well-known stereoscopic vision principles, a 3D reconstruction based on a matrix sensor combined with an infrared structured light emitter is presented. From the depth images acquired by these sensors, a 3D description of the seat area is built; during a preliminary learning step, pertinent attributes have been selected and learnt for every possible seat situation. This attribute vector is extracted from the current depth image and used to identify the current seat situation among the learnt ones (empty seat, baby seat in normal or rear position, occupant, …). These two 3D vision methods will be presented and discussed in this paper with respect to several criteria: density of the acquired depth images, acquisition speed, mechanical and safety constraints, etc. The first promising results with these different technologies will be presented.

Figure 2: Global system architecture (video sensor + image processing → classification → decision making, with accelerometer input → airbag)

2. 3D VISION BASED ON STRUCTURED LIGHTING

This approach, developed by Siemens Automotive in collaboration with LAAS-CNRS and ONERA-DOTA, is concerned with a 3D active vision system based on a matrix sensor combined with an infrared structured light emitter (see Figure 3). This principle is currently used in other application fields such as robotics and architecture (see Boyer et al., 1987; Mouaddib et al., 1997; Rosenfeld et al., 1986; Stockman et al., 1989). A light pattern is projected onto the scene, and the 2D deformation of this pattern in the image plane, due to the objects contained in the scene, is analyzed. Then, for a calibrated system, triangulation techniques allow rebuilding a 3D image of the observed scene. Such applications are usually developed under supervised environmental conditions (light control, …), with weak real-time constraints and with no light power restrictions. The last step of the process is the extraction of specific characteristics from the 3D reconstruction, followed by the classification.
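To make the triangulation step concrete, the following minimal C sketch (illustrative names and values; the mid-point formulation is one standard choice, not necessarily the authors' implementation) reconstructs a 3D point as the point closest to both the calibrated camera ray and the known beam line.

/* Minimal triangulation sketch for a calibrated structured-light
 * sensor: the camera is at the origin, a dot is observed along the
 * viewing direction r, and the matched laser beam is the 3D line
 * p + t*d known from calibration.  The reconstructed point is the
 * mid-point of the shortest segment between the two rays.          */
#include <stdio.h>

typedef struct { double x, y, z; } vec3;

static double dot(vec3 a, vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

static vec3 triangulate(vec3 r, vec3 p, vec3 d)
{
    /* Minimise |s*r - (p + t*d)|^2 over the two ray abscissae s,t. */
    double a = dot(r, r), b = dot(r, d), c = dot(d, d);
    double e = dot(r, p), f = dot(d, p);
    double den = a*c - b*b;              /* ~0 for parallel rays    */
    double s = (e*c - b*f) / den;        /* abscissa on camera ray  */
    double t = (e*b - a*f) / den;        /* abscissa on beam line   */
    vec3 m = { (s*r.x + p.x + t*d.x) / 2.0,
               (s*r.y + p.y + t*d.y) / 2.0,
               (s*r.z + p.z + t*d.z) / 2.0 };
    return m;
}

int main(void)
{
    vec3 r = { 0.03, 0.0, 1.0 };   /* viewing direction of one dot  */
    vec3 p = { 0.06, 0.0, 0.0 };   /* emitter position (6 cm gap)   */
    vec3 d = { -0.03, 0.0, 1.0 };  /* beam direction                */
    vec3 m = triangulate(r, p, d);
    printf("3D point: (%.3f, %.3f, %.3f)\n", m.x, m.y, m.z);
    return 0;
}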

Figure 3: 3D vision based on structured lighting principles

2.1 Principles

The approach is original in the sense that the system has to cope with specific automotive constraints and to be robust with respect to an uncontrolled environment. These constraints have led to the development of specific measurement devices and algorithms.

2.2 Optical devices and sensor description

The efficiency of such a system depends on characteristics like the resolution of the sensor and of the light emitter, the gap between the light emitter and the sensor, the calibration accuracy, the radiance of the beams, and the location of the device in the cockpit. The gap between the light emitter and the sensor must be large enough to allow 2 cm accuracy on 3D points located at 1 m from the camera; too large a gap leads to mounting problems and to occluded beams. A 6 cm gap has given good results. The light emitter is composed of a laser diode (830 nm wavelength), a Dammann diffraction grating that splits the original beam into several beams, and a concave planar lens that spreads the illuminated area over a field of 90°x70°. The resolution of the light emitter is quite critical, since we want to know whether there is a person or an object, and to distinguish head and arms from the body. Current developments have been performed with an 11x11 array of beams, which demonstrates a good compromise between 3D reconstruction accuracy and calculation time. In order to brighten the dots, the radiance of each beam should be maximized by optimizing the output power (limited by eye safety requirements) and narrowing the beam diameter. The maximal output power is a function of the wavelength of the emitted light and of the operating mode of the illuminator. As an example, for an illuminator wavelength of 850 nm operating in pulse mode with 1 ms pulses every 10 ms, the maximum permissible power for non-intentional vision is 0.78 mW (International Standard IEC 825-1). A pulse duration of 100 µs would allow an increase of the permissible power by a factor of 2. Image acquisition is carried out using a single short focal length CCD camera with a 2.6 mm objective, providing an angle of view of 130°x100°. The calibration determines the intrinsic parameters corresponding to the well-known perspective projection onto the image plane. It is performed off line and must take into account intra and inter sensor/illuminator characteristics as well as lens characteristics (see Devy et al., 1997). The camera calibration process consists in determining, in one global step, the extrinsic, intrinsic and distortion parameters from the matching of a set of points defined on a planar calibration target with their projections onto the image plane. To reduce measurement errors, the computation of the parameters is done with multiple poses of the calibration target. In addition, an off-line calibration process is required to identify the laser beam line parameters in the camera coordinate system; it uses multiple views of a calibration target onto which the laser beams are projected. The camera/illuminator device is located in the overhead console position (see Figure 3), which appears to be the most efficient position since it provides the best overview of the passenger seat, even if it has two drawbacks: first, the need for a very wide angle of sight of the illuminator/camera (>90°); second, the compactness requirement becomes critical, which could lead to mounting the ECU in a distant location.
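For intuition on how the 6 cm gap meets the 2 cm requirement, the usual first-order triangulation error model can be applied (a generic textbook approximation, not a formula from the paper; the assumed dot localisation error of 3 µm on the sensor is illustrative):

/* First-order triangulation error model: dz ~ z^2 * dp / (f * b),
 * where dp is the dot localisation error on the sensor.  Generic
 * approximation with assumed numbers, not figures from the paper.  */
#include <stdio.h>

int main(void)
{
    double z  = 1.0;      /* depth of the 3D point             [m]  */
    double b  = 0.06;     /* emitter/camera gap (baseline)     [m]  */
    double f  = 2.6e-3;   /* focal length of the objective     [m]  */
    double dp = 3.0e-6;   /* dot localisation error (assumed)  [m]  */

    double dz = z * z * dp / (f * b);
    printf("expected depth error at %.1f m: %.3f m\n", z, dz);
    return 0;
}

With these assumed numbers the model yields about 0.019 m at 1 m, consistent with the 2 cm figure; the quadratic growth in z also explains why a larger gap, despite its mounting drawbacks, would improve accuracy.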

2.3 Algorithm principles

The occupant detection algorithm can be described as a multi-step process, summarized in Figure 4. The 3D reconstruction itself is a 3-step algorithm. The first step is the extraction of the light dots projected onto the scene, by way of conventional image processing techniques that have to deal with problems like environmental lighting (i.e. shadow projections or direct sunlight), passenger movements, surfaces with low reflectivity, and specular reflections. The second step is dot labeling, whose objective is to match the projected dots with their respective beams. The labeling process is based on the sequential application of several constraints:
Epipolar constraint: for each spot i, after distortion correction, and each beam Df, the distance in the image plane between the spot and the projection of Df is calculated; any pair whose distance exceeds a certain tolerance is discarded. Thanks to this first constraint, the number of candidate beams for a given light dot is drastically reduced (by about 95%).
Depth constraint: for cockpit occupancy reconstruction, the triangulation must yield 3D points whose depth lies within the range [0, 1.5 m]; all 3D points whose depth is greater than this threshold are discarded.
Topological constraints: uniqueness (a projected beam can be associated with at most one image spot and vice versa) and ordering constraints allow evaluating the matching confidence. Three different optimization techniques have been evaluated: max-clique, continuous relaxation and discrete relaxation (see Lerasle et al., 2000). The last one has shown the best compromise between speed and matching performance.
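As an illustration of the epipolar constraint, the sketch below (hypothetical data layout; the 2-pixel tolerance is an assumed value) prunes, for one extracted spot, the beams whose projected image line passes too far from it.

/* Epipolar pruning sketch: keep, for each extracted spot, only the
 * beams whose projected image line passes within TOL pixels of it.
 * The line coefficients (a,b,c), normalised so that a^2 + b^2 = 1,
 * are assumed to come from the off-line beam calibration; TOL is
 * an assumed tolerance.                                            */
#include <stdio.h>
#include <math.h>

typedef struct { double u, v; } spot_t;      /* undistorted pixel   */
typedef struct { double a, b, c; } line_t;   /* normalised 2D line  */

#define TOL 2.0                              /* pixels (assumed)    */

/* Fills cand[] with the indices of beams compatible with the spot
 * and returns how many survive the epipolar test.                  */
static int epipolar_candidates(spot_t s, const line_t *beam,
                               int n_beams, int *cand)
{
    int k = 0;
    for (int j = 0; j < n_beams; j++) {
        double dist = fabs(beam[j].a * s.u + beam[j].b * s.v + beam[j].c);
        if (dist <= TOL)
            cand[k++] = j;   /* beam j may correspond to this spot  */
    }
    return k;
}

int main(void)
{
    line_t beams[2] = { {0.0, 1.0, -120.0},    /* line v = 120      */
                        {0.0, 1.0, -200.0} };  /* line v = 200      */
    spot_t s = { 64.0, 121.0 };
    int cand[2];
    int n = epipolar_candidates(s, beams, 2, cand);
    printf("%d candidate beam(s); first index: %d\n", n, n ? cand[0] : -1);
    return 0;
}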

Figure 4: Occupant detection algorithm principle. 3D reconstruction process: dot extraction, dot labelling (using the off-line calibration), 3D reconstruction; classification process: extraction of characteristics, classification.

From the off-line calibration results and the labeling process, the 3D coordinates of the dots can be reconstructed. From these 3D points the 3D shape is then built (see Figure 5). The aim of the classification is to identify the different potential configurations according to figure 1.

Useful information is related to several aspects:
• The detection of the passenger nature and its classification.
• The evaluation of the distance between the passenger and the cockpit dashboard.
• The evaluation of the occupant volume distribution in the cockpit and the extraction of critical situations (out of position, like legs on the dashboard, …).
The classification methodology is based on a hierarchical process. The first level detects the occupant category: (a) empty seat, (b) baby seat, (c) something on the seat. This first classification is achieved thanks to several processing steps:
• Analysis of the motion of the objects located within the scene observed by the sensor.
• Extraction of specific characteristics related to each type of occupancy in analysis areas defined in the vehicle longitudinal projection plane.
If something is detected on the seat (c), a second processing level estimates the position of the occupant with respect to the different operating zones of the airbag. This very simple technique is based on the analysis of the number of spots in each zone (see the sketch below). This information is then fused in order to provide a final decision to the airbag ECU.
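A minimal sketch of this spot-counting test is given below; the zone boundaries, the reference frame (x as distance to the dashboard) and the numbers are illustrative assumptions, not the system's actual parameters.

/* Zone-occupancy sketch: count reconstructed 3D points falling in
 * each airbag operating zone, defined here by assumed distance
 * bands from the dashboard along the longitudinal axis.            */
#include <stdio.h>

#define N_ZONES 3
static const double zone_limit[N_ZONES] = { 0.25, 0.50, 1.50 }; /* m */

typedef struct { double x, y, z; } point3;

/* dist_to_dash(): longitudinal distance of a 3D point to the
 * dashboard plane; here simply the x coordinate (assumed frame).   */
static double dist_to_dash(point3 p) { return p.x; }

int main(void)
{
    point3 pts[] = { {0.20, 0, 0.8}, {0.35, 0, 0.7}, {0.40, 0, 0.6},
                     {0.90, 0, 0.5}, {1.10, 0, 0.4} };
    int n = sizeof pts / sizeof pts[0];
    int count[N_ZONES] = {0};

    for (int i = 0; i < n; i++) {
        double d = dist_to_dash(pts[i]);
        for (int z = 0; z < N_ZONES; z++)
            if (d <= zone_limit[z]) { count[z]++; break; }
    }
    for (int z = 0; z < N_ZONES; z++)
        printf("zone %d: %d spot(s)\n", z, count[z]);
    return 0;
}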

2.4 Results

The different algorithms have been implemented in C code within a PC environment; 3D reconstruction is performed every 50 ms.

Figure 5: Different phases of the occupant detection process: illumination, 3D reconstruction with a 19x19 beam illuminator

The results show that the system is able to give a very good approximation of the volumetric distribution of the occupancy within the observed scene. In addition, a very good estimation of the distance between the dashboard and the occupancy is achieved (±2 cm) (see figure 5). This value depends mainly on the calibration process.

The classification, even if it is based on very simple principles, gives promising results. The different tests performed in real situations have proved that the system is able to distinguish between the most important classes of occupant category listed before. In addition, this technique allows distinguishing between the different dashboard proximity situations. In a further step, taking the time evolution of the situation into account will improve the classification robustness, as sketched below.
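The paper does not specify the temporal scheme; one simple possibility, sketched here under that assumption, is a sliding majority vote over the last few per-frame decisions.

/* Temporal smoothing sketch: a sliding majority vote over the last
 * W per-frame class decisions.  The window size and class set are
 * assumptions; the paper only states that time evolution will be
 * used to improve robustness.                                      */
#include <stdio.h>

#define W         5   /* frames in the voting window (assumed)      */
#define N_CLASSES 4   /* e.g. empty seat, baby seat, occupant, ...  */

static int window[W];
static int idx = 0, filled = 0;

/* Push the newest frame decision and return the majority class.    */
static int smooth_class(int frame_class)
{
    int votes[N_CLASSES] = {0};
    window[idx] = frame_class;
    idx = (idx + 1) % W;
    if (filled < W) filled++;

    for (int i = 0; i < filled; i++) votes[window[i]]++;
    int best = 0;
    for (int c = 1; c < N_CLASSES; c++)
        if (votes[c] > votes[best]) best = c;
    return best;
}

int main(void)
{
    int frames[] = {2, 2, 1, 2, 2, 2};   /* one spurious frame      */
    for (unsigned i = 0; i < sizeof frames / sizeof *frames; i++)
        printf("frame %u -> class %d\n", i, smooth_class(frames[i]));
    return 0;
}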

Figure 6: Different phases of the occupancy detection process: projection of the 3D image in the vehicle longitudinal plane and classification results

3. 3D RECONSTRUCTION BASED ON STEREOVISION

Stereoscopic vision is based on the principle that depth information can be obtained by triangulation from two images (acquired, for example, by two CCD cameras) with a common part in their fields of view. Over many years, numerous developments (Matthies and Grandjean, 1994; Banks et al., 1997; Zabih and Woodfill, 1994) have led to a mature 3D perception method. Our developments build on these techniques and aim to adapt them to the constrained automotive context: real-time performance, dense and accurate reconstruction, low-cost technology, etc. The performance of our stereovision method is good enough to fulfill these requirements: (1) the complete algorithm is executed in 250 ms on 128x128 images, so that we can tackle weak real-time constraints; (2) it provides a dense 3D reconstruction of the passenger seat (between 3000 and 5000 3D points), so that a large variety of situations (e.g. passenger in advanced or extended position, different objects, …) can be sufficiently characterized to provide good inputs for a classifier; and (3) if a passenger is detected, we obtain a sufficiently accurate positioning of the head within the cockpit (~2 cm).

Our pixel-based stereo algorithm aims to match pixels between the left and right images (see Gautama et al., 1999 for a detailed analysis). The 3D reconstruction process can be described in 4 steps:
• The off-line calibration determines the parameters of the stereo sensor: camera models, inter-camera geometry, lens distortions, etc. It allows computing the epipolar geometry between the two cameras.
• The rectification process corrects the original images so as to produce a perfect virtual alignment of the two cameras and of their epipolar lines: two matched pixels must lie on the same line of the rectified images.
• The correlation process finds matched pixels between the right and left images. Several scores (ZNCC, CT -Census Transform-, …) have been evaluated in order to improve the robustness of the process with respect to environmental conditions. We show that the CT score allows building a denser disparity map than the classical scores; nevertheless, matches can only be found on textured areas of the images (a sketch of this score is given below).
• Finally, the 3D reconstruction process, based on triangulation techniques and on the calibration results, computes a depth map from the disparity map.
Our classification strategy describes the scene as a set of discriminant and invariant attributes. An attribute can be local (a specific detail at a precise location: for example, the number of points acquired in a given area of the cockpit) or global (for example, the number of points that belong to planar faces). The invariance issue is important because all global attributes could fail if they are not adapted to the seat position and orientation; a specific iterative method has been designed to estimate these seat parameters.
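The sketch below illustrates the Census-Transform score on rectified images (window size, search range and image layout are illustrative assumptions): each pixel is coded by comparing it with its 3x3 neighbours, and candidate matches along the same line are ranked by Hamming distance.

/* Minimal Census-Transform (CT) matching sketch on rectified
 * images: 8-bit grayscale, row-major; the window size (3x3), search
 * range and the synthetic data in main() are assumed values, not
 * the parameters of the paper.                                     */
#include <stdio.h>
#include <stdint.h>

#define Wd   128   /* image width (assumed)                         */
#define DMAX 16    /* disparity search range in pixels (assumed)    */

/* 3x3 census signature: one bit per neighbour, set when the
 * neighbour is darker than the centre pixel.                       */
static uint8_t census3x3(const uint8_t *img, int u, int v)
{
    uint8_t c = img[v * Wd + u], sig = 0;
    for (int dv = -1; dv <= 1; dv++)
        for (int du = -1; du <= 1; du++) {
            if (du == 0 && dv == 0) continue;
            sig = (uint8_t)((sig << 1) | (img[(v+dv)*Wd + u+du] < c));
        }
    return sig;
}

static int hamming8(uint8_t a, uint8_t b)
{
    uint8_t x = a ^ b;
    int n = 0;
    while (x) { n += x & 1; x >>= 1; }
    return n;
}

/* Best disparity for left pixel (u,v): the shift d minimising the
 * Hamming distance between census signatures on the same line.     */
static int best_disparity(const uint8_t *left, const uint8_t *right,
                          int u, int v)
{
    uint8_t ref = census3x3(left, u, v);
    int best_d = 0, best_cost = 9;          /* > max distance of 8  */
    for (int d = 0; d <= DMAX && u - d >= 1; d++) {
        int cost = hamming8(ref, census3x3(right, u - d, v));
        if (cost < best_cost) { best_cost = cost; best_d = d; }
    }
    return best_d;
}

int main(void)
{
    /* Three-line synthetic images whose middle line carries the
     * same texture shifted by 2 pixels: expected disparity is 2.   */
    static const uint8_t pat[] = {10,200,50,90,160,30,120,220,70,140};
    static uint8_t L[Wd * 3], R[Wd * 3];
    for (int u = 0; u < Wd; u++) {
        L[Wd + u] = pat[u % 10];
        R[Wd + u] = pat[(u + 2) % 10];
    }
    printf("disparity at (64,1): %d\n", best_disparity(L, R, 64, 1));
    return 0;
}

In practice the per-pixel Hamming costs would be aggregated over a correlation window before selecting the disparity; for rectified cameras with baseline B and focal length f, the depth then follows as z = f·B/d.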

Stereoscopic vision prototype

A prototype has been implemented in a demo-car (see figure 7). It encompasses a set of two CMOS cameras and a lighting unit. The different algorithms have been implemented in C code within a PC environment. First classification algorithms have also been implemented and tested successfully (see figure 8).

Fig. 7: 3D reconstruction based on stereo vision - camera-set prototype (a) - vehicle integration (b) - 3D reconstruction (c)

Figure 8: 3D reconstruction based on stereo vision - classification

4. COMPARISON OF THE TWO APPROACHES: ADVANTAGES AND DRAWBACKS

Both systems have been implemented in a vehicle and tested in real conditions. Moreover, no specific assumptions have been made on image quality. 3D vision based on structured lighting shows some strong advantages for automotive applications:
• The sensor can be realized in CMOS technology; a cheap and highly integrated solution can be envisioned.
• First results show that 3D reconstruction using a restricted amount of information allows quite a fast process with respect to airbag requirements, although some limitations in terms of resolution can be seen.



• 2D gray scale images are still available and can be used for complementary processing.
• A homogeneous distribution of the light dots in the scene allows very good reconstruction capabilities.
Current drawbacks concern the compactness of the measurement device (camera + light emitter) and the effect of uncontrolled sunlight disturbances and back-light. Nevertheless, this problem could easily be solved thanks to new CMOS logarithmic-response cameras, which are much less sensitive to very bright illumination. Finally, the reconstruction is sometimes not precise enough to guarantee the classification of all scenarios.
The advantages of conventional stereoscopic vision are mainly related to the very good resolution of the reconstructed image, useful for classification purposes. In addition, it is weakly influenced by back-light, and since it is a passive system, eye safety requirements do not apply. The current drawbacks remain the calculation time and the non-homogeneous distribution of the 3D reconstruction, which cannot guarantee that the field of interest will always be described.

5. CONCLUSION

New airbag generations will need more and more information about the occupancy of the automobile cockpit: nature of the occupant, volumetric distribution, distances to the dashboard. It is now obvious that 3D reconstruction of the cockpit inner space is necessary, and video sensing looks to be one of the real potential solutions for extracting this information. This technology is supported by the drastic reduction of sensor cost, but also by the exponential improvement and cost reduction of the image processing hardware. Within Siemens, in collaboration with several labs and institutes in France and Germany, several video-based approaches are currently being evaluated for 3D reconstruction: conventional stereoscopic vision, 3D vision based on structured lighting, and time-of-flight techniques. The results presented within this paper show the very high potential of these techniques, which will have to be confirmed. Beside the well-known stereoscopic technique, which gives very high reliability of the reconstructed 3D images but still remains somewhat heavy in terms of computation time, the use of structured light looks to offer an interesting compromise between accuracy and speed. Nevertheless, the sensing data extracted with this technique sometimes does not look sufficient to recognize all scenarios. A combination of the two techniques should give optimal results: the presence of light dots on untextured areas would provide matches where stereo by itself could not find points. Nevertheless, taking into account automotive constraints (size, cost, …), such a solution does not look realistic. The current orientation is to consider stereoscopic vision for further developments; specific efforts are dedicated to the improvement of computation time.

ACKNOWLEDGEMENT

The developments performed on 3D reconstruction based on structured lighting and stereoscopic vision have been supported and co-funded by the French Ministry of Research and Education (MENRT) in the context of the PREDIT II program.

REFERENCES

Banks J., Bennamoun M., Corke P. (1997). Fast and robust stereo matching algorithms for mining automation. Proc. of the Int. Workshop on Image Analysis and Information Fusion, Adelaide, Australia.
Boyer K., Kak A. (1987). Color-Encoded Structured Light for Rapid Active Ranging. IEEE Trans. on PAMI, 9(1):14-28.
Devy M., Garric V., Orteu J.J. (1997). Camera Calibration from Multiple Views of a 2D Object, using a Global non-Linear Minimization Method. Proc. of the Int. Conf. IROS, Grenoble, France.
Gautama S., Lacroix S., Devy M. (1999). Evaluation of stereo matching algorithms for occupant detection. Proc. of the Int. Workshop on Recognition, Analysis, and Tracking of Faces and Gestures (RATFG-RTS 99), Corfu, Greece.
Lerasle F., Lequellec J.M., Devy M. (2000). Beams Labeling in a Structured Light Sensor. Int. Conf. on Pattern Recognition, volume 1, pp. 782-785, Barcelona.
Matthies L., Grandjean P. (1994). Stochastic Performance Modeling and Evaluation of Obstacle Detectability with Imaging Range Sensors. IEEE Trans. Robotics and Automation, Vol. 10, No. 6, pp. 783-792.
Mouaddib E., Batlle J., Salvi J. (1997). Recent Progress in Structured Light in order to solve the Correspondence Problem in Stereo Vision. ICRA, volume 1, pages 130-136.
Rosenfeld J., Tsikos C. (1986). High-speed Space Encoding Projector for 3D Imaging. Optics, Illumination and Image Sensing for Machine Vision, pages 146-151.
Stockman G., Hu G. (1989). Surface Solution using Structured Light and Constraint Propagation. IEEE Trans. on PAMI, 11(4):390-402.
Zabih R., Woodfill J. (1994). Non-parametric Local Transforms for Computing Visual Correspondence. Third European Conference on Computer Vision, Stockholm, Sweden.