HUMAN IDENTIFICATION USING LASER SCANNERS AND IMAGE SENSORS

Hiroki TAKAHASHI, Katsuyuki NAKAMURA, Huijing ZHAO, Ryosuke SHIBASAKI
Shibasaki Lab, Graduate School of Civil Engineering, University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505 JAPAN
Tel: (81)-3-5452-6417  Fax: (81)-3-5452-6417
E-mail: [email protected]

KEY WORDS: Laser scanner, Image sensor, Hybrid sensing, Human identification
ABSTRACT: We propose a hybrid sensing system for human tracking. The system uses laser scanners and image sensors and is applicable to wide, crowded areas such as railway stations. Concretely, human tracking using laser scanners forms the base, and image sensors are used for human identification when the laser scanners lose persons through occlusion, entering a room, or going up stairs. We developed the human identification method for this system. Our method is as follows: 1. Best-shot images (human images that show human features clearly) are obtained with the help of the human position and direction data supplied by the laser scanners. 2. Human identification is conducted by calculating the correlation between the color histograms of the best-shot images. Estimating best-shot images makes human identification possible even in crowded scenes. An experiment in a railway station showed the effectiveness of the method.

1. INTRODUCTION
Human tracking is an important fundamental technique for surveillance systems, spatial planning of building space, and so on. Although many studies of this technique use image sensors such as CCD cameras [1-3], they are still immature in terms of reliability and accuracy, especially in crowded scenes. On the other hand, methods using laser scanners [4] are relatively robust in crowded scenes and can track over a wide area. However, once a tracked person is lost through occlusion, entering a room, or going up stairs, it is impossible to resume tracking him/her as the same person, because laser scanners cannot obtain human appearances. To overcome this weak point, we propose a hybrid sensing system for human tracking. The laser-based method forms the base, and image sensors are used for human identification when the laser scanners lose persons. This image-based identification enables re-tracking, so the system can track persons continuously over a wide area such as a whole building. In this paper, we propose the human identification method for this hybrid sensing system. Much research has been done on human identification using image sensors [5,6], but it has not been examined sufficiently under hard conditions where occlusion is severe and human directions change constantly, as in a railway station. Our method is applicable to these hard conditions because it uses laser scanners together with image sensors: a laser scanner can obtain exact positions and directions of persons even in crowded scenes, and with the help of these data the accuracy of image-based human identification improves.

2. METHOD
2.1. Laser-Based Tracking and Fusion of Laser and Image Sensors

Laser-based tracking is conducted by scanning ankles with multiple laser scanners placed on the floor and classifying the patterns of ankle movement [4]. With this method, the precision of tracking is 90-100% in uncrowded places and more than 80% in crowded places. Fig. 1 shows a tracking result obtained with this method. After time synchronization, the laser data and tracking results are projected onto the image sensors. The laser data are in world coordinates (Xw, Yw, Zw), so they must be transformed into image coordinates (X, Y) (Fig. 2). This transformation is conducted with Tsai's method [7].
Fig. 1 Tracking pedestrians by using laser scanners
Fig. 2 Geometric relationships between laser scanner and image sensor
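The world-to-image transformation above can be sketched with a standard pinhole camera model. The intrinsic matrix K and the pose (R, t) below are illustrative placeholders, not the calibration actually obtained for the system (which used Tsai's method [7]):

```python
import numpy as np

def project_to_image(p_world, R, t, K):
    """Project a 3-D world point (Xw, Yw, Zw) onto image coordinates (X, Y).

    R (3x3 rotation) and t (3-vector translation) map world coordinates
    into the camera frame; K is the 3x3 intrinsic matrix. In the actual
    system these parameters come from Tsai's calibration method [7].
    """
    p_cam = R @ p_world + t        # world frame -> camera frame
    uvw = K @ p_cam                # camera frame -> homogeneous pixel coords
    return uvw[:2] / uvw[2]        # perspective division -> (X, Y)

# Illustrative parameters only: a 720x480 camera with a 500-px focal length,
# placed at the world origin and looking along +Zw.
K = np.array([[500.0,   0.0, 360.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

xy = project_to_image(np.array([1.0, 0.5, 5.0]), R, t, K)  # -> (460.0, 290.0)
```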
2.2. Extraction of Human Regions

After the fusion of the laser and image sensors, human regions are extracted from the images. First, rough human regions are obtained from the human position data supplied by the laser scanners, by the following process. A 50 cm x 50 cm square is placed around the point of a human position, parallel to the walking direction of the person. By extending this square vertically, a rectangular solid (the Human Box) is formed; the region within the Human Box is the rough human region (Fig. 3(a)). Then, human regions are extracted with background subtraction. Our method uses a Gaussian mixture model as the background model [8], which is flexible enough to handle variations in lighting and moving scene clutter. However, the human regions extracted by this or any general background subtraction method contain noise (Fig. 3(b)). We therefore use the Human Box obtained from the laser scanners: the exact human regions are extracted by taking the common region of the Human Box and the regions extracted by background subtraction (Fig. 3(c)). With the help of the laser scanners, this extraction is more accurate than normal background subtraction alone. The laser scanners offer another advantage: they give the number of persons in the image. If regions were segmented by connectivity alone, one person would often separate into two or more regions; in our method such segmentation errors are diminished because there is only one person in each Human Box.
Fig. 3 Human Box and noise removal: (a) Original image and Human Box (b) Background subtraction (c) After noise removal
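The intersection step above can be sketched as follows. The foreground mask here is a plain boolean array standing in for the output of the Gaussian-mixture background model, and the box coordinates are hypothetical:

```python
import numpy as np

def refine_human_region(fg_mask, box):
    """Intersect a background-subtraction mask with the projected Human Box.

    fg_mask : boolean (H, W) foreground mask; in the actual system this is
              produced by the Gaussian-mixture background model [8].
    box     : (x0, y0, x1, y1), the image-plane bounding rectangle of the
              50 cm x 50 cm column erected around the laser-tracked position.
    """
    x0, y0, x1, y1 = box
    box_mask = np.zeros_like(fg_mask)
    box_mask[y0:y1, x0:x1] = True
    # Pixels outside the Human Box (noise, other persons) are discarded.
    return fg_mask & box_mask

# Toy example with hypothetical coordinates: one person plus spurious noise.
fg = np.zeros((480, 720), dtype=bool)
fg[100:300, 200:260] = True    # true person pixels
fg[10:20, 600:650] = True      # background-subtraction noise elsewhere
clean = refine_human_region(fg, (180, 80, 280, 320))  # noise pixels removed
```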
2.3. Estimation of Best-Shot Images

The human regions are segmented and human images are extracted. These human images are used for human identification when the laser scanners lose persons. However, not all human images can be used, because some persons in the images overlap other persons, appear very small, and so on. We therefore have to select the human images that show human features clearly; these are called best-shot images. The barometers of best-shot images are resolution (R_k,t), direction (D_k,t) and occlusion (O_k,t):

R_k,t = M_k,t / I    (1)

D_k,t = g_k,t · v_k,t    (2)

O_k,t = (OC_k,t + F_k,t) / M_k,t    (3)
Resolution indicates how large the human region is in pixels: M_k,t is the pixel size of person k at time t, and I is the total number of pixels in the image. Direction indicates the human direction relative to the image sensor, given by the dot product of the vector from the image sensor to person k (g_k,t) and the vector of the person's moving direction (v_k,t). Occlusion indicates the degree of overlap: OC_k,t is the size of the regions in which other persons overlap person k, and F_k,t is the size of the frame-out regions. The estimated value β_k(t) for best-shot images is calculated frame by frame, where the w are the weights and the f are the response functions of each barometer. The best-shot image β*_k is then found by maximizing the estimated value; one best-shot image is obtained per track.

β_k(t) = w_R f_R(R_k,t) + w_D f_D(D_k,t) + w_O f_O(O_k,t)    (4)

β*_k = argmax_t β_k(t)    (5)
Fig. 5 shows the response functions. The utility of human images with overlapping regions is low, so the decrease of the score for occlusion is especially sharp. The weights are w_R = 1, w_D = 2, w_O = 3. In brief, images of persons who are not overlapped, face the front, and appear large are selected.
Fig. 5 Response function of each barometer: (a) Resolution (b) Direction (c) Occlusion
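Eqs. (4)-(5) can be sketched as follows. The paper gives the response functions only graphically (Fig. 5), so the curve shapes below are assumptions that merely follow the stated tendencies (larger regions and frontal views score higher, occlusion is penalized sharply); the weights are those stated in the text:

```python
import numpy as np

# Weights from the text; the response functions below are assumed shapes.
W_R, W_D, W_O = 1.0, 2.0, 3.0

def f_R(r):                     # resolution: saturating reward for large regions
    return min(1.0, 10.0 * r)

def f_D(d):                     # direction: g.v < 0 when walking toward the camera
    return max(0.0, -d)

def f_O(o):                     # occlusion: sharp drop for any overlap
    return float(np.exp(-10.0 * o))

def best_shot(track):
    """Return t* = argmax_t beta_k(t) over one track (Eqs. 4-5).

    track: list of (R_kt, D_kt, O_kt) barometer triples, one per frame."""
    scores = [W_R * f_R(r) + W_D * f_D(d) + W_O * f_O(o) for r, d, o in track]
    return int(np.argmax(scores))

# Frame 0 is small and half occluded; frame 1 is large, frontal, unoccluded.
t_star = best_shot([(0.01, -0.2, 0.5), (0.08, -0.9, 0.0)])  # -> 1
```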
2.4. Human Identification

Best-shot images are separated into upper and lower regions and transformed into HSV color histograms H(h,s) (h = 1..16, s = 1..8). V represents the brightness of the image, so it is excluded from identification so that the result does not depend on differences in brightness. These histograms, together with the times and positions at which the best-shot images were obtained, are stored and used for human identification. First, the compared images CI are selected under a time and position restriction, which excludes human images that cannot belong to the same person considering time and position. Next, the matching score ms between persons k and j is calculated by the following formula, where Hu is the histogram of the upper region and Hl is that of the lower region. Matching scores between k and all compared images are calculated, and the same person k* as k is decided:

ms(k,j) = Σ_{H in {Hu,Hl}} Σ_{s=1..8} Σ_{h=1..16} H_k(h,s) H_j(h,s)    (6)

k* = argmax_{j in CI} ms(k,j)    (7)
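Eqs. (6)-(7) can be sketched as follows; the toy histograms are illustrative stand-ins for the 16x8 H-S histograms of real best-shot images:

```python
import numpy as np

def matching_score(hists_k, hists_j):
    """ms(k, j) of Eq. (6): histogram correlation summed over the upper-
    and lower-region H-S histograms (the V channel is discarded)."""
    return sum(np.sum(hk * hj) for hk, hj in zip(hists_k, hists_j))

def identify(query_hists, candidates):
    """Eq. (7): among the compared images CI, return the id j* whose
    histograms score highest against person k."""
    return max(candidates, key=lambda j: matching_score(query_hists, candidates[j]))

def norm(h):
    return h / h.sum()

# Toy 16x8 (hue x saturation) histograms: a red upper body, blue lower body.
upper = np.zeros((16, 8)); upper[0, 6] = 1.0
lower = np.zeros((16, 8)); lower[10, 5] = 1.0
query = (norm(upper), norm(lower))

ci = {                                             # compared images CI
    "a": (norm(upper + 0.05), norm(lower + 0.05)), # same clothing, slight noise
    "b": (norm(np.ones((16, 8))), norm(np.ones((16, 8)))),  # uniform colors
}
match = identify(query, ci)  # -> "a"
```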
3. EXPERIMENT
3.1. Outline of the Experiment

We experimented in a railway station used by 180 thousand persons per day. The concourse where we experimented measures 40 m x 60 m (Fig. 2). We used eight laser scanners (SICK LMS-200) and a CCD camera (SONY DEF-SX900). The laser scanners were set so that occlusion would be low and the measurable range wide, and they were connected by wireless LAN. Each laser scanner measures every 0.25 degrees over a 30 m semicircle at 19 fps. The CCD camera has a 720x480 image size and a 7.5 fps frame rate, and it shot a part of the concourse (Fig. 6). We picked up 30 pairs of best-shot images of the same persons (60 images in total) obtained by the CCD camera. Obtaining two best-shot images of one person means that the laser scanners lost the person and obtained two tracks for him/her; these person losses were caused by occlusion, because the station is heavily congested. Fig. 7 shows examples of best-shot images obtained in the station.
Fig. 6 Screen of the CCD camera

Fig. 7 Examples of best-shot images
3.2. Result

We calculated matching scores among the 60 images. A result is counted as correct if the image that got the highest score shows the same person. Tab. 1 shows the result and a comparison of our method with other color spaces and methods.

Tab. 1 Accuracy of methods for human identification
Method                                                   Accuracy
1. HSV color histogram (exclude V) (our method)          76% (46/60)
2. HSV color histogram                                   43% (26/60)
3. RGB color histogram                                   38% (23/60)
4. Average of RGB color                                  17% (10/60)
5. Our method + time restriction (within 15 seconds)     90% (54/60)
Compared with the other color histograms, our method (HSV with exclusion of V) showed the highest accuracy. "Average of RGB color" (method 4) means calculating the matching score from the similarity of the RGB averages; its accuracy was very low. Moreover, when the time restriction was added, so that comparison was limited to images obtained within 15 seconds of each other, the accuracy rose to 90%. This restriction is possible because it takes less than 15 seconds to walk across the screen area. Considering that the experiment was conducted under hard conditions, with severe occlusion and constantly changing human directions, this accuracy is very high. It results from obtaining "good" human images thanks to best-shot estimation.

4. CONCLUSION
In this paper, we proposed a human identification method that is applicable even under hard conditions (severe occlusion, etc.), and we confirmed its effectiveness by an experiment in a railway station. In this experiment we counted a result as correct if the image with the highest score was the same person; looking in more detail, the accuracy with which the same person's image was included within the top 10 scores was 98%. It would therefore be effective to add other human features to the barometers of identification to increase accuracy: the main barometer should remain the color information obtained by the image sensors, with human velocity, height, and so on added. Furthermore, the accuracy will increase with more detailed time and position restrictions. So far we have considered the identification of pairs of human images obtained in a narrow area (one camera). Next, we need to develop a method applicable to images obtained by different cameras, because human images must come from different cameras when persons are lost by entering a room or going up stairs. Heavy changes of brightness between different cameras will be a problem, so we need to develop a method that is robust in such situations.

REFERENCES
[1] W. Hu, T. Tan, L. Wang, S. Maybank, "A survey on visual surveillance of object motion and behaviors", IEEE Trans. Systems, Man, and Cybernetics, Vol. 34, No. 3, pp. 334-352, 2004.
[2] D. M. Gavrila, "The visual analysis of human movement: A survey", Computer Vision and Image Understanding, Vol. 73, No. 1, pp. 82-98, 1999.
[3] J. K. Aggarwal, Q. Cai, "Human motion analysis: A review", Computer Vision and Image Understanding, Vol. 73, No. 3, pp. 428-440, 1999.
[4] H. Zhao, R. Shibasaki, "A novel system for tracking pedestrians using multiple single-row laser-range scanners", IEEE Trans. Systems, Man and Cybernetics, Vol. 35, Issue 2, pp. 283-291, 2005.
[5] Y. Ivanov, "Multi-model human identification system", Seventh IEEE Workshops on Application of Computer Vision, Vol. 1, pp. 164-170, 2005.
[6] M. Sato, R. Yamazaki, "Person search system using face and clothing features", Matsushita Technical Journal, Vol. 52, No. 3, pp. 163-167, 2006.
[7] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE Journal of Robotics and Automation, Vol. 3, Issue 4, pp. 323-344, 1987.
[8] C. Stauffer, W. E. L. Grimson, "Adaptive background mixture models for real-time tracking", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 246-252, 1999.