Object Oriented Face Detection Using Range and Color ... - IEEE Xplore

Object Oriented Face Detection Using Range and Color Information

Sang-Hoon Kim, Nam-Kyu Kim, Sang Chul Ahn, Hyoung-Gon Kim

Department of Electronic Engineering, Korea University, 1 Anam-dong 5-ga, Seongbuk-gu, Seoul, Korea
Imaging Media Research Center, KIST, 39-1 Hawolgok-dong, Seongbuk-gu, Seoul, Korea
email: {kimsh, knk, asc, hgk}[email protected]

Abstract

This paper proposes an object-oriented face detection method using range and color information. Objects are segmented from the background using a stereo disparity histogram that represents the range information of the objects. The matching pixel count (MPC) disparity measure is introduced to enhance matching accuracy and to remove the effect of unexpected noise in boundary regions. For a high-performance implementation of the MPC disparity histogram, redundant operations inherent to the area-based search are removed. To detect facial regions among the segmented objects, a skin-color transform technique is used with a generalized face color distribution (GFCD), modeled by a 2D Gaussian function in a normalized color space. Using the GFCD, the input color image can be transformed into a gray-level image that enhances only facial color components. To detect facial information only in the defined range, the results of range segmentation and color transform are combined effectively. The experimental results show that the proposed algorithm works well in various environments with multiple human objects. Moreover, the processing time for a test image does not exceed 2 seconds on general-purpose workstations. The range information of the objects can be useful in MPEG-4, where natural and synthetic images can be mixed and synthesized.

1 Introduction

As facial objects carry important information, their applications have expanded rapidly in recent years, including interactive computer games, security systems, and various interactive man-machine interfaces. Generally, facial information can be used for facial region detection, facial feature extraction, personal identification, and facial expression understanding [1]. It can also be used for face synthesis and animation, which could provide a very efficient coding method as suggested by the MPEG-4 SNHC ad-hoc group. New technologies for face detection and recognition have been developed by combining conventional mathematical tools with novel image processing ideas. However, these technologies are not yet sufficient for real applications, because the first step, facial region detection, carries strong constraints such as a simple background and constant brightness. The primary information used for face detection is the shape, color, and motion of faces. Head shapes are generally modeled as ovals and used to verify the facial region; however, the shape varies as the head moves, which makes it difficult to use. Recently, as color images have become available at low cost, color information has been exploited via a color transform based on the color distribution of faces. Although it provides a good initial guess about the facial region, it is still not enough to separate a face from a typical background environment.

This paper proposes a novel object-oriented face detection method using range and color information. Objects are segmented from the background using a stereo disparity histogram (DH) that represents the range information of the objects and background. The matching pixel count (MPC) disparity measure is introduced to enhance matching accuracy and to remove unexpected noise in boundary regions. A high-performance MPC disparity histogram is implemented by removing redundant operations inherent to the area-based search. To detect facial regions among the segmented range objects, a skin-color transform technique is used. Because facial colors form a cluster at a specific location in a normalized color space, the generalized face color distribution (GFCD) can be modeled by a 2D Gaussian function. Using the GFCD model, the input color image can be transformed into a gray-level image that enhances only facial color components.

2 Object Segmentation using MPC Disparity Histogram

2.1 MPC depth map

[Figure 1 pipeline (range branch): Stereo Image Grab → Down Sampling → LoG & MPC → Disparity Analysis → Segmentation; (color branch): RGB Normalization → GFCD Modeling & GFCD Filtering → Face Color Region Extraction; the branches merge in Common Region Extraction → Labelling → Final Face Acquisition]

Figure 1: Block diagram of the proposed face detection algorithm

The experimental results show that the proposed algorithm works well in various environments with multiple human objects and colorful backgrounds. Moreover, the processing time for a test image does not exceed 2 seconds on general-purpose workstations. The range information of the objects can be useful in future image compression techniques such as MPEG-4.

The suggested algorithm consists of three main stages, as shown in Figure 1. First, the disparity map and disparity histogram are computed by MPC stereo matching; based on the disparity histogram, range objects are segmented selectively by their range information. Second, after normalization of the color components, a color transform is performed with the GFCD, which is modeled by a 2D Gaussian function in the normalized color domain; the transform enhances the facial color regions. The last step combines the two results to search for facial regions within a certain range and labels each region.

Stereo vision is a passive approach to obtaining 3D depth information from two or more images [2]. Several approaches with adaptive windows and multiple baselines have been reported to overcome the problems of occlusion and boundary overreach [4][6]. Although area-based stereo matching suffers from occlusion, it is widely used because of its simple computational structure and dense disparity map. Several similarity measures are used for area-based matching, such as SSD (sum of squared differences), SAD (sum of absolute differences), and NCC (normalized cross-correlation). Although the NCC measure is less sensitive to brightness and camera differences than the others [3], all three methods share a common error source: they accumulate the differences of all pixels within the search window. As a result, boundary overreach occurs, and the boundary of a constant-disparity region moves in the direction that decreases the variation of the pixel values. This paper suggests the MPC (matching pixel count) similarity measure, which accumulates only the count of similar pixels in the corresponding windows. The basic concepts of the MPC algorithm for computing a stereo depth map are as follows:

1. represent disparity data using the minimum possible integer values;
2. use a new similarity measure that improves computational speed and correctness by using per-pixel absolute distances rather than their sum;
3. minimize the effect of minor similarities by thresholding the per-pixel similarity.

If the pixel value of the right image at position (x, y) is R(x, y) and the corresponding pixel value of the left image at position (x + d, y) is L(x + d, y), then the disparity value D(x, y) of the MPC algorithm can be represented as follows.

D(x, y) = argmax_d MPC(x, y, d),   (1)

MPC(x, y, d) = Σ_{(x, y) ∈ W} T(x, y, d),   (2)

where

T(x, y, d) = 1 if |R(x, y) − L(x + d, y)| ≤ V_th, and 0 otherwise.   (3)

[Figure 3 illustrates graphically the incremental updates V(x, y+1, d) = V(x, y, d) + T(x, y+1+wr, d) − T(x, y−wr, d) and MPC(x+1, y, d) = MPC(x, y, d) + V(x+1+wc, y, d) − V(x−wc, y, d).]

Figure 2: Performance comparison of various similarity measures with random-dot test images: (a) left image, (b) right image, (c) SAD, (d) NCC, (e) MPC

Here, V_th represents a pre-defined threshold used as the pixel similarity measure. T(x, y, d) represents the similarity value of a pixel at position (x, y) within window W when the search disparity is d. T values are computed for all d in the search range, and the d with the maximum MPC value gives the best matching position. Figure 2 shows the results of the various similarity measures on the simple "Wedding Cake" random-dot image; MPC is the method that yields the most accurate object boundaries.
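As an illustration, the per-pixel test T and the window count MPC of Equations (1)-(3) can be sketched in plain Python. This is a naive, unoptimized version; the function name, the encoding of images as lists of rows, and the default parameters are illustrative, not from the paper:

```python
def mpc_disparity(right, left, d_max, w=1, v_th=10):
    """Naive MPC stereo matching, a sketch of Eqs. (1)-(3).

    `right` and `left` are rectified grayscale images as lists of rows;
    the matching window is (2w+1) x (2w+1) and v_th plays the role of V_th.
    Returns the per-pixel disparity D(x, y) = argmax_d MPC(x, y, d).
    """
    h, wd = len(right), len(right[0])

    def T(x, y, d):
        # T(x, y, d) = 1 iff |R(x, y) - L(x + d, y)| <= V_th   (Eq. 3)
        if x + d >= wd:
            return 0
        return 1 if abs(right[y][x] - left[y][x + d]) <= v_th else 0

    def mpc(x, y, d):
        # MPC(x, y, d): count of matching pixels in the window   (Eq. 2)
        total = 0
        for j in range(max(0, y - w), min(h, y + w + 1)):
            for i in range(max(0, x - w), min(wd, x + w + 1)):
                total += T(i, j, d)
        return total

    # D(x, y) = argmax over the disparity search range   (Eq. 1)
    disp = [[0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            disp[y][x] = max(range(d_max + 1), key=lambda d: mpc(x, y, d))
    return disp
```

For example, if the left image is the right image shifted two pixels, interior pixels come out with disparity 2. Each call recomputes the window sum from scratch; Section 2.2 removes exactly this redundancy.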

2.2 Efficient implementation of the MPC

The total number of operations required for area-based stereo matching is (I_r × I_c) × (W_r × W_c) × S_r, where I_r × I_c is the image resolution, W_r × W_c is the window size, and S_r is the disparity search range. This enormous computational burden can be reduced by removing the redundant computations inherent in the MPC operation. Let the V buffer store the matching counts of vertical line segments within a window, and let the MPC buffer store the matching counts of whole windows; the computation of the MPC can then be represented as follows:

V(x, 0, d) = Σ_{j=−wr}^{wr} T(x, j, d),   (4)

V(x, y+1, d) = V(x, y, d) + T(x, y+1+wr, d) − T(x, y−wr, d),   (5)

MPC(0, y, d) = Σ_{i=−wc}^{wc} V(i, y, d),   (6)

MPC(x+1, y, d) = MPC(x, y, d) + V(x+1+wc, y, d) − V(x−wc, y, d).   (7)

Here, wr = (W_r − 1)/2 and wc = (W_c − 1)/2. The MPC buffer contains the MPC measure at the new position, and each update of MPC requires only two updates of the T values defined in Equation 3. With this redundancy removed, the total number of operations is (I_r × I_c) × S_r, independent of the window size. The principle of removing the redundancy is presented graphically in Figure 3.

Figure 3: Principle of removing redundancy in the MPC algorithm
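The incremental scheme of Equations (4)-(7) can be sketched as below for one fixed disparity: column sums V slide vertically, then window sums MPC slide horizontally. The truncation of windows at the image border is an assumption this sketch makes; the paper does not specify border handling:

```python
def mpc_incremental(T, w):
    """Window-sum of a 0/1 match map T via the incremental updates of
    Eqs. (4)-(7), for one fixed disparity d. T is a list of rows; border
    windows are truncated to the valid image area (an assumption)."""
    h, wd = len(T), len(T[0])
    # V(x, y): vertical sum of T over rows y-w .. y+w   (Eqs. 4-5)
    V = [[0] * wd for _ in range(h)]
    for x in range(wd):
        V[0][x] = sum(T[j][x] for j in range(0, min(h, w + 1)))  # Eq. (4)
        for y in range(h - 1):
            V[y + 1][x] = V[y][x]                                # Eq. (5)
            if y + 1 + w < h:
                V[y + 1][x] += T[y + 1 + w][x]  # pixel entering the window
            if y - w >= 0:
                V[y + 1][x] -= T[y - w][x]      # pixel leaving the window
    # MPC(x, y): horizontal sum of V over columns x-w .. x+w   (Eqs. 6-7)
    M = [[0] * wd for _ in range(h)]
    for y in range(h):
        M[y][0] = sum(V[y][i] for i in range(0, min(wd, w + 1)))  # Eq. (6)
        for x in range(wd - 1):
            M[y][x + 1] = M[y][x]                                 # Eq. (7)
            if x + 1 + w < wd:
                M[y][x + 1] += V[y][x + 1 + w]  # column entering the window
            if x - w >= 0:
                M[y][x + 1] -= V[y][x - w]      # column leaving the window
    return M
```

Each pixel of the output costs a constant number of additions regardless of window size, which is the source of the (I_r × I_c) × S_r complexity stated above.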

2.3 Range object segmentation

To reduce the amount of computation and the total processing time, the original stereo images are down-sampled by a factor of 4, and LoG (Laplacian of Gaussian) operations are performed to enhance the edge regions of the stereo images. After these initial operations, the MPC disparity map is computed and used to build the disparity histogram (DH), which records the occurrence frequency of each disparity value. The DH represents the locations and numbers of objects according to the disparity distribution. Using the DH, we can separate objects located at different distances from a complicated background. This concept is applied in the suggested algorithm and contributes to the exact separation of background and human objects in the image. Figure 4 shows the disparity map and disparity histogram of the test image "Santa". In Figure 4 (d), the x-axis represents disparity values and the y-axis the total number of pixels with each disparity value. The resolution of the original images is 512 × 480; they are down-sampled by a factor of 4 to 128 × 120 before the stereo matching operation. In Figure 4 (d), we can see that the background is located at disparity values of 3 or 4, and the object region at disparities 9 to 12.

Figure 4: Stereo matching result of the Santa test image: (a) right image, (b) left image, (c) disparity map, (d) disparity histogram

Figure 5: GFCD in the q = (r, g) domain
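Building the DH and selecting a range object from it can be sketched as follows; the function names and the list-of-rows disparity map format are illustrative, and the example disparity ranges echo the Santa image (background at 3-4, object at 9-12):

```python
def disparity_histogram(disp, d_max):
    """Disparity histogram (DH): occurrence count of each disparity
    value in a disparity map given as a list of rows."""
    hist = [0] * (d_max + 1)
    for row in disp:
        for d in row:
            hist[d] += 1
    return hist

def segment_by_disparity(disp, d_lo, d_hi):
    """Binary object mask: keep pixels whose disparity falls in the
    range [d_lo, d_hi] chosen from a peak of the DH (e.g. 9-12 for the
    foreground object in Figure 4 (d))."""
    return [[1 if d_lo <= d <= d_hi else 0 for d in row] for row in disp]
```

Each peak in the histogram corresponds to a group of pixels at roughly the same distance, so picking a disparity interval around a peak segments one object.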

3 Face Enhancement Using Color Transform

3.1 Generalized face color distribution: GFCD

Color information provides an important clue for detecting human faces in various images. However, there are several difficulties in using color information for face detection:

1. The face color distribution varies with the camera model and the illumination.
2. Face color distributions differ across races, sexes, and ages.

Therefore, the skin-color distribution model should represent all of the variations mentioned above to work successfully in various environments. Because both difficulties are primarily due to the intensity component, a normalized RGB model can minimize them. Generally, the color of each pixel is expressed by the combination of R, G, B components, i.e., Q = (R, G, B), where each component lies in [0, 255]. Since I = R + G + B represents the intensity, each color component can be normalized by dividing it by the intensity:

r = R/I, g = G/I, b = B/I.   (8)

From the relation r + g + b = 1, normalized color values can be expressed with r and g alone [5]. Therefore, every color pixel can be expressed as q = (r, g) after intensity normalization. The color distribution of a facial area concentrates in a small region of the normalized color domain: when normalized color pixels are accumulated into a color histogram, facial colors cluster in a small area with an approximately Gaussian shape. Therefore, the generalized face color distribution (GFCD) can be represented by a 2D Gaussian distribution G(m, Σ²), with

m = (r̄, ḡ),   (9)

r̄ = (1/N) Σ_{i=1}^{N} r_i,   (10)

ḡ = (1/N) Σ_{i=1}^{N} g_i,   (11)

Σ² = [σ_rr 0; 0 σ_gg],   (12)

where r̄ and ḡ are the Gaussian means of the r and g color distributions, respectively, and Σ² is the (diagonal) covariance matrix. Figure 5 shows the GFCD in the normalized color domain.
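Fitting the GFCD parameters of Equations (8)-(12) from a set of sample skin pixels might look like the following sketch. The function name and the plain-tuple pixel format are assumptions, and the paper does not describe how the training pixels are collected:

```python
def fit_gfcd(pixels):
    """Estimate the GFCD from sample skin pixels given as (R, G, B)
    tuples: normalize each to q = (r, g) (Eq. 8), then compute the
    Gaussian means (Eqs. 10-11) and the diagonal variances of the
    covariance matrix (Eq. 12)."""
    qs = []
    for R, G, B in pixels:
        I = R + G + B                 # intensity I = R + G + B
        qs.append((R / I, G / I))    # r = R/I, g = G/I; b is redundant
    n = len(qs)
    r_mean = sum(r for r, _ in qs) / n           # Eq. (10)
    g_mean = sum(g for _, g in qs) / n           # Eq. (11)
    s_rr = sum((r - r_mean) ** 2 for r, _ in qs) / n  # sigma_rr
    s_gg = sum((g - g_mean) ** 2 for _, g in qs) / n  # sigma_gg
    return (r_mean, g_mean), (s_rr, s_gg)       # m and diag(Sigma^2)
```

Note that pixels differing only in brightness, such as (200, 100, 100) and (150, 75, 75), map to the same q, which is exactly why the normalization absorbs illumination variation.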

3.2 Color Transform using GFCD

Using the defined GFCD, the input color image I is transformed into a gray-level image Z that enhances only the color values favored by the GFCD. The color transform using the GFCD on image I is defined as f : R² → R:

Z(X) = G(r(X), g(X)),  X ∈ I,   (13)

where X is the coordinate of a pixel in image I. Because the result image Z is a gray-level image that enhances the facial color regions, it is easy to extract the most significant region using a morphological simplification operation.

Consequently, a color image is filtered by the face color mask and transformed into a new image composed of intensity values from 0 to 255 that represent the probability of face color.
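The color transform of Equation (13) can be sketched as below. The Gaussian is left unnormalized and scaled to 0-255 so the output intensity reads as a face-color probability; that scaling choice, like the function name, is an assumption of this sketch:

```python
import math

def gfcd_transform(image, mean, var):
    """Transform an RGB image (list of rows of (R, G, B) tuples) into a
    gray-level image Z (Eq. 13): each pixel is mapped to the
    unnormalized GFCD Gaussian evaluated at its normalized color
    q = (r, g), scaled to the range 0-255."""
    (mr, mg), (s_rr, s_gg) = mean, var
    out = []
    for row in image:
        out_row = []
        for R, G, B in row:
            I = R + G + B
            r, g = R / I, G / I          # Eq. (8)
            # diagonal-covariance 2D Gaussian of Eq. (12)
            z = math.exp(-0.5 * ((r - mr) ** 2 / s_rr
                                 + (g - mg) ** 2 / s_gg))
            out_row.append(round(255 * z))
        out.append(out_row)
    return out
```

A pixel whose normalized color sits exactly at the GFCD mean maps to 255, and colors far from the skin cluster decay toward 0.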

4 Experimental Results

Three test stereo image pairs were acquired with two color cameras aligned to meet the epipolar constraint. A Datacube system including color input controls was used for stereo acquisition. We deliberately acquired difficult test images in various situations to prove the robustness of the suggested algorithm against environmental variation: various object sizes, face directions, numbers and locations of objects, and light intensities were tried, against complex, real background environments. Figure 6 (a), (g), and (m) show the right images of the acquired test pairs. Experimental results of each stage of the suggested algorithm are also shown in Figure 6. Figure 6 (b), (h), and (n) show the disparity maps of each test image, respectively. Figure 6 (c), (i), and (o) show the object-segmented images from each disparity map. Figure 6 (d), (j), and (p) show the facial-color-enhanced images using the GFCD, and Figure 6 (e), (k), and (q) show the selected facial color regions that result from combining the range-segmented and color-segmented objects. The final bounding boxes on the input images are shown in Figure 6 (f), (l), and (r). Figure 6 (g) shows two men at different distances from the camera. The disparity map of Figure 6 (h) and the disparity-segmented result of Figure 6 (i) show that the two objects can be separated using disparity values: for example, the object with disparity value 11 can be separated from the object behind it with disparity value 6. The segmented images are combined with the color-transformed gray-level image to achieve range-selective facial object detection. Figure 6 (m) shows the side view of a facial object. Figure 6 (p) shows that noise sources with colors similar to the face are embedded in the background of Figure 6 (m); Figure 6 (q) and (r) show that the facial object can still be separated from the background.
Figure 7 (a), (b), and (c) show the disparity histograms of Figure 6 (b), (h), and (n), respectively. The histograms clearly represent the number of objects in the scene, and these objects can be segmented by grouping areas in the disparity histogram. Moving objects can also be detected using the suggested concept.

Figure 6: The results of face detection for three test images: (a)(g)(m) original images, (b)(h)(n) disparity maps, (c)(i)(o) segmented maps, (d)(j)(p) GFCD-transformed images, (e)(k)(q) common regions, (f)(l)(r) final detections

5 Conclusions

This paper proposed an object-oriented face detection method using range and color information. Objects are segmented from the background using a stereo disparity histogram that represents the range information of the objects. The MPC disparity measure is introduced to enhance matching accuracy and to remove the effect of unexpected noise in boundary regions. For a high-performance implementation of the MPC disparity histogram, redundant operations inherent to the area-based search algorithm are removed by an efficient computational method. To detect facial regions among the segmented objects, a skin-color transform technique is used. Because facial colors form a cluster at a specific location in a normalized color space, the GFCD can be modeled by a 2D Gaussian function. Using the GFCD, the input color image can be transformed into a gray-level image that enhances only facial color components. The experimental results show that the proposed algorithm works well in various environments with multiple human objects. In future work, we will consider facial component extraction based on the suggested face detection algorithm. Motion information can also be incorporated into the suggested algorithm to enhance the performance. The range information of the objects can be useful in MPEG-4, where natural and synthetic images can be mixed and synthesized.

Figure 7: Disparity histograms of (a) Figure 6-(b), (b) Figure 6-(h), and (c) Figure 6-(n)

References

[1] Roberto Brunelli and Tomaso Poggio. Face Recognition: Features versus Templates. IEEE Trans. PAMI, 15(10):1042-1052, Oct. 1993.
[2] U. R. Dhond and J. K. Aggarwal. Structure from Stereo: A Review. IEEE Trans. Syst. Man Cybern., 19(6):1489-1510, Nov./Dec. 1989.
[3] Olivier Faugeras et al. Real-time correlation-based stereo: algorithm, implementations and applications. Technical report, INRIA, 1993.
[4] T. Kanade et al. A stereo machine for video-rate dense depth mapping and its new applications. In Proceedings of the Computer Vision and Pattern Recognition Conference, San Francisco, June 1996.
[5] H. Martin Hunke. Locating and Tracking of Human Faces with Neural Networks. Technical report, CMU-CS-94-155, 1994.
[6] M. Okutomi and T. Kanade. A Multiple Baseline Stereo. IEEE Trans. PAMI, 15(4):353-363, 1993.
