Local Binary Patterns for Human Detection on

0 downloads 0 Views 326KB Size Report
been widely used for efficient texture classification. ... number of bins and enhance the classification ... Because the variety of colours and textures of clothes.
Ninth IEEE International Symposium on Multimedia 2007

Local Binary Patterns for Human Detection on Hexagonal Structure Xiangjian He1, Jianmin Li2, Yan Chen1, Qiang Wu1, and Wenjing Jia1 1 Faculty of Information Technology University of Technology, Sydney Australia [email protected] 2 School of Computer and Mathematics Fuzhou University Fujian 320002, China [email protected] for similarity measurement in order to reduce the number of bins and enhance the classification efficiency while improving the recognition accuracy. Research works done in [3] and [4] have shown that LBP based methods can produce good results for face recognition in 2D images [1]. The LBP operator can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. On the other hand, Mahalanobis Distance Map (MDM) was applied in [5] as a simple method to represent the appearance of human figures statistically. This representation is based on the geometrical structures of objects rather than commonly used pixel value patterns. Pixel values can provide significant information regarding objects in many cases. If the colour/brightness of a target is stable, appearance model based on pixel values can be very effective. For some objects like humans, however, we cannot expect particular pixel values. A human figure contains several parts covered by various colours of clothes. Because the variety of colours and textures of clothes is enormous, it is impossible to learn all possible patterns of clothes. Conversely, the geometrical locations of body parts are similar for all human figures and MDM-based method focuses on more geometrical entities than pixel values. Therefore, MDM-based method for human detection is robust to human colour variety. MDM can extract common image features independent of object colour variety such as a shirt’s colour/texture differences. However, the approach merely using MDMs cannot perform perfectly because the Mahalanobis distance still misses some subtle difference due to mean operation during the distance computation.

Abstract Local Binary Pattern (LBP) was designed and has been widely used for efficient texture classification. LBP provides a simple and effective way to represent texture patterns. Uniform LBPs play an important role for LBP-based pattern/object recognition as they include majority of LBPs. On the other hand, Human detection based on Mahalanobis Distance Map (MDM) recognizes appearance of human based on geometrical structure. Each MDM shows a clear texture pattern that can be classified using LBPs. In this paper, we compute LBPs of MDMs on a hexagonal structure. The circular pixel arrangement in hexagonal structure results in higher accuracy for LBP representation than on square structure. Chi-square as a measure is used for human detection based on uniform LBPs obtained. We show that our method using LBPs built on MDMs has a higher human detection rate and a lower false positive rate compared to the method merely based on MDMs. We will also show using experimental results that LBPs on hexagonal structure lead to more robust human classification.

1. Introduction Local Binary Pattern (LBP), originally designed for efficient texture classification provides a simple and effective way for object/pattern recognition such as face recognition [1][2]. Uniform LBPs as defined in [3] have played an important role for matching based on LBP histograms. Uniform LBPs include most of LBPs and contains more significant features such as bright/dark spots, flat/edge areas than non-uniform LBPs. Hence, we need to include only uniform LBPs

65 DOI 10.1109/ISM.2007.19

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

In the sense of discarding pixel values, MDM-based method is similar to edge-based object recognition. However, edge-based approach is generally too sensitive to local information and image noise [5] and hence is not suitable for human detection. In this paper, we take the advantages of LBP and MDM shown above, and present an algorithm that computes LBPs for the MDMs obtained. Because each MDM has a clear texture pattern that can be well classified using LBPs, whether an image should be classified as a human image or non-human image is measured using Chi-square measurement on uniform LBPs computed. We use experimental results to show that MBM with LBP for human detection demonstrates a higher true positive rate and a lower false positive rate. We construct LBP based on a hexagonal image structure [6]. We will show that LBPs defined on the hexagonal structure have much higher percentages of uniform LBPs. Therefore, LBP-based object/pattern recognition on hexagonal structure has potentials for more accurate and efficient classification. The rest of this paper is organized as follows. In Section 2, the concepts for constructing MDMs for human detection are briefly reviewed. In Section 3, we briefly introduce LBP coding methods both on square and hexagonal structuress. An algorithm combining MDMs with LBPs for human detection is presented in Section 4. Experimental results that compare pure MDM-based method with our MDM+LBP-based method are demonstrated in Section 5. We conclude in Section 6.

ranged from 1 to M*N to distinguish the blocks and hence denote the blocks by X1, X2, …, XMN.

Separation of an image area into

Figure 1.

blocks [5]. For each given block Xl (l = 1, 2, …, MN), let us denote the average and the covariance of all pixel values in the block by xl and Σl respectively. Then, the Mahalanobis distance (denoted by di,j ) between blocks i and j (i, j = 1, 2, …, MN ) is computed by

d i, j =

( xi − x j ) 2 Σi + Σ j

.

(1)

The MDM (denoted by D) of the original given image area is defined as an MN × MN image of which the pixel values are determined by an MN × MN matrix as

D = [d i , j ] MN ×MN .

2. Mahalanobis Maps for Human Detection For human detection, Mahalanobis distance is used to measure differences in pixel value distributions among regions [5]. Each image area is divided into sub-regions (blocks) and Mahalanobis distances are computed between any two blocks. The set of Mahalanobis distances calculated for one image area is called a Mahalanobis Distance Map (MDM). More details about how to calculate the MDMs of input images are reviewed as follows [5]. Let us consider an image area of size of m × n (see Figure 1, left). In this paper, the value of a given pixel at row s and column t (1≤ s ≤ m, 1≤ t ≤ n) is the grey scale from 0 to 255 and is denoted by xs,t. In general, the pixel value xs,t can be described as a multidimensional vector, of which each component represents a pixel feature. The image area is then divided into M × N smaller and non-overlapping blocks such that each block has p × q pixels (see Figure 1, right). We use a number

(2)

3. Local Binary Patterns LBP was originally introduced by Ojala et al. in [7] as texture description and defined on square structure. LBP features have performed very well in various applications including texture classification and segmentation [1]. The basic form of an LBP operator labels the pixels of an image by thresholding the 3 × 3 neighborhood of each pixel by the grey value of the pixel (the centre). An illustration of the basic LBP operator is shown in Figure 2. Note that the binary LBP code is circular. The basic LBP consists of 8 binary bits representing an integer ranged from 0 to 255. In order to reduce the complexity of computation, uniform LBPs have been defined as shown in [1]. A uniform LBP is a LBP that contains at most two bit wise 0-1 or 1-0 transitions [1]. For example, 00000000,

66

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

00001000, 11000111 are uniform patterns. With this extension, there are 58 uniform LBP code patterns for 8 bit basic LBP code, and 256-58=198 non-uniform LBP code patterns. These uniform LBP codes function as templates for microstructure such as bright spot (00000000), flat area or dark spot (11111111), and edges of varying positive and negative curvatures (others) [8]. As shown in [8] that uniform patterns accounted about 90% of all patterns in the experiments with texture images.

process before computing the LBP codes on hexagonal structure. If we define a uniform LBP on hexagonal structure in the same way as on square such that each uniform LBP contains only two 1-0 or 0-1 transitions, then the number of uniform LBP codes is reduced from 58 on square structure to 32 on hexagonal structure. Like uniform LBPs on square structure, uniform LBPs on hexagonal structure function as bright spot (000000), flat area or dark spot (111111), and edges of varying positive and negative curvatures (others). Because of smaller number of uniform LBPs compared with those on square structure and more even contribution of neighboring pixels to their reference pixel, one can expect that image recognition based on hexagonal structure using LBPs not only improves the recognition accuracy but also increases the recognition speed.

Figure 2. Calculation of LBP code from 3×3 neighborhood [4].

4. MDM+LBP Human Detection In this section, an algorithm built on both MDM and LBP is presented on hexagonal structure. We first follow the approach as shown in [5] to compute two MDMs that are for human class and non-human class respectively and then compute the difference of the two MDMs. LBPs on each MDM are obtained after that. Based on the histograms of uniform LBPs, an image area is classified as either a human figure or a non-human figure using the Chi-square measure.

Similar to the construction of basic LBP on square structure, we now construct basic LBP on hexagonal structure as shown in Figure 3.

4.1. Algorithms Using MDMs First, a large number of sample images with humans and without humans are collected. Then, the MDMs of all sample images as shown in Equation (2) are calculated using Equation (1). Let us assume that there are K sample images with humans and K sample ,k be the (i, j) element images without humans. Let d iobj ,j

Figure 3. Calculation of basic LBP code on

hexagonal structure.

,k be the (i, j) of k-th MDM with human and d ibck ,j element of k-th MDM without human respectively, where i, j = 1, 2, …, MN and k = 1, 2, …, K. Furthermore, let

By defining the basic LBP codes on hexagonal structure, the number of different patterns has been reduced from 28 = 256 on square to 26 = 64 on hexagonal structure (a collection of 7 hexagonal pixels is shown in Figure 3). More importantly, because all neighboring pixels of a reference pixel have the same distance to it, the grey values of the neighboring pixels have the same contributions to the reference pixel on hexagonal. Furthermore, because all six neighboring pixels are exactly on a circle with radius 1 (assuming the distance between neighboring hexagonal pixels is 1) and centre at the reference pixel, unlike on square structure, there is no need to perform an interpolation

d

obj i, j

1 = K

K

∑d

obj , k i, j

,d

bck i, j

k =1

1 K 1 = K

2 σ obj (i , j ) =

2 σ bck (i , j )

K

∑ (d

1 = K

K

∑d

bck ,k i, j

k =1

obj , k i, j

2 − d iobj ,j ) ,

bck , k i, j

2 − d ibck ,j ) ,

k =1 K

∑ (d k =1

67

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

,

wi , j =

bck 2 (d iobj , j − d i, j ) 2 2 σ obj ( i , j ) + σ bck ( i , j )

where

.

(3)

1, I { A} =  0,

Then, the Averaged MDM for the K sample images with humans is defined as the MN × MN image of which the pixel values are determined by the MN × MN matrix represented by

D obj = [d iobj , j ] MN × MN .

H = ( H (0), H (1), ", H (31)).

(4)

(5)

χ

(6)

2 T ,G

( H T (i ) − H G (i)) 2 =∑ , H T (i) + H G (i ) i

(10)

where HT(i) and HG(i) represent the i-th components of HT and HG respectively for i = 0, 1, …, 31, and G stands for either obj or bck corresponding to either histogram with humans or without humans.

determines the difference between the distributions of the two averaged distance maps derived from Equations (4) and (5) above. The MN × MN image corresponding to Equation (6) is called the Difference MDM. We then follow the algorithm proposed in [6] to convert each MDM to an image on a hexagonal structure using bilinear interpolation. These images are call MDMs on hexagonal structure.

4.3. Human Detection We present our approach for human detection in this subsection. More clear description with examples can be found in Section 5 below. A straightforward way to classify an image area T is to compare the results obtained using Equation (10) for G = obj, bck. T is classified as a human image if and only if

4.2. Algorithms Using LBPs Recall that each MDM has a clear texture pattern. Hence, LBP-based classifier should work well to distinguish the MDMs from each other. Because uniform LBPs account most LBPs, we consider only the 32 uniform LBP code patterns as 32 LBP types on hexagonal structure for human detection in this paper. For each pixel p, let LBP(p) denote the LBP type at p. Let S denote the MDM area, and H(i) (i = 0, 1, …, 31) denote the uniform LBP histogram over S corresponding to bin value i, i.e.,

H (i) = ∑ I {LBP( p) = i},

(9)

The histogram H contains information about the distribution of the micro-patterns, such as edges, spots and flat areas, over the image plane S. For a given image area T, let us denote its uniform LBP histogram derived from its MDM by HT. Let us also denote the uniform LBP histograms derived from the averaged MDMs with humans and without humans by Hobj and Hbck respectively, then we can use the following Chi-square dissimilarity measures to classify the testing image:

Note that wi,j (i, j = 1, 2, …, MN ) in Equation (3) actually represents the Mahalanobis distance between the corresponding elements in Dobj and Dbck defined in Equations (4) and (5). Therefore, the matrix

W = [ wi , j ] MN × MN

(8)

We can now define the uniform LBP histogram H over S on hexagonal structure by

Similarly, the Averaged MDM for the K sample images without humans is defined as the MN × MN image of which the pixel values are determined by the MN × MN matrix represented by

D bck = [d ibck , j ] MN × MN .

A = true, A ≠ true.

χ T2,obj < χ T2,bck .

(11)

This method, however, has following two disadvantages. As each MDM has huge number of elements, the use of all elements for LBP computation is time consuming. Furthermore, as one will see in Section 5, the element values of Averaged MDMs with humans and without humans are very similar for most of elements, counting all elements for Chi-square comparison in Equation (11) degrades the recognition accuracy.

(7)

p∈S

68

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

For a faster and more accurate recognition, we reduce the region for the Chi-square comparison to a much smaller block that contains much higher percentage of significant difference between the two Averaged MDMs (i.e., a block showing higher percentage of bright elements in the Difference MDM). To further improve recognition accuracy and reduce the false positive rate, a threshold (denoted by χ2) is selected so that the given image area is classified as a human image if and only if

χ T2, obj < χT2, bck

and

χ T2, obj < χ 2

MDMs on hexagonal structure. These converted MDMs are displayed in Figure 5. Then, we compute LBPs from the Averaged MDMs shown in Figure 5. For each selected image, its LBPs are also computed.

(12)

hold in the significant MDM block. Figure 5. Averaged MDMs with humans (left) and without humans (right) represented on hexagonal structure.

5. Experimental Results 400 sample images of size 72 × 36 with humans and another 400 sample images without humans taken in a park are used for the computation of Averaged MDMs with humans and without humans. The image block size is 3 × 3. Hence, M = 24 and N = 12 in Equations (4)-(6). The Averaged MDMs of size 288 ×288 are displayed in Figure 4 (left and middle). Figure 4 (right) shows the Difference MDM describing the difference between the distributions of the two MDMs. We use another 800 images consisting of 400 human images and 400 non-human images of size 72 × 36 for testing and performance analysis.

When we simply use Equation (11) to compare the uniform LBPs the MDM domain as shown in Figure 5, we achieve 96% recognition rate together with high 64% false positive rate. When we restrict the region for Chi-square comparison using Equation (11) to the block between rows 101 and 194 and between columns 2 and 79 only on square structure, the recognition rate remains 96% but the false positive rate reduces to 56%. From Figure 4 (right), the restricted area shows a much higher percentage of bright components (pixels) than the whole MDM domain. To further improve the recognition accuracy and reduce the false positive rate, a Chi-square threshold is used and Equation (12) is applied. We finally achieve the recognition rate of 92% with false positive rate of 8%, which are much better than those obtained using the pure MDM-based approach. We display a few human images and non-human images in Figure 6. In Figure 6, the only human image that is not recognized is the 6th one (too dark?) on the top row, and the only non-human image that is classified as a human image is the last one (too bright?) on the bottom row. We summarize the above findings in Table 1. On square structure, similar results as shown in Table 1 are obtained. To compare results obtained on square structure with those on hexagonal structure, we randomly select three testing images with humans and three images without humans. We compute the χ T2,obj values for the

Figure 4. Averaged MDMs with humans (left) and without humans (middle), and Difference MDM (right).

With the pure MDM-based approach as shown in [5], after a very time consuming LDM learning process based on the three MDMs displayed in Figure 3, 50 most significant components in the Difference MDM are computed and used for recognition. The recognition rate (true positive rate) is 86.75% and false positive rate (the rate that non-human images are mistakenly classified as a human image) is 11.5%. To test the new approach proposed in this paper, we randomly select 25 human images and 25 non-human images from our dataset. We first convert the two average MDMs shows in Figure 4 to the corresponding

selected six images on the whole MDM domain. Their values are shown in Table 2.

69

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

have shown that the proposed method in this paper outperforms the pure MDM-based method. Unlike the pure MDM-based method, our LBP+MDM approach does not go through time consuming LDM learning process. We merely observe the small region in the MDM domain that shows higher percentage of bright pixels in the Distance MDM and perform the uniform LBP histogram matching in this region using the wellknown Chi-square measure. This has greatly reduced the processing time. It is expected that the recognition can be further improved if the matching is restricted to the brightest components (pixels) in the Distance MDM. If we do not take into account the massive time for LDM-based learning, one may also consider to perform the LBP matching on the significant components obtained through LDM process as seen in [5]. We rather leave this to the future work. Furthermore, the recognition rate can also be improved when more samples with more variety of human appearance and more variety of background are included into the learning process. We have also shown that implementation on hexagonal structure has potential for easier and more accurate classification. Moreover, because there are much less uniform LBP types on hexagonal structure than on square structure, performance on hexagonal structure is also more efficient.

Figure 6. Examples of images used for testing. Table 1. Comparison of various methods

Table 2. Comparison of results on square and

hexagonal structures

References

In Table 2, Obj 1, Obj 2 and Obj 3 represent the three images with humans, and Bkgd 1, Bkgd 2 and Bkgd 3 represent the three images without humans. After the χT2,obj values are computed on square and

[1] S. Z. Li, C. S. Zhao, X. X. Zhu and Z. Lei, “2D+3D Face Recognition by Fusion at Both Feature and Decision Levels”, In Proceedings of IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Beijing. Oct 16, 2005.

hexagonal structures respectively, the values are entered in Table 2 as displayed. The Mahalanobis distance between the human images and non-human images are then computed. On square structure, the distance is 1.06 with is more than 10% smaller than 1.14, the distance on hexagonal structure. This indicates that the difference between the human images and non-human images on hexagonal structure is much bigger. Therefore, it is easier to classify human or nonhuman images on hexagonal structure than on square structure.

[2] M. Pietikäinen and G. Zhao, “Dynamic texture recognition using local binary patterns with an application to facial expressions”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 2007, pp.915-928. [3] T. Ahonen, A. Hadid and M. Pietikäinen, “Face description with local binary patterns: application to face recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 2006, pp. 2037-2041. [4] M. Pietikainen Hadid and T. Ahonen, “A discriminative feature space for detecting and recognizing faces”, CVPR2004, pp.797-804.

6. Conclusions

[5] A. Utsumi and N. Tetsutani, “Human detection using geometrical pixel value structure”, 5th IEEE International Conference on Automatic Face and Gesture Recognition, FGR2002, pp.34-39.

In this paper, we have presented a new approach to human detection using LBPs built on MDMs on hexagonal structure. We first obtain MDMs that are less sensitive to the variety of colours and illumination changes. LBP-based classification that has been proved working well for texture images like MDMs is then applied for human detection. The experimental results

[6] X. He, W. Jia, N. Hur, Q. Wu, J. Kim and T. Hintz, “Bilateral Edge Detection on a Virtual Hexagonal Structure”,

70

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

Lecture Notes in Computer Science, LNCS, Springer, 2006, Vol.4292, pp.1092-1101.

[8] T. Ojala and M. Pietikainen, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, No7, 2002, pp.971-987.

[7] T. Ojala, M. Pietikainen and D. Harwood, “A comparative study of texture measures with classification based on feature distributions”, Pattern Recognition, Vol.29, 1996, pp.51-59.

71

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 19, 2009 at 04:09 from IEEE Xplore. Restrictions apply.

Suggest Documents