IEICE TRANS. FUNDAMENTALS, VOL.E89–A, NO.11 NOVEMBER 2006
2990
LETTER
Special Section on Image Media Quality
JPEG Quantization Table Design for Face Images and Its Application to Face Recognition
Gu-Min JEONG†a), Member, Chunghoon KIM††, Hyun-Sik AHN†, and Bong-Ju AHN†, Nonmembers
SUMMARY This paper proposes a new JPEG-based codec design method for face images and presents its application to face recognition. The quantization table is designed using R-D optimization for the Yale face database. For use in embedded systems, a fast codec design is also considered. The proposed codec achieves better compression rates than the JPEG codec for face images. In face recognition experiments using linear discriminant analysis (LDA), the proposed codec shows better performance than the JPEG codec.
key words: JPEG, quantization table, face image, face recognition
1. Introduction
Data compression is essential for storing and transmitting information, and it has been widely adopted as communications have developed. There are various standards and formats for image compression. In still image coding in particular, JPEG [1], [2] has become a de facto standard. JPEG was once used only on PCs, but nowadays it is increasingly popular in embedded systems such as digital cameras, handsets and PMPs. Though JPEG 2000 [3] has better compression performance than JPEG, its encoding and decoding are 3-10 times slower than those of JPEG, so JPEG 2000 may be inadequate for embedded systems with low-processing-power CPUs. Therefore, JPEG-based codec design is widely used for mobile or embedded systems with low processing power.

On the other hand, with the development of recognition technologies, various recognition methods based on fingerprint, voice and face have been commercialized. They are used mainly for security, safety or entertainment. Among these recognition methods, face recognition [4]–[16] has been widely researched since the recognition data are easy to obtain and the method is relatively convenient for the user.

In face recognition, it is essential to store the face data. Face data storage methods can be divided into two kinds. One keeps the features used for classification and the other saves the images themselves. Though the data size is relatively small in the first method, the face images are not preserved and one cannot see the picture. When the images themselves are saved, the data size is large but the images are preserved.

In this paper, a JPEG-based codec is designed for storing the face images of the Yale database. In particular, to improve the compression efficiency, a simple but efficient quantization table design is discussed considering R-D optimization [17]. For implementation in embedded systems, the implementation of [18] is considered, based on the Arai DCT [19] instead of binDCT C7 [20]. Face recognition experiments are performed using Principal Component Analysis (PCA) [4] and Linear Discriminant Analysis (LDA) [8] for original images, JPEG-compressed images and images compressed with the proposed codec. Experimental results show that the proposed codec has a compression gain and a 1 dB PSNR gain, and achieves better recognition rates than JPEG. The proposed method can be easily applied to JPEG 2000.

The remainder of this paper is organized as follows. In Sect. 2, related works are briefly summarized. In Sect. 3, codec design for face images is discussed. In Sect. 4, the face recognition experiments are presented, and the conclusion and future work follow in Sect. 5.

Manuscript received December 5, 2005. Manuscript revised April 26, 2006. Final manuscript received July 14, 2006.
† The authors are with the School of Electrical Engineering, Kookmin University, Korea.
†† The author is with the School of Electrical Engineering and Computer Science, Seoul National University, Korea.
a) E-mail: [email protected]
DOI: 10.1093/ietfec/e89–a.11.2990

2. Related Works
2.1 Face Recognition

In face recognition, the intensity of each pixel in a face image is used as an input feature. Since there are tens of thousands of pixels in a face image, it is redundant to use all of them as features for face recognition. Thus subspace methods, which project patterns onto a lower-dimensional space, are widely used. Fig. 1 shows a conceptual diagram of a subspace. There are three extracted features {f_1, f_2, f_3} in this subspace. The probe image will be classified correctly because the within-class distance (d_1) is smaller than the between-class distances (d_2, d_3).

The Eigenface method, based on principal component analysis (PCA), finds the set of projection vectors (called Eigenfaces) that maximize the scatter of the projected samples [4]. This method has become one of the principal approaches in face recognition [5], [6]. However, the Eigenface method does not use class information, and finds only the most expressive features in terms of mean-square error [7]. Meanwhile, the Fisherface method, based on linear discriminant analysis (LDA), finds the set of projection vectors (called Fisherfaces) that maximize the determinant of the between-class scatter matrix (S_B) and at the same time minimize the determinant of the within-class scatter matrix (S_W) [8]. The Fisherface method is known to perform better than the Eigenface method [8]–[11] except when the training data set is very small [6].

Recently, several variants of LDA [12]–[15] have been introduced, mainly to solve the small sample size (SSS) problem [16]. The SSS problem arises in face recognition tasks because the dimension of the input space is larger than the number of training samples. Thus, S_W in the input space becomes singular and LDA cannot be applied directly. In the Fisherface method, PCA is applied first in order to make S_W nonsingular [8]. The Null-space LDA (N-LDA) [14], [15] is a new approach to solving the SSS problem. It is based on the fact that the null space of S_W contains a lot of discriminative information [14]. By projecting samples into the null space of S_W using a set of projection vectors W_1, the within-class samples are projected to one point. Then, a set of eigenvectors W_2 of S_B, corresponding to the larger eigenvalues, is found in the null space of S_W. The N-LDA method uses the set of projection vectors W = W_1 W_2.

Fig. 1 Conceptual diagram of a subspace.

Copyright © 2006 The Institute of Electronics, Information and Communication Engineers

2.2 JPEG Compression

JPEG (Joint Photographic Experts Group) [1] has been a standard format for still images since 1992. Among the 4 modes in JPEG, the baseline mode is widely used. Fig. 2 shows the flow of JPEG decoding. The compression of JPEG depends basically on the quantization table. To design the quantization table according to the image, R-D optimization [17] can be applied. R-D optimization is a method for efficiently optimizing the trade-off between rate and quality. In R-D optimization, the objective function is given as

D(Q) + λR(Q)

where λ is a Lagrange multiplier and D, R and Q denote the distortion, rate and quantization table, respectively. Since R-D optimization takes much time to compute, an efficient algorithm based on R-D optimization is proposed in this paper. The algorithm can be applied to images that have similar distributions. For the implementation, the fast decoder for wireless handsets in [18] is considered.

Fig. 2 JPEG baseline decoder.

2.3 Yale Database

The codec design is performed for the Yale database. The Yale database contains 165 gray images of 15 individuals, gathered with different facial expressions, with or without glasses, and under different lighting conditions. Each image is cropped to the size of 120 × 100 using bilinear interpolation. Histogram equalization and normalization to zero mean and unit variance are then applied to the cropped images. Fig. 3 shows the 11 sample images of one person after histogram equalization. In the Yale database, there are 11 images per person. Ten of them differ only slightly from the normal image without expression. The quantization table is designed for the 15 normal images and can then be applied to the remaining 150 images. This is because there are large similarities among the images of the same person.

Fig. 3 Sample images cropped to the size of 120 × 100.

3. Codec Design for Face Images
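As background for the design in this section, the Lagrangian objective D(Q) + λR(Q) of Sect. 2.2 can be illustrated with a minimal sketch. It assumes that each candidate quantization table has already been evaluated to a (distortion, rate) pair. The function names, the candidate labels, the sample measurements and the values of λ are all illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, Tuple


def lagrangian_cost(distortion: float, rate: float, lam: float) -> float:
    """Return D + lambda * R for one candidate quantization table."""
    return distortion + lam * rate


def best_candidate(points: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the name of the candidate with the smallest Lagrangian cost."""
    return min(points, key=lambda name: lagrangian_cost(*points[name], lam))


# Hypothetical (MSE, size in bytes) measurements for three candidate tables.
candidates = {
    "coarse": (20.0, 2800.0),
    "medium": (14.0, 3100.0),
    "fine": (10.0, 3600.0),
}

for lam in (0.001, 0.01, 0.05):
    print(lam, best_candidate(candidates, lam))
```

A larger λ weights the rate term more heavily and therefore favors tables that produce smaller files, while a smaller λ favors tables with lower distortion.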
3.1 Codec Design

Fig. 4 shows the structure of the proposed codec. The quantization table is redesigned by applying R-D optimization. The implementation of the codec is based on the results in [18]. The Arai DCT [19] is used instead of binDCT [20] for compression efficiency. The basic algorithm for the proposed design is as follows.
• Step 1: Select the normal image of each person (15 images in total).
• Step 2: Compress and decompress each image using the JPEG codec.
• Step 3: Plot the R-D curve and select the quantization table that gives a PSNR 1 dB higher than the JPEG quantization table.
• Step 4: Average the 15 quantization tables to obtain the final quantization table.
• Step 5: Compress and decompress all 165 images using the final quantization table.
Fig. 5 shows the algorithm to obtain the final quantization table.
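Steps 3 and 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the candidate tables for each image have already been evaluated to a file size and a PSNR, that among the candidates reaching the 1 dB target the one with the smallest file is chosen (the paper does not state a tie-breaking rule), and that the averaged table is rounded and clamped to the 1 to 255 range of 8-bit quantization steps. The names `psnr`, `select_table` and `average_tables` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (quantization table, compressed size in bytes, PSNR in dB)
Candidate = Tuple[List[int], int, float]


def psnr(original: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """PSNR in dB between two equal-length sequences of 8-bit pixel values."""
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def select_table(candidates: Sequence[Candidate], jpeg_psnr: float,
                 gain_db: float = 1.0) -> List[int]:
    """Step 3 (assumed rule): smallest file among tables reaching the PSNR target."""
    good = [c for c in candidates if c[2] >= jpeg_psnr + gain_db]
    if not good:
        raise ValueError("no candidate reaches the PSNR target")
    return min(good, key=lambda c: c[1])[0]


def average_tables(tables: Sequence[Sequence[int]]) -> List[int]:
    """Step 4: element-wise mean, rounded and clamped to 1..255 (assumption)."""
    count = len(tables)
    length = len(tables[0])
    return [min(255, max(1, round(sum(t[i] for t in tables) / count)))
            for i in range(length)]


# Hypothetical candidates for one image whose JPEG baseline PSNR is 36.0 dB.
per_image = [([16] * 64, 3300, 36.9), ([12] * 64, 3350, 37.3), ([10] * 64, 3500, 37.8)]
chosen = select_table(per_image, jpeg_psnr=36.0)
final_table = average_tables([chosen, [16] * 64, [14] * 64])
```

In the paper's setting, `select_table` would be run once per normal image and `average_tables` would combine the 15 selected tables into the final table used in Step 5.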
Fig. 4 Face image compression algorithm.

Fig. 5 Final quantization table selection.

Table 1 Image quality and size for the image without expression (normal image).

Normal image   JPEG compression     R-D opt. for each image   Proposed codec
               Size      Quality    Size      Quality         Size      Quality
               (Bytes)   (dB)       (Bytes)   (dB)            (Bytes)   (dB)
Person 1       3382      36.00      3402      37.22           3358      37.17
Person 2       3023      37.13      3054      38.20           2907      37.60
Person 3       2994      37.98      3117      39.17           2807      38.06
Person 4       3309      35.19      3143      36.18           3333      37.06
Person 5       3302      36.26      3348      37.28           3279      37.17
Person 6       3099      36.61      3066      37.79           3001      37.48
Person 7       3164      36.70      3278      38.20           3092      37.46
Person 8       3395      35.50      3327      36.77           3365      37.09
Person 9       3298      36.18      3185      37.04           3207      37.31
Person 10      3088      36.76      2989      37.72           2944      37.74
Person 11      3269      35.71      3249      36.75           3304      37.13
Person 12      3223      36.09      3224      37.33           3164      37.28
Person 13      3332      35.58      3286      36.76           3351      36.93
Person 14      3302      35.43      3223      36.64           3306      37.18
Person 15      3129      37.09      3107      38.00           2994      37.60
Average        3220.6    36.28      3199.9    37.40           3160.8    37.35

Table 2 Average image quality and size for each person (average of 11 images for each person).

Person          Average (JPEG)       Average (Proposed codec)
                Size      Quality    Size      Quality
                (Bytes)   (dB)       (Bytes)   (dB)
Person 1        3335      36.14      3292      37.17
Person 2        2968      37.36      2839      37.79
Person 3        2925      38.18      2743      38.14
Person 4        3200      35.73      3182      37.30
Person 5        3290      36.19      3257      37.23
Person 6        3088      36.88      2963      37.62
Person 7        3167      36.46      3118      37.34
Person 8        3390      35.45      3368      37.01
Person 9        3247      36.39      3162      37.41
Person 10       3093      36.75      2976      37.75
Person 11       3194      36.05      3210      37.21
Person 12       3270      35.96      3238      37.21
Person 13       3279      35.74      3278      37.01
Person 14       3278      35.51      3280      37.12
Person 15       3104      37.17      2985      37.65
Total average   3189      36.40      3126      37.40

3.2 Test Results for the Yale Database

Table 1 shows the test results for JPEG compression, R-D optimization based compression and compression with the final quantization table. The results in the column "R-D opt. for each image" are those of Step 3, and the results in the column "Proposed codec" are those obtained with the quantization table of Step 4. Though the final quantization table is an average, the results of the proposed design are similar to those of Step 3. This is because the images are all face images and have similar distributions.

Table 2 shows the results when the final quantization table is applied to all 165 images. Due to the page limitation, only the average values for each person (11 images) are presented in Table 2. As seen in Table 2, the final quantization table also performs well for the images with expression. The average size is slightly smaller than that of JPEG and the PSNR is about 1 dB higher than that of JPEG. The results show that the proposed method is simple and efficient. It should be noted that the proposed method can be easily applied to JPEG 2000.

4. Face Recognition Experiment
We performed our experiments using the Yale database. The 11-fold cross validation [13] was used to evaluate the performance of the new codec algorithm. In this scheme, one image from each subject was randomly selected for testing, while the remaining images were used for training. There were 150 images in the training set and 15 images for probing. The nearest neighbor method was used to make a decision based on the L1 (Manhattan) metric. This experiment was repeated 11 times so that every image was tested once. Fig. 6 shows the recognition rates of the various methods using different compression algorithms for the Yale database. Both PCA+LDA [8] and N-LDA [15] are used for feature extraction. As can be seen in Fig. 6, the new codec achieves 99.4% and 100.0% recognition rates with PCA+LDA and N-LDA, respectively. The experimental results show that the new codec gives a better recognition rate than the JPEG compression method in spite of a higher compression rate.
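The decision rule described above can be sketched as follows. This is a minimal illustration that assumes the feature vectors have already been extracted (for example by PCA+LDA or N-LDA, which are not implemented here). The function names and the toy data are hypothetical, and the sketch evaluates one fold only.

```python
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(probe: Sequence[float],
                     gallery: Sequence[Tuple[Sequence[float], int]]) -> int:
    """Return the label of the gallery sample closest to the probe in L1 distance."""
    return min(gallery, key=lambda item: l1_distance(probe, item[0]))[1]


def fold_accuracy(features: Sequence[Sequence[float]], labels: Sequence[int],
                  test_idx: Sequence[int]) -> float:
    """Recognition rate for one fold: test_idx are probes, the rest form the gallery."""
    held_out = set(test_idx)
    gallery = [(f, y) for i, (f, y) in enumerate(zip(features, labels))
               if i not in held_out]
    hits = sum(nearest_neighbor(features[i], gallery) == labels[i] for i in test_idx)
    return hits / len(test_idx)


# Toy data: two subjects with two feature vectors each (hypothetical values).
toy_features: List[List[float]] = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
toy_labels = [0, 0, 1, 1]
rate = fold_accuracy(toy_features, toy_labels, test_idx=[1, 3])
```

For the Yale database, each fold would hold out one image per subject (15 probes against 150 gallery images), and the reported recognition rate would be the average over the 11 folds.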
Fig. 6 Recognition rates of the various methods using different compression algorithms for the Yale database. (a) PCA+LDA. (b) N-LDA.

5. Conclusion

In this paper, a codec design method has been proposed and implemented for the face images of the Yale database, and face recognition experiments have been carried out using the proposed codec. The codec design is based on JPEG and R-D optimization. Since the face images have similar distributions to one another, better performance can be obtained using just one table. In the design for the Yale database, the proposed codec shows better performance than JPEG. Also, the recognition experiments for original images, JPEG images and images compressed with the proposed codec show that the proposed codec gives a better recognition rate than JPEG.

In the proposed method, the final quantization table is obtained by averaging the individual tables from the R-D optimization. In this process, the dependency of the table on each individual image can be eliminated. Compensating for this using a neural network with PCA remains future work. Also, it should be noted that the proposed method can be easily applied to JPEG 2000. The quantization table design for JPEG 2000 also remains future work.

Acknowledgments

This work was supported in part by the research program 2006 of Kookmin University in Korea.

References
[1] G.K. Wallace, “The JPEG still-picture compression standard,” Commun. ACM, vol.34, pp.30–44, April 1991.
[2] Independent JPEG Group, http://www.ijg.org
[3] M.D. Adams, The JPEG-2000 Still Image Compression Standard, ISO/IEC JTC 1/SC 29/WG 1 N 2412, Dec. 2002.
[4] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive Neurosci., vol.3, no.1, pp.71–86, 1991.
[5] A. Pentland, B. Moghaddam, and T. Starner, “View-based and modular eigenspaces for face recognition,” Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, pp.84–91, 1994.
[6] A. Martinez and A.C. Kak, “PCA versus LDA,” IEEE Trans. Pattern Anal. Mach. Intell., vol.23, no.2, pp.228–233, 2001.
[7] D.L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol.18, no.8, pp.831–836, 1996.
[8] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol.19, no.7, pp.711–720, 1997.
[9] C. Liu and H. Wechsler, “Robust coding schemes for indexing and retrieval from large face databases,” IEEE Trans. Image Process., vol.9, no.1, pp.132–137, 2000.
[10] X. Wang and X. Tang, “A unified framework for subspace face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol.26, no.9, pp.1222–1228, 2004.
[11] C. Kim, J. Oh, and C.-H. Choi, “Combined subspace method using global and local features for face recognition,” Int’l Joint Conf. Neural Netw., 2005.
[12] J. Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “Face recognition using LDA-based algorithms,” IEEE Trans. Neural Netw., vol.14, no.1, pp.195–200, 2003.
[13] H. Yu and J. Yang, “A direct LDA algorithm for high-dimensional data with application to face recognition,” Pattern Recognit., vol.34, pp.2067–2070, 2001.
[14] L.-F. Chen, H.-Y.M. Liao, M.-T. Ko, J.-C. Lin, and G.-J. Yu, “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognit., vol.33, pp.1713–1726, 2000.
[15] H. Cevikalp, M. Neamtu, M. Wilkes, and A. Barkana, “Discriminative common vectors for face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol.27, no.1, pp.4–13, 2005.
[16] A. Webb, Statistical Pattern Recognition, 2nd ed., Wiley, Chichester, UK, 2002.
[17] M. Crouse and K. Ramchandran, “Joint thresholding and quantizer selection for transform image coding: Entropy-constrained analysis and application to baseline JPEG,” IEEE Trans. Image Process., vol.6, no.2, pp.285–298, Feb. 1997.
[18] G.M. Jeong, S.W. Na, D.H. Jung, J.H. Kang, and G.P. Jeong, “A multimedia service implementation using MJPEG and QCELP in wireless handset,” Lecture Notes in Computer Science, vol.3597, pp.190–199, July 2005.
[19] Y. Arai, T. Agui, and N. Nakajima, “A fast DCT-SQ scheme for images,” IEICE Trans., vol.E-71, no.11, pp.1095–1097, Nov. 1988.
[20] J. Liang and T.D. Tran, “Fast multiplierless approximations of the DCT with the lifting scheme,” IEEE Trans. Signal Process., vol.49, no.12, pp.3032–3044, Dec. 2001.