2012 IEEE International Conference on Multimedia and Expo
CAMERA MODEL IDENTIFICATION USING LOCAL BINARY PATTERNS

Guanshuo Xu and Yun Qing Shi
New Jersey Institute of Technology, Newark, NJ, USA ({gx3, shi}@njit.edu)

ABSTRACT

In digital image forensics, camera model identification seeks to determine the source camera model of the images under investigation. A popular approach is to extract from the images statistical features that capture the differences caused by camera structure and by the various in-camera image processing algorithms, and then apply machine learning and pattern recognition algorithms to the extracted features. In this paper, we propose uniform gray-scale invariant local binary patterns (LBP) as such statistical features. Considering the 8-neighbor binary co-occurrence around each pixel, three groups of 59 local binary patterns are extracted from each image: from the spatial domain of the red and green color channels, from their corresponding prediction-error arrays, and from their 1st-level diagonal wavelet subbands. Multi-class support vector machines are built to classify 18 camera models from the 'Dresden Image Database'. The detection accuracy reported in this paper is higher than the results previously reported in the literature.
Index Terms— camera model identification, local binary patterns, LBP, source device identification

1. INTRODUCTION

Digital image producing devices such as cameras, cell-phone camcorders, and scanners are nowadays ubiquitous. Images are sometimes used as evidence of important facts in court. Meanwhile, the popularity of image manipulation software makes it simple to alter both the content and the source information of digital images, thereby compromising their credibility as evidence. Knowing the source and authenticity of images used as evidence is therefore important. Although embedding watermarks at the image producing stage is one solution, it is not widely implemented by manufacturers of image producing devices. Usually, digital image forensics must rely on the digital image alone. In this paper, we address the problem of source camera identification from given images, which are assumed to have been captured by digital still cameras.

Source camera identification has two popular branches. The first is to match images under investigation to one specific camera. Researchers look for features that capture information unique to that camera, such as sensor defects [1] and sensor dust patterns [2]. This is generally a two-class problem: the specific camera that took the image versus all other cameras, including other cameras of the same make and model. The second branch is source camera model identification, whose task is to determine the brand and model from a given image. It differs from the first branch in that the features should capture characteristics of camera models rather than of individual cameras, so the classification problem becomes one model versus other models. This paper focuses on the second branch.

Fig. 1. A common digital image producing pipeline.

Fig. 1 gives an overview of a common image processing pipeline inside digital cameras. Each processing stage may be implemented differently by the manufacturers of different camera models. Previous research has focused on particular stages of this pipeline, such as lens defects [3], the color filter array (CFA) and demosaicing [4-7], and JPEG compression [8]; some work considers more than one stage or the whole pipeline [9-12]. Practical experimental settings for camera model identification require more than one camera per model, in order to remove the ambiguity of whether the features on which the classifiers are built capture camera model characteristics or individual camera characteristics [11-13]. Within each model, testing images should not come from the same individual cameras used in training. However, most previous work uses only one camera to represent a camera model, owing to limited camera sources. In [13], Gloe, Borowka, and Winkler implemented a feature-based camera model identification method combining the features proposed in [9] with six proposed camera white-balancing features. Their experiments used 44 cameras spanning 11 camera models from the 'Dresden Image Database' [14]. Based on carefully designed experiments, they draw
the conclusion that those features do capture camera model characteristics. In [14], an average true positive rate of 96.42% was reported using the same feature set on 18 camera models from the 'Dresden Image Database'. Binary similarity measures (BSM) calculated from the three least significant bit-planes were used in [10] for camera model identification; together with two other feature sets (HOWS and IQM), an overall identification rate of 95.1% was reported on a dataset consisting of 16 cell-phone camera models.

In this paper, we propose to use the uniform gray-scale invariant local binary patterns (LBP) of [15] as statistical features. Considering the 8-neighbor gray-level differences around a circle for each image pixel, 59 local binary patterns are extracted, respectively, from the spatial domain of the red and green color channels, from their prediction-error 2-D arrays, and from the 1st-level diagonal wavelet subband of each image. Multi-class support vector machines are built to classify 18 camera models from the 'Dresden Image Database'. The detection accuracy reported in this paper is higher than the results in the literature.

The rest of the paper is structured as follows. Section 2 introduces the LBP features we use, how they are extracted, and why they work effectively. Section 3 presents the experimental work and discussion. Conclusions are drawn in Section 4.
2. PROPOSED METHOD

In this section, we first give a brief description of the uniform gray-scale invariant local binary patterns proposed in [15]. Our feature extraction framework is then introduced, and the efficacy of the proposed features for camera model classification is demonstrated by a series of simulation tests.

2.1. Uniform Gray-scale Invariant Local Binary Patterns

In [15], uniform gray-scale invariant local binary patterns are introduced, which can be described by

LBP_{P,R}^{u2} = \sum_{p=0}^{P-1} s(g_p - g_c) \, 2^p,    (1)

where R is the radius of the circularly symmetric neighborhood used for the local binary pattern calculation and P is the number of samples around the circle. In this paper, we set R = 1 and P = 8. g_c and g_p denote the gray levels of the center pixel and its neighbor pixels, respectively. The constellation of the circular neighborhood we use is shown in Fig. 2 (Left). The function s is defined by

s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (2)

Fig. 2. (Left) Constellation of the neighborhood. (Right) Examples of 'uniform' and 'non-uniform' local binary patterns.

According to Equations (1) and (2), the gray-level difference between the center pixel and each of its eight neighbors is first calculated. The differences are then binary quantized and coded, producing local binary patterns that, in essence, form a histogram with a total of 2^8 = 256 bins. In addition, [15] introduces the concept of 'uniform' local binary patterns; Fig. 2 (Right) gives examples of 'uniform' and 'non-uniform' patterns. A pattern is uniform when the number of binary transitions around the whole neighborhood circle is at most 2. As 'uniform' local binary patterns account for the vast majority of pattern occurrences [15], all 'non-uniform' patterns are merged into a single bin, reducing the number of bins from 256 to 59 for P = 8.
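As an illustration, the following minimal sketch (ours, not the authors' code; Python with NumPy assumed) computes the normalized 59-bin LBP_{8,1}^{u2} histogram of Equations (1)-(2). For R = 1, the eight circular samples are approximated here by the 3x3 square neighborhood; [15] interpolates the diagonal samples, a refinement omitted in this sketch.

```python
import numpy as np

def _transitions(code):
    """Circular 0/1 transitions in an 8-bit pattern (uniformity test)."""
    bits = [(code >> p) & 1 for p in range(8)]
    return sum(bits[p] != bits[(p + 1) % 8] for p in range(8))

# 58 uniform codes get bins 0..57; all non-uniform codes share bin 58.
_LUT = np.full(256, 58, dtype=np.int64)
for i, c in enumerate(c for c in range(256) if _transitions(c) <= 2):
    _LUT[c] = i

def uniform_lbp_histogram(img):
    """Normalized 59-bin LBP^{u2}_{8,1} histogram of a 2-D gray-level array."""
    img = np.asarray(img, dtype=np.float64)
    center = img[1:-1, 1:-1]
    # Neighbors listed in circular order, so bit adjacency in the code
    # matches spatial adjacency on the circle (needed for uniformity).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(center.shape, dtype=np.int64)
    for p, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        code |= (neighbor >= center).astype(np.int64) << p  # s(g_p - g_c) 2^p
    hist = np.bincount(_LUT[code].ravel(), minlength=59)
    return hist / hist.sum()
```

Normalizing the histogram by its sum makes the feature independent of image resolution, as required in Section 2.2.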
2.2. Feature Extraction Framework

Inspired by the fact that quite a few image processing algorithms inside cameras, such as demosaicing, filtering, and JPEG compression, are implemented block-wise, it is reasonable to expect that localized characteristics or artifacts are generated. Such characteristics or artifacts can be effectively captured by the uniform gray-scale invariant local binary patterns introduced in Section 2.1. Gray-scale invariance is achieved by computing the differences between the gray levels of the center pixel and its neighbors, which to some extent suppresses the influence of varying image content. The merging of 'non-uniform' patterns provides a natural feature dimensionality reduction, which is desirable for pattern classification algorithms. We therefore propose to use the uniform gray-scale invariant local binary patterns as features that capture camera model characteristics.

As most in-camera image processing algorithms work in the spatial domain, a natural choice is to extract features directly from the gray levels of each color channel in the spatial domain.
From each color channel, a 59-dimensional LBP feature set is calculated by Equation (1) with R = 1 and P = 8 (each 59-D LBP feature set is normalized to eliminate the influence of differing image resolutions). In addition, the same set of LBP features is extracted from the prediction-error (PE) image. The PE image is obtained by subtracting a predicted image from the original image. Considering a 2x2 pixel block, the prediction of a pixel value is computed as in [16]:

\hat{x} = \begin{cases} \max(a, b), & c \le \min(a, b) \\ \min(a, b), & c \ge \max(a, b) \\ a + b - c, & \text{otherwise} \end{cases}    (3)

where a and b are, respectively, the immediate horizontal and vertical neighbors of pixel x, c is the diagonal neighbor of x, and \hat{x} is the predicted value of x. As some image processing algorithms, such as demosaicing and filtering, differ most at edges, the prediction-error image, which is in essence a spatial-domain high-pass filtered image, is another natural source of features.

One side effect of the gray-scale invariance of LBP is its insensitivity to monotonic gray-level transforms in the spatial domain. Although this is a desirable property in some applications, it is not for camera model identification: some in-camera algorithms, such as gamma correction, are monotonic in the spatial domain, so the differences between such algorithms cannot be captured by spatial-domain LBP features. To enhance the discrimination ability, we therefore also consider the wavelet domain and extract another 59-dimensional LBP feature set from the diagonal (HH) subband of the 1st-level Haar wavelet transform.

To conclude, from each color channel we extract LBP features from the original image, its prediction-error 2-D array, and its 1st-level diagonal wavelet subband, for a total of 59 x 3 = 177 features. The feature extraction framework for one color channel is shown in Fig. 3. Since the red and blue color channels usually share the same image processing algorithms, we use only the red and green channels; the final feature dimensionality for a color image is therefore 177 x 2 = 354.

Fig. 3. Feature extraction framework for one color channel.
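The whole framework of Fig. 3 can be sketched as follows (again an illustrative sketch reusing uniform_lbp_histogram from the previous listing; the left/upper/upper-left orientation of a, b, c in Equation (3) and the Haar normalization factor are our assumptions, as the paper does not fix them):

```python
import numpy as np

def prediction_error(img):
    """Prediction-error 2-D array per Equation (3) (the LOCO-I / MED
    predictor [16]); a = left, b = upper, c = upper-left neighbor."""
    img = np.asarray(img, dtype=np.float64)
    a = img[1:, :-1]     # horizontal (left) neighbor
    b = img[:-1, 1:]     # vertical (upper) neighbor
    c = img[:-1, :-1]    # diagonal (upper-left) neighbor
    x = img[1:, 1:]
    pred = np.where(c <= np.minimum(a, b), np.maximum(a, b),
           np.where(c >= np.maximum(a, b), np.minimum(a, b),
                    a + b - c))
    return x - pred

def haar_hh(img):
    """Diagonal (HH) subband of a 1-level Haar wavelet transform."""
    img = np.asarray(img, dtype=np.float64)
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2]
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    return (a - b - c + d) / 2.0   # orthonormal scaling assumed

def channel_features(channel):
    """59 x 3 = 177 features for one color channel (Fig. 3)."""
    return np.concatenate([uniform_lbp_histogram(channel),
                           uniform_lbp_histogram(prediction_error(channel)),
                           uniform_lbp_histogram(haar_hh(channel))])

def image_features(rgb):
    """354-D feature vector from an RGB array: red and green channels only."""
    return np.concatenate([channel_features(rgb[..., 0]),   # red
                           channel_features(rgb[..., 1])])  # green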
2.3. Efficacy of Proposed Features

In this section, simulation results are presented to demonstrate that the proposed features can capture traces left by different algorithms at several typical image processing stages inside cameras. We use 20 raw images from a Nikon D70, all from the 'Dresden Image Database', as our basic simulation dataset. dcraw and Matlab are the tools we use to mimic the image processing inside cameras. Five kinds of image processing algorithms are considered, i.e., demosaicing, color space conversion, gamma correction, filtering, and JPEG compression; for each, three different algorithms or parameter settings are implemented. After feature extraction and linear discriminant analysis (LDA), the high-dimensional features are projected to a 2-dimensional space (Fig. 4 (a)-(e)), in which they are clearly clustered according to the processing algorithm rather than the image content, demonstrating the discrimination ability of our proposed features.

Fig. 4. (a)-(e) 2-D projection results of the whole feature set by LDA. Five kinds of image processing are considered, each with three different algorithms or parameter settings; markers with the same color and shape denote one specific algorithm or parameter setting, and discrimination ability is shown via the clustering effects. (a) Demosaicing, as implemented in dcraw: bilinear, VNG (Variable Number of Gradients), PPG (Patterned Pixel Grouping). (b) Color space conversion: adobeRGB, sRGB, raw (the original camera color space). (c) Gamma correction: BT709 (gamma = 2.4), gamma = 1.6, gamma = 1.0. (d) Filtering: average, LoG (Laplacian of Gaussian), median. (e) JPEG compression: quality factors (QF) 100, 80, 60.
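A hedged sketch of this verification experiment, assuming scikit-learn's LinearDiscriminantAnalysis and matplotlib for the scatter plots of Fig. 4 (the axes are the two discriminant directions; with three classes, LDA yields at most two components):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_scatter(features, labels):
    """features: (n_images, 354) array of Section 2.2 feature vectors;
    labels: processing-algorithm index (0, 1, 2) for each image."""
    X, y = np.asarray(features), np.asarray(labels)
    xy = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
    for lab, marker in zip(sorted(set(y.tolist())), "o^s"):
        sel = y == lab
        plt.scatter(xy[sel, 0], xy[sel, 1], marker=marker, label=str(lab))
    plt.legend()
    plt.show()
```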
3. EXPERIMENTS AND DISCUSSIONS

3.1. Dataset for Experiments

We picked the same 18 camera models from the 'Dresden Image Database' as used in [14]. The number of camera devices per model ranges from 2 to 5, and the number of images per model from 405 to 2087. All images are direct camera JPEG outputs captured with various camera settings. Details are given in Table 1.

Table 1. Experimental dataset.

Camera Model            # devices   # images   Abbr.
Canon Ixus 70               3          567      CAN
Casio EX-Z150               5          925      CAS
Fujifilm FinePix J50        3          630      FUJ
Kodak M1063                 5         2087      KOD
Nikon Coolpix S710          5          925      NIK1
Nikon D70/D70s             2/2         736      NIK2
Nikon D200                  2          752      NIK3
Olympus MJU                 5         1040      OLY
Panasonic DMC-FZ50          3          931      PAN
Pentax Optio A40            4          638      PEN
Praktica DCZ 5.9            5         1019      PRA
Ricoh Caplio GX100          5          854      RIC
Rollei RCP-7325XS           3          589      ROL
Samsung L74                 3          686      SA1
Samsung NV15                3          645      SA2
Sony DSC-H50                2          541      SY1
Sony DSC-T77                4          725      SY2
Sony DSC-W170               2          405      SY3

3.2. Experimental Settings

In all of our experiments, multi-class support vector machines (SVM) [17] are trained and used as the classifiers. From the whole dataset, we randomly select one camera per model and use all the images taken by the selected cameras for testing; images from the remaining cameras form the training data. This random selection procedure is repeated 20 times for each experiment. Involving images from more than one camera per model in training (except for models with only 2 cameras) greatly reduces the chance of overtraining [9], and testing on cameras not involved in training makes the experiments more practical [13]. In each iteration, the images for both training and testing are cut into six sub-images about their centers. The final decision for each test image is made by majority voting over the six individual decisions. This cropping-and-voting procedure not only increases the number of training samples but also adds robustness against regional anomalies in test images. A block diagram including both the training and testing stages is shown in Fig. 5; a sketch of the protocol follows below.

Fig. 5. Block diagram of the training and testing stages (only one image is shown in the testing stage). FE = feature extraction.
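The following sketch outlines one iteration of this protocol under stated assumptions: scikit-learn's SVC (which wraps LIBSVM [17]) stands in for the authors' classifier, whose kernel and parameters the paper does not specify; the exact geometry of the six center sub-images is likewise unspecified, so a hypothetical 2x3 central grid is used; image_features is the function sketched in Section 2.2.

```python
import random
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def six_center_crops(img, ch=512, cw=512):
    """Six hypothetical sub-images tiled as a 2x3 grid about the center."""
    H, W = img.shape[:2]          # assumes H >= 2*ch and W >= 3*cw
    top, left = (H - 2 * ch) // 2, (W - 3 * cw) // 2
    return [img[top + r * ch:top + (r + 1) * ch,
                left + c * cw:left + (c + 1) * cw]
            for r in range(2) for c in range(3)]

def one_iteration(images_by_device, model_of_device):
    """images_by_device: {device_id: [rgb arrays]};
    model_of_device: {device_id: model label}.  Returns image-level TPR."""
    # Hold out one randomly chosen device per model for testing.
    devices_by_model = {}
    for dev, mod in model_of_device.items():
        devices_by_model.setdefault(mod, []).append(dev)
    test_devs = {random.choice(devs) for devs in devices_by_model.values()}

    # Each training image contributes six cropped samples.
    X, y = [], []
    for dev, imgs in images_by_device.items():
        if dev in test_devs:
            continue
        for img in imgs:
            for crop in six_center_crops(img):
                X.append(image_features(crop))
                y.append(model_of_device[dev])
    clf = SVC(kernel="rbf").fit(np.array(X), y)

    # Test images: majority vote over the six per-crop decisions.
    correct = total = 0
    for dev in test_devs:
        for img in images_by_device[dev]:
            votes = clf.predict(np.array(
                [image_features(c) for c in six_center_crops(img)]))
            decision = Counter(votes).most_common(1)[0][0]
            correct += decision == model_of_device[dev]
            total += 1
    return correct / total
```

Averaging one_iteration over 20 random device selections would mimic the paper's 20-iteration evaluation.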
3.3. Results and Discussions
The experimental results using the proposed features are reported in Table 2, which gives the average confusion matrix over the 20 iterations. The average identification accuracy (true positive rate, TPR) exceeds 98% over the 18 camera models. Note that in [14], an average TPR of 96.42% was reported using the same camera models from the 'Dresden Image Database'; our identification accuracy is higher by more than one percentage point. For comparison, we also tested the Markov features proposed in [12]. Fig. 6 compares the identification accuracy, model by model, between the features proposed in this paper and those of [12]; our proposed features outperform the features of [12] for most camera models. Although the average detection rate is high, the detection rates for Nikon D200, Sony H50, and Sony W170 are 97.58%, 93.76%, and 74.90%, respectively, which are relatively low.
From Table 1, the number of individual cameras for each of these three camera models is two, meaning that only one camera per model is involved in training. In this case, there are two possible causes of the low detection accuracy for these three models: either our features cannot capture the model characteristics of these camera models well, or our features capture individual camera characteristics. To clarify this issue, we ran additional experiments classifying images between cameras of the same model, which tests the within-model discrimination ability of our features. A higher identification rate here implies a greater chance of overtraining; the ideal identification rate would be that of random guessing. Tables 3-5 show the results. In Table 3, the detection accuracy between the two Nikon D200 cameras is almost 80%, much higher than the ideal 50%, so overtraining could be the cause of the lower detection accuracy; this could possibly be remedied by adding more Nikon D200 devices. For the two Sony camera models, the results in Tables 4-5 show near-chance within-model classification accuracy, eliminating overtraining as a cause. We therefore conclude that our feature set cannot reliably distinguish those two Sony camera models. The discrimination abilities of the LBP features extracted from the spatial domain, the prediction errors, and the wavelet subbands of the red and green channels are shown in Fig. 7, from which we can see that the combined LBP features do improve the overall identification performance.
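A sketch of this within-model check, under the same assumptions as before (scikit-learn; image_features from Section 2.2; cross-validation is our stand-in for the paper's train/test splits):

```python
# Hypothetical within-model overtraining check: if a classifier separates
# two devices of the SAME model well above 50%, the features carry
# device-specific (rather than model-specific) information.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def within_model_accuracy(images_dev1, images_dev2):
    X = np.array([image_features(im) for im in images_dev1 + images_dev2])
    y = np.array([0] * len(images_dev1) + [1] * len(images_dev2))
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```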
Table 2. Average confusion matrix obtained by SVM classification over 20 iterations; average classification accuracy is 98.08%. Per-model true positive rates (diagonal entries) are listed; small off-diagonal entries are omitted.

Model   TPR (%)
CAN     100
CAS      99.52
FUJ     100
KOD     100
NIK1    100
NIK2    100
NIK3     97.58
OLY     100
PAN     100
PEN     100
PRA     100
RIC     100
ROL     100
SA1     100
SA2      99.72
SY1      93.76
SY2     100
SY3      74.90

The main confusion is between the two Sony models: 6.24% of SY1 (H50) images are classified as SY3 (W170), and 25.10% of SY3 images as SY1.

Fig. 6. Comparison of per-model identification results (average TP rate, %) using the proposed features vs. the Markov features of [12].

Table 3. Confusion matrix between two cameras of Nikon D200. Average TP = 79.81.

                    Predicted
Actual            D200-1   D200-2
Nikon D200-1       78.68    21.32
Nikon D200-2       19.07    80.93

Table 4. Confusion matrix between two cameras of Sony H50. Average TP = 53.64.

                    Predicted
Actual             H50-1    H50-2
Sony H50-1         54.93    45.07
Sony H50-2         48.32    51.68

Table 5. Confusion matrix between two cameras of Sony W170. Average TP = 58.21.

                    Predicted
Actual            W170-1   W170-2
Sony W170-1        60.57    39.43
Sony W170-2        40.45    59.55

Fig. 7. Classification accuracy (average true positive rate, %) using LBP features extracted from the different image 2-D arrays (R, G, PE-R, PE-G, DWT-R, DWT-G) and from the proposed combined feature set (ALL). The individual feature sets achieve between 91.03% and 94.65%, while the combined set reaches 98.08%.
4. CONCLUSION AND FUTURE WORK
We have proposed in this paper the uniform gray-scale invariant local binary patterns as features for camera model identification. By combining features extracted from the original image, its prediction-error image, and the HH subband of its 1st-level wavelet transform, the proposed scheme demonstrates improved camera model identification performance. Our future work includes improving classification performance among models of the same brand (for example, the two Sony models in our experiments), applying feature selection, and testing robustness against common image post-processing algorithms.
5. REFERENCES
[1] M. Chen, J. Fridrich, M. Goljan, and J. Lukas, "Determining Image Origin and Integrity Using Sensor Noise," IEEE Transactions on Information Forensics and Security, vol. 3, pp. 74-90, 2008.
[2] A. E. Dirik, H. T. Sencar, and N. Memon, "Digital Single Lens Reflex Camera Identification From Traces of Sensor Dust," IEEE Transactions on Information Forensics and Security, vol. 3, pp. 539-552, 2008.
[3] K. S. Choi, E. Y. Lam, and K. K. Y. Wong, "Automatic source camera identification using the intrinsic lens radial distortion," Optics Express, vol. 14, pp. 11551-11565, 2006.
[4] S. Bayram, H. Sencar, N. Memon, and I. Avcibas, "Source camera identification based on CFA interpolation," in Proc. IEEE International Conference on Image Processing (ICIP), 2005, pp. III-69-72.
[5] Y. Long and Y. Huang, "Image Based Source Camera Identification using Demosaicking," in Proc. IEEE 8th Workshop on Multimedia Signal Processing, 2006, pp. 419-424.
[6] A. Swaminathan, M. Wu, and K. J. R. Liu, "Nonintrusive component forensics of visual sensors using output images," IEEE Transactions on Information Forensics and Security, vol. 2, pp. 91-106, 2007.
[7] H. Cao and A. C. Kot, "Accurate Detection of Demosaicing Regularity for Digital Image Forensics," IEEE Transactions on Information Forensics and Security, vol. 4, pp. 899-910, 2009.
[8] E. Kee, M. K. Johnson, and H. Farid, "Digital Image Authentication from JPEG Headers," IEEE Transactions on Information Forensics and Security, vol. 6, pp. 1066-1075, 2011.
[9] M. Kharrazi, H. T. Sencar, and N. Memon, "Blind source camera identification," in Proc. IEEE International Conference on Image Processing (ICIP), 2004, pp. 709-712.
[10] O. Celiktutan, B. Sankur, and I. Avcibas, "Blind Identification of Source Cell-Phone Model," IEEE Transactions on Information Forensics and Security, vol. 3, pp. 553-566, 2008.
[11] T. Filler, J. Fridrich, and M. Goljan, "Using sensor pattern noise for camera model identification," in Proc. IEEE International Conference on Image Processing (ICIP), 2008, pp. 1296-1299.
[12] G. Xu, S. Gao, Y. Q. Shi, R. Hu, and W. Su, "Camera-Model Identification Using Markovian Transition Probability Matrix," in Digital Watermarking, vol. 5703, Springer Berlin/Heidelberg, 2009, pp. 294-307.
[13] T. Gloe, K. Borowka, and A. Winkler, "Feature-Based Camera Model Identification Works in Practice," in Information Hiding, vol. 5806, Springer Berlin/Heidelberg, 2009, pp. 262-276.
[14] T. Gloe and R. Böhme, "The 'Dresden Image Database' for benchmarking digital image forensics," in Proc. 2010 ACM Symposium on Applied Computing, Sierre, Switzerland, 2010.
[15] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 971-987, 2002.
[16] M. J. Weinberger, G. Seroussi, and G. Sapiro, "LOCO-I: a low complexity, context-based, lossless image compression algorithm," in Proc. Data Compression Conference (DCC), 1996, pp. 140-149.
[17] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm