Multimed Tools Appl DOI 10.1007/s11042-017-5206-8
Expose noise level inconsistency incorporating the inhomogeneity scoring strategy
Heng Yao 1,2 & Fang Cao 3 & Zhenjun Tang 2 & Jinwei Wang 4 & Tong Qiao 5
Received: 26 February 2017 / Revised: 25 July 2017 / Accepted: 4 September 2017 © Springer Science+Business Media, LLC 2017
Abstract Estimating noise variance is of key importance in many image processing applications, such as filtering, enhancement, quality assessment, and forgery detection. In existing detection methods based on noise inconsistency, the conventional approach is to first estimate the noise variance of each region and then identify regions with abnormally high or low variance as spliced regions. However, because image noise cannot be completely separated from inherent texture, the noise level is inevitably overestimated, especially in regions with more complex textures. In this paper, we address the issue that the noise estimate of each region is frequently inaccurate due to the complexity of its texture. Based on this consideration and motivated by the scoring strategy-based object proposal technique, an approach that incorporates an inhomogeneity scoring strategy is proposed to expose image-splicing manipulations more convincingly. Specifically, the image is first segmented into small patches, and the noise variance of each patch is computed using the kurtosis concentration-based pixel-level noise estimation method. Then, the inhomogeneity score of each patch is computed using the spectral residual-based saliency measurement method. After fitting a linear equation to the estimated variance and the inhomogeneity score of each patch, the suspicious region can be identified by seeking connected patches that violate the linear constraint. The experimental results demonstrate the efficacy and robustness of the proposed method.
Keywords Image forensics · Splicing detection · Noise inconsistency · Inhomogeneity assessment · Noise estimation
* Heng Yao
1 Shanghai Key Lab of Modern Optical System, and Engineering Research Center of Optical Instrument and System, Ministry of Education, University of Shanghai for Science and Technology, Shanghai 200093, China
2 Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
3 College of Information Engineering, Shanghai Maritime University, Shanghai 200135, China
4 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
5 School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018, China
1 Introduction
As digital image acquisition devices such as digital cameras have become popular, thousands of digital images are captured every day and used extensively in social media, in news reports, and as forensic evidence. However, the content of these images can be edited easily with sophisticated software such as Adobe Photoshop. Once the authenticity of an image is called into question, verifying its integrity is worthwhile but challenging. Image watermarking and hashing methods can guarantee the authenticity and copyright of images [18, 19]; however, these techniques require the image to be preprocessed, which greatly limits their application scenarios. To overcome these deficiencies, digital image forensics exposes the traces of forgery directly from the image itself. Forgeries can be divided roughly into two categories: regions pasted from elsewhere in the same image, and regions inserted from different source images. In the first category, the tampered region is copied from another region of the same image and may undergo additional operations such as smoothing, rotation, or resizing; this kind of forgery is referred to as copy-move or copy-paste tampering. State-of-the-art methods for detecting it include [5, 7, 10]. Although copy-move tampering can be detected well by existing methods, pasting a region from another source image is more common than copying one within the same image. Therefore, the second category has received more attention in recent years.
Specifically, there are many cues for image splicing detection, such as resampling or rotation artifacts [26], demosaicing traces [16], JPEG compression traces [4, 11], blur traces [3], and random noise inconsistency [13–15, 28, 29]. Many studies also focus on source identification, including distinguishing computer graphics from photographic images [25] and identifying the camera type of a given image [24]. In addition, some integrated forensic methods incorporate deep learning techniques, such as [22]. Due to space limitations, we refer the reader to [20, 23] for more comprehensive surveys. Overall, to classify a given image as spliced or authentic, a comprehensive analysis that incorporates many of the above features is recommended, and among them, noise features provide a significant cue. Image forensic methods based on noise inconsistency can be roughly grouped into two categories: photo-response non-uniformity (PRNU) noise-based methods and methods based on a random noise assumption. PRNU noise is the main component of sensor pattern noise and is primarily generated by manufacturing imperfections in the silicon wafer. Many works authenticate images by detecting inconsistencies in the PRNU fingerprints that remain in suspicious images. Chierchia et al. [6] proposed a forgery detection method based on Bayesian estimation with a Markov random field prior. Recently, Korus and Huang [9] proposed a PRNU-based tampering localization method using a multi-scale fusion approach. Although PRNU-based forensic techniques have developed rapidly, as indicated in [13], such methods require that other authentic photos taken by the same camera model be prepared in advance, and they are therefore not blind in the strict sense. If the noise is relaxed to general random noise, many blind methods exist to detect image splicing based on noise inconsistency. Image forensics based on noise inconsistency was first proposed by Mahdian and Saic [14], who considered the case in which random noise is superimposed on the to-be-pasted region to conceal the splicing. Based on this consideration, the image was divided into non-overlapping blocks, and the noise standard deviation of each block was determined by median-based selection from its high-pass wavelet coefficients. Pan et al. [15] proposed a forensic method that computes the noise variance of each block with the projection kurtosis concentration-based noise estimator first proposed by Zoran and Weiss [30], and then compares the blocks with each other. Afterwards, the same authors extended their work to estimate the noise variance locally at each pixel rather than at each block [13]; more details of [13] are given in Section 2 as background. Our previous work [28] concentrated on the fact that image noise variance may vary with the intensity of the original noise-free image. Thus, the noise characteristic of an image can be described by a function of image intensity, referred to as the noise level function (NLF) [12].
The NLF of each suspicious region was then estimated by exploiting the strong relationship between the NLF and the camera response function and by building a Bayesian maximum a posteriori (MAP) framework to optimize the NLF estimate from finite samples. The merit of [28] is that its intensity-dependent noise assumption better matches the noise characteristics of real photographs; its limitation is that the suspicious regions should be selected manually and that NLF estimation needs a sufficient number of pixel samples, i.e., the region under test must be large enough. Recently, motivated by the principal component analysis (PCA)-based noise level estimation method [17], Zeng et al. [29] separated tampered and original regions using k-means clustering. In preparing this paper, we observed that most existing methods tend to overestimate certain local noise variances because of the complexity of the texture. Based on this observation and motivated by the scoring strategy-based object proposal technique [2], each pixel of the image under test is assigned an inhomogeneity score indicating the likely extent to which its noise level has been overestimated. After the image is divided into small patches, the variance and inhomogeneity score of each patch are computed. According to the approximately linear relationship between the inhomogeneity score and the estimated variance, abnormal regions that violate the linear constraint can be located and used to identify suspicious splicing operations. By using the inhomogeneity scoring strategy, the detection rate for spliced images remained high, while false detections were effectively reduced for authentic images whose local texture complexity is inhomogeneous.
The rest of the paper is organized as follows. Section 2 introduces related work on estimating the noise variance of each pixel using projection kurtosis concentration. Section 3 explains the motivation of this paper, and Section 4 presents the concrete steps of the proposed method. Section 5 presents and analyzes the experimental results, and Section 6 concludes the paper.
2 Background
The projection kurtosis concentration-based noise estimation method was proposed by Lyu et al. [13]. In this section, we first discuss the basic concept of projection kurtosis concentration. Assume that $\mathbf{x}$ is a two-dimensional local patch from a noise-free natural image and that each element of $\mathbf{x}$ can be treated as a realization of a random variable $x$. The variance $\sigma^2(x)$ and excess kurtosis $\kappa(x)$ of $x$ are defined, respectively, as:
$$\sigma^2(x) = \mu_2 - \mu_1^2 \tag{1}$$
and
$$\kappa(x) = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_2\mu_1^2 - 3\mu_1^4}{\mu_2^2 - 2\mu_2\mu_1^2 + \mu_1^4} - 3 \tag{2}$$
where $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ are the expectations of $x$, $x^2$, $x^3$, and $x^4$, respectively. In addition, the kurtosis of the projection of $\mathbf{x}$ onto a unit basis $\mathbf{w}$, denoted by $\kappa(\mathbf{w}^T\mathbf{x})$, is referred to as its projection kurtosis with respect to $\mathbf{w}$. In MATLAB, $\mathbf{w}^T\mathbf{x}$ was computed as the 2-D convolution of $\mathbf{x}$ and $\mathbf{w}$ using the function conv2, keeping the central part of the convolution, which has the same size as $\mathbf{x}$. Assume that a set of symmetric, orthogonal bases $\{\mathbf{w}_k, k = 1, 2, \dots, K\}$ is constructed and that the corresponding projection kurtosis values are $\{\kappa(\mathbf{w}_k^T\mathbf{x}), k = 1, 2, \dots, K\}$. Observations on a large number of natural images show that, apart from extremely textured regions, the projection kurtosis values concentrate near a constant, i.e.,
$$\kappa(\mathbf{w}_1^T\mathbf{x}) \approx \kappa(\mathbf{w}_2^T\mathbf{x}) \approx \cdots \approx \kappa(\mathbf{w}_K^T\mathbf{x}) \tag{3}$$
Because the noise-free image is unavailable, the value of each $\kappa(\mathbf{w}_k^T\mathbf{x})$ is unknown and must be estimated. For ease of optimization, each $\kappa(\mathbf{w}_k^T\mathbf{x})$ is treated as the same constant, denoted by $\kappa$. This completes the introduction of projection kurtosis concentration; more details are available in [13]. Its application to local noise estimation can be explained as follows. Assume that $\mathbf{y}$ is the noise-contaminated version of patch $\mathbf{x}$ and that the noise variance is $\hat{\sigma}^2$, which is the parameter to be estimated. Project $\mathbf{y}$ onto the bases $\{\mathbf{w}_k, k = 1, 2, \dots, K\}$ and compute the corresponding variances $\sigma^2(\mathbf{w}_k^T\mathbf{y})$ and projection kurtosis values $\kappa(\mathbf{w}_k^T\mathbf{y})$, $k = 1, 2, \dots, K$. According to [13], for each $\mathbf{w}_k$, the relationship between $\kappa(\mathbf{w}_k^T\mathbf{y})$ and the unknown parameters $\kappa$ and $\hat{\sigma}^2$ is:
$$\sqrt{\kappa(\mathbf{w}_k^T\mathbf{y})} = \sqrt{\kappa}\,\frac{\sigma^2(\mathbf{w}_k^T\mathbf{y}) - \hat{\sigma}^2}{\sigma^2(\mathbf{w}_k^T\mathbf{y})}, \quad k = 1, 2, \dots, K \tag{4}$$
This relation holds because the projected noise is Gaussian with variance $\hat{\sigma}^2$ (the bases have unit norm), has zero excess kurtosis, and is independent of the signal, so the excess kurtosis of the sum equals that of the signal scaled by the square of the ratio of signal variance to total variance. Since the right-hand side of Eq. (4) is linear in $1/\sigma^2(\mathbf{w}_k^T\mathbf{y})$ with coefficients $\sqrt{\kappa}$ and $-\sqrt{\kappa}\,\hat{\sigma}^2$, minimizing the squared difference between both sides of Eq. (4) yields closed-form estimates of the noise-free projection kurtosis $\kappa$ and the noise variance $\hat{\sigma}^2$:
$$\kappa = \left(\frac{\sum_{k=1}^{K}\sqrt{\kappa_k}\sum_{k=1}^{K}\frac{1}{\sigma_k^4} - \sum_{k=1}^{K}\frac{\sqrt{\kappa_k}}{\sigma_k^2}\sum_{k=1}^{K}\frac{1}{\sigma_k^2}}{K\sum_{k=1}^{K}\frac{1}{\sigma_k^4} - \left(\sum_{k=1}^{K}\frac{1}{\sigma_k^2}\right)^2}\right)^2 \tag{5}$$
and
$$\hat{\sigma}^2 = \frac{1}{\sum_{k=1}^{K}\frac{1}{\sigma_k^2}}\left(K - \frac{1}{\sqrt{\kappa}}\sum_{k=1}^{K}\sqrt{\kappa_k}\right) \tag{6}$$
where $\kappa_k$ and $\sigma_k$ are shorthand for $\kappa(\mathbf{w}_k^T\mathbf{y})$ and $\sigma(\mathbf{w}_k^T\mathbf{y})$, respectively; $\kappa_k$ can be computed by Eq. (2) and $\sigma_k^2$ by Eq. (1). This completes the estimation of the noise variance of a local patch. To extend it to a local variance at each pixel location, following [13], the local patch noise variances obtained with different patch sizes (e.g., 4 × 4, 8 × 8, and 12 × 12) were averaged to reduce the influence of local texture. We denote the resulting pixel-level variance map as $V = \{v_{l,m}, l = 1, 2, \dots, L;\ m = 1, 2, \dots, M\}$, where L and M are the numbers of rows and columns of the image, respectively. To further generate the detection map as in [13], a proper threshold must be selected manually; in this paper, we empirically set the threshold to 1.5 times the median value of V. Finally, the pixels with noise variances greater than the threshold were connected and merged using morphological operations. Specifically, in our MATLAB simulation, we used the function imfill to fill holes in the thresholded image and the function regionprops to identify the largest connected region. Fig. 1 shows two examples of the estimation results of [13]: (a) and (d) are two images downloaded from the Columbia Uncompressed Image Splicing Detection Evaluation Dataset¹ (hereafter, the Columbia Dataset), (b) and (e) are their estimated pixel-level noise variance maps V, and (c) and (f) are the corresponding results refined by the morphological operations. The spliced regions of both images are clearly detected by the method of [13].
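The closed-form estimator of Eqs. (1), (2), (5), and (6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [13]: it assumes the K projected responses $\mathbf{w}_k^T\mathbf{y}$ have already been computed (for example with conv2 as described above), it assumes every projection kurtosis $\kappa_k$ is positive so that its square root exists, and all function names are hypothetical. The last helper applies the 1.5 × median(V) threshold used in this paper.

```python
import math
import statistics


def raw_moments(values):
    """Return (mu1, mu2, mu3, mu4), the expectations of x, x^2, x^3 and x^4."""
    n = len(values)
    return tuple(sum(v ** p for v in values) / n for p in (1, 2, 3, 4))


def variance(values):
    """Eq. (1): sigma^2 = mu2 - mu1^2."""
    mu1, mu2, _, _ = raw_moments(values)
    return mu2 - mu1 ** 2


def excess_kurtosis(values):
    """Eq. (2): fourth central moment over the squared variance, minus 3."""
    mu1, mu2, mu3, mu4 = raw_moments(values)
    num = mu4 - 4 * mu1 * mu3 + 6 * mu2 * mu1 ** 2 - 3 * mu1 ** 4
    den = mu2 ** 2 - 2 * mu2 * mu1 ** 2 + mu1 ** 4
    return num / den - 3.0


def estimate_noise(kappas, variances):
    """Eqs. (5)-(6): least-squares fit of Eq. (4).

    kappas[k]    -- kappa_k, projection kurtosis of the noisy patch on w_k
    variances[k] -- sigma_k^2, variance of the noisy patch projected on w_k
    Returns (kappa, sigma_hat_sq). Assumes every kappa_k > 0.
    """
    K = len(kappas)
    root_k = [math.sqrt(k) for k in kappas]          # sqrt(kappa_k)
    inv2 = [1.0 / v for v in variances]              # 1 / sigma_k^2
    inv4 = [u * u for u in inv2]                     # 1 / sigma_k^4
    s_root, s_inv2, s_inv4 = sum(root_k), sum(inv2), sum(inv4)
    s_cross = sum(r * u for r, u in zip(root_k, inv2))
    root_kappa = (s_root * s_inv4 - s_cross * s_inv2) / (K * s_inv4 - s_inv2 ** 2)
    kappa = root_kappa ** 2                              # Eq. (5)
    sigma_hat_sq = (K - s_root / root_kappa) / s_inv2    # Eq. (6)
    return kappa, sigma_hat_sq


def detection_threshold(variance_map):
    """Threshold used for [13] in this paper: 1.5 times the median of V."""
    return 1.5 * statistics.median(v for row in variance_map for v in row)


# Synthetic check: kappa_k generated exactly from Eq. (4) with kappa = 4, sigma^2 = 2.
true_kappa, true_noise = 4.0, 2.0
sig2 = [5.0, 8.0, 12.0, 20.0]
kap = [true_kappa * ((s - true_noise) / s) ** 2 for s in sig2]
print(estimate_noise(kap, sig2))  # approximately (4.0, 2.0)
```

Because Eq. (4) is linear in $1/\sigma_k^2$, the fit reduces to an ordinary two-parameter least-squares problem, which is why no iterative optimizer is needed.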
3 Motivation of the proposed method
Although [13] exposes inconsistencies in the characteristics of local noise, its performance is frequently affected by the texture of the image. In other words, for a natural, authentic image, the estimated noise variance at a pixel with complex texture is relatively higher than that at a pixel with smooth texture. That is why most global noise estimation methods, such as [21, 27], carefully select the most homogeneous patches as samples for computing the noise variance. In digital image forensics, however, the variance of every pixel or patch must be measured even if it has complex texture. Therefore, overestimation at highly textured pixels is unavoidable for almost all existing noise estimation methods. To demonstrate this kind of overestimation, we applied the noise estimation method of [13] to some authentic images, also downloaded from the Columbia Dataset, and observed false detections among the results. Two examples of these false detections are shown in Fig. 2. In Fig. 2, brighter regions in each detection map indicate higher estimated noise levels; however, their actual noise levels were exactly the same as those of the darker regions. Furthermore, most of the overestimations occurred in inhomogeneous regions, i.e., regions with more complex textures.
Motivated by the concept of the scoring-based object proposal technique [2], we propose an inhomogeneity scoring strategy to effectively suppress this kind of false detection. First, we divide the entire image into non-overlapping patches, and then we estimate the noise variance of each patch by taking the median of the pixel-level estimates of [13] within it. Next, the inhomogeneity score of each patch is computed according to the relative complexity of its texture; the scoring criterion is discussed below. Fig. 3 plots the estimated noise variances of the patches of Figs. 2(a) and 2(d) against their inhomogeneity scores, where the maximum score is normalized to 100. Fig. 3 shows that the noise level of each patch grows approximately linearly with its score. Thus, for a given image, a straight line can be fitted to describe the relationship between the noise variance and the inhomogeneity of each patch. Then, a confidence range is determined to indicate the reliability of the sample estimates, and samples that fall outside this range are identified as suspicious regions. Figs. 4(a)–4(d) show, for Fig. 1(a), Fig. 1(d), Fig. 2(a), and Fig. 2(d), respectively, the fitted lines (solid blue lines), the upper and lower bounds of the confidence interval (dotted red and dashed green lines, respectively), and the tentatively identified suspicious patches (red squares and green diamonds). For the forged images, i.e., Figs. 4(a) and 4(b), the noise inconsistency can be identified readily from a number of abnormal samples above the upper bound or below the lower bound.
¹ Both authentic and spliced images can be downloaded from http://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/
Fig. 1 Two examples of successful detections by the projection kurtosis concentration-based method [13]. (a) and (d) are two test images, canonxt_kodakdcs330_sub_11.tif and canong3_canonxt_sub_22.tif, respectively, from the Columbia Dataset. (b) and (e) are the estimated pixel-level noise variance maps of (a) and (d), respectively, where brighter regions indicate higher noise variance. (c) and (f) are the further refined results of (b) and (e), respectively
Fig. 2 Two examples of false detections by the projection kurtosis concentration-based method [13]. (a) and (d) are two test images, canonxt_32_sub_07.tif and canonxt_14_sub_09.tif, respectively, from the Columbia Dataset. (b) and (e) are the estimated pixel-level noise variance maps of (a) and (d), respectively. (c) and (f) are the further refined results of (b) and (e), respectively
Fig. 3 Estimated noise variance and inhomogeneity score of each patch for Figs. 2(a) and 2(d). (a) Patch samples from Fig. 2(a). (b) Patch samples from Fig. 2(d). Both images are segmented into 100 patches
To further indicate the abnormal patches, the locations of the abnormal patches in Figs. 4(a) and 4(b) are shown in Figs. 4(e) and 4(f), respectively, where red regions denote suspicious patches with higher noise levels. In Fig. 4(e), the spliced region is well localized by the inhomogeneity scoring strategy; in Fig. 4(f), although the tampered region is not fully detected, the splicing can still be perceived through the subsequent operations introduced in Section 4. For the authentic images in Figs. 4(c) and 4(d), most samples are correctly identified as trustworthy patches and the false detections are effectively suppressed, except for a few individual incorrectly detected samples, which can be removed by the multi-scale comprehensive assessment strategy described in the next section. Having presented the motivation of the proposed method, i.e., the approximately linear relationship between the noise variance and the inhomogeneity score, we must determine how to score the inhomogeneity of each patch. To the best of our knowledge, no prior forensic approach has considered this issue, but evaluating texture or inhomogeneity is important in many computer vision applications, such as saliency measurement, object proposal, and noise filtering. As described in [2], four image cues, namely multi-scale saliency, color contrast, edge density, and superpixels straddling, were used to measure the characteristics of each local window. Among them, multi-scale saliency is an efficient cue for describing the texture complexity of each patch. Thus, following [2], we use the spectral residual-based saliency measure [8] as the descriptor of local inhomogeneity. In particular, the saliency map S of an image I is obtained at each pixel p as:
$$S = G * \left[\xi^{-1}\left(e^{\ell(I)}\,e^{i\Im(I)}\right)\right]^{\circ 2} \tag{7}$$
where G is a Gaussian filter used to smooth the output, $\xi^{-1}(\cdot)$ is the inverse Fourier transform, $\ell(I)$ and $\Im(I)$ are the spectral residual and the phase spectrum of the image, respectively, i is the imaginary unit, $*$ is the convolution operation, and the superscript $\circ 2$ denotes the component-wise square operation. The spectral residual is defined as:
$$\ell(I) = \log(\Re(I)) - \psi(\log(\Re(I))) \tag{8}$$
where $\Re(I)$ is the amplitude spectrum of image I, and $\psi$ denotes 3 × 3 mean filtering. Figs. 5(a)–5(d) show the saliency maps of Figs. 1(a), 1(d), 2(a), and 2(d), respectively. As Fig. 5 shows, the brighter regions correspond to salient regions, which are also the inhomogeneous regions. Based on these observations, we collect all of the saliency values within each patch from the saliency map and take their median as the patch's unnormalized inhomogeneity score. By incorporating the inhomogeneity scoring strategy, regions estimated to have higher noise levels can be correctly reclassified as non-suspicious when they also have relatively high inhomogeneity scores. Exploiting the constraint between the estimated noise variance and the inhomogeneity is the main theoretical contribution of this paper.
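The following NumPy sketch illustrates the spectral residual saliency of Eqs. (7) and (8) and the per-patch median score, rescaled so that the largest score is 100 (as used in Step 3 of Section 4). It follows the common implementation of [8] in taking the squared magnitude of the inverse transform. Several details are assumptions rather than settings from this paper: the Gaussian width (sigma = 2.5), edge-replicated padding, the small epsilon inside the logarithm, and the absence of any downsampling (the original saliency method works on a reduced-resolution image). The function names are hypothetical, and `labels` stands for any integer patch-label map, such as a SLIC segmentation.

```python
import numpy as np


def mean_filter3(a):
    """psi in Eq. (8): 3 x 3 mean filter with edge replication."""
    h, w = a.shape
    p = np.pad(a, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def gaussian_blur(a, sigma=2.5):
    """G in Eq. (7): separable Gaussian smoothing with edge replication."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(a, r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def spectral_residual_saliency(img, eps=1e-12):
    """Eqs. (7)-(8): saliency map S of a 2-D grayscale image."""
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + eps)        # log of the amplitude spectrum
    phase = np.angle(spectrum)                      # phase spectrum
    residual = log_amp - mean_filter3(log_amp)      # Eq. (8)
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    return gaussian_blur(np.abs(recon) ** 2)        # Eq. (7), squared magnitude


def patch_scores(saliency, labels):
    """Median saliency per patch, rescaled so that the largest score is 100."""
    ids = np.unique(labels)
    med = np.array([np.median(saliency[labels == i]) for i in ids])
    return dict(zip(ids.tolist(), (100.0 * (med / med.max())).tolist()))


# Usage: random texture on the left half, a flat area on the right, two patches.
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, :32] = rng.normal(0.0, 1.0, (64, 32))
labels = np.zeros((64, 64), dtype=int)
labels[:, 32:] = 1
S = spectral_residual_saliency(img)
scores = patch_scores(S, labels)
print(scores)
```

Using the median rather than the mean keeps a patch's score from being dominated by a few very bright saliency pixels near its boundary.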
4 Procedures of the proposed method
Based on the motivation introduced in Section 3, we propose a noise-inconsistency detection method built on the inhomogeneity scoring strategy. Fig. 6 shows a block diagram of the proposed method, and each step is described below.
Step 1: Segment the image into small patches.
Fig. 4 Fitted lines, confidence upper and lower bound lines, and tentatively identified suspicious patches for four test images. (a) Fig. 1(a). (b) Fig. 1(d). (c) Fig. 2(a). (d) Fig. 2(d). (e) and (f) indicate the tentative suspicious patches with higher noise levels for Figs. 1(a) and 1(d), respectively. In (a)–(d), the fitted line, upper bound line, and lower bound line are denoted by a solid blue line, a dotted red line, and a dashed green line, respectively; suspicious patches are denoted by red squares or green diamonds; and the number of samples is 100
If the image under test is a color image, first convert it to a grayscale image. Then, use the simple linear iterative clustering (SLIC) superpixel method [1] to segment the entire image into a specified number of non-overlapping patches. Assume that the number of segments is N, and denote each patch as $P_n$, n = 1, 2, …, N. We chose SLIC because it produces patches of approximately equal size with better boundary partitioning. Note that we did not divide the image into non-overlapping square blocks here, because SLIC captures the texture characteristics better than square-block segmentation does.
Fig. 5 Saliency maps of four test images. (a) Fig. 1(a). (b) Fig. 1(d). (c) Fig. 2(a). (d) Fig. 2(d). Brighter regions indicate salient, inhomogeneous regions
Fig. 6 Block diagram of the proposed method: Input image → Step 1: Image segmentation → Step 2: Patch-level variance estimation and Step 3: Patch-level inhomogeneity evaluation → Step 4: Tentative splicing region localization → Step 5: Final result refinement → Detection map
Step 2: Estimate the variance of each patch.
First, the pixel-level variance map V is computed using the projection kurtosis concentration-based noise estimation method [13], as introduced in Section 2. Then, collect all of the variance values that belong to the segmented patch $P_n$ and take their median as the variance of that patch. Denote the estimated variance of $P_n$ as $\tilde{\sigma}_n^2$, n = 1, 2, …, N.
Step 3: Score the patches according to their inhomogeneity.
As mentioned in Section 3, the untrustworthiness of noise estimation lies mainly in overestimation in regions with complex texture. Thus, the saliency map derived from Eq. (7) was used to compute the inhomogeneity score of each patch. Similar to Step 2, the median of the saliency values within a patch was taken as its representative score. Then, the scores of all patches were normalized so that the maximum score is 100, and the normalized inhomogeneity score of $P_n$ was denoted by $\tilde{S}_n$, n = 1, 2, …, N.
Step 4: Localize the tentative, most probable splicing region.
For each patch $P_n$, its variance $\tilde{\sigma}_n^2$ and inhomogeneity score $\tilde{S}_n$ were estimated in Steps 2 and 3, respectively. Collect the samples $\{(\tilde{\sigma}_n^2, \tilde{S}_n), n = 1, 2, \dots, N\}$ and compute the optimal fitting line according to the least-squares criterion. Assuming that the fitted line is $\tilde{\sigma}^2(\tilde{S}) = a\tilde{S} + b$, define a ratio threshold T; the subset of suspicious patches with excessively high noise levels, $\Phi_H$, can then be identified as:
$$\Phi_H = \left\{ P_n \;\middle|\; \frac{\tilde{\sigma}_n^2 - a\tilde{S}_n - b}{\tilde{\sigma}^2(50)} > T,\ n = 1, 2, \dots, N \right\} \tag{9}$$
where $\tilde{\sigma}^2(50)$ is the value of the fitted function $\tilde{\sigma}^2(\tilde{S})$ at $\tilde{S} = 50$. Similarly, the subset of suspicious patches with excessively low noise levels, $\Phi_L$, is selected by:
$$\Phi_L = \left\{ P_n \;\middle|\; \frac{a\tilde{S}_n + b - \tilde{\sigma}_n^2}{\tilde{\sigma}^2(50)} > T,\ n = 1, 2, \dots, N \right\} \tag{10}$$
Clearly, if both $\Phi_H$ and $\Phi_L$ are empty, the proposed method declares the image authentic. Otherwise, under the assumption that at most one splicing operation has been applied to an image, the largest connected region in $\Phi_H$ or $\Phi_L$ is selected as the most probable suspicious splicing region and denoted by $\hat{\Phi}$. The goal here is to discard isolated, falsely detected patches with extremely high or low noise levels. The setting of T is discussed in Section 5. In our practical implementation, the segmentation number N was set to 50, 100, 200, and 400 to make a comprehensive assessment. For each value of N, the subsets of patches with excessively high and low noise levels according to Eqs. (9) and (10) were denoted by $\{\Phi_H^{050}, \Phi_L^{050}\}$, $\{\Phi_H^{100}, \Phi_L^{100}\}$, $\{\Phi_H^{200}, \Phi_L^{200}\}$, and $\{\Phi_H^{400}, \Phi_L^{400}\}$, respectively. The proposed comprehensive assessment algorithm for determining the tentative detection map $\hat{\Phi}$ is described in Algorithm 1. If the number of pixels in $\hat{\Phi}$ is less than a tampering-pixel threshold Q, which is also discussed in Section 5, the image is identified as authentic and the detection process is complete; otherwise, the image is identified as a forgery, and the process continues with Step 5 for refinement.
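Step 4 for a single segmentation number N can be sketched as follows: a least-squares line through the (score, variance) samples, the ratio tests of Eqs. (9) and (10) normalized by the fitted value at $\tilde{S} = 50$, selection of the largest connected group of flagged patches, and the pixel-count test against Q = P_I/Q* described in Section 5. The defaults T = 0.36 and Q* = 180 are the values reported in Section 5. This is only an illustrative sketch: the patch adjacency and patch area inputs and all function names are hypothetical, and the multi-scale fusion of Algorithm 1 over N = 50, 100, 200, and 400 is not reproduced.

```python
def fit_line(scores, variances):
    """Least-squares fit of the line sigma~^2(S~) = a * S~ + b."""
    n = len(scores)
    mx, my = sum(scores) / n, sum(variances) / n
    sxx = sum((x - mx) ** 2 for x in scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, variances))
    a = sxy / sxx
    return a, my - a * mx


def suspicious_sets(scores, variances, T=0.36):
    """Eqs. (9)-(10): indices of patches with excessively high / low noise."""
    a, b = fit_line(scores, variances)
    ref = a * 50 + b                       # sigma~^2(50), fitted value at S~ = 50
    high, low = set(), set()
    for n, (s, v) in enumerate(zip(scores, variances)):
        if (v - a * s - b) / ref > T:      # Eq. (9)
            high.add(n)
        if (a * s + b - v) / ref > T:      # Eq. (10)
            low.add(n)
    return high, low


def largest_connected(patches, adjacency, area):
    """Largest-area group of flagged patches connected through `adjacency`."""
    best, best_area, seen = set(), 0, set()
    for start in patches:
        if start in seen:
            continue
        group, stack = set(), [start]
        seen.add(start)
        while stack:                       # depth-first flood fill over patches
            p = stack.pop()
            group.add(p)
            for q in adjacency.get(p, ()):
                if q in patches and q not in seen:
                    seen.add(q)
                    stack.append(q)
        group_area = sum(area[p] for p in group)
        if group_area > best_area:
            best, best_area = group, group_area
    return best, best_area


def tentative_region(scores, variances, adjacency, area, image_pixels,
                     T=0.36, q_star=180):
    """Return (region, is_forgery) for one segmentation scale."""
    high, low = suspicious_sets(scores, variances, T)
    region_h, area_h = largest_connected(high, adjacency, area)
    region_l, area_l = largest_connected(low, adjacency, area)
    region, region_area = (region_h, area_h) if area_h >= area_l else (region_l, area_l)
    # Authentic if the region holds fewer pixels than Q = P_I / Q*.
    return region, region_area >= image_pixels / q_star


# Usage: ten patches on a chain, one with an inflated noise estimate.
scores = [10.0 * (n + 1) for n in range(10)]          # S~_n = 10, 20, ..., 100
variances = [0.02 * s + 1.0 for s in scores]          # ideal linear trend
variances[4] += 2.0                                   # inflate patch 4
chain = {n: [m for m in (n - 1, n + 1) if 0 <= m < 10] for n in range(10)}
areas = {n: 1000 for n in range(10)}
print(tentative_region(scores, variances, chain, areas, image_pixels=100_000))
# prints ({4}, True)
```

Normalizing the residual by the fitted value at the mid-range score 50, rather than by each patch's own fitted value, makes T a single image-level ratio that does not become overly strict for low-score patches.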
Step 5: Refine the final detection result
Although the suspicious regions can be localized effectively in Step 4, the detected regions tend to be smaller than the actual spliced regions. Therefore, the pixel-level variance map V is incorporated to extend the detected splicing regions. In [13], the final binary detection map was generated directly by simple thresholding. However, the limitation of this operation is that the threshold is not fixed across images and must be selected manually to produce a satisfactory result. In the proposed method, an adaptive refinement algorithm generates the detection map automatically; its pseudo-code is provided in Algorithm 2. This completes the description of the steps that generate the detection map.
5 Experimental results To evaluate the efficiency of the proposed method that exposes the noise level inconsistency of regions from different sources, we conducted the proposed method on some authentic and splicing images. To further demonstrate the merits of the proposed method, we also compared our results with two state-of-the-art methods proposed by Lyu et al. [13] and Zeng et al. [29], where [13] was a pixel-level method based on projection kurtosis concentration, which has been briefly introduced in Section 2, and [29] was a PCA-based block-level method. Specifically, for [29], The blockwise noise level estimation with principal component analysis based algorithm was conducted and the tampered region was segmented from the rest region by a kmeans clustering. It should be noted that there are two objectives for the proposed method: one is the authenticity identification of the image and another is the localization of the splicing region if the image is justified as a forgery image. First, 180 splicing and 183 authentic images from the Columbia Dataset were involved in our experiments. Note that all the images were saved in the uncompressed TIFF format and the source photos used for composition were captured by a number of camera models and with different illumination environments, and the spliced regions were manually pasted from one photo to another photo. The computational complexity of the proposed method was evaluated in terms of running time. All the experiments were implemented through Matlab R2016a without any program optimization, and tested on a personal PC with i5–4300 M 2.6 GHz CPU and 12 GB memory. The average running time per image from the Columbia Dataset was 24.33 s, which can meet the needs of practical applications. Fig. 7 shows some examples of the splicing images to be detected from the database. It can be seen from Fig. 
7 that due to the randomly paste, there is a distinct boundary lying between the two regions from different sources and the smaller region for each photo is defined as the splicing region. It is worthy be noted that the noise of each region is not intentionally added but naturally generated by camera itself. Before conducting the proposed method onto the test images, we need to determine the optimal ratio parameter T and tampering-pixel threshold Q involved in Step 4. For all forgery images, we defined the true positive rate (TPR) as the ratio of number of correctly detected images to the total involved forgery image number, i.e., 180. While for all authentic images, the false positive rate (FPR) was quantified with the ratio of number of falsely detected images to the total number of authentic images, i.e., 183. Based on these definitions, we evaluated the detection performance of the proposed method through adjusting the threshold T and Q, 0 CASIA 2.0 Tampered Image Detection Evaluation Database can be downloaded from the website of http://forensics.idealtest.org.
Multimed Tools Appl
Fig. 7 Six forgery images from the Columbia Dataset
respectively, to generate different TPRs and FPRs. Because the appropriate value of Q depends on the image size, Q is expressed in our method as P_I/Q*, where P_I is the total number of pixels in image I. Fig. 8(a) and (b) show the receiver operating characteristic (ROC) curves of the detection performance with Q* and T fixed, respectively. Selecting the optimal T and Q* was a time-consuming iterative process; after several rounds of updates, we found that the TPR and FPR achieved the best balance when T = 0.36 and Q* = 180. In this case, the TPR and FPR are 0.73 and 0.33, respectively. To qualitatively demonstrate the efficacy of the proposed method, the detection results for Fig. 7 are shown in Fig. 9, in which rows (i)–(vi), from top to bottom, show the results for Fig. 7(a)–(f), and the first to third columns, from left to right, show the results of Lyu et al. [13], Zeng et al. [29], and our method, respectively. All the examples were correctly identified as spliced images, and most splicing regions were localized effectively by all of the methods. According to
Fig. 8 ROC curves of true positive rate and false positive rate for the Columbia Dataset without any post-processing. (a) Variable T with Q* fixed at 180. (b) Variable Q* with T fixed at 0.36
Fig. 9 Comparisons of the splicing detection results of our method with the methods proposed by Lyu et al. [13] and Zeng et al. [29]. Rows (i)–(vi) correspond to the images of Fig. 7(a)–(f), respectively, and columns from left to right correspond to the detection results of [13], [29], and the proposed method, respectively
Fig. 10 Comparative results on authentic images from the Columbia Dataset. Columns from left to right correspond to the test authentic images and the detection results of Lyu et al. [13], Zeng et al. [29], and the proposed method, respectively. In addition, the far-right column (patch samples) presents the distribution of the variance of each patch with respect to its inhomogeneity score for each test image
Fig. 11 Detection results on some splicing images from CASIA 2.0
our simulations, the method of [29] localized the splicing regions more effectively than [13] and the proposed method on most test spliced images. However, because its k-means clustering strategy forcibly separates all image blocks into two categories, every authentic image was also identified as a spliced image; in other words, its false alarm rate is extremely high. Fig. 10 shows two examples comparing [13, 29] and the proposed method on authentic images. It can be seen from Fig. 10 that there are distinct differences among the three methods. Specifically, for [13, 29], some regions in both images are falsely identified as splicing regions because the local noise level is overestimated in regions with higher texture complexity, whereas for the proposed method, this overestimation has been partially corrected by the inhomogeneity scoring strategy. To demonstrate the advantage of the proposed method more intuitively, the distribution of the variance of each patch with respect to its inhomogeneity score is shown in the far-right column of Fig. 10, in which an approximately linear relationship between the estimated noise and the inhomogeneity score can be clearly observed.
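The far-right column of Fig. 10 suggests a simple way to picture how the inhomogeneity score compensates for texture-induced overestimation: fit a line to the per-patch (score, variance) pairs and flag the patches that depart from it. The sketch below assumes the per-patch variance and score grids have already been computed. The relative-deviation rule, the cut-off `rel_threshold`, and the function name `flag_inconsistent_patches` are hypothetical stand-ins and are not the ratio parameter T of Step 4; the largest 4-connected group of flagged patches is returned as the candidate splicing region.

```python
import numpy as np
from scipy import ndimage


def flag_inconsistent_patches(variance, score, rel_threshold=0.5):
    """Hypothetical fit-and-flag step on a patch grid.

    variance: 2-D array, estimated noise variance of each patch.
    score:    2-D array, inhomogeneity score of each patch (same shape).
    Returns (mask of the largest connected outlier group, (slope, intercept)).
    """
    # Least-squares line: variance ~ slope * score + intercept.
    slope, intercept = np.polyfit(score.ravel(), variance.ravel(), deg=1)
    fitted = slope * score + intercept

    # Relative distance of each patch from the fitted line (assumed rule).
    deviation = np.abs(variance - fitted) / np.maximum(np.abs(fitted), 1e-12)
    outliers = deviation > rel_threshold

    # Group outlier patches with 4-connectivity (scipy's default in 2-D).
    labels, count = ndimage.label(outliers)
    if count == 0:
        return np.zeros_like(outliers), (slope, intercept)

    # Keep the connected group that covers the most patches.
    sizes = ndimage.sum(outliers, labels, index=np.arange(1, count + 1))
    largest = 1 + int(np.argmax(sizes))
    return labels == largest, (slope, intercept)


# Synthetic 10x10 patch grid: variance grows linearly with the score,
# except for a 3x3 block and one isolated patch carrying extra noise.
score = np.linspace(0.0, 1.0, 100).reshape(10, 10)
variance = 2.0 + 6.0 * score
variance[2:5, 2:5] += 8.0
variance[8, 8] += 8.0
mask, (slope, intercept) = flag_inconsistent_patches(variance, score)
```

In this toy grid both the 3 × 3 block and the isolated patch exceed the cut-off, but only the block survives the largest-region step; the outliers also pull the fitted slope below its true value of 6, which is one reason a robust fit could be preferable in practice.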
Fig. 12 Detection results on some splicing images from designcrowd.com
Fig. 13 ROC curves of true positive rate and false positive rate for the Columbia Dataset with specific post-processing operations
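The evaluation protocol used for the ROC curves in Figs. 8 and 13, namely the TPR and FPR definitions and the size-normalized threshold Q = P_I/Q*, can be sketched as follows. The decision rule (an image is declared forged when its localized region contains more than Q pixels) is our reading of the tampering-pixel threshold and is an assumption, as are all function names; the per-image pixel counts would come from the localization step.

```python
from typing import List, Sequence, Tuple


def pixel_threshold(num_pixels: int, q_star: float) -> float:
    """Q = P_I / Q*: the tampering-pixel threshold scaled to image size."""
    return num_pixels / q_star


def is_forged(suspicious_pixels: int, num_pixels: int, q_star: float) -> bool:
    """Assumed rule: flag the image if the detected region exceeds Q pixels."""
    return suspicious_pixels > pixel_threshold(num_pixels, q_star)


def tpr_fpr(forged_flags: Sequence[bool],
            authentic_flags: Sequence[bool]) -> Tuple[float, float]:
    """TPR over the forgery images, FPR over the authentic images."""
    tpr = sum(forged_flags) / len(forged_flags)
    fpr = sum(authentic_flags) / len(authentic_flags)
    return tpr, fpr


def roc_over_q_star(
    forged: Sequence[Tuple[int, int]],     # (suspicious pixels, total pixels)
    authentic: Sequence[Tuple[int, int]],  # (suspicious pixels, total pixels)
    q_stars: Sequence[float],
) -> List[Tuple[float, float]]:
    """One (TPR, FPR) point per Q* value, everything else held fixed."""
    points = []
    for q in q_stars:
        f = [is_forged(s, n, q) for s, n in forged]
        a = [is_forged(s, n, q) for s, n in authentic]
        points.append(tpr_fpr(f, a))
    return points
```

A larger Q* lowers Q, so more images are flagged and both rates can only rise along the sweep; for a hypothetical 1000 × 800 image with Q* = 180, `pixel_threshold` gives about 4444 pixels.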
Next, we evaluated the efficacy of the proposed method on another standard forensic test dataset, CASIA 2.0 (the CASIA 2.0 Tampered Image Detection Evaluation Database, which can be downloaded from http://forensics.idealtest.org), which includes 5123 tampered images and is more realistic than the Columbia Dataset. In our experiments, a total of 4368 images were successfully detected as tampered, giving a detection rate of 85.26%. Fig. 11 shows examples of the detection results, where the top and bottom sub-images in each panel are the test image and its corresponding detection result. Because the resolution of the images in CASIA 2.0 is relatively low, splicing localization is less accurate, as shown in Fig. 11(d) and (e). This is a common challenge for all noise inconsistency-based forensic methods on the CASIA 2.0 dataset. We then further tested our method on some dedicated composite images downloaded from http://www.designcrowd.com. The test images from this site were submitted to a contest of realistic image manipulation and were saved in the JPEG format, which makes detecting the inconsistency in these images a more challenging task. Six examples and their detection results are shown in Fig. 12, in which the top and bottom sub-images in (a)–(f) correspond to the test images and the detection results of the proposed method. It can be seen that the most probable composite regions have been successfully exposed for (a)–(d). For (e) and (f), the splicing regions are not directly exposed because of interference from the large expanse of sky in both images; however, they can still be identified indirectly. Specifically, in an authentic image containing both buildings and sky, the two contents differ significantly in estimated noise level, whereas in these composite images the buildings, with their lower estimated noise level, are grouped into the same cluster as the sky. At this point, the suspicious objects in the image can be further identified with the assistance of subsequent human analysis. Finally, we evaluated the robustness of the proposed method against some common post-processing operations, including JPEG compression, up-sampling, and down-sampling. Fig. 13 shows the ROC curves for the forgery and authentic images from the Columbia Dataset after all the images had undergone JPEG compression (QF = 90, 80), 10% up-sampling and
Fig. 14 Splicing localization results of the proposed method on Fig. 7(a) with some specific post-processing operations. (a) JPEG compression with a quality factor of 90. (b) JPEG compression with a quality factor of 80. (c) Up-sampling by 10%. (d) Down-sampling by 10%
down-sampling, respectively. As in Fig. 8(a), the ROC curves are obtained with variable T and Q* fixed at 180. It can be seen in Fig. 13 that both performance curves for JPEG compression decrease significantly, whereas there is no obvious change for either 10% up-sampling or 10% down-sampling. The deterioration under JPEG compression is likely because compression introduces new quantization noise, whose characteristics interfere with the estimation of both the patch noise and the inhomogeneity score. In addition, Fig. 14 shows an example of detection results after JPEG compression (QF = 90, 80), 10% up-sampling, and 10% down-sampling, respectively. In this example, splicing region localization also becomes worse because of the interference of JPEG quantization noise. Thus, improving the performance on JPEG-compressed images is one direction of our future work.
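Why JPEG compression disturbs the noise estimate can be illustrated with a deliberately simplified model. The sketch below is not a JPEG codec: it assumes a grayscale image, an orthonormal 8 × 8 block DCT, and one uniform quantization step for every coefficient (real JPEG uses a per-frequency table and entropy coding), and the function name is hypothetical. It shows two regimes: when the coefficients spread much wider than the step, quantization adds error with variance close to step²/12; when they are much smaller, as can happen to weak sensor noise under coarse high-frequency steps, they are rounded to zero and the original noise disappears. Either way, the residual the estimator sees no longer reflects the camera noise alone.

```python
import numpy as np
from scipy.fft import dctn, idctn


def quantize_block_dct(img, step):
    """Uniformly quantize the orthonormal 8x8 block DCT of a grayscale image.

    Simplified model (assumption): one step size for all 64 coefficients.
    Image height and width must be multiples of 8.
    """
    h, w = img.shape
    # View the image as (block_row, row_in_block, block_col, col_in_block).
    blocks = np.asarray(img, dtype=float).reshape(h // 8, 8, w // 8, 8)
    coeffs = dctn(blocks, norm="ortho", axes=(1, 3))
    coeffs = step * np.round(coeffs / step)
    return idctn(coeffs, norm="ortho", axes=(1, 3)).reshape(h, w)


rng = np.random.default_rng(0)

# Regime 1: content spread (std 50) >> step (4):
# the added error variance approaches step**2 / 12.
strong = rng.normal(0.0, 50.0, size=(256, 256))
added_error = quantize_block_dct(strong, 4.0) - strong
print(added_error.var(), 4.0 ** 2 / 12)

# Regime 2: weak noise (std 1) << step (16): the coefficients round to 0,
# so the output is flat and the original noise is removed.
weak = rng.normal(0.0, 1.0, size=(256, 256))
print(quantize_block_dct(weak, 16.0).var())
```

Because the block DCT is orthonormal, the pixel-domain error variance equals the coefficient-domain error variance, which is why the step²/12 figure carries over directly.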
6 Conclusions In this paper, an efficient image forensic method is proposed to expose splicing manipulation based on inconsistencies in the local noise feature. It was observed that the performance of almost all state-of-the-art noise estimation methods is adversely affected by the complexity of the textures. Thus, motivated by the scoring-based object proposal technique, the inhomogeneity score of each pre-segmented patch was computed using a spectral residual-based saliency detection method. After the noise variance of each patch was estimated using a kurtosis concentration-based, pixel-level noise estimation method, the pairs of noise variance and inhomogeneity score of all patches were collected and fitted with a linearly increasing function. The connected patches with the maximum area that did not lie close enough to the fitted linear function were identified as the most suspicious splicing region. Our experimental results substantiate the efficacy of the proposed method. As with almost all noise inconsistency-based forensic methods, since our method relies on the characteristics of noise, it will fail if the forged photo is a composite of two images with similar noise levels. In this case, additional forensic cues should be incorporated to generate more convincing detection results. In addition, there are two main aspects in which the proposed method could be further improved: 1) a more precise inhomogeneity scoring criterion would be beneficial, and 2) a more rigorous derivation of the noise variance as a function of the inhomogeneity score could be developed. Both of these topics will be addressed in our future work.
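The spectral residual saliency detector of Hou and Zhang [8], on which the inhomogeneity score is built, is short enough to sketch. The version below follows the published recipe: the log-amplitude spectrum minus its 3 × 3 local average, recombined with the original phase, inverse transformed, squared, and Gaussian smoothed. Turning the map into one score per patch by averaging over non-overlapping square patches, the patch size, the smoothing σ = 2.5, and both function names are assumptions for illustration rather than the paper's exact settings, and any downscaling of the input is omitted.

```python
import numpy as np
from scipy import ndimage


def spectral_residual_saliency(img, sigma=2.5):
    """Spectral residual saliency map (after Hou and Zhang) of a 2-D array."""
    spectrum = np.fft.fft2(np.asarray(img, dtype=float))
    log_amplitude = np.log(np.abs(spectrum) + 1e-12)
    phase = np.angle(spectrum)
    # Spectral residual: log amplitude minus its 3x3 local mean
    # (edge replication, equivalent to MATLAB's 'replicate' padding).
    local_mean = ndimage.uniform_filter(log_amplitude, size=3, mode="nearest")
    residual = log_amplitude - local_mean
    # Back to the image domain with the original phase, then square and smooth.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return ndimage.gaussian_filter(saliency, sigma=sigma)


def patch_inhomogeneity(saliency, patch):
    """Hypothetical patch score: mean saliency over non-overlapping patches."""
    rows, cols = saliency.shape[0] // patch, saliency.shape[1] // patch
    cropped = saliency[: rows * patch, : cols * patch]
    return cropped.reshape(rows, patch, cols, patch).mean(axis=(1, 3))


# Flat 64x64 image with one textured 16x16 patch at grid position (1, 1).
rng = np.random.default_rng(1)
image = np.full((64, 64), 0.5)
image[16:32, 16:32] += rng.normal(0.0, 0.2, size=(16, 16))
scores = patch_inhomogeneity(spectral_residual_saliency(image), patch=16)
```

In this toy example the textured patch receives the highest score and the far corner patch a much lower one, which is the qualitative behavior an inhomogeneity score needs: more complex texture, higher score, and hence a higher expected estimated variance on the fitted line.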
Acknowledgements We would like to thank Drs. Siwei Lyu and Hui Zeng for kindly sharing the code of their work. This work was supported in part by the National Natural Science Foundation of China (61702332, 61672354, 61562007), Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (MIMS16-03), the PAPD Fund, and the CICAEET Fund.
References
1. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Suesstrunk S (2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Mach Intell 34(11):2274–2281
2. Alexe B, Deselaers T, Ferrari V (2012) Measuring the objectness of image windows. IEEE Trans Pattern Anal Mach Intell 34(11):2189–2202
3. Bahrami K, Kot AC, Li L, Li H (2015) Blurred image splicing localization by exposing blur type inconsistency. IEEE Trans Inf Forensics Secur 10(5):999–1009
4. Bianchi T, Piva A (2012) Image forgery localization via block-grained analysis of JPEG artifacts. IEEE Trans Inf Forensics Secur 7(3):1003–1017
5. Chen B, Coatrieux G, Wu J, Dong Z, Coatrieux LJ, Shu H (2015) Fast computation of sliding discrete Tchebichef moments and its application in duplicated regions detection. IEEE Trans Signal Process 63(20):5424–5436
6. Chierchia G, Poggi G, Sansone C, Verdoliva L (2014) A Bayesian-MRF approach for PRNU-based image forgery detection. IEEE Trans Inf Forensics Secur 9(4):554–567
7. Ferreira A, Felipussi SC, Alfaro C, Fonseca P, Vargas-Muñoz JE, Dos Santos JA, Rocha A (2016) Behavior knowledge space-based fusion for copy-move forgery detection. IEEE Trans Image Process 25(10):4729–4742
8. Hou X, Zhang L (2007) Saliency detection: a spectral residual approach. In: Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8
9. Korus P, Huang J (2017) Multi-scale analysis strategies in PRNU-based tampering localization. IEEE Trans Inf Forensics Secur 12(4):809–824
10. Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy-move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518
11. Li B, Ng TT, Li X, Tan S, Huang J (2015) Revealing the trace of high-quality JPEG compression through quantization noise analysis. IEEE Trans Inf Forensics Secur 10(3):558–573
12. Liu C, Szeliski R, Kang SB, Zitnick CL, Freeman WT (2008) Automatic estimation and removal of noise from a single image. IEEE Trans Pattern Anal Mach Intell 30(2):299–314
13. Lyu S, Pan X, Zhang X (2014) Exposing region splicing forgeries with blind local noise estimation. Int J Comput Vis 110:202–221
14. Mahdian B, Saic S (2009) Using noise inconsistencies for blind image forensics. Image Vis Comput 27(10):1497–1503
15. Pan X, Zhang X, Lyu S (2011) Exposing image forgery with blind noise estimation. In: Proc. of the 13th ACM Workshop on Multimedia and Security, pp 15–20
16. Ferrara P, Bianchi T, De Rosa A, Piva A (2012) Image forgery localization via fine-grained analysis of CFA artifacts. IEEE Trans Inf Forensics Secur 7(5):1566–1577
17. Pyatykh S, Hesser J, Zheng L (2013) Image noise level estimation by principal component analysis. IEEE Trans Image Process 22(2):687–699
18. Qin C, Chen X, Ye D, Wang J, Sun X (2016) A novel image hashing scheme with perceptual robustness using block truncation coding. Inf Sci 361–362:84–99
19. Qin C, Ji P, Zhang X, Dong J, Wang J (2017) Fragile image watermarking with pixel-wise recovery based on overlapping embedding strategy. Signal Process 138:280–293
20. Qureshi MA, Deriche M (2015) A bibliography of pixel-based blind image forgery detection techniques. Signal Process Image Commun 39:46–74
21. Rakhshanfar M, Amer AM (2016) Estimation of Gaussian, Poissonian-Gaussian, and processed visual noise and its level function. IEEE Trans Image Process 25(9):4172–4185
22. Rota P, Sangineto E, Conotter V, Pramerdorfer C (2016) Bad teacher or unruly student: can deep learning say something in image forensics analysis? In: Proc. of the 23rd International Conference on Pattern Recognition (ICPR), pp 2503–2508
23. Stamm MC, Wu M, Liu KJR (2013) Information forensics: an overview of the first decade. IEEE Access 1:167–200
24. Valsesia D, Coluccia G, Bianchi T, Magli E (2015) Compressed fingerprint matching and camera identification via random projections. IEEE Trans Inf Forensics Secur 10(7):1472–1485
25. Wang J, Li T, Shi Y, Lian S, Ye J (2016) Forensics feature analysis in quaternion wavelet domain for distinguishing photographic images and computer graphics. Multimedia Tools and Applications (in press, DOI: 10.1007/s11042-016-4153-0)
26. Wei W, Wang S, Zhang X, Tang Z (2010) Estimation of image rotation angle using interpolation-related spectral signatures with application to blind detection of image forgery. IEEE Trans Inf Forensics Secur 5(3):507–517
27. Wu CH, Chang HH (2015) Superpixel-based image noise variance estimation with local statistical assessment. EURASIP Journal on Image and Video Processing 2015:38
28. Yao H, Wang S, Zhang X, Qin C, Wang J (2017) Detecting image splicing based on noise level inconsistency. Multimedia Tools and Applications 76(10):12457–12479
29. Zeng H, Zhan Y, Kang X, Lin X (2017) Image splicing localization using PCA-based noise level estimation. Multimedia Tools and Applications 76(4):4783–4799
30. Zoran D, Weiss Y (2009) Scale invariance and noise in natural images. In: Proc. of IEEE International Conference on Computer Vision, pp 2209–2216
Heng Yao received the B.S. degree from Hefei University of Technology, China, in 2004, the M.S. degree from Shanghai Normal University, China, in 2008, and the Ph.D. degree from Shanghai University, China, in 2012. Currently, he is with the School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, China. His research interests include digital forensics, data hiding, image processing, and pattern recognition.
Fang Cao received the B.S. degree in applied electronics from Shanghai Normal University, Shanghai, China, in 2002, the M.S. degree in signal and information processing from Shanghai Maritime University, Shanghai, China, in 2004, and the Ph.D. degree in communication and information system from Shanghai University, Shanghai, China, in 2013. Since 2005, she has been with the faculty of the College of Information Engineering, Shanghai Maritime University, where she is currently a lecturer. Her research interests include image processing, computer vision, and multimedia security.
Zhenjun Tang received the B.S. and M.Eng. degrees from Guangxi Normal University, Guilin, P.R. China, in 2003 and 2006, respectively, and the Ph.D. degree from Shanghai University, Shanghai, P.R. China, in 2010. He is now a professor with the Department of Computer Science, Guangxi Normal University. His research interests include image processing and multimedia security. He has published more than 40 papers in international journals and holds six China patents. He is a senior member of the China Computer Federation (CCF) and a reviewer for more than 20 SCI-indexed journals, such as IEEE journals, IET journals, Elsevier journals, Springer journals, and Taylor & Francis journals.
Jinwei Wang received the Ph.D. degree in information security from Nanjing University of Science & Technology in 2007 and was a visiting scholar at the Service Anticipation Multimedia Innovation (SAMI) Lab of France Telecom R&D Center (Beijing) in 2006. He worked as a senior engineer at the 28th Research Institute of CETC from 2007 to 2010, and as a visiting scholar at New Jersey Institute of Technology, NJ, USA, from 2014 to 2015. He is currently an associate professor at Nanjing University of Information Science and Technology. His research interests include multimedia copyright protection, image forensics, image encryption, and data authentication. He has published more than 30 papers and has led or participated in more than 10 research projects.
Tong Qiao received the B.S. degree in Electronic and Information Engineering in 2009 from Information Engineering University, Zhengzhou, China, the M.S. degree in Communication and Information System in 2012 from Shanghai University, Shanghai, China, and the Ph.D. degree in System Optimization and Dependability in 2016 from University of Technology of Troyes, Laboratory of System Modeling and Dependability, Troyes, France. His Ph.D. study was funded by the China Scholarship Council under the UT-INSA project. He is currently a lecturer at the School of Cyberspace, Hangzhou Dianzi University. His current research interests include steganalysis and digital image forensics.