Image and Video Source Class Identification

Alex C. Kot and Hong Cao

Abstract

Advances in digital technology have brought us numerous cheap yet good-quality imaging devices that capture visual signals in discrete form, i.e. digital images and videos. With their fast proliferation and growing popularity, a security concern arises, since electronic alteration of digital multimedia data for deceiving purposes has become incredibly easy. As an effort to restore the traditional trust in acquired media, multimedia forensics has emerged to address mainly the challenges concerning the origin and integrity of multimedia data. As one important branch, source identification designs methodologies to identify the source classes, software tools and individual source device of a given media based on its content. Through investigating and detecting source features of various forms, a large number of identification methodologies have been developed in recent years, and some have achieved very good results. In this chapter, we review the history and major developments in image and video source identification. By first establishing an overall picture of multimedia forensics, we discuss the role of source identification in relation to other forensic fields and elaborate on the existing methodologies for identification of different source types.

1. Introduction

Images and videos are traditionally labeled as "trustworthy" as they are known to be captured through analogue acquisition devices to depict real-world happenings [1]. This traditional trustworthiness is largely established upon the remarkable difficulty of modifying the multimedia content. For example, a film-based photo could only be altered by specialists through darkroom tricks, which were time-consuming, costly and often limited in the extent of alteration.

Alex C. Kot, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. Email: [email protected]
Hong Cao, Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore. Email: [email protected]


Figure 1 Image Forgery Examples (Original vs. Altered). On the Top, a Pair of People are Delicately Removed. At the Bottom, the Color Hue is Modified to Show as if the Picture was Captured in the Evening

In contrast, the digital revolution has brought us today countless powerful devices and handy electronic tools, including high-resolution multimedia acquisition devices, high-performance computers, high-speed Internet connections, high-fidelity displays, highly advanced editing tools, etc. These tools have largely replaced their analogue counterparts, enabling easy digital content creation, processing and communication at affordable cost and on a mass scale. While one enjoys the convenience of these ubiquitous tools, a serious question must be asked: "Can we still trust the images and videos acquired?" The popular Photoshop software has millions of registered users and fans worldwide, and it enables a computer-literate user to modify almost anything in a digital image in several simple steps. Similar user-friendly tools are also available for modifying video content. Most of the time, human eyes can hardly differentiate a genuine scene from a deliberately altered one. As such, increasing numbers of fraudulent cases involving multimedia forgeries have appeared, and we describe several recent impactful cases. In the first case [2], an image of the most wanted terrorist, Osama Bin Laden, was found to be a synthetic creation instead of a real photo. In the second case [3], the inaugural photo of the Israeli cabinet was digitally tampered with, with two female cabinet members being replaced. One can refer to [2, 3] for the stories behind these forgeries. More famous forgery examples can be found in [4]; these media were mostly published in world-renowned newspapers and magazines. Note that publishing forged pictures is unethical behavior. Once exposed, the forgeries would generally lead to degraded credibility of the party who published them. In view of the potential damage caused by electronic alterations of digital media in the publishing industry, the US National Press Photographers Association (NPPA) has clearly stated in its guidelines for news photographs that "altering the editorial content of a photograph, in any degree, is a breach of the ethical standards recognized by the NPPA" [1]. Figure 1 also demonstrates several forgery examples we created, shown in comparison with their original versions. These forgeries are tested to be hard to identify by human eyes when presented singly.

Besides the visual content, another fragile yet important piece of information is the multimedia source, i.e. information about the acquisition device used, its model, type, inclusive components and processing modules. When multimedia content is presented as evidence in important scenarios, e.g. in a courtroom, its source information often represents a useful forensic clue that needs to be assured in advance. For instance, knowing the source device that captured a multimedia evidence can facilitate verification or tracing of the device owner as well as the camera taker. Knowing the device model or brand information can help a forensic analyst tap into rich online resources to learn more about the acquisition device. This potentially improves the chance of detecting underlying forgeries in the content. Though the source model information of a media may be read directly from its metadata headers, e.g. from the exchangeable image file format (EXIF) header of a photograph, these headers can be easily removed or counterfeited by a forger using ubiquitous editing tools, and such header alterations are believed to be difficult to recognize and trace.
While images and videos are still popularly used as supporting evidence and historical records in a growing number of applications, from police investigation, law enforcement, journalistic reporting, surveillance, insurance claims, medical and dental examinations and the military to museum and consumer photography, scientific means of media source identification and forgery detection are in urgent need. This urgency is stimulated by several new trends: 1) the proliferation of digital cameras and other multimedia acquisition devices is fast, owing to improving photographic technology and falling costs, and digital cameras attached to mobile handsets have now become a necessary feature [5] for mobile phone users; 2) media sharing on the Internet has become increasingly easy and popular, and media sharing and social network giants such as YouTube and Facebook have gained huge success by promoting users to share their own multimedia content online; at the same time, they also provide common platforms for forged media to spread quickly and make great social and political impact; 3) a significant portion of multimedia from the public has good news value and is sourced by various news agencies for journalistic reports [6, 7]; however, one big challenge [6] is how to authenticate content from the public, an unreliable source; 4) following the human stem-cell fraud [8], which involved obvious image forgeries, scientific journal editors [9]


have been actively seeking efficient forensic tools that detect deliberate image tweaks.

Figure 2 Classification of Multimedia Forensics Techniques

This chapter focuses on the topic of image and video source identification. The remainder of this chapter is organized as follows: In Section 2, we give a brief overview of multimedia forensics and introduce the forensic role of source identification. Section 3 introduces today's common media acquisition devices and their processing skeletons. Section 4 reviews the existing source identification techniques for different source types. Section 5 discusses some common machine learning techniques used in forensics. Section 6 talks about anti-forensics and its counter-measures. Section 7 summarizes this chapter and indicates future challenges.

2. Overview of Multimedia Forensics

Active and Passive Forensics

With the ultimate goal of restoring the "trust" in digital multimedia and alerting to deliberate tampering, multimedia forensics has attracted considerable attention in the recent decade. As shown in Figure 2, the proposed techniques generally fall into two categories, the active and the passive. The active approaches commonly require pre-computation of some fragile property, e.g. a cryptographic hash of the media, or prior insertion of some secret protection data through information hiding, before the media are sent through an unreliable public channel. Upon receiving the media, forensic conclusions can be reached by comparing the recomputed media property or the extracted hidden data with their original versions. For instance, two early works actively modify the camera structure to build a "trustworthy" or "secure" digital camera that protects the integrity of the pictures. In [10], an encrypted digital signature file is separately generated for each photo in a "trustworthy camera" for content authentication purposes. In [11],


a watermark that includes iris biometric data is embedded in a "secure camera", and the extracted watermark can be used both for integrity verification and for camera-taker identification. More active approaches can be found in [12] for forensic applications such as content authentication and digital rights management for various multimedia types. A recent approach in [13] also proposes to protect the integrity of the emerging electronic-ink data through a lossless data embedding technique.
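To make the signature-based active approach concrete, the sketch below mimics a "trustworthy camera" that emits a separate signature file per photo. This is a minimal illustration, not the design of [10]: a keyed HMAC from the standard library stands in for the encrypted digital signature (a real camera would use an asymmetric key pair so that verifiers need no secret), and the key and image bytes are placeholders.

```python
import hashlib
import hmac

# Hypothetical secret key held inside the camera (illustrative assumption).
CAMERA_KEY = b"camera-private-key"

def sign_photo(photo_bytes):
    """Generate the separate signature file for a captured photo."""
    return hmac.new(CAMERA_KEY, photo_bytes, hashlib.sha256).digest()

def verify_photo(photo_bytes, signature):
    """Authenticate content: re-sign and compare with the stored file."""
    return hmac.compare_digest(sign_photo(photo_bytes), signature)

photo = b"stand-in for the raw image bytes"
sig = sign_photo(photo)                  # created at capture time
print(verify_photo(photo, sig))          # True: untouched photo passes
print(verify_photo(photo + b"!", sig))   # False: any alteration fails
```

As the last two lines show, any change to the content invalidates the pre-computed signature, which is exactly the fragile property the active approaches rely on.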

Though the active approaches can be highly effective in securing multimedia data against malicious operations and for ownership verification, the strict requirement of cooperative end-to-end protocols is hardly met in many practical forensic scenarios. Moreover, the protective data embedded through hiding techniques are considered artificial noise, which potentially degrades the media's quality. On the other hand, the passive approaches are non-intrusive in nature and do not impose the above constraints. Hence, they are readily applicable to a wide range of forensic applications. Two recent surveys can be found in [14, 15]. The common philosophy behind them is that multimedia acquired through common acquisition devices are natural, containing rich source-induced fingerprints due to the imperfections of the processing pipelines. These media occupy only a small and highly regularized portion of the entire high-dimensional space. If accurately modeled and detected, the fingerprints, or source features, can be used to trace back the multimedia source from the content. Tampering likely disturbs some intrinsic regularity, introduces new tampering anomalies and leads to many forms of inconsistency. If detected, these can be used as tell-tale signs to expose the forgery. As shown in Figure 2, passive forensics can be further divided into two sub-categories: detecting some intrinsic regularity or detecting some tampering anomalies. The former focuses on detecting unique intrinsic footprints for source identification, or on assessing their disturbances for tamper forensics. The latter detects specific anomalies left over by common tampering operations, which rarely exist in an untampered media.

Passive Forensics and Source Identification

Though content forgery detection is the primary goal in multimedia forensics, current research has encompassed a breadth of topics, listed below:

- Source identification related [17-72]: Which individual device, and what sensor type, model and brand of device, was used in capturing a given media? What media processing software was used? Is a media acquired with the device as claimed? Given a set of media, can we cluster them according to their sources?

- Tampering discovery [25, 51, 54]: Has a given media been tampered with, and where is the altered region?

- Steganalysis: Does a media conceal a secret message? What are the message length, content and embedding method used?

- Recovery of processing history: What processes has a media gone through? What are the parameters used?

- Recapturing identification: Is a given media captured from real scenery or from artificially created scenery using modern display technology?

- Computer graphics identification: Is a given media acquired using a camera device or artificially created using photorealistic computer graphics (PRCG) software?

- Device temporal forensics: When was a given media captured? How can we order a set of media according to their acquisition time?

These topics address different security aspects of multimedia data, including their origin, content integrity, innocence of secret data embedding, status of being acquired by a device versus synthetic creation, processing history, and acquisition time. Among these topics, source identification stands out as an important topic in itself. Though source identification addresses a separate issue, and other forensic topics, e.g. forgery detection, are not necessarily related to it, we find in many existing solutions that they are closely linked in the following ways:

- Source identification methodologies can be used directly to address one type of forgery, where a forger modifies the metadata tags and falsely claims a media from one source as acquired from another source;

- Multimedia forgery often involves mixing signals from multiple sources, which can lead to many source-dependent inconsistencies. These inconsistencies can be characterized either using the source features or based on the outcomes of source identification techniques applied to localized pieces of the media;

- Many statistical source features have been found effective for tamper forensics due to their fragile nature towards tampering and manipulation. Similarly, features originally proposed for other forensic topics, e.g. image quality metrics for steganalysis and high-order wavelet statistics for detection of computer graphics images, were later found to work well in source device model identification;

- Source identification can serve as an initial step for other forensic analyses, e.g. tampering discovery. For instance, a forensic analyst can first identify the source of a multimedia and then apply the corresponding source-specific template to discover local tampering.


Source identification is commonly formulated as a machine learning task, and Figure 3 shows a typical flowchart for identification of N different sources. In the training phase, labeled representative data from the N sources are provided. A set of source features is then extracted from the training data to form the corresponding set of feature vectors. As an optional step, we can learn knowledge from the training data and perform feature refinement, such as dimensionality reduction, feature selection, re-weighting or filtering, in order to retain a compact set of highly discriminative features. The learned knowledge T is also applied in the test phase to perform similar refinement. Based on the refined features and their labels, a template for each source class, or a classifier C, is constructed using a supervised machine learning technique. In the test phase, similar feature extraction and refinement are performed to obtain the refined test feature vector. The predicted source label is then obtained as the classification output of C on this vector. The source identification performance can be largely affected by several factors, such as the representativeness of the learning data, the goodness of the forensic features and the learning algorithm used. The data representativeness issue can be alleviated if sufficient training data with good content variety are provided. Good forensic features are those that are stable against media content variations and comprehensively capture the unique characteristics of the source. The desired learning algorithm not only forms a good decision boundary to classify the training data, but also generalizes well to the test data.
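The train/test flow just described can be sketched in code. The example below is a minimal, self-contained illustration rather than any specific published system: the feature vectors are synthetic, the refinement knowledge T is plain per-feature standardization, and the classifier C is a nearest-centroid template per source class.

```python
# Minimal sketch of the source identification flowchart: learn refinement
# knowledge T from training data, build one template per source class (C),
# and classify a test vector by its nearest class template.

def mean(xs):
    return sum(xs) / len(xs)

def learn_refinement(vectors):
    """Learn T: per-dimension mean and scale from the training vectors."""
    dims = list(zip(*vectors))
    mu = [mean(d) for d in dims]
    sd = [max(mean([(v - m) ** 2 for v in d]) ** 0.5, 1e-9)
          for d, m in zip(dims, mu)]
    return mu, sd

def refine(vec, T):
    """Apply the learned knowledge T (standardization) to one vector."""
    mu, sd = T
    return [(v - m) / s for v, m, s in zip(vec, mu, sd)]

def train(labeled):                      # labeled: list of (vector, label)
    T = learn_refinement([v for v, _ in labeled])
    groups = {}
    for vec, lab in labeled:
        groups.setdefault(lab, []).append(refine(vec, T))
    C = {lab: [mean(d) for d in zip(*vs)] for lab, vs in groups.items()}
    return T, C

def classify(vec, T, C):
    x = refine(vec, T)
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(C, key=lambda lab: dist(x, C[lab]))

# Synthetic two-feature vectors for two hypothetical source classes.
data = [([1.0, 10.0], "cam_A"), ([1.2, 11.0], "cam_A"),
        ([3.0, 2.0], "cam_B"), ([3.2, 2.5], "cam_B")]
T, C = train(data)
print(classify([1.1, 10.4], T, C))   # -> cam_A
```

In practice the classifier C would be a stronger learner such as an SVM, and the feature vectors would come from the source regularities discussed in the following sections.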

3. Image Acquisition Pipelines and Source Regularities

Digital still cameras (DSCs), camcorders and scanners remain the most common devices for acquiring images and videos today. Source identification research often requires substantial knowledge of these visual signal acquisition pipelines in order to identify unique and stable source features. Among the three types of devices, cameras and camcorders share very similar processing pipelines, except that cameras are optimized for capturing still pictures while camcorders are optimized for videos, which consist mainly of an encoded time sequence of picture frames. Though a scanner looks quite different from a DSC, their processing pipelines share a great deal of similarity. To illustrate this, Figure 4(a) and (b) show the imaging skeletons of a commercial single-sensor DSC and a scanner, respectively. In Figure 4(a), when light photons from a scene enter a camera, they first pass through a lens system together with several optical filters, including an anti-aliasing (blurring) filter and an infrared rejection filter. Before the photons arrive at the sensor array, the light is further filtered by an array of tiny color filters, known as a color filter array (CFA), where one color filter is placed on top of each tiny sensor. Filters of different colors are typically arranged in a mosaic manner on a 2×2 periodic lattice, and this arrangement is commonly referred to as a CFA pattern. Figure 4(c) shows some commonly known CFA patterns, among which the Bayer pattern has been dominantly used in commercial DSCs [16]. For each sensor, the amount of color-filtered photons impinging on it during the camera's shutter-opening time is transferred into a voltage that is subsequently converted into an electronic reading. Since both the widely used charge-coupled device (CCD) and complementary metal oxide semiconductor (CMOS) sensors are monochromatic, i.e. they cannot distinguish colors, color filtering enables each sensor to measure a specific color at its pixel location. After the sensor data are captured, several compensation processes, such as color calibration, linearization, dark current compensation and flare compensation, take place as pre-demosaicing processes. Since only one color is available at each pixel, demosaicing reconstructs all the missing color samples from the sensor samples in order to generate a full-color image. The reconstructed image further goes through possible processes such as color transformation, white balancing, color-artifact removal, edge enhancement, coring and JPEG compression before the end picture is available.

Figure 4 Comparison of a Single-Sensor Camera Processing Pipeline in (a), Comprising the Lens, CFA, Sensor Array, Amplifier and A/D Converter, and Software Processes (Noise Reduction, Color Correction, White Balancing, Gamma Correction, JPEG, etc.), and a Scanner Processing Pipeline in (b), Where a Movable Scanner Head with a Trilinear CFA and a Trilinear Sensor Panel Scans a Hardcopy (Document, Photo, etc.) under Reflective Lighting. (c) Common Camera CFA Patterns, Including the Primary-Color Bayer and RGBE Patterns and the Complementary-Color CMYG and SuperCCD CMY Patterns

The scanner's imaging pipeline in Figure 4(b) is similar to that of a DSC in terms of the CCD and CMOS sensors used and some software processes such as white balancing and gamma correction. The main differences are: a) a scanner typically


converts a hardcopy with more predictable content, e.g. a document, a paper photo or an artwork, into a digital image under controlled illumination, while a DSC captures a scene under various natural lightings; b) the motion system of a scanner moves its imaging head across the document in real time, and only one raster line is captured at each instant, with each line eventually measured three times at slightly different instants to cater for the three different colors; c) a scanner employs a trilinear CFA, while camera processing often uses CFA patterns whose colors are placed in a mosaic manner, such as the Bayer CFA. Besides the mainstream skeletons in Figure 4(a) and (b), some exceptions also exist. For instance, instead of relying on color filtering, some imaging devices use light splitters, e.g. prismatic optics for a three-sensor camera or dichroic mirrors for a scanner, to separate and guide the light of different colors onto dedicated sensors. Though these designs potentially lead to better image fidelity, the additional optics and electronics substantially increase the device complexity, which is undesired. Though similar imaging skeletons are shared among the majority of commercial imaging devices, their differences shall not be overlooked, owing to the different hardware components and proprietary software processing used. These differences contribute to the unique traces left in the acquired visual signals. Previous forensic works have attempted to detect different types of source features, and we classify them into the four categories below according to the processing skeletons in Figure 4.

- Optical component regularities, including lens radial distortion [17] and lateral chromatic aberration [18];

- Sensor imperfections, including the sensor noise pattern [19-43] and statistics [44-48] and the dust pattern [49, 50], which are introduced when the light signal is converted into a digital signal;

- Processing regularities, including demosaicing [48, 52-60], JPEG compression, etc., which are introduced by the software processes;

- Other statistical regularities, including high-order wavelet statistics [61-63], color features [61, 62], binary similarity metrics [63], image quality metrics [62, 63], statistical moments [64], etc.

The different regularities are associated with different origins, and their detection is useful in various source identification tasks. For instance, detection of the lens regularity helps identify the source camera based on its lens. The sensor noise pattern and dust pattern are features associated with the sensor chip, and detection of such patterns is useful for identifying individual devices. Processing regularities are associated with the software processing unit. Their detection helps reverse-engineer the digital technologies in the black box and estimate their processing parameters. Very often, the estimated parameters can also be used as source features to model a class of imaging devices sharing similar processing pipelines, e.g. cameras and scanners of the same model or brand.
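As a toy illustration of the sensor-pattern idea, and a drastic simplification of the noise-pattern methods in [19-43], the sketch below averages denoising residuals of several 1-D "images" from the same device to estimate its fixed noise fingerprint, then attributes a query signal to the device whose fingerprint correlates best with the query's own residual. The moving-average denoiser, signal sizes and noise magnitudes are all illustrative assumptions.

```python
import math
import random

random.seed(0)
N = 64  # length of each toy 1-D "image"

def denoise(img):
    """3-tap moving-average 'denoiser' (illustrative stand-in)."""
    return [(img[max(i - 1, 0)] + img[i] + img[min(i + 1, N - 1)]) / 3.0
            for i in range(N)]

def residual(img):
    """Noise residual: image minus its denoised version."""
    return [a - b for a, b in zip(img, denoise(img))]

def fingerprint(images):
    """Average the residuals of several images from one device."""
    res = [residual(im) for im in images]
    return [sum(col) / len(col) for col in zip(*res)]

def corr(a, b):
    """Normalized correlation coefficient between two signals."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a)
           * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

# Each device stamps its own fixed, pixel-wise noise pattern on every shot.
pattern_A = [random.uniform(-3, 3) for _ in range(N)]
pattern_B = [random.uniform(-3, 3) for _ in range(N)]

def shoot(pattern):
    """A smooth synthetic 'scene' plus the device's fixed pattern."""
    phase, amp = random.uniform(0, 2 * math.pi), random.uniform(30, 60)
    scene = [128 + amp * math.sin(2 * math.pi * i / 32 + phase)
             for i in range(N)]
    return [s + p for s, p in zip(scene, pattern)]

fp_A = fingerprint([shoot(pattern_A) for _ in range(20)])
fp_B = fingerprint([shoot(pattern_B) for _ in range(20)])

query = shoot(pattern_A)
r = residual(query)
print(corr(r, fp_A) > corr(r, fp_B))   # query is attributed to device A
```

Real methods use 2-D images, wavelet-based denoising and carefully normalized correlation statistics, but the attribution logic is the same: the device-specific pattern survives averaging while scene content does not.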


4. Multimedia Source Identification

Source identification is the science of identifying the acquisition device, or certain of its attributes, e.g. the device model or software tools, for a given media. Depending on the outcome, we categorize source identification into source class identification, software tool identification and individual source device identification. In the following sections, we review the techniques for each category.

Source Class Identification

Since media acquisition devices of the same class, e.g. the same model or brand, often share similar software processing modules and hardware components, different source models can be identified by extracting features that are intrinsic and unique to each device class. The commonly used features are the processing regularities associated with some software modules, e.g. demosaicing regularity and color features, or the regularities, e.g. noise statistics, image quality metrics and lens distortions, which are associated with attributes of the hardware, e.g. the quality of the sensor and the distortion characteristics of the lens system.

Lens Distortion Features

Lens radial distortion (LRD) is purported to be the most severe among the different types of lens distortions, such as spherical aberration, astigmatism, field curvature and chromatic aberration. LRD is particularly visible for inexpensive wide-angle lenses and manifests itself by rendering straight lines as curved lines in the acquired visual signal. The direction of LRD is along the radial axis, and its magnitude is a function of the distance between the object and the lens's central axis. Based on a second-order LRD model, Choi et al. [17] proposed to compute the distortion parameters by measuring the distortions on extracted straight-line edge segments. By using the two estimated parameters as features together with Kharrazi's features in [61], an accuracy of 89.1% was achieved in identifying 5 camera models at a fixed optical zoom. However, the accuracy of using the distortion parameters alone can be severely affected if the amount of lens optical zoom is varied.
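A common second-order radial distortion parameterization (chosen here for illustration; [17] estimates such parameters from the curved renderings of straight edges) can be sketched as follows, showing how points on a straight line cease to be collinear after barrel distortion:

```python
# Second-order radial distortion: a point at undistorted radius r maps to
# r * (1 + k1*r^2 + k2*r^4) about the optical center (cx, cy). The values of
# k1, k2 and the sample points below are illustrative.

def distort(x, y, k1, k2, cx=0.0, cy=0.0):
    """Map an undistorted point to its radially distorted position."""
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return cx + dx * scale, cy + dy * scale

# Three points on the straight vertical line x = 0.5 (normalized coords).
p_top = distort(0.5, 0.8, k1=-0.2, k2=0.0)   # barrel distortion: k1 < 0
p_mid = distort(0.5, 0.0, k1=-0.2, k2=0.0)
p_bot = distort(0.5, -0.8, k1=-0.2, k2=0.0)

# After distortion the points are no longer collinear: the middle point
# deviates from the chord joining the two endpoints.
chord_mid_x = (p_top[0] + p_bot[0]) / 2.0
print(abs(p_mid[0] - chord_mid_x) > 1e-3)   # -> True
```

Fitting this model to the observed curvature of edge segments yields estimates of k1 and k2, which can then serve as source features.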

Lateral chromatic aberration (LCA) arises from the varying refractive indexes of lens materials at different light wavelengths, so that light of different colors cannot be perfectly focused. This causes misalignment between the color channels, which is small near the optical center but becomes larger at locations farther away from the optical center. Based on a linear LCA model, Van et al.


[18] estimate the global model parameters, e.g. the coordinates of the optical center and a scalar ratio, using an image registration technique that maximizes the mutual entropy between different color channels. By using these parameters as features for SVM classification, an identification accuracy of 92.2% was achieved for three different mobile phone models. The work also provided some results indicating that the LCA features cannot distinguish two individual mobile cameras of the same model.
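The linear LCA model can be illustrated as below; the optical center and scalar ratio are hypothetical values of the kind the registration step would estimate:

```python
# Linear LCA model: the red (or blue) channel is a global scaling of the
# green channel about the optical center (x0, y0) with scalar ratio alpha.
# The induced channel misalignment grows with distance from the center.

def lca_shift(x, y, x0, y0, alpha):
    """Where a green-channel point (x, y) lands in the red channel."""
    return x0 + alpha * (x - x0), y0 + alpha * (y - y0)

x0, y0, alpha = 320.0, 240.0, 1.002   # hypothetical center and ratio

near = lca_shift(330.0, 245.0, x0, y0, alpha)   # close to the center
far = lca_shift(600.0, 450.0, x0, y0, alpha)    # near the image corner

mis_near = ((near[0] - 330.0) ** 2 + (near[1] - 245.0) ** 2) ** 0.5
mis_far = ((far[0] - 600.0) ** 2 + (far[1] - 450.0) ** 2) ** 0.5
print(mis_near < mis_far)   # -> True: aberration grows away from center
```

Registering the channels to undo this scaling recovers (x0, y0) and alpha, the parameters used as classification features in [18].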

Demosaicing Regularity

Demosaicing regularity represents the most significant processing difference among DSC models, as the choice of CFA and demosaicing algorithm is usually fixed for a given model but likely differs across models. Moreover, two-thirds of the color samples of a photo are interpolated with a demosaicing algorithm consisting of only a few formulas. As a key process that determines image fidelity, demosaicing has been extensively researched in the past few decades to reconstruct full-color images from the raw sensor data. Based on the common assumption that the Bayer CFA is adopted, existing demosaicing solutions differ largely in terms of their edge adaptiveness, the grouping strategy of the adaptive methods, the reconstruction domain, the low-pass filter type, size and parameters, and the follow-up enhancement and refinement procedures. Table 1 summarizes some of these differences for 14 representative demosaicing algorithms published in the past few decades. These large differences enable each demosaicing algorithm to introduce some unique and persistent correlation throughout the output image, whose detection is desirable for multimedia forensics. The early detection works use some simple pixel-level interpolation models within each color channel. Generally, a demosaiced sample at location (x, y) of a color channel c can be expressed as

I(x, y, c) = Σ_{(i,j)∈Ω} w_ij I(x+i, y+j, c) + ε_xyc        (1)

where the w_ij are the weights representing the correlation characteristics induced by the demosaicing process, ε_xyc denotes an interpolation error, and Ω represents a predefined supporting neighborhood, which can be chosen accordingly. Figure 5 shows three interpolation models with different supporting neighborhoods. Based on the model in Figure 5(a), Popescu and Farid [51] proposed an expectation maximization (EM) algorithm for estimating a set of 24 interpolation weights by iteratively optimizing a posterior probability map and computing the variance of the interpolation errors. The probability map contains the estimated probability of each sample being a demosaiced sample, and the weights' solution is derived to minimize the expected interpolation errors. The iterative EM cycle stops when a stable solution of the weights is reached.

Table 1 Comparison of Fourteen Representative Demosaicing Algorithms

Algorithm     Edge Adaptiveness  Domain (Refinement)            Reconstructive Filter (Size)
Bilinear      Non-adaptive       Intra-channel                  Fixed (3×3)
Bicubic       Non-adaptive       Intra-channel                  Fixed (7×7)
Cok 87        Non-adaptive       Color hue                      Fixed (3×3)
Freeman 88    Non-adaptive       Color diff.                    Median (3×3)
Laroche 94    Adaptive           Mixed                          Fixed (3×3)
Hamilton 97   Adaptive           Color diff.                    Fixed (3×3)
Kimmel 99     Adaptive           Color hue (Inverse diffusion)  Adaptive (3×3)
Li 01         Non-adaptive       Color diff.                    Adaptive
Gunturk 02    Non-adaptive       Color diff. (Iterative)        Fixed (5×5)
Pei 03        Adaptive           Color diff. (Iterative)        Adaptive (3×3)
Wu 04         Adaptive           Color diff. (Iterative)        Fixed (3×3)
Chang 04      Adaptive           Color diff. (Iterative)        Adaptive (5×5)
Hirokawa 05   Adaptive           Color diff. (Iterative)        Median (5×5)
Alleysson 05  Non-adaptive       Chrominance                    Fixed (11×11)

Figure 5 Intra-Channel Correlation Models Adopted for Detection of Demosaicing Regularities, where "●" denotes a demosaiced sample and "○" indicates a neighboring support sample. (a) Popescu and Farid's model [51]; (b) Long and Huang's model [54]; (c) Swaminathan, Wu and Liu's model [55] for the green channel with the Bayer CFA assumed

By using these weights as features for a linear discriminant classifier, an average accuracy of 97% was achieved in distinguishing 28 pairs of demosaicing algorithms, where the minimal pairwise accuracy is 88%. Bayram et al. [52, 53, 56] extended this EM algorithm to identify commercial DSCs by assuming a larger 5×5 interpolation window, and analyzed patterns of periodicity in row-averaged second-order derivatives in the non-smooth and smooth image regions, respectively. The filter weights and the periodicity properties are employed as features in constructing a support vector machine (SVM) classifier. An average accuracy of 95.9% was achieved in identifying three


camera models. Long et al. [54] proposed to compute normalized 13×13 pixel correlation coefficients based on the interpolation model in Figure 5(b) for each of the three color channels. These coefficients are used as features to train a neural network classifier to identify different source camera models. Using majority-vote fusion of the decision outcomes from the three color channels, close to 100% accuracy was reported for identifying 4 commercial cameras of different models, in the presence of 0-5% rejected or undetermined cases. This work also shows that post-processes such as median filtering can severely affect the detected correlation as well as the identification performance. For non-intrusive component analysis, Swaminathan et al. [55] introduced several new procedures in demosaicing detection: a) estimation of the underlying CFA; b) division of the image into three regions comprising horizontal gradients, vertical gradients and the smooth region; c) separate estimation of the underlying demosaicing formulas for each region. Note that in the green-channel interpolation model used in Figure 5(c), each symbol "○" denotes one sensor sample in the 7×7 neighborhood of a demosaiced sample "●". These sensor samples become identifiable among the demosaiced samples after the underlying CFA is estimated. The interpolation models for the red and blue channels share a similar concept, except that the sensor samples are located at different sites. With the assumption that a similar interpolation formula is applied in each region within each color channel, a total least square solution is employed to compute the corresponding weights as a representation of the underlying demosaicing formulas. Experimentally, a total of 441 such parameters were computed as features in identifying 19 cameras of different models. With 200 images per camera, these features achieved an average identification accuracy of 86% using SVM.
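The weight-estimation step common to these methods can be illustrated with a deliberately simplified 1-D, two-tap version of model (1), solved by ordinary least squares (the actual works use 2-D neighborhoods, and [55] uses total least squares):

```python
# Simplified sketch: samples interpolated with hidden weights are regressed
# on their neighboring sensor samples; the recovered weights then serve as
# source features. All sample values below are synthetic.

def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [w1, w2] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - c * e) / det

# Sensor samples and samples interpolated with (hidden) weights 0.25/0.75.
sensor = [10.0, 14.0, 6.0, 9.0, 13.0, 5.0, 11.0]
w_true = (0.25, 0.75)
interp = [w_true[0] * sensor[i] + w_true[1] * sensor[i + 1]
          for i in range(len(sensor) - 1)]

# Normal equations for least squares: (X^T X) w = X^T y
X = [(sensor[i], sensor[i + 1]) for i in range(len(sensor) - 1)]
y = interp
a = sum(l * l for l, _ in X)
b = sum(l * r for l, r in X)
d = sum(r * r for _, r in X)
e = sum(l * t for (l, _), t in zip(X, y))
f = sum(r * t for (_, r), t in zip(X, y))
w1, w2 = solve_2x2(a, b, b, d, e, f)
print(round(w1, 3), round(w2, 3))   # -> 0.25 0.75
```

Because different demosaicing algorithms imply different hidden weights, the recovered weight vector discriminates between source classes in the same way the 2-D estimates do in the works above.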
Though some reasonably good results are reported for the above methods, the intra-channel interpolation models are limited to characterizing the correlation within a single color channel. On the other hand, demosaicing algorithms, e.g. those in Table 1, often employ cross-color domains, e.g. the color difference plane, for reconstructing the missing samples. This inevitably introduces cross correlation between the red, green and blue channels. Moreover, the criterion for dividing the demosaiced samples into different edge groups and the corresponding reconstruction formulas can vary greatly from one algorithm to another. To comprehensively capture these demosaicing differences, Cao and Kot [57-60] proposed partial derivative correlation models for demosaicing detection. With the Bayer CFA assumed as the underlying CFA, a demosaiced image I and its sensor data S can be written as

$$I = [\,I_{xy}^{c}\,] = \begin{bmatrix} (r,G,B)_{11} & (R,g,B)_{12} & \cdots \\ (R,g,B)_{21} & (R,G,b)_{xy} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}, \qquad S = [\,S_{xy}\,] = \begin{bmatrix} r_{11} & g_{12} & \cdots \\ g_{21} & b_{xy} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$$


where R, G and B denote demosaiced samples in the red, green and blue channels, respectively, and r, g and b denote the sensor samples, which are known before the demosaicing process; $I_{xy}^{c} = I(x,y,c)$ and $S_{xy} = S(x,y)$. For each demosaiced sample $I_{xy}^{c}$, a derivative-based demosaicing model along the $t$ axis is written as

$$I_{xy}^{c(t)} = \sum_{(i,j)\in\Omega} w_{ij}^{(t)}\, S_{x+i,\,y+j}^{(t)} + \varepsilon_{xy}^{c} \qquad (2)$$

where $I_{xy}^{c(t)}$ is the partial second-order derivative of $I_{xy}^{c}$ along the $t$ axis, $S_{xy}^{(t)}$ is the corresponding derivative of $S_{xy}$, and $\{w_{ij}^{(t)}\}$ are the derivative weights. Supposing that $t$ represents the x-axis, $I_{xy}^{c(t)}$ and $S_{xy}^{(t)}$ can be written as

$$I_{xy}^{c(t)} = I_{x-1,y}^{c} + I_{x+1,y}^{c} - 2I_{xy}^{c},$$
$$S_{xy}^{(t)} = S_{x-2,y} + S_{x+2,y} - 2S_{xy}.$$
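Assuming the x axis runs along array rows, the second-order derivative quantities defined above can be computed with simple array shifts. A sketch in NumPy (the step-2 sensor derivative reflects the 2×2 CFA periodicity):

```python
import numpy as np

def second_derivative_x(I):
    """Partial second-order derivative of a demosaiced channel along x:
    I''_{x,y} = I_{x-1,y} + I_{x+1,y} - 2*I_{x,y} (borders left as zero)."""
    d = np.zeros_like(I, dtype=float)
    d[1:-1, :] = I[:-2, :] + I[2:, :] - 2.0 * I[1:-1, :]
    return d

def sensor_second_derivative_x(S):
    """Sensor-lattice derivative with step 2, matching the 2x2 CFA
    periodicity: S''_{x,y} = S_{x-2,y} + S_{x+2,y} - 2*S_{x,y}."""
    d = np.zeros_like(S, dtype=float)
    d[2:-2, :] = S[:-4, :] + S[4:, :] - 2.0 * S[2:-2, :]
    return d
```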

This derivative formulation makes use of the 2×2 periodicity of the CFA lattice, so that reconstruction of a missing sample along the t axis can be shown to be equivalent to estimating its partial second-order derivative. The main advantage of using derivatives is that such an interpolation model largely suppresses the detection variations caused by differing content, since the scenery-dependent local DC components are removed in the derivatives. Also, since the support neighborhood Ω contains sensor derivatives from all three color channels, it naturally extends the detection across the boundaries of the three color channels. Based on the derivative models, the detection algorithm uses an expectation-maximization reverse classification to iteratively divide all demosaiced samples into 16 categories by minimizing the model fitting errors. This accounts for the different demosaicing grouping strategies that are commonly used. The derivative weights for each category are then estimated using a least square solution. A comparison demonstrated that demosaicing formulas expressed as derivative weights can re-interpolate the demosaiced samples from the sensor samples with less error than intensity weights. Correspondingly, demosaicing features including derivative weights, error cumulants and sample distributions are suggested for source identification. The results show that 250 features selected using sequential floating forward selection (SFFS) achieved a remarkable accuracy of 97.5% in identifying 14 commercial camera models based on small cropped blocks of 512×512. Beyond high-quality DSC images, Cao and Kot [59] also extended these features to identify lower-quality images acquired by mobile devices.
Using eigenfeature regularization and extraction [79], a reduced set of 20 features achieved 99% accuracy in identifying 9 mobile cameras of dissimilar models, outperforming several other statistical forensic features recommended for mobile camera identification. One limitation is that the identification performance can be affected by lossy JPEG compression at low quality factors; moreover, close camera models may implement identical software modules, making them hard to distinguish through processing regularities.
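The EM-style reverse classification described above can be sketched generically as alternating between per-category least-squares weight fits and reassignment of each sample to its best-fitting category. This is a simplified, hypothetical sketch using 4 rather than 16 categories and plain (not total) least squares:

```python
import numpy as np

def reverse_classify(A, b, n_cat=4, n_iter=10, seed=0):
    """Alternate between (1) fitting linear weights per category by least
    squares and (2) reassigning each sample (row of A, target in b) to the
    category whose weights yield the smallest squared fitting error."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_cat, size=len(b))
    W = np.zeros((n_cat, A.shape[1]))
    for _ in range(n_iter):
        for c in range(n_cat):
            idx = labels == c
            if idx.sum() >= A.shape[1]:  # enough samples for a stable fit
                W[c], *_ = np.linalg.lstsq(A[idx], b[idx], rcond=None)
        errs = (A @ W.T - b[:, None]) ** 2  # per-sample, per-category error
        labels = errs.argmin(axis=1)
    return W, labels
```

For demosaicing detection, each row of A would hold the sensor derivatives in the support neighborhood of one demosaiced sample and b its second-order derivative.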


Other Statistical Features

Other statistical features for source class identification include color features, high-order wavelet statistics, image quality metrics, binary similarity metrics, noise statistics, statistical moments, etc. For camera model identification, Kharrazi et al. [61] proposed 34 features including color features, image quality metrics and wavelet coefficient statistics. With an SVM classifier, these features yield an accuracy of 88% in identifying 5 camera models, 3 of which are from the Canon brand. Celiktutan et al. [63] combined three sets of features, comprising binary similarity measures, image quality metrics and wavelet coefficient statistics, for identification of cell-phone camera models. With feature-level fusion and SFFS feature selection, an average identification accuracy of 95.1% was reported for 16 cell-phone models using SVM. This work also tested score-level fusion strategies with different rules, where a maximal accuracy of 97.5% was reported based on the product rule. Xu et al. [64] proposed a set of 390 statistical moments to characterize the image spatial representation, JPEG representation and pixel co-occurrences. Using SVM, these features achieved 97.6% accuracy in identifying 6 camera models from 4 brands when each model is represented by one camera. But in the case that each model is represented by over 150,000 Flickr images from 30 to 40 cameras, the reported accuracy drops to 85.9% for identifying 8 models from 4 brands; the corresponding brand identification accuracy is 96.3%. Gloe et al. [65] combined color features [61], image quality metrics [63] and high-order wavelet statistics [63] to study several camera model identification issues, such as intra-model camera similarity, the influence of the number of devices per model and images per device in learning, and the influence of image size.
Based on 44 cameras from 12 models and using SVM classification, the results show that the overall model identification accuracy improves significantly when a larger percentage of images is used for training. In contrast, increasing the number of devices per model contributes only slightly toward better performance. The results also show that cropping all images to a smaller size of 2560×1920 would considerably degrade the overall model identification accuracy from 95.7% to 89.7%. For scanner model identification, Gou et al. [44, 45] proposed a set of 60 noise statistics features. These features characterize the noise statistics under different denoising algorithms, the distribution of wavelet coefficients and the statistics of neighborhood prediction errors. Based on 14 scanners from 11 models, about 94% model identification accuracy was achieved using an SVM classifier. Khanna and Delp [66] proposed to compute a set of gray-level co-occurrences and difference histograms at character level and at block level to identify source scanner models from scanned text documents. Based on 5 scanner models, these features achieved more than 95% identification accuracy for a cropped image size of 512×512. In [67], the same authors also demonstrated that the block-level features show good robustness towards different font sizes and font shapes.
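The flavor of such noise-statistics features can be illustrated with a minimal sketch: a denoising residual is formed with a single local mean filter and summarized by its first four moments. The actual feature set in [44, 45] uses several denoising filters plus wavelet-domain and prediction-error statistics; this reduced version is an assumption for illustration only.

```python
import numpy as np

def noise_statistics(img, k=3):
    """Residual = image - k-by-k local mean; features = mean, std,
    skewness and kurtosis of the residual (a tiny illustrative subset)."""
    img = img.astype(float)
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    # local mean by summing shifted copies (clear, if not the fastest way)
    smooth = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            smooth += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    smooth /= k * k
    r = (img - smooth).ravel()
    mu, sd = r.mean(), r.std() + 1e-12  # epsilon avoids divide-by-zero
    z = (r - mu) / sd
    return np.array([mu, sd, (z ** 3).mean(), (z ** 4).mean()])
```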


For device class identification, McKay et al. [48] combined the noise statistics features in [44] with color interpolation coefficients [55] to identify different source classes, including computer graphics, DSCs, mobile cameras and scanners, as well as their source models. An average accuracy of 93.8% was achieved for classification of the four types of devices, 97.7% for five cell-phone camera models, 96.2% for four scanner models and 94.3% for four camera models. Fang et al. [68] suggested combining 72 wavelet coefficient features and 12 noise moments to distinguish DSLR cameras from compact cameras. With 8 DSLR cameras and 12 compact cameras, a classification accuracy of 94.5% was achieved for an image size of 1024×1024. The results also show that the accuracy can drop to about 84% for a small image size of 256×256. Khanna et al. [46] proposed to compute statistics of the row and column correlation scores from noise residuals as features to distinguish scanned images from camera-captured images. These features are designed to capture the differences between the 1-D sensor noise pattern of scanners and the 2-D pattern of DSCs. Experimentally, 17 features achieved 98.6% accuracy in distinguishing images from 3 scanner models and 3 cameras using SVM. In a more practical test where images from 2 cameras and 2 scanners were used for training and images from the remaining camera and scanner for testing, the identification accuracy drops to 93.7%.
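The row-correlation idea in [46] can be illustrated as follows. This is a hypothetical simplification of the actual feature set: each residual row is correlated with the mean row, so a scanner's repeated 1-D noise pattern yields scores near 1 while a camera's 2-D noise yields scores near 0.

```python
import numpy as np

def row_correlation_score(residual):
    """Average normalized correlation of each noise-residual row with the
    mean row.  A scanner's linear sensor repeats the same noise pattern
    down the page, so rows correlate strongly with their mean."""
    residual = residual - residual.mean(axis=1, keepdims=True)
    template = residual.mean(axis=0)
    template = template - template.mean()
    scores = []
    for row in residual:
        denom = np.linalg.norm(row) * np.linalg.norm(template) + 1e-12
        scores.append(float(row @ template) / denom)
    return float(np.mean(scores))
```

The same score computed column-wise (on the transposed residual) gives the column statistics; contrasting the two captures the 1-D versus 2-D structure.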

Software Tools Identification

There is also interest in identifying commercial software tools such as RAW converters and source encoders. Digital single lens reflex (DSLR) cameras are relatively high-quality DSCs that allow users to save the sensor data in proprietary RAW formats. RAW converters, also known as the electronic darkroom, can then be used to develop these RAWs into full-color photos on a personal computer (PC). These RAW converters perform a similar function to the camera's internal software processing unit. However, since they are designed for PCs, sophisticated demosaicing and post-processing techniques that require substantial computing resources can be implemented. Moreover, they allow users to experiment with the darkroom settings to achieve desired effects. Cao and Kot [58] proposed to use demosaicing features similar to those in [59, 60] for classification of ten commercial RAW converters, such as ACDSee, Photoshop CS and PaintShop Pro, based on cropped 512×512 blocks. For each converter, four types of cropped blocks were differentiated depending on the cropping locations in the 2×2 periodic lattice. For a total of 40 RAW-converter classes, the reduced set of 36 demosaicing features achieved a remarkably low identification error rate of 1.4%.
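The 40 classes arise as 10 converters times 4 cropping phases of the 2×2 CFA lattice; a tiny, purely hypothetical labeling scheme makes the counting concrete:

```python
def converter_class(converter_id, x0, y0):
    """Hypothetical class label: converter_id in 0..9 and the cropping
    phase of the crop's top-left corner (x0, y0) on the 2x2 CFA lattice
    give 10 * 4 = 40 distinct classes."""
    phase = (y0 % 2) * 2 + (x0 % 2)  # one of the 4 lattice phases
    return converter_id * 4 + phase
```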


Figure 5 Different Types of Sensor Noises

Images and videos are often lossy-encoded to reduce their size before they are transmitted or stored. Image encoders can adopt different types of coding techniques, such as transform coding, sub-band coding and differential image coding. These coding techniques can be further divided based on the type of transformation used. Lin et al. [69] developed a set of techniques that determine the source encoder based on intrinsic fingerprints in the output images. The algorithm starts by testing whether an image has gone through block-based processing, followed by estimation of the block size. The proposed technique makes use of the fact that block-based processing often introduces periodic discontinuities at the block boundaries. After this test, the work developed separate intrinsic fingerprint analyses and similarity measures for testing transform coding, sub-band coding and differential image coding, respectively. Experimentally, it achieved over 90% detection rate in identifying five types of lossy encoders when the peak signal to noise ratio (PSNR)
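The block-size estimation step can be sketched by scoring candidate periods on the column-averaged horizontal differences. This is a minimal sketch under the assumption that block boundaries show stronger discontinuities than block interiors; it is not the exact procedure of [69].

```python
import numpy as np

def estimate_block_size(img, max_size=32):
    """Score each candidate period p by how much stronger the mean
    absolute horizontal difference is on the p-grid boundaries than off
    them; the true block width maximizes this contrast."""
    img = img.astype(float)
    col_diff = np.abs(np.diff(img, axis=1)).mean(axis=0)  # boundary strengths
    best, best_score = None, -np.inf
    for p in range(2, max_size + 1):
        grid = np.arange(p - 1, len(col_diff), p)  # boundaries between blocks
        on_grid = col_diff[grid]
        off_grid = np.delete(col_diff, grid)
        if on_grid.size == 0 or off_grid.size == 0:
            continue
        score = on_grid.mean() - off_grid.mean()
        if score > best_score:
            best, best_score = p, score
    return best
```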