Retrieval of Bitmap Compression History

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8 No. 8, 2010

Salma Hamdy, Haytham El-Messiry, Mohamed Roushdy, Essam Kahlifa
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
{s.hamdy, hmessiry, mroushdy, esskhalifa}@cis.asu.edu.eg

Abstract—The histogram of Discrete Cosine Transform (DCT) coefficients contains information on the compression parameters of JPEGs and of previously JPEG-compressed bitmaps. In this paper we extend the work in [1] to identify previously compressed bitmaps and to estimate the quantization table that was used for compression from the peaks of the histogram of DCT coefficients. This helps establish a bitmap's compression history, which is particularly useful in applications such as image authentication, JPEG artifact removal, and JPEG recompression with less distortion. Furthermore, the estimated table is used to compute distortion measures that classify the bitmap as genuine or forged. The method shows good average estimation accuracy of around 92.88%, compared against MLE and autocorrelation methods. In addition, because bitmaps do not experience data loss, detecting inconsistencies becomes easier. Detection performance resulted in average false negative rates of 3.81% and 2.26% for the two distortion measures, respectively.

Keywords: digital image forensics; forgery detection; compression history; quantization tables.

I. INTRODUCTION

Although JPEG is the most widely used image format, images are sometimes saved in an uncompressed raster form (bmp, tiff), and in most situations no knowledge of previous processing is available. Some applications receive images as bitmaps with instructions for rendering at a particular size and no further information. The image may have been processed, perhaps compressed, and may contain severe compression artifacts. Hence, it is useful to determine the bitmap's history: whether the image has ever been compressed using the JPEG standard, and what quantization tables were used. Most artifact removal algorithms [2-9] require knowledge of the quantization table to estimate the amount of distortion caused by quantization and to avoid over-blurring. In other applications, knowing the quantization table can help avoid further distortion when recompressing the image. Some methods identify bitmap compression history using Maximum Likelihood Estimation (MLE) [10-11], by modeling the distribution of quantized DCT coefficients, e.g., with Benford's law [12], or by modeling acquisition devices [13]. Furthermore, owing to the nature of digital media and advanced digital image processing techniques, digital images can be altered and redistributed very easily, forming a rising threat in the public domain.

Hence, ensuring that media content is credible and has not been altered is becoming an important issue in governmental, security, and commercial applications. As a result, research is being conducted into authentication methods and tamper detection techniques. JPEG compression usually introduces blocking artifacts, and one standard passive approach is to use inconsistencies in these blocking fingerprints as a reliable indicator of possible tampering [14]. These can also be used to determine what method of forgery was used. In this paper we are interested in the authenticity of the image. We extend the work in [1] to bitmaps and use the proposed method to identify previously compressed bitmaps and estimate the quantization table that was used. The estimated table is then used to decide whether the image was forged by calculating distortion measures. In Section II we study the histogram of DCT AC coefficients of bitmaps and show how it differs for previously JPEG-compressed bitmaps. We then validate that, without modeling rounding errors or calculating prior probabilities, the quantization steps of previously compressed bitmaps can still be determined straightforwardly from the peaks of the approximated histograms of DCT coefficients. Results are discussed in Section III; Section IV concludes.

II. HISTOGRAM OF DCT COEFFICIENTS IN BITMAPS

We studied in [1] the histogram of quantized DCT coefficients and showed how it can be used to estimate quantization steps. Here, we study uncompressed images and validate that the approximated histogram of DCT coefficients can be used to determine compression history. A bitmap image means no data loss, and hence all that is required to build an informative histogram is expected to be present in the coefficient histograms. The first step is to decide whether the test image was previously compressed, because if the image is an original uncompressed one there is no compression data to extract. Once the image is determined to have a compression history, the next step is to estimate that history. For a grayscale image, compression history mainly means its quantization table, which is the focus of this paper. For a color image, this extends to estimating the compression parameters of the color planes, including subsampling and the associated interpolation.
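To make the construction concrete, the following minimal sketch (our illustration, not the authors' code; the helper name and the SciPy dependency are assumptions) collects the (i, j) DCT coefficient over all 8×8 blocks of a grayscale bitmap and bins the rounded values into the approximated histogram H*(i, j):

import numpy as np
from scipy.fft import dctn

def dct_coeff_histogram(img, i, j):
    """Histogram of the (i, j) DCT coefficient over all 8x8 blocks."""
    h, w = img.shape
    img = img[:h - h % 8, :w - w % 8].astype(np.float64) - 128.0  # level shift
    coeffs = []
    for r in range(0, img.shape[0], 8):
        for c in range(0, img.shape[1], 8):
            block = dctn(img[r:r + 8, c:c + 8], norm='ortho')
            coeffs.append(block[i, j])
    coeffs = np.round(coeffs).astype(int)   # approximated (rounded) coefficients
    lo = coeffs.min()
    hist = np.bincount(coeffs - lo)         # counts for values lo..max
    return hist, np.arange(lo, lo + len(hist))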


Fig. 1. Histograms of X*(3,3): (a) Lena image; (b) uncompressed; (c) JPEG compressed, Q(3,3) = 6; (d) previously compressed bmp.

Fig. 2. (a) |X*(3,3)|, where Hmax occurs at Q(3,3) = 6; (b) |X*(3,4)|, where Hmax occurs at Q(3,4) = 10; (c) |X*(5,4)|, where Hmax occurs at Q(5,4) = 22; (d) |X*(7,5)|, where Hmax occurs at Q(7,5) = 41.

Fig. 1(b) shows the approximated histogram H* of the DCT coefficient at position (3,3) of the luminance channel of an uncompressed Lena image, and Fig. 1(c) the histogram of the image after JPEG compression with quality factor 80. Clearly the latter contains periodic patterns that are not present in the uncompressed version; the coefficient is very likely to have been quantized with a step equal to the period of this pattern [15]. Now, if that JPEG is stored in an uncompressed bitmap form, we expect the DCT coefficients to show the same behavior, because nothing is lost during this format change. This is evident in Fig. 1(d), which shows a histogram identical to the one in Fig. 1(c). Hence, similar to the argument in [1], if we closely observe the histogram H*(i,j) outside the main lobe, we notice that the maximum peak occurs at a value equal to the quantization step used to quantize Xq(i,j). This observation applies to most low-frequency AC coefficients. Fig. 2(a) and (b) show |H|, the absolute histograms of DCT coefficients for the Lena image of Fig. 1(a) at frequencies (3,3) and (3,4), respectively. As for high frequencies, the maximum occurred at a value matching Q(i,j) when |X*(i,j)| > B (Fig. 2(c) and (d)), where B(i,j) is given by:

$$\left|X^{*}(i,j)-X_{q}(i,j)\right|\;\le\; B(i,j)\;=\;\Gamma\sum_{u,v}\left|\,0.5\,c(u)\,c(v)\,\cos\frac{(2u+1)i\pi}{16}\,\cos\frac{(2v+1)j\pi}{16}\right| \qquad (1)$$

where Xq(i,j) is the quantized coefficient, X*(i,j) is the approximated quantized coefficient, Γ is the round-off error, and

$$c(\nu)=\begin{cases}1/\sqrt{2} & \text{for } \nu=0\\[2pt] 1 & \text{otherwise.}\end{cases}$$
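Under equation (1), a simple way to apply the maximum-peak rule is to fold the histogram to |X*(i,j)|, discard the main lobe, and read off the location of the largest remaining peak. A sketch (ours; the fixed lobe cutoff parameter is an assumption standing in for B(i,j) above):

import numpy as np

def estimate_step(hist, values, main_lobe=2):
    """Estimate Q(i, j) as the |X*| value of the highest histogram peak
    outside the main lobe; None means 'undetermined' (marked X in Fig. 3)."""
    folded = {}
    for v, count in zip(values, hist):
        folded[abs(v)] = folded.get(abs(v), 0) + count   # fold to |X*(i, j)|
    candidates = {v: c for v, c in folded.items() if v > main_lobe and c > 0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

For example, estimate_step(*dct_coeff_histogram(img, 3, 3)) should return 6 for an image quantized as in Fig. 2(a).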

See [1, 11]. Sometimes we do not have enough information to determine Q(i,j) for high frequencies (i,j). This happens when the histogram outside the main lobe decays rapidly to zero, showing no periodic structure, which reflects the small or zero value of the coefficient. In such cases, it can be useful to estimate as many of the low-frequency steps as possible and then search through lookup tables for a matching standard table.

Estimating the quantization table of a bitmap can help determine part of its compression history. If all (or most) of the low-frequency steps are estimated to be ones, we can conclude that the image did not go through previous compression. High frequencies may bias the estimate because they have very low contribution and do not provide good evidence. Moreover, this method also works well for uncompressed or losslessly compressed tiff images. Fig. 3 shows the Q table estimated with the above method for a tiff image taken from UCID [16], 96.7% of whose entries were correctly estimated; the X's mark the "undetermined" coefficients. Now, to verify the authenticity of the image, we use the same distortion measures as in [1]. The average distortion measure is calculated as a function of the remainders of the DCT coefficients with respect to the original Q matrix:

$$B_{1}=\sum_{i=1}^{8}\sum_{j=1}^{8}\operatorname{mod}\bigl(D(i,j),\,Q(i,j)\bigr) \qquad (2)$$

where D(i,j) and Q(i,j) are the DCT coefficient and the corresponding quantization table entry at position (i,j), respectively. An image block with a large average distortion value is very different from what it should be and is likely to belong to a forged image. Averaged over the entire image, this measure can be used to decide on the authenticity of the image. In addition, the JPEG 8×8 "blocking effect" is still somewhat present in the uncompressed version, and hence the blocking artifact measure, BAM [14], can be used to estimate the distortion of the image. It is computed from the Q table as:

$$B_{2}(n)=\sum_{i=1}^{8}\sum_{j=1}^{8}\left|\,D(i,j)-Q(i,j)\operatorname{round}\!\left(\frac{D(i,j)}{Q(i,j)}\right)\right| \qquad (3)$$

where B2(n) is the estimated blocking artifact for the n-th block.
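Both measures are straightforward to compute per block. A minimal sketch (ours, directly transcribing equations (2) and (3)), with D an 8×8 block of DCT coefficients and Q the estimated table:

import numpy as np

def block_distortions(D, Q):
    """B1 (eq. 2): sum of remainders of D w.r.t. Q.
       B2 (eq. 3): blocking artifact measure for one 8x8 block."""
    B1 = np.sum(np.mod(D, Q))
    B2 = np.sum(np.abs(D - Q * np.round(D / Q)))
    return B1, B2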

Fig. 3. Estimating the Q table for an original and a previously compressed tif image: (a) test image; (b) estimated Q for the uncompressed version (most low-frequency entries are ones); (c) estimated Q for the previously compressed version with QF = 80; (d) difference between (c) and the original table for QF = 80 (X marks undetermined coefficients).

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Estimation Accuracy

Our test image set consisted of 550 images collected from different sources (more than five camera models), in addition to some from the public-domain Uncompressed Color Image Database (UCID), which provides a benchmark for image processing analysis [16]. Each of these images was compressed with different quality factors (60, 70, 80, and 90), and each result was then decompressed and resaved as a bitmap. This yielded 550×4 = 2,200 untouched images. For each quality factor group, an image's histogram of DCT coefficients at a certain frequency was generated and used to determine the corresponding quantization step at that frequency according to Section II. This was repeated for all 64 histograms of DCT coefficients. The resulting quantization table was compared to the quality factor's known table, and the percentage of correctly estimated coefficients was recorded. The estimated table was also used in equations (2) and (3) to determine the image's average distortion and blocking artifact measures, respectively. These values were recorded and later used to set a threshold for distinguishing forgeries from untouched images. Table I shows the accuracy of estimating all 64 entries using the proposed method for each quality factor, averaged over the whole set. It exhibits behavior similar to that of JPEG images: as the quality factor increases, estimation accuracy increases steadily, with an expected drop for quality factors higher than 90, where the periodic structure becomes less prominent and the bumps are no longer sufficiently separated. Overall, the estimation accuracy is higher than that of JPEG images [1].

TABLE I. PERCENTAGE OF CORRECTLY ESTIMATED COEFFICIENTS FOR SEVERAL QFS

QF         60        70        80        90
BMP        82.07%    84.80%    87.44%    89.44%
JPEG [1]   72.03%    76.99%    82.36%    88.26%
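For reference, the accuracy figures in Table I can be computed by scoring an estimated table against the standard table scaled for the known quality factor. A sketch under the common IJG scaling convention (an assumption on our part; the paper does not state which scaling it uses):

import numpy as np

# Standard JPEG luminance quantization table (Annex K of the JPEG standard).
STD_LUM = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

def table_for_qf(qf):
    """IJG convention: scale the standard table for a quality factor."""
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf
    return np.clip((STD_LUM * scale + 50) // 100, 1, 255)

def accuracy(estimated, qf):
    """Fraction of the 64 entries that match the known table (Table I metric)."""
    return float(np.mean(estimated == table_for_qf(qf)))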

We attribute this to lossy compression reducing the data available for making a good estimate. The average estimation time for all 64 entries of a 640×480 image, over the different QFs, was 52.7 seconds. Estimating Q using MLE methods [10-11] is based on searching all possible Q(i,j) for each DCT coefficient over the whole image, which can be computationally exhaustive for large files. Another method [12] proposed a logarithmic law and argued that the distribution of the first digit of DCT coefficients follows a generalized Benford's law. The method re-compresses the test image with several quality factors and fits the distribution of DCT coefficients of each version to the proposed law. The QF of the version with the smallest fitting error is chosen, and its corresponding Q table is the desired one. Of course, these methods can only estimate standard compression tables. Although this may be accurate, it is time consuming, and it fails when the recompression quantization step is an integer multiple of the original compression step size. Another method [17] calculates the autocorrelation function of the histogram of DCT coefficients. The displacement corresponding to the peak closest to the peak at zero gives the value of Q(i,j), provided that peak is higher than the mean value of the autocorrelation function. The method eventually uses a hybrid approach: the low-frequency coefficients are determined directly from the autocorrelation function, while the higher-frequency ones are estimated by matching the estimated part to standard JPEG tables scaled by a factor s, determined from the known coefficients; a sketch of this alternative is given below. Table II shows the estimation accuracy, and Table III the estimation time, for the mentioned methods against ours. Note that accuracy was calculated for directly estimating only the first nine AC coefficients, without matching, because the methods fail to estimate high-frequency coefficients, most of which are quantized to zero. The listed time, on the other hand, is for estimating the nine coefficients and then retrieving the whole matching table from standard JPEG lookup tables.
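The following sketch (our reading of the rule in [17], not its published code) illustrates the autocorrelation alternative on a coefficient histogram:

import numpy as np

def estimate_step_autocorr(hist):
    """Return the lag of the first autocorrelation peak after zero,
    provided it exceeds the autocorrelation mean; else None."""
    h = hist.astype(np.float64) - hist.mean()
    ac = np.correlate(h, h, mode='full')[len(h) - 1:]  # lags 0, 1, 2, ...
    for lag in range(1, len(ac) - 1):
        if ac[lag] >= ac[lag - 1] and ac[lag] >= ac[lag + 1] and ac[lag] > ac.mean():
            return lag
    return None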

TABLE II. ESTIMATION ACCURACY FOR THE FIRST 3×3 AC COEFFICIENTS FOR SEVERAL QFS

QF          50       60       70       80       90       100      Avg. Acc.
MLE         75.31    83.10    90.31    96.34    93.83    59.5     83.06
Benford     99.08    87.59    80.82    93.81    59.47    31.53    75.38
Auto.       48.94    50.37    63.71    81.43    65.37    57.50    61.22
Max.Peak    97.93    97.07    99.01    97.67    89.57    76.04    92.88

TABLE III. ESTIMATION TIME IN SECONDS FOR THE FIRST 3×3 AC COEFFICIENTS FOR SEVERAL QFS

QF          50       60       70       80       90       100
MLE         38.73    37.33    37.44    37.36    37.32    34.14
Benford     59.95    58.67    58.70    58.72    58.38    80.04
Auto.       9.23     11.11    11.10    11.12    11.24    8.96
Max.Peak    11.27    11.29    11.30    11.30    11.30    11.56

Our maximum-peak method is faster than statistical modeling and nearly as fast as the autocorrelation method, while its average accuracy is far higher. MLE is reliable, with 83% accuracy, but takes more than double the time. The Benford's-law-based method has an accuracy of 75% but is the worst in time, because recompressing the image and calculating distributions for each compressed version becomes time consuming for larger images. Images used in the experiments were of size 640×480.

B. Forgery Detection

From the untouched previously compressed bitmap image set, we selected 500 images for each quality factor, each of which was subjected to four common forgeries: cropping, rotation, composition, and brightness changes. Cropping forgeries were done by deleting some columns and rows from the original image. For rotation forgeries, an image was rotated by 270°. Copy-paste forgeries were done by randomly copying a block of pixels from an arbitrary image and placing it in the original image. Random values were added to every pixel of the image to simulate brightness change. The resulting fake images were then stored in their uncompressed form, for a total of (500×4) × 4 = 8,000 images. Next, the quantization table for each of these images was estimated as above and used to calculate the image's average distortion (2) and blocking artifact (3) measures, respectively.

Fig. 4(a) and (b) show the values of the average distortion measure and the blocking artifact measure, respectively. The scattered dots represent 500 untouched images (averaged over all quality factors for each image), while the cross marks represent 500 images from the forged dataset.

Fig. 4. Distortion measures for untouched and tampered images: (a) average distortion measure; (b) blocking artifact measure.

As the figure shows, values from forged images tend to cluster higher than those from untampered images. We tested the distortion measure for untouched images against several threshold values and calculated the corresponding false positive rate, FPR (the number of untouched images declared tampered). An ideal case would be a threshold giving zero false positives. However, we had to take into account the false negatives (the number of tampered images declared untampered) that may occur when testing for forgeries. Hence, we require a threshold value keeping both the FPR and the FNR low. For the average distortion measure, we selected a value that gave an FPR of 10.8% and an FNR as low as possible over the different types of forgeries; the horizontal line marks this threshold, τ = 50. Similarly, we selected the BAM threshold to be τ = 40, with a corresponding FPR of 5.6%. Table IV shows the false negative rate (FNR) for the different forgeries, for bitmaps and JPEGs. As expected, as QF increases, a better estimate of the quantization matrix of the original untampered image is obtained, and as a result the error percentage decreases. Notice how the values drop below those for JPEG files. Notice also that detection of cropping is possible when the cropping process breaks the natural JPEG grid, that is, when the removed rows or columns do not fall in line with the 8×8 blocking. Similarly, when the pasted part fails to fit perfectly into the original JPEG-compressed image, the distortion metric exceeds the detection threshold and a possible composite is declared. Fig. 5 shows examples of composites; the resulting distortion measures for each composite are shown in the left panel. Dark parts denote low distortion, whereas brighter parts indicate high distortion values. Notice that the highest values correspond to the alien part and hence mark the forged area. A minimal decision rule is sketched below.
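A sketch (ours) of that decision rule, averaging the per-block measures of equations (2) and (3) over an image and applying the thresholds chosen above:

import numpy as np

def is_forged(blocks, Q, tau_avg=50.0, tau_bam=40.0):
    """Flag an image as possibly forged when either measure, averaged
    over its 8x8 DCT blocks, exceeds its threshold (tau = 50 / 40)."""
    b1 = np.mean([np.sum(np.mod(D, Q)) for D in blocks])
    b2 = np.mean([np.sum(np.abs(D - Q * np.round(D / Q))) for D in blocks])
    return b1 > tau_avg or b2 > tau_bam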

TABLE IV. FORGERY DETECTION ERROR RATES FOR BITMAPS AND JPEGS

Distortion Measure       Original   Cropping   Rotation   Compositing   Brightness
Average        JPEG      12.6%      9.2%       7.55%      8.6%          6.45%
               BMP       10.8%      3.9%       4.45%      2.0%          4.9%
BAM            JPEG      6.8%       3.3%       5.95%      3.15%         5.0%
               BMP       5.6%       1.05%      3.05%      1.25%         3.7%


IV. CONCLUSIONS

The method discussed in this paper uses the approximated histogram of DCT coefficients of bitmaps to extract the image's compression history, namely its quantization table. The extracted table is also used to expose image forgeries. The method proved to have high estimation accuracy in practice when tested on a large set of images from different sources, compared to other statistical approaches. Moreover, estimation times proved faster than those of statistical methods while maintaining very good accuracy for the lower frequencies. Experimental results also showed that performance for bitmaps surpasses that of JPEGs, owing to the lossy nature of the latter, although it takes more time to process a bitmap.

REFERENCES

[1] Hamdy S., El-Messiry H., Roushdy M. I., Kahlifa M. E., "Forgery detection in JPEG compressed images", JAR, unpublished, 2010.
[2] Rosenholtz R., Zakhor A., "Iterative procedures for reduction of blocking effects in transform image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 2, pp. 91–94, Mar. 1992.
[3] Fan Z., Eschbach R., "JPEG decompression with reduced artifacts," Proc. IS&T/SPIE Symp. Electronic Imaging: Image and Video Compression, San Jose, CA, Feb. 1994.
[4] Fan Z., Li F., "Reducing artifacts in JPEG decompression by segmentation and smoothing," Proc. IEEE Int. Conf. Image Processing, vol. II, 1996, pp. 17–20.
[5] Tan K. T., Ghanbari M., "Blockiness detection for MPEG-2-coded video," IEEE Signal Process. Lett., vol. 7, pp. 213–215, Aug. 2000.
[6] Minami S., Zakhor A., "An optimization approach for removing blocking effects in transform coding," IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 74–82, Apr. 1995.
[7] Yang Y., Galatsanos N. P., Katsaggelos A. K., "Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 421–432, Dec. 1993.
[8] Luo J., Chen C. W., Parker K. J., Huang T. S., "Artifact reduction in low bit rate DCT-based image compression," IEEE Trans. Image Process., vol. 5, pp. 1363–1368, 1996.
[9] Chou J., Crouse M., Ramchandran K., "A simple algorithm for removing blocking artifacts in block-transform coded images," IEEE Signal Process. Lett., vol. 5, pp. 33–35, 1998.
[10] Fan Z., de Queiroz R. L., "Maximum likelihood estimation of JPEG quantization table in the identification of bitmap compression history," Proc. Int. Conf. Image Process., 10–13 Sept. 2000, vol. 1, pp. 948–951.
[11] Fan Z., de Queiroz R. L., "Identification of bitmap compression history: JPEG detection and quantizer estimation," IEEE Trans. Image Process., 12(2): 230–235, February 2003.
[12] Fu D., Shi Y. Q., Su W., "A generalized Benford's law for JPEG coefficients and its applications in image forensics," Proc. SPIE Security, Steganography, and Watermarking of Multimedia Contents IX, vol. 6505, pp. 1L1–1L11, 2007.
[13] Swaminathan A., Wu M., Ray Liu K. J., "Digital image forensics via intrinsic fingerprints," IEEE Trans. Inf. Forensics Secur., 3(1): 101–117, March 2008.
[14] Ye S., Sun Q., Chang E.-C., "Detecting digital image forgeries by measuring inconsistencies in blocking artifacts," Proc. IEEE Int. Conf. Multimedia and Expo, July 2007, pp. 12–15.
[15] Fridrich J., Goljan M., Du R., "Steganalysis based on JPEG compatibility," SPIE Multimedia Systems and Applications, vol. 4518, Denver, CO, pp. 275–280, Aug. 2001.
[16] Schaefer G., Stich M., "UCID – An Uncompressed Colour Image Database," Technical Report, School of Computing and Mathematics, Nottingham Trent University, U.K., 2003.
[17] Petkov A., Cottier S., "Image quality estimation for JPEG-compressed images without the original image," EE398 Projects – Image and Video Compression, Stanford University, March 2008.


Fig. 5. Distortion measures for some composite bitmap images: (a) three composite bitmap images; (b) distortion measures for the three images in (a). The left panel represents the average distortion measure, while the right panel represents the blocking artifact measure.
