A New Watermarking Algorithm for Scanned Grey PDF Files Using DWT and Hash Function
Walid Alakk, Hussain Al-Ahmad and Alavi Kunhu
Electrical and Computer Engineering Department, Khalifa University of Science, Technology and Research, Sharjah, UAE
Abstract: This paper presents the development and assessment of a watermarking technique suitable for scanned PDF documents. Two watermarks are embedded, each serving a different purpose. The first protects copyright ownership: it is a logo that can be extracted even after the document has undergone slight image manipulations. The second authenticates the document: any slight edit to the document changes this watermark and indicates forgery. The algorithm was tested successfully on a variety of scanned documents and its performance was assessed.
Keywords: Watermarking; DWT; Hash function; Hash-key; LSB; PSNR; PDF file.
I. INTRODUCTION
Watermarking algorithms insert digital data or digital signatures into an original media file to prove the identity of the file's owner and to prevent copyright violation. Several commercial companies around the world offer copyright protection services to their customers. The inserted watermark can be visible [1], in which case anyone viewing the file can see it, or invisible, in which case it can be detected only by its creator using a decoding algorithm. An imperceptible watermark needs to be robust so that it cannot be destroyed or lost when the digital media file is modified. Copyright protection imposes a further requirement: the watermarking algorithm should be blind, meaning that the original media file is not needed to extract the watermark information [2]. Watermarking techniques can be applied either in the transform domain or in the spatial domain. Transform domain techniques have proved more robust and can survive different attacks [3]. In contrast, spatial domain techniques are more sensitive and fragile, and can be used to authenticate the content of the watermarked file [4]. The distortion introduced by the watermarking process can be assessed objectively using the peak signal-to-noise ratio (PSNR) [5, 6], and subjectively by viewing the watermarked file.

This paper introduces a way to watermark PDF files containing grey images in both the wavelet and spatial domains by converting the PDF file into a bitmap image file. Two watermarks are inserted in the PDF file: the first in the wavelet domain using the discrete wavelet transform (DWT), and the second in the spatial domain using a hash function. The paper has five sections. Section II discusses the algorithm for embedding the watermark signals. Section III illustrates the extraction process. Section IV presents the results of watermarking and its effect on the extracted watermark. Finally, Section V concludes the work.

II. EMBEDDING THE WATERMARK SIGNALS
The watermarked PDF file carries two watermarks. The first is robust and is used for copyright protection; it is inserted in the wavelet domain of the converted PDF file using DWT coefficients. The second is fragile and is used for authentication and for discovering changes in the watermarked file; it is inserted in the spatial domain of the file using the least significant bit (LSB) method. The fragile watermark is generated using the SHA-256 hash function [7]. Fig. 1 shows a block diagram of the algorithm.
[Fig. 1 flow: Original PDF file → PDF file converted into bitmap (BMP) image format → insertion of four logos in the wavelet domain → insertion of the fragile hash-key in the spatial domain → watermarked image with two types of watermarking → final multi-watermarked PDF file]
Fig. 1. A block diagram of the algorithm
A. DWT algorithm
The PDF file is first converted into an image file. The image is then decomposed into four sub-bands: low-low (LL1), low-high (LH1), high-low (HL1) and high-high (HH1). The LL1 sub-band is decomposed again into four sub-bands, LL2, LH2, HL2 and HH2, and the logos are inserted in one of these sub-bands. The decomposition is carried out in two levels using the Haar filter. Fig. 2 shows how the image is decomposed. If the logo has fewer bits than the LL2 sub-band can hold, it can be repeated several times. The bits of the watermark logo are inserted in the LL2 sub-band by modifying its coefficients with the Even/Odd technique. Fig. 3 summarizes the DWT embedding process. The equations for the decomposition and for the Even/Odd process are as follows:

$$LL1 = \mathrm{DWT}\{Org\_img\}, \qquad LL2 = \mathrm{DWT}\{LL1\} \tag{1}$$

$$LL2'(i,j) = \begin{cases} \Delta\, Q_o\!\left(\dfrac{LL2(i,j)}{\Delta}\right), & \text{if } w(i,j) = 1 \\[2ex] \Delta\, Q_e\!\left(\dfrac{LL2(i,j)}{\Delta}\right), & \text{if } w(i,j) = 0 \end{cases}$$
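The Even/Odd rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it works on a plain list of numbers standing in for LL2 coefficients, the helper names `nearest_with_parity`, `embed_bit` and `extract_bit` are hypothetical, and ties between two equally near integers are broken upward, which the paper does not specify.

```python
import math

def nearest_with_parity(x, parity):
    """Return the integer nearest to x whose parity matches (0 = even, 1 = odd).

    Assumption: a tie between two candidates is resolved toward the larger one.
    """
    lower = math.floor(x)
    if lower % 2 != parity:
        lower -= 1            # largest integer <= x with the requested parity
    upper = lower + 2         # next integer with the same parity
    return lower if (x - lower) < (upper - x) else upper

def embed_bit(coeff, bit, delta):
    """Move coeff to the nearest multiple of delta that is odd (bit 1) or even (bit 0)."""
    return delta * nearest_with_parity(coeff / delta, bit)

def extract_bit(coeff, delta):
    """Read the bit back from the parity of the nearest quantization level."""
    return int(round(coeff / delta)) % 2

# Stand-in LL2 coefficients and watermark bits (illustrative values only).
coeffs = [103.7, 250.2, 12.0, 77.5]
bits = [1, 0, 1, 0]
delta = 8

marked = [embed_bit(c, b, delta) for c, b in zip(coeffs, bits)]
print(marked)                                   # [104, 256, 8, 80]

# A perturbation smaller than delta / 2 does not change the recovered bits.
recovered = [extract_bit(m + 1.5, delta) for m in marked]
print(recovered == bits)                        # True
```

In this sketch a larger scaling factor tolerates larger perturbations of each coefficient (just under half the scaling factor) but moves the coefficients further from their original values, which matches the lower PSNR reported for the larger scaling factor in Table I.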
In equation (1) and the Even/Odd rule, Org_img denotes the original image and $LL2'$, the left-hand side of the Even/Odd rule, denotes the watermarked LL2 sub-band. $Q_e$ denotes quantization to the nearest even integer, while $Q_o$ denotes quantization to the nearest odd integer. $\Delta$ is the quantization scaling factor.

[Fig. 2 layout: second-level sub-bands LL2, HL2, LH2 and HH2, and first-level sub-bands HL1, LH1 and HH1]
Fig. 2. Pyramidal structure

[Fig. 3 flow: Original PDF file → converted into image format → convert to wavelet domain using DWT → coefficient selected → embedding process with the watermark logos → watermarked image using DWT]
Fig. 3. The DWT embedding

B. Hash function algorithm
The process of embedding the hash-key is shown in Fig. 4.
Step 1: The image carrying the robust DWT watermark is divided into two parts, as shown in Fig. 5. Region 1 is the whole image excluding one row. Region 2 is the selected row.
Step 2: The SHA-256 hash-key is calculated for region 1. This hash-key is then converted into a binary number of 256 bits.
Step 3: The 256 bits are inserted in the selected row (region 2) of the DWT-watermarked image, using the LSB method in the spatial domain.
Step 4: The image is reconstructed by combining the selected row with the rest of the image. The result is an image carrying two watermarks (the robust logos and the fragile hash-key). Finally, the image is converted back into a PDF file.

[Fig. 4 flow: Watermarked image using DWT → divide into the border, in which the hash-key is inserted, and the rest of the image, from which the hash-key is extracted (box labels: "Hash-key data obtained", "Hash-key holding") → generate SHA-256 for the hash-key data obtained → LSB method for hiding the obtained hash-key → watermarked image with robust and fragile watermarks → watermarked PDF file with robust and fragile watermarks]
Fig. 4. The embedding of the hash-key

Fig. 5. Regions of extracting and embedding the hash-key

III. EXTRACTION OF WATERMARKS
A. Hash function extraction
The process of extracting the hash-key begins with converting the watermarked PDF file into an image file. The selected row of the image (region 2) is then cropped and the embedded hash-key is extracted from the LSBs of the pixels in this row. Next, a second hash-key is calculated from the pixels in region 1 (i.e. the image excluding the selected row). The two hash-keys are compared as shown in Fig. 6. If they are equal, the file is authenticated. If they differ, the PDF file has been tampered with.
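The hash-key embedding steps of Section II-B and the check described above can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the image is a plain list of rows of 8-bit grey levels, the function names `region1_digest`, `embed_hash` and `is_authentic` are hypothetical, the last row is arbitrarily chosen as region 2, and the 256 bits are written most significant bit first into the first 256 pixels of that row.

```python
import hashlib

HASH_BITS = 256

def region1_digest(image, row):
    """SHA-256 over every pixel except the selected row (region 1)."""
    data = bytes(p for r, line in enumerate(image) if r != row for p in line)
    return hashlib.sha256(data).digest()

def embed_hash(image, row):
    """Return a copy of image with the region-1 hash written into the LSBs of `row`."""
    if len(image[row]) < HASH_BITS:
        raise ValueError("selected row needs at least 256 pixels")
    digest = int.from_bytes(region1_digest(image, row), "big")
    marked = [list(line) for line in image]
    for k in range(HASH_BITS):
        bit = (digest >> (HASH_BITS - 1 - k)) & 1      # most significant bit first
        marked[row][k] = (marked[row][k] & 0xFE) | bit  # overwrite the LSB only
    return marked

def is_authentic(image, row):
    """Compare the hash stored in the row's LSBs with a freshly computed one."""
    stored = 0
    for k in range(HASH_BITS):
        stored = (stored << 1) | (image[row][k] & 1)
    return stored == int.from_bytes(region1_digest(image, row), "big")

# Synthetic 4 x 256 grey image; the last row plays the role of region 2.
image = [[(7 * r + 3 * c) % 256 for c in range(256)] for r in range(4)]
marked = embed_hash(image, row=3)
print(is_authentic(marked, row=3))       # True

tampered = [list(line) for line in marked]
tampered[0][10] ^= 1                     # change one pixel in region 1 by one level
print(is_authentic(tampered, row=3))     # False
```

Because only the LSBs of the selected row are modified, region 1 is left untouched by the embedding, so the hash computed at verification time matches the stored one unless region 1 has been altered.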
[Fig. 6 flow: Watermarked PDF file with two types of watermarks → watermarked image with two types of watermarks → divide into region 1 and region 2 → region 1: re-construct the SHA-256 hash-key from the image except the border; region 2: LSB method for extracting the hash-key from the border → compare the extracted and re-constructed hash-keys to determine the authenticity]
Fig. 6. The extraction process of the hash-key

B. DWT extraction algorithm
After the hash-key has been extracted, the robust watermark is extracted from the multi-watermarked PDF file. The PDF file is first converted into an image file, which is then transformed into the wavelet domain using the DWT. Fig. 7 shows the DWT extraction process for the robust watermark.

[Fig. 7 flow: Watermarked PDF file with two types of watermarks → watermarked image with two types of watermarks → convert to wavelet domain using DWT → coefficient selected → extraction process → four extracted logos → recovered robust logo]
Fig. 7. The extraction process of the logo

The watermark logos are extracted using the following equation:

$$w(i,j) = \begin{cases} 1, & \text{if } Q\!\left(\dfrac{LL2(i,j)}{\Delta}\right) \text{ is odd} \\[2ex] 0, & \text{if } Q\!\left(\dfrac{LL2(i,j)}{\Delta}\right) \text{ is even} \end{cases} \tag{2}$$

In the previous equation, Q denotes rounding to the nearest quantization value and $\Delta$ is the scaling factor. The logos are then extracted and averaged to produce a single logo. For example, if four logos are inserted, the extracted logo is:

$$W(i,j) = w_1(i,j) + w_2(i,j) + w_3(i,j) + w_4(i,j)$$

$$\text{if } W(i,j) \ge 2 \rightarrow W(i,j) = 1, \qquad \text{if } W(i,j) < 2 \rightarrow W(i,j) = 0 \tag{3}$$

IV. RESULTS
The new algorithm was implemented and tested on different PDF files. Examples are shown in Fig. 8 and Fig. 9. A logo was inserted four times in each image. The hash-key was computed from the hashed part of the image (region 1) using the SHA-256 hash function and embedded in the border.

Fig. 8. PDF file and the image file

Fig. 9. PDF file and the converted image file of the scanned painting

The distortion caused by the watermarking was assessed using the PSNR for different scaling factors. Table I shows that the fragile watermark causes very little additional damage to the image compared with the robust watermark.

TABLE I. PERFORMANCE OF MULTI-WATERMARKED FILE WITH DIFFERENT SCALING FACTORS
DWT scaling factor (image)    PSNR, DWT robust watermark (dB)    PSNR, multi-watermarked image (dB)
8 (Lena)                      43.9521                            43.9496
16 (Lena)                     38.1772                            38.1759
8 (Painting)                  50.5841                            50.5820
16 (Painting)                 44.8638                            44.8626
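For reference, the PSNR can be computed as in the sketch below. The paper does not state the exact formula it used, so this assumes the common definition for 8-bit images, $10\log_{10}(255^2/\mathrm{MSE})$; the function name `psnr` and the tiny example images are illustrative only.

```python
import math

def psnr(original, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized grey images."""
    flat_a = [p for row in original for p in row]
    flat_b = [p for row in distorted for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf           # identical images
    return 10 * math.log10(peak ** 2 / mse)

original = [[100, 120], [140, 160]]
lsb_changed = [[101, 120], [140, 161]]    # two pixels differ by one grey level
print(round(psnr(original, lsb_changed), 2))    # 51.14
```

Under this definition, changing only the LSBs of a single row adds very little to the mean squared error of a full-size image, which is in line with the small gap between the two PSNR columns of Table I.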
The hash-key obtained with the SHA-256 hash function is a 256-bit number, written as 64 hexadecimal digits. Table II shows the hash-key obtained from region 1. Any change to the watermarked file, or attack on it, however small, changes the hash-key of region 1 and makes it different from the hash-key that was inserted in the border. Table III shows the difference in the hash-key when the level of one pixel was changed. This demonstrates that the fragile watermark is capable of detecting the slightest modification to the PDF file.

TABLE II. HASH-KEY USING SHA-256
Hash function    Hash size    Image       Hash-key
SHA-256          64 hex       Lena        51f7861ff7a5f0d2fd6f7cda42322d447a9a04316478e07ee80a9c49b0610bba
SHA-256          64 hex       Painting    dd621a82148ac24aeb97f70040d4aef11d0cc0414a78ad5687135df8266181af
Fig. 12. Results of rotating Lena image
TABLE III. CHANGE IN THE HASH-KEY
Image       Hash-key       Value
Lena        regenerated    51f7861ff7a5f0d2fd6f7cda42322d447a9a04316478e07ee80a9c49b0610bba
Lena        extracted      2e086888085a4f2d02128b27bdcd52bb8465fbde8b838b8117f763b64f1df465
Painting    regenerated    dd621a82148ac24aeb97f70040d4aef11d0cc0414a78ad5687135df8266181af
Painting    extracted      44b47fe91386cf26fcf560b43b307c9f6aee91a71fa4c9264203035fdfeb866c
The robustness of the logo watermark was tested under different types of attacks, such as JPEG compression. In this test, the watermarked PDF file was converted into an image, compressed with JPEG and then converted back into a PDF file. The results are shown in Fig. 10 and Fig. 11 for the Lena and Painting files respectively. Fig. 12 and Fig. 13 show the extracted watermark after rotating the PDF files by different angles. The logo survived both the JPEG and the rotation attacks.
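One way to see why inserting the logo four times helps against such attacks is to apply the voting rule (3) to copies that contain a few bit errors. The sketch below is illustrative only: the function names `combine_copies` and `flip`, the tiny 2 x 4 logo and the positions of the flipped bits are assumptions, not data from the experiments.

```python
def combine_copies(copies, threshold=2):
    """Sum the extracted copies bit by bit and threshold the sum, as in (3)."""
    rows, cols = len(copies[0]), len(copies[0][0])
    return [[1 if sum(c[i][j] for c in copies) >= threshold else 0
             for j in range(cols)]
            for i in range(rows)]

def flip(bits, positions):
    """Return a copy of bits with the given (row, column) positions inverted."""
    out = [row[:] for row in bits]
    for i, j in positions:
        out[i][j] ^= 1
    return out

logo = [[1, 0, 1, 1],
        [0, 0, 1, 0]]

# Four extracted copies; three of them carry one bit error each.
copies = [
    flip(logo, [(0, 0)]),
    flip(logo, [(1, 2)]),
    flip(logo, [(0, 1)]),
    logo,
]

recovered = combine_copies(copies)
print(recovered == logo)    # True
```

With four copies and a threshold of 2, a position where exactly two copies disagree is decided as 1, so a logo bit of 0 is recovered correctly only if at most one copy is in error at that position.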
Fig. 10. Extracted watermark logo from compressed Lena image
Fig. 11. Extracted watermark logo from compressed Painting image
Fig. 13. Results of rotating Painting image
V. CONCLUSION
The new algorithm introduces a method of watermarking PDF files that uses two different types of watermark signals. The first is robust and is used to protect the copyright ownership of the PDF file. The second is fragile and is changed by any type of manipulation imposed on the PDF file, so it can be used to authenticate the file. The algorithm was tested successfully on several PDF files.

REFERENCES
[1] W. Feng-Hsing, P. Jeng-Shyang, and C. Lakhmi, “Innovations in Digital Watermarking Techniques”, Vol.239, Springer, September 2009, pp.11-26.
[2] G. Bhatnagar and B. Raman, “A new robust reference watermarking scheme based on DWT-SVD”, Vol.31, No.5, September 2009, pp.1002-1013.
[3] A. Al-Gindy, H. Al-Ahmad, R. Qahwaj, and A. Tawfik, “A frequency domain adaptive watermarking algorithm for still colour images”, in International Conference on Advances in Computational Tools for Engineering Applications, Lebanon, July 2009, pp.186-191.
[4] M. Barni, F. Bartolini, and A. Piva, “An Improved Wavelet Based Watermarking Through Pixelwise Masking”, IEEE Transactions on Image Processing, Vol.10, No.5, May 2001, pp.783-791.
[5] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity”, IEEE Transactions on Image Processing, Vol.13, No.4, April 2004, pp.600-612.
[6] S. Al-Mansoori and A. Kunhu, “Robust Watermarking Technique based on DCT to Protect the Ownership of DubaiSat-1 Image against Attacks”, IJCSNS International Journal of Computer Science and Network Security, Vol.12, No.6, June 2012, pp.1-9.
[7] J. Cannons and P. Moulin, “Design and Statistical analysis of a Hash-Aided Image Watermarking System”, IEEE Transactions on Image Processing, Vol.13, No.10, October 2004, pp.1393-1408.