Appl. Sci. 2015, 5, 1033-1049; doi:10.3390/app5041033
OPEN ACCESS
applied sciences ISSN 2076-3417 www.mdpi.com/journal/applsci Article
Multi-Bit Data Hiding Scheme for Compressing Secret Messages †
Wen-Chung Kuo 1,†,*, Shao-Hung Kuo 2 and Lih-Chyau Wuu 1
1 Department of Computer Science and Information Engineering, National Yunlin University of Science & Technology, No.123 University Road, Section 3, Douliou, Yunlin 64002, Taiwan; E-Mail: [email protected]
2 Graduate School of Engineering Science and Technology Doctoral Program, National Yunlin University of Science & Technology, No.123 University Road, Section 3, Douliou, Yunlin 64002, Taiwan; E-Mail: [email protected]
† This paper is an extended version of our paper published in The International Multi-Conference on Engineering and Technology Innovation 2015, Kaohsiung, Taiwan, 30 October–3 November 2015.
* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +886-5-534-2601 (ext. 4515); Fax: +886-5-531-2170.
Academic Editors: Wen-Hsiang Hsieh and Takayoshi Kobayashi
Received: 12 August 2015 / Accepted: 23 October 2015 / Published: 4 November 2015
Abstract: Data hiding techniques usually pursue two goals: high embedding capacity and good image quality. To achieve both, this paper proposes a data hiding scheme that combines run-length encoding (RLE) with multi-bit embedding. This work makes three major contributions. First, the embedding capacity is increased by 62% because the secret message is compressed before it is embedded into the cover image. Secondly, the proposed scheme keeps the characteristics of multi-bit generalized exploiting modification direction (MGEMD), which effectively reduce the number of modified pixels in the cover image and maintain good stego image quality. Finally, the proposed scheme can resist modern steganalysis methods, such as RS steganalysis and SPAM (subtractive pixel adjacency matrix), and is compared with the MiPOD (minimizing the power of the optimal detector) scheme. Our simulation results and security discussions lead to the following findings. First, there are no perceivable differences between the cover images and the stego images under human inspection. For example, the average PSNR of the stego images is about 44.61 dB when a secret message of 80,000 bits is embedded in test cover images (such as airplane, baboon and Lena) of size 512 × 512. Secondly,
on average, 222,087 pixels of the cover image were not modified after embedding. That is to say, 12% fewer pixels are modified than with the MGEMD method. According to the performance discussions, the proposed scheme not only achieves high embedding capacity and good image quality, but also maintains stego image security.

Keywords: RLE; MGEMD; RS steganalysis; embedding capacity
1. Introduction

Networks are ubiquitous in modern life, and more and more content is digital: photos, videos, music, documents, personal information, and so on. How to protect digital information is therefore a pressing issue. Cryptography and steganography are two popular technologies used to protect digital products. In cryptography, a key is used to encrypt data into meaningless numbers, and the same key or another key is then used to decrypt them. Common cryptographic algorithms include the advanced encryption standard (AES), the data encryption standard (DES), RSA and MD5. In general, cryptographic technologies provide a certain level of security, but they cannot protect the data once the ciphertext has been decrypted. Therefore, steganography technologies have been developed.

Steganography technologies can be classified into watermarking and data hiding [1]. Digital watermarking technology, in general, can be divided into two categories [2]: visible and invisible watermarking. The advantage of a visible watermark is that the human eye can discern it, so no algorithm is needed to view the information that identifies the data source or the owner. Its disadvantages are that the watermark changes the image and that it is easily overwritten or removed by signal processing technology. Watermarking techniques can also be divided into fragile and robust watermarks. A fragile watermark is primarily used to protect the integrity of the image: the slightest modification to the media destroys the watermark, so a missing watermark denotes tampering. A robust watermark can survive a designated class of transformations. One application is a watermark that carries copy and access control information: the media may be compressed, cropped or otherwise transformed, but the watermarked information survives.
Utilizing digital signal processing and digital imaging technologies to hide secret data without reducing the quality of the cover image is called data hiding. The hidden data are not readily apparent, and the information may take any form (text, images, video). Data hiding can be performed in two domains: the spatial domain and the transform domain. This kind of data hiding technology yields very high image quality, and the changes are undetectable by the human eye. Data hiding can also be classified into two types: irreversible and reversible. The difference is that reversible data hiding is lossless: the original cover image can be reconstructed from the stego image after the secret message is extracted. Watermarking technology usually modifies the image in the frequency domain [3], but modifications in the frequency domain cause more distortion. Therefore, data hiding usually operates in the spatial domain, which gives less distortion than watermarking technology and faster processing. For data hiding in the spatial domain, least significant bit (LSB) replacement is a classic scheme [4], but the
embedding capacity is only one bit per pixel (bpp), and it offers no security. The exploiting modification direction (EMD) [5] scheme can embed 1.16 bpp with two pixels in each group.

Data compression [6] can save storage space and speed up network transmission. In other words, the raw data are processed by various mathematical algorithms to reduce the storage space they need, the reduced data are transmitted, and a decompression operation then recreates the original data at the receiver. There are two types of data compression. One is lossless compression, such as the PCX, GIF, TIFF, TGA and PNG image formats, ZIP and RAR data compression, run-length encoding, Huffman coding and Lempel–Ziv–Welch (LZW). The other is lossy compression, such as JPEG (Joint Photographic Experts Group), VQ (vector quantization) [7–9] and SMVQ (side match vector quantization) [10,11]. Data compression is therefore a good way to reduce the size of the secret message.

To leverage the advantages of compression, we propose a data hiding scheme that can embed more secret data: the secrets are compressed first and then embedded with multi-bit data hiding. The proposed scheme effectively reduces the secret message size and combines this with multi-bit generalized exploiting modification direction (MGEMD) [12] to increase the embedding capacity. In Section 2, we review previous work on run-length encoding (RLE) and the multi-bit embedding scheme. Section 3 gives a detailed introduction to the proposed method and then proposes a speed-up method for MGEMD. It also discusses the overflow/underflow problems and their solutions. Experiments are given in Section 4. Finally, some conclusions are given in Section 5.

2. Related Work

This section reviews two areas of related work. For data compression, we review the RLE method.
Then, three data hiding methods [5,12,13] based on the extraction function are introduced.

2.1. Run-Length Encoding

Run-length encoding (RLE) [14] is a well-known, simple and quick form of data compression in which a sequence of consecutive data elements with the same value (a run) is stored as a single value and a count. RLE is most useful when the source information contains long substrings of the same character or binary digit. For this reason, RLE is well suited to compressing a binary secret message. For example, the secret message 00001011101 will be encoded into 0(4)1(1)0(1)1(3)0(1)1(1). In 2006, Chang et al. [15] proposed two image steganographic methods using the run-length approach: BRL (hiding bitmap files by run-length), which focuses on binary images, and GRL (hiding general data files by run-length). The major idea of these methods is to use RLE to increase the embedding capacity of the SMVQ method [16]. For binary images, Agaian and Cherukuri [17] also proposed run-length-based steganography. Their algorithm relies on the run-length characteristics and the characteristic values of each block, and alters pixels in the embeddable blocks of the cover image. This scheme also enhances the security of the embedded data and the capacity of the embedding method. In addition, steganographic access control
in data hiding using run-length encoding and modulo operations was proposed by Lee et al. [18] in 2011. Their high-capacity steganographic scheme with access control converts sharp bitstreams into smooth bitstreams, which are then embedded into the cover image. The modulo value is fixed in this scheme, which limits the embedding capacity. Accordingly, using RLE to increase the embedding capacity of data hiding is important. In particular, the compression ratio increases when a binary image (black-and-white picture) contains many long runs of ones and zeros. We use both binary images and gray-scale images for the experiments in this paper. The results reveal a good compression ratio and improved embedding capacity.

2.2. Exploiting Modification Direction Method

The exploiting modification direction (EMD) [5] method was proposed by Zhang and Wang in 2006. This method can embed more secret data than the 1-LSB replacement data hiding method. In EMD, the cover pixels are divided into groups (for example, two pixels per group), and each pixel value is changed by at most one (−1, 0 or +1). To achieve this, the Zhang and Wang scheme uses the extraction function given in Equation (1):

$$f(g_1, g_2, \ldots, g_n) = \left[\sum_{i=1}^{n} (g_i \times i)\right] \bmod (2n + 1) \tag{1}$$
where $g_i$ is the value of pixel $i$ and $n$ is the number of pixels in the group. For example, when $n = 2$, two pixels, $g_1$ and $g_2$, are considered, so the extraction function is $f(g_1, g_2) = (1 \times g_1 + 2 \times g_2) \bmod 5$. According to their analysis, the best hiding bit rate is obtained in this five-ary case. However, the embedding capacity decreases as the number of pixels in each group increases. Specifically, the embedding capacity is less than 1 bpp (bits per pixel) when each group has more than three pixels.

2.3. Generalized Exploiting Modification Direction

To improve the embedding capacity and to embed binary secret data directly, Kuo and Wang proposed a data hiding method based on generalized exploiting modification direction (the GEMD scheme) [13]. The main idea of the GEMD scheme is that each $(n + 1)$-bit binary secret message can be hidden in $n$ adjacent pixels of the cover image. The new extraction function $f_b(g_1, g_2, \ldots, g_n)$ is defined as Equation (2):

$$f_b(g_1, g_2, \ldots, g_n) = \left[\sum_{i=1}^{n} \left(g_i \times (2^i - 1)\right)\right] \bmod 2^{n+1} \tag{2}$$
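As an illustration of Equations (1) and (2), the following Python sketch evaluates both extraction functions for one pixel group. The function names (`emd_extract`, `gemd_extract`, `emd_bpp`) are hypothetical and do not come from the cited schemes, and `emd_bpp` assumes the usual EMD payload of $\log_2(2n + 1)$ bits per group of $n$ pixels.

```python
import math


def emd_extract(pixels):
    """EMD extraction function, Equation (1): [sum(g_i * i)] mod (2n + 1)."""
    n = len(pixels)
    return sum(g * i for i, g in enumerate(pixels, start=1)) % (2 * n + 1)


def gemd_extract(pixels):
    """GEMD extraction function, Equation (2): [sum(g_i * (2^i - 1))] mod 2^(n+1)."""
    n = len(pixels)
    total = sum(g * (2 ** i - 1) for i, g in enumerate(pixels, start=1))
    return total % (2 ** (n + 1))


def emd_bpp(n):
    """Assumed EMD payload in bits per pixel: log2(2n + 1) bits per n pixels."""
    return math.log2(2 * n + 1) / n


if __name__ == "__main__":
    print(emd_extract([3, 4]))      # (1*3 + 2*4) mod 5
    print(gemd_extract([3, 4, 5]))  # (1*3 + 3*4 + 7*5) mod 16
    print(round(emd_bpp(2), 2))     # payload for two pixels per group
```

Under this assumption, `emd_bpp(2)` reproduces the 1.16 bpp quoted for EMD, while a GEMD group of $n$ pixels carries $n + 1$ bits.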
2.4. Multi-Bit Generalized Exploiting Modification Direction

In 2012, Kuo et al. also proposed the multi-bit GEMD (MGEMD) [12] method, which increases the embedding capacity by using an adaptive $k$. MGEMD can choose different values of $n$ to determine how many pixels form a group, hides $k$ bits of the secret in each pixel, and hides one extra bit in each pixel group, i.e., it can embed $(nk + 1)$ bits of the secret message per group. MGEMD's extraction function is shown as Equation (3):

$$f_c(g_1, g_2, \ldots, g_n) = \left[\sum_{i=1}^{n} (g_i \times c_i)\right] \bmod 2^{nk+1} \tag{3}$$

where the weight value $c_i$ is:

$$c_i = \begin{cases} 1, & i = 1 \\ 2^k c_{i-1} + 1, & i \neq 1 \text{ and } i > 0 \end{cases} \tag{4}$$
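The recurrence in Equation (4) and the extraction function in Equation (3) can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation; the names `mgemd_weights` and `mgemd_extract` are hypothetical, and a group of at least one pixel is assumed.

```python
def mgemd_weights(n, k):
    """Weights c_1..c_n from Equation (4): c_1 = 1, c_i = 2^k * c_(i-1) + 1."""
    weights = [1]
    for _ in range(n - 1):
        weights.append((2 ** k) * weights[-1] + 1)
    return weights


def mgemd_extract(pixels, k):
    """MGEMD extraction function, Equation (3): [sum(g_i * c_i)] mod 2^(nk+1)."""
    n = len(pixels)
    weights = mgemd_weights(n, k)
    return sum(g * c for g, c in zip(pixels, weights)) % (2 ** (n * k + 1))


if __name__ == "__main__":
    print(mgemd_weights(4, 3))
    print(mgemd_extract([155, 155, 158], 3))
```

With $k = 1$ the weights reduce to $2^i - 1$, so the sketch also covers the GEMD extraction function of Equation (2).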
For example, $c_1 = 1$, $c_2 = 9$, $c_3 = 73$ and $c_4 = 585$ when $k = 3$ and $n = 4$, from Equation (4). The difference between the MGEMD scheme and the GEMD scheme is that the modulus is changed from $2^{n+1}$ to $2^{nk+1}$ in order to increase the embedding capacity.

3. The Proposed Scheme

As a rule, the goals of data hiding techniques are security, capacity, robustness, imperceptibility, unambiguousness and non-removability. Most data hiding techniques focus on increasing embedding capacity while keeping stego image quality high. However, the differences between the original cover image and the stego image become significant as the amount of embedded secret data increases. Thus, how to enhance the embedding capacity while still maintaining the stego image quality is a very important issue. To address this issue, a scheme with high embedding capacity and good image quality is proposed in this section.

3.1. Multi-Bit Data Hiding Scheme for Compressing Secret Messages

In data hiding schemes, unambiguousness means that, after the stego image has been transmitted to the receiver, the receiver extracts exactly the secret message that the sender embedded. To support this attribute, we need a lossless compression method that lets us recover the original data. RLE is suitable, since it is simple, quick and lossless. The proposed scheme includes three phases. The flowchart of the three phases (secret image compression phase, MGEMD phase and embedding phase) is shown in Figure 1.
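Figure 1. Flowchart of the proposed scheme. [Flowchart: the secret messages are transformed to a binary stream and run-length encoded into an encoded stream; Max(runs) determines the adaptive $(n, k)$ with $2^{nk+1} >$ Max(runs); the length of the RLE ($L$) gives $(Q, R)$ with $R = L \bmod 2^{nk+1}$ and $Q = (L - R)/2^{nk+1}$; MGEMD embeds the data into the cover image to produce the stego image.]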
3.2. Secret Image Compression Phase

In the proposed scheme, data compression is used to decrease the secret message size, which effectively increases the embedding capacity. To minimize the cost of transmission over a limited-bandwidth network, information needs to be compressed before delivery to improve transport efficiency. The proposed scheme is combined with MGEMD to increase the embedding capacity further. As a result, the multi-bit data hiding scheme for compressing secret messages has twice the embedding capacity. The maximum run and the total length information of the RLE stream are regarded as secret messages and are also hidden in the cover image.

Algorithm 1. The multi-bit data hiding scheme for compressing secret messages.
Input: A cover image and a gray-level secret image ($S$).
Output: A stego image ($I'$).
Step 1. The gray-level secret image is transformed into a binary stream.
Step 2. Compress the binary stream by RLE to obtain the new secret message ($S'$), which includes the maximum run, the total length information and the embedding secret messages ($s$).
Step 3. Check whether the high-order bit of the new secret is zero or one. This bit tells us whether the RLE stream begins with zero or one, and it is recorded in the LSB of the first pixel of the cover image, which is changed if necessary. Simultaneously, count the zeros and ones to record the total length information.
Step 4. Find parameters $n$ and $k$ such that $2^{nk+1} >$ Max(runs) and $n \geq k$. The quotient ($Q$) and remainder ($R$) are calculated from the total length ($L$) using Equation (5):

$$R = L \bmod 2^{nk+1}, \qquad Q = \frac{L - R}{2^{nk+1}} \tag{5}$$

Step 5. From the second pixel to the last, use $n$ and $k$ to divide the pixels into non-overlapping groups of $n$ adjacent pixels $(x_1, x_2, \ldots, x_n)$.
Step 6. Compute the value $t = f_c(x_1, x_2, \ldots, x_n)$ and the difference $D = (s - t) \bmod 2^{nk+1}$.
Step 7. Modify the pixels according to $D$:
If $D = 0$, then the cover pixels do not change;
else if $D = 2^{nk}$, then $x_n' = x_n + (2^k - 1)$ and $x_1' = x_1 + 1$, so that the extraction value increases by $(2^k - 1)c_n + c_1 = 2^{nk}$;
else if $D < 2^{nk}$, then $D = (d_{n-1}, d_{n-2}, \ldots, d_0)_{2^k}$, $x_{i+1}' = x_{i+1} + d_i$ and $x_i' = x_i - d_i$ for $i = n-1, n-2, \ldots, 0$;
else if $D > 2^{nk}$, then $D' = 2^{nk+1} - D = (d_{n-1}, d_{n-2}, \ldots, d_0)_{2^k}$, $x_{i+1}' = x_{i+1} - d_i$ and $x_i' = x_i + d_i$ for $i = n-1, n-2, \ldots, 0$.
For $i = 0$, only $x_1$ is updated, since there is no pixel $x_0$.
Step 8. Repeat Step 6 to Step 7 until all secret messages are hidden.

Example 1: Let $n = 3$ and $k = 3$. Given the cover image's pixels (155, 155, 155, 158, ...) and the secret messages $s$: (164, 91, 155, 247, ...)$_{10}$, the secret messages are compressed with RLE and hidden in the cover pixels. We obtain the stego pixels (156, 156, 161) from Algorithm 1 by the following steps.
Step 1. Convert the secret message $s$ = (164, 91, 155, 247, ...)$_{10}$ into the binary stream (1010010001011011 ...)$_2$.
Step 2. Use RLE to compress the secret stream: $s$ = [1(1)0(1)1(1)0(2)1(1)0(3)1(1)0(1)1(2)0(1)1(2)0(1)1(1)0(1)1(2)0(1)1(3)0(2)1(2) ...]. The new secret $s'$ = (1112131121211121322 ...) is the sequence of run lengths, and the RLE begins with a one.
Step 3. The least significant bit of the first pixel (155) is already equal to one, so it is not modified, i.e., the first pixel is still 155.
Step 4. Compute the value $t = f_c(155, 155, 158) = 796$ and $D = (1 - 796) \bmod 2^{10} = 229 = (345)_8$.
Step 5. Since $229 < 2^9$ and $(d_2, d_1, d_0) = (3, 4, 5)$, we compute the stego pixels as follows.
For $d_2 = 3$, compute $x_3' = 158 + 3 = 161$ and $x_2' = 155 - 3 = 152$;
For $d_1 = 4$, compute $x_2' = 152 + 4 = 156$ and $x_1' = 155 - 4 = 151$;
For $d_0 = 5$, compute $x_1' = 151 + 5 = 156$.
The stego pixels are (156, 156, 161).
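The compression and embedding steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names `run_length_encode` and `mgemd_embed` are hypothetical, overflow/underflow of pixel values is not handled, and the $D = 2^{nk}$ branch is assumed to add $2^k - 1$ to $x_n$ and 1 to $x_1$ so that the extraction value changes by exactly $2^{nk}$.

```python
def run_length_encode(bits):
    """Return (first_bit, run_lengths) for a non-empty string of '0'/'1'."""
    runs = []
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return int(bits[0]), runs


def mgemd_embed(pixels, s, k):
    """Embed one secret value s (0 <= s < 2^(nk+1)) into a group of n pixels.

    Sketch of Steps 6 and 7 of Algorithm 1; pixel range checks are omitted.
    """
    n = len(pixels)
    modulus = 2 ** (n * k + 1)
    half = 2 ** (n * k)

    # Weights c_1..c_n from Equation (4).
    weights = [1]
    for _ in range(n - 1):
        weights.append((2 ** k) * weights[-1] + 1)

    t = sum(g * c for g, c in zip(pixels, weights)) % modulus
    d = (s - t) % modulus
    stego = list(pixels)

    if d == 0:
        return stego
    if d == half:
        # Assumption: (2^k - 1) * c_n + c_1 = 2^(nk).
        stego[-1] += 2 ** k - 1
        stego[0] += 1
        return stego

    sign = 1
    if d > half:
        d = modulus - d
        sign = -1

    for i in range(n):
        digit = (d >> (k * i)) % (2 ** k)  # d_i, the i-th base-2^k digit
        stego[i] += sign * digit           # x_(i+1) changes by +/- d_i
        if i > 0:
            stego[i - 1] -= sign * digit   # x_i changes the opposite way
    return stego


if __name__ == "__main__":
    first_bit, runs = run_length_encode("1010010001011011")
    print(first_bit, runs)
    print(mgemd_embed([155, 155, 158], runs[0], 3))
```

For the first run length of Example 1 ($s = 1$) and the pixel group (155, 155, 158), the sketch returns the stego pixels (156, 156, 161), and the extraction function of the result equals the embedded value.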
3.3. Data Embedding

Before embedding, the secret messages must be transformed into a binary stream. Then, the binary stream is compressed with lossless RLE to reduce the data size. Finally, the compressed binary stream is hidden by the MGEMD scheme. The algorithm of the proposed scheme is shown in Algorithm 1.

3.3.1. Speeding up the Modified Method

In this subsection, we describe the MGEMD features and then use these characteristics to speed up the embedding process. The MGEMD scheme handles the cover pixels in three cases, i.e., $D < 2^{nk}$, $D = 2^{nk}$ and $D > 2^{nk}$. To speed up embedding, we propose the embedding formulas shown in Tables 1 and 2 for $D > 2^{nk}$ and $D < 2^{nk}$, respectively. We assume that $s_1 = 5429$ and $s_2 = 6643$ when $k = 3$ and $n = 4$, and both examples use the cover pixels (10, 19, 5, 9), for which $f_c(10, 19, 5, 9) = 5811$.

Table 1. Speeding up the embedding method when $D > 2^{nk}$.

| Item | Formula | Example ($s_1 = 5429$) |
|---|---|---|
| $D$ | $D = (s - f_c) \bmod 2^{nk+1}$ | $D = 7810$ |
| $D > 2^{nk}$ | $D = 2^{nk+1} - D$ | $382 = 8192 - 7810$ |
| $D_{10} \rightarrow d_{2^k}$ | $d = (d_3 d_2 d_1 d_0)_8$ | $(0576)_8$ |
| $(x_1, x_2, x_3, x_4)$ | $(x_1, x_2, x_3, x_4)$ | (10, 19, 5, 9) |
| $d_3' d_2' d_1' d_0'$ | $(0 - d_3)(d_3 - d_2)(d_2 - d_1)(d_1 - d_0)$ | $(0 - 0)(0 - 5)(5 - 7)(7 - 6)$ = 0, −5, −2, 1 |
| $(x_1', x_2', x_3', x_4')$ | $(x_1 + d_0', x_2 + d_1', x_3 + d_2', x_4 + d_3')$ | (10 + 1, 19 − 2, 5 − 5, 9 + 0) = (11, 17, 0, 9) |
Table 2. Speeding up the embedding method when $D < 2^{nk}$.

| Item | Formula | Example ($s_2 = 6643$) |
|---|---|---|
| $D$ | $D = (s - f_c) \bmod 2^{nk+1}$ | $D = 832$ |
| $D < 2^{nk}$ | $D = D$ | 832 |
| $D_{10} \rightarrow d_{2^k}$ | $d = (d_3 d_2 d_1 d_0)_8$ | $(1500)_8$ |
| $(x_1, x_2, x_3, x_4)$ | $(x_1, x_2, x_3, x_4)$ | (10, 19, 5, 9) |
| $d_3' d_2' d_1' d_0'$ | $(d_3)(d_2 - d_3)(d_1 - d_2)(d_0 - d_1)$ | $(1)(5 - 1)(0 - 5)(0 - 0)$ = 1, 4, −5, 0 |
| $(x_1', x_2', x_3', x_4')$ | $(x_1 + d_0', x_2 + d_1', x_3 + d_2', x_4 + d_3')$ | (10 + 0, 19 − 5, 5 + 4, 9 + 1) = (10, 14, 9, 10) |
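The formulas in Tables 1 and 2 compute every pixel change in one pass from the base-$2^k$ digits of $D$, instead of updating neighbouring pixels digit by digit. A minimal Python sketch of that idea follows. The names `speedup_deltas` and `apply_deltas` are hypothetical, the convention $d_n = 0$ is an assumption that reproduces the leading terms $(0 - d_3)$ and $(d_3)$ in the tables, and the $D = 2^{nk}$ case is deliberately rejected because the tables do not cover it.

```python
def speedup_deltas(d, n, k):
    """Per-pixel changes for x_1..x_n given the difference d = (s - f_c) mod 2^(nk+1)."""
    half = 2 ** (n * k)
    if d == half:
        raise ValueError("D = 2^(nk) is handled by a separate rule, not by Tables 1 and 2")

    sign = 1
    if d > half:
        d = 2 * half - d  # Table 1: D = 2^(nk+1) - D
        sign = -1

    base = 2 ** k
    digits = [(d // base ** i) % base for i in range(n)]  # d_0 .. d_(n-1)
    digits.append(0)                                      # assumed d_n = 0

    # Change applied to x_(i+1): +(d_i - d_(i+1)) if D < 2^(nk), negated otherwise.
    return [sign * (digits[i] - digits[i + 1]) for i in range(n)]


def apply_deltas(pixels, deltas):
    """Add each change to its pixel; overflow/underflow is not checked here."""
    return [x + dx for x, dx in zip(pixels, deltas)]


if __name__ == "__main__":
    cover = [10, 19, 5, 9]
    print(apply_deltas(cover, speedup_deltas(7810, 4, 3)))  # Table 1 case
    print(apply_deltas(cover, speedup_deltas(832, 4, 3)))   # Table 2 case
```

For the cover pixels (10, 19, 5, 9), the two calls reproduce the stego pixels (11, 17, 0, 9) of Table 1 and (10, 14, 9, 10) of Table 2.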