IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 58, NO. 2, FEBRUARY 2011
A Modified Chaos-Based Joint Compression and Encryption Scheme
Jianyong Chen, Junwei Zhou, and Kwok-Wo Wong, Senior Member, IEEE
Abstract—An approach for improving the compression performance of an existing chaos-based joint compression and encryption scheme is proposed. The lookup table used for encryption is dynamically updated in the searching process. Once a partition not matched with the target symbol is visited, this and other partitions mapped to the same symbol are reallocated to a nonvisited symbol. Therefore, the target symbol eventually associates with more partitions, and fewer iterations are needed to find it. As a result, expansion of the ciphertext is avoided, and the compression ratio is improved. Simulation results show that the proposed modification leads to a better compression performance, whereas the execution efficiency is comparable. The security of the modified scheme is also analyzed in detail.
Index Terms—Chaos, compression, cryptography, simultaneous compression and encryption.
I. INTRODUCTION
THE EFFICIENCY and security requirements of information transmission have led to a substantial amount of research work in data compression and encryption. In order to improve the performance and the flexibility of multimedia applications, it is worthwhile to perform compression and encryption in a single process [1]–[5]. In general, there are two distinct research directions in this area. One of them embeds key-controlled confusion and diffusion in source-coding schemes, whereas the other incorporates compression in cryptographic algorithms. Some attempts at introducing key-controlled operations in entropy coding can be found in [1]–[4]. The approach based on multiple Huffman tables [1] simultaneously performs encryption and compression by a key-controlled swapping of the left and right branches of the Huffman tree. Some approaches such as randomized arithmetic coding [2] and key-based interval splitting [3], [4] were proposed to embed cryptographic features in arithmetic coding. The compression capability of these algorithms is close to that of traditional entropy coding. However, security and efficiency problems were found [6]–[8], and further modifications are required.
Manuscript received August 4, 2010; revised November 2, 2010; accepted December 21, 2010. Date of publication February 14, 2011; date of current version February 24, 2011. This work was supported by Shenzhen University Research and Development Fund under Grant 200903. This paper was recommended by Associate Editor G. Grassi. J. Chen and J. Zhou are with the Department of Computer Science and Technology, Shenzhen University, Shenzhen 518060, China (e-mail:
[email protected]). K.-W. Wong is with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong (e-mail:
[email protected]). Digital Object Identifier 10.1109/TCSII.2011.2106316
In recent years, there has been an increasing trend of designing ciphers based on chaos. This is because chaotic systems are sensitive to the initial condition and the system parameters, and these properties are desirable in cryptography. Moreover, the knowledge of chaos and nonlinear dynamics can be applied in the field of cryptography. A chaos-based cipher designed by Baptista [9] searches for the plaintext symbol in the lookup table using a key-dependent chaotic trajectory and treats the number of iterations of the chaotic map as the ciphertext. However, it suffers from ciphertext expansion: the ciphertext is usually about 1.5 to 2 times as long as the plaintext. An attempt was made in [5] to incorporate certain compression capability in the Baptista-type cipher by adaptively constructing the lookup table according to the probability of occurrence of the plaintext symbols. The ciphertext is not longer than the plaintext, but the compression ratio is still some distance from the source entropy. This is because the chaotic search trajectory frequently lands on partitions corresponding to irrelevant source symbols, so the number of iterations is larger than necessary. In this brief, the model of sampling without replacement is adopted in the joint compression and encryption scheme proposed in [5], with the goal of improving the compression performance. The lookup table used for encryption is dynamically updated in the searching process. If a symbol visited by the chaotic trajectory is not the one being encrypted, it is removed from the current lookup table by assigning the phase space associated with this symbol to another nonvisited symbol. Since the target symbol eventually associates with a larger phase space, it is found with a higher probability. As a result, the number of iterations required for encryption is reduced.
The ciphertext is shortened, and a better compression performance is achieved. The rest of this brief is organized as follows. In the next section, the problem of ciphertext expansion in the original cipher [5], [9] and its mathematical model are analyzed. The modified scheme is described in Section III. Simulation results and security analyses are presented in Sections IV and V, respectively. In the last section, some concluding remarks are given.
II. CIPHERTEXT EXPANSION PROBLEM
Before the proposed scheme is described, a brief introduction of embedding compression in a chaos-based cryptosystem [5] is presented. This scheme can be considered as a hybrid
1549-7747/$26.00 © 2011 IEEE
CHEN et al.: MODIFIED CHAOS-BASED JOINT COMPRESSION AND ENCRYPTION SCHEME
cipher. Source symbols with high probability of occurrence are encrypted by searching in the dynamic lookup table, and this is called the search mode. Entropy coding is performed on the output of this mode. Then, both the entropy codewords and other less probable symbols are masked by a pseudorandom bitstream, and this is named the mask mode. The compression capability of this hybrid cipher is mainly contributed by the search mode as the mask mode does not lead to any reduction in plaintext length. Therefore, the more symbols are encrypted in the search mode, the higher the compression ratio that can be achieved. In the search mode, the plaintext symbol being encrypted is searched in the lookup table using a pseudorandom sequence generated by iterating a chaotic map from the key-dependent parameters and initial condition. In [5], the logistic map is chosen as the underlying chaotic map. It has the form
$$x_{n+1} = b x_n (1 - x_n) \tag{1}$$
where $x_n \in [0, 1]$ is the output at discrete time $n = 0, 1, 2, \ldots$. The control parameter $b$ should be a real number between 3.6 and 4 for generating chaotic output sequences. The lookup table is composed of the partitioned phase space of the chaotic map and the corresponding symbol mapping. The phase space is first divided into a number of equal-width partitions, each of which maps to a possible plaintext symbol. More probable symbols are assigned more partitions so that they have a higher chance of being visited by the secret searching trajectory. The length of the searching trajectory is equal to the number of iterations of the chaotic map, which is then taken as the ciphertext.
The encryption process is similar to the following model: encrypting the target plaintext symbol using the lookup table is equivalent to randomly fetching a symbol until the desired symbol is drawn. The number of draws is equivalent to the length of the searching trajectory, i.e., the ciphertext. Here, a model of sampling with replacement is presented to analyze the ciphertext expansion problem. Suppose that there are four source symbols, i.e., A, B, C, and D, with probabilities of occurrence of 1/2, 1/4, 1/8, and 1/8, respectively. Hence, half of the partitions are mapped to symbol A and a quarter to symbol B, whereas symbols C and D are each associated with 1/8 of the total number of partitions. In sampling with replacement, the procedure of encrypting symbol A corresponds to the geometric distribution in probability theory and statistics. It can be considered as the first success in getting A at the $k$th draw after $k - 1$ failures. The total number of draws $k$ is the ciphertext for symbol A. The probability of first drawing A from the table at the $k$th draw with replacement is given by (2), where $[1 - P(A)]^{k-1}$ is the probability of failing to obtain A in the first $k - 1$ draws. The cumulative probability for the first $k$ draws, denoted as $CP_k(A)$, is given by (3).
These expressions indicate that $P_k(A)$ is close to zero and $CP_k(A)$ approaches 1 only if $k$ tends to infinity. The ciphertext of A could be any value from one to infinity in theory. It could occupy a large number of bits in practice and leads to ciphertext expansion in this type of chaos-based cipher [5], [9]:
$$P_k(A) = [1 - P(A)]^{k-1} P(A) \tag{2}$$
$$CP_k(A) = P(A) + [1 - P(A)] P(A) + \cdots + [1 - P(A)]^{k-1} P(A) = 1 - [1 - P(A)]^{k}. \tag{3}$$
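As a quick numerical check of (2) and (3), the following sketch evaluates the sampling-with-replacement model for the four-symbol example source. The function names `p_k` and `cp_k` are illustrative choices and do not come from [5] or [9]; exact fractions are used so that the closed form in (3) can be compared with the term-by-term sum.

```python
from fractions import Fraction


def p_k(p_target, k):
    """Probability that the target is first drawn at the k-th draw, as in (2)."""
    return (1 - p_target) ** (k - 1) * p_target


def cp_k(p_target, k):
    """Cumulative probability of success within the first k draws, as in (3)."""
    return 1 - (1 - p_target) ** k


# Example source from Section II: occurrence probabilities of A, B, C, and D.
source = {
    "A": Fraction(1, 2),
    "B": Fraction(1, 4),
    "C": Fraction(1, 8),
    "D": Fraction(1, 8),
}

for symbol, p in source.items():
    # The sum of the first eight terms of (2) must equal the closed form in (3).
    summed = sum(p_k(p, s) for s in range(1, 9))
    assert summed == cp_k(p, 8)
    # The mean of a geometric distribution with success probability p is 1/p.
    print(symbol, "CP_8 =", float(cp_k(p, 8)), " mean draws =", float(1 / p))
```

For the rare symbols C and D, the probability of still not having found the target after eight draws is (7/8)^8, roughly 0.34, which illustrates why the ciphertext values of a fixed table can become large.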
III. MODIFIED SCHEME
A. Encryption
Unlike the model of sampling with replacement as adopted in [5], here, we propose to use the model of sampling without replacement in updating the lookup table. The encryption procedures of the proposed scheme are described as follows.
Step 1) Following the approach of [5], as described in Section II, construct the lookup table according to the probabilities of the occurrence of the source symbols.
Step 2) Sequentially encrypt each symbol in the plaintext by searching in the lookup table using a secret chaotic trajectory. If the symbol being encrypted is found, the number of iterations of the chaotic map is considered as the ciphertext. Otherwise, the lookup table will be updated using a new process based on the model of sampling without replacement. If the partition just visited maps to a nontarget symbol, all the partitions associated with that symbol need to be reassigned to another symbol. With the considerations of compression ratio and simplicity of the encryption process, those partitions are assigned to the nonvisited symbol with the highest probability. However, it should be noticed that the partitions can be randomly assigned to other symbols to increase the difficulty of attack. In the next iteration, the chaotic trajectory continues to search the target symbol in the updated lookup table until the symbol is found.
Step 3) When the current plaintext symbol has been encrypted, the lookup table is initialized again according to the symbols' probabilities of occurrence, as described in Step 1). However, the exact partitions mapped to a symbol may be shifted [5] so that the lookup table for encrypting the next symbol may not be the same as the one for the current symbol. After that, the next symbol is encrypted using the same procedures in Step 2). These operations are repeated until all the symbols in the plaintext sequence have been processed.
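The search and table update of Step 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the eight-partition layout `DCBBAAAA` is hypothetical (the text does not give the layout drawn in the figures), ties between equally probable symbols are broken by listing order, and the partition shifting of Step 3), the entropy coding, and the mask mode of [5] are all omitted. The names `search`, `decrypt`, and `reassign` are illustrative. The key values are those of the worked example in this section.

```python
def logistic(x, b=4.0):
    """One iteration of the logistic map (1)."""
    return b * x * (1.0 - x)


def partition_index(x, n_partitions):
    """Index of the equal-width partition of [0, 1] that contains x."""
    return min(int(x * n_partitions), n_partitions - 1)


def reassign(table, probs, visited, hit):
    """Give all partitions of the missed symbol to the most probable nonvisited symbol."""
    visited.add(hit)
    candidates = [s for s in probs if s not in visited]
    heir = max(candidates, key=probs.get)  # ties: the first listed symbol wins
    return [heir if s == hit else s for s in table]


def search(table, probs, target, x0, b=4.0, update=True, max_iter=10_000):
    """Return the number of iterations needed to land on `target`.

    update=True applies the table update of Step 2); update=False keeps
    the table fixed, which corresponds to sampling with replacement.
    """
    table, visited, x = list(table), set(), x0
    for k in range(1, max_iter + 1):
        x = logistic(x, b)
        hit = table[partition_index(x, len(table))]
        if hit == target:
            return k
        if update:
            table = reassign(table, probs, visited, hit)
    raise RuntimeError("target not found within max_iter iterations")


def decrypt(table, probs, k, x0, b=4.0):
    """Replay k iterations with the same table updates and read the symbol hit last."""
    table, visited, x = list(table), set(), x0
    for step in range(1, k + 1):
        x = logistic(x, b)
        hit = table[partition_index(x, len(table))]
        if step == k:
            return hit
        table = reassign(table, probs, visited, hit)


# Example source of Section II with a hypothetical eight-partition layout.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
table = list("DCBBAAAA")
x0 = 0.33866355

k_modified = search(table, probs, "C", x0)
k_original = search(table, probs, "C", x0, update=False)
print("modified:", k_modified, "original:", k_original)
print("decrypted:", decrypt(table, probs, k_modified, x0))
```

Because every miss removes one symbol from the table and the target is never removed, the loop with `update=True` stops after at most as many iterations as there are symbols. The decoder does not need to know the target: it only has to repeat the same reassignments for the first k − 1 iterations.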
As the proposed scheme mainly focuses on the searching model, the other procedures remain the same as those described in [5] and are not repeated here. An illustration of the lookup table update process for the example source in Section II is shown in Fig. 1. The parameter b of (1) is arbitrarily set to 4.0, and the initial condition is randomly chosen as 0.33866355. Suppose that the target symbol is C and the first iterated value $x_1$ of the chaotic map is 0.89588219, which lands on the partition mapped to the irrelevant symbol A. For that reason, all the partitions corresponding to symbol A should be assigned to symbol B in the second iteration, and the lookup table is updated. In the next iteration, the second chaotic map output $x_2$ is equal to 0.37310916. Unfortunately, it again does not land on a partition corresponding to symbol C. The searching process continues with all the partitions of B assigned to C, and the lookup table is updated again. The third iterated value $x_3$ of the chaotic map is 0.93559486, which falls into the partition associated with the target symbol C. Therefore, the ciphertext is 3, and the process of encrypting the current symbol is complete. For comparison, the original scheme [5] is illustrated in Fig. 2. After four iterations, the fourth iterated value $x_4$ of the chaotic map is 0.24102849, and the chaotic trajectory lands on the phase space mapped to symbol C. The corresponding ciphertext is 4, which is larger than that of the modified scheme. Thus, the required number of iterations, i.e., the ciphertext value, is reduced in the new approach.
Fig. 1. Illustration of the proposed table update scheme. Only three iterations are needed to encrypt the symbol C; therefore, the ciphertext is 3.
Fig. 2. For comparison, the scheme used in [5] is presented. Four iterations are needed to encrypt the target symbol C, and therefore, the ciphertext is 4.
B. Decryption
The secret key consists of the initial value and the parameter of the chaotic map as well as the initial 32-bit mask block [5]. It must be secretly delivered to the receiver. In addition, the information of plaintext probability must be available to the receiver for reconstructing the lookup table. The decryption process is similar to the encryption one. The decoder regenerates the chaotic trajectory from the secret key and then looks up the plaintext symbol from the lookup table. If the number of iterations is smaller than the ciphertext, the chaotic search trajectory has not landed on the target symbol yet. Then, the same symbol replacement process needs to be performed so as to synchronize with the encryption part. With the correct key, the original plaintext sequence can be reconstructed.
IV. PERFORMANCE OF THE MODIFIED SCHEME
A. Efficiency Analysis
Due to the hitting of nontarget symbols, the required number of iterations in the original scheme [5] is large, and the compression performance is thus limited. In order to avoid the chaotic map trajectory falling into the partitions corresponding to the irrelevant symbols again, those partitions are released in the modified scheme by replacing the nontarget symbol visited in the previous step by another symbol. This process is similar to the model of sampling without replacement, and it follows the hypergeometric distribution for successfully obtaining a sample from a finite population at the $k$th time without replacement. Equation (4) is an expression of the probability of finding symbol A from the lookup table without replacement, where $N$ is the total number of partitions, and $M$ is the number of partitions corresponding to A. The cumulative probability $CP_k(A)$ of the first $k$ draws is given by (5). This value is equal to 1 when $k$ is $(N - M + 1)$. This means that the maximum value of the ciphertext for each plaintext symbol is $(N - M + 1)$. The pigeonhole principle also illustrates this fact. The number of iterations does not exceed the number of distinct plaintext symbols, and therefore, the ciphertext is guaranteed not longer than the plaintext. The modified scheme finds the target plaintext symbol faster than the original one [5] and results in a higher compression ratio:
$$P_k(A) = \frac{C_M^1 \, C_{N-M}^{k-1}}{k \, C_N^k} \tag{4}$$
$$CP_k(A) = C_M^1 \sum_{s=1}^{k} \frac{C_{N-M}^{s-1}}{s \, C_N^s}. \tag{5}$$
B. Simulation Results
The proposed algorithm is implemented in the C++ programming language and run on a personal computer with an Intel Core 2 2.00-GHz processor and 2-GB memory. We follow the choices in [5] so as to make a fair comparison with its results. The logistic map is chosen as the underlying chaotic map. The parameter b is set to 3.999999991, whereas the initial condition is chosen as 0.3388. The maximum number of iterations is set to 15. To test the compression capability of the proposed scheme, the standard files from the Calgary Corpus [10] are used. There are 18 distinct files of different types, including text, executable, geophysical data, and picture. Two simulation configurations are chosen. In the first configuration, only the top 16 probable plaintext symbols are selected, and all of them are stored in one table. In the second configuration, the top 128 probable symbols are chosen. They are distributed to 16 tables, and each table contains eight symbols. To make a comparison with the modified scheme, the compression results of the original scheme are directly extracted from [5], without reexecution. This is because the compression ratio is independent of the hardware computing platform. In addition, the files are compressed by the Huffman-coding scheme without encryption as a reference of the performance of traditional entropy coding. Table I lists the ciphertext-to-plaintext ratios of the three approaches. All ratios are smaller than 100%, which implies that all files can be compressed. Moreover, the compression performance of our scheme is far better than that of [5]. For the 16-map case, the compression performance improves by 4.06%–14.62% with an average improvement of 10.96%. For the one-map case, the improvement falls between 4.41% and 16.85%, with a mean value of 12.54%. Furthermore, the compression performance is on average only 4% worse than that of Huffman coding. This can be considered as the tradeoff of having additional cryptographic features in traditional entropy coding.
TABLE I. CIPHERTEXT-TO-PLAINTEXT RATIO OF THE CALGARY CORPUS FILES
TABLE II. COMPARISON OF ENCRYPTION AND DECRYPTION TIMES
Moreover, the average encryption and decryption speeds are increased by 8.7% and 11.8%, respectively.
C. Encryption and Decryption Efficiency
For the 16-map case, the encryption and decryption times of the proposed scheme are listed in Table II. Moreover, the original algorithm [5] is reexecuted on our computing platform, and the results are also presented in the same table. The data show that the modified scheme is slightly faster than the original approach [5]. The files are also compressed by Huffman coding, followed by the 128-bit Advanced Encryption Standard (AES). The comparison results show that the proposed scheme needs less time in encrypting and decrypting 16 out of 18 files.
V. SECURITY ANALYSES
A. Key Space, Key, and Plaintext Sensitivities
The key of the proposed scheme is composed of the control parameter $b$ and the initial value $x_0$ of the logistic map, together with the 32-bit initial cipher block. In the software implementation, $b$ and $x_0$ are each represented by double precision format using 52 bits. The total key space can therefore reach 52 + 52 + 32 = 136 bits and is comparable to 128-bit AES. However, $b$ should be carefully chosen to avoid the nonchaotic regions with a negative Lyapunov exponent [11], [12]. A plot of the Lyapunov exponent against $b$ is shown in Fig. 3. In the chaotic region with a positive Lyapunov exponent, the period of the output sequence can be considered as infinity. However, the actual period is limited by the computation precision. A constructive scheme for finding the maximum period of a chaotic map under limited precision is described in [13]. It is employed to find the period of the logistic map. The results show that all the periods in the chaotic region are far longer than $10^7$ in double precision computation, which are sufficient for practical ciphers [13].
Fig. 3. Plot of the Lyapunov exponent computed at increment of 0.0001 of the control parameter b in the interval [3.6, 4].
The key and plaintext sensitivities are evaluated as follows. The files from the Calgary Corpus [10] are encrypted using two sets of secret key with only a tiny difference. The two resultant ciphertext sequences are then compared bit by bit, and the percentage of differing bits is calculated. Ten tests are performed for the one-map and 16-map cases, respectively. The average values are listed in Table III. They show that the bit change percentages are very close to 50%, which justifies the high sensitivity of the ciphertext to the key. To numerically evaluate the plaintext sensitivity, a bit is changed at different positions of the plaintext sequence, which is then encrypted using the same key. The two resultant ciphertext sequences are compared bitwise. These operations are repeated 20 times, and the average values can be found in the rightmost column in Table III. The measured average bit changes are close to 50%, which implies that the ciphertext is very sensitive to the plaintext. These tests confirm the high key and plaintext sensitivities of our scheme. This is because the lookup table is disturbed by the plaintext. A tiny change in the plaintext affects not only the corresponding ciphertext block but also the encryption process.
TABLE III. AVERAGE KEY AND PLAINTEXT SENSITIVITIES
B. Other Security Issues
Two encryption rounds are suggested in the original chaos-based joint compression and encryption scheme [5]. The search mode can be considered as a variant of the Baptista-type cryptosystem [9]. The mask mode is indeed a stream cipher that masks the plaintext by a pseudorandom bitstream. Its security is determined by the randomness of the occurrence of the numbers or bits in the mask stream [14]. The statistical test suite recommended by the U.S. National Institute of Standards and Technology [15] is employed to evaluate the randomness of the mask stream. In the test, 300 sequences, each of 1 000 000 bits, have been extracted. They all pass the statistical tests including frequency, block frequency, cumulative sums, runs, longest run, rank, and fast Fourier transform. All the P-values are larger than 0.01. Therefore, the sequences are considered as sufficiently random according to [15]. In the original scheme [5], most selected probable plaintext symbols are encrypted in the search mode, but some are encrypted in the mask mode together with low probable symbols. Since the expansion of ciphertext is avoided, all the chosen probable symbols are encrypted in the search mode of the proposed scheme. Therefore, the security of our scheme is higher than that of [5].
VI. CONCLUSION
The existing approach of embedding compression in the chaos-based cryptosystem suffers from the drawback of low compression performance. This can be explained by the model of sampling with replacement. A modified scheme based on the model of sampling without replacement has been proposed. As a result, the number of chaotic map iterations wasted for visiting irrelevant symbols is reduced. The compression capability is improved and is close to that of conventional entropy coding. Moreover, the lookup table can be realized by memory chips or field programmable gate array, and therefore, the proposed joint compression and encryption scheme is easy to implement in hardware circuits.
REFERENCES
[1] C. P. Wu and C. C. J. Kuo, “Design of integrated multimedia compression and encryption systems,” IEEE Trans. Multimedia, vol. 7, no. 5, pp. 828–839, Oct. 2005.
[2] M. Grangetto, E. Magli, and G. Olmo, “Multimedia selective encryption by means of randomized arithmetic coding,” IEEE Trans. Multimedia, vol. 8, no. 5, pp. 905–917, Oct. 2006.
[3] J. Wen, H. Kim, and J. Villasenor, “Binary arithmetic coding with key-based interval splitting,” IEEE Signal Process. Lett., vol. 13, no. 2, pp. 69–72, Feb. 2006.
[4] H. Kim, J. Wen, and J. Villasenor, “Secure arithmetic coding,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2263–2272, May 2007.
[5] K. W. Wong and C. H. Yuen, “Embedding compression in chaos-based cryptography,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 11, pp. 1193–1197, Nov. 2008.
[6] J. Zhou, Z. Liang, Y. Chen, and O. C. Au, “Security analysis of multimedia encryption schemes based on multiple Huffman table,” IEEE Signal Process. Lett., vol. 14, no. 3, pp. 201–204, Mar. 2007.
[7] G. Jakimoski and K. Subbalakshmi, “Cryptanalysis of some multimedia encryption schemes,” IEEE Trans. Multimedia, vol. 10, no. 3, pp. 330–338, Apr. 2008.
[8] J. Zhou, O. C. Au, and P. H. Wong, “Adaptive chosen-ciphertext attack on secure arithmetic coding,” IEEE Trans. Signal Process., vol. 57, no. 5, pp. 1825–1838, May 2009.
[9] M. S. Baptista, “Cryptography with chaos,” Phys. Lett. A, vol. 240, no. 1/2, pp. 50–54, Mar. 1998.
[10] [Online]. Available: ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus
[11] G. Alvarez and S. Li, “Some basic cryptographic requirements for chaos-based cryptosystems,” Int. J. Bifurcat. Chaos, vol. 16, no. 8, pp. 2129–2151, 2006.
[12] C. M. Ou, “Design of block ciphers by simple chaotic functions,” IEEE Comput. Intell. Mag., vol. 3, no. 2, pp. 54–59, May 2008.
[13] T. Addabbo, M. Alioto, F. Fort, A. Pasini, S. Rocchi, and V. Vignoli, “A class of maximum-period nonlinear congruential generators derived from the Rényi chaotic map,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 4, pp. 816–828, Apr. 2007.
[14] S. Tezuka, Uniform Random Numbers: Theory and Practice. Norwell, MA: Kluwer, 1995.
[15] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A. Heckert, J. Dray, and S. Vo, (2010, Apr. 27). A Statistical Test Suite for the Validation of Random Number Generators and Pseudo Random Number Generators for Cryptographic Applications, NIST Special Publication 800-22. [Online]. Available: http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html