JOURNAL OF NETWORKS, VOL. 7, NO. 8, AUGUST 2012
Format-Preserving Encryption for Character Data

Min Li
College of Information Technical Science, Nankai University, Tianjin 300071, China
Email: [email protected]

Zheli Liu, Jingwei Li, Chunfu Jia
College of Information Technical Science, Nankai University, Tianjin 300071, China
Email: [email protected], [email protected], [email protected]
Abstract: This paper presents FPE (Format-preserving Encryption) for character data in both fixed-width and variable-width encodings. Previous research studied FPE only for fixed-width character data. In this paper, FPE for character data is categorized into NPE (Number-preserving Encryption) and LPE (Length-preserving Encryption). Schemes for NPE and LPE are proposed to encrypt fixed-width and variable-width character data, respectively. Furthermore, the paper provides a general solution for both data types. The security and efficiency of these schemes are analyzed and verified.

Index Terms: format-preserving encryption, number-preserving encryption, length-preserving encryption, Feistel network.

Corresponding author: Zheli Liu. E-mail: [email protected]
© 2012 ACADEMY PUBLISHER doi:10.4304/jnw.7.8.1239-1244

I. INTRODUCTION

FPE (Format-preserving Encryption) encrypts plaintext into ciphertext that has the same format as the original text. FPE for character data has many applications, such as database encryption and the protection of sensitive data in network transmission. It has advantages over traditional block ciphers, which expand the data size and change the format. For example, when a 16-digit credit card number is encrypted, the ciphertext generated with AES is a block of binary data, whereas the ciphertext generated with FPE is another 16-digit number.

The FPE problem was first identified in 1981 [1], but it was not until 2009 that Bellare et al. [2] developed a formal definition and security goals. The basic methods for the integer domain $X_n$ provided by Black and Rogaway [3] are prefix, cycle-walking, and generalized-Feistel. Other FPE methods include FFSEM [4], RtE [2], and FFX [5]. RtE and FFX are suitable for a character domain described as $Chars^n$, a string set in which $Chars$ is a finite character set and $n$ is the number of characters of each string in $Chars^n$. These modes transform string encryption into integer FPE by establishing a bijection between strings and integers.

The FPE for $Chars^n$ described above is called NPE (Number-preserving Encryption), which requires that the number of characters of the ciphertext equal that of the plaintext. NPE is suitable only for fixed-width encoded character data.

NPE schemes cannot solve the FPE problem for all types of character data in databases. In many cases, databases store character data in a variable-width encoding. For example, if UTF-8 [6] is used, an English character is one byte long while a Chinese character takes three bytes. When the 7-byte string “nkChina” is encrypted by an NPE scheme, the ciphertext may become seven Chinese characters occupying 21 bytes, which can cause memory overflow. Therefore, FPE for variable-width character data must preserve the storage length. FPE in this case is named LPE (Length-preserving Encryption), which requires that the storage length of the ciphertext equal that of the plaintext. LPE is suitable for variable-width encoded character data.

In this paper, character data is divided into fixed-width and variable-width encodings, and FPE for character data is categorized into NPE and LPE for encrypting the two types of character data, respectively. Schemes corresponding to NPE and LPE are presented. Furthermore, a general solution that can be applied to both fixed-width and variable-width encoded character data is proposed.

II. PRELIMINARY

A. Definition of FPE

Definition 1 (Format-preserving Encryption). Format-preserving encryption can be considered as a cipher [2]:

$$E : K \times \Omega \times T \times X \to X \cup \{\perp\}$$

where the sets $K$, $\Omega$, $T$, and $X$ are the key space, format space, tweak space, and domain. They are nonempty sets and $\perp \notin X$. Each $\omega \in \Omega$ corresponds to a set $X_\omega$, which is a subspace of $X$. The set $X_\omega$ is the $\omega$-indexed slice of the domain. Each slice must be a finite set. Briefly stated, an FPE cipher on a domain $X$ can be considered as a permutation made up of sub-permutations on each slice of $X$.

Three algorithms are used to describe FPE:

fpe = (setup, encryption, decryption)

Algorithm setup: Initializes the parameters. Typically, initialization covers the problem domain $X$ and the arguments for symmetric encryption, such as the number of rounds of the Feistel network and the symmetric key $k$.

Algorithm encryption: Takes a tweak $t$ and a plaintext $x$ as input and outputs the ciphertext $y = E_{k,t}(x)$ or $\perp$.

Algorithm decryption: The inverse of encryption. It takes a tweak $t$ and a ciphertext $y$ as input and outputs the plaintext $x$ if and only if $E_{k,t}(x) = y$.

B. FPE Modes

RtE (Rank-then-encipher) is a general FPE mode proposed by Bellare et al. [2] in 2009. They showed how to convert an FPE problem on a complex domain into one on an integer set through rank and unrank algorithms. RtE can solve the FPE problem on an arbitrary regular language. Strings in the various encodings can all be expressed as regular expressions, so RtE can be used to encrypt character data.

In 2010, Bellare et al. [5] proposed the FFX mode, which incorporates tweaks [7]. FFX can solve the FPE problem on $Chars^n$ by establishing a bijective mapping between $Chars^n$ and an integer set, which converts FPE on $Chars^n$ into integer FPE.

Both RtE and FFX use a transformation to simplify FPE on an original complex domain into FPE on an integer set. Such methods are named CtE (Coding-then-encipher) [8], which consists of two basic components:
(1) Coding: using a suitable coding method to construct a bijection between the original domain and an integer set.
(2) Integer FPE: using a secure integer FPE scheme, such as prefix [3] or FFSEM [4], to solve the FPE problem on the mapped integer set.

The coding component consists of an encoding operation and a decoding operation.
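As a rough illustration of the two CtE components, the sketch below ranks a string in $Chars^n$ to an integer, enciphers that integer with a prefix cipher in the spirit of [3], and unranks the result. The helper names (`rank`, `unrank`, `prefix_permutation`, `cte_encrypt`, `cte_decrypt`), the use of HMAC-SHA256 as the pseudorandom function, and the digit alphabet are assumptions made for this example and are not taken from RtE, FFX, or [8]. The prefix cipher builds a table over the whole domain, so it is practical only for small domains.

```python
import hashlib
import hmac

def rank(s, chars):
    """Coding: map a string over `chars` to its index in Chars^n (base-|chars| number)."""
    value = 0
    for ch in s:
        value = value * len(chars) + chars.index(ch)
    return value

def unrank(value, chars, n):
    """Decoding: recover the length-n string that has the given index."""
    out = []
    for _ in range(n):
        value, digit = divmod(value, len(chars))
        out.append(chars[digit])
    return "".join(reversed(out))

def prefix_permutation(key, tweak, size):
    """Integer FPE by a prefix cipher: order 0..size-1 by a keyed PRF value."""
    def prf(i):
        return hmac.new(key, tweak + i.to_bytes(8, "big"), hashlib.sha256).digest()
    return sorted(range(size), key=prf)

def cte_encrypt(x, chars, key, tweak=b""):
    perm = prefix_permutation(key, tweak, len(chars) ** len(x))
    return unrank(perm[rank(x, chars)], chars, len(x))

def cte_decrypt(y, chars, key, tweak=b""):
    perm = prefix_permutation(key, tweak, len(chars) ** len(y))
    return unrank(perm.index(rank(y, chars)), chars, len(y))

DIGITS = "0123456789"
ciphertext = cte_encrypt("4096", DIGITS, b"demo key")
print(ciphertext)                                  # another 4-digit string
print(cte_decrypt(ciphertext, DIGITS, b"demo key"))  # recovers the plaintext
```

Because the permutation acts on indices of $Chars^n$, the ciphertext always has the same number of characters as the plaintext, which is the property the paper later calls NPE.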
The encoding operation maps an element of the domain $Chars^n$ to an integer, and the decoding operation is its inverse. Judged by their encryption algorithms, RtE and FFX both belong to CtE. The encryption process of RtE, FFX, and CtE is a pseudorandom permutation on the domain $X$, so the number of characters is preserved after encryption. These FPE modes can therefore achieve only NPE for fixed-width encoded character data.

III. DEFINITION OF FPE FOR CHARACTER DATA

Following the definition of FPE in the previous literature, character encoding and FPE for character data are defined in this study as follows:
Definition 2 (Character Encoding). A character encoding can be described as a mapping between $Chars$ and $Bins$, where $Chars$ is the character set and $Bins$ is the binary code set. There are two bijective functions between $Chars$ and $Bins$:
① $Bins = Encode(Chars)$ maps each character in $Chars$ to its binary code in $Bins$.
② $Chars = Decode(Bins)$ maps each binary code in $Bins$ to a character in $Chars$.

For a character encoding between $Chars$ and $Bins$, let $size()$ denote the byte length of an encoded character.
If $\forall\, char_1, char_2 \in Chars$, $size(Encode(char_1)) = size(Encode(char_2))$, then the encoding is a fixed-width character encoding. For example, every character encoded in UCS-2 has a storage length of two bytes.
If $\exists\, char_1, char_2 \in Chars$ such that $size(Encode(char_1)) \neq size(Encode(char_2))$, then the encoding is a variable-width character encoding. For example, in UTF-8 an ASCII character takes one byte, a Greek character takes two bytes, and a Chinese character takes three bytes.

Because character data can be encoded in different ways, FPE for character data is described as follows:

Definition 3 (FPE for Character Data). FPE for character data can be described as $E_{K,T}(\cdot)$ on the domain $X = Chars^n$, where $Chars$ is the character set of a given character encoding. In the format space $\Omega = \{\omega_1, \omega_2\}$, $\omega_1$ is the format of the number of characters and $\omega_2$ is the format of the storage length of the character data. The domain $X$ is divided into two slices $X_{\omega_1}$ and $X_{\omega_2}$, defined by $\omega_1$ and $\omega_2$ respectively. FPE for character data has two cases:
① FPE on $X_{\omega_1}$ is named NPE (Number-preserving Encryption). NPE can meet the encryption needs of fixed-width encoded character data only.
② FPE on $X_{\omega_2}$ is named LPE (Length-preserving Encryption). LPE can meet the encryption needs of variable-width encoded character data.
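To make Definitions 2 and 3 concrete, the short sketch below uses Python's built-in `str.encode` to compare the number of characters with the storage length under a fixed-width and a variable-width encoding. The helper names `byte_sizes` and `is_fixed_width` and the sample strings are assumptions for this example. The codec `utf-16-be` is used as a stand-in for UCS-2, which gives the same two-byte codes for the Basic Multilingual Plane characters used here.

```python
def byte_sizes(chars, encoding):
    """Return the set of byte lengths that `encoding` assigns to the given characters."""
    return {len(ch.encode(encoding)) for ch in chars}

def is_fixed_width(chars, encoding):
    """Definition 2: fixed-width if every character has the same encoded size."""
    return len(byte_sizes(chars, encoding)) == 1

sample = "nkα中"  # ASCII, Greek, and Chinese characters

print(is_fixed_width(sample, "utf-16-be"))   # all sampled characters take 2 bytes
print(sorted(byte_sizes(sample, "utf-8")))   # 1, 2, and 3 bytes in UTF-8

plain = "nkChina"         # 7 characters, 7 bytes in UTF-8
cipher = "南开大学中国人"   # hypothetical NPE output: still 7 characters
print(len(plain), len(plain.encode("utf-8")))
print(len(cipher), len(cipher.encode("utf-8")))  # 7 characters but 21 bytes
```

The last two lines show why preserving the number of characters (format $\omega_1$) is not enough for a variable-width encoding: the character count is unchanged while the storage length triples.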
IV. FPE SCHEMES FOR CHARACTER DATA

A. NPE Scheme

An NPE scheme based on CtE is presented here to solve the problem mentioned above.

1) Algorithm

The algorithm of this scheme can be described as follows:

Algorithm setup:
Determine the character encoding and the domain $Chars^n$ that it defines;
Impose a total order on $Chars^n$ and randomly select an element of $Chars^n$ as a reference value $ref$;
Determine the coding algorithm used to map bijectively between the elements of $Chars^n$ and integers;
Determine the integer FPE mode and its parameters.

Algorithm encryption: The process for encrypting a plaintext string $x \in Chars^n$ is:
(1) Encode $x$ into the integer $Cx$ according to the reference value $ref$ by $Cx = encoding(x, Chars^n, ref)$;
(2) Apply a secure integer FPE on $Z_{|Chars^n|}$ to encrypt $Cx$ into the integer $Cy$ by $Cy = integerFPE(Cx, k, t)$;
(3) Decode $Cy$ into the string $y$ in $Chars^n$ by $y = decoding(Cy, Chars^n, ref)$.

The encoding and decoding algorithms are shown in Fig. 1 and Fig. 2.

input: $x$, $Chars^n$, $ref$
output: $Cx$
begin
    $Cx = (|Chars^n| + p_x - p_{ref}) \bmod |Chars^n|$;
    // $|Chars^n|$ denotes the number of elements in $Chars^n$
    // $p_x$ denotes the position of $x$ in $Chars^n$ under the total order
    return $Cx$;
end
Fig. 1. Encoding algorithm of the NPE scheme

input: $Cy$, $Chars^n$, $ref$
output: $y$
begin
    $Cy = (Cy + p_{ref}) \bmod |Chars^n|$;
    return $y$; // $y$ is the $Cy$-th string in $Chars^n$
end
Fig. 2. Decoding algorithm of the NPE scheme

Algorithm decryption: The decryption process is the reverse of the encryption process.

2) Security and Efficiency

The basic modules of symmetric cryptography are typically block ciphers and pseudorandom permutations, so an important security goal of symmetric cryptography is pseudo-randomness. FPE is a specific type of symmetric cryptography. In 2002, Black and Rogaway first described the classical security goal of FPE, PRP (Pseudorandom Permutation) [3], which requires that an adversary cannot distinguish an FPE cipher from a random permutation on the domain. The basic module of this scheme is integer FPE, so the security of the scheme depends on the integer FPE. If a secure integer FPE algorithm is applied, the scheme attains PRP security [8].

In terms of efficiency, the encoding and decoding algorithms of this scheme have a time complexity similar to that of FFX and are more efficient than the rank and unrank algorithms of RtE [8].

B. LPE Scheme

An LPE scheme for variable-width encoded character data is proposed. By splitting a variable-width character set into several subsets according to the storage lengths of the characters, LPE can be transformed into NPE, so the schemes mentioned previously, such as RtE, FFX, and CtE, can be used. To enhance security, a shuffle mechanism is adopted to hide the character sequence. Two typical shuffle algorithms that can be used here are the Knuth shuffle [9] and the Thorp shuffle [10].

1) Algorithm

The algorithm of the LPE scheme can be described as follows:

Algorithm setup:
According to storage length, split the variable-width encoded character set $Chars$ into several subsets: $\bigcup_{i=1}^{m} subchars_i = split(Chars)$, where $subchars_i$ is the subset composed of $i$-byte characters;
Determine the storage length $n$ of the domain $Chars^n$;
Determine the NPE scheme and its parameters;
Determine the shuffle algorithm.

Algorithm encryption: Taking the plaintext string $x$, the encryption algorithm $E_{k,t}()$ of NPE, and the shuffle algorithm $shuffle()$ as input, and the ciphertext string $y$ as output, where $x$ and $y$ are both strings in $Chars^n$, the encryption process is given in Fig. 3.

input: $x$, $E_{k,t}()$, $shuffle()$
output: $y$
begin
    $\bigcup_{i=1}^{m} x_i = split(x)$; $p[z] = position(x)$;
    $y_i = E_{k,t}(x_i)$; // $i \in \{1, 2, ..., m\}$
    $y' = merge(\bigcup_{i=1}^{m} y_i)$;
    $y = shuffle(y')$;
    return $y$;
end
Fig. 3. Encryption algorithm of the LPE scheme

It splits $x$ into sub-strings $x_i$, each made up of characters with the same storage length. The original position of each character in $x$ is recorded as $p[z]$ (where $z$ is the number of characters in $x$). Then it encrypts $x_i$ into $y_i$ by the selected NPE mode on $subchars_i^{z_i}$ (where $z_i$ is the number of characters in $x_i$ and $z = \sum_{i=1}^{m} z_i$). Finally, it merges the $y_i$ into $y'$ and shuffles $y'$ into the resulting ciphertext $y$.
Algorithm decryption: Taking the ciphertext $y$, the decryption algorithm $D_{k,t}()$ of NPE, $unshuffle()$, and $p[z]$ as input, and the plaintext $x$ as output, the decryption process is given in Fig. 4.

input: $y$, $D_{k,t}()$, $unshuffle()$, $p[z]$
output: $x$
begin
    $y' = unshuffle(y)$;
    $\bigcup_{i=1}^{m} y_i = split(y')$;
    $x_i = D_{k,t}(y_i)$; // $i \in \{1, 2, ..., m\}$
    $x = merge(\bigcup_{i=1}^{m} x_i, p[z])$;
    return $x$;
end
Fig. 4. Decryption algorithm of the LPE scheme

It unshuffles the ciphertext $y$ into $y'$, splits $y'$ into sub-strings $y_i$ of characters with the same storage length, decrypts each $y_i$ on $subchars_i^{z_i}$ into $x_i$, and finally merges the $x_i$ into the plaintext $x$ according to $p[z]$.

2) Security and Efficiency

Existing FPE modes are applied to the sub-strings in this scheme, and these FPE modes reach the PRP security level [2, 5, 8]. However, the scheme preserves the number of characters of the string and of each sub-string, which might reveal some information to attackers. Therefore, a secure shuffle algorithm is used to hide the sequence and position of the characters in the plaintext string and so enhance security.

The scheme needs $m$ NPE encryptions, where $m$ is the number of sub-character sets. Denoting the encryption time as $mT_e$ and the shuffle time as $T_{shuffle}$, the main running time of encryption is $mT_e + T_{shuffle}$. So this scheme has stable and high efficiency.

C. General FPE Scheme for Character Data

A general FPE scheme that works on both fixed-width and variable-width encoded character data is proposed as a consistent solution to the problem. A string can be encrypted by encrypting its binary code. One way is to construct a symmetric cipher with a suitable block size based on the Feistel network [11]. For fixed-width or variable-width character encoding standards such as GB18030, UCS-2, and UTF-8, the binary codes must lie in the valid range of the character set, so the encryption process must be combined with cycle-walking [3]. Because the performance of cycle-walking is uncertain, CBC mode [12] is used to encrypt long plaintexts in order to improve efficiency.

1) Algorithm

The algorithm of this general scheme is described as follows:

Algorithm setup:
Determine the character encoding, including the character set $Chars$, the binary code set $Bins$, and their bijective mapping functions $Bins = Encode(Chars)$ and $Chars = Decode(Bins)$;
Determine the storage length $n$ of the domain $Chars^n$, where $n$ is the number of bits;
Determine the Feistel network and its parameters, such as block size $l$, key $k$, tweak $t$, pseudorandom function $f_k()$, and round number $r$;
Determine the initialization vector $IV$ of CBC mode.

Algorithm encryption: Taking the plaintext string $x \in Chars^n$ and its storage length $n$, the type of Feistel network and its block size $l$, key $k$, tweak $t$, $IV$, pseudorandom function $f_k()$, and round number $r$ as input, and the ciphertext string $y \in Chars^n$ as output, the encryption process is given in Fig. 5.

input: $x$, $n$, $k$, $t$, $f_k()$, $r$, $IV$, $l$
output: $y$
begin
    $bx = Encode(x)$;
    if ($n \le l$)
        do
            $by = Feistel_{n,k,t,f,r}(bx)$; $bx = by$;
        until $by \in Bins^n$
    else
        $\bigcup_{i=1}^{\lfloor n/l \rfloor + 1} bx_i = split(bx)$;
        $by_0 = IV$;
        for $i = 1$ to $\lfloor n/l \rfloor + 1$ do
            do
                $by_i = Feistel_{l,k,t,f,r}(by_{i-1} \oplus bx_i)$; $bx_i = by_i$;
            until $by_i \in Bins^n$
        endfor
        $by = merge(\bigcup_{i=1}^{\lfloor n/l \rfloor + 1} by_i)$;
    endif
    $y = Decode(by)$;
    return $y$;
end
Fig. 5. Encryption algorithm of the general scheme

First, it maps the string $x \in Chars^n$ into the binary code $bx \in Bins^n$, where $Bins^n$ is the set of binary strings composed of binary codes within $Bins$ and $n$ is the number of bits. If $n$ is not bigger than $l$, it encrypts $bx$ with the Feistel network combined with cycle-walking until $by$ is a legal binary string in $Bins^n$. Then it maps $by$ into $y \in Chars^n$ by $Decode()$, and $y$ is the final ciphertext string. If $n$ is bigger than $l$, it splits $bx$ into several blocks $bx_i$, executes the Feistel network combined with cycle-walking in CBC mode, and obtains the ciphertext $by_i$ of each block. It merges them into $by$ and maps $by$ into $y \in Chars^n$ by $Decode()$, and $y$ is the final ciphertext string.

Algorithm decryption: The decryption process is the reverse of the encryption process.

2) Security and Efficiency

The basic encryption module of the scheme is the Feistel network, a classical symmetric structure used in the construction of block ciphers. It has been proved secure when a pseudorandom function and enough rounds are used:
Luby and Rackoff [11] proved that when $m \ll 2^{n/2}$, the scheme is secure against chosen-ciphertext attacks (CCA) (here $m$ is the number of plaintext/ciphertext pairs and $n$ is the Feistel block size);
Patarin [13] further showed that if the Feistel network is run with a sufficient number of rounds, the number of ciphertext/plaintext pairs needed by an attacker approaches the theoretical maximum, which is the square root of the size of the entire plaintext;
Hoang and Rogaway [14] proved beyond-birthday-bound security for the generalized Feistel network. They showed CCA-security of the generalized Feistel network against $2^{n(1-\varepsilon)}$ queries for any $\varepsilon > 0$ (here $n$ is the Feistel block size).
Furthermore, neither cycle-walking nor CBC mode degrades the security of the scheme [3, 15].

Because the binary codes of characters must lie in the valid range of the character set defined by the character encoding, the scheme uses cycle-walking to ensure that the ciphertext falls within the legal range. However, the number of encryptions that cycle-walking needs is uncertain, and a large plaintext would require so many repeated encryptions that efficiency would suffer. The scheme therefore adopts CBC mode to encrypt large plaintexts in order to reduce the number of iterations and improve efficiency.

The scheme is tested in an experiment that uses a balanced Feistel network whose block size depends on the length of the plaintext. The pseudorandom function is constructed by truncating the output of AES. The round number of the Feistel network is set to six to ensure adequate security.
The experiment randomly selects fixed-width UCS-2 strings and variable-width UTF-8 strings as plaintext, encrypts them many times, and takes the average encryption time as the measure of algorithm efficiency.

The results of the experiment for UCS-2 strings are shown in Table I.

TABLE I. Encryption time for UCS-2 strings of the general scheme

Length of   Encryption   Length of   Encryption   Length of   Encryption
plaintext   time         plaintext   time         plaintext   time
(bit)       (ms)         (bit)       (ms)         (bit)       (ms)
8           0.12         88          0.31         168         0.45
16          0.13         96          0.32         176         0.52
24          0.15         104         0.35         184         0.47
32          0.15         112         0.29         192         0.49
40          0.27         120         0.34         200         0.52
48          0.26         128         0.36         208         0.59
56          0.24         136         0.46         216         0.56
64          0.21         144         0.43         224         0.59
72          0.23         152         0.42         232         0.58
80          0.31         160         0.47         240         0.57
The results of experiment for UTF-8 strings are shown in Table II.

TABLE II. Encryption time for UTF-8 strings of the general scheme

Length of   Encryption   Length of   Encryption   Length of   Encryption
plaintext   time         plaintext   time         plaintext   time
(bit)       (ms)         (bit)       (ms)         (bit)       (ms)
8           0.16         88          0.34         168         0.63
16          0.15         96          0.34         176         0.63
24          0.13         104         0.47         184         0.67
32          0.16         112         0.37         192         0.69
40          0.31         120         0.36         200         0.74
48          0.47         128         0.63         208         0.73
56          0.31         136         0.42         216         0.74
64          0.47         144         0.47         224         0.69
72          0.31         152         0.43         232         0.74
80          0.33         160         0.52         240         0.69
The experimental data in Table I and Table II indicate the following:
For UCS-2 strings, the encryption time rises from 0.12 ms to 0.59 ms as the plaintext length grows from 8 to 240 bits, although not monotonically;
For UTF-8 strings, the encryption time rises from 0.13 ms to 0.74 ms over the same range of plaintext lengths;
As the plaintext length increases, the efficiency decreases slowly but stays within an acceptable range.

According to the above analysis, the authors conclude that this general scheme is suitable for both fixed-width and variable-width encoded character data and has relatively high and stable efficiency.

V. CONCLUSION
FPE encrypts plaintext into ciphertext with the same format as the original text. This feature gives it advantages over traditional block ciphers in applications such as credit card encryption. Previous research studied FPE only for fixed-width encoded character data. In this paper, FPE for character data is categorized into NPE and LPE. An NPE scheme based on CtE is described, and an LPE scheme for variable-width encoded character data is proposed. The LPE scheme splits the plaintext space into several sub-character sets and executes FPE for fixed-width encoded character data on each subset. Furthermore, a general FPE scheme that can encrypt both fixed-width and variable-width encoded character data is proposed. In this scheme, the binary code of a string is encrypted by a block cipher constructed from a Feistel network. Cycle-walking and CBC mode are adopted to ensure that the ciphertext is in the valid range. In addition, the security and efficiency of these schemes are analyzed and verified.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 60973141), the National Science Foundation of Tianjin, China (No. 09JCYBJ00300), the Fundamental Research Funds for the Central Universities, and the Specialized Research Fund for the Doctoral of Higher Education of China (No. 20100031110030).

REFERENCES

[1] National Bureau of Standards. FIPS PUB 74. Guidelines for Implementing and Using the NBS Data Encryption Standard[S], 1981.
[2] Bellare M, Ristenpart T, Rogaway P, et al. Format-preserving encryption[C]. Selected Areas in Cryptography (SAC 2009). Berlin: Springer, 2009.
[3] Black J, Rogaway P. Ciphers with Arbitrary Finite Domains[C]. Topics in Cryptology - CT-RSA '02, Springer, 2002: 114-130.
[4] Spies T. Feistel Finite Set Encryption Mode[EB/OL]. http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffsem/ffsem-spec.pdf, 2008.
[5] Bellare M, Rogaway P, Spies T. The FFX mode of operation for format-preserving encryption[EB/OL]. http://www.csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffx/ffx-spec.pdf, 2011.
[6] Network Working Group. Request for Comments: 3629. UTF-8, a transformation format of ISO 10646[EB/OL]. http://www.ietf.org/rfc/rfc3629.txt, 2003.
[7] Liskov M, Rivest R, Wagner D. Tweakable block ciphers[C]. CRYPTO 2002. Springer, 2002: 31-46.
[8] Liu Zhe-li, Jia Chun-fu, Li Jing-wei. Research on the Format-preserving Encryption Modes[J]. Journal on Communications, 2011, 32(6): 184-190 (in Chinese).
[9] Ulf Mattsson. Format Controlling Encryption Using Datatype Preserving Encryption[EB/OL]. http://eprint.iacr.org/2009/257, 2009.
[10] Morris B, Rogaway P, Stegers T. How to Encipher Messages on a Small Domain[C]. Advances in Cryptology - CRYPTO '09, 2009.
[11] Luby M, Rackoff C. How to Construct Pseudorandom Permutations and Pseudorandom Functions[J]. SIAM Journal on Computing, 17(2), 1988: 373-386.
[12] William F. Ehrsam, Carl H. W. Meyer, John L. Smith, Walter L. Tuchman. Message verification and transmission error detection by block chaining[P], US Patent 4074066, 1976.
[13] Patarin J. Security of random Feistel schemes with 5 or more rounds[C]. Cryptology - CRYPTO '04. Berlin: Springer, 2004: 135-158.
[14] Hoang T, Rogaway P. On generalized Feistel networks[C]. LNCS 6223: 30th Annual International Cryptology Conference. Berlin: Springer, 2010: 613-630.
[15] Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone. Handbook of Applied Cryptography[M]. CRC Press. ISBN 0-8493-8523-7, 1996.
Min Li received her B.E. degree in 1998 and her M.E. degree in 2005 in Control Theory and Control Engineering from Nankai University, China. Currently, she is a Lecturer and a Ph.D. candidate at Nankai University, China. Her current research interest is applied cryptography.

Zheli Liu received his M.E. degree in System Architecture in 2005 and his Ph.D. in Computer Application in 2009 from Jilin University, China. He was a postdoctoral fellow at Nankai University, China from 2009 to 2011. He is currently a Lecturer at Nankai University, China. His current research interests include applied cryptography and card operation systems.

Jingwei Li received his B.S. degree in Mathematics in 2005 from Hebei University of Technology, China. He is now a Ph.D. candidate in Computer Applications Technology at Nankai University, China. His current research interests include applied cryptography and network security.

Chunfu Jia received his Ph.D. in Control Theory and Control Engineering from Nankai University in 1996. He was a postdoctoral fellow at the University of Science and Technology of China. He is now a professor at Nankai University, China. His current research interests include computer system security, network security, trusted computing, and malicious code analysis.