EFFICIENT LINEAR FILTERING OF ENCRYPTED SIGNALS VIA COMPOSITE REPRESENTATION
Tiziano Bianchi, Alessandro Piva∗
Dept. Elettronica e Telecomunicazioni, Università di Firenze, Via S. Marta 3, I-50139, Firenze, Italy
Mauro Barni
Dept. Ingegneria dell'Informazione, Università di Siena, Via Roma 56, 53100, Siena, Italy
ABSTRACT
Signal processing tools working on encrypted data provide an efficient solution when sensitive signals must be protected from an untrusted processing device. In this paper, we investigate an issue usually neglected in the proposed solutions for secure signal processing, namely the huge size expansion from the plaintext to the samplewise encrypted representation of signals, due to the use of cryptosystems operating on very large algebraic structures. We propose a composite signal representation that speeds up linear filtering on encrypted signals via parallel processing and reduces the size of encrypted signals. A case study is proposed and discussed.

Index Terms: Secure Signal Processing, Signal Processing in the Encrypted Domain, Signal Representation, Homomorphic Encryption, Parallel Processing.

∗ The work described in this paper has been partially supported by the European Commission through the IST Programme under Contract no 034238 - SPEED and by the Italian Research Project (PRIN 2007): "Privacy aware processing of encrypted signals for treating sensitive information". The information in this document reflects only the author's views, is provided as is and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.

1. INTRODUCTION
The possibility of processing encrypted signals directly in the encrypted domain (signal processing in the encrypted domain, or s.p.e.d.) is receiving increasing attention from the cryptographic and signal processing communities as a viable solution to satisfy the security requirements of applications wherein valuable or sensitive signals have to be processed by a non-trusted party [1]. Several applications would benefit from the availability of s.p.e.d. tools [2]: access to a database containing encrypted data or signals, database access by means of encrypted queries, remote processing of private data, transcoding of encrypted contents.

Processing of encrypted signals is indeed feasible by relying on probabilistic homomorphic encryption [3] and secure multiparty computation (MPC) [4]. In this paper we focus on techniques based on homomorphic encryption, since they constitute the basis for any practical implementation of s.p.e.d. theory. A cryptosystem is said to be homomorphic with respect to an operation $\star$ if another operation $\circ$ exists such that, given two plaintexts $m_1$ and $m_2$, we have:
$$D[E[m_1] \circ E[m_2]] = m_1 \star m_2, \tag{1}$$
where $D$ and $E$ indicate, respectively, the decryption and encryption operators. In other words, the application of the $\star$ operation in the plain domain corresponds to the application of the $\circ$ operation in the encrypted domain. Additively homomorphic cryptosystems, for which $\star = +$ and $\circ = \cdot$, play a central role in s.p.e.d. theory. For such systems we have:
$$D[E[m_1] \cdot E[m_2]] = m_1 + m_2, \qquad D[E[m_1]^c] = c \cdot m_1, \tag{2}$$
where $c$ is a constant factor, hence allowing the application of many basic signal processing tools directly in the encrypted domain [5, 6].

A problem with the use of homomorphic encryption is that signals need to be encrypted samplewise [5]. Samplewise encryption poses severe complexity problems since it introduces a huge size expansion between the original signal sample and the encrypted one. Let us assume that the Paillier cryptosystem is used [7]; in this case each encrypted sample is an element of $\mathbb{Z}_{N^2}$, i.e. the set of integers modulo $N^2$, with $N$ being at least 1024 bits long, so that each encrypted sample needs at least 2048 bits to be represented. Considering that plain signal samples are usually represented by a few bits (e.g. 8 bits for images or 20 bits for ECG signals), we conclude that, due to encryption, signals are expanded by a factor ranging from 100 to 250. Such a huge size expansion is clearly not affordable in many application scenarios.

In [8], a composite representation of signals has been proposed that greatly reduces the size expansion due to encryption, while still allowing the exploitation of the homomorphic properties of the underlying cryptosystem to process signals in the encrypted domain. The main idea behind such a representation is to pack multiple data samples to form a composite encrypted message. As a simple example, let us consider $l$-bit signal samples. We can bundle $R$ $l$-bit messages $m_1 \ldots m_R$ within a single composite message $x$ as follows:
$$x = m_1 \cdot 2^0 + m_2 \cdot 2^L + \ldots + m_R \cdot 2^{L(R-1)}. \tag{3}$$
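The bundling of equation (3) can be illustrated with a minimal plaintext-only sketch. The function names `bundle` and `unbundle`, the slot width of 10 bits, and the sample values are assumptions made for illustration and do not come from [8]; the 1024-bit modulus size is the one quoted in the text.

```python
def bundle(samples, slot_bits):
    """Composite message of eq. (3): x = sum_i m_i * 2**(slot_bits * i)."""
    x = 0
    for i, m in enumerate(samples):
        x += m << (slot_bits * i)
    return x


def unbundle(x, slot_bits, count):
    """Read back `count` slots of `slot_bits` bits each."""
    mask = (1 << slot_bits) - 1
    return [(x >> (slot_bits * i)) & mask for i in range(count)]


sample_bits = 8                 # l: bits per plain sample (e.g. an image pixel)
slot_bits = 10                  # L >= l, leaving 2 guard bits per slot (assumed)
modulus_bits = 1024             # size of the Paillier modulus N quoted in the text
cipher_bits = 2 * modulus_bits  # a ciphertext lives modulo N^2

# Number of slots that fit below any N satisfying N >= 2**(modulus_bits - 1).
R = (modulus_bits - 1) // slot_bits

samples = [(37 * i + 11) % (1 << sample_bits) for i in range(R)]
x = bundle(samples, slot_bits)
assert x < 1 << (modulus_bits - 1)          # x is a valid plaintext modulo N
assert unbundle(x, slot_bits, R) == samples

# Adding two composite messages adds slot by slot, as long as no slot overflows.
a, b = [200, 255, 0], [55, 255, 7]
total = bundle(a, slot_bits) + bundle(b, slot_bits)
assert unbundle(total, slot_bits, 3) == [255, 510, 7]

print("samplewise expansion:", cipher_bits / sample_bits)
print("composite expansion: ", cipher_bits / (R * sample_bits))
```

The guard bits in each slot are what let a few additions happen without one slot spilling into the next; how many are needed depends on the processing applied, which this sketch does not model.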
If $L$ is larger than or equal to $l$, samples will remain distinct in the composite representation.

In this paper, such a representation is further investigated and used to derive efficient solutions for the implementation of linear filtering. Though some simple implementations were proposed in [8], the problem of linear filtering was not thoroughly addressed. In such a case, alternative implementations can be considered in order to exploit the parallelism provided by the composite signal representation. Also, the composite representation suggested in equation (3) presents a number of problems that need to be tackled, such as the security aspects and the possibility of processing encrypted composite signals by relying on homomorphic
encryption, which is not a trivial task if we want to allow computations on signed values. In the sequel, all the aforementioned issues will be discussed, with particular emphasis on the different possible implementations of linear filtering. Namely, both a convolution approach and an overlap-add/save implementation are proposed and compared, showing the great potential offered by the new composite representation of signals.

2. COMPOSITE REPRESENTATION OF SIGNALS
Let us consider an integer-valued signal $a(n) \in \mathbb{Z}$ satisfying $|a(n)| \le Q$, where $Q$ is a positive integer. Given a pair of positive integers $B$, $R$, we define the packed representation of $a(n)$ of order $R$ and base $B$ as
$$a_P(k) = \sum_{i=0}^{R-1} a_i(k) B^i \tag{4}$$
where $a_i(k)$, $i = 0, 1, \ldots, R-1$, indicate $R$ disjoint subsequences of the signal $a(n)$. Here, $a_P(k)$ represents a word containing $R$ signal samples, chosen according to a partitioning of the original signal samples $a(n)$ into $R$ sets of $M$ samples each. In the following, we will consider two partitions: 1) $a_i(k) = a(iM + k)$; 2) $a_i(k) = a(kR + i)$. The first case corresponds to considering the $M$th-order polyphase components of the signal $a(n)$, i.e., dividing the signal into $M$ periodically interleaved subsequences, and will be referred to as $M$-polyphase packed representation ($M$-PPR). The second case corresponds to partitioning the signal $a(n)$ into adjacent blocks of size $R$ and will be referred to as block packed representation (BPR). A graphical interpretation of $M$-PPR and BPR is provided in Fig. 1.

Fig. 1. Graphical representation of a packed representation having order $R$: (a) $M$-polyphase packed representation; (b) block packed representation. Identically shaded boxes indicate values belonging to the same packed word.

In order to use the packed representation for the parallel processing of an encrypted signal, we must first establish some properties. These are given by the following theorem:

Theorem 1 Let us assume
$$B > 2Q \tag{5}$$
$$B^R \le N \tag{6}$$
where $N$ is a positive integer. Then, the following holds:
$$0 \le a_P(k) + \omega_Q < N \tag{7}$$
where $\omega_Q = Q \sum_{i=0}^{R-1} B^i = Q \frac{B^R - 1}{B - 1}$. Moreover, the original samples can be obtained from the packed representation as
$$a_i(k) = \left\{ [a_P(k) + \omega_Q] \div B^i \right\} \bmod B - Q. \tag{8}$$

Proof: let us express
$$a_P(k) + \omega_Q = \sum_{j=0}^{R-1} [a_j(k) + Q] B^j. \tag{9}$$
Thanks to the properties of $a(n)$ and (5), we have $0 \le a_j(k) + Q \le 2Q \le B - 1$. Hence, $a_P(k) + \omega_Q$ can be considered as a non-negative base-$B$ integer whose digits are given by $a_j(k) + Q$. Moreover, since $a_P(k) + \omega_Q$ has $R$ digits, it is bounded by
$$a_P(k) + \omega_Q \le \sum_{j=0}^{R-1} (B - 1) B^j = B^R - 1 < N \tag{10}$$
where the last inequality comes from (6). As to the second part of the theorem, let us express
$$a_P(k) + \omega_Q = B^i \sum_{j=i}^{R-1} [a_j(k) + Q] B^{j-i} + \sum_{j=0}^{i-1} [a_j(k) + Q] B^j. \tag{11}$$
Thanks to the properties of $a_j(k) + Q$, we have $\sum_{j=0}^{i-1} [a_j(k) + Q] B^j \le B^i - 1$. Hence
$$[a_P(k) + \omega_Q] \div B^i = \sum_{j=i}^{R-1} [a_j(k) + Q] B^{j-i} = B \sum_{j=i+1}^{R-1} [a_j(k) + Q] B^{j-i-1} + a_i(k) + Q$$
from which (8) is demonstrated.

2.1. Packing and unpacking operations
Let us now analyze the possibility of performing packing and unpacking operations when the data have been encrypted by the data owner $P_1$. The first part of Theorem 1 demonstrates that the packed representation can be safely encrypted using a homomorphic cryptosystem defined on modulo-$N$ arithmetic: as long as the hypotheses of the theorem hold, the packed data $a_P(k)$ takes no more than $N$ distinct values, so its value can be represented modulo $N$ without loss of information. As to the security of composite signal encryption, since we work with a semantically secure cryptosystem [3] (e.g., Paillier), security is automatically achieved.

If the original signal samples $a(n)$ have been encrypted samplewise by $P_1$ using an additively homomorphic cryptosystem, the encryption of the packed representation can be computed directly in the encrypted domain by another party $P_2$, by applying (4) and exploiting the homomorphic properties of the cryptosystem. The unpacking operation, instead, cannot be carried out in the encrypted domain by means of homomorphic computations by $P_2$, since the conversion from packed to samplewise representation requires rounding and division. Unpacking therefore has to be carried out by the data owner $P_1$, or performed by means of a properly designed interactive protocol involving $P_1$ and $P_2$.
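The packing and unpacking steps described above can be sketched with a toy Paillier instance. Everything specific here is an assumption made for illustration: the two small Mersenne primes (far too short to be secure), the generator choice $g = N + 1$, the helper names `encrypt`, `decrypt`, `pack_encrypted` and `unpack`, the parameters $Q = 100$, $B = 256$, $R = 8$, and the use of the BPR partition. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

# Toy Paillier key built from two known Mersenne primes (NOT secure, demo only).
p, q = 2**31 - 1, 2**61 - 1
n = p * q                 # plays the role of N in Theorem 1
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
mu = pow(lam, -1, n)      # valid because the generator is g = n + 1


def encrypt(m):
    """E[m] = (1 + n)^m * r^n mod n^2, with signed m reduced modulo n."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(c):
    """D[c] = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    return (pow(c, lam, n2) - 1) // n * mu % n


Q, B, R = 100, 256, 8
assert B > 2 * Q and B**R <= n           # hypotheses (5) and (6)
omega_Q = Q * (B**R - 1) // (B - 1)


def pack_encrypted(enc_samples):
    """P2's side: E[a_P] = prod_i E[a_i]^(B^i) mod n^2, i.e. eq. (4) via eq. (2)."""
    word = 1
    for i, c in enumerate(enc_samples):
        word = word * pow(c, B**i, n2) % n2
    return word


def unpack(dec_word):
    """P1's side: eq. (8) applied to a decrypted packed word."""
    shifted = (dec_word + omega_Q) % n   # lies in [0, B^R - 1] by Theorem 1
    return [(shifted // B**i) % B - Q for i in range(R)]


# Block packed representation: a_i(k) = a(kR + i).
signal = [random.randint(-Q, Q) for _ in range(2 * R)]
enc = [encrypt(a) for a in signal]                       # samplewise, by P1
words = [pack_encrypted(enc[k * R:(k + 1) * R]) for k in range(2)]  # by P2
recovered = [a for w in words for a in unpack(decrypt(w))]          # by P1
assert recovered == signal
```

Note that `unpack` runs on plaintext after decryption: the integer division and modulo of equation (8) are exactly the operations that the homomorphic property does not provide, which is why the sketch leaves them to the key holder.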
3. PROCESSING ENCRYPTED COMPOSITE SIGNALS: LINEAR FILTERING
3.1. Convolution
The output of a linear filter having impulse response $h_n$ when the input is the sequence $a(n)$ is given by the convolution of the two sequences, defined as
$$y(n) = \sum_{r=-\infty}^{\infty} h_r a(n - r).$$
$$c(n) = \sum_{r=0}^{L-1} h_r a(n - r), \qquad n = 0, 1, \ldots, P + L - 2$$
$$c_P(k) = \sum_{r=0}^{L-1} h_r \tilde{a}_P(k - r), \qquad k = 0, 1, \ldots, M + L - 2 \tag{14}$$

Fig. 2. Graphical representation of $\tilde{a}_P(k)$ as in (15): for $0 \le k < M$, it is just a shift of one row of $a_P(k)$, whereas for $M \le k < M + L - 1$ it is a copy of $a_P(k - M)$. Shaded boxes then indicate elements having the same value.

whereas for $M \le k < M + L - 1$ we can write
=
0≤k