Multi-file Private Information Retrieval from MDS Coded ... - arXiv

6 downloads 270072 Views 222KB Size Report
May 9, 2017 - which considers retrieving several files simultaneously, known as the ... number of files. ...... For example, the atoms ai or bj serving as side.
1

Multi-file Private Information Retrieval from MDS Coded Databases with Colluding Servers Yiwei Zhang and Gennian Ge

arXiv:1705.03186v1 [cs.IT] 9 May 2017

Abstract Private information retrieval (PIR) gets renewed attentions in recent years due to its information-theoretic reformulation and its application in distributed storage system (DSS). We study a variant of the information-theoretic and DSS-oriented PIR problem which considers retrieving several files simultaneously, known as the multi-file PIR. In this paper, we give an upper bound of the multi-file PIR capacity (the maximal number of bits privately retrieved per one bit of downloaded bit) for the degenerating cases. We then present two general multi-file PIR schemes according to different ratios of the number of desired files to the total number of files. In particular, if the ratio is at least half, then the rate of our scheme meets the upper bound of the capacity for the degenerating cases and is thus optimal. Index Terms Private information retrieval, multi-file PIR, distributed storage system, PIR capacity

I. I NTRODUCTION The concept of private information retrieval is first introduced by Chor et al. [7], [8]. In a system consisting of several servers each storing the same n-bit database, a classic PIR scheme will allow a user to retrieve a single bit of the database without revealing the identity of the desired bit to any single server. PIR has received a lot of attention from the computer science community and cryptography community. The main research objective is to minimize the total communication cost, including both the upload cost (the queries sent to the servers) and the download cost (the feedbacks from the servers). A series of works have managed to reduce the communication cost, see for example, [4], [10], [11], [27], [28] and the references therein. Some variants of the PIR models gradually appear. Particularly, it is natural to consider retrieving several bits simultaneously [9], [13], [14], [25]. Recently, the PIR problem is reformulated from an information-theoretic point of view. Instead of retrieving a single bit from an n-bit database, the information-theoretic PIR problem considers retrieving a single file from an n-file database, where the length of each file could be arbitrarily large. In this sense, the objective turns into minimizing the total communication cost per unit of retrieved data. It is further noticed in [6] that in this information-theoretic model the upload cost could be neglected with respect to the download cost. Meanwhile, with the development of distributed storage system (DSS), it is natural to consider designing PIR schemes for coded databases instead of the replicated-based databases in the classic model [1], [5], [6], [16]. The general model for the information-theoretic, DSS-oriented PIR problem is as follows. Assume we have a coded database containing N servers storing M files. Each file is stored independently via the same arbitrary (N, K)-MDS code. A user wants to retrieve a specific file from the database. Any T out of the N servers may collude in attacking on the privacy of the user. We call the PIR scheme for such a system as an (N, K, T ; M )-scheme. PIR rate is defined as the number of bits of messages that could be privately retrieved per one bit of downloaded message. The supremum of all achievable rates is called the PIR capacity, denoted by C = C(N, K, T ; M ). Starting from the pioneering work [19] by Sun and Jafar, a series of papers [2], [12], [18], [20], [22], [23], [29] have attempted to analyze the fundamental limits on the PIR capacity. The exact value of the PIR capacity for the degenerating cases, i.e., either K = 1 or T = 1, has been completely solved. For MDS coded databases with colluding servers (K ≥ 2 and T ≥ 2), determining the capacity still remains an open problem. Similarly to the development of the classic PIR model, we could consider some other variants. For example, the symmetric PIR problem is considered in [21] and [26], where a further constraint is that the user should know nothing about any undesired file. [17] discusses multi-round PIR schemes, where the queries and feedbacks are made in several rounds and the user may adjust his queries according to the feedbacks from previous rounds. [24] considers arbitrary collusion patterns instead of the original “T out of the N servers may collude” model. [15] considers the model replacing the underlying MDS storage code by some non-MDS code. In this paper, we consider the multi-file PIR model where a user wants to retrieve several files simultaneously. Multi-file PIR is first considered in a recent paper by Banawan and Ulukus [3] for the case K = T = 1. They show that in order to retrieve P ≥ 2 files, one can do better than the trivial approach of executing P independent (single-file) The research of G. Ge was supported by the National Natural Science Foundation of China under Grant Nos. 11431003 and 61571310, Beijing Hundreds of Leading Talents Training Project of Science and Technology, and Beijing Municipal Natural Science Foundation. Y. Zhang is with the School of Mathematical Sciences, Capital Normal University, Beijing 100048, China (email: [email protected]). G. Ge is with the School of Mathematical Sciences, Capital Normal University, Beijing 100048, China. He is also with Beijing Center for Mathematics and Information Interdisciplinary Sciences, Beijing 100048, China (e-mail: [email protected]).

2

PIR schemes. In this paper, we will consider the multi-file PIR problem for general parameters K and T , i.e., we consider the multi-file PIR from MDS coded databases with colluding servers. Our main contributions are summarized as follows. First, we present an upper bound of the multi-file PIR capacity for the degenerating cases, i.e., one of the parameters K and T equals one. We then present two multi-file PIR schemes which work for general K and T , according to different ratios of the number of desired files to the total number of files. This is a natural generalization combining the ideas in [3] and in our recent paper [29] dealing with (single-file) PIR schemes. Particularly, P is at least half, the rate of our scheme meets the upper bound of the capacity for the degenerating cases and when the ratio M P is thus optimal. For the complementary case M < 12 , our scheme fulfills the potential of the scheme posed in [3]. The rest of the paper is organized as follows. In Section II, we introduce the general model of the multi-file PIR problem. In Section III we analyze the upper bound of the multi-file PIR capacity via information inequalities for the degenerating cases. In Section IV we present a general multi-file PIR scheme when P ≥ M 2 and show its optimality for the degenerating cases. In Section V we discuss how to construct a general multi-file PIR scheme when P < M 2 . Section VI concludes the paper. II. P ROBLEM S TATEMENT In this section we introduce the general model of the multi-file PIR problem. Basically the statement follows the same way as in [3] or as the single-file PIR model shown in [2], [18]–[20], [29]. The general model considers a distributed storage system consisting of N servers. The system stores M files, denoted as W [1] , W [2] , . . . , W [M] ∈ FqL×K , i.e., each file is of length LK and represented in a matrix of size L × K. The files are independent and identically distributed with H(W [i] ) = LK, H(W

[1]

,W

[2]

,...,W

[M]

i ∈ {1, 2, . . . , M },

(1)

) = M LK.

(2)

[i]

Denote the jth row of the file W [i] as wj ∈ FK q , 1 ≤ j ≤ L. Each file is stored in the system via the same given (N, K)-MDS code. The generator matrix G ∈ FK×N of the MDS code is denoted as q i h , (3) G = g1 g2 · · · gN K×N

[i]

and the MDS property means that any K columns of G are linearly independent. For each wj , the nth server stores the [i] stored on the nth server are the concatenated projections of all the files coded bit wj gn . Thus, the whole contents yn ∈ FML q {W [1] , W [2] , . . . , W [M] } on the encoding vector gn , i.e.,   W [1]   .. yn =  (4)  gn . W [M] iT h [M] [M] [2] [2] [1] [1] = w1 g n · · · wL g n w1 g n · · · wL g n · · · w1 g n · · · wL g n .

(5)

Now assume that a user wants to retrieve an arbitrary set of P files and the set of indices of these files is P = {i1 , i2 , . . . , iP } ⊂ {1, 2, . . . , M }. Denote W [P] = {W [i] , i ∈ P}. The retrieval is done by sending some queries to the servers and then getting the feedbacks. Let F denote a random variable generated by the user and unknown to any server. F represents the randomness of the user’s strategy to generate the queries. Let G denote a random variable generated by the servers and also known to the user. G represents the randomness of the server’s strategy to produce the feedbacks1. Both F and G are generated independently of the files and the desired indices P, i.e., H(F , G, P, W [1] , W [2] , . . . , W [M] ) = H(F ) + H(G) + H(P) + H(W [1] ) + · · · + H(W [M] ).

(6)

[P] Qn

Using his strategy F , the user generates a set of queries to the nth server, 1 ≤ n ≤ N . The queries are independent of the files, i.e., [P] [P] [P] I(Q1 , Q2 , . . . , QN ; W [1] , W [2] , . . . , W [M] ) = 0. (7) [P]

[P]

Upon receiving the query, the n-th server responds a feedback An , which is a deterministic function of the query Qn , [P] the strategy G and the data yn (and therefore a deterministic function of the query Qn , the strategy G and the files {W [1] , W [2] , . . . , W [M] }), i.e., [P] [P] [P] [1] [2] [M] H(A[P] ) = 0. n |Qn , G, yn ) = H(An |Qn , G, W , W , . . . , W

(8)

1 The role of G is first proposed by Sun and Jafar in [18]. Such a strategy allows the server to perform some coding procedures before responding and thus could reduce the download cost. For more details please refer to [18]. In almost all the other works on PIR schemes, including the current draft, we follow a question-and-answer format. That is, for any query vector the server responds the corresponding projection of his contents onto the query. For this kind of schemes we may just neglect G.

3

The user retrieves his desired files based on all the queries and feedbacks, plus the knowledge of the strategies F and G, i.e., [P] [P] [P] [P] [P] [P] (9) H(W [P] |Q1 , Q2 , . . . , QN , A1 , A2 , . . . , AN , F , G) = 0. [P]

[P]

[P]

For any subset T of the servers, |T | = T , let QT represent {Qn , n ∈ T }. Similarly we have the notation AT . The privacy constraint requires that this set of servers learns nothing about the identity of the retrieved files, i.e., [P]

[P]

I(P; QT ) = I(P; AT ) = 0, ∀T ⊆ {1, 2, . . . , N }, |T | = T.

(10)

The rate for the multi-file PIR scheme is defined as the ratio of the size of all the retrieved files to the total download cost, i.e., P [i] i∈P H(W ) , (11) PN [P] ) H(A n n=1

and the multi-file PIR capacity C P is the supremum of all achievable rates. III. U PPER BOUND

OF THE MULTI - FILE

PIR CAPACITY

FOR THE DEGENERATING CASES

In this section we shall analyze the upper bound of the multi-file PIR capacity based on information inequalities. The case K = T = 1 has been presented in [3]. We shall only consider the degenerating cases K = 1 or T = 1 here. As for the non-degenerating case (i.e., K ≥ 2 and T ≥ 2), analyzing the upper bound of the capacity is a rather difficult problem, even for the single-file PIR model. Given the parameters P, N, K, T , the minimal download cost is then a function of the number of files M , denoted as ΩM . For simplicity we need the following notations as in [3].  Q , Q[P] (12) n : P ⊂ {1, 2, . . . , M }, |P| = P, n ∈ {1, 2, . . . , N } .  [P] [P] [P] for n1 ≤ n2 , n1 , n2 ∈ {1, 2, . . . , M }. (13) A[P] n1 :n2 , An1 , An1 +1 , . . . , An2 The scheme could be assumed to be symmetric, or else we could build a symmetric scheme via a non-symmetric scheme by replicating all permutations of servers and files. Therefore we may assume that for any two sets of indices of P desired files P1 , P2 and for any two servers, say the nth and the mth server, we have [P2 ] H(An[P1 ] |Q) = H(Am |Q).

(14)

The equality above also holds when one considers conditional entropy with prior knowledge of any subset of files, i.e., [P2 ] |W [S] , Q) for any S ⊂ {1, 2, . . . , M }. H(An[P1 ] |W [S] , Q) = H(Am

(15)

The key to the analysis of the multi-file PIR capacity is the following inductive relation. P L = H(W [P] ) = H(W = = = ≤ Note that

P

[P]

(16)

|Q)

(17)

[P] [P] I(W ; A1:N |Q) + H(W [P] |A1:N , Q) [P] I(W [P] ; A1:N |Q) [P] [P] H(A1:N |Q) − H(A1:N |W [P] , Q) X [P] [P] H(A[P] , Q). n |Q) − H(A1:N |W 1≤n≤N [P]

(18) (19) (20) (21)

[P]

1≤n≤N

H(An |Q) is exactly the total download cost. So the inductive relation above can be rewritten as [P]

ΩM ≥ P L + H(A1:N |W [P] , Q).

(22)

The remaining trick is how to bound the second part in the inequality above. We are now ready to analyze the capacity for two degenerating cases, i.e., either T = 1 or K = 1.

4

A. T = 1 From the symmetry argument and the fact that any K servers possess independent messages, we have [P]

[P]

H(A1:N |W [P] , Q) ≥ KH(A1 |W [P] , Q).

(23)

One way for a server to analyze P is by analyzing the entropy of the queries it receives (or equivalently, the feedbacks it produces) conditioned on any subset of files. Since each server is not allowed to know any information of the identity of the retrieved files, we must have [P]

[P]



H(A1 |W [P] , Q) = H(A1 |W [P ] , Q), ∀P ′ ⊂ {1, 2, . . . , M }, |P ′ | = P. [P]

(24)



When M ≥ 2P , we may choose the index set P ′ to be disjoint from P. Thus, H(A1 |W [P ] , Q) characterises the amount of download from a server in a PIR scheme with the same parameters P, N, K, T but a different number of files M − P . So we have the following inductive relation. ΩM ≥ P L +

K ΩM−P , M ≥ 2P. N

(25)

[P]

)L , as shown in [3, Lemma 2]. So we have ΩM ≥ Plus, when M < 2P , H(A1 |W [P] , Q) is lower bounded by (M−P N K P L + N (M − P )L, M < 2P . Recursively using the inductive relation (25), we have

K M M K K K M + ( )2 + · · · + ( )⌊ P ⌋−1 ) + ( )⌊ P ⌋ (M − P ⌊ ⌋)L. N N N N P So the multi-file PIR capacity C P is upper bounded by !−1 M ⌊M P ⌋ 1 − (K PL M  K⌊ P ⌋ M P N) . ≤ C = −⌊ ⌋ + ΩM P P N⌊M P ⌋ 1− K N ΩM ≥ P L(1 +

(26)

(27)

And in particular, when M ≤ 2P , we have

K(M − P ) −1 . PN One can check that when K = 1, the results (27) and (28) agree with those in [3]. CP ≤ 1 +

(28)

B. K = 1 Similarly as (23), we have the following claim. [P]

[P]

H(A1:N |W [P] , Q) ≥ T H(A1 |W [P] , Q).

(29)

No matter how we design a PIR scheme, we have two kinds of queries. The first kind is a query dependent on the desired files. The second kind is a query independent on the desired files (serving as side information). Each query of the first kind will contribute one bit to the retrieved messages. In an optimal scheme, it is certain that all the queries of the first kind will contribute independent bits, or otherwise the redundancy could be made use of to improve the scheme. From the perspective of any T colluding servers, since they are not allowed to learn any information of the identity of the retrieved files, then the queries of the second kind on these T servers should contribute independent bits as well. Therefore we have the claim (29) above. [P] Using the analyses above of H(A1 |W [P] , Q) again, we can deduce the following inductive relation. ΩM ≥ P L +

T ΩM−P , M ≥ 2P. N

(30)

[P]

)L Plus, when M < 2P , H(A1 |W [P] , Q) is lower bounded by (M−P , as shown in [3, Lemma 2]. So we have ΩM ≥ N T P L + N (M − P )L, M < 2P . Recursively using the inductive relation (30), we have

T T M T T M M + ( )2 + · · · + ( )⌊ P ⌋−1 ) + ( )⌊ P ⌋ (M − P ⌊ ⌋)L. N N N N P P So the multi-file PIR capacity C is upper bounded by !−1 T ⌊M ⌊M P ⌋  P ⌋ 1 − ( ) P L T M M N . CP = ≤ − ⌊ ⌋ ⌊M ⌋ + T ΩM P P N P 1− N ΩM ≥ P L(1 +

(31)

(32)

And in particular, when M ≤ 2P , we have

T (M − P ) −1 . PN One can check that when T = 1, the results (32) and (33) agree with those in [3]. CP ≤ 1 +

(33)

5

IV. A

GENERAL MULTI - FILE

PIR SCHEME

FOR

P ≥

M 2

M 2 ,

In this section we shall present a general multi-file PIR scheme for P ≥ i.e., a user wants to retrieve at least half of all the files. The idea in designing the scheme is a combination of the idea in [3] and in our recent work [29], which improves the rate for the general single-file PIR problem for a certain range of parameters. We first proceed with an illustrative example. A. Example:N = 4, K = 2, T = 2, M = 3, P = 2 Denote the three files by W [1] , W [2] and W [3] . Assume each file is of length 72 and represented in a matrix of size 36 × 2, i.e., [1] [2] [3] w1 ! w1 ! w1 ! .. .. .. W [1] = W [2] = W [3] = (34) . . . [1] [2] [3] w36 w36 w36 [2]

[1]

[3]

where wi , wi , wi ∈ F2q , 1 ≤ i ≤ 36. Here Fq is a sufficiently large finite field that allows the existence of an MDS code used in the construction later. Let W [1] and W [2] be the desired files. Independently choose three random matrices S1 , S2 , S3 ∈ Fq36×36 uniformly from all the 36 × 36 full rank matrices over [1] Fq . Construct a list of atoms a[1:36] = S1 W [1] . Note that this is only a formal expression and should be understood as [1]

[1]

follows. For example, if the first row of S1 is the vector (p1 , p2 , . . . , p36 ), then a1 represents the linear combination a1 = [1] [1] [1] p1 w1 + p2 w2 + · · · + p36 w36 . We use the word “atom” to emphasize that it is a query related to exactly only one file. [2] [2] Similarly let a[1:36] = S2 W . Suppose there exists a (36, 30)-MDS code and the transpose of its generator matrix is denoted as MDS36×30 . Select the [3] first 30 rows of S3 , denoted as S3 [(1 : 30), :]. Construct a list of atoms a[1:36] = MDS36×30 S3 [(1 : 30), :]W [3] . Note that this is only a formal expression and should be understood as follows. For example, if the first row of MDS36×30 S3 [(1 : 30), :] is [3] [3] [3] [3] [3] the vector (q1 , q2 , . . . , q36 ), then a1 represents the linear combination a1 = q1 w1 + q2 w2 + · · · + q36 w36 . [3] [2] [1] The atoms a[1:36] , a[1:36] and a[1:36] will form the queries for the servers. The queries for the servers are divided into Σ seventeen blocks. In fifteen blocks each query is only a single atom and in two blocks ΛΣ 1 and Λ2 each query is a mixture of three atoms. The query structure is as follows.

ΛΣ 1 :

ΛΣ 2 :

(

(

[1] a1 [1] a3 [1] a5

Server I [2] [3] + a1 + a1 [2] [3] + a3 + a3 [3] [2] + a5 + a5

Server I [1] [2] [3] z 1 a1 + z 2 a1 + z 3 a1 [1] [2] [3] z 1 a3 + z 2 a3 + z 3 a3 [1] [2] [3] z 1 a5 + z 2 a5 + z 3 a5

[1] a1 [1] a4 [1] a6

Server II [2] [3] + a1 + a1 [2] [3] + a4 + a4 [3] [2] + a6 + a6

Server II [1] [2] [3] z 1 a1 + z 2 a1 + z 3 a1 [1] [2] [3] z 1 a4 + z 2 a4 + z 3 a4 [1] [2] [3] z 1 a6 + z 2 a6 + z 3 a6

[1] a2 [1] a3 [1] a6

Server III [2] [3] + a2 + a2 [2] [3] + a3 + a3 [3] [2] + a6 + a6

Server IV ) [2] [3] + a2 + a2 . [2] [3] + a4 + a4 [3] [2] + a5 + a5

[1] a2 [1] a4 [1] a5

Server III [1] [2] [3] z 1 a2 + z 2 a2 + z 3 a2 [1] [2] [3] z 1 a3 + z 2 a3 + z 3 a3 [1] [2] [3] z 1 a6 + z 2 a6 + z 3 a6

Server IV ) [1] [2] [3] z 1 a2 + z 2 a2 + z 3 a2 [1] [2] [3] z 1 a4 + z 2 a4 + z 3 a4 [1] [2] [3] z 1 a5 + z 2 a5 + z 3 a5

where z1 , z2 , z3 are randomly chosen distinct elements in Fq .

[x]

Λλ

I Server II Server III Server IV ) ( Server [x] [x] [x] [x] a6λ+1 a6λ+1 a6λ+2 a6λ+2 : [x] [x] [x] [x] a6λ+3 a6λ+4 a6λ+3 a6λ+4 [x] [x] [x] [x] a6λ+5 a6λ+6 a6λ+6 a6λ+5

for λ ∈ {1, 2, 3, 4, 5}, x ∈ {1, 2, 3}.  W [1] h i Recall that the n-th server, 1 ≤ n ≤ 4, has stored yn =  W [2]  gn , where G = g1 g2 g3 g4 is the generator 2×4 W [3] [2] [1] matrix of the (4, 2)-MDS storage code. Therefore, each server knows the 108 bits {wi gn : 1 ≤ i ≤ 36}, {wi gn : 1 ≤ [3] [2] [1] [3] i ≤ 36} and {wi gn : 1 ≤ i ≤ 36}. Upon receiving the query of the form ai + aj + ak , the n-th server responds with [3] [2] [1] (ai + aj + ak )gn , which is a linear combination of the 108 bits the server possesses. [2] [1] Retrieving the files W [1] and W [2] is equivalent to retrieving a[1:36] and a[1:36] since S1 and S2 are both full rank matrices. 

[1]

[2]

[3]

Note that a[7:36] , a[7:36] and a[7:36] could be retrieved since each appears as a query on two different servers and any two [3]

[3]

of the four vectors {g1 , g2 , g3 , g4 } are linearly independent. Then a[1:6] could be solved from a[7:36] due to the property of

6

[1]

[2]

Σ MDS36×30 . Therefore, the interferences can be eliminated in the blocks ΛΣ 1 and Λ2 . So the queries of the form ai + ai and [2] [1] [2] [1] z1 ai + z2 ai , 1 ≤ i ≤ 6, could be retrieved. Then we can solve the six systems of equations to retrieve a[1:6] and a[1:6] . The scheme is private against any two colluding servers. From the perspective of any two colluding servers, they have received [3] [2] [1] 25 atoms ai , 25 atoms aj , 25 atoms ak and 10 mixed queries and thus the query structure is completely symmetric with respect to all the files. Altogether the two servers have 30 distinct atoms towards each file. Extract the coefficients of each atom as a vector in F36 q . Recall how we select the random matrices S1 , S2 S3 and the (36, 30)-MDS code. It turns out that the 30 vectors with respect to each file form a random subspace of dimension 30 in F36 q , so the two servers cannot tell any difference among the atoms towards different files and thus the identity of the retrieved files is disguised. 12 36×2×2 = 17 . As a comparison, for the sake of retrieving two files a trivial The rate of the scheme above is then 12×5×3+12+12 12 approach is to execute two independent (single-file) PIR schemes. However, the rate will be only 61 91 < 17 if one uses the (single-file) PIR scheme in [29, Section III.B].

B. The general framework Let α and β be the smallest positive integers satisfying      ! N N N −T α = (α + β) − . (35) K K K   N Assume the existence of an (α+β) N K , α K -MDS code. The transpose of its generator matrix is denoted as MDS(α+β)(N )×α(N ) . K

Denote the files as W [1] , W [2] , . . . , W [M] . Each file is of length LK and represented in a matrix of size L × K over Fq , where L is computed as   N L = (α + β) . (36) K   N Here Fq is a sufficiently large field that allows the existence of the (α + β) N K , α K -MDS code mentioned above. Each [m] [m] [m] file, say W [m] , consists of L rows, w1 , w2 , . . . , wL . For each file W [m] we will build a list of L atoms, denoted [m] [m] [m] [m] [m] as a[1:L] . Each atom ai , 1 ≤ i ≤ L, is a linear combination of {w1 , w2 , . . . , wL }. Without loss of generality let {W [1] , W [2] , . . . , W [P ] } be the set of desired files. In this scheme, any query is either a single atom or a mixture of M atoms. Constructing the multi-file PIR scheme contains the following steps. • Step 1: Independently choose M random matrices, S1 , . . . , SM , uniformly from all the L × L full rank matrices over Fq . [m] Then the atoms for the desired files W [m] , 1 ≤ m ≤ P are just built by a[1:L] = Sm W [m] .   −1 N • Step 2: We need an assisting array of size N K−1 × N consisting of K symbols. Each symbol appears K times and every K columns share a common symbol. For example, when N = 4, K = 2 the array is of the form ( ) 1 1 2 2 3 4 3 4 . 5 6 6 5

• Step 3: [Construction of the query structure] The query structure is divided into M α + P β blocks. [m] The M α blocks are denoted as Λλ , 1 ≤ λ ≤ α and 1 ≤ m ≤ M . In such a block any query is an atom towards the [m] [m] corresponding file W [m] . The atoms appeared in a block Λλ are the atoms a[(β+λ−1) N +1:(β+λ) N ] . The block is set in an (K ) (K ) “isomorphic” form with the assisting array, i.e., every K servers share a common atom. For example, each block of queries [x] Λλ in the previous example is isomorphic with the assisting array above. Pick a random MDS generator matrix H ∈ FqP ×M . For example, this could be done by choosing a Reed-Solomon generator matrix and then randomly permuting the columns, resulting in a matrix H of the form     h11 h12 · · · h1M h1  h2   h21 h22 · · · h2M      (37) H= . = . ..  . .. ..  ..   .. . .  . hP 1 hP 2 · · · hP M hP The P β blocks are denoted as ΛΣ λ,p , 1 ≤ p ≤ P , 1 ≤ λ ≤ β. In such a block any query is a mixture of M atoms. Again the block is set in an“isomorphic” form with the assisting array. If the corresponding entry in the assisting array is the number Σ x ∈ {1, 2, . . . , N K }, then we set the corresponding query in the block Λλ,p to be the formal sum [1]

hp · (a(λ−1)

[2]

[M]

, . . . , a(λ−1) N +x ) = ,a (K ) ( )+x (λ−1)(N K )+x N K

M X

m=1

[m]

hpm a(λ−1)

N )+x (K

.

(38)

K

7

• Step 4: [Set the atoms for those undesired files] For each undesired file W [m] , choose the first α  [m] as Sm [(1 : α N K ), :]. The atoms a[1:L] are built by   N [m] ), :]W [m] . a[1:L] = MDS(α+β)(N )×α(N ) Sm [(1 : α K K K

N K



rows of Sm , denoted

(39)

C. Analysis of the multi-file PIR scheme Each query appears in K different servers and thus can be successfully retrieved. So we have retrieved all the atoms [m] [m] for each file W [m] . Then by the property of MDS(α+β)(N )×α(N ) , we can solve a[1:β N ] from a[β N +1:L] for K K (K ) (K ) each undesired file. After eliminating these interferences from the queries in those P β blocks ΛΣ λ,p , we are facing with β systems of equations. The coefficients of each system come from some P columns of H and therefore form a non-singular [m] P × P matrix. By solving these systems we retrieve a[1:β N ] for every desired file. Therefore for every desired file we have (K ) retrieved all the L atoms towards the file, which is equivalent to the retrieval of the file itself since each Sm is a full rank matrix. The scheme is private against any T colluding servers. From the perspective of any T colluding servers, the query structure is completely symmetric respect to all the files. The number of atoms towards any file known by these T colluding  with  N −T N servers is (α + β) N , which equals α K by the equality (35). K − K  Extract  the coefficients of an atom as a vector N N in FL q . Recall how we build the atoms and the property of the (α + β) K , α K -MDS code. One can see that the vectors L with respect to each file will form a random subspace of dimension α N K in Fq . So these T colluding servers cannot tell any difference among the atoms towards different files and thus the identity of the retrieved files is disguised. The rate of the scheme can be calculated as  N Pα + Pβ P LK K = M N =  . (40) N N −T −T Mα + Pβ (M α + P β)K K + NK P K − K K(M−P ) −1 , meeting the Recall that we are now dealing with the case P ≥ M 2 . When T = 1, the rate above is exactly 1 + PN ) −1 , meeting the upper bound upper bound of the capacity shown in (28). When K = 1, the rate above is exactly 1 + T (M−P PN of the capacity shown in (33). Therefore, when P ≥ M , we determine the exact multi-file PIR capacity for the degenerating 2 cases. We summarize our main result as follows. (N )  Theorem 1: There exists an (N, K, T ; M ) P -files PIR scheme with rate M N NK when T + K ≤ N and N −T −T P (K )−( K ) +( K ) M P ≥ 2 . The rate of the scheme is optimal when either K = 1 or T = 1. Note that the condition T + K ≤ N is essential since otherwise the equality (35) does not make sense. Another way to explain this limitation is as follows. We build the queries according to the assisting array. If T + K > N then every T servers will know all the queries and all the atoms and thus no side information can be used. We close this section by briefly discussing the trivial approach of executing P independent (single-file) PIR schemes. Using the (single-file) scheme in [29], we can only have a P -file PIR scheme with rate [m] a[β N +1:L] (K )

(1 + RM−1 P − RM−1 )(1 + R + R2 + · · · + RM−1 )−1 ,

(41)

) ( where R = 1 − . This is calculated similarly as the process of calculating single-file PIR rate in [29] and one should ( ) take into consideration that we can retrieve a certain amount of each of the other P − 1 files for free when we execute a (single-file) PIR scheme towards a fixed file. To check that Theorem 1 performs better than the trivial approach is equivalent to showing M (42) 1 + R + · · · + RM−1 > ( R + 1 − R)(1 + RM−1 P − RM−1 ). P N −T K N K

The right-hand side achieves maximum at either P = M (then its value is 1 + (M − 1)RM−1 ) or P = M 2 (then its value is M−1 − 1)R )), both smaller than the left-hand side. So our multi-file scheme is indeed better than the trivial (1 + R)(1 + ( M 2 approach. V. A

GENERAL MULTI - FILE

PIR SCHEME

FOR

P 17 difference is that we have the 4th round but suppress the 5th round. The rate of this scheme is 135 28 . One can see that in the query structure, the interference in each query related to the undesired files (i.e. the atoms ci , dj , ek ) can be eliminated with the help of some queries in previous rounds from the other server, referred to as “side information from undesired files”. The key in designing a scheme is to adjust the number of stages in each round, such that every piece of side information from undesired files is fully used. For example, the number of pieces of side information of the form ci + dj on each server is five, among which four appear as interferences in Round 3 on the other server and the left one appears as an interference in Round 4 on the other server. After eliminating the interferences from undesired files, we have several queries of the form ai + bj . Such a query could only help us retrieve at most one single atom. So we require that exactly one of the two atoms should have already been retrieved from previous rounds on the other server and thus could also serve as a piece of side information, referred to as “side information from desired files”. For example, the atoms ai or bj serving as side information are written in bold form and each could be retrieved from previous rounds on the other server. With the help of the side information from both the desired files and the undesired files, a user can retrieve all his desired atoms. The scheme is private against any single server. From the perspective of each server, the query structure is completely symmetric with respect to all the files. Moreover, all the atoms towards each file known by a server are distinct. So a single server cannot tell any difference among the atoms towards each file and thus the identity of the retrieved files is disguised. B. Multi-file PIR schemes for K = T = 1 In this subsection we shall describe a general algorithm to construct multi-file PIR schemes for the basic case K = T = 1. The construction involves two aspects: building the frame of the query structure (i.e. determining the number of stages for each round) and filling in the frame with proper atoms. We only discuss the first aspect here. Once the frame is determined, filling the frame just follows naturally by making use of the side information. Recall that in the example in the previous subsection, for the case N = 2, M = 5, P = 2, our scheme has the 4th round while the scheme in [3] has the 5th round. It is not obvious to see which scheme performs better without detailed computation. Motivated by this example, we could consider building some candidate schemes which contain different rounds and then find out which candidate performs better. We use Ωx , M − P < x ≤ M , to denote a candidate scheme containing the first M − P rounds plus the xth round. In each Ωx , the number of stages in each round can be calculated similarly as shown in [3]. The numbers are adjusted so that every piece of side information from undesired  files is fully made use of. Suppose the number of stages in the xth round P is C. In each stage, the nth server requires x−y pieces of side information consisted of some given y < x undesired files. Note that a stage in the yth round will provide altogether N − 1 pieces of such side information to the nth server (one piece C( P ) from the mth server, m 6= n). So we need Nx−y −1 stages in the yth round matching the C stages in the xth round. Recursively we could determine the number of stages in all the rounds. Set C as the minimal positive integer such that all the computed numbers are integers. The frame of the sub-scheme is finished. For example, in the previous subsection we build a scheme Ω4 with one stage in the 4th round. We need two stages in the 3rd round and one stage in the 2nd round to match the stage in the 4th round. Then we need four stages in the 2nd round and two stages in the 1st round to match the two stages in the 3rd round. Now the total number of stages in the 2nd round is then 1 + 4 = 5. Finally we need ten stages in the 1st round to match the five stages in the 2nd round and thus the total number of stages in the 1st round is 10 + 2 = 12. This explains the frame of the query structure in the previous example. Note that we still need to verify if we have enough side information from desired files for filling the whole query structure. For example, each atom serving as side information from desired files (those in bold form) in the table can be retrieved from previous rounds on the other server. Once this is verified, building the frame of a proper candidate scheme is finished. Computing the download cost and total size of retrieved files is then straightforward. Then we can find the best scheme by analyzing the rate of different candidate schemes. We summarize the procedure described above in the following algorithm.

9

TABLE I T HE QUERY STRUCTURE FOR THE CASE N = 2, M = 5, P = 2.

Round 1

Round 2

Stage Stage Stage Stage Stage Stage Stage Stage Stage Stage Stage Stage

1 2 3 4 5 6 7 8 9 10 11 12

Stage 1

Stage 2

Stage 3

Stage 4

Stage 5

Server I a1 , b1 , c1 , d1 , e1 a2 , b2 , c2 , d2 , e2 a3 , b3 , c3 , d3 , e3 a4 , b4 , c4 , d4 , e4 a5 , b5 , c5 , d5 , e5 a6 , b6 , c6 , d6 , e6 a7 , b7 , c7 , d7 , e7 a8 , b8 , c8 , d8 , e8 a9 , b9 , c9 , d9 , e9 a10 , b10 , c10 , d10 , e10 a11 , b11 , c11 , d11 , e11 a12 , b12 , c12 , d12 , e12 a25 + b13 a26 + c13 b26 + c14 a27 + d13 b27 + d14 a28 + e13 b28 + e14 c25 + d25 c26 + e25 d26 + e26 a13 + b32 a33 + c15 b33 + c16 a34 + d15 b34 + d16 a35 + e15 b35 + e16 c29 + d29 c30 + e29 d30 + e30 a39 + b14 a40 + c17 b40 + c18 a41 + d17 b41 + d18 a42 + e17 b42 + e18 c33 + d33 c34 + e33 d34 + e34 a14 + b46 a47 + c19 b47 + c20 a48 + d19 b48 + d20 a49 + e19 b49 + e20 c37 + d37 c38 + e37 d38 + e38 a53 + b15 a54 + c21 b54 + c22 a55 + d21 b55 + d22 a56 + e21 b56 + e22 c41 + d41 c42 + e41 d42 + e42

Server II a13 , b13 , c13 , d13 , e13 a14 , b14 , c14 , d14 , e14 a15 , b15 , c15 , d15 , e15 a16 , b16 , c16 , d16 , e16 a17 , b17 , c17 , d17 , e17 a18 , b18 , c18 , d18 , e18 a19 , b19 , c19 , d19 , e19 a20 , b20 , c20 , d20 , e20 a21 , b21 , c21 , d21 , e21 a22 , b22 , c22 , d22 , e22 a23 , b23 , c23 , d23 , e23 a24 , b24 , c24 , d24 , e24 a1 + b25 a29 + c1 b29 + c2 a30 + d1 b30 + d2 a31 + e1 b31 + e2 c27 + d27 c28 + e27 d28 + e28 a32 + b1 a36 + c3 b36 + c4 a37 + d3 b37 + d4 a38 + e3 b38 + e4 c31 + d31 c32 + e31 d32 + e32 a2 + b39 a43 + c5 b43 + c6 a44 + d5 b44 + d6 a45 + e5 b45 + e6 c35 + d35 c36 + e35 d36 + e36 a46 + b2 a50 + c7 b50 + c8 a51 + d7 b51 + d8 a52 + e7 b52 + e8 c39 + d39 c40 + e39 d40 + e40 a3 + b53 a57 + c9 b57 + c10 a58 + d9 b58 + d10 a59 + e9 b59 + e10 c43 + d43 c44 + e43 d44 + e44

10

TABLE II T HE QUERY STRUCTURE FOR THE CASE N = 2, M = 5, P = 2 ( CONT.).

Round 3

Stage 1

Stage 2

Round 4

Stage 1

a15 + b60 + c23 a60 + b16 + d23 a16 + b61 + e23 a63 + c27 + d27 b63 + c31 + d31 a64 + c28 + e27 b64 + c32 + e31 a65 + d28 + e28 b65 + d32 + e32 c45 + d45 + e45 a69 + b17 + c24 a17 + b69 + d24 a70 + b18 + e24 a72 + c35 + d35 b72 + c39 + d39 a73 + c36 + e35 b73 + c40 + e39 a74 + d36 + e36 b74 + d40 + e40 c47 + d47 + e47 a18 + b78 + c43 + d43 a78 + b19 + c44 + e43 a19 + b79 + d44 + e44 a81 + c46 + d46 + e46 b81 + c48 + d48 + e48

a61 + b3 + c11 a4 + b62 + d11 a62 + b4 + e11 a66 + c25 + d25 b66 + c29 + d29 a67 + c26 + e25 b67 + c30 + e29 a68 + d26 + e26 b68 + d30 + e30 c46 + d46 + e46 a5 + b70 + c12 a71 + b5 + d12 a6 + b71 + e12 a75 + c33 + d33 b75 + c37 + d37 a76 + c34 + e33 b76 + c38 + e37 a77 + d34 + e34 b77 + d38 + e38 c48 + d48 + e48 a79 + b6 + c41 + d41 a7 + b80 + c42 + e41 a80 + b7 + d42 + e42 a82 + c45 + d45 + e45 b82 + c47 + d47 + e47

Algorithm 1: (Input: K = T = 1, N , M , P ) 1. Building the framework of the candidate schemes Ωx . The number of stages in the mth round as αm . Set αx = C  P is Pdenoted α . and αy = 0 for M − P < y ≤ M , y 6= x. For the rest values of m ≤ M − P , αm = N1−1 P m+i Select C to be the i=1 i minimal positive integer such that all the numbers αm are integers. 2. For each candidate scheme, verify if it is a proper scheme, i.e., verify if we have enough side information from desired files for filling the whole query structure. 3. Compute the rate of each proper candidate scheme and find the best Ωx . 4. Fill in the scheme Ωx by making use of the side information. We close this subsection by presenting another example for the case N = 3, M = 5, P = 2. Following the algorithm, we build the candidate schemes Ω5 and Ω4 . In the scheme Ω5 , we have α5 = C, α4 = 0, α3 = C 1 C 1 3C 1 α 2 5 = 2 , α2 = 2 × 2 × α3 = 2 and α1 = 2 (2α2 + α3 ) = 4 . So we set C = 4 and then (α5 , α4 , α3 , α2 , α1 ) = (4, 0, 2, 2, 3). Similarly in the scheme Ω4 we have (α5 , α4 , α3 , α2 , α1 ) = (0, 2, 2, 3, 4). Then we verify whether each scheme is proper. In the scheme Ω4 , the total number of atoms serving as interferences from desired files on each server is 3 × 2 + 3 × 2 + 1 × 3 = 15 (the first part 3 × 2 corresponds to the three queries of the form a + b + x + y where x and y represent two distinct symbols from {c, d, e}, multiplied by the number of stages in Round 4; the second part 3 × 2 corresponds to the three queries of the form a + b + z where z ∈ {c, d, e}, multiplied by the number of stages in Round 3; the last part 1 × 3 corresponds to the unique query a + b, multiplied by the number of stages in Round 2). These atoms could by supplied by the 2 × 2 × 4 = 16 atoms appearing on the other two servers in the first round. So Ω4 is proper. Ω5 is analyzed similarly. For Ω4 , the total download cost is then   M X M = 3 × (4 × 5 + 3 × 10 + 2 × 10 + 2 × 5) = 240, αi N i i=1 while the total size of desired atoms (each query containing at least one atom a or b will contribute 1 to this amount) is then    ! M X M M −P − = 3 × (4 × 2 + 3 × 7 + 2 × 9 + 2 × 5) = 171. αi N i i i=1

126 Thus the rate of Ω4 is 171 240 = 0.7125. The rate of the scheme Ω5 is 177 = 0.711864. So Ω4 is slightly better. Finally we fill in the scheme Ω4 in the following table. Atoms serving as interferences from desired files are in bold form. Note that the query structure is a little asymmetric since we retrieve 86 bits of the first file and 85 bits of the second file. A symmetric scheme could be built just by adding a replication of the structure with the roles of the first two files switched.

11

TABLE III T HE QUERY STRUCTURE FOR THE CASE N = 3, M = 5, P = 2.

Round 1

Round 2

Stage Stage Stage Stage

1 2 3 4

Stage 1

Stage 2

Stage 3

Round 3

Stage 1

Stage 2

Round 4

Stage 1

Stage 2

Server I a1 , b1 , c1 , d1 , e1 a2 , b2 , c2 , d2 , e2 a3 , b3 , c3 , d3 , e3 a4 , b4 , c4 , d4 , e4 a22 + b5 a13 + c5 b13 + c9 a14 + d5 b14 + d9 a15 + e5 b15 + e9 c13 + d13 c14 + e13 d14 + e14 a5 + b23 a25 + c6 b25 + c10 a26 + d6 b26 + d10 a27 + e6 b27 + e10 c19 + d19 c20 + e19 d20 + e20 a43 + b6 a34 + c7 b34 + c11 a35 + d7 b35 + d11 a36 + e7 b36 + e11 c25 + d25 c26 + e25 d26 + e26 a6 + b44 + c8 a45 + b7 + d8 a7 + b45 + e8 a49 + c15 + d15 b49 + c17 + d17 a50 + c16 + e15 b50 + c18 + e17 a51 + d16 + e16 b51 + d18 + e18 c31 + d31 + e31 a67 + b8 + c12 a8 + b67 + d12 a68 + b9 + e12 a58 + c21 + d21 b58 + c23 + d23 a59 + c22 + e21 b59 + c24 + e23 a60 + d22 + e22 b60 + d24 + e24 c34 + d34 + e34 a9 + b71 + c27 + d27 a72 + b10 + c28 + e27 a10 + b72 + d28 + e28 a80 + c32 + d32 + e32 b80 + c33 + d33 + e33 a76 + b11 + c29 + d29 a11 + b76 + c30 + e29 a77 + b12 + d30 + e30 a83 + c35 + d35 + e35 b83 + c36 + d36 + e36

Server II a5 , b5 , c5 , d5 , e5 a6 , b6 , c6 , d6 , e6 a7 , b7 , c7 , d7 , e7 a8 , b8 , c8 , d8 , e8 a9 + b22 a16 + c1 b16 + c9 a17 + d1 b17 + d9 a18 + e1 b18 + e9 c15 + d15 c16 + e15 d16 + e16 a24 + b9 a28 + c2 b28 + c10 a29 + d2 b29 + d10 a30 + e2 b30 + e10 c21 + d21 c22 + e21 d22 + e22 a10 + b43 a37 + c3 b37 + c11 a38 + d3 b38 + d11 a39 + e3 b39 + e11 c27 + d27 c28 + e27 d28 + e28 a46 + b10 + c12 a11 + b46 + d12 a47 + b11 + e12 a52 + c13 + d13 b52 + c17 + d17 a53 + c14 + e13 b53 + c18 + e17 a54 + d14 + e14 b54 + d18 + e18 c32 + d32 + e32 a12 + b68 + c4 a69 + b12 + d4 a1 + b69 + e4 a61 + c19 + d19 b61 + c23 + d23 a62 + c20 + e19 b62 + c24 + e23 a63 + d20 + e20 b63 + d24 + e24 c35 + d35 + e35 a73 + b1 + c29 + d29 a2 + b73 + c30 + e29 a74 + b2 + d30 + e30 a81 + c33 + d33 + e33 b81 + c31 + d31 + e31 a3 + b77 + c25 + d25 a78 + b3 + c26 + e25 a4 + b78 + d26 + e26 a84 + c36 + d36 + e36 b84 + c34 + d34 + e34

Server III a9 , b9 , c9 , d9 , e9 a10 , b10 , c10 , d10 , e10 a11 , b11 , c11 , d11 , e11 a12 , b12 , c12 , d12 , e12 a23 + b1 a19 + c1 b19 + c5 a20 + d1 b20 + d5 a21 + e1 b21 + e5 c17 + d17 c18 + e17 d18 + e18 a1 + b24 a31 + c2 b31 + c6 a32 + d2 b32 + d6 a33 + e2 b33 + e6 c23 + d23 c24 + e23 d24 + e24 a44 + b2 a40 + c3 b40 + c7 a41 + d3 b41 + d7 a42 + e3 b42 + e7 c29 + d29 c30 + e29 d30 + e30 a2 + b47 + c4 a48 + b3 + d4 a3 + b48 + e4 a55 + c13 + d13 b55 + c15 + d15 a56 + c14 + e13 b56 + c16 + e15 a57 + d14 + e14 b57 + d16 + e16 c33 + d33 + e33 a70 + b4 + c8 a4 + b70 + d8 a71 + b5 + e8 a64 + c19 + d19 b64 + c21 + d21 a65 + c20 + e19 b65 + c22 + e21 a66 + d20 + e20 b66 + d22 + e22 c36 + d36 + e36 a5 + b74 + c25 + d25 a75 + b6 + c26 + e25 a6 + b75 + d26 + e26 a82 + c31 + d31 + e31 b82 + c32 + d32 + e32 a79 + b7 + c27 + d27 a7 + b79 + c28 + e27 a86 + b8 + d28 + e28 a85 + c34 + d34 + e34 b85 + c35 + d35 + e35

12

C. The general framework Generalizing the scheme for arbitrary parameters K and T is a little complex. The single-file case is presented in [29]. For the multi-file case, we find a good PIR scheme by the following algorithm. Algorithm 2: (Input: K, T , N , M , P (K + T ≤ N )) 1. 2. 3. 4.

Building the framework of the candidate schemes Ωx . For each candidate scheme, verify if it is a proper scheme. Compute the rate of each proper candidate scheme and find the best Ωx . Fill in the scheme Ωx by making use of the side information.

The statement of Algorithm 2 is almost the same with the previous Algorithm 1. However, for general K and T the first two procedures of the algorithm become rather technical. Denote the files as W [1] , W [2] , . . . , W [M] . Each file is of length LK and represented in a matrix of size L×K over Fq , where L is a constant sufficiently large. Fq is a sufficiently large field that allows [m] [m] [m] the existence of several MDS codes used in the scheme later. Each file, say W [m] , consists of L rows, w1 , w2 , . . . , wL . [m] [m] [m] For each file W [m] we will build a list of L atoms, where each atom is a linear combination of {w1 , w2 , . . . , wL }. Let [1] [2] [P ] P = {W , W , . . . , W } be the set of desired files. The first two procedures of the algorithm include the following steps. • Step 1: Independently choose M random matrices, S1 , . . . , SM , uniformly from all the L × L full rank matrices over Fq . • Step 2: Let α and β be the smallest positive integers satisfying      ! N N N −T α = (α + β) − . (43) K K K   N Assume the existence of an (α + β) K ,α N N . K -MDS code. The transpose of its generator matrix is MDS(α+β)(N K )×α(K ) • Step 3: We need an assisting array as mentioned earlier in Section IV, Subsection B. • Step 4: [Construction of the query structure] The query structure is divided into several blocks, where each block is labelled by F ⊆ {W [1] , W [2] , . . . , W [M] }, a subset of files. In a block labelled by F , the queries are of the same form, i.e., every query is a mixture of |F | atoms related to the files in F . We set each block in an “isomorphic” form with the assisting array, i.e., every K servers share a common query. We further call a block labelled by F a t-block if |F | = t. For any F ⊆ {W [1] , W [2] , . . . , W [M] }, |F | = t, we require that the number of t-blocks labelled by F is αt . In a candidate scheme Ωx , αt is computed as  M−P  , if t = x; β (44) αt = 0, if M − P < t ≤ M, t 6= x;  P P P  α if t ≤ M − P. i=1 i αt+i β , T • Step 5: [Dividing the query structure into groups] Now let F denote a nonempty set of files with F P 6= ∅. Denote E = F \P. Select α blocks labelled by E and β blocks labelled by F and consider them as a group. Repeat this process for all proper F until all the blocks are divided into groups. • Step 6: [Constructing the atoms for the undesired files in each group] For each group constructed with respect to F and [m] any undesired in F , the number of atoms towards W [m] within the blocks in this group is (α + β) N K , among  file W N N which α K atoms appear in blocks labelled by E and the other β atoms appear as interferences in blocks labelled by F . K   N ′ [m] ′ [m] , where S denotes some α N Sm W rows in the Let these (α + β) N N atoms for W be built by MDS m (α+β)(K )×α(K ) K K matrix Sm . Repeat this process for all the groups. Note that we shall come across the same undesired file W [M] several times. We have to ensure that the rows we select from Sm are non-intersecting. This is guaranteed as long as L is sufficiently large. Up to now the framework of the candidate scheme Ωx is finished. • Step 7: [Constructing the atoms for the desired files] This step corresponds to the second procedure, i.e., verifying if a candidate scheme is proper. This step is not easy to state explicitly and we just explain the essential idea. For each query containing several atoms towards desired files, pick out any one atom and let the remaining atoms serve as interference atoms from desired files. The atoms for each desired file are built in a technical way such that 1) all the atoms serving as interferences can be deduced from the atoms that could be retrieved and 2) from the perspective of any T colluding servers, all the atoms towards any file are independent (this will guarantee the perfect privacy of the desired files). If these can be satisfied then the candidate scheme is a proper scheme. Particularly, in the single-file case, the atoms towards the unique desired file W [m] are just simply built by Sm W [m] . After these steps, the remaining task is then computing the rate of each proper candidate scheme and finding the best choice. For example, for the case N = 4, K = T = 2, M = 5, P = 2, first we calculate the parameters in the equation (43) and get

13

α = 5 and β = 1. Then in a candidate scheme Ω5 we have (α5 , α4 , α3 , α2 , α1 ) = (1, 0, 5, 50, 525). The total download cost is then    X M M N = 12 × (1 + 5 × 10 + 50 × 10 + 525 × 5) = 38112, αi K i K i=1

while the total size of desired atoms (each query containing at least one atom towards desired files will contribute 1 to this amount) is then    !  X M M M −P N − = 12 × (1 + 5 × 9 + 50 × 7 + 525 × 2) = 17352. K αi i i K i=1

Thus the rate of Ω5 is 17352 38112 = 0.455290. Similarly in a candidate scheme Ω4 we have (α5 , α4 , α3 , α2 , α1 ) = (0, 1, 10, 105, 1100). The total download cost is then    X M M N = 12 × (1 × 5 + 10 × 10 + 105 × 10 + 1100 × 5) = 79860, αi K i K i=1

while the total size of desired atoms (each query containing at least one atom towards desired files will contribute 1 to this amount) is then    !  X M M M −P N − = 12 × (1 × 5 + 10 × 9 + 105 × 7 + 1100 × 2) = 36360. αi K i i K i=1 Thus the rate of Ω4 is

36360 79860

= 0.455297. So Ω4 is slightly better. VI. C ONCLUSION

In this paper we consider the multi-file PIR from MDS coded databases with colluding servers. We analyze the PIR capacity for degenerating cases, i.e., either K or T equals one. We then present two general multi-file PIR schemes depending on the P P . Particularly, when M ≥ 21 , the rate of our scheme meets the upper bound of the capacity for degenerating cases and ratio M is thus optimal. P < 12 , our scheme fulfills the potential of a scheme by Banawan and Ulukus in [3] dealing with the case K = T = 1. When M P It is a pity that our scheme for M < 21 seems to be a little tedious. The scheme will become much clearer if one can easily figure out what is the best candidate scheme, or one can have a neater way to check the properness of each candidate (i.e. a neater way to deal with the atoms serving as interference from desired files). R EFERENCES [1] D. Augot, F. Levy-dit-Vehel and A. Shikfa, A storage-efficient and robust Private Information Retrieval Scheme allowing few servers, in Proceedings of Cryptology and Network Security (CANS), pp. 222-239, 2014. [2] K. Banawan and S. Ulukus, The capacity of private information retrieval from coded databases, arXiv preprint arXiv:1609.08138, 2016. [3] K. Banawan and S. Ulukus, Multi-message private information retrieval: capacity results and near-optimal schemes, arXiv preprint arXiv:1702.01739, 2017. [4] A. Beimel, Y. Ishai, E. Kushilevitz and J.-F. Raymond, Breaking the O(n1/(2k−1) ) barrier for information-theoretic private information retrieval, in Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), pp. 261-270, 2002. [5] S. Blackburn, T. Etzion and M. Paterson, PIR schemes with small download complexity and low storage requirements, arXiv preprint arXiv:1609.07027, 2016. [6] T. Chan, S. Ho and H. Yamamoto, Private information retrieval for coded storage, in Proceedings of IEEE International Symposium on Information Theory (ISIT), pp. 2842-2846, 2015. [7] B. Chor, O. Goldreich, E. Kushilevitz and M. Sudan, Private information retrieval, in Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), pp. 41-50, 1995. [8] B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan, Private information retrieval, Journal of the ACM, vol. 45, no. 6, pp. 965-981, 1998. [9] D. Demmler, A. Herzberg and T. Schneider, RAID-PIR: Practical multi-server PIR, in ACM Workshop on Cloud Computing Security, pp. 45-56, 2014. [10] Z. Dvir and S. Gopi, 2-server PIR with sub-polynomial communication, Journal of the ACM, vol. 63, no. 4, article 39, 2016. [11] K. Efremenko, 3-query locally decodable codes of subexponential length, SIAM Journal on Computing, vol. 41, no. 6, pp. 1694-1703, 2012. [12] R. Freij-Hollanti, O. Gnilke, C. Hollanti and D. Karpuk, Private information retrieval from coded databases with colluding servers, arXiv preprint arXiv:1611.02062, 2016. [13] J. Groth, A. Kiayias and H. Lipmaa, Multi-query computationally-private information retrieval with constant communication rate, in International Workshop on Public Key Cryptography, pp. 107-123, 2010. [14] R. Henry, Y. Huang and I. Goldberg, One (block) size fits all: PIR and SPIR with variable-length records via multi-block qeries, in Network and Distributed System Security Symposium (NDSS), 2013. [15] S. Kumar, E. Rosnes and A. G. i Amat, Private information retrieval in distributed storage systems using an arbitrary linear code, arXiv preprint arXiv:1612.07084, 2016. [16] N. B. Shah, K. V. Rashmi and K. Ramchandran, One extra bit of download ensures perfectly private information retrieval, in Proceedings of IEEE International Symposium on Information Theory (ISIT), pp. 856-860, 2014. [17] H. Sun and S. Jafar, Multiround Private Information Retrieval: Capacity and Storage Overhead, arXiv preprint arXiv:1611.02257, 2016. [18] H. Sun and S. Jafar, Private information retrieval from MDS coded data with colluding servers: settling a conjecture by Freij-Hollanti et al., arXiv preprint arXiv:1701.07807, 2017.

14

[19] H. Sun and S. Jafar, The capacity of private information retrieval, IEEE Transactions on Information Theory, to appear, 2017. DOI: 10.1109/TIT.2017.2689028. [20] H. Sun and S. Jafar, The capacity of robust private information retrieval with colluding databases, arXiv preprint arXiv:1605.00635, 2016. [21] H. Sun and S. Jafar, The capacity of symmetric private information retrieval, arXiv preprint arXiv: 1606.08828, 2016. [22] R. Tajeddine and S. E. Rouayheb, Private information retrieval from MDS coded data in distributed storage systems, in Proceedings of IEEE International Symposium on Information Theory (ISIT), pp. 1411-1415, 2016. [23] R. Tajeddine and S. E. Rouayheb, Private information retrieval from MDS coded data in distributed storage systems, arXiv preprint arXiv:1602.01458, 2016. [24] R. Tajeddine, O. Gnilke, D. Karpuk, R. Freij-Hollanti, C. Hollanti and S. E. Rouayheb, Private information retrieval schemes for coded data with arbitrary collusion patterns, arXiv preprint arXiv:1701.07636, 2017. [25] L. Wang, T. K. Kuppusamy, Y. Liu and J. Cappos, A fast multi-server, multi-block private information retrieval protocol, in IEEE Global Communications Conference, pp. 1-6, 2015. [26] Q. Wang and M. Skoglund, Symmetric private information retrieval for MDS coded distributed storage, arXiv:1610.04530, 2016. [27] S. Yekhanin, Private information retrieval, Communications of the ACM, vol. 53, no. 4, pp. 68-73, 2010. [28] S. Yekhanin, Towards 3-query locally decodable codes of subexponential length, Journal of the ACM, vol. 55, no. 1, article 1, 2008. [29] Y. Zhang and G. Ge, A general private information retrieval scheme for MDS coded databases with colluding servers, arXiv preprint arXiv:1704.06785, 2017.

Suggest Documents