Multiple User Tracing Codes
Bálint Laczay
Miklós Ruszinkó
Department of Computer Science and Information Theory Budapest University of Technology and Economics H-1521 Budapest, P.O. Box 91. Email:
[email protected]
Computer and Automation Research Institute of the Hungarian Academy of Sciences H-1518 Budapest, P.O. Box 63. Email:
[email protected]
Abstract— For integers 2 ≤ k ≤ r, a set of binary codewords is a k-out-of-r multiple user tracing code if, from the bitwise “or” of at most r codewords, one can determine at least k of them. Single user tracing codes (k = 1) were introduced by Csűrös and Ruszinkó [1], and the order of magnitude of their rate was determined to be $1/r$ by Alon and Asodi [2]. Here we continue this line of investigation for the case of tracing more than one user and prove that for any fixed k, the order of magnitude of the rate is exactly $1/r$. We show further that for some non-constant k = k(r), the rate can be better than $\Omega(\log r / r^2)$, which is an upper bound for the rate of the union free, or superimposed, codes.
I. INTRODUCTION

A family of subsets of [n] = {1, 2, ..., n} is called a union free (UF(r), or r-superimposed) code if all the unions of at most r members of the family are different. Clearly, this means that from the union of at most r members we can determine all the members that took part in that union. For given t and r, let N(t, r) be the minimal n for which a UF(r) code of size t exists. Roughly speaking, the code rate is the limit of $\frac{\log t}{N(t, r)}$ as $t \to \infty$.

If a family of subsets has the property that from the union of at most r members of the family we can identify at least one, but not necessarily all, of the members in the union, then we have a single user tracing (SUT(r)) code. If we can identify at least k members, then this is a multiple user tracing ($\mathrm{MUT}_k(r)$) code. In this paper, we investigate the code rate of MUT codes.

This problem is motivated by communication theory, by search theory and even by DNA computing, but it also has a strong mathematical justification. The order of magnitude of the rate of UF codes has remained undetermined over the last few decades. This is not the case for SUT codes. In fact, SUT(r) codes are $\mathrm{MUT}_1(r)$ codes, while UF(r) codes are $\mathrm{MUT}_r(r)$ codes. We know that the order of magnitude of the rate of $\mathrm{MUT}_k(r)$ codes changes when k increases from 1 to r. Investigating the growth of the rate of MUT codes may therefore lead to new results on UF and superimposed codes, too.

In Section II we present our motivations from multiple access communication and DNA computing. In Section III we give exact definitions and recall some known results on UF and SUT codes. In Section IV we introduce the exact definition of MUT codes and calculate their code rate. Then in Section V we examine the parameter domain where our results ensure the existence of better codes than the previously known ones.

This work was supported by OTKA under Grants T038198 and T046234. This work was also sponsored by the Office of Naval Research International Field Office and the Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant number FA8655-05-1-3017. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purpose notwithstanding any copyright notation thereon.

II. MOTIVATIONS

Information transmission of many independent sources over a single shared channel is an extensively investigated area. There are many practically important multiple access channel models, e.g. the “or” channel [3], [4], [5], [1], [2], [6], the adder channel [7], and the collision channel [8]. Here we consider the “or” channel.

The “or” channel is a multiple-input single-output communication channel with binary inputs $x_1, x_2, \dots, x_t$, binary output $y$ and the equation $y = \bigvee_{i=1}^{t} x_i$, or equivalently

$$y = \begin{cases} 0 & \text{if } \forall i : x_i = 0; \\ 1 & \text{otherwise.} \end{cases}$$

This channel is a good model of binary OOK modulation, where users send a waveform for the bit “one” and send nothing for the bit “zero”. The receiver can determine whether at least one waveform was sent in a given timeslice or none.

Consider a signaling or alarm system with many (say t) independent users, where the users want to signal their activity or send some alarm via the channel (this is what we call signature coding). Let each user have a binary codeword vector of length n. Assume that there are at most r active users. They send their codewords to the channel, while the inactive users send nothing. Assume further that the users send their codewords synchronized, so that the output of the multiple access “or” channel is also a binary vector of length n, namely the bitwise “or” of the codeword vectors of the active users. In the simplest case, the codewords have the property that the receiver can detect all of the active users from the output of the channel (this is called unique decoding).
One can see that if we consider the binary codewords of length n as characteristic vectors of subsets of [n], then these subsets form a UF(r) code. Sometimes we want to be able to identify only one active user; in this case we need SUT(r) codes. It is also interesting to require the identification of two users, maybe three, etc. In these cases we need $\mathrm{MUT}_k(r)$ codes, which are the topic of this paper.
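The channel model described above can be illustrated with a minimal sketch. The function name `or_channel` and the three-user toy code are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Iterable, Tuple

Codeword = Tuple[int, ...]


def or_channel(active: Iterable[Codeword], n: int) -> Codeword:
    """Return the bitwise "or" of the active users' codewords.

    If nobody is active, the output is the all-zero vector of length n.
    """
    out = [0] * n
    for word in active:
        if len(word) != n:
            raise ValueError("all codewords must have length n")
        out = [a | b for a, b in zip(out, word)]
    return tuple(out)


# Hypothetical toy code with t = 3 users and codeword length n = 4.
code = {
    "user_a": (1, 0, 0, 1),
    "user_b": (0, 1, 0, 1),
    "user_c": (0, 0, 1, 0),
}

# Users a and c are active; the receiver sees only the superposition.
received = or_channel([code["user_a"], code["user_c"]], n=4)
print(received)
```

The receiver's task is then to recover all of the active users (unique decoding), or only some of them (SUT and MUT codes), from `received` alone.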
Another motivation is group testing (also called the search problem) [9], where we have a set of samples and some of them (at most r) are special in some way. Our task is to find the special samples. Suppose that we have a means of testing a group of samples at once, with the result showing whether there is at least one special sample in the group or none. In this case, we can find all the special samples with significantly fewer tests than by testing each one individually. For this purpose, we form n groups of the samples and do one test for each group. The testing is non-adaptive, which means that the groups are designed before the tests start, so the samples in the next group to be tested cannot depend on the results of the tests already executed. If we have to find all of the special samples, then the design of the groups has to match a UF(r) set system.

With the emergence of DNA computing and various other areas, designs that cannot identify all special samples, but at least one, two, etc. of them, have also become interesting. E.g., in DNA sequencing we sometimes have to find at least one clone which contains a given feature (see e.g. [10]). We do not have the necessary resources to test all of the clones one by one, but we do have efficient means of testing a pool of clones simultaneously to determine whether any of them contains the feature. In this case, using group designs corresponding to SUT or MUT codes can be very useful.

III. UNION FREE AND SINGLE USER TRACING CODES
In the following, we use the terminology of the signature coding problem. A code $C \subseteq \{0, 1\}^n$ is a set of binary codeword vectors of length n. We refer to the number |C| of codewords as the size of the code, and to n as the length of the code. The codewords of the active users are collected into a set F. The bitwise “or” operator for some $F \subseteq C$ gives a vector of length n, whose ith element is defined by the following formula:

$$[S(F)]_i = \bigvee_{a \in F} [a]_i \qquad \forall i = 1, \dots, n,$$

where $[x]_i$ denotes element i of the vector x. This is the output vector of the “or” channel if the codewords of the active users are those in F. Let $\mathcal{F} = 2^C$ be the set of all subsets of the code, and introduce the notations

$$\mathcal{F}_{=r} = \{F \in \mathcal{F} : |F| = r\}, \qquad \mathcal{F}_{\le r} = \{F \in \mathcal{F} : |F| \le r\}, \qquad \mathcal{F}_{\ge r} = \{F \in \mathcal{F} : |F| \ge r\}$$

and
Fk≤. k, i.e.,

Definition 3: A code C is $\mathrm{MUT}_k(r)$ if there is a function $f : \{0, 1\}^n \to \mathcal{F}$ such that

$$\forall F \in \mathcal{F}_{\le r} : f(S(F)) \subseteq F \ \text{ and } \ |f(S(F))| \ge \min(k, |F|).$$

Our main result is that the rate of this code class is also of order $1/r$ for any fixed k. Moreover, if $k = k(r) \le c \log r$ (for some constant c to be determined in Section V), then $\mathrm{MUT}_k(r)$ codes perform better than UF(r) codes.

The upper bound is a simple consequence of Theorem 2 of Csűrös and Ruszinkó [1], because any $\mathrm{MUT}_k(r)$ code is also a SUT(r) code:

Corollary 1: For any 2 ≤ k ≤ r,

$$\overline{R}_{\mathrm{MUT}}(k, r) \le \frac{2}{r}.$$

On the other hand, the following lower bound holds:

Theorem 3: For any 2 ≤ k ≤ r,

$$\underline{R}_{\mathrm{MUT}}(k, r) \ge \frac{1}{5k(8e)^k r}.$$

So from Corollary 1 and Theorem 3, we know that for fixed k the rate is of order $1/r$:

Corollary 2: For any fixed k ≥ 2,

$$\Omega\!\left(\frac{1}{r}\right) \le \underline{R}_{\mathrm{MUT}}(k, r) \le \overline{R}_{\mathrm{MUT}}(k, r) \le O\!\left(\frac{1}{r}\right).$$
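For very small codes, the condition in Definition 3 can be checked by brute force. The sketch below is only an illustration under stated assumptions: the names `superpose` and `is_mut` are hypothetical, codewords are represented as tuples of 0/1, and the running time is exponential in r. It relies on the observation that a valid decoder output f(y) must be contained in every F from $\mathcal{F}_{\le r}$ with S(F) = y, so the largest admissible choice is the intersection of all such F.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Codeword = Tuple[int, ...]


def superpose(words: Sequence[Codeword], n: int) -> Codeword:
    """Bitwise "or" S(F) of the given codewords (all zeros for the empty set)."""
    out = [0] * n
    for w in words:
        out = [a | b for a, b in zip(out, w)]
    return tuple(out)


def is_mut(code: Sequence[Codeword], k: int, r: int) -> bool:
    """Brute-force test of the MUT_k(r) property from Definition 3."""
    n = len(code[0])
    # Group all subsets F with |F| <= r (as index sets) by their output S(F).
    groups: Dict[Codeword, List[FrozenSet[int]]] = {}
    for size in range(r + 1):
        for idx in combinations(range(len(code)), size):
            y = superpose([code[i] for i in idx], n)
            groups.setdefault(y, []).append(frozenset(idx))
    # For each output y, the best decoder answer is the intersection of all
    # candidate sets; it must contain at least min(k, |F|) users for each F.
    for candidates in groups.values():
        common = frozenset.intersection(*candidates)
        if any(len(common) < min(k, len(F)) for F in candidates):
            return False
    return True


# Hypothetical examples: unit vectors allow full decoding, so they pass;
# adding the all-ones word makes the output (1, 1) ambiguous.
print(is_mut([(1, 0, 0), (0, 1, 0), (0, 0, 1)], k=2, r=3))
print(is_mut([(1, 0), (0, 1), (1, 1)], k=2, r=2))
```

Such a check says nothing about rates; it only makes the role of the decoding function f concrete on toy instances.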
Theorem 3 will be proved in two main steps. First we introduce weakly multiple user tracing codes for k out of r users ($\mathrm{WUT}_k(r)$ codes), i.e., codes which identify at least k users out of the m active ones if k ≤ m ≤ r. In the case m < k, they do not necessarily identify any of them. Then, by tailoring the proof of Alon and Asodi [2] to our needs, we prove the required lower bound on WUT codes. The main novelty in that part is that we put together a MUT code from several segments with different bit-probabilities. Finally, we concatenate this $\mathrm{WUT}_k(r)$ code with a UF(k) code to yield a $\mathrm{MUT}_k(r)$ code.

A code C is weakly multiple user tracing for k out of r users ($\mathrm{WUT}_k(r)$) if it is possible to determine at least k users out of the at least k and at most r active ones, i.e.,

Definition 4: A code C is $\mathrm{WUT}_k(r)$ if there is a decoding function $f : \{0, 1\}^n \to \mathcal{F}_{\ge k}$ such that

$$\forall F \in \mathcal{F}_{k \le \cdot \le r} : f(S(F)) \subseteq F.$$

Lemma 1: The following two conditions together are sufficient for a code to be $\mathrm{WUT}_k(r)$:
• There is no set F of at most r codewords for which there is a disjoint set G of exactly r codewords with $S(F) = S(F \cup G)$ (we say F covers G):

$$\forall F \in \mathcal{F}_{\le r}\ \forall G \in \mathcal{F}_{=r} : F \cap G = \emptyset \rightarrow S(F) \ne S(F \cup G); \tag{1}$$
• From any set F of codewords with k ≤ |F| < 2r, we can always select k codewords such that for each of them there is a position where only this codeword has a “one” bit among all the codewords in F: ∀F ∈ Fk≤.