IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 1, JANUARY 2003
Hardness of Approximating the Minimum Distance of a Linear Code

Ilya Dumer, Member, IEEE, Daniele Micciancio, Member, IEEE, and Madhu Sudan, Member, IEEE
Abstract—We show that the minimum distance of a linear code is not approximable to within any constant factor in random polynomial time (RP), unless nondeterministic polynomial time (NP) equals RP. We also show that the minimum distance is not approximable to within an additive error that is linear in the block length $n$ of the code. Under the stronger assumption that NP is not contained in random quasi-polynomial time (RQP), we show that the minimum distance is not approximable to within the factor $2^{\log^{1-\epsilon}(n)}$, for any $\epsilon > 0$. Our results hold for codes over any finite field, including binary codes. In the process, we show that it is hard to find approximately nearest codewords even if the number of errors exceeds the unique decoding radius $d/2$ by only an arbitrarily small fraction $\epsilon d$. We also prove the hardness of the nearest codeword problem for asymptotically good codes, provided the number of errors exceeds $(2/3)d$. Our results for the minimum distance problem strengthen (though using stronger assumptions) a previous result of Vardy who showed that the minimum distance cannot be computed exactly in deterministic polynomial time (P), unless P = NP. Our results are obtained by adapting proofs of analogous results for integer lattices due to Ajtai and Micciancio. A critical component in the adaptation is our use of linear codes that perform better than random (linear) codes.

Index Terms—Approximation algorithms, bounded distance decoding, computational complexity, dense codes, linear codes, minimum-distance problem, NP-hardness, relatively near codeword problem.
I. INTRODUCTION

In this paper, we study the computational complexity of two central problems from coding theory: 1) the complexity of approximating the minimum distance of a linear code and 2) the complexity of error correction in codes of relatively large minimum distance. An error-correcting code of block length $n$ over a $q$-ary alphabet $\Sigma$ is a collection $\mathcal{C}$ of strings (vectors) from $\Sigma^n$
Manuscript received November 24, 2000; revised September 20, 2002. The work of I. Dumer was supported in part by the National Science Foundation under Grants NCR-9703844 and CCR-0097125. The work of D. Micciancio was supported in part by the National Science Foundation CAREER Award CCR-0093029. The work of M. Sudan was supported in part by a Sloan Foundation Fellowship, an MIT-NEC Research Initiation Grant, and the National Science Foundation CAREER Award CCR-9875511. The material in this paper was presented in part at the 40th IEEE Annual Symposium on Foundations of Computer Science (FOCS), New York, October 1999, and the IEEE International Symposium on Information Theory (ISIT), Sorrento, Italy, June 2000.

I. Dumer is with the College of Engineering, University of California at Riverside, Riverside, CA 92521 USA (e-mail: [email protected]).

D. Micciancio is with the Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).

M. Sudan is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).

Communicated by N. I. Koblitz, Associate Editor for Complexity and Cryptography.

Digital Object Identifier 10.1109/TIT.2002.806118
, called codewords. For all codes considered in this paper, the alphabet size $q$ is always a prime power and the alphabet $\Sigma = \mathbb{F}_q$ is the finite field with $q$ elements. A code $\mathcal{C}$ is linear if it is closed under addition and multiplication by a scalar, i.e., $\mathcal{C}$ is a linear subspace of $\mathbb{F}_q^n$ over base field $\mathbb{F}_q$. For such a code, the information content (i.e., the number $k = \log_q |\mathcal{C}|$ of information symbols that can be encoded with a codeword1) is just its dimension as a vector space, and the code can be compactly represented by a $k \times n$ generator matrix $\mathbf{A}$ of rank $k$ such that $\mathcal{C} = \{\mathbf{x}\mathbf{A} : \mathbf{x} \in \mathbb{F}_q^k\}$. An important property of a code is its minimum distance. For any vector $\mathbf{v} \in \mathbb{F}_q^n$, the Hamming weight $\mathrm{wt}(\mathbf{v})$ of $\mathbf{v}$ is the number of nonzero coordinates of $\mathbf{v}$. The weight function is a norm, and the induced metric $\mathrm{dist}(\mathbf{x}, \mathbf{y}) = \mathrm{wt}(\mathbf{x} - \mathbf{y})$ is called the Hamming distance. The (minimum) distance $d(\mathcal{C})$ of the code is the minimum Hamming distance $\mathrm{dist}(\mathbf{x}, \mathbf{y})$ taken over all pairs of distinct codewords $\mathbf{x}, \mathbf{y} \in \mathcal{C}$. For linear codes, it is easy to see that the minimum distance $d(\mathcal{C})$ equals the weight $\min\{\mathrm{wt}(\mathbf{x}) : \mathbf{x} \in \mathcal{C} \setminus \{\mathbf{0}\}\}$ of the lightest nonzero codeword. If $\mathcal{C}$ is a linear code over $\mathbb{F}_q$ with block length $n$, rank $k$, and minimum distance $d$, then it is customary to say that $\mathcal{C}$ is a linear $[n,k,d]_q$ code. Throughout the paper, we use the following notational conventions: matrices are denoted by boldface uppercase Roman letters (e.g., $\mathbf{A}$, $\mathbf{B}$), the associated linear codes are denoted by the corresponding calligraphic letters (e.g., $\mathcal{A}$, $\mathcal{B}$), and we write $\mathcal{A} \in [n,k,d]_q$ to mean that $\mathcal{A}$ is a linear code over $\mathbb{F}_q$ with block length $n$, information content $k$, and minimum distance $d$.

A. The Minimum-Distance Problem

Three of the four central parameters associated with a linear $[n,k,d]_q$ code, namely, $q$, $n$, and $k$, are evident from its matrix representation. The minimum distance problem (MINDIST) is that of evaluating the fourth—namely, given a generator matrix $\mathbf{A}$, find the minimum distance $d(\mathcal{A})$ of the corresponding code $\mathcal{A}$. The minimum distance of a code is obviously related to its error correction capability $\lfloor (d-1)/2 \rfloor$ and, therefore, finding $d(\mathcal{A})$ is a fundamental computational problem in coding theory.
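As a concrete toy illustration of these definitions (our sketch, not part of the paper), the following code computes $d(\mathcal{C})$ from a generator matrix by enumerating all $q^k$ messages and taking the minimum weight of a nonzero codeword; the cost exponential in $k$ is precisely why the complexity of MINDIST is interesting.

```python
# Illustrative sketch: minimum distance of a linear code by exhaustive
# enumeration.  For linear codes d(C) equals the minimum Hamming weight of a
# nonzero codeword, so we scan all q^k messages x and compute wt(xA) mod q.
# Running time O(q^k * k * n), i.e., exponential in the dimension k.
from itertools import product

def min_distance(G, q):
    """G: k x n generator matrix over F_q (lists of ints mod q)."""
    k, n = len(G), len(G[0])
    best = n + 1
    for x in product(range(q), repeat=k):
        if all(c == 0 for c in x):
            continue  # skip the zero message: d(C) ranges over nonzero words
        # codeword xG, computed coordinate-wise mod q
        cw = [sum(x[i] * G[i][j] for i in range(k)) % q for j in range(n)]
        best = min(best, sum(c != 0 for c in cw))
    return best

# generator matrix of the binary [7,4,3] Hamming code
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
assert min_distance(G, 2) == 3
```

The same routine works for any prime $q$ (for prime powers, the modular arithmetic would have to be replaced by field arithmetic).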
The problem gains even more significance in light of the fact that long $q$-ary codes chosen at random give the best parameters known for small alphabets (in particular, for $q = 2$).2 A polynomial time algorithm to compute the distance would be the ideal solution to the problem, as it could be used to construct good error-correcting codes by choosing a generator matrix at random and checking

1Throughout the paper, we write log for the logarithm to the base 2, and $\log_q$ when the base $q$ is any number possibly different from 2.

2For square prime powers $q \ge 49$, linear algebraic-geometry (AG) codes can perform better than random ones [21] and are constructible in polynomial time. For all other $q \ge 46$, it is still possible to do better than random codes; however, the best known procedures to construct such codes run in exponential time [24].
0018-9448/03$17.00 © 2003 IEEE
if the associated code has a large minimum distance. Unfortunately, no such algorithm is known. The complexity of this problem (can it be solved in polynomial time or not?) was first explicitly questioned by Berlekamp, McEliece, and van Tilborg [5] in 1978, who conjectured it to be nondeterministic polynomial time (NP)-complete. This conjecture was finally resolved in the affirmative by Vardy [23], [22] in 1997, who proved that the minimum distance cannot be computed in polynomial time unless P = NP. ([23], [22] also give further motivation and a detailed account of prior work on this problem.) To advance the search for good codes, one can relax the requirement of computing $d(\mathcal{A})$ exactly in two ways.

• Instead of requiring the exact value of $d(\mathcal{A})$, one can allow for approximate solutions, i.e., an estimate $\tilde{d}$ that is guaranteed to be at least as big, but not much bigger than, the true minimum distance. (For example, $d(\mathcal{A}) \le \tilde{d} \le \gamma \cdot d(\mathcal{A})$ for some approximation factor $\gamma$.)

• Instead of insisting on deterministic solutions that always produce correct (approximate) answers, one can consider randomized algorithms such that the upper bound $\tilde{d} \le \gamma \cdot d(\mathcal{A})$ only holds in a probabilistic sense. (Say, with probability at least $2/3$.)3
error-correction problem for linear codes. The simplest formulation is the nearest codeword problem (NCP) (also known as the "maximum-likelihood decoding problem"). Here, the input instance consists of a generator matrix $\mathbf{A} \in \mathbb{F}_q^{k \times n}$ and a received word $\mathbf{v} \in \mathbb{F}_q^n$, and the goal is to find the codeword $\mathbf{x} \in \mathcal{A}$ nearest to $\mathbf{v}$. A more relaxed version is to estimate the minimum "error weight" $t = \mathrm{dist}(\mathbf{v}, \mathcal{A})$, that is, the distance to the nearest codeword, without necessarily finding that codeword. The NCP is a well-studied problem: Berlekamp, McEliece, and van Tilborg [5] showed that it is NP-hard (even in its weight estimation version); and more recently, Arora, Babai, Stern, and Sweedyk [3] showed that the error weight is hard to approximate to within any constant factor unless P = NP, and within factor $2^{\log^{1-\epsilon} n}$ for any $\epsilon > 0$, unless NP $\subseteq$ QP (deterministic quasi-polynomial time). This latter result has been recently improved to inapproximability within an almost-polynomial factor
Such algorithms can still be used to randomly generate relatively good codes as follows. Say we want a code with minimum distance $d$. We pick a generator matrix at random such that the code is expected to have minimum distance $\gamma d$. Then, we run the probabilistic distance approximation algorithm many times using independent coin tosses. If all estimates returned by the distance approximation algorithm are at least $\gamma d$, then the minimum distance of the code is at least $d$ with very high probability. In this paper, we study these more relaxed versions of the minimum distance problem and show that the minimum distance is hard to approximate (even in a probabilistic sense) to within any constant factor, unless NP = RP (random polynomial time) (i.e., every problem in NP has a polynomial time probabilistic algorithm that always rejects NO instances and accepts YES instances with high probability). Under the stronger assumption that NP does not have random quasi-polynomial4 time algorithms (RQP), we prove that the minimum distance of a code of block length $n$ is not approximable to within a factor of $2^{\log^{1-\epsilon} n}$ for any constant $\epsilon > 0$. (This is a naturally occurring factor in the study of the approximability of optimization problems—see the survey of Arora and Lund [4].) Our methods adapt the proof of the inapproximability of the shortest lattice vector problem (SVP) due to Micciancio [18] (see also [19]) which, in turn, is based on Ajtai's proof of the hardness of solving SVP [2].
under the assumption that P $\ne$ NP by Dinur, Kindler, Raz, and Safra [9], [8]. On the positive side, NCP can be trivially approximated within a factor $n$. General, nontrivial approximation algorithms have been recently discovered by Berman and Karpinski [6], who showed that (for any finite field $\mathbb{F}_q$) NCP can be approximated within a factor $\epsilon n$ for any fixed $\epsilon > 0$ in probabilistic polynomial time, and within a factor $O(n / \log n)$ in deterministic polynomial time. However, the NCP only provides a first cut at understanding the error-correction problem. It shows that the error-correction problem is hard if we try to decode every linear code regardless of the error weight. In contrast, the positive results from coding theory show how to perform error correction in specific linear codes corrupted by errors of small weight (relative to the code distance). Thus, the hardness of the NCP may come from one of two factors: 1) the problem attempts to decode every linear code and 2) the problem attempts to recover from too many errors. Both issues have been raised in the literature [23], [22], but only the former has seen some progress [7], [25], [10], [20]. One problem that has been defined to study the latter phenomenon is the "bounded distance decoding problem" (BDD, see [23], [22]). This is a special case of the NCP, where the error weight is guaranteed (or "promised") to be less than $d/2$. This case is motivated by the fact that within such a distance there may be at most one codeword, and hence decoding is clearly unambiguous. Also, this is the case where many of the classical error-correction algorithms (for, say, Bose–Chaudhuri–Hocquenghem (BCH) codes, Reed–Solomon (RS) codes, algebraic-geometry (AG) codes, etc.) work in polynomial time.
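To make the object of study concrete, here is a brute-force NCP solver (our illustrative sketch, with names of our choosing): it scans all $q^k$ codewords for one nearest to the received word. This exponential baseline is exactly what the hardness results above suggest cannot be beaten in general, while the BDD promise (error weight below $d/2$) is what classical algebraic decoders exploit for specific code families.

```python
# Sketch: exhaustive maximum-likelihood decoding for the NCP over F_q
# (q prime here, so arithmetic is mod q).  The q^k cost is the point:
# in general no polynomial-time algorithm is expected.
from itertools import product

def nearest_codeword(G, q, v):
    k = len(G)
    best, best_d = None, len(v) + 1
    for x in product(range(q), repeat=k):
        cw = tuple(sum(x[i] * G[i][j] for i in range(k)) % q
                   for j in range(len(v)))
        d = sum(a != b for a, b in zip(cw, v))  # Hamming distance to v
        if d < best_d:
            best, best_d = cw, d
    return best, best_d

# binary [3,1,3] repetition code; one flipped bit is within the BDD radius
# (d-1)/2 = 1, so the nearest codeword is unique and decoding is unambiguous
G = [[1, 1, 1]]
cw, err = nearest_codeword(G, 2, (1, 0, 1))
assert (cw, err) == ((1, 1, 1), 1)
```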
B. The Error-Correction Problem
C. Relatively Near Codeword Problem (RNC)
In the process of obtaining the inapproximability result for the minimum-distance problem, we also shed light on the general
To compare the general NCP and the more specific BDD problem, we introduce a parameterized family of problems that we call the relatively near codeword problem (RNC). For real $\rho > 0$, RNC$_\rho$ is the following problem.
3One can also consider randomized algorithms with two-sided error, where also the lower bound $\tilde{d} \ge d(\mathcal{A})$ holds only in a probabilistic sense. All our results can be easily adapted to algorithms with two-sided error.

4$f(n)$ is quasi-polynomial in $n$ if it grows slower than $2^{(\log n)^c}$ for some constant $c$.
Given a generator matrix $\mathbf{A}$ of a linear code $\mathcal{A}$ of (not necessarily known) minimum distance $d = d(\mathcal{A})$, an integer $t$ with the promise that $t < \rho d$, and a received word $\mathbf{v} \in \mathbb{F}_q^n$
, find a codeword within distance $t$ from $\mathbf{v}$. (The algorithm may fail if the promise is violated, or if no such codeword exists. In other words, in contrast to the papers [3] and [5], the algorithm is expected to work only when the error weight is limited in proportion to the code distance.) Both the NCP and the BDD are special cases of RNC$_\rho$: NCP $=$ RNC$_\infty$, while BDD $=$ RNC$_{1/2}$. Until recently, not much was known about RNC$_\rho$ for constant values of $\rho$, let alone for $\rho = 1/2$ (i.e., the BDD problem). No finite upper bound on $\rho$ can be easily derived from Arora et al.'s NP-hardness proof for NCP [3]. (In other words, their proof does not seem to hold for RNC$_\rho$ for any finite $\rho$.) It turns out, as observed by Jain et al. [14], that Vardy's proof of the NP-hardness of the minimum-distance problem also shows the NP-hardness of RNC$_\rho$ for $\rho = 1$ (and actually extends to some $\rho < 1$). In this paper, we significantly improve upon this situation by showing NP-hardness (under randomized reductions) of RNC$_\rho$ for every $\rho > 1/2$, bringing us much closer to an eventual (negative?) resolution of the BDD problem.

D. Organization

The rest of the paper is organized as follows. In Section II, we precisely define the coding problems studied in this paper, introduce some notation, and briefly overview the random reduction techniques and coding theory notions used in the rest of the paper. As explained in Section II, our proofs rely on coding-theoretic constructions that might be of independent interest, namely, the construction of codes containing unusually dense clusters. These constructions constitute the main technical contribution of the paper, and are presented in Section III (for arbitrary linear codes) and Section VI (for asymptotically good codes). The hardness of the relatively near codeword problem and the minimum-distance problem are proved in Sections IV and V, respectively. Similar hardness results for asymptotically good codes are presented in Section VII.
The reader mostly interested in computational complexity may want to skip Section III at first reading, and jump directly to the NP-hardness results in Sections IV and V. Section VIII concludes with a discussion of the consequences and limitations of the proofs given in this paper, and related open problems.
II. BACKGROUND

A. Approximation Problems

In order to study the computational complexity of coding problems, we formulate them in terms of promise problems. A promise problem is a generalization of the familiar notion of a decision problem. The difference is that in a promise problem not every string is required to be either a YES or a NO instance. Given a string with the promise that it is either a YES or a NO instance, one has to decide which of the two sets it belongs to. The following promise problem captures the hardness of approximating the minimum-distance problem within a factor $\gamma$.
Definition 1—Minimum-Distance Problem: For prime power $q$ and approximation factor $\gamma \ge 1$, an instance of GAPDIST$_\gamma$ is a pair $(\mathbf{A}, d)$, where $\mathbf{A} \in \mathbb{F}_q^{k \times n}$ and $d \in \mathbb{N}$, such that:

• $(\mathbf{A}, d)$ is a YES instance if $d(\mathcal{A}) \le d$.

• $(\mathbf{A}, d)$ is a NO instance if $d(\mathcal{A}) > \gamma d$.

In other words, given a code $\mathcal{A}$ and an integer $d$ with the promise that either $d(\mathcal{A}) \le d$ or $d(\mathcal{A}) > \gamma d$, one must decide which of the two cases holds true. The relation between approximating the minimum distance of $\mathcal{A}$ and the above promise problem is easily explained. On the one hand, if one can compute a $\gamma$-approximation $d(\mathcal{A}) \le \tilde{d} \le \gamma \cdot d(\mathcal{A})$ to the minimum distance of the code, then one can easily solve the promise problem above by checking whether $\tilde{d} \le \gamma d$ or $\tilde{d} > \gamma d$. On the other hand, assume one has a decision oracle that solves the promise problem above. Then, the minimum distance of a given code can be easily approximated using the oracle as follows. Notice that $(\mathbf{A}, n)$ always returns YES while $(\mathbf{A}, 0)$ always returns NO. Using binary search, one can efficiently find a number $d$ such that $(\mathbf{A}, d)$ returns YES and $(\mathbf{A}, d-1)$ returns NO.5 This means that $(\mathbf{A}, d)$ is not a NO instance and $(\mathbf{A}, d-1)$ is not a YES instance, and the minimum distance must lie in the interval $d \le d(\mathcal{A}) \le \gamma d$.

Similarly, we can define the following promise problem to capture the hardness of approximating RNC within a factor $\gamma$.

Definition 2—Relatively Near Codeword Problem: For prime power $q$ and factors $\gamma \ge 1$ and $\rho > 0$, an instance of GAPRNC$_{\gamma,\rho}$ is a triple $(\mathbf{A}, \mathbf{v}, t)$, where $\mathbf{A} \in \mathbb{F}_q^{k \times n}$, $\mathbf{v} \in \mathbb{F}_q^n$, and $t \in \mathbb{N}$, such that $t < \rho \cdot d(\mathcal{A})$ and6:

• $(\mathbf{A}, \mathbf{v}, t)$ is a YES instance if $\mathrm{dist}(\mathbf{v}, \mathcal{A}) \le t$.

• $(\mathbf{A}, \mathbf{v}, t)$ is a NO instance if $\mathrm{dist}(\mathbf{v}, \mathcal{A}) > \gamma t$.
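The binary-search argument just described can be sketched as follows (our code and names; the GAPDIST$_\gamma$ oracle is simulated here from an exact distance purely to make the sketch runnable, and inside the promise gap it may answer arbitrarily):

```python
# Sketch: localizing d(A) to the interval [d, gamma*d] with a promise oracle.
# oracle(d) answers the GAPDIST_gamma instance (A, d): YES if d(A) <= d,
# NO if d(A) > gamma*d, and anything at all in between (promise violated).

def make_oracle(true_d, gamma):
    def oracle(d):
        if true_d <= d:
            return True          # YES instance
        if true_d > gamma * d:
            return False         # NO instance
        return True              # gap: any answer is allowed by the promise
    return oracle

def localize(oracle, n):
    # (A, n) is always YES, (A, 0) is always NO
    lo, hi = 0, n                # invariant: oracle(lo) = NO, oracle(hi) = YES
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return hi                    # d(A) lies in [hi, gamma*hi]

oracle = make_oracle(true_d=10, gamma=2)
d = localize(oracle, n=100)
assert d <= 10 <= 2 * d          # the true distance is trapped in [d, 2d]
```

The search uses $O(\log n)$ oracle calls, matching the efficiency claim in the text.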
It is immediate that the problem RNC$_\rho$ gets harder as $\rho$ increases, since increasing $\rho$ has the only effect of weakening the promise $t < \rho \cdot d(\mathcal{A})$. RNC$_\rho$ is the hardest when $\rho = \infty$, in which case the promise is vacuously true and we obtain the familiar (promise version of) the nearest codeword problem.

Definition 3—Nearest Codeword Problem: For prime power $q$ and $\gamma \ge 1$, an instance of GAPNCP$_\gamma$ is a triple $(\mathbf{A}, \mathbf{v}, t)$, where $\mathbf{A} \in \mathbb{F}_q^{k \times n}$, $\mathbf{v} \in \mathbb{F}_q^n$, and $t \in \mathbb{N}$, such that:

• $(\mathbf{A}, \mathbf{v}, t)$ is a YES instance if $\mathrm{dist}(\mathbf{v}, \mathcal{A}) \le t$.

• $(\mathbf{A}, \mathbf{v}, t)$ is a NO instance if $\mathrm{dist}(\mathbf{v}, \mathcal{A}) > \gamma t$.

The promise problem GAPNCP$_\gamma$ is NP-hard for every constant $\gamma \ge 1$ (cf. [3]7), and this result is critical to our hardness result(s). We define one last promise problem to study the

5By definition, the oracle can give any answer if the input is neither a YES instance nor a NO one. So, it would be wrong to conclude that $(\mathbf{A}, d-1)$ is a NO instance and $(\mathbf{A}, d)$ is a YES one.

6Strictly speaking, the condition $t < \rho \cdot d(\mathcal{A})$ is a promise and hence should be added as a condition in both the YES and NO instances of the problem.

7To be precise, Arora et al. [3] present the result only for binary codes. However, their proof is valid for any alphabet. An alternate way to obtain the result for any prime power $q$ is to use a recent result of Håstad [13] who states his result in linear-algebra (rather than coding-theoretic) terms. We will state and use some of the additional features of the latter result in Section VII.
hardness of approximating the minimum distance of a code with linear additive error.

Definition 4: For $\epsilon > 0$ and prime power $q$, let GAPDISTADD$_\epsilon$ be the promise problem with instances $(\mathbf{A}, d)$, where $\mathbf{A} \in \mathbb{F}_q^{k \times n}$ and $d \in \mathbb{N}$, such that:

• $(\mathbf{A}, d)$ is a YES instance if $d(\mathcal{A}) \le d$.

• $(\mathbf{A}, d)$ is a NO instance if $d(\mathcal{A}) > d + \epsilon n$.
B. Random Reductions and Techniques

The main result of this paper (see Theorem 22) is that the approximation problem GAPDIST$_\gamma$ is NP-hard for any constant factor $\gamma$ under polynomial reverse unfaithful random reductions (RUR-reductions, [15]), and for $\gamma = 2^{\log^{1-\epsilon} n}$ (for any constant $\epsilon > 0$)
it is hard under quasi-polynomial RUR-reductions. These are probabilistic reductions that map NO instances always to NO instances, and YES instances to YES instances with high probability. In particular, given a security parameter $s$, all reductions presented in this paper warrant that YES instances are properly mapped with probability at least $1 - 2^{-s}$ in time polynomial in $s$ (and the input size).8 The existence of a (random) polynomial time algorithm to solve the hard problem would imply NP = RP, i.e., every problem in NP would have a probabilistic polynomial time algorithm that always rejects NO instances and accepts YES instances with high probability.9 Similarly, hardness for NP under quasi-polynomial RUR-reductions implies that the hard problem cannot be solved in RQP unless NP $\subseteq$ RQP. Therefore, similarly to a proper NP-hardness result (obtained under deterministic polynomial reductions), hardness under polynomial RUR-reductions also gives evidence of the intractability of a problem.

In order to prove these results, we first study the problem GAPRNC$_{\gamma,\rho}$. We show that the error weight is hard to approximate to within any constant factor $\gamma$ and for any $\rho > 1/2$, unless NP = RP (see Theorem 16). By using any $\rho < 1$, we immediately reduce our error-correction problem GAPRNC$_{\gamma,\rho}$ to the minimum-distance problem GAPDIST$_{\gamma'}$ for some constant $\gamma' > 1$. We then use product constructions to "amplify" the constant and prove the claimed hardness results for the minimum-distance problem.

The hardness of GAPRNC$_{\gamma,\rho}$ for $\rho > 1/2$ is obtained by adapting a technique of Micciancio [18], which is, in turn, based on the work of Ajtai [2] (henceforth Ajtai–Micciancio). They consider the analogous problem over the integers (rather

8Here, and in the rest of the paper, we use notation poly$(n)$ to denote any polynomially bounded function of $n$, i.e., any function $f(n)$ such that $f(n) = O(n^c)$ for some constant $c$ independent of $n$.
9Notice that we are unable to prove that the existence of a polynomial time approximation algorithm implies the stronger containment NP $\subseteq$ ZPP (ZPP is the class of decision problems $L$ such that both $L$ and its complement are in RP, i.e., ZPP $=$ RP $\cap$ coRP), as done, for example, in [12]. The difference is that [12] proves hardness under unfaithful random reductions (UR-reductions [15], i.e., reductions that are always correct on YES instances and often correct on NO instances). Hardness under UR-reductions implies that no polynomial time algorithm exists unless NP $\subseteq$ coRP, and, therefore, RP $\subseteq$ NP $\subseteq$ coRP. It immediately follows that coRP $\subseteq$ RP and NP $=$ RP $=$ coRP $=$ ZPP. However, in our case where RUR-reductions are used, we can only conclude that RP $\subseteq$ NP $\subseteq$ RP, and therefore NP $=$ RP, but it is not clear how to establish any relation involving the complementary class coRP and ZPP.
than finite fields), with Hamming distance replaced by Euclidean distance. Much of the adaptation is straightforward; in fact, some of the proofs are even easier in our case due to the use of finite fields. The main hurdle turns out to be in adapting the following combinatorial problem considered and solved by Ajtai–Micciancio.

Given an integer $\ell$, construct, in poly$(\ell)$ time, integers $m$ and $d$, an $m$-dimensional lattice $L$ (i.e., a subset of $\mathbb{Z}^m$ closed under addition and multiplication by an integer) with minimum (Euclidean) distance $d$, and a vector $\mathbf{v}$ such that the (Euclidean) ball of radius $\rho d$ around $\mathbf{v}$ contains at least $\ell$ vectors from $L$ (where $\rho < 1$ is a constant independent of $\ell$).

In our case, we are faced with a similar problem with $\mathbb{Z}$ replaced by $\mathbb{F}_q$ and Euclidean distance replaced by Hamming distance. The Ajtai–Micciancio solution to the above problem involves number-theoretic methods and does not translate to our setting. Instead, we show that if we consider a linear code whose performance (i.e., tradeoff between rate and distance) is better than that of a random code, and pick a random light vector $\mathbf{v}$, then the resulting construction has the required properties. We first solve this problem over sufficiently large alphabets using high-rate RS codes. (See the next subsection for the definition of RS codes. This same construction has been used in the coding theory literature to demonstrate limitations to the "list decodability" of RS codes [16].) We then translate the result to small alphabets using the well-known method of concatenating codes [11].

In the second part of the paper, we extend our methods to address asymptotically good codes. We show that even for such codes, the RNC is hard unless NP $=$ RP (see Theorem 31), though for these codes we are only able to prove the hardness of GAPRNC$_{\gamma,\rho}$ for $\rho > 2/3$. Finally, we translate this to a result (see Theorem 32) showing that the minimum distance of a code is hard to approximate to within an additive error that is linear in the block length of the code.

C. Coding Theory

For a thorough introduction to coding theory, the reader is referred to [17]. Here we briefly review some basic results used in the rest of the paper. Readers with an adequate background in coding theory can safely skip to the next section.

A classical problem in coding theory is to determine, for given alphabet $\mathbb{F}_q$, block length $n$, and dimension $k$, the highest possible minimum distance $d$ such that there exists an $[n,k,d]_q$ linear code. The following two theorems give well-known upper and lower bounds for $d$.

Theorem 5—Gilbert–Varshamov (GV) Bound: For any finite field $\mathbb{F}_q$, block length $n$, and dimension $k$, there exists an $[n,k,d]_q$ linear code with minimum distance $d$ satisfying

$$V_q(d) \ge q^{n-k} \qquad (1)$$

where

$$V_q(r) = \sum_{i=0}^{r} \binom{n}{i} (q-1)^i$$
is the volume of the $n$-dimensional Hamming sphere of radius $r$. For any $k$, the smallest $d$ such that (1) holds true is denoted $d_{GV}(k)$. (Notice that the definition of $d_{GV}(k)$ depends both on the information content $k$ and the block length $n$. For brevity, we omit $n$ from the notation, as the block length is usually clear from the context.) See [17] for a proof.

The lower bound on $d$ given by the Gilbert–Varshamov theorem (GV bound, for short) is not effective, i.e., even if the theorem guarantees the existence of linear codes with a certain minimum distance, the proof of the theorem employs a greedy algorithm that runs in exponential time, and we do not know any efficient way to find such codes for small alphabets. Interestingly, random linear codes meet the GV bound, i.e., if the generator matrix of the code is chosen uniformly at random, then the minimum distance of the resulting code satisfies the GV bound with high probability. However, given a randomly chosen generator matrix, it is not clear how to efficiently check whether the corresponding code meets the GV bound or not. We now give an upper bound on $d$.

Theorem 6—Singleton Bound: For every linear $[n,k,d]_q$ code, the minimum distance is at most $d \le n - k + 1$.
See [17] for a proof. A code is called maximum distance separable (MDS) if the Singleton bound is satisfied with equality, i.e., if $d = n - k + 1$. RS codes are an important example of MDS codes. For any finite field $\mathbb{F}_q$ and dimension $k$, the RS code can be defined as the set of $(q-1)$-dimensional vectors obtained by evaluating all polynomials of degree less than $k$ at all nonzero points of $\mathbb{F}_q$. Extended RS codes are defined similarly, but evaluating the polynomials at all points of $\mathbb{F}_q$, including $0$. Since a nonzero polynomial of degree less than $k$ can have at most $k - 1$ zeros, every nonzero (extended) RS codeword has at most $k - 1$ zero positions, so its Hamming weight is at least $(q-1) - (k-1)$ (resp., $q - (k-1)$). This proves that (extended) RS codes are MDS. Extended RS codes can be further extended by considering the homogeneous polynomials
In Section VI, we will also use AG codes. The definition of these codes is beyond the scope of the present paper, and the interested reader is referred to [21].

An important operation used to combine codes is the concatenating code construction of [11]. Let $\mathcal{A}$ and $\mathcal{B}$ be two linear codes over alphabets $\mathbb{F}_{q^m}$ and $\mathbb{F}_q$, with generator matrices $\mathbf{A}$ and $\mathbf{B}$. $\mathcal{A}$ is called the outer code, and $\mathcal{B}$ is called the inner code. Notice that the dimension $m$ of the inner code equals the dimension of the alphabet $\mathbb{F}_{q^m}$ of the outer code, viewed as a vector space over base field $\mathbb{F}_q$. The idea is to replace every component $a_i$ of an outer codeword $\mathbf{a}$ with a corresponding inner codeword $\beta(a_i)$. More precisely, let $x_1, \ldots, x_m$ be a basis of $\mathbb{F}_{q^m}$ as a vector space over $\mathbb{F}_q$, and let $\beta$ be the (unique) linear function from $\mathbb{F}_{q^m}$ to $\mathcal{B}$ such that $\beta(x_i)$ is the $i$th row of $\mathbf{B}$ for all $i = 1, \ldots, m$. The concatenation function is given by
The function $\beta$ is extended to vectors and to sets of vectors in the usual way, coordinate-wise: $\beta(\mathbf{a}) = (\beta(a_1), \ldots, \beta(a_{n_1}))$ and $\beta(S) = \{\beta(\mathbf{a}) : \mathbf{a} \in S\}$. The concatenated code $\beta(\mathcal{A})$ is just the result of applying the concatenation function to the outer code. It is easy to see that if $\mathcal{A}$ is an $[n_1, k_1, d_1]_{q^{k_2}}$ code and $\mathcal{B}$ is an $[n_2, k_2, d_2]_q$ code, then the concatenation $\beta(\mathcal{A})$ is an $[n_1 n_2, k_1 k_2, d]_q$ code with $d \ge d_1 d_2$, and a generator matrix for it has rows $\beta(x_j \mathbf{a}_i)$ for all $i = 1, \ldots, k_1$ and $j = 1, \ldots, k_2$, where $\mathbf{a}_1, \ldots, \mathbf{a}_{k_1}$ are the rows of $\mathbf{A}$.
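A minimal worked instance of this construction (our toy choice of codes, not an example from the paper): take as outer code the $[3,1,3]$ repetition code over $\mathbb{F}_4$ and as inner code the $[3,2,2]$ binary even-weight code, so the concatenation should be a $[9,2,6]_2$ code with $d = d_1 d_2$. Writing an $\mathbb{F}_4$ symbol in the basis $\{1, x\}$ as a bit pair makes $\beta$ an $\mathbb{F}_2$-linear map:

```python
# Sketch of code concatenation with toy parameters:
#   outer: [3,1,3] repetition code over F_4 (symbols = 2-bit pairs),
#   inner: [3,2,2] binary even-weight code with generator B.
# beta sends an F_4 symbol with bits (b0, b1) to the inner codeword (b0,b1)*B.
B = [[1, 0, 1],
     [0, 1, 1]]                  # inner generator matrix, d(inner) = 2

def beta(sym):                    # sym in 0..3 encodes the bit pair (b0, b1)
    b = (sym & 1, sym >> 1)
    return tuple(b[0] * B[0][j] ^ b[1] * B[1][j] for j in range(3))

# F_4-multiples of (1,1,1): addition in F_4 is XOR of bit pairs, so the
# repetition code is closed under it and enumerating the 4 symbols suffices
outer = [(a, a, a) for a in range(4)]            # [3,1,3] over F_4
concat = [sum((beta(s) for s in cw), ()) for cw in outer]

weights = sorted(sum(c != 0 for c in cw) for cw in concat)
assert weights == [0, 6, 6, 6]    # [9,2,6]_2: every nonzero word has weight 6
```

Here the bound $d \ge d_1 d_2 = 6$ is met with equality because the inner code is equidistant on its nonzero codewords, the same feature exploited later with Hadamard inner codes.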
III. DENSE CODES In this section, we present some general results about codes with a special density property (to be defined), and their algorithmic construction. The section culminates with the proof of Lemma 15 which shows how to efficiently construct a gadget that will be used in Section IV in our NP-hardness proofs. Lemma 15 is, in fact, the only result from this section directly used in the rest of the paper (with the exception of Section VI, which extends the results of this section to asymptotically good codes). The reader mostly interested in computational complexity issues may want to skip this section at first reading, and refer to Lemma 15 when used in the proofs. A. General Overview
and evaluating them at all points of the projective line. This increases both the block length and the minimum distance by one, giving twice-extended RS codes with parameters $[q+1, k, q-k+2]_q$.

Another family of codes we are going to use are the Hadamard codes. In these codes, each codeword corresponds to a vector $\mathbf{x} \in \mathbb{F}_q^k$. The codeword associated to $\mathbf{x}$ is obtained by evaluating all nonzero linear functions $\ell \colon \mathbb{F}_q^k \to \mathbb{F}_q$ at $\mathbf{x}$. Since there are $q^k$ linear functions in $k$ variables (including the zero function), the code has block length $n = q^k - 1$. Notice that any nonzero vector $\mathbf{x}$ is mapped to $0$ by exactly a $1/q$ fraction of the linear functions (including the identically zero function). So the minimum distance of the code is $q^{k-1}(q-1)$.
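The Hadamard code parameters can be verified directly for small $q$ and $k$ (our sketch): the code below evaluates all nonzero linear functions at each message and checks the block length $q^k - 1$ and, for every nonzero message, a weight of exactly $q^{k-1}(q-1)$, i.e., the code is equidistant, not merely of minimum distance $q^{k-1}(q-1)$.

```python
# Sketch: the q-ary Hadamard code of dimension k, built as in the text by
# evaluating every nonzero linear function l(y) = <a, y> at the message x.
from itertools import product

def hadamard(q, k):
    funcs = [a for a in product(range(q), repeat=k) if any(a)]  # nonzero l
    code = {}
    for x in product(range(q), repeat=k):
        code[x] = tuple(sum(ai * xi for ai, xi in zip(a, x)) % q
                        for a in funcs)
    return code

q, k = 3, 2
code = hadamard(q, k)
for x, cw in code.items():
    assert len(cw) == q ** k - 1                   # block length q^k - 1
    if any(x):                                     # equidistance on nonzero x
        assert sum(c != 0 for c in cw) == q ** (k - 1) * (q - 1)
```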
Let $B(\mathbf{v}, r) = \{\mathbf{x} : \mathrm{dist}(\mathbf{x}, \mathbf{v}) \le r\}$ be the ball of radius $r$ centered in $\mathbf{v}$.10 In this section, we wish to find codes that include multiple codewords in some ball(s) of relatively small radius $r = \rho d$, where $\rho$ is some positive real number. Obviously, the problem is meaningful only for $\rho > 1/2$, as any ball of radius $r < d/2$ cannot contain more than a single codeword. Below we prove that for any $\rho > 1/2$ it is actually possible to build such a code. These codes are used in the sequel to prove the hardness of GAPRNC by reduction from the nearest codeword problem. Of particular interest is the case $\rho < 1$ (i.e., $r < d$), as the corresponding

10All proofs in this paper are still valid if $B(\mathbf{v}, r)$ is defined as the set of words at distance exactly $r$ from $\mathbf{v}$. This property is not used in this paper, but it is needed, for example, in [20].
hardness result for GAPRNC can be translated into an inapproximability result for the minimum-distance problem. We say that a code $\mathcal{C}$ is $(\rho, \ell)$-dense around $\mathbf{v}$ if the ball $B(\mathbf{v}, \rho d)$ contains at least $\ell$ codewords. We say that $\mathcal{C}$ is $(\rho, \ell)$-dense if it is $(\rho, \ell)$-dense around some $\mathbf{v}$. We want to determine for what values of the parameters there exist $(\rho, \ell)$-dense codes. The technique we use is probabilistic: we show that there exist codes $\mathcal{C}$ such that the expected number of codewords in a randomly chosen sphere of radius $\rho d$ is at least $\ell$. It easily follows, by a simple averaging argument, that there exists a center $\mathbf{v}$ such that the code is $(\rho, \ell)$-dense around $\mathbf{v}$. Let $N(\mathcal{C}, r)$
• AG codes, meeting the Tsfasman–Vlăduţ–Zink (TVZ) bound, whose distances exceed $d_{GV}$ for most code rates bounded away from $0$ and $1$, for any square prime power $q \ge 49$. These codes are used to prove hardness results for asymptotically good codes, and approximating the minimum distance up to an additive error.

Remark 7: The relation between dense codes and codes that exceed the GV bound can be given an exact, quantitative characterization. In particular, one can prove that for any positive integer $\ell$ and real $r$, the expected number of codewords from $\mathcal{C}$ in a random sphere of radius $r$ is at least $\ell$ if and only if $r \ge d_{GV}(k - \log_q \ell)$.

B. Dense Code Families From MDS Codes
be the expected number of codewords in $B(\mathbf{v}, r)$ when the center $\mathbf{v}$ is chosen uniformly at random from $\mathbb{F}_q^n$. Notice that

$$N(\mathcal{C}, r) = \mathop{\mathbb{E}}_{\mathbf{v}} \bigl[\, |\mathcal{C} \cap B(\mathbf{v}, r)| \,\bigr] = \sum_{\mathbf{x} \in \mathcal{C}} \Pr_{\mathbf{v}} \{ \mathrm{dist}(\mathbf{x}, \mathbf{v}) \le r \} = |\mathcal{C}| \cdot \frac{V_q(r)}{q^n}. \qquad (2)$$

This shows that the average number of codewords in a randomly chosen ball does not depend on the specific code $\mathcal{C}$, but only on its information content $k$ (and the block length $n$, which is usually implicit and clear from the context). So, we can simply write $N(k, r)$ instead of $N(\mathcal{C}, r)$.

In the rest of this subsection, we show that dense codes are closely related to codes that outperform the GV bound. This connection is not explicitly used in the rest of the paper, and it is presented here only for the purpose of illustrating the intuition behind the choice of codes used in the sequel. Using function $N(k, r)$, the definition of the minimum distance $d_{GV}(k)$ (for codes with information content $k$) guaranteed by the GV bound (see Theorem 5) can be rewritten as $N(k, d_{GV}(k)) \ge 1$. In particular, for any code $\mathcal{C}$ with information content $k$, the expected number of codewords in a random sphere of radius $d_{GV}(k)$ is at least $1$ and, since some spheres of that radius contain no codewords at all, there must exist spheres with multiple codewords. If the distance of the code exceeds the GV bound for codes with the same information content (i.e., $d(\mathcal{C}) > d_{GV}(k)$), then $\mathcal{C}$ is $(\rho, \ell)$-dense for some $\rho < 1$ and $\ell \ge 2$.

Several codes are known for which the minimum distance exceeds the GV bound.

• RS codes or, more generally, MDS codes, whose distances exceed $d_{GV}$ for all code rates. RS codes of rate approaching $1$ will be of particular interest. This is due to the fact that the ratio $d / d_{GV}$ grows with the code rate of RS codes and tends to $2$ for high rates. We implicitly use this fact in the sequel to build $(\rho, \ell)$-dense codes for any $\rho > 1/2$.

• Binary BCH codes, whose distances exceed $d_{GV}$ for very high code rates approaching $1$. These codes are not used in this paper, but they are an important family of codes that beat the GV bound.
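These quantities are easy to check numerically (our sketch; the particular instances are our examples): the code below computes $V_q(r)$ and $d_{GV}(k)$ as in Theorem 5, confirms that the distance $5$ of the binary $[15,7,5]$ BCH code exceeds $d_{GV}(7) = 3$ for $n = 15$, and verifies identity (2) exactly for a tiny code by enumerating every center $\mathbf{v}$.

```python
# Sketch: the sphere volume V_q(r), the derived quantity d_GV, and an exact
# check of identity (2) by averaging |C \cap B(v,r)| over all centers v.
from itertools import product
from math import comb

def ball_volume(q, n, r):
    # V_q(r) = sum_{i<=r} C(n,i) (q-1)^i
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def d_gv(q, n, k):
    # smallest d with V_q(d) >= q^(n-k), as in (1)
    return next(d for d in range(n + 1)
                if ball_volume(q, n, d) >= q ** (n - k))

# the [15,7,5] BCH code beats the GV bound: 5 > d_GV(7) = 3
assert d_gv(2, 15, 7) == 3

# exact check of (2) for the [4,2] binary code {(a,a,b,b)}: the average of
# |C ∩ B(v,r)| over all 2^4 centers equals |C| * V_2(r) / 2^4
code = [(a, a, b, b) for a in (0, 1) for b in (0, 1)]
q, n, r = 2, 4, 1
total = sum(sum(1 for c in code
                if sum(ci != vi for ci, vi in zip(c, v)) <= r)
            for v in product(range(q), repeat=n))
assert total / q ** n == len(code) * ball_volume(q, n, r) / q ** n
```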
We are interested in sequences of $(\rho, \ell)$-dense codes with the parameter $\rho$ fixed and $\ell$ arbitrarily large. Moreover, the parameter $\ell$ should be polynomially related to the block length $n$, i.e., $n \le \ell^c$ for some constant $c$. (Notice that some such relation is a necessary condition because a code of dimension $k$ contains at most $q^k$ distinct codewords and we want $\ell$ distinct codewords in a ball.) This relation is essential to allow the construction of dense codes in time polynomial in $\ell$. Sequences of dense codes are formally defined as follows.

Definition 8: A sequence of codes $\{C_\ell\}_{\ell \ge 1}$ is called a $\rho$-dense code family if, for every $\ell$, $C_\ell$ is a $(\rho, \ell)$-dense code, i.e., there exists a center $t_\ell$ such that the ball $B(t_\ell, \rho \cdot d(C_\ell))$ contains $\ell$ codewords. Moreover, we say that $\{C_\ell\}$ is a polynomial code family if the block length is polynomially bounded, i.e., $n_\ell \le \ell^c$ for some constant $c$ independent of $\ell$.

In this subsection, we use twice-extended RS codes to build $\rho$-dense code families for any $\rho > 1/2$.

Lemma 9: For any $\rho > 1/2$, integer $\ell \ge 2$, and sequence of prime powers $q = q(\ell)$ satisfying the lower bound (3) on the alphabet size, there exists a polynomial $\rho$-dense code family with the $\ell$th code defined over $\mathbb{F}_{q(\ell)}$.

Proof: Fix the values of $\rho$ and $\ell$, and let $q$ be as specified in the lemma. Define the dimension $k$ and the radius $r$ as in the lemma. Notice that, from the lower bound (3) on the alphabet size, we get inequality (4); in particular, since $r$ is an integer, (4) can be assumed to hold in its integral form. Consider an MDS code $C$ with parameters $[n, k, d]$ meeting the Singleton bound (e.g., twice-extended RS codes; see Section II). From the definition of $r$, we immediately get inequality (5). We will prove that $\mu_k(r) \ge \ell$, i.e., that the expected number of codewords in a randomly chosen ball of radius $r$ is at least $\ell$. It immediately follows from (4) and (5) that $r \le \rho d$,
i.e., the radius satisfies the bound required for the code to be $(\rho, \ell)$-dense on the average. Since MDS codes meet the Singleton bound with equality, we have $d = n - k + 1$. Therefore, the expected number of codewords in a random sphere satisfies inequality (6). We want to lower bound this quantity. Notice that the volume of the sphere satisfies the elementary estimate (7), and a slightly stronger inequality (8) is obtained by a more careful computation. Moreover, the reader can easily verify inequality (9) for all parameters in the admissible range. We use (8) and (9) to bound the volume of the sphere from below, and combining the resulting bound with inequality (6) shows that $\mu_k(r) \ge \ell$. This proves that our MDS codes are $(\rho, \ell)$-dense. Moreover, if $q$ is polynomial in $\ell$, then the block length is also polynomial in $\ell$, and $\{C_\ell\}$ is a polynomial $\rho$-dense family of codes.

C. Codes Over a Fixed Alphabet

Lemma 9 gives a $\rho$-dense code family with an alphabet size that must grow with $\ell$. In the sequel, we wish to find a $\rho$-dense family of codes over some fixed alphabet $\mathbb{F}_q$. In order to keep the alphabet size fixed and still get arbitrarily large $\ell$, we take the extension field $\mathbb{F}_{q^m}$ and use the MDS codes from Lemma 9 with alphabet size $q^m$. These codes are concatenated with equidistant Hadamard codes to obtain a family of dense codes over the fixed alphabet $\mathbb{F}_q$. This procedure can be applied to any dense code, as described in the following lemma.

Lemma 10: Let $C$ be an arbitrary code over $\mathbb{F}_{q^m}$, and let $H$ be the equidistant Hadamard code of size $q^m$. If $C$ is $(\rho, \ell)$-dense (around some center $t$), then the concatenated code $H \circ C$ is $(\rho, \ell)$-dense (around the corresponding center $\sigma(t)$).

Proof: Let $C$ be a code over $\mathbb{F}_{q^m}$ and let $H$ be the equidistant Hadamard code of size $q^m$. Define the code $C'$ as the concatenation of $C$ and $H$. (See Section II for details.) In the resulting concatenated code $C'$, the block length is multiplied by the length of $H$ and the minimum distance is multiplied by the common weight $w$ of the nonzero codewords of $H$. Now assume $C$ is $(\rho, \ell)$-dense around some center $t$. Notice that the concatenation map $\sigma$ is injective and, because $H$ is linear and equidistant, it satisfies $d(\sigma(x), \sigma(y)) = w \cdot d(x, y)$ for all $x$ and $y$. Therefore, the ball $B(t, \rho d)$ is mapped into the ball $B(\sigma(t), \rho \cdot w d)$, and the number of $C'$-codewords contained in the latter ball equals the number of $C$-codewords contained in the former. Therefore, $C'$ is $(\rho, \ell)$-dense.

By increasing the degree $m$ of the extension field $\mathbb{F}_{q^m}$, we obtain an infinite sequence of $q$-ary codes. Combining Lemmas 9 and 10 we get the following proposition.

Proposition 11: For any $\rho > 1/2$ and prime power $q$, there exists a polynomial $\rho$-dense family of codes over the fixed alphabet $\mathbb{F}_q$.

Proof: As in Lemma 9, let $\epsilon$ be an arbitrarily small constant depending on $\rho$. For every $\ell$, define the extension degree $m$ so that the alphabet size $q^m$ satisfies the lower bound (3) of Lemma 9 while remaining polynomial in $\ell$. We can then invoke Lemma 9 with alphabet size $q^m$ to obtain a $(\rho, \ell)$-dense code $C_\ell$ over $\mathbb{F}_{q^m}$. Let $C'_\ell$ be the concatenation of $C_\ell$ with the Hadamard code of size $q^m$. By Lemma 10, $C'_\ell$ is $(\rho, \ell)$-dense. Moreover, the block length of $C'_\ell$ is the product of the block length of $C_\ell$ and the length of the Hadamard code, both of which are polynomial in $\ell$. This proves that $\{C'_\ell\}$ is a polynomial $\rho$-dense family.
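The phenomenon behind Lemma 9, that MDS codes have balls of radius well below the minimum distance containing several codewords, can be seen concretely on a toy Reed-Solomon code. The sketch below (parameters far smaller than the lemma requires; the spliced center is just one explicit witness) exhibits a ball of radius $\rho d$ with $\rho = 2/3 > 1/2$ containing at least two codewords:

```python
from itertools import product

# Toy illustration: a Reed-Solomon code over F_7 with k = 2, evaluated at
# all 7 field points, so n = 7 and d = n - k + 1 = 6 (Singleton bound met).
q, n, k = 7, 7, 2
code = [tuple((a + b * x) % q for x in range(n))
        for a, b in product(range(q), repeat=k)]

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

assert min(dist(u, (0,) * n) for u in code if any(u)) == n - k + 1  # d = 6

# Splice two codewords to build a center t: it agrees with c1 on the first
# 3 coordinates and with c2 on the last 4.
c1, c2 = code[10], code[25]
t = c1[:3] + c2[3:]

r = 4                     # r = 4 < d = 6, i.e., rho = r/d = 2/3
ball = [c for c in code if dist(t, c) <= r]
print(len(ball))          # at least 2 codewords: (2/3, 2)-dense around t
```

Since any two codewords here agree in at most one position, such spliced centers are close to every codeword used in the splice, which is exactly what makes high-rate MDS codes dense beyond half the minimum distance.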
D. Polynomial Construction

We proved that $\rho$-dense families of codes exist for any $\rho > 1/2$. In this subsection, we address two issues related to the algorithmic construction of such codes.

• Can $\rho$-dense codes be constructed in polynomial time? That is, is there an algorithm that on input $\ell$ outputs (in time polynomial in $\ell$) a linear code which is $(\rho, \ell)$-dense?

• Given a $(\rho, \ell)$-dense code, i.e., a code such that some ball $B(t, \rho d)$ contains at least $\ell$ codewords, can we efficiently find the center of a dense sphere?
The first question is easily answered: all constructions described in the preceding subsection are polynomial in $\ell$, so the answer to the first question is "yes." The second question is not as simple, and to date we do not know any deterministic procedure that efficiently produces dense codes together with a point around which the code is dense. However, we will see that, provided the code is dense "on the average," the center of a dense sphere can be efficiently found at least in a probabilistic sense. We prove the following algorithmic variant of Proposition 11.

Proposition 12: For every prime power $q$ and real constant $\rho > 1/2$, there exists a probabilistic algorithm that on input two integers $\ell$ and $s$, outputs (in time polynomial in $\ell$ and $s$) three integers $n$, $k$, $r$, a generator matrix $G$, and a center $t$ (of weight $r$)^11 such that:
1) $r \le \rho \cdot d(C)$, where $C$ is the code generated by $G$;
2) with probability at least $1 - 2^{-s}$, the ball $B(t, r)$ contains $\ell$ or more codewords.

Before proving the proposition, we make a few observations. The codes described in Proposition 11 are the concatenation of twice-extended RS codes with Hadamard codes, and therefore they can be explicitly constructed in polynomial time. While the codes are described explicitly, the proof that the code is dense is nonconstructive: in Lemma 9, we proved that the average number of codewords in a randomly chosen sphere is large, so spheres containing a large number of codewords certainly exist, but it is not clear how to find a dense center. At first glance, one can try to reduce the set of candidate centers used in Lemma 9. In particular, it can be proved that this set can be replaced by any MDS code that includes our original MDS code $C$. However, even with only two centers left, we would still need to count the codewords in the two balls to find which one is dense indeed. To date, explicit procedures for finding a single dense ball are yet unknown. Therefore, in what follows we use a probabilistic approach and show how to find a dense center with high probability.

First, let the center $t$ be chosen uniformly at random from the entire space $\mathbb{F}_q^n$. Given the expected number of codewords $\mu$, Markov's inequality yields

$$\Pr\bigl[\,|B(t, r) \cap C| \ge c \mu\,\bigr] \le 1/c,$$

showing that spheres containing at most $c \mu$ codewords can be found with high probability $1 - 1/c$. It turns out that a similar lower bound

$$\Pr\bigl[\,|B(t, r) \cap C| \ge \mu / c\,\bigr] \ge 1 - 1/c$$

can be proved if we restrict our choice of $t$ to a uniformly random element of the ball $B(0, r)$ only.^12 This is an instance of a quite general lemma that holds for any pair of groups $H \subseteq G$. In Lemma 13, we use multiplicative notation for the groups. However, to avoid any possible confusion, we clarify in advance that in our application $G$ and $H$ will be groups of vectors and of codewords with respect to the vector sum operation.

^11 See footnote 10.
^12 In fact, it is not necessary to have a group structure on the sets, and the lemma can be formulated in even more general settings, but working with groups makes the presentation simpler.
Lemma 13: Let $G$ be a group, $H \subseteq G$ a subgroup, and $S$ an arbitrary subset of $G$. Given $g \in G$, consider the subset $S_g = S \cap (g \cdot H)$ and let $\mu = |S| \cdot |H| / |G|$ be the average size of $S_g$ as the element $g$ runs through $G$. Choose $g \in S$ uniformly at random. Then for any $c > 1$

$$\Pr\bigl[\,|S_g| \ge \mu / c\,\bigr] \ge 1 - 1/c.$$

Proof: Divide the elements of $S$ into equivalence classes, where $g$ and $g'$ are equivalent if $g \cdot H = g' \cdot H$. Then choose $g \in S$ uniformly at random. If $g$ is chosen, then $S_g$ has the same size as the equivalence class of $g$. (Notice that the equivalence class of $g$ is exactly $S_g$.) Since the number of equivalence classes is (at most) $|G| / |H|$, the number of elements that belong to equivalence classes of size $\mu / c$ or less is bounded by $(|G| / |H|) \cdot (\mu / c) = |S| / c$ (i.e., the maximum number of classes times the maximum size of each such class), and the probability that such a class is selected is at most $(|S| / c) / |S| = 1/c$. Therefore, the probability to select an element $g$ with $|S_g| < \mu / c$ is at most $1/c$. The value of $\mu$ is justified by the following simple calculation: each $s \in S$ belongs to $S_g$ for exactly $|H|$ elements $g \in G$, and therefore

$$\frac{1}{|G|} \sum_{g \in G} |S_g| = \frac{|S| \cdot |H|}{|G|} = \mu.$$
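Lemma 13 can be checked exhaustively on a toy instance. The sketch below uses additive notation with $G = \mathbb{Z}_2^4$, a subgroup $H$ of order $4$, and an arbitrary subset $S$ (both $H$ and $S$ are illustrative choices); it verifies the value of $\mu$ exactly and the probability bound for $c = 2$:

```python
from itertools import product
from fractions import Fraction

# Toy check (additive notation): G = Z_2^4, H a subgroup, S an arbitrary subset.
G = list(product(range(2), repeat=4))
H = [g for g in G if g[2] == g[3] == 0]          # subgroup of order 4
S = [g for g in G if sum(g) <= 1 or g[0] == 1]   # some arbitrary subset

def coset(g):
    return frozenset(tuple((a + b) % 2 for a, b in zip(g, h)) for h in H)

S_size = {g: sum(1 for s in S if s in coset(g)) for g in G}

mu = Fraction(len(S) * len(H), len(G))
assert Fraction(sum(S_size[g] for g in G), len(G)) == mu  # average over all of G

# Lemma 13: drawing g uniformly from S (not from G), |S_g| >= mu/c
# holds with probability at least 1 - 1/c.
c = 2
good = sum(1 for g in S if S_size[g] >= mu / c)
print("Pr[|S_g| >= mu/2] =", Fraction(good, len(S)))
```

Drawing the sample from $S$ itself is the crucial twist: elements of $S$ lying in large classes are, by definition, in the majority.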
We are now ready to prove Proposition 12.

Proof: Fix some $\rho > 1/2$ and let $\ell$ and $s$ be the input to the algorithm. We consider the $(\rho, \ell')$-dense code from Proposition 11, for a suitably larger $\ell'$. This code is the concatenation of a twice-extended RS code $C$ from Lemma 9 with a Hadamard code, and has block length polynomial in $\ell$ and $s$. Therefore, a generator matrix for it can be constructed in time polynomial in $\ell$ and $s$. At this point, we instantiate Lemma 13 with the groups $G = \mathbb{F}_Q^n$ and $H = C$, where $Q$ is the (outer) alphabet size, and the set $S = B(0, r)$. Notice that for any center $t$, the set $S_t$ is in one-to-one correspondence with the codewords in the ball of radius $r$ centered at $t$. From the proof of Lemma 9, the average size of $S_t$ (i.e., the expected number of codewords in a random ball when the center is chosen uniformly at random from $\mathbb{F}_Q^n$) is at least $\ell'$. Following Lemma 13, we choose $t$ uniformly at random from $B(0, r)$; in particular, $t$ has weight at most $r$. By Lemma 13 (with a constant $c$ such that $\ell' / c \ge \ell$), we get that $B(t, r)$ contains at least $\ell$ codewords with the desired probability. Finally, by Lemma 10, the corresponding ball in the concatenated code (with scaled radius and center $\sigma(t)$) contains at least $\ell$ codewords. The output of the algorithm is given by a generating matrix for the concatenated code, the block length $n$ and information content $k$ of this matrix, the radius $r$, and the center vector.

E. Mapping Dense Balls Onto Full Spaces

In the next section, we will use the codewords inside the dense ball $B(t, r)$ to represent the solutions to the nearest codeword problem. In order to be able to represent any possible solution,
we need first to project the codewords in $B(t, r)$ onto the set $\mathbb{F}_q^m$ of all strings over $\mathbb{F}_q$ of some shorter length $m$. This is accomplished in the next lemma by another probabilistic argument. Given a matrix $T \in \mathbb{F}_q^{m \times n}$, let $f_T$ denote the linear transformation $x \mapsto T x$ from $\mathbb{F}_q^n$ to $\mathbb{F}_q^m$.

Lemma 14: Let $S$ be any fixed subset of $\mathbb{F}_q^n$ of size $\ell$. If the matrix $T$ is chosen uniformly at random, then we have $f_T(S) = \mathbb{F}_q^m$ with probability at least $1 - q^{2m} / \ell$.

Proof: Choose $T$ uniformly at random. We want to prove that with very high probability $f_T(S) = \mathbb{F}_q^m$. Choose a vector $b \in \mathbb{F}_q^m$ uniformly at random and define a new function $f(x) = T x + b$. Clearly, $f(S) = \mathbb{F}_q^m$ if and only if $f_T(S) = \mathbb{F}_q^m$, because $f(S)$ is just a shift of $f_T(S)$. Notice that the random variables $f(x)$ (indexed by the vectors $x \in S$ and defined by the random choice of $T$ and $b$) are pairwise independent and uniformly distributed. Therefore, for any $w \in \mathbb{F}_q^m$ and $x \in S$, $f(x) = w$ with probability $q^{-m}$. Let $N_w$ be the number of $x \in S$ such that $f(x) = w$. By linearity of expectation and pairwise independence of the $f(x)$, we have $E[N_w] = \ell / q^m$ and $\operatorname{Var}[N_w] \le \ell / q^m$. Applying Chebyshev's inequality we get

$$\Pr[N_w = 0] \le \frac{\operatorname{Var}[N_w]}{E[N_w]^2} \le \frac{q^m}{\ell}.$$

Therefore, for any $w$, the probability that $f(x) \ne w$ for every $x \in S$ is at most $q^m / \ell$. By the union bound, the probability that there exists some $w \in \mathbb{F}_q^m$ with $N_w = 0$ is at most $q^m \cdot q^m / \ell = q^{2m} / \ell$, i.e., with probability at least $1 - q^{2m} / \ell$ we have $f(S) = \mathbb{F}_q^m$ and, therefore, $f_T(S) = \mathbb{F}_q^m$.

We combine Proposition 12 and Lemma 14 to build the gadget needed in the NP-hardness proofs of the following section.

Lemma 15: For any $\rho > 1/2$ and finite field $\mathbb{F}_q$, there exists a probabilistic polynomial-time algorithm that on input two integers $m$ and $s$ outputs, in time polynomial in $m$ and $s$, integers $n$, $k$, $r$, matrices $G \in \mathbb{F}_q^{k \times n}$ and $T \in \mathbb{F}_q^{m \times n}$, and a vector $t$ (of weight $r$) such that:
1) $r \le \rho \cdot d(C)$, where $C$ is the code generated by $G$;
2) with probability at least $1 - 2^{-s}$, for every $u \in \mathbb{F}_q^m$ there exists a codeword $c \in C$ such that $d(t, c) \le r$ and $T c = u$.

Proof: Run the algorithm of Proposition 12 on a suitable input $\ell$ and $s$ to obtain integers $n$, $k$, $r$, a matrix $G$ such that $r \le \rho \cdot d(C)$, and a center $t$ such that $B(t, r)$ contains at least $\ell$ codewords with the stated probability. Let $S$ be the set of all codewords in $B(t, r)$, and choose $T$ uniformly at random. By Lemma 14, the conditional probability, given $|S| \ge \ell$, that $f_T(S) \ne \mathbb{F}_q^m$ is at most $q^{2m} / \ell$. Therefore, taking $\ell$ large enough, the overall failure probability is exponentially small in $s$.
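The only probabilistic fact used in the proof of Lemma 14 is that the values $f(x) = Tx + b$ are uniform and pairwise independent. For the smallest nontrivial parameters this can be checked by enumerating all maps (a toy check with $q = 2$, $n = 2$, $m = 1$; the points $x$, $y$ are arbitrary distinct vectors):

```python
from itertools import product
from collections import Counter

q, n, m = 2, 2, 1

x = (1, 0)
y = (0, 1)   # any pair of distinct points of F_2^2

pairs = Counter()
# Enumerate every matrix T in F_2^{1x2} and every shift b in F_2.
for T in product(range(q), repeat=n):
    for b in range(q):
        f = lambda v: (sum(t * vi for t, vi in zip(T, v)) + b) % q
        pairs[(f(x), f(y))] += 1

# (f(x), f(y)) is uniform over F_2 x F_2: each pair of values occurs for
# exactly the same number of (T, b), which is pairwise independence.
print(pairs)
```

Each of the four value pairs occurs for exactly two of the eight affine maps, so the variance bound via pairwise independence in the Chebyshev step is tight for this toy case.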
IV. HARDNESS OF THE RELATIVELY NEAR CODEWORD PROBLEM
In this section, we prove that the relatively near codeword problem is hard to approximate within any constant factor $\gamma$ for all $\rho > 1/2$. The proof uses the gadget from Lemma 15.

Theorem 16: For any $\gamma \ge 1$, $\rho > 1/2$, and any finite field $\mathbb{F}_q$, GAPRNC$_{\gamma, \rho}$ is hard for NP under polynomial RUR-reductions. Moreover, the error probability can be made exponentially small in a security parameter $s$ while maintaining the reduction polynomial in $s$.

Proof: Fix some finite field $\mathbb{F}_q$. Let $\rho'$ be a real with $1/2 < \rho' < \rho$ such that Lemma 15 holds true, let $\epsilon$ be an arbitrarily small positive real, and let $\gamma'$ be such that GAPNCP$_{\gamma'}$ is NP-hard. We prove that GAPRNC$_{\gamma, \rho}$ is hard for $\gamma$ and $\rho$ determined by $\gamma'$, $\rho'$, and $\epsilon$. Since $\epsilon$ can be arbitrarily small, Lemma 15 holds for any $\rho' > 1/2$, and GAPNCP$_{\gamma'}$ is NP-hard for any $\gamma'$, this proves the hardness of GAPRNC$_{\gamma, \rho}$ for any $\gamma \ge 1$ and $\rho > 1/2$.

The proof is by reduction from GAPNCP$_{\gamma'}$. Let $(A, y, w)$ be an instance of GAPNCP$_{\gamma'}$. We want to define an instance $(V, t', w')$ of GAPRNC$_{\gamma, \rho}$ such that if $(A, y, w)$ is a YES instance of GAPNCP$_{\gamma'}$, then $(V, t', w')$ is a YES instance of GAPRNC$_{\gamma, \rho}$ with high probability, while if $(A, y, w)$ is a NO instance of GAPNCP$_{\gamma'}$, then $(V, t', w')$ is a NO instance of GAPRNC$_{\gamma, \rho}$ with probability $1$. Notice that the main difference between the two problems is that, while in GAPNCP the minimum distance of the code can be arbitrarily small, in GAPRNC the minimum distance must be relatively large (compared to the error weight parameter). The idea is to embed the original code in a higher dimensional space to make sure that the new code has large minimum distance. At the same time, we want also to embed the target vector in this higher dimensional space in such a way that the distance of the target from the code is roughly preserved. The embedding is easily performed using the gadget from Lemma 15. Details follow.

On input the GAPNCP instance $(A, y, w)$, we invoke Lemma 15 (on input the information content $m$ of the input code and a security parameter $s$) to find integers $n$, $k$, $r$, a generator matrix $G$, a mapping matrix $T$, and a vector $t$ such that:
1) $r \le \rho' \cdot d(C)$, where $C$ is the code generated by $G$;
2) with probability at least $1 - 2^{-s}$, for every $u \in \mathbb{F}_q^m$ there exists a codeword $c \in C$ with $d(t, c) \le r$ and $T c = u$.

Consider the linear code $C_1 = \{ (T c) A : c \in C \}$, generated by the matrix $A_1$ whose rows are the encodings $(T g_i) A$ of the rows $g_i$ of $G$. (However, only at most $k$ of these generators are independent.) For suitable integer multiplicities $a$ and $b$ (fixed below), we define the matrix $V$ by concatenating^13 $a$ copies of $A_1$ and $b$ copies of $G$:

(10)

^13 Here the word "concatenation" is used to describe the simple juxtaposition of matrices or vectors, and not the concatenated code construction of [11] used in Section III.
and the vector $t'$ as the concatenation of $a$ copies of $y$ and $b$ copies of $t$:

(11)

Finally, let $w' = a w + b r$. The output of the reduction is $(V, t', w')$. Before we can prove that the reduction is correct, we need to bound the minimum distance of the code $C_V$ generated by $V$. Using the definition of $V$, and the fact that every nonzero codeword of $C_V$ contains $b$ copies of a nonzero codeword of $C$, we get
$d(C_V) \ge b \cdot d(C)$. So, we always have a lower bound on $d(C_V)$ linear in $b$. We can now prove the correctness of the reduction. In order to consider $(V, t', w')$ as an instance of GAPRNC$_{\gamma, \rho}$, we first prove that $w' \le \rho \cdot d(C_V)$. Indeed, $w' = a w + b r$ with $r \le \rho' d(C)$ and $\rho' < \rho$ and, therefore, choosing $b$ sufficiently large compared to $a$,

$$w' = a w + b r \le \rho \cdot b \cdot d(C) \le \rho \cdot d(C_V). \qquad (12)$$

Now, assume $(A, y, w)$ is a YES instance, i.e., there exists $x$ such that $d(x A, y) \le w$. Let $c$ be a codeword in $B(t, r)$ such that $T c = x$. We know such a codeword exists with probability at least $1 - 2^{-s}$. In such a case, the codeword $v$ of $C_V$ associated with $c$ satisfies

$$d(v, t') = a \cdot d(x A, y) + b \cdot d(c, t) \le a w + b r = w', \qquad (13)$$

proving that $(V, t', w')$ is a YES instance.

Conversely, assume $(A, y, w)$ is a NO instance, i.e., the distance of $y$ from the code generated by $A$ is greater than $\gamma' w$. We want to prove that $d(v, t') > \gamma w'$ for all codewords $v$ of $C_V$. Indeed, for the codeword $v$ associated with $c \in C$,

$$d(v, t') \ge a \cdot d((T c) A, y) > a \gamma' w \ge \gamma (a w + b r) = \gamma w', \qquad (14)$$

where the last inequality holds for $\gamma'$ sufficiently large compared to $\gamma$, proving that $(V, t', w')$ is a NO instance. (Notice that NO instances get mapped to NO instances with probability $1$, as required.)

Remark 17: The reduction given here is a randomized many-one reduction (or a randomized Karp reduction) which fails with exponentially small probability. However, it is not a Levin reduction: i.e., given a witness for a YES instance of the source of the reduction, we do not know how to obtain a witness for the corresponding YES instance of the target in polynomial time. The problem is that, given a solution to the nearest codeword problem, one has to find a codeword in the sphere $B(t, r)$ that maps to it. Our proof only asserts that with high probability such a codeword exists, but it is not known how to find it. This was the case also for the Ajtai–Micciancio hardness proof for the shortest vector problem, where the failure probability was only polynomially small.

As discussed in Section II-B, hardness under polynomial RUR-reductions easily implies the following corollary.

Corollary 18: For any $\gamma \ge 1$, $\rho > 1/2$, and any finite field $\mathbb{F}_q$, GAPRNC$_{\gamma, \rho}$ is not in RP unless NP = RP.

Since NP is widely believed to be different from RP, Corollary 18 gives evidence that no (probabilistic) polynomial-time algorithm to solve GAPRNC$_{\gamma, \rho}$ exists.

V. HARDNESS OF THE MINIMUM-DISTANCE PROBLEM

In this section, we prove the hardness of approximating the minimum-distance problem. We first derive an inapproximability result to within some constant bigger than $1$ by reduction from GAPRNC. Then we use direct product constructions to amplify the inapproximability factor to any constant, and to factors $2^{\log^{1-\epsilon} n}$ for any $\epsilon > 0$.

A. Inapproximability to Within Some Constant

The inapproximability of GAPDIST to within a constant immediately follows from the hardness of GAPRNC.

Lemma 19: For every $\gamma'$ with $1 \le \gamma' < 1/\rho$ and every finite field $\mathbb{F}_q$, GAPDIST$_{\gamma'}$ is hard for NP under polynomial RUR-reductions with exponentially small soundness error.

Proof: The proof is by reduction from GAPRNC$_{\gamma, \rho}$ with $\gamma \ge \gamma'$. Let $(V, t, w)$ be an instance of GAPRNC$_{\gamma, \rho}$ with distance $d = d(C_V)$. Assume without loss of generality that $t$ does not belong to the code $C_V$. (One can easily check whether $t \in C_V$ by solving a system of linear equations. If $t \in C_V$, then $(V, t, w)$ is a YES instance because $d(t, C_V) = 0$, and the reduction can output some fixed YES instance of GAPDIST$_{\gamma'}$.) Define the matrix

$$V' = \begin{bmatrix} V \\ t \end{bmatrix}. \qquad (15)$$

Assume $(V, t, w)$ is a YES instance of GAPRNC$_{\gamma, \rho}$, i.e., there exists an $x$ such that $d(x V, t) \le w$. Then $(V', w)$ is a YES instance of GAPDIST$_{\gamma'}$, since the nonzero vector $x V - t$ belongs to the code generated by $V'$ and has weight at most $w$.

Conversely, assume $(V, t, w)$ is a NO instance of GAPRNC$_{\gamma, \rho}$. We prove that any nonzero vector $z = x V + \alpha t$ of the code generated by $V'$ has weight above $\gamma' w$. Indeed, if $\alpha = 0$, then $z$ is a nonzero codeword of $C_V$ and, therefore, has weight at least $d \ge w / \rho > \gamma' w$ (since $\gamma' < 1/\rho$). On the other hand, if $\alpha \ne 0$, then $\operatorname{wt}(z) = \operatorname{wt}(t + \alpha^{-1} x V) = d(-\alpha^{-1} x V, t) > \gamma w \ge \gamma' w$. Hence, $(V', w)$ is a NO instance of GAPDIST$_{\gamma'}$.
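The reduction (15) simply appends the target $t$ as an extra generator row, so the minimum weight of the augmented code is the smaller of $d(C)$ and $d(t, C)$ (over $\mathbb{F}_2$, where the scalar $\alpha$ can only be $1$). A brute-force toy check, with an arbitrary illustrative $[6, 2]$ code and target:

```python
from itertools import product

def span(rows, q=2):
    # All F_2-linear combinations of the given rows.
    return {tuple(sum(m * r for m, r in zip(msg, col)) % q for col in zip(*rows))
            for msg in product(range(q), repeat=len(rows))}

def min_weight(code):
    return min(sum(c) for c in code if any(c))

def dist_to_code(t, code):
    return min(sum(a != b for a, b in zip(t, c)) for c in code)

# A toy [6,2] binary code with minimum distance 4 ...
G = [(1, 1, 1, 1, 0, 0), (0, 0, 1, 1, 1, 1)]
C = span(G)
# ... and a target much closer to C than d(C): d(t, C) = 1.
t = (1, 1, 1, 0, 0, 0)
assert t not in C

# The reduction (15): append t as an extra generator row.
C2 = span(G + [t])
print(min_weight(C), dist_to_code(t, C), min_weight(C2))
```

When $d(t, C) \le w < d(C)$ (the YES side), the augmented code acquires a word of weight at most $w$; when $t$ is far from $C$ and $w \le \rho\, d(C)$ (the NO side), every nonzero word stays heavy, which is exactly the gap preserved above.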
B. Inapproximability to Within Bigger Factors

To amplify the hardness result obtained above, we take the direct product of the code with itself. We first define direct products.

Definition 20: For $i = 1, 2$, let $C_i$ be a linear $[n_i, k_i]$ code over $\mathbb{F}_q$ generated by $G_i$. Then the direct product of $C_1$ and $C_2$, denoted $C_1 \otimes C_2$, is a code over $\mathbb{F}_q$ of block length $n_1 n_2$ and dimension $k_1 k_2$. Identifying the set of $n_1 n_2$-dimensional vectors with the set of all $n_1 \times n_2$ matrices in the obvious way, the direct product $C_1 \otimes C_2$ is conveniently defined as the set of all matrices $G_1^{\top} M G_2$ in $\mathbb{F}_q^{n_1 \times n_2}$, where $M$ ranges over all $k_1 \times k_2$ message matrices.

Notice that a generating matrix for the product code can be easily computed from $G_1$ and $G_2$, defining a basis codeword $u^{\top} v$ for every row $u$ of $G_1$ and row $v$ of $G_2$ (where $u^{\top}$ denotes the transpose of the row vector $u$, and $u^{\top} v$ is the standard "external" product of the column vector $u^{\top}$ and the row vector $v$). Notice that the codewords of $C_1 \otimes C_2$ are matrices whose columns are codewords of $C_1$ and whose rows are codewords of $C_2$. In our reduction, we will need the following fundamental property of direct product codes. For completeness (see also [17]), we prove it below.

Proposition 21: For linear codes $C_1$ and $C_2$ of minimum distance $d_1$ and $d_2$, their direct product $C_1 \otimes C_2$ is a linear code of distance $d_1 d_2$.

Proof: First, every nonzero codeword has at least $d_1 d_2$ nonzero entries. Indeed, consider a nonzero codeword as a matrix whose rows are codewords from $C_2$. Since this matrix is nonzero, some row is a nonzero codeword of weight $d_2$ or more. Thus, the matrix has at least $d_2$ nonzero columns. Now consider the columns of this matrix: at least $d_2$ of them are nonzero codewords of $C_1$, each of weight at least $d_1$, for a total weight of $d_1 d_2$ or more.

Second, we verify that the minimum distance of $C_1 \otimes C_2$ is exactly $d_1 d_2$. To see this, consider vectors $u \in C_1$ and $v \in C_2$ such that $u$ has exactly $d_1$ and $v$ exactly $d_2$ nonzero elements. Then notice that the matrix $u^{\top} v$ is a codeword of $C_1 \otimes C_2$. Its $j$th column is zero if the $j$th coordinate of $v$ is zero, and its $i$th row is zero if the $i$th coordinate of $u$ is zero. Thus, $u^{\top} v$ is zero on all but $d_1$ rows and $d_2$ columns and, thus, at most $d_1 d_2$ entries are nonzero.

We can now prove the following theorem.

Theorem 22: For every finite field $\mathbb{F}_q$, the following holds.
• For every constant $\gamma \ge 1$, GAPDIST$_{\gamma}$ is hard for NP under polynomial RUR-reductions.
• For every $\epsilon > 0$, GAPDIST$_{\gamma}$ is hard for NP under quasi-polynomial RUR-reductions for $\gamma = 2^{\log^{1-\epsilon} n}$.

In both cases, the error probability is exponentially small in a security parameter.

Proof: Let $\gamma_0 > 1$ be such that GAPDIST$_{\gamma_0}$ is hard by Lemma 19. Given an instance $(V, w)$ of GAPDIST$_{\gamma_0}$, consider the instance $(V^{\otimes p}, w^p)$ of GAPDIST$_{\gamma_0^p}$, where $V^{\otimes p}$ is a generator matrix of the $p$-fold direct product of the code generated by $V$ with itself, for an integer parameter $p$. By Proposition 21, it follows that YES instances map to YES instances and NO instances map to NO instances. Setting $p$ to a sufficiently large constant yields the first part of the theorem. Notice that for constant $p$, the size of $V^{\otimes p}$ is polynomial in the size of $V$, and it can be constructed in polynomial time (for any fixed $p$, independent of the size of $V$).

To show the second part, we use the first part, setting $p$ polylogarithmic in the block length $n$ of the original code in the previous reduction. This time, the block length of the new instance will be $n^p$, which is quasi-polynomial in the block length of the original instance. The reduction can be computed in quasi-polynomial (in $n$) time, and the approximation factor achieved is $\gamma_0^p$, which for an appropriate choice of $p$ exceeds $2^{\log^{1-\epsilon} N}$ for the new block length $N = n^p$.
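Proposition 21 is what drives the amplification in Theorem 22: tensoring a generator matrix with itself squares the minimum distance, and hence the gap. A toy check, using the Kronecker product of a generator matrix with itself as the generator of the direct product code (the $[6, 2]$ binary code is an arbitrary illustrative choice):

```python
from itertools import product
import numpy as np

def codewords(G):
    # All codewords of the binary code generated by the rows of G.
    k = G.shape[0]
    return [tuple(np.asarray(msg) @ G % 2) for msg in product(range(2), repeat=k)]

def min_weight(G):
    return min(sum(c) for c in codewords(G) if any(c))

# A toy [6,2] binary code with minimum distance 4.
G = np.array([[1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 1]])

# Generator of the direct product C (x) C: one basis codeword u^T v per
# pair of rows u, v of G, i.e., the Kronecker product of G with itself.
G2 = np.kron(G, G)

d = min_weight(G)
d2 = min_weight(G2)
print(d, d2)  # Proposition 21: d(C (x) C) = d(C)^2
```

Iterating the Kronecker product $p$ times raises the distance (and the YES/NO gap) to the $p$th power at the cost of block length $n^p$, which is exactly the polynomial-versus-quasi-polynomial trade-off in the two parts of Theorem 22.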
As for the relatively near codeword problem, the following corollary can be easily derived from the hardness result under RUR-reductions.

Corollary 23: For every finite field $\mathbb{F}_q$, the following holds.
• For every constant $\gamma \ge 1$, GAPDIST$_{\gamma}$ is not in RP unless NP = RP.
• For every $\epsilon > 0$, GAPDIST$_{\gamma}$ is not in RQP for $\gamma = 2^{\log^{1-\epsilon} n}$ unless NP $\subseteq$ RQP.

VI. ASYMPTOTICALLY GOOD DENSE CODES

Our concatenated codes used in the proof of Proposition 11 employ outer MDS codes over $\mathbb{F}_{q^m}$ and inner Hadamard codes, for fixed $q$ and growing $m$. As a result, the overall code rate vanishes for growing lengths, since so does the inner rate. The relative distance also tends to $0$, both in the outer MDS codes and in the overall concatenations. To date, all other codes used to prove NP-hardness also have been asymptotically bad. To get hardness results (for both the relatively near codeword problem and the minimum-distance problem) even when the codes are asymptotically good, we will need constructions of asymptotically good "dense" codes. To get such constructions, we will fix the outer alphabet and the inner code. Then we use AG codes to show that there exist dense families of asymptotically good codes over $\mathbb{F}_q$. In particular, we show that there exist $\rho$-dense code families over $\mathbb{F}_q$ with relative distance at least $\delta$ and rate at least $\tau$, where $\rho$, $\delta$, and $\tau$ are positive real numbers^14 independent of the block length. These codes will be used to prove the NP-hardness of GAPRNC restricted to asymptotically good codes, and the hardness of approximating the minimum distance of a code to within an additive error linear in the block length.

Given a sufficiently large square prime power $q$, we use long AG codes meeting the TVZ bound (see

^14 Notice that the relative distance of a dense code satisfies $d/l > (r/\rho)/l$ and the code is asymptotically good. In fact, for all dense codes considered in this paper, we have $\rho < 1$ and, therefore, the relative distance is at least the relative radius $r/l$.
[21]). The generator matrices of these codes can be constructed in time polynomial in the block length. As remarked in Section III, the TVZ bound exceeds the GV bound for most code rates and, therefore, allows us to obtain sequences of dense codes with $\rho$ fixed. The resulting code sequence turns out to be $\rho$-dense for some $\rho < 1$ bounded away from $1/2$. We do not know how to get $\rho$ arbitrarily close to $1/2$ using this technique.

Proposition 24: For every prime power $q$ and every $\rho$ in the admissible range, there exist $\tau > 0$ and $\delta > 0$ such that there is a polynomial family of asymptotically good $\rho$-dense codes with rate at least $\tau$ and relative distance at least $\delta$.

Proof: Let $\epsilon$ be a small positive constant. Notice that by taking $\epsilon$ sufficiently close to $0$, we can get $\rho$ arbitrarily close to its limiting value. We first prove the result assuming $q$ is a sufficiently large square. In this case, the AG codes (for an appropriate choice of parameters) already give what we want. Later, we will use concatenation to get the proposition for arbitrary $q$.

a) Case 1 ($q$ a sufficiently large square): Given the parameter $\ell$, choose the dimension $k$ and the radius $r$ appropriately, and let $C$ be a code meeting the TVZ bound. We show that this code has rate bounded away from $0$, relative distance bounded away from $0$, and that the expected number of codewords of $C$ in a random ball of radius $r$ is at least $\ell$.

We start with bounding the rate. We want to prove that this bound is strictly positive: using the assumptions on $q$ and $\epsilon$, and substituting into the bound for the rate of the code, we get a positive lower bound $\tau$.

Next we bound the minimum distance. From the TVZ bound we get inequality (16). Using (16) and dividing by the block length of the code, we conclude that the relative distance is bounded away from $0$, and the code is asymptotically good.

We still need to prove that the code is $(\rho, \ell)$-dense (on the average), i.e., that the expected number of codewords in a random ball of radius $r$ is at least $\ell$. We use (16) to bound the ratio $r/d$; this proves that the radius is small relative to the minimum distance of the code. We then bound $\mu$, the expected number of codewords in a random ball of radius $r$: using (2) and (7), and combining the parameter choices above, we conclude that $\mu \ge \ell$. It follows that there exists a vector $t$ such that $|B(t, r) \cap C| \ge \ell$. This concludes the analysis for large square $q$.

b) Case 2 (arbitrary $q$): In this case, let $p$ be the smallest even integer such that $q^p$ is a sufficiently large square. Given $\ell$, let $C$ be a code and $r$ a radius as given in Case 1 over the alphabet $\mathbb{F}_{q^p}$, with rate, relative distance, and density as above. We concatenate $C$ with the equidistant Hadamard code of size $q^p$ to get a $q$-ary code of correspondingly scaled rate
and relative distance. Now, given a vector $t$ that has $\ell$ codewords in the ball of radius $r$ around it, the concatenated vector $\sigma(t)$ also has $\ell$ codewords in the ball of the scaled radius around it. Notice that the ratio $r/d$ does not change in this step and, therefore, the density property holds true. This gives the code claimed in the proposition.

The relative distance of the codes built in Proposition 24 tends to $0$ for large alphabets $q$. We remark that, using a similar construction, it is possible to build asymptotically good dense codes for any prime power $q$. The following proposition stresses the algorithmic components of the code construction above, and in particular that we know how to construct the generator matrix of the $\ell$th code in the sequence, and how to sample dense balls from this code, in polynomial time. (We also added one more parameter $l_0$ to the algorithm to ensure that the block length of the code is at least $l_0$. This additional parameter will be used in the proofs of Section VII and is being added for purely technical reasons related to those proofs.)

Proposition 25: For every prime power $q$ and every admissible $\rho$, there exist $\tau > 0$, $\delta > 0$, and a probabilistic algorithm that on input three integers $\ell$, $s$, and $l_0$, outputs (in time polynomial in $\ell$, $s$, and $l_0$) three other integers $n$, $k$, $r$, a generator matrix $G$ of a code $C$, and a center $t$ (of weight $r$) such that:
1) $n \ge l_0$;
2) $k/n \ge \tau$;
3) $d(C)/n \ge \delta$;
4) $r \le \rho \cdot d(C)$;
5) with probability at least $1 - 2^{-s}$, the ball $B(t, r)$ contains at least $\ell$ codewords of $C$.

Proof: Given the inputs to the algorithm, we consider the $(\rho, \ell')$-dense code from Proposition 24, for a suitably larger $\ell'$. This code is the concatenation of an AG code and a Hadamard code; therefore, a generator matrix for it can be constructed in time polynomial in the block length. Choose the center $t$ uniformly at random from the ball of radius $r$ around the origin. From the proof of Proposition 24, we know that the expected number of codewords in a randomly chosen ball of radius $r$ is at least $\ell'$. We now apply Lemma 13, instantiating it with the group $G = \mathbb{F}_Q^n$ (for the outer alphabet size $Q$), the subgroup $H$ given by the outer code, and the set $S = B(0, r)$. Notice that for any center $t$, the set $S_t$ is in one-to-one correspondence with the codewords in the ball of radius $r$ centered at $t$. Setting the constant $c$ in Lemma 13 appropriately, and noting that $\ell'/c \ge \ell$, we immediately get that $B(t, r)$ contains at least $\ell$ codewords with the stated probability. Finally, by Lemma 10, the corresponding ball in the concatenated code contains at least $\ell$ codewords.

We conclude with the following lemma, analogous to Lemma 15 of Section III.
Lemma 26: For every prime power $q$ and every admissible $\rho$, there exist $\tau > 0$, $\delta > 0$, and a probabilistic polynomial-time algorithm that on input integers $m$, $s$, and $l_0$ outputs (in time polynomial in $m$, $s$, and $l_0$) integers $n$, $k$, $r$, matrices $G \in \mathbb{F}_q^{k \times n}$ and $T \in \mathbb{F}_q^{m \times n}$, and a vector $t$ (of weight $r$) such that:
1) $n \ge l_0$;
2) $k/n \ge \tau$;
3) $d(C)/n \ge \delta$, where $C$ is the code generated by $G$;
4) with probability at least $1 - 2^{-s}$, for every $u \in \mathbb{F}_q^m$ there exists a codeword $c \in C$ such that $d(t, c) \le r$ and $T c = u$;
5) $r \le \rho \cdot d(C)$.

The proof is identical to that of Lemma 15, with the use of Proposition 12 replaced by Proposition 25.

VII. HARDNESS RESULTS FOR ASYMPTOTICALLY GOOD CODES

In this section, we prove hardness results for the decoding problem even when restricted to asymptotically good codes, and the hardness of approximating the minimum-distance problem with a linear (in the block length) additive error. First, we define a restricted version of GAPRNC where the code is required to be asymptotically good.

Definition 27: For $\tau, \delta > 0$, let $(\tau, \delta)$-restricted GAPRNC$_{\gamma, \rho}$ be the restriction of GAPRNC$_{\gamma, \rho}$ to instances $(V, t, w)$ satisfying $k/n \ge \tau$ and $w/n \ge \delta$. We call $w/n$ the relative error weight.

Notice that for every constant $\delta > 0$, the minimum distance of the code satisfies $d \ge w/\rho \ge \delta n / \rho$; therefore, the code is asymptotically good, with information rate at least $\tau$ and relative distance at least $\delta/\rho$. In particular, since $\rho < 1$, the relative distance is strictly bigger than the relative error weight.

In the following subsections, we prove that the $(\tau, \delta)$-restricted GAPRNC$_{\gamma, \rho}$ is NP-hard (under RUR-reductions). Then we use this result to prove that the minimum distance of a linear code is hard to approximate additively, even to within an additive error linear in the block length of the code. The proofs will be essentially the same as those for the non-asymptotically-good codes. The main differences are as follows.

• We use the asymptotically good dense code construction from Section VI. Before, we relied on dense codes that were not asymptotically good.

• We reduce from a restricted version of the nearest codeword problem in which the error weight is required to be a constant fraction of the block length. In the following subsection, we observe that a result of Håstad [13] gives us hardness of NCP with this additional property. Then we use this result to prove the hardness of the restricted GAPRNC problem, and the hardness of GAPDISTADD.
A. The Restricted Nearest Codeword Problem

We define a restricted version of the nearest codeword problem in which the target distance is required to be at least a linear fraction of the block length.
Definition 28: For $\delta > 0$, let $\delta$-restricted GAPNCP$_{\gamma}$ ($\delta$-GAPNCP$_{\gamma}$ for short) be the restriction of GAPNCP$_{\gamma}$ to instances $(A, y, w)$ satisfying $w/n \ge \delta$.

We will need the fact that $\delta$-GAPNCP$_{\gamma}$ is NP-hard. This result does not appear to follow from the proof of [3]. Instead, we observe that it easily follows from a recent and extremely powerful result of Håstad [13].

Theorem 29 [13]: For every Abelian group $G$ and for every $\epsilon > 0$, given a system of linear equations over $G$, it is NP-hard to distinguish instances in which a $(1 - \epsilon)$ fraction of the equations can be satisfied from those in which at most a $(1/|G| + \epsilon)$ fraction of the equations can be satisfied.^15

Notice that a system of linear equations in which $s$ is the maximum number of simultaneously satisfiable equations corresponds to an instance of the nearest codeword problem where the distance of the target from the code is the number of equations minus $s$. So, applied to the group $\mathbb{F}_q$, and phrased in coding-theoretic terms, the theorem says that for every prime power $q$ it is hard to tell whether the distance of the target from the code is at most an $\epsilon$ fraction or more than a $(1 - 1/q - \epsilon)$ fraction of the block length. In other words, $\delta$-GAPNCP$_{\gamma}$ is NP-hard for every constant approximation factor $\gamma < (1 - 1/q - \epsilon)/\epsilon$. In particular, $\delta$-GAPNCP$_{\gamma}$ is NP-hard for every constant $\gamma$. This gives the following version of Theorem 29.

Corollary 30: For every $\gamma \ge 1$, there exists a $\delta > 0$ such that $\delta$-GAPNCP$_{\gamma}$ is NP-hard for all prime powers $q$. Notice that the $\delta$ in the corollary is independent of the alphabet size.
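The correspondence invoked after Theorem 29 is elementary: for a linear system over $\mathbb{F}_q$, the maximum number of satisfiable equations plus the distance from the right-hand side to the code generated by the coefficient matrix equals the number of equations. A toy check over $\mathbb{F}_2$ (a random instance; all names are illustrative):

```python
from itertools import product
import random

# A random system of m = 8 linear equations over F_2 in k = 3 unknowns:
# each row of A gives the coefficients of one equation, y the right-hand side.
random.seed(1)
m, k = 8, 3
A = [[random.randrange(2) for _ in range(k)] for _ in range(m)]
y = [random.randrange(2) for _ in range(m)]

def encode(x):
    # The codeword Ax: the left-hand sides of all equations under assignment x.
    return [sum(a * xi for a, xi in zip(row, x)) % 2 for row in A]

best_sat = max(sum(e == yi for e, yi in zip(encode(x), y))
               for x in product(range(2), repeat=k))
dist = min(sum(e != yi for e, yi in zip(encode(x), y))
           for x in product(range(2), repeat=k))

print(best_sat, dist)  # max #satisfied equations = m - d(y, C)
```

Satisfying many equations is the same as being close to the code, so Håstad's satisfiability gap translates directly into the GAPNCP distance gap, with $w$ a constant fraction of the block length as Definition 28 requires.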
We prove that ,
-restricted GAPRNC , and
is NP-hard for
Since can be arbitrarily close to , and arbitrarily -restricted close to , this proves the NP-hardness of for any . GAPRNC -restricted We prove the NP-hardness of by reduction from -GAPNCP . Let GAPRNC be an instance of -GAPNCP , where . We invoke and , to find integers the algorithm of Lemma 26 on input a generator matrix , a mapping matrix , and a vector such that 1) 2) for every
; and
3) 4) 5)
there exists a ;
satisfying
; ; , and therefore
too;
. Nowhere the second condition holds with probability tice that, since any sphere containing more than a single code, the relative radius must word must have radius at least be at least
in the corollary is independent of the al-
B. Inapproximability of the Restricted Relatively Near Codeword Problem -reIn this subsection, we prove the hardness of the problem. The proof is the same as that stricted GAPRNC of Theorem 16 with the following modifications: 1) instead of using Lemma 15 (i.e., the dense code family based on asymptotically bad MDS codes) we use the asymptotically good dense , so code family from Lemma 26; 2) we consider only that Lemma 26 holds true; 3) we use Corollary 30 to select a such that -GAPNCP is NP-hard; 4) we assume that instance we are reducing is in fact a the GAPNCP -GAPNCP instance. Details follow. Theorem 31: For every prime power , and constants and , there exists and such that -restricted GAPRNC is hard for NP under RUR-reductions with error probability exponentially small in a security parameter. , and let Proof: Fix a prime power and real and be the corresponding constants such that be an arbitrarily small conLemma 26 holds true. Let the approximation factor for which we want to stant, and problem. prove the NP-hardness of the restricted GAPRNC , and let be the constant from Corollary Let is NP-hard, e.g., . 30 such that -GAPNCP 15Håstad’s result has the further property that every linear equation only involves three variables, but we do not need this extra property.
The reduction proceeds as in the proof of Theorem 16. Let , , , and define code and target as in (10) and (11). The output of the reduction is . We want to prove that the map is a valid (RUR) reduction from -GAPNCP to -restricted GAPRNC . The proof of the following facts is exactly the same as in Theorem 16 (see (12)–(14)):
• ;
• if , then ;
• if , then .
It remains to prove that the GAPRNC instance satisfies the restriction. Notice that the code has block length . Therefore, the relative error weight satisfies
Finally, the information content of is , i.e., the same as code . Therefore, the rate is , and in order to show that the code is asymptotically good we need to prove that . In fact,
and
This proves that the rate of the code is at least
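The rate bookkeeping above has the same flavor as the classical parameter arithmetic for concatenated codes (Forney [11]): lengths and dimensions multiply, so the rate of the concatenation is the product of the component rates, and the designed distance is the product of the component distances. A toy sketch of that arithmetic, with all parameters chosen purely for illustration (this is not the paper's construction):

```python
# Parameters of the concatenation of an outer (N, K, D) code over GF(q^k)
# with an inner (n, k, d) code over GF(q): length N*n, dimension K*k,
# and minimum distance at least D*d (the "designed" distance).
def concatenated_params(N, K, D, n, k, d):
    length = N * n
    dim = K * k
    dist_lb = D * d  # lower bound on the true minimum distance
    return length, dim, dist_lb

# Illustrative choice: outer [255, 223, 33] Reed-Solomon code over GF(2^8),
# inner [16, 8, 5] binary linear code.
length, dim, dist = concatenated_params(255, 223, 33, 16, 8, 5)
rate = dim / length        # = (K/N) * (k/n): rates multiply
rel_dist = dist / length   # >= (D/N) * (d/n): relative distances multiply
assert abs(rate - (223 / 255) * (8 / 16)) < 1e-12
assert abs(rel_dist - (33 / 255) * (5 / 16)) < 1e-12
```

When both component families have rate and relative distance bounded away from zero, the products are too, which is the shape of the asymptotic-goodness argument in the proof.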
C. Minimum Distance With Linear Additive Error

In Lemma 19, we proved that the minimum-distance problem is hard to approximate within some constant factor. The proof was by reduction from GAPRNC. The same reduction, when applied to the restricted GAPRNC problem, gives an inapproximability result for the minimum-distance problem with linear additive error. In particular, Lemma 19 reduces -restricted GAPRNC to GAPDISTADD with
This result is formally proved in the following theorem.

Theorem 32: For every prime power , there exists a such that GAPDISTADD is hard for NP under polynomial RUR-reductions with soundness error exponentially small in a security parameter.

Proof: The proof is by reduction from -restricted GAPRNC with , , and . The reduction is the same as in the proof of Lemma 19. On input , the reduction outputs , where is the code defined in (15). We know from the proof of Lemma 19 that this is a valid reduction from GAPRNC to GAPDIST . We show that if is a YES (resp., NO) instance of GAPDIST , then it is also a YES (resp., NO) instance of GAPDISTADD . This proves that the map is also a valid reduction from GAPRNC to GAPDISTADD .

The case of YES instances is trivial, because the definition of YES instances in GAPDIST and GAPDISTADD is the same, namely . So, assume is a NO instance of GAPDIST . Then
and
is a NO instance of GAPDISTADD .
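The step from the multiplicative gap problem to the additive one can be illustrated abstractly. In the sketch below, the names and thresholds (`t`, `gamma`, `eps`) are our own illustrative stand-ins, not the paper's exact parameters: whenever the multiplicative NO threshold dominates the additive one, every NO instance carries over, while YES instances coincide by definition.

```python
# Hypothetical decision versions of the two gap problems:
# GAPDIST distinguishes dist <= t from dist > gamma * t;
# GAPDISTADD distinguishes dist <= t from dist > t + eps * n.
def gapdist_no(dist, t, gamma):
    return dist > gamma * t

def gapdistadd_no(dist, t, eps, n):
    return dist > t + eps * n

# Whenever gamma * t >= t + eps * n, a NO instance of the multiplicative
# gap problem is automatically a NO instance of the additive one.
t, gamma, eps, n = 10, 2.0, 0.05, 200
assert gamma * t >= t + eps * n
for dist in range(0, 3 * t):
    if gapdist_no(dist, t, gamma):
        assert gapdistadd_no(dist, t, eps, n)
```

This containment of NO sets (with identical YES sets) is exactly what makes the identity map a valid reduction between the two promise problems in the proof above.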
VIII. OTHER REDUCTIONS AND OPEN PROBLEMS We proved that approximating the minimum-distance problem is hard for NP under RUR-reductions, i.e., probabilistic reductions that map NO instances to NO instances, and YES instances to YES instances with high probability. (The same is true for all other reductions presented in this paper, as well as the hardness proof for the shortest vector problem in [2], [18].) An obvious question is whether it is possible to remove the randomization and make the reduction deterministic. (Observe that a deterministic NP-hardness result is known for the exact version of the minimum-distance problem [23], [22]. In contrast, for the shortest vector problem, even the exact version is known to be NP-hard only under randomized reductions [2].) We notice that our reductions (as well as the Ajtai–Micciancio
ones for SVP) use randomness in a very restricted way. Namely, the only part of the reduction where randomness is used is the proof of Lemma 15 (and Lemma 26 for asymptotically good codes). The construction in the lemma depends only on the input size, and not on the particular input instance we are reducing. So, if the construction succeeds, the reduction will faithfully map all YES instances (of the appropriate size) to YES instances. Therefore, the statements in Lemmas 15 and 26 can be easily modified to obtain hardness results for NP under deterministic nonuniform reductions, i.e., reductions that take a polynomially sized advice that depends only on the input size.16 (The advice is given by the sphere center and transformation matrix satisfying Lemmas 15 and 26.) Corollary 33: For every finite field
the following holds.
• For every and , GAPRNC is hard for NP under nonuniform polynomial reductions. Therefore, no nonuniform polynomial time (P/poly) algorithm exists for GAPRNC unless NP P/poly.
• For every , GAPDIST is hard for NP under nonuniform polynomial reductions. Therefore, no P/poly algorithm exists for GAPDIST unless NP P/poly.
• For every and , GAPDIST is hard for NP under nonuniform quasi-polynomial reductions. Therefore, no nonuniform quasi-polynomial time (QP/poly) algorithm exists for GAPDIST unless NP QP/poly.
• For every and , there exist and such that -restricted GAPRNC is hard for NP under nonuniform polynomial reductions. Therefore, no P/poly algorithm exists for -restricted GAPRNC unless NP P/poly.
• There exists a such that GAPDISTADD is hard for NP under nonuniform polynomial reductions. Therefore, no P/poly algorithm exists for GAPDISTADD unless NP P/poly.

We also notice that a uniform deterministic construction satisfying the properties of Lemmas 15 and 26 would immediately give a proper NP-hardness result (i.e., hardness under deterministic Karp reductions) for all the problems considered in this paper. Interestingly, for the shortest vector problem, Micciancio [18] (see also [19]) showed that a deterministic NP-hardness result is possible under a reasonable (but unproven) number-theoretic conjecture regarding the distribution of smooth numbers (i.e., numbers with no large prime factors). No deterministic proof under similar number-theoretic conjectures is known for the coding problems studied in this paper.

Finally, we notice that all our results rely on the fact that the code is given as part of the input. Thus, it is conceivable that for every error-correcting code there exists a fast algorithm to correct errors (say, up to the distance of the code); however, this algorithm may be hard to find (given a description of the code).
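For concreteness, the nearest codeword problem discussed throughout can always be solved by brute force in time exponential in the code dimension; the hardness results above say that, for general linear codes, nothing substantially better is possible (even approximately) unless NP admits randomized polynomial-time algorithms. A self-contained toy sketch over GF(2), with an arbitrary small generator matrix of our choosing:

```python
from itertools import product

# Brute-force nearest-codeword search: enumerate all q^k codewords and
# keep the one closest in Hamming distance to the target. Exponential in
# the dimension k; fine for toy parameters, hopeless in general.
def nearest_codeword(G, target):
    k, n = len(G), len(G[0])
    best, best_dist = None, n + 1
    for msg in product([0, 1], repeat=k):
        cw = tuple(sum(m * g for m, g in zip(msg, col)) % 2
                   for col in zip(*G))
        dist = sum(a != b for a, b in zip(cw, target))
        if dist < best_dist:
            best, best_dist = cw, dist
    return best, best_dist

# Toy [5,2] binary code and target vector (illustrative values only).
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
cw, dist = nearest_codeword(G, (1, 1, 1, 1, 1))
assert dist == 1  # the codeword 11101 is within distance 1 of 11111
```

Note that the routine also computes the minimum distance when called with the all-zero target excluded from the candidates, which is why hardness of NCP and of the minimum-distance problem are so closely linked.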
16Since our reduction achieves exponentially small error probability, hardness under nonuniform reductions also follows from general results about derandomization [1]. However, the ad hoc derandomization method we just described is more efficient and intuitive.

A result along the lines of the one of Bruck and Naor [7],
showing that there are specific sequences of codes for which nearest codewords are hard to find (even if the code can be arbitrarily preprocessed), would be desirable to fix this gap in our knowledge. Feige and Micciancio [10] recently showed that the nearest codeword problem with preprocessing is hard to approximate within any factor less than . Inapproximability results for NCP with preprocessing are interesting because they can be combined with our reduction (Theorem 16) to prove the inapproximability of RNC with preprocessing. Specifically, Theorem 16 gives a reduction from GAPNCP to GAPRNC with and , with the property that the GAPRNC code depends only on the GAPNCP code (and not the target vector). So, if GAPNCP is hard to solve even when the code can be preprocessed, then GAPRNC with preprocessing also cannot be efficiently solved. Unfortunately, in order to get nontrivial hardness results for GAPRNC with preprocessing (i.e., results with ) using this method, one needs to start from inapproximability results for GAPNCP with preprocessing with . So, the inapproximability factors proved in [10] do not suffice. Very recently, the results of [10] have been improved by Regev [20] to , giving the first inapproximability results for RNC with preprocessing. Notice that applying the reduction from Theorem 16 to GAPNCP with allows us to establish the inapproximability (within some constant factor ) of GAPRNC for . Notice that in order to get , one needs , and, therefore, . Interestingly, [20] also shows how to modify the reduction in Theorem 16 to get inapproximability of RNC with preprocessing for any . The improvement is based on the following two observations. (The reader is referred to [20] for details.)

1) The lower bound in (14) can be strengthened to

2) The center of the sphere
is at distance at least from the code .

Based on these two observations, Regev shows that (with a different choice of the parameters and ) the reduction given in the proof of Theorem 16 maps GAPNCP to GAPRNC for any . Combined with the improved inapproximability results for GAPNCP with preprocessing, this shows that GAPRNC with preprocessing is NP-hard (under RUR-reductions, as usual) for any and .

We conclude with a brief discussion of the main problem left open in this work. Our hardness results for RNC hold only for . For the case of general (not asymptotically good) codes, we showed that one can make arbitrarily close to . However, we cannot achieve equality . RNC (also known as BDD) is arguably the most relevant decoding problem in practice, and the one for which most known polynomial-time algorithms (for specific families of codes, e.g., RS codes) work. The problems RNC for and RNC for are qualitatively different because, for , the solution to the decoding problem is guaranteed to be unique. Our reduction relies on the construction of a code and a sphere of radius such that contains several (in fact,
exponentially many) codewords. Clearly, for , no sphere of radius can contain more than a single codeword. So, the bound seems an intrinsic limitation of our reduction technique.17 We leave determining the computational complexity of BDD as an open problem.

REFERENCES

[1] L. M. Adleman, "Two theorems on random polynomial time," in Proc. 19th Symp. Foundations of Computer Science, 1978, pp. 75–83.
[2] M. Ajtai, "The shortest vector problem in L_2 is NP-hard for randomized reductions (extended abstract)," in Proc. 30th Annu. ACM Symp. Theory of Computing, Dallas, TX, May 23–26, 1998, pp. 10–19.
[3] S. Arora, L. Babai, J. Stern, and E. Z. Sweedyk, "The hardness of approximate optima in lattices, codes, and systems of linear equations," J. Comput. Syst. Sci., vol. 54, no. 2, pp. 317–331, Apr. 1997. Preliminary version in FOCS'93.
[4] S. Arora and C. Lund, Approximation Algorithms for NP-Hard Problems. Boston, MA: PWS, 1996, ch. 10, Hardness of Approximation.
[5] E. Berlekamp, R. McEliece, and H. van Tilborg, "On the inherent intractability of certain coding problems," IEEE Trans. Inform. Theory, vol. IT-24, pp. 384–386, May 1978.
[6] P. Berman and M. Karpinski, "Approximating minimum unsatisfiability of linear equations," in Proc. 13th Annu. ACM-SIAM Symp. Discrete Algorithms, SODA'02, San Francisco, CA, 2002, pp. 74–83.
[7] J. Bruck and M. Naor, "The hardness of decoding linear codes with preprocessing," IEEE Trans. Inform. Theory, vol. 36, pp. 381–385, Mar. 1990.
[8] I. Dinur, G. Kindler, R. Raz, and S. Safra, "An improved lower bound for approximating CVP," Combinatorica, to be published.
[9] I. Dinur, G. Kindler, and S. Safra, "Approximating CVP to within almost-polynomial factors is NP-hard," in Proc. 39th Annu. Symp. Foundations of Computer Science, Palo Alto, CA, Nov. 7–10, 1998.
[10] U. Feige and D. Micciancio, "The inapproximability of lattice and coding problems with preprocessing," in Proc. IEEE Conf. Computational Complexity—CCC 2002,
Montreal, QC, Canada, May 21–23, 2002, pp. 44–52.
[11] G. D. Forney, Jr., Concatenated Codes, ser. Research Monograph no. 37. Cambridge, MA: MIT Press, 1966.
[12] J. Håstad, "Clique is hard to approximate within n^{1−ε}," Acta Math., vol. 182, pp. 105–142, 1999.
[13] ——, "Some optimal inapproximability results," J. ACM, vol. 48, pp. 798–859, 2001. Preliminary version in Proc. STOC'97.
[14] K. Jain, M. Sudan, and V. Vazirani, private communication, May 1998.
[15] D. S. Johnson, Handbook of Theoretical Computer Science. Amsterdam, The Netherlands: Elsevier, 1990, vol. A (Algorithms and Complexity), ch. 2, A Catalog of Complexity Classes, pp. 67–161.
[16] J. Justesen and T. Høholdt, "Bounds on list decoding of MDS codes," IEEE Trans. Inform. Theory, vol. 47, pp. 1604–1609, May 2001.
[17] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1981.
[18] D. Micciancio, "The shortest vector problem is NP-hard to approximate to within some constant," SIAM J. Computing, vol. 30, no. 6, pp. 2008–2035, Mar. 2001. Preliminary version in FOCS 1998.
[19] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems: A Cryptographic Perspective. Boston, MA: Kluwer, Mar. 2002, vol. 671, The Kluwer International Series in Engineering and Computer Science.
[20] O. Regev, "Improved inapproximability of lattice and coding problems with preprocessing," manuscript, Aug. 2002.
[21] M. Tsfasman and S. Vlăduţ, Algebraic-Geometric Codes. Dordrecht, The Netherlands: Kluwer, 1991.
[22] A. Vardy, "Algorithmic complexity in coding theory and the minimum distance problem," in Proc. 29th Annu. ACM Symp. Theory of Computing, STOC'97, El Paso, TX, May 4–6, 1997, pp. 92–109.
[23] ——, "The intractability of computing the minimum distance of a code," IEEE Trans. Inform. Theory, vol. 43, pp. 1757–1766, Nov. 1997.
[24] V. Zinoviev and S. Litsyn, "On codes exceeding the Gilbert bound" (in Russian), Problems of Information Transmission, vol. 21, no. 1, pp. 109–111, 1985.
[25] A. Lobstein, "The hardness of solving subset sum with preprocessing," IEEE Trans. Inform. Theory, vol. 36, pp. 943–946, July 1990.
17The same problem on integer lattices is also the current bottleneck to extending the inapproximability factor of the shortest vector problem beyond √2 [18].