IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 3, MARCH 2005
Stopping Set Distribution of LDPC Code Ensembles
Alon Orlitsky, Member, IEEE, Krishnamurthy Viswanathan, and Junan Zhang, Student Member, IEEE
Abstract—Stopping sets determine the performance of low-density parity-check (LDPC) codes under iterative decoding over erasure channels. We derive several results on the asymptotic behavior of stopping sets in Tanner-graph ensembles, including the following. An expression for the normalized average stopping set distribution, yielding, in particular, a critical fraction of the block length above which codes have exponentially many stopping sets of that size. A relation between the degree distribution and the likely size of the smallest nonempty stopping set, showing that for a $\sqrt{1-\lambda'(0)\rho'(1)}$ fraction of codes with $\lambda'(0)\rho'(1) < 1$, and in particular for almost all codes with smallest variable degree greater than 2, the smallest nonempty stopping set is linear in the block length. Bounds on the average block error probability as a function of the erasure probability $\epsilon$, showing in particular that for codes with lowest variable degree 2, if $\epsilon$ is below a certain threshold, the asymptotic average block error probability is $1-\sqrt{1-\lambda'(0)\rho'(1)\epsilon}$.
Index Terms—Binary erasure channel (BEC), block error probability, growth rate, low-density parity-check (LDPC) codes, minimum distance, stopping set.
I. INTRODUCTION
Low-density parity-check (LDPC) codes have been shown to perform extremely well even with computationally efficient iterative decoding [1]–[3]. These algorithms involve message-passing on the code's Tanner graph [4]. Of these, Gallager's soft-decoding algorithm [1] is equivalent to maximum-likelihood (ML) decoding when the Tanner graph of the code has no cycles of length smaller than or equal to twice the number of iterations. For the binary erasure channel (BEC), Gallager's soft-decoding algorithm reduces to the iterative decoding algorithm proposed in [2], where it was shown that, as the block length tends to infinity, appropriately designed irregular LDPC codes can approach the capacity of the erasure channel with linear decoding complexity.
The problem of analyzing finite-length LDPC codes was considered in [5], which used the concept of stopping sets, introduced in [6], to evaluate the average error probability under iterative decoding over the erasure channel. The role played by stopping set distributions in iterative decoding is analogous to that played by distance spectra in ML decoding. More precisely, the block and bit errors are determined, respectively, by the existence of a stopping set, and the size of the maximal stopping set, in the erased variable nodes. This fact was used in [5] to derive a recursive formula for calculating the average error probability of regular LDPC codes. The recursion was subsequently simplified and extended in [7].
The above results focus on exact expressions for the average error probabilities of finite-length codes. In this paper, we bound the error probability of long LDPC codes by analyzing the asymptotic behavior of two quantities: the distribution of stopping sets and the stopping number, the size of the smallest nonempty stopping set in a code. We derive the following results for code ensembles defined by Tanner graphs.
• An expression for the average stopping set distribution of regular and irregular LDPC codes. We use this expression to obtain the critical exponent stopping ratio, the ratio of the likely stopping number to the block length.
• The probability that the stopping number of a code grows linearly with the block length $n$. For regular codes with variable degree $c > 2$, we show that this probability approaches 1 as $n$ increases. For irregular codes, as $n$ increases, this probability approaches a value that is determined by $\lambda'(0)\rho'(1)$, where $\lambda$ and $\rho$ represent the degree distribution of the code's Tanner graph. For irregular codes with $\lambda'(0)\rho'(1) < 1$, the probability approaches $\sqrt{1-\lambda'(0)\rho'(1)}$, which is always strictly between 0 and 1. For irregular codes with $\lambda'(0)\rho'(1) > 1$, this probability approaches 0. Therefore, for large block length, when the degree distributions satisfy $\lambda'(0)\rho'(1) < 1$, there always exists a sequence of LDPC codes whose stopping number grows linearly with $n$. This is analogous to results shown for minimum distance [8], [9].
• The behavior of the average block error probability as a function of the channel erasure probability $\epsilon$. We define an error-floor threshold below which stopping sets of linear size make an exponentially small contribution to the average block error probability. Among stopping sets of sublinear size, those of the smallest size contribute most to the average block error probability. When the average block error probability is plotted against $\epsilon$, this region corresponds to the error floor. For Tanner-graph ensembles with lowest variable degree 2, we show that the asymptotic average block error probability is $1-\sqrt{1-\lambda'(0)\rho'(1)\epsilon}$. Therefore, the error floor does not decrease as $n$ increases. For Tanner-graph ensembles with lowest variable degree greater than 2, we show that the error floor behaves like .
Manuscript received February 2, 2003; revised November 8, 2004. The material in this paper was presented at the IEEE International Symposium on Information Theory, Yokohama, Japan, June/July 2003.
A. Orlitsky is with the Department of Electrical and Computer Engineering as well as the Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0407 (e-mail: [email protected]).
K. Viswanathan and J. Zhang are with the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093-0407 (e-mail: [email protected]; [email protected]).
Communicated by S. N. Litsyn, Associate Editor for Coding Theory.
Digital Object Identifier 10.1109/TIT.2004.842571
0018-9448/$20.00 © 2005 IEEE
The results derived in this paper are related to a large body of research on minimum distance and stopping sets. The stopping set distribution is related to the distance spectrum, which was analyzed by a number of researchers for several LDPC code ensembles. In his monograph [1], Gallager calculated the distance spectrum of regular Gallager ensembles. Litsyn and Shevelev extended these calculations to seven other regular ensembles [10] (including Tanner-graph ensembles) and to irregular Gallager ensembles [11]. Burshtein et al. first studied the stopping set distributions for Tanner-graph ensembles in [12]. In this paper, we derive simpler expressions for the distributions.
Di et al. [8], [9] studied the minimum distance of Tanner-graph LDPC code ensembles. In [13], Richardson et al. studied the stopping number of several code ensembles. Their results can be summarized as follows. If $\lambda'(0)\rho'(1) > 1$, then most codes in the ensemble have logarithmic minimum distance and stopping number. If $\lambda'(0)\rho'(1) < 1$, then the codes have linear minimum distance and stopping number, if small stopping sets are protected by sending dummy bits. By contrast, our approach is based on eliminating codes from the ensembles, showing that in this case, the more traditional method of expurgation works as well.
The paper is organized as follows. In Section II, we describe Tanner-graph code ensembles and stopping sets. In Section III, we derive the average stopping set distribution for regular and irregular ensembles. In Section IV, we study the stopping number of regular and irregular ensembles. In Section V, we define the threshold and investigate the effect of stopping sets on the block error probability.
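The quantity $\lambda'(0)\rho'(1)$ that separates the two regimes above is simple to evaluate from the edge degree distributions. The following is a minimal sketch, assuming the standard edge-perspective convention $\lambda(x)=\sum_i \lambda_i x^{i-1}$ and $\rho(x)=\sum_j \rho_j x^{j-1}$, so that $\lambda'(0)=\lambda_2$ and $\rho'(1)=\sum_j (j-1)\rho_j$. The function names and the example distribution are illustrative and do not come from the paper.

```python
from math import sqrt

def lambda0_rho1(lam, rho):
    """Return lambda'(0) * rho'(1) for edge-perspective degree distributions.

    lam[i] (rho[j]) is the fraction of edges attached to variable (check)
    nodes of degree i (j); both are dicts keyed by degree and must sum to 1.
    """
    assert abs(sum(lam.values()) - 1.0) < 1e-9
    assert abs(sum(rho.values()) - 1.0) < 1e-9
    lam_prime_0 = lam.get(2, 0.0)  # only the x^1 term of lambda(x) survives at x = 0
    rho_prime_1 = sum((j - 1) * r for j, r in rho.items())
    return lam_prime_0 * rho_prime_1

def linear_stopping_fraction(x):
    """Limiting fraction of codes with linear stopping number, as reported in
    this paper: sqrt(1 - x) when x = lambda'(0) rho'(1) < 1, and 0 when x > 1."""
    return sqrt(1.0 - x) if x < 1.0 else 0.0

# Hypothetical irregular ensemble: 10% of the edges on degree-2 variable nodes,
# all check nodes of degree 6.
example_lam = {2: 0.1, 3: 0.9}
example_rho = {6: 1.0}
example_x = lambda0_rho1(example_lam, example_rho)
example_fraction = linear_stopping_fraction(example_x)
```

For a regular ensemble with variable degree 3 the product is zero, since no edge is attached to a degree-2 variable node.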
II. TANNER-GRAPH ENSEMBLES AND STOPPING SETS
A. Tanner-Graph Ensembles
While individual LDPC codes have proven difficult to analyze, several random ensembles of these codes possess inherent symmetry which facilitates the evaluation of their average performance. This property was exploited, for example, in developing the well-known density evolution technique [2], [3] over Tanner-graph ensembles. These papers have also shown that, with high probability, the performance of an individual code of large block length closely resembles that of the ensemble average. Consequently, most LDPC results have dealt with the average performance of LDPC ensembles rather than that of individual codes.
Every linear code is defined by a parity-check matrix which specifies the set of linear constraints satisfied by the codeword bits. Every linear code can also be represented [4] by a bipartite graph in which the set of variable nodes $V$ represents the codeword bits and the set of check nodes $C$ represents the set of parity-check constraints satisfied by the codeword bits. Two vertices $v \in V$ and $u \in C$ are connected if the codeword bit corresponding to $v$ is involved in the parity-check constraint corresponding to $u$. Random ensembles of LDPC codes can therefore be defined by a distribution on a collection of parity-check matrices or Tanner graphs. The Tanner-graph ensembles of LDPC codes are defined by the latter method.
A Tanner-graph ensemble of LDPC codes is specified by the block length $n$ and the (edge) degree distributions $\lambda$ and $\rho$. A fraction $\lambda_i$ of the edges have variable degree $i$ and a fraction $\rho_j$ have check degree $j$. In the Tanner graphs, there are $n$ variable nodes, $m = E\sum_j \rho_j/j$ check nodes, and $E = n/\sum_i (\lambda_i/i)$ edges. The node degree distributions then follow: a fraction $(\lambda_i/i)/\sum_k(\lambda_k/k)$ of the variable nodes have $i$ sockets and a fraction $(\rho_j/j)/\sum_k(\rho_k/k)$ of the check nodes have $j$ sockets. Both the variable-node sockets and the check-node sockets are labeled $1, \ldots, E$. A random Tanner graph is generated by a random permutation $\pi$ of $\{1, \ldots, E\}$, which indicates that the $i$th variable-node socket is connected to the $\pi(i)$th check-node socket. Note that since parallel edges are permitted, a codeword bit may be involved in a parity-check equation more than once. Moreover, certain parity checks may be dependent. Hence, the rate $R$ of the code is bounded by
$$R \ge 1 - \frac{m}{n}.$$
When $\lambda_c = 1$ and $\rho_d = 1$, the Tanner-graph ensemble is regular in the sense that all variable nodes have degree $c$ and all check nodes have degree $d$.
B. Stopping Sets
Now we define stopping sets and parameters associated with the distribution of stopping sets in code ensembles. We compare these parameters with the corresponding ones for the distance spectrum.
Let $G$ be the Tanner graph of a code, with variable-node set $V$ and check-node set $C$. For every $v \in V$ and $u \in C$, let $d_v$ and $d_u$ denote the degrees of the variable node $v$ and the check node $u$, respectively. For any $S \subseteq V$, let $|S|$ denote its size. For a given check node $u$, let $d_S(u)$ denote the number of edges connecting $u$ to $S$; we refer to it as the induced degree of $u$ with respect to $S$. We call $S$ a stopping set if no check node is connected to $S$ via a single edge. Formally, $S$ is a stopping set if $d_S(u) \neq 1$ for every $u \in C$. The stopping number $s^*$ of the graph is the size of the smallest nonempty set in the collection of all stopping sets of the graph. The stopping ratio of the graph is defined by $s^*/n$, the ratio of its stopping number to the number of variable nodes.
For a subset $S \subseteq V$, let the constellation of $S$ denote the set of check-node sockets connected to $S$. If $S$ is a stopping set, its constellation is called a stopping constellation.
For the same graph, let the one-set of a codeword be the set of all variable nodes corresponding to bits of the codeword that are 1. As in the case of stopping sets, we consider the collection of all one-sets in the graph, and the minimum distance $d_{\min}$ of the graph is the size of its smallest nonempty one-set. For every one-set $W$, $d_W(u)$ is even for every $u \in C$. It follows that $d_W(u) \neq 1$ for every $u \in C$; hence, every one-set is a stopping set, but the converse does not necessarily hold. Therefore, the collection of one-sets is contained in the collection of stopping sets, and
$$s^* \le d_{\min}. \qquad (1)$$
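The definitions above translate directly into a few lines of code. The sketch below is illustrative only: it represents a Tanner graph as a list of check nodes, each holding the variable nodes attached to it (a repeated entry models a parallel edge), and uses a small Hamming-type parity-check matrix chosen for this example, not taken from the paper. It checks the stopping-set condition, finds the stopping number by exhaustive search, and runs iterative erasure decoding to show that the decoder halts on the maximal stopping set contained in the erased positions.

```python
from itertools import combinations

def induced_degrees(checks, subset):
    """Number of edges from each check node into `subset`."""
    s = set(subset)
    return [sum(1 for v in row if v in s) for row in checks]

def is_stopping_set(checks, subset):
    """True if no check node is connected to `subset` by exactly one edge."""
    return all(deg != 1 for deg in induced_degrees(checks, subset))

def stopping_number(checks, n):
    """Size of the smallest nonempty stopping set (exhaustive, small n only)."""
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if is_stopping_set(checks, subset):
                return size
    return None  # no nonempty stopping set exists

def peel(checks, erased):
    """Iterative erasure decoding: repeatedly recover a bit that is the only
    erased neighbour of some check node. Returns the bits left erased."""
    erased = set(erased)
    progress = True
    while progress:
        progress = False
        for row in checks:
            hits = [v for v in row if v in erased]
            if len(hits) == 1:
                erased.discard(hits[0])
                progress = True
    return erased

# Example graph: three checks on seven bits (a (7,4) Hamming-type code).
H = [[0, 1, 2, 4],
     [0, 1, 3, 5],
     [0, 2, 3, 6]]

smallest = stopping_number(H, 7)       # smallest nonempty stopping set size
residual = peel(H, {1, 4, 5, 6})       # decoder stalls on the stopping set {1, 4, 5}
decoded = peel(H, {0, 1})              # contains no stopping set, fully recovered
```

In this example the set {1, 4, 5} is also the one-set of a weight-3 codeword, which illustrates why every one-set is a stopping set.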
We now define parameters associated with the distribution of stopping sets in code ensembles. Given an ensemble of LDPC codes and an integer $s$, let the average stopping set distribution $E(s)$ denote the average number of $s$-element stopping sets in a randomly chosen code. We are interested in evaluating the exponential behavior of the average stopping set distribution as the block length increases. The block length is subject to integer constraints; for regular codes, for example, the number of check nodes $nc/d$ must be an integer, hence not all integers can be block lengths. The following discussion is aimed at addressing these constraints.
For any rational number $\alpha$ between 0 and 1, there always exists a sequence of ensembles with strictly increasing block lengths $n_1 < n_2 < \cdots$ that satisfy the integer constraints and for which $\alpha n_k$ is an integer for every $k$. We define the normalized average stopping set distribution for such $\alpha$ as¹
$$\gamma(\alpha) = \lim_{k \to \infty} \frac{1}{n_k} \log E(\alpha n_k).$$
In Theorems 2 and 5, we show that this limit exists for both regular and irregular ensembles. We also generalize the definition to real $\alpha$ by taking a sequence of rational numbers converging to $\alpha$ and taking the limit of the corresponding values. The limit exists and the generalized distribution is continuous; in particular, it goes to zero as $\alpha$ goes to zero, a fact shown in Theorem 7. As in [10], we define the normalized average codeword weight distribution analogously, with the average number of codewords of weight $\alpha n_k$ in place of $E(\alpha n_k)$.
¹All the logarithms in this paper are natural logarithms.
Since the empty set is a stopping set and the all-zero codeword is a codeword, both normalized distributions vanish at $\alpha = 0$. We require two more quantities, which we call the critical exponent stopping ratio and the critical exponent codeword weight ratio. The critical exponent stopping ratio is the smallest fraction $\alpha$ for which a code is likely to contain exponentially many stopping sets of size $\alpha n$. Similarly, the critical exponent codeword weight ratio is the smallest fraction $\alpha$ for which a code is likely to contain exponentially many codewords of weight $\alpha n$. Since every one-set is a stopping set, the normalized average codeword weight distribution is at most the normalized average stopping set distribution, and the critical exponent stopping ratio is at most the critical exponent codeword weight ratio.
III. AVERAGE STOPPING SET DISTRIBUTIONS IN TANNER-GRAPH ENSEMBLES
In this section, we derive the average stopping set distribution of regular and irregular Tanner-graph ensembles, and show that the critical exponent stopping ratio of an ensemble is positive if $\lambda'(0)\rho'(1) < 1$ and zero if $\lambda'(0)\rho'(1) > 1$.
As in [5], we use polynomial characteristic functions to identify stopping sets. Let $\mathrm{coef}(p(x), x^i)$ denote the coefficient of $x^i$ in the polynomial $p(x)$. For codes generated from a Tanner-graph ensemble, we consider their Tanner graphs. Given a set $S$ of variable nodes which has $e$ edges incident upon it, we label a check-node socket 1 if it is connected to $S$ and 0 if it is not connected. Given a check node $u$ of degree $d_u$, the number of ways of choosing $i$ of its sockets is $\mathrm{coef}((1+x)^{d_u}, x^i)$. The number of ways of choosing $e$ check-node sockets from all the sockets is
$$\mathrm{coef}\Big(\prod_{u} (1+x)^{d_u}, x^e\Big) = \binom{E}{e},$$
where the product runs over all check nodes and $E$ is the total number of edges. Every choice of $e$ check-node sockets corresponds to a different constellation. These constellations appear with equal probability in the ensemble. Of these, the number of stopping constellations is
$$\mathrm{coef}\Big(\prod_{u} \big((1+x)^{d_u} - d_u x\big), x^e\Big).$$
For the ensemble, we see from the preceding argument that for any $S$ with $e$ edges incident upon it, the probability that $S$ is a stopping set is
$$\frac{\mathrm{coef}\big(\prod_{u} ((1+x)^{d_u} - d_u x), x^e\big)}{\binom{E}{e}}.$$
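The counting argument above can be checked numerically on a small example. The sketch below is illustrative and assumes every check node has degree at least 1; the helper names are hypothetical. It multiplies the per-check polynomials $(1+x)^{d_u} - d_u x$ to count stopping constellations with $e$ sockets and compares the result with a brute-force enumeration of socket subsets.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def check_poly(d):
    """Coefficients of (1 + x)^d - d*x: choices of i sockets with i != 1."""
    p = [comb(d, i) for i in range(d + 1)]
    p[1] -= d  # remove the choices that use exactly one socket
    return p

def count_stopping_constellations(check_degrees, e):
    """coef(prod_u ((1+x)^{d_u} - d_u x), x^e)."""
    prod = [1]
    for d in check_degrees:
        prod = poly_mul(prod, check_poly(d))
    return prod[e] if e < len(prod) else 0

def brute_force_count(check_degrees, e):
    """Enumerate all e-subsets of sockets and keep those in which no check
    node contributes exactly one socket."""
    sockets = [u for u, d in enumerate(check_degrees) for _ in range(d)]
    count = 0
    for chosen in combinations(range(len(sockets)), e):
        per_check = [0] * len(check_degrees)
        for idx in chosen:
            per_check[sockets[idx]] += 1
        if all(t != 1 for t in per_check):
            count += 1
    return count

degrees = [3, 3, 4]  # hypothetical small graph with 10 check-node sockets
e = 4
stopping = count_stopping_constellations(degrees, e)
probability = Fraction(stopping, comb(sum(degrees), e))
```

Here `probability` is the chance that a fixed set with four incident edges is a stopping set when its sockets are assigned uniformly at random.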
Now
Hence, for regular ensembles stopping set distribution is
where
, the average
is the only positive solution to
Proof: Abbreviating
as
and
as
(2)
For irregular ensembles that
(7)
, it can be shown
is the number of sets with variable nodes and edges incident upon them. Therefore, the average stopping set distribution is
Although not required for the subsequent derivation, the following Tanner-graph interpretation may be worth noting. If $S$ is a stopping set, then the summation indices can be viewed as the numbers of check nodes with each induced degree with respect to $S$ in each stopping constellation. To prove the lemma, we need to determine the exponential growth of the left-hand side (LHS) of (7). Let
Then from (7) (3)
To evaluate the normalized average stopping set distribution, we need to evaluate the asymptotic coefficients of the polynomials that figure in the equations. This can be accomplished by saddle-point analysis. Instead, we use the following trivial inequalities, which hold for all polynomials with nonnegative coefficients:
It is easy to see that grows polynomially with , hence, we only need to determine the exponential growth of . . By Stirling’s formula For convenience, let
(4) (8)
(5)
For polynomials and exponents satisfying certain conditions, these two inequalities are tight up to a polynomial factor. Instead of proving the general results, we show the tightness of these bounds for the specific polynomials used in this paper. First we show that, for the polynomial arising in regular ensembles, (4) is asymptotically tight in the exponent for large $n$.
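A coefficient bound of this kind is easy to test numerically. The sketch below assumes that (4) has the standard form $\mathrm{coef}(p(x)^n, x^i) \le p(x)^n / x^i$ for every $x > 0$, which holds for any polynomial with nonnegative coefficients; this form is an assumption about (4), and the function names are illustrative. It computes the exact coefficient with integer arithmetic and compares its logarithm with the bound.

```python
from math import comb, log

def coef_power(p, n, i):
    """Exact coefficient of x^i in p(x)^n, with p given as a coefficient list."""
    result = [1]
    for _ in range(n):
        new = [0] * min(len(result) + len(p) - 1, i + 1)
        for a, ra in enumerate(result):
            for b, pb in enumerate(p):
                if a + b <= i:  # terms above x^i are never needed
                    new[a + b] += ra * pb
        result = new
    return result[i] if i < len(result) else 0

def log_upper_bound(p, n, i, x):
    """log of p(x)^n / x^i, an upper bound on log coef(p^n, x^i) for x > 0."""
    px = sum(c * x ** k for k, c in enumerate(p))
    return n * log(px) - i * log(x)

# Polynomial (1 + x)^6 - 6x, the characteristic polynomial of a degree-6 check.
d = 6
p = [comb(d, k) for k in range(d + 1)]
p[1] -= d

n, i = 20, 30
exact_log = log(coef_power(p, n, i))
grid = [0.1 * k for k in range(1, 60)]
best_bound = min(log_upper_bound(p, n, i, x) for x in grid)
gap_per_symbol = (best_bound - exact_log) / n  # nonnegative and small
```

Minimizing the bound over $x$ corresponds to the saddle-point choice used in the lemmas; the per-symbol gap shrinks as $n$ grows, which is what tightness in the exponent means.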
Using Lagrange multipliers to maximize the exponent, we obtain that for
Lemma 1: Let and . Then satisfying the for every strictly increasing sequence , the limit integer constraints where
(6)
satisfies
Let
, we are looking for a positive solution to . We will show that exists and is unique. Substituting into (8) we get
Using Lemma 1 and observing that
we obtain and the lemma follows. All that remains to be shown is the existence and uniqueness of , the positive solution of
Note that , . For exists and is unique. For , that
.
We now calculate the normalized average stopping set disfor irregular ensembles. We need the following tributions modifications to Lemma 1. Lemma 3: Let For every strictly increasing sequence integer constraints: , the limit
for
and . satisfying the and
, it is easy to check
where
is the only positive solution of
Proof: Similar to the proof of Lemma 1. Note that Therefore, for all solution .
,
mentioned in the lemma minimizes
has a unique positive
Note that mentioned in the lemma minimizes . This shows that for the polynomial , (4) is tight up to a polynomial factor of . Now, we derive the normalized average stopping set distribufor regular ensembles. In the following tions
This shows that for the polynomial is tight up to a polynomial factor of .
, (4)
Lemma 4: Let , , and there exists satisfying the integer constraints: , , and , and
is the entropy function. Theorem 2: Let ensemble
. For the regular Tanner-graph
where is the only positive solution of Proof: Recall that
Then for every strictly increasing sequence the above integer constraints, the limit
satisfying
. exists and can be calculated as follows. Without loss of generality, we assume . Then the value of the limit is decided by the solution to the equations
For the regular Tanner-graph ensemble , by (2)
(9) and (10)
1) If
2) If
, then (9) and (10) have a unique solution with , , and
,
Proof: In (11), can only take possible values. The dominates the other terms in the term with the optimum exponent. Therefore,
and (9) and (10) have only one solution , then the solution is either of the
form or of the form with 3) If
and
, and
and (9) and (10) have more than one solution , then the equations
have a unique positive solution
, and
Proof: Refer to Appendix A. Note that
mentioned in the lemma minimizes
in Case 3. This shows that (5) for the polynomial
is tight up to a polynomial factor of with a few exceptions. Recall that the normalized average stopping set distribution was defined to be From (3)
In order to determine the value that maximizes the expression, we set the partial derivative to zero. The calculation of the partial derivative is straightforward.
Note that the theorem shows that the expression for the normalized average stopping set distribution for irregular ensembles is similar to that for the average distance distribution given by Theorem 1 in [11]. The ensembles considered in [11] are Gallager ensembles. Here we consider Tanner-graph ensembles, which are simpler to analyze.
For regular ensembles, the critical exponent stopping ratio can be obtained by calculating the normalized average stopping set distribution using Theorem 2. We plot it as a function of the variable degree $c$ for a fixed rate in Fig. 1. The critical exponent codeword weight ratio is also shown in the figure for the sake of comparison. As shown in [1], [10], for fixed $R$ and regular ensembles, the critical exponent codeword weight ratio approaches the Gilbert–Varshamov bound as $c$ increases. However, for fixed $R$, the critical exponent stopping ratio goes to zero for large $c$.
For irregular ensembles, the expression for the normalized average stopping set distribution is more difficult to evaluate than that for regular ensembles. In the following two corollaries, we extend its definition to real arguments, show that it is continuous, and derive conditions for the critical exponent stopping ratio to be positive.
Corollary 6: For the ensemble , when , let
and . For
s.t.
(11) Substituting Lemmas 3 and 4 into the above equation, we derive the normalized average stopping set distribution for the ensemble. Theorem 5: Let ,
where is defined in Lemma 3, tending Lemma 4 to ,2 and
. For the irregular ensemble
is obtained by exis the set of solutions of (12)
(; ) is meaningful in the context of LDPC codes only for ; but the solution of the limit in Lemma 4 can be extended to 2 R. 2
2Q
Then is continuous over .
Proof: We prove the first result; the second follows easily. Following Theorem 2, it is easy to verify that the first result holds for regular ensembles. For irregular ensembles, we note that in Theorem 5, is a solution to (12) and, as a function of , is continuous. Moreover, is continuous for and therefore so is . Hence, when .
Corollary 7: For the ensemble, if $\lambda'(0)\rho'(1) > 1$, the critical exponent stopping ratio is zero. Conversely, if $\lambda'(0)\rho'(1) < 1$, the critical exponent stopping ratio is positive.
Proof: Refer to Appendix B.
IV. STOPPING NUMBER IN TANNER-GRAPH CODE ENSEMBLES
Recall that the stopping number of a Tanner graph is the size of its smallest nonempty stopping set. In this section, we investigate the stopping numbers of regular and irregular ensembles.
Fig. 1. Critical exponent stopping ratio and critical exponent codeword weight ratio for increasing $c$ and fixed rate $R = 1/2$. Note that for large $c$, the critical exponent codeword weight ratio approaches the Gilbert–Varshamov value 0.11.
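The curve for the critical exponent stopping ratio in Fig. 1 can be approximated numerically. The sketch below assumes that, for a regular ensemble with variable degree $c$ and check degree $d$, the normalized average stopping set distribution takes the form $\gamma(\alpha) = \frac{c}{d}\log\frac{(1+x_0)^d - d x_0}{x_0^{\alpha d}} - (c-1)h(\alpha)$, where $h$ is the natural-log entropy function and $x_0$ is the positive solution of $x((1+x)^{d-1}-1)/((1+x)^d - dx) = \alpha$. This expression is derived here from the counting argument of Section III combined with the saddle-point choice of $x$; it is a reading of Theorem 2, not a quotation of it, and all function names are illustrative.

```python
from math import log

def entropy(a):
    """Binary entropy function in nats."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * log(a) - (1 - a) * log(1 - a)

def saddle_point(alpha, d):
    """Positive root of x((1+x)^(d-1) - 1) / ((1+x)^d - d x) = alpha, 0 < alpha < 1.
    The left-hand side increases from 0 to 1, so bisection applies."""
    def f(x):
        return x * ((1 + x) ** (d - 1) - 1) / ((1 + x) ** d - d * x)
    lo, hi = 1e-12, 1.0
    while f(hi) < alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma(alpha, c, d):
    """Assumed growth rate of the average number of stopping sets of size alpha*n."""
    x0 = saddle_point(alpha, d)
    log_p = log((1 + x0) ** d - d * x0)
    return (c / d) * (log_p - alpha * d * log(x0)) - (c - 1) * entropy(alpha)

def critical_ratio(c, d, step=1e-3):
    """Smallest alpha > 0 at which gamma turns positive (grid scan, then bisection)."""
    a = step
    while a < 1.0 and gamma(a, c, d) <= 0.0:
        a += step
    if a >= 1.0:
        return None
    lo, hi = a - step, a
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma(mid, c, d) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi

ratio_3_6 = critical_ratio(3, 6)  # rate-1/2 regular ensemble with c = 3, d = 6
```

Repeating the call with $d = 2c$ for increasing $c$ gives the rate-1/2 sweep that Fig. 1 describes.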
Recall the definitions of the normalized average stopping set distribution and of the critical exponent stopping ratio from Section II. It is clear from these definitions that for any fixed fraction $\alpha$ below the critical exponent stopping ratio, the probability that there exists a stopping set of size $\alpha n$ in a graph selected at random from the ensemble goes down exponentially with $n$. However, this does not suffice to show that with high probability there does not exist a stopping set of smaller size, as the definition of the normalized distribution does not address stopping sets of size sublinear in $n$. In this section, we evaluate the probability that the graph contains stopping sets of size at most $\alpha n$ for sufficiently small $\alpha$, including those of sublinear size, and investigate the conditions under which this probability goes to zero or stays strictly between 0 and 1.
For regular codes with $c > 2$, Theorem 8 shows that for all $\alpha$ below the critical exponent stopping ratio, the probability of stopping sets with size less than $\alpha n$ goes down polynomially in $n$. Therefore, the stopping ratio is at least $\alpha$ with probability nearly 1 for large $n$. For regular codes with degree $c = 2$, the stopping number is at most logarithmic in $n$, which follows from the fact that the stopping number is at most the minimum distance of a code, which for $c = 2$ is logarithmic in $n$ [1].
The analysis of irregular codes that contain degree-2 nodes can be divided into two cases based on the value of $\lambda'(0)\rho'(1)$. In Corollary 7 of Section III we showed that the critical exponent stopping ratio is positive if $\lambda'(0)\rho'(1) < 1$ and zero if $\lambda'(0)\rho'(1) > 1$. In Lemma 9, we evaluate the asymptotic probability that the graph contains stopping sets of small size; when degree-2 nodes are present, such stopping sets occur with nonzero probability. This is unlike Theorem 8 for regular codes with $c > 2$, where the probability of such an occurrence goes to zero for large $n$.
In Theorem 13, we show that if $\lambda'(0)\rho'(1) < 1$ then, for $\alpha$ below the critical exponent stopping ratio, the asymptotic probability that the graph contains stopping sets of size less than $\alpha n$ is $1 - \sqrt{1-\lambda'(0)\rho'(1)}$. Therefore, if $\lambda'(0)\rho'(1) < 1$, the stopping ratio is bounded below by the critical exponent stopping ratio in a weak sense. To elaborate, this result implies that with positive probability (bounded away from 0), the stopping ratio of a code selected at random is at least $\alpha$, unlike the earlier result for regular codes, where this probability approaches 1 for large $n$.
On the other hand, if $\lambda'(0)\rho'(1) > 1$, we use the results in [8], [9] to lower-bound the probability that the stopping number is at most logarithmic in $n$. This lower bound approaches 1 for large $n$. In other words, with high probability, the stopping number of a randomly selected code grows at most logarithmically in $n$.
To prove these results, we will use the following observation. Recall that the stopping number is the size of the smallest nonempty stopping set of the Tanner graph. By Markov's inequality
(13)
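The Markov-type bound can be evaluated exactly for small regular ensembles. The sketch below assumes that the bound takes the form $P(s^* \le t) \le \sum_{s=1}^{t} E(s)$ and that, for a regular ensemble with $n$ variable nodes of degree $c$ and $nc/d$ check nodes of degree $d$, $E(s) = \binom{n}{s}\,\mathrm{coef}\big(((1+x)^d - dx)^{nc/d}, x^{sc}\big) / \binom{nc}{sc}$, following the counting argument of Section III. Both forms are assumptions about (13) and (2), and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def check_poly_power(d, m, max_deg):
    """Coefficients of ((1+x)^d - d x)^m, truncated above x^max_deg."""
    base = [comb(d, i) for i in range(d + 1)]
    base[1] -= d
    result = [1]
    for _ in range(m):
        new = [0] * min(len(result) + d, max_deg + 1)
        for a, ra in enumerate(result):
            if ra == 0:
                continue
            for b, pb in enumerate(base):
                if pb and a + b <= max_deg:
                    new[a + b] += ra * pb
        result = new
    return result

def average_stopping_sets(n, c, d, t):
    """Exact E(s) for s = 1..t (t <= n) in the regular (c, d) ensemble."""
    assert (n * c) % d == 0, "n*c must be divisible by d"
    m = n * c // d
    coeffs = check_poly_power(d, m, t * c)
    out = []
    for s in range(1, t + 1):
        e = s * c  # number of edges leaving a set of s variable nodes
        stopping = coeffs[e] if e < len(coeffs) else 0
        out.append(Fraction(comb(n, s) * stopping, comb(n * c, e)))
    return out

def markov_bound(n, c, d, t):
    """Upper bound on the probability that the stopping number is at most t."""
    return min(Fraction(1), sum(average_stopping_sets(n, c, d, t)))

# Hypothetical example: (3, 6)-regular ensemble with 120 variable nodes.
bound = markov_bound(120, 3, 6, 4)
```

For the toy ensemble with $n = 2$ and $c = d = 2$, a single variable node is a stopping set exactly when both of its edges land on the same check node, which happens with probability 1/3, so $E(1) = 2/3$.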
A. Regular Ensembles
Note that
Theorem 8: For the regular Tanner-graph ensemble , if ,
and Proof: First we prove that From (13)
. If is sufficiently small and , , and can be bounded from above by decreasing geometric sequences. Therefore, (14)
where By definition
Hence, for any
.
Combining this with (14) and (15), we obtain that
and sufficiently large Next we prove that . Let denote the variable nodes of the graph. Then (15)
for and this quantity goes to exponentially as . To simplify the first term in (14), we use (2) to obtain
(17) where the second inequality follows from the inclusion–exclusion principle. For , this inequality reduces to
We bound the expression using Lemma 18 in Appendix C which states that
(16)
For , we bound the second term in (17) as follows. Note that the union of stopping sets is a stopping set. Therefore,
Then Hence,
From the Tanner graph’s viewpoint, we only count stopping constellations in which, when is even, each check node has
a pair of edges incident on it and, when is odd, one of the check nodes has three edges incident on it and the others have two. Then if is even
The inclusion–exclusion principle states that for distinct and any events
(20) and if
is odd
Set , is the event
, and . Then . Observe that for
Combining the two inequalities with (16), we see that
The term4 is due to the fact that the edges are incident on at most decoming from gree- check nodes and alter the number of sockets available . for Then for
Hence, the lemma follows.
B. Irregular Ensembles
In Section IV-A, we showed that for regular ensembles, with probability approaching 1, the stopping ratio is at least the critical exponent stopping ratio. However, for irregular ensembles, the stopping ratio behaves differently from that of regular ensembles. First we show in the following lemma that the probability that the stopping number is approaches .
Lemma 9: For the ensemble when is large
with
,
Proof: Let denote the set of variable nodes, the set the variable nodes of degree- variable nodes, and in . It is easy to see that3 and and and (18) Now, and
Therefore,
It follows from (18) and (19) that
It was shown in Corollary 7 that if , and if , . Let us consider the case first. For this case (19) The second inequality can be obtained by extending Lemma 18 in Appendix C to irregular graphs.
³$A \setminus B = \{a : a \in A \text{ and } a \notin B\}$.
⁴"±" prefixes any term whose sign is not crucial for the proof.
and by the above lemma, if , then approaches a positive value and therefore, is bounded away from . In Theorem 13, we will show that, under the same condition, approaches . This implies that, for sufficiently large $n$, a constant fraction of the codes have stopping number linear in $n$.
To prove Theorem 13, we first show that sets composed exclusively of degree-2 nodes contribute most to the probability that a graph contains a stopping set of size . To that end we establish some facts about stopping sets composed of degree-2 nodes. Let denote the set of degree-2 variable nodes, the collection of all nonempty subsets of , and the collection of all nonempty subsets of . Let
After imposing the constraint
on the rate at which , as
tend to infinity, we can upper-bound
From (76) in the proof of Lemma 22, if
denote the probability that there exists a nonempty stopping set composed of degree-2 variable nodes with size . Recall that for a set $S$, the constellation is the set of check-node sockets connected to $S$. If $S$ is a stopping set, its constellation is called a stopping constellation. Let denote the set of all stopping constellations. For a check node , we denote the number of its sockets in constellation by . A constellation iff . Note that iff . Hence, the membership of $S$ in the collection of stopping sets is determined by its constellation without reference to $S$. We divide into two sets and
(22) Hence, Therefore, if (21)
. exists, then from
For and , the subgraph induced by is either a cycle or a union of distinct cycles. Let denote the collection of all nonempty subgraphs which are cycles. Note that if is a union of disjoint cycles, then there exists such that . Hence,
and
Let
and
We denote all the nonempty subsets of of size where . Let
Observe that and are the probabilities that there exists a nonempty stopping set of size that is composed of degree-2 nodes and whose constellation belongs to and , respectively. Then the following lemma holds. Lemma 10: Let constant determined by
(
is a and
). If exists, then
by
(23) It follows from the inclusion–exclusion principle that (24)
Proof: Observe that . Therefore,
iff
Therefore, we have the following lemma on the asymptotic value of .
. Also (21)
In Lemma 24 in Appendix I, we show that for , if then
(
Lemma 11: Let determined by ). If
, In order to calculate
, we need to evaluate for
is a constant exists, then
the probability that the subgraphs induced by are cycles. Note that must fall under one of the following scenarios.
Case 1: s.t. : If then . Therefore, (25)
Case 2: , and intersect nontrivially, i.e., , , and are nonempty. Since the union of stopping sets is a stopping set, . Here . Moreover, both and are cycles. So it follows that there are at least two check nodes with degree at least in . Therefore, . Using this fact we upper-bound the contribution of nondegenerate to (23) as follows:
(26)
implies Recall that mial characteristic functions, we have
. From polyno-
Given that , to make sure that is a cycle, edges connecting ’s variable sockets to ’s check sockets have to be added in such a way that no cycle is formed until the last edge is added. Incorporating this probability we get
(31) where and
, for example, . Now
(27)
Instead of calculating the conditional probabilities in the product, we will show in Lemma 23 in Appendix H that (28)
(29) so that where in (26),
is the collection of all sequences such that , and intersect nontrivially. In (27), is an upper bound on the number of of size or less satisfying sets with . Applying Lemma 24, we get (28). Equation (29) follows from (22). , and are disjoint, i.e., Case 3: . Let
(32) For
, let
When and tend to infinity and tends to , tends to a modified geometric sequence with the leading term If , , tend to infinity and asymptotically only contribute to , i.e.,
tends to zero, then corresponding to Case 3
(33) The consecutive term ratio can be calculated by using (77) in the proof of Lemma 23 in Appendix H
(30) where is the collection of all sequences such that , contribution, we first evaluate
. To calculate this .
(34)
Using
, we can rewrite (32) as
Note that
determine
, and also
(35) Using these arguments in (36), we obtain Substituting in (30), we see that as , , tends to
tend to infinity and
(36)
where
is the collection of all sequences such that , and intersect is the size of . Recall that we denote all nontrivially, and the nonempty subsets of of size by where
(37)
Now we calculate the asymptotic value of Lemma 12: If
.
, then
Proof: Define Let denote the number of sets of size in Obviously, . Given a sequence denote the set of all sequences for all that
. , let such . Then
Then (37) can be rewritten as Let
denote that subset of . Then
where as follows:
for
in which
for all (38)
We prove that as long as
converges absolutely for all
Here expansion of bounded when
denotes the coefficient of in the Taylor . It follows from (34) that is is sufficiently small. Hence,
and
. Therefore, we can bound
Using the fact that
we have
Therefore,
Combining (38) and (39), we have (39)
Let
It follows from [14, Theorem 8.2] that, if then . Here we let
converges, Now we are ready to prove the main result of the section, a lower bound on the stopping number of code ensembles after expurgation of codes with small stopping sets. Note that further work is still required on the upper bound (see [15]). Theorem 13: For the ensemble , then
, if
Proof: It follows from Theorem 7 that if then . By the definition of , for . and , stopping sets of size where For any make an exponentially small contribution to , as shown in Theorem 8. Hence, it suffices to prove that for all sufficiently small Since
for all
and , as long as
,
we have
Let , , tend to infinity and verified that this implies that
tend to . It can be . Therefore,
The union bound is not sufficiently tight to prove this result. As shown in Lemma 22 in Appendix G, using the union bound only , then for all sufficiently yields the result that if large
Instead, we build on the bound used in the proof of Lemma 22 to prove the theorem. Recall that
In addition, we define the following quantities:
From (33) and (34), we have
and for
It is easy to see that Hence, (40) In the proof of Lemma 22 we show that if
, then
It can also be shown that for all sufficiently small and large
and
For , it was shown in [8], [9] that almost all codes have minimum distance at most . Since the stopping number is bounded above by the minimum distance, a similar statement holds for the stopping number as well. The following result first appeared in [13]. , if
Theorem 15: [13] For the ensemble , , s.t. Hence, if
, then where , is an appropriate constant. Proof: From (1) and from [8, Theorem 1], we obtain
If
and
tend to infinity and
tends to , then V. AVERAGE BLOCK ERROR PROBABILITY
If
exists, then from (40) (41)
Let
and
If Lemmas 10 and 11 that
exists, then it follows from
Moreover, in Lemma 12, we show that if
It can be verified that which, in turn, implies that
, then
implies that . Hence,
The stopping set distribution can be used to obtain bounds on the average block error probability of iterative decoding over the erasure channel, where the average is taken over all codes from Tanner-graph ensembles. In this section, we define a quantity , which lower-bounds the threshold for the error floor, i.e., when the channel erasure probability , the block error probability varies slowly with . If , then for all , the contribution of stopping sets of size to the average block error probability vanishes exponentially when compared to the contribution of stopping sets of smaller sizes. Let the block error probability of an LDPC code under iterative decoding over an erasure channel with erasure probability be . The average block error probability over a Tanner-graph ensemble is denoted by . This section contains two theorems. Both concern the asymptotic value of . For the ensemble , define
Let be the smallest variable degree of the ensemble. In Theorem 16, we show that if and , decreases like for a fixed and like when tends to zero. When , the error probability hardly decreases as increases. In Theorem 17, we use techniques developed in Section IV to obtain the exact asymptotic average block error probability for when . Theorem 16: For the ensemble , if , then in the limit of large
which proves the theorem.
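To make the quantity bounded in Theorem 13 concrete, the following Python sketch computes the stopping number of one small code by exhaustive search. It assumes the standard definition (a set of variable nodes is a stopping set if no check node has exactly one neighbor in the set). The function names and the example parity-check matrix are illustrative choices and do not appear in the paper. The search is exponential in the block length, so it is meant only for toy examples, not for the asymptotic regime analyzed here.

```python
from itertools import combinations

def is_stopping_set(H, subset):
    """Return True if no check node (row of H) has exactly one neighbor in `subset`."""
    for row in H:
        if sum(row[v] for v in subset) == 1:
            return False
    return True

def stopping_number(H):
    """Size of the smallest nonempty stopping set of H, or None if none exists."""
    n = len(H[0])
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if is_stopping_set(H, subset):
                return size
    return None

# Hypothetical example: one parity-check matrix of the (7,4) Hamming code.
# Rows are check nodes, columns are variable nodes.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

print(stopping_number(H))  # 3
```

All columns of this matrix are nonzero and distinct, so no set of one or two variable nodes is a stopping set, while variable nodes 0, 1, 2 meet every check exactly twice.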
with
A similar result holds if we consider minimum distance instead of stopping number. Corollary 14: For the ensemble , then
, if
Proof: For sufficiently small , the probability that a check node is connected to a set of size via more than two edges is negligible compared to the probability that it is connected to via two edges. Hence, the proof is similar to the proof of Theorem 13 with a few minor modifications.
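The relation between the two quantities (the support of every nonzero codeword is a stopping set, so the stopping number is at most the minimum distance) can be checked on a toy example. The sketch below is a brute-force illustration under that standard definition; the helper names and the matrix are hypothetical and chosen only so that the inequality is strict.

```python
from itertools import combinations

def smallest_subset(H, predicate):
    """Smallest nonempty variable-node subset whose per-check neighbor counts
    satisfy `predicate`; returns its size, or None if no subset qualifies."""
    n = len(H[0])
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            counts = [sum(row[v] for v in subset) for row in H]
            if predicate(counts):
                return size
    return None

def minimum_distance(H):
    # A subset is the support of a codeword iff every check sees an even count.
    return smallest_subset(H, lambda counts: all(c % 2 == 0 for c in counts))

def stopping_number(H):
    # A subset is a stopping set iff no check sees exactly one of its nodes.
    return smallest_subset(H, lambda counts: all(c != 1 for c in counts))

# Hypothetical parity-check matrix of the length-4 repetition code {0000, 1111}.
H = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

print(stopping_number(H), minimum_distance(H))  # 3 4
```

Here variable nodes 0, 1, 2 form a stopping set (the first check sees three of them, the others two), but they are not the support of a codeword, so the stopping number is strictly smaller than the minimum distance. The stopping number depends on the chosen Tanner graph, not only on the code.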
and in the limits of small and large
Proof: Let the erasure set denote the nonempty set of variable nodes corresponding to the erased bits. For any LDPC code , iterative decoding fails if and only if contains a nonempty stopping set [2], i.e.,
We bound from above by considering stopping sets and of different sizes. For any
By the Chernoff bound [16]
Therefore, for any and and (42) Clearly
(46) and by the Chernoff bound [16] If
, then
Therefore, the third term in (42) vanishes exponentially. Taking expectation on both sides of (42) over the ensemble For a given set and , there are ways of choosing a set that contains . Therefore, the number of ways of choosing a typical set that contains the set of size can be upper-bounded as
and (43) Let denote the set of degree- variable nodes. We can bound from below by considering only the stopping sets of size and using the inclusion–exclusion principle as follows:
(44)
Using techniques similar to those used in proving Theorem 8, it can be shown that both the first term in the right-hand side (RHS) of (43), and the RHS of (44) have the asymptotic form for fixed and when tends to . What is left to show is that the second term in (43) vanishes exponentially if . Let
Hence,
Substituting this into (46) and taking expectation on both sides, we have that for any and sufficiently large
Let
and
and If we let We bound using typical sets. For an erasure probability , let denote the collection of typical erasure sets. More precisely, if (45)
and
approach , we get
If (47)
then vanishes exponentially. The theorem is proved once we simplify the condition in (47) to
as follows. In the proof of Theorem 7, we see that if sufficiently small
and if
Expanding
, then for
, then for sufficiently small
for small , we obtain
Therefore, we can set the interval is continuous, we can let
to . Thus,
. Since
The theorem follows.
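The proof above rests on the fact that iterative decoding fails exactly when the erasure set contains a nonempty stopping set. A minimal sketch of this fact follows, assuming the usual peeling formulation of iterative erasure decoding (a check node with exactly one erased neighbor resolves that neighbor). The function name `peel` and the example matrix are illustrative assumptions, not taken from the paper.

```python
def peel(H, erased):
    """Run peeling on the erasure set `erased` (iterable of variable indices).

    Returns the set of positions that remain unresolved. Under the stated
    assumptions this residual is the largest stopping set contained in
    `erased`, so decoding succeeds if and only if the result is empty.
    """
    remaining = set(erased)
    progress = True
    while progress:
        progress = False
        for row in H:
            touched = [v for v in remaining if row[v]]
            if len(touched) == 1:
                # This check has a single erased neighbor, so it resolves it.
                remaining.discard(touched[0])
                progress = True
    return remaining

# Hypothetical example: one parity-check matrix of the (7,4) Hamming code.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

print(peel(H, {0, 1, 4}))  # set(): all erasures recovered
print(peel(H, {0, 1, 2}))  # {0, 1, 2}: the erasure set is itself a stopping set
```

In the second call every check node has exactly two erased neighbors, so no check can resolve anything and the decoder stalls immediately.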
as given by the above theorem. In the above plot, the average block error probability for finite block lengths was obtained via finite-length analysis [5], [7]. Although the preceding theorem derives the exact asymptotic average block error probability only for , it appears (e.g., Fig. 2) that the result also holds for up to , the threshold for achieving asymptotically small bit error probability. Hence, we conjecture that Theorem 17 holds for . Formally we conjecture that, for Tanner-graph ensembles with • If , then
•
For a given ensemble, can be calculated via numerical methods. We compare with , the threshold for achieving asymptotically small bit error probability for some regular ensembles [3]. For , , and . For , , and . When , by Lemma 9 there are stopping sets of size with constant probability. Therefore, for a fixed , the error probability hardly decreases as increases. By using the techniques employed in Theorem 13, we derive the exact asymptotic average block error probability for irregular ensembles. Theorem 17: For the ensemble with , if and , then
the ensembles have a constant average error floor; however, for individual codes, the block error floor may approach zero. In Fig. 2, we plot the average block error probability of the cycle code ensemble against the channel erasure probability for increasing block lengths. Observe that the plots approach the asymptotic value
because iterative decoding can recover almost all the erased bits and only “sublinear-size” stopping sets contribute to the block error probability. If , then
jumps to because, with high probability, iterative decoding cannot recover all the erased bits. APPENDIX A PROOF OF LEMMA 4 Note that
with
Proof: Most of the proof repeats the proofs of Theorems 16 and 13, so we provide only a brief outline. As in Theorem 16, we separate the contribution of stopping sets to the average block error probability based on their size. For , stopping sets of size make an exponentially small contribution. If , stopping sets whose size falls in the interval also make an exponentially small contribution. As in Theorem 13, the contribution of stopping sets with size to is dominated by stopping sets that are subsets of and whose constellations belong to . (The quantities and were defined in Section IV.) By using the inclusion–exclusion principle, we can then show that the asymptotic average block error probability is if . Note that the inequality is usually called the stability condition [17], and the theorem predicts that when ,
where
, . Clearly
,
where the same constraints hold for , then
where
, and
. Define
satisfy (48)
Fig. 2. Average block error probability $E(P(C; \epsilon))$ as a function of the erasure probability $\epsilon$. As $n$ increases, it approaches the asymptotic value $1 - \sqrt{1 - 3\epsilon}$. Note that = 0.326 < = 0.333.
Clearly, and do not affect the asymptotic exponent, hence, we obtain that
Case 2:
and the equations , have only one solution in . Then the solution must have one of the following two forms:
or
with where Since there exists , , and
satisfy (48). satisfying the integer constraints: , and
the equations , have at least one nonnegative solution. Hence, the equations have at least one solution in . Based on the value of and the number of solutions for (48), we discuss three cases. Case 1: . The equations , must have one and only one solution , with . Then
and
. Consequently
If the solution takes any other form, then
always has solutions of the form and Setting to be sufficiently small, we can adjust the three coordinates , , and by small amounts , , and , respectively, to obtain other solutions in , thereby contradicting the assumption of a unique solution to (48). Case 3: and the equations and have more than one solution in . We
define
to be the set of all the solutions of the two equations in . Let and be two distinct solutions. To satisfy and , they must differ in at least three coordinates: , , and for . Note that any point on the line segment connecting and is a solution of the two linear equations and furthermore at least three of its coordinates are in . In particular, this is true for the midpoint of the line segment . As in Case 2, we can adjust its coordinates such that all of them fall within . Therefore, there exists at least one point in , which we denote by , that belongs to the set . Let denote the point where attains its maximum value in . It can be proved that or for all . Otherwise, for sufficiently small
Let
Without loss of generality, assume . Then at most one can be zero and as increases its sign can change at most once. If the sign changes, it can only change from negative to positive. , , and From (51) there must exist such that for all , . Multiplying (51) by and subtracting from for all . But (52), we obtain
which is a contradiction. Therefore, the solution of (49) and (50) is unique.
and it can be shown that APPENDIX B PROOF FOR COROLLARY 7 thereby contradicting the assumption that corresponds to the maximum point. Therefore, the maximum point can be obtained by using Lagrange multipliers to maximize the under constraints concave function and . Using Lagrange multipliers we obtain
where
and
Therefore,
Recall that and . Note if for all sufficiently small and if for all sufficiently small . So we want to evaluate for small . Recall that
satisfy
and
are the positive solutions for
As shown for the case of regular code ensembles in Lemma 1, there exists such that
Substituting in (49) and (50) we obtain . Finally, we prove that the positive solution for (49) and (50) is unique. We need only consider . Let and be two distinct positive solutions. If then by (49), , which contradicts our assumption of distinct solutions. Therefore, . Since both and satisfy (49) and (50), we obtain (51)
For any , we can decompose into two parts and which are defined as
and
Following a line of reasoning similar to the proof of Lemma 4, we obtain that
and (52)
(53)
Let
be the values of
maximizing
. Then
It can be shown that when maximizing
, the value of in (53) are
(54) and for
By Lemma 3
Substituting into (54), we obtain (57) where
Similar to (56)
is the only positive solution of
For small ,
From (57) and (55), we have
can be written as
Using this fact, it can be shown that Substituting this back into the expression for and observing that , we obtain that as , Therefore, if (including the case where ), for sufficiently small , hence, . Conversely, if , for sufficiently small , hence, . APPENDIX C Lemma 18: For every
(55) Depending on the value of , we have two cases. Case 1: . Note that . From (53) and (55), for sufficiently small
Case 2:
. Let
be the value of maximizing . Clearly, is not less than . It follows from (53) that
Proof: We present the proof from the viewpoint of Tanner graphs that have check nodes of degree . Then is the number of stopping constellations with check-node sockets. We upper-bound this number. First, order the check nodes and their sockets as follows:
where the subscripts are the check-node indices and the superscripts are the socket indices in a check node. Select out of the sockets. Each selection corresponds to a length- subsequence . Every subsequence can be represented by the following pair: a check set given by and a socket sequence given by
where is one if that we assume that (56) Substituting (55), we have
tween the selected sockets the distance is given by distance is given by bound because, when
and
and zero otherwise. Note . The socket sequence describes the distance beand . If , . If , the .( is a tighter , and .
Otherwise, only one of the sockets is selected from either or .) The expression gives the number of ways of selecting check-node sockets from sockets such that no check node is selected exactly once. This number can be bounded above by counting the number of valid check set–socket sequence pairs. Since each selected check node contributes at least two and at most sockets, the size of the check set is at least and at most . The number of ways of choosing such a check set is at most
Since there is a one-to-one correspondence between every conversion of to and vice versa, (58) holds. When is sufficiently small, we have (59). APPENDIX E Lemma 20: Let
,
, and
. Let
Then
The socket sequences take the values from . Therefore, we can upper-bound the number of subsequences by
The lemma follows. APPENDIX D Lemma 19: Let
,
, and
. Let
Then (58)
Proof: The method is similar to the proof of Lemma 19. We present the proof from the viewpoint of Tanner graphs that have variable nodes of degree for . There are ways to choose a variable-node subset with nodes and sockets. We denote the collection of all such sets by . We relate to as follows. For any , there are at most ways to add a degree- variable node such that becomes . Therefore, there are at most possible conversions from to . In the reverse direction, any must have at least degree- variable nodes and there are at least ways to delete a degree- variable node. Therefore, there are at least possible conversions from to . Since there is a one-to-one correspondence between every conversion of to and vice versa, the lemma holds.
and for sufficiently small
APPENDIX F (59)
Proof: We present the proof from the viewpoint of Tanner graphs that have variable nodes of degree for . There are ways to choose a variable-node subset with nodes and sockets. We denote the collection of all such sets by . We relate to as follows. Since has nodes and sockets, if we replace every one of its variable nodes of degree with a new variable node of degree , we obtain a set . There are at most ways to perform this replacement. Here upper-bounds the number of choices for the new node. Therefore, there are at most possible conversions from to . In the reverse direction, for every set there are at least ways to replace a variable node of degree with a new node of degree . Here lower-bounds the number of variable nodes of degree in and lower-bounds the number of choices for the new node. Therefore, there are at least possible conversions from to .
Lemma 21: Let
,
, and
. Let
When
tends to infinity and
tends to , we have
where if is even if is odd. Proof: We present the proof from the viewpoint of Tanner graphs that have check nodes of degree for . There are stopping constellations that have sockets. We denote the collection of all such stopping constellations by . For any , let denote the number of check nodes in that have degree in the original Tanner graph. Then, there are ways to choose a new check node of degree and add two of its sockets to to obtain . Therefore, there are possible conversions from to that add a new node of induced degree . In the reverse direction, for any , let denote the number of check nodes that have degree in and degree in the original Tanner graph. Then there are ways to delete a check node that has degree in to obtain a new constellation . Therefore, there are
Fig. 3. To bound the summation of P over the triangular region $\{1 \le v \le n;\ 0 \le w \le (c-2)v\}$, we divide it into three regions.
to . Therefore, we only need to prove that for all sufficiently small and large (63) By the union bound
conversions from to that subtract a node of induced degree . Since there is a one-to-one correspondence between every conversion of to and vice versa, we have
For convenience, let
(60) With every stopping constellation , one can associate a vector . As in the case of regular Tanner graphs (see Lemma 1), when tends to infinity, for most of the constellations the vector concentrates around some specific value say . When tends to infinity and tends to , it can be shown that
and
Then
We want to bound the sum of triangular region given by
Therefore,
over the two-dimensional
(61) To do so we divide the set into three regions (Fig. 3). For ease of exposition, we will assume although our proof can be easily extended to more general cases. , and , we have . Since When
Under the same limit, the LHS of (60) reduces to
(62) The theorem follows from (60)–(62). APPENDIX G Lemma 22: For the ensemble , , if , then
applying Lemmas 19 and 21 in Appendices D and F, we have that when
such that
Proof: Recall that for . Similar to Theorem 8, for all and , the stopping sets of sizes between and make an exponentially small contribution
(64) In region , (for simplicity, we assume to be an integer), (64) reduces to
(65)
where the first inequality follows from
. Therefore,
Equation (64) also helps us bound the summation of region . Let be small and be large enough such that , we have that when For
(66) over .
We will show that in expression (71) all the terms other than vanish to at least as fast as . (The sum of these vanishing terms upper-bounds , the probability that there exists a stopping set of size between and , with more than incident edges. Hence, approaches at least as fast as .) We bound the summation of along the three lines
(67) By setting , we can bound the sum of over region as follows: (68) For , (65) shows that the sequence of odd terms and the sequence of even terms in can be upper-bounded by sequences whose ratio of consecutive terms is . Therefore, when
Now we consider region . For , , applying Lemmas 20 and 21, when
(72) and , we choose to be an integer by setting which meets the constraint . We combine (67) and (69) to obtain for For lines
(69) The last step holds as . Since , the summation over region can be bounded as follows: and
(70) Then, when Combining (66), (68), and (70), we get and
Since (65) also holds for
(73) and
(71)
(74)
Note that for a fixed , when
APPENDIX H Lemma 23: For the ensemble , let denote the set of degree- variable nodes, the collection of all nonempty subgraphs which are cycles and, for any , the subgraph induced by . Then, for disjoint sets of size
Substituting (72)–(74) in (71) we obtain
Proof: Let denote for . Since there are at most edges incident on , we have
(75) To bound this expression, we need to evaluate and It can be shown that when
and
.
and
Consider the two sequences
and Hence, for sufficiently small , and converge. In particular, the sum of the first terms in these two series also converges. Hence, where . As in Lemma 21 in Appendix F, it can be shown that as tends to infinity
(76) and (77) and Using these facts in (75), we obtain
and as
The ratios between consecutive terms in the two sequences are almost equal, i.e., for
tends to infinity
Therefore, for sufficiently small
and large
and the leading terms of the sequences are also almost equal, i.e.,
Hence, for
and
Moreover, for
Summing over all with
, we obtain that for any
(78) Combining these two facts and (31)
where is the number of edges. Recall that
. Define
For hence,
, there exists and , such that , For , we have for all , hence . Let denote the set of all vectors corresponding to and the set of all the vectors corresponding to . Then, for every and APPENDIX I Lemma 24: For the ensemble with largest check node degree , let denote the set of degree- variable nodes, the collection of all stopping sets, and the collection of stopping constellations which contain at least one check node with at least three sockets in the constellation. Then for any with , if ( is a constant determined by ), then
(79) where . For
is determined by the degree distribution of size
and (80) where denotes the constellation induced by . Proof: Recall that denotes the degree of check node in constellation and the degree of in any Tanner graph in . With every constellation one can associate a vector
where denotes the number of check nodes with and . Let denote the set of all stopping constellations and the set of all stopping constellations with sockets. If , then for , and the sequence satisfies the constraints and . We use to denote the set of all vectors corresponding to all . For , let denote the number of constellations whose vector is . Then
Moreover, for any
Substituting (78)–(80) in the above, we obtain that if then
Using the loose upper bound and the trivial lower bound we obtain
Additionally, if
tends to
It can be shown that when Since
, the lemma follows.
then
ACKNOWLEDGMENT The authors would like to thank Rüdiger Urbanke for introducing them to LDPC codes and Changyan Di and Henry Pfister for helpful discussions.
REFERENCES
[1] R. G. Gallager, Low-Density Parity Check Codes. Cambridge, MA: MIT Press, 1963.
[2] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, “Efficient erasure correcting codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 569–584, Feb. 2001.
[3] T. J. Richardson and R. L. Urbanke, “Efficient encoding of low density parity check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 638–656, Feb. 2001.
[4] M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory, vol. IT-27, no. 5, pp. 533–547, Sep. 1981.
[5] C. Di, D. Proietti, I. E. Telatar, T. J. Richardson, and R. L. Urbanke, “Finite length analysis of low-density parity-check codes on the binary erasure channel,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1570–1579, Jun. 2002.
[6] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[7] J. Zhang and A. Orlitsky, “Finite length analysis of LDPC codes with large left degrees,” in Proc. IEEE Int. Symp. Information Theory, Lausanne, Switzerland, Jun./Jul. 2002, p. 3.
[8] C. Di, R. Urbanke, and T. Richardson, “Weight distribution: How deviant can you be?,” in Proc. IEEE Int. Symp. Information Theory, Washington, DC, Jun. 2001, p. 50.
[9] C. Di, R. Urbanke, and T. Richardson, “Weight distribution of low-density parity-check codes,” IEEE Trans. Inf. Theory, submitted for publication.
[10] S. Litsyn and V. Shevelev, “On ensembles of low-density parity-check codes: Asymptotic distance distributions,” IEEE Trans. Inf. Theory, vol. 48, no. 4, pp. 887–908, Apr. 2002.
[11] S. Litsyn and V. Shevelev, “Distance distributions in ensembles of irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3140–3159, Dec. 2003.
[12] D. Burshtein and G. Miller, “Asymptotic enumeration methods for analyzing LDPC codes,” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1115–1131, Jun. 2004.
[13] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Finite length analysis of various low-density parity-check ensembles for the binary erasure channel,” in Proc. IEEE Int. Symp. Information Theory, Lausanne, Switzerland, Jun./Jul. 2002, p. 1.
[14] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1976.
[15] C. Di, A. Montanari, and R. Urbanke, “Weight distribution of LDPC code ensembles: Combinatorics meets statistical physics,” in Proc. IEEE Int. Symp. Information Theory, Chicago, IL, Jun./Jul. 2004, p. 102.
[16] A. Shwartz and A. Weiss, Large Deviations for Performance Analysis. London, U.K.: Chapman & Hall, 1995.
[17] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.