IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 4, NO. 3, MAY 2005
Reduced Complexity Interleaver Growth Algorithm for Turbo Codes

Fred Daneshgaran, Member, IEEE, and Massimiliano Laddomada, Member, IEEE
Abstract—This paper focuses on the problem of significantly reducing the complexity of the recursive interleaver growth algorithm (IGA), with the goal of extending the range of applicability of the algorithm to significantly larger interleavers for a given CPU time and processor. In particular, we present two novel modifications to IGA that change the complexity order of the algorithm from O(N_max^4) to O(N_max^2); we present several further minor modifications that reduce the CPU time, albeit without fundamentally changing the complexity order; and we present a mixed-mode strategy that combines the results of complexity reduction techniques that do not alter the algorithm outcome itself with a novel transposition value set cardinality constrained design that does modify the optimization results. The mixed strategy can be used to further extend the range of interleaver sizes by changing the complexity order from O(N_max^2) to O(N_max) (i.e., linear in the interleaver size). Finally, we present optimized variable-length interleavers for the Universal Mobile Telecommunications System (UMTS) and Consultative Committee for Space Data Systems (CCSDS) standards that outperform the best interleavers proposed in the literature.

Index Terms—Complexity, interleavers, iterative algorithms, optimization, permutations, turbo codes.
I. INTRODUCTION
PARALLEL concatenated convolutional codes (PCCC), or turbo codes, have recently become very popular for forward error correction, largely due to their efficient suboptimal iterative decoding algorithm and their remarkable performance near the capacity limit [1]–[4]. An example of a PCCC composed of two identical memory-two component recursive systematic convolutional (RSC) codes, coupled by an interleaver shown by the labeled box, is depicted in Fig. 1. For trellis termination of the RSC codes, we use the technique proposed in [4]. After trellis termination, the resulting code resembles a large block code.
There is now a growing body of literature on interleaver design techniques for turbo codes (see, for instance, [1], [5]–[18]). In [1], we presented a systematic, recursive interleaver design algorithm producing interleavers specifically tailored to the constituent RSC codes of the construction. We may generically characterize such interleavers as distance spectrum optimized

Manuscript received January 20, 2002; revised July 24, 2003, February 18, 2004; accepted March 12, 2004. The editor coordinating the review of this paper and approving it for publication is K. Narayanan. This work was supported in part by Euroconcepts S.r.l., in part by FIRB, and in part by the CAPANINA Project funded by the European Commission within the VI framework program.
F. Daneshgaran is with the Department of Electrical and Computer Engineering, California State University, Los Angeles, CA 90032 USA (e-mail: [email protected]).
M. Laddomada is with the Dipartimento di Elettronica, Politecnico di Torino, 10129 Torino, Italy.
Digital Object Identifier 10.1109/TWC.2005.847094
Fig. 1. Example of a rate one-third PCCC employing two identical rate one-half memory-2 RSC codes. The interleaver is represented by the permutation box Π.
(DSO) interleavers, obtained via the application of the interleaver growth algorithm (IGA) [1]. The terminology is used to emphasize the fact that the goal of the interleaver design algorithm presented in [1] is to optimize the distance spectrum of the associated block code. The main impact of such a design philosophy is improvement in the asymptotic performance of the code and lowering of the error floor typically observed in the bit-error rate (BER) and frame-error rate (FER) plots of turbo codes. In effect, the optimization extends the operational region of the code, since it is unlikely that a turbo code would be operated in the error floor region itself. Since the complexity of optimizing the interleaver to produce a code with the best distance spectrum is very large (factorial in the problem size), the IGA uses a greedy approach to produce suboptimal results with a reasonable level of design complexity. Hence, while the optimization problem is well defined, IGA only produces optimal results in the context of a greedy approach.
The following are the main attributes of the IGA for interleaver design.
1) The DSO interleavers obtained using IGA in general result in codes with better distance spectra compared to the other design techniques reported in the literature [9]. Furthermore, once the algorithm is initialized, the IGA results in a unique solution, unlike the other design techniques, which generally lead to a class of interleavers to choose from.
2) The IGA, by virtue of its recursive nature (i.e., an interleaver of length N+1 is constructed from one of length N), leads to implicitly prunable interleavers, in the sense that the interleaver construction is recursive and there are hardware-efficient means of implementing variable-length interleavers based on the results.
3) The interleavers designed using IGA are tailored to the specific RSC codes used in the construction of the PCCC. This is in sharp contrast to most other interleaver design
1536-1276/$20.00 © 2005 IEEE
techniques that are not per se tailored to the RSC codes of the construction, if not through a trial-and-test process.
This paper is focused on the problem of significantly reducing the complexity of IGA itself, with the goal of extending the range of applicability of the algorithm to larger interleavers. In its original formulation, even though the algorithm complexity is polynomial, the complexity is so large that it severely limits the maximum interleaver length for a given CPU time and processor. Recently, we proposed a very efficient error pattern feedback for interleaver optimization, significantly improving the performance of the resulting interleavers in terms of their distance spectra [19]. The immediate consequence of the use of error pattern feedback in the design is that, for each interleaver of a given target length to be designed, the IGA must be run several times and not just once, further exacerbating the complexity problem.
The rest of the paper is organized as follows. In Section II, we provide a brief overview of the IGA and highlight the complexity order of the algorithm in its original formulation. Section III presents the core theoretical result of the paper, where we present two novel modifications of the IGA changing the complexity order of the algorithm from O(N^4) to O(N^2). Furthermore, we present several additional useful modifications of the algorithm that do not change its operational behavior, but do speed up the algorithm without affecting its complexity order. In Section IV, we provide an overview of the delay constrained approach to limiting design complexity first introduced in [5], [20] and discuss its advantages and limitations in a cohesive framework using empirical observations obtained via simulations. Subsequently, we present a mixed-mode strategy using a novel transposition value set cardinality constrained design to further extend the range of applicability of the algorithm to larger interleavers.
Various tradeoffs between execution time and the performance of the designed interleavers can be obtained by changing the cardinality limit. Finally, we present the results of interleaver design for turbo codes proposed for the Universal Mobile Telecommunications System (UMTS) and Consultative Committee for Space Data Systems (CCSDS) standards, and present prunable interleavers for these standards that, to the best of our knowledge, outperform the best interleavers proposed in the literature. Conclusions are presented in Section V.
II. OVERVIEW OF THE IGA AND ITS COMPLEXITY
This section summarizes the main results of [1] on the IGA algorithm. For the sake of brevity, we shall keep the description of the IGA to a minimum, and invite the interested reader to review [1] for the many theoretical details that are omitted here. The basic understanding of the IGA algorithm presented below requires the representation of permutations using their transposition vectors, introduced in [1]. It is the transposition vector of a permutation, and not directly the permutation itself, that is iteratively grown to the desired size.

A. Transposition Vector of a Permutation
Consider an indexed set of N elements. A given interleaver performs a particular permutation of this set of elements. The permutation acts on the indexes of the elements. Henceforth, the notation used means that the i-th input symbol is carried to the j-th position at the output. It is a basic result in group theory [21] that any permutation on a set of elements can be written as a product of disjoint cycles, and the set may be divided into disjoint subsets such that each cycle operates on a different subset. A cycle of length two is called a transposition. It is easy to verify that any finite cycle can be written as a product of transpositions. Hence, we conclude that transpositions represent the elementary constituents of any permutation.
The finite state permuter (FSP) introduced in [1] is a realization of an interleaver in the form of a sliding-window transposition box of fixed length, equal to the delay of the permutation it effectuates on its input sequence, with the property that the transposition performed at a given time slot is responsible for the generation of the output at the same time slot. The operation of the FSP can be understood by thinking of the sliding window as a queue. To generate any possible sequence of outputs, it is sufficient to exchange the head of the queue with the element that is to be ejected at that time slot. Any permutation on a finite set of elements can be represented using a unique transposition vector associated with its FSP realization. As an example, consider a permutation on five elements.
Consider the queue model of the FSP and assume that data enters from left to right. Let us label the transpositions to be performed sequentially with the head of the queue to generate the desired outputs, using positive integers. Let the integer 1 denote the case whereby no transposition is performed with the head of the queue and the element at the head of the queue is simply ejected. Then, one obtains the transposition vector (to be read from left to right) that fully defines the permutation. Take the binary sequence 10110, labeled from left to right. The permutation maps this sequence to 00111. Consider now a second transposition vector, associated with a second permutation.
Note that the second permutation looks quite different from the first, yet its output corresponding to the shifted sequence 010110 is 001110, which is a shifted version of the output generated by the first permutation. In essence, the descriptions of the two permutations using their transposition vectors preserve information about the prefix symbol substitution property of these two permutations on error patterns, whereby the first transposition exchanges a zero with a zero, or a one with a one. Any permutation on N elements uniquely defines a transposition vector of size N. Conversely, any transposition vector of size N defines a unique permutation on N elements. Note that, when synthesizing a permutation using the transposition vector, the i-th element of the vector, when scanned from right to left, can only assume values in the set {1, ..., i}. In the interleaver design algorithm presented in [1], the interleaver is grown based on the minimization of a cost function related to the asymptotic BER (or FER) performance of the overall code. Hence, we shall
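A small sketch may help make the FSP queue model concrete. The function name and the exact queue convention (head at the front of the window, value 1 meaning "eject without exchange", value t meaning "exchange the head with the t-th element of the window") are our own illustrative choices, not notation from [1]:

```python
def fsp_apply(tvector, seq):
    """Apply a transposition vector to a sequence via the FSP queue model.

    At each time slot the head of the queue is optionally exchanged with
    another element of the window (value 1 = no exchange) and then ejected.
    """
    window = list(seq)
    out = []
    for t in tvector:
        if t > 1:
            # exchange the head with the t-th element of the remaining window
            window[0], window[t - 1] = window[t - 1], window[0]
        out.append(window.pop(0))  # eject the head
    return out

# The all-ones vector performs no exchanges: the identity permutation.
assert fsp_apply([1, 1, 1, 1], list("abcd")) == list("abcd")

# Any legal transposition vector yields a permutation of the input.
assert sorted(fsp_apply([3, 2, 1, 1], list("abcd"))) == list("abcd")
```

Under this convention, the i-th transposition value counted from the end of the vector can take at most i distinct values, matching the {1, ..., i} rule noted above.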
first introduce this cost function, and then present the IGA algorithm.
B. Cost Function and the Interleaver Growth Algorithm
Let e be a terminating error pattern of length l and Hamming weight w associated with the RSC code in a PCCC configuration. By a terminating error pattern, we mean a binary sequence at the input of the RSC that corresponds to a path in the trellis of the code diverging from and reemerging with the all-zero path; one such sequence can be easily found for the turbo code presented in Fig. 1. Let the interleaver length be N, and let a phase of an error pattern denote one of the shifted positions of that pattern that can appear within the interleaver span of length N. Then, all the phases of an error pattern represent terminating error events prior to permutation, and all have equal probability of occurring within the interleaver span.
Let T denote a transposition vector of length N. Let the elementary cost function for the interleaver design, based on just one phase of one error event, be denoted C(T, e), where the notation used implies that the cost is a function of the transposition vector T and the error pattern phase, both of which are vectors whose size grows with N. We use the notation T(e) to mean that the transposition vector permutes the components of the error pattern phase, generating an error pattern with the same weight. Let z denote the RSC output sequence corresponding to the permuted input, and set d equal to the Hamming weight of z. Here, d corresponds to the Hamming distance from the all-zero path of the path in the trellis of the RSC corresponding to the permuted input. We similarly define the Hamming weight of the output of the RSC receiving the unpermuted input. A suitable elementary cost function, directly related to the upperbound on the pairwise error event probability of the code, is (1), located at the bottom of the page, where the constant appearing there is related to the code rate and the signal-to-noise ratio (SNR). This elementary cost function can be used for optimization of the asymptotic BER of the code. If the goal is optimization of the asymptotic FER, we can simply drop the weight-dependent factor from (1).
We assume that the overall cost function is additive in the phases of our error patterns. For a PCCC containing two identical RSCs and one interleaver [4], we define the total cost function as the sum of the cost function associated with the forward permutation and the cost function associated with the inverse permutation. Hence, the global cost function using the chosen error patterns is
(2)
The expression in (1) clearly resembles the upperbound on the asymptotic BER of the code. The following are the operational steps of the IGA algorithm, which aims at iteratively minimizing this cost function using a greedy search paradigm.
1) Set the iteration index k = k0, which corresponds to the initial size of the interleaver. Initialize the recursion by exhaustively solving, for a manageable value of k0

(3)

2) Let k = k + 1 and find the minimizing first transposition from

(4)
and form the new transposition vector.
3) Set k = k + 1 and iterate step 2 to grow the transposition vector to the desired size N.
We note the following.
1) Using IGA, one can establish neither the local nor the global strict optimality of the solution at a given recursion step. To establish local optimality, one needs to define a notion of perturbation about a point in an appropriately sized space. If one takes the definition of a local perturbation of a permutation to be allowing the first element of the transposition vector to assume any of its possible values, while the rest of the transposition vector is fixed, then IGA is by definition locally optimal. This notion of local perturbation may indeed be justified in light of the prefix symbol substitution property referred to above. Under this notion of local perturbation, IGA is essentially a steepest-descent type algorithm.
2) In many optimization algorithms using the greedy approach, the input vector is often of a fixed dimension. In IGA, the dimensionality of the input vectors (i.e., the transposition vectors) actually increases from one recursion to the next. Hence, the search is over an ever-growing input space of higher and higher dimensions.
3) The core theoretical justification for the use of IGA is given by [1, Theorems 3.1 and 3.2]. These theorems demonstrate that the sequence of cost functions, with respect to randomization of the first transposition with the rest of the transposition vector intact, is asymptotically a Martingale [22], meaning
(1)
that for a sufficiently large interleaver length, even randomly growing the transposition vector one element at a time is a fair process. IGA does much better, in that it does not randomly pick the transposition value at a given recursion step; rather, the transposition that gives the best value of the cost function at that step is selected. This leads to a deterministic process in which, on average, the value of the cost function should decrease. One cannot say the resulting process is a supermartingale, since there is no randomness involved anymore, but the minimum of the cost function at each step beats the average unless the variance is zero, and hence it is anticipated that IGA should perform better than a pure Martingale. This is as good as having a supermartingale. Indeed, simulation results exhibit an almost monotonic and rapid decrease in the value of the cost function as a function of the recursion step.
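The greedy recursion of steps 1-3 can be sketched as follows. The cost function here is a toy stand-in (the paper's cost (1)-(2) requires the RSC encoder and the chosen error patterns), and the value-set convention for transposition vectors follows the {1, ..., i} rule noted earlier:

```python
import itertools

def iga_grow(N, k0, cost):
    """Greedily grow a transposition vector to length N (IGA skeleton).

    cost(T) scores a candidate transposition vector; lower is better.
    Reading left to right, position j of a length-m vector may take the
    values 1..(m - j), matching the right-to-left {1, ..., i} rule.
    """
    # Step 1: exhaustive initialization at a manageable size k0.
    ranges = [range(1, k0 - j + 1) for j in range(k0)]
    best = min((list(t) for t in itertools.product(*ranges)), key=cost)
    # Steps 2-3: at each iteration, try every value of the new first
    # transposition and keep the one minimizing the (greedy) cost.
    for k in range(k0, N):
        best = min(([t] + best for t in range(1, k + 2)), key=cost)
    return best

# Toy stand-in cost: prefer transposition vectors with many distinct values.
T = iga_grow(N=10, k0=3, cost=lambda v: -len(set(v)))
assert len(T) == 10
assert all(1 <= T[j] <= len(T) - j for j in range(len(T)))
```

Note how the search space grows with the iteration index: at iteration k+1 there are k+1 candidate values for the new first transposition, which is exactly why the greedy shortcut is needed in place of an exhaustive search.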
For each possible value of the first transposition, a global cost function is formed, and the transposition minimizing this cost function is chosen to form the new transposition vector. The cost function itself is composed of elementary cost functions associated with how the transpositions affect the error patterns and their various phases. For now, let us concentrate on the forward component of the cost function. The arguments that follow apply equally to the component of the cost function associated with the inverse permutation. Let the two error pattern matrices collect the phases of the two error patterns that were used during the minimization at iteration 6 (we assume that data enters the RSC sequentially starting with its leftmost position)
(5)
C. Complexity of IGA in Original Formulation
In [1], it is demonstrated that, for a set of single error events, IGA has a polynomial complexity of order O(N^3). This complexity measure is associated with the raw algorithm itself and does not account for the time required to obtain the codeword weights. This is because, theoretically, a very large lookup table could be used to store the codeword weights (the input weights of the error patterns are often very limited). However, it is impractical to use huge lookup tables for large interleavers. If we consider actually sequentially encoding the permuted sequence at the input of the RSC (the unpermuted sequence need not be encoded, since it represents an error event with a well-known weight at the output of the RSC encoder), the algorithm's execution time complexity grows by one order (i.e., it becomes O(N^4)). Unfortunately, for many interleaver sizes of practical interest (information block lengths from several hundreds to several thousands), the algorithm complexity can still be quite high.
III. NOVEL MODIFICATIONS CHANGING COMPLEXITY ORDER OF IGA
In this section, we present two novel modifications of IGA that effectively change its complexity order from O(N^4) to O(N^2). In order to better help understand the first modification of the IGA that changes its complexity order, we shall use a very simple example. In particular, suppose our aim is to design an interleaver for the turbo code of Fig. 1, and suppose we have chosen two error patterns for the formulation of the cost function to use for interleaver design. It can be easily verified that both of the chosen sequences cause a divergence from and reemergence with the all-zero path in the trellis of the constituent RSC code of the turbo code presented in Fig. 1.
Suppose the interleaver has been grown to length 6, so that what is currently available is the transposition vector of length 6. We wish now to enlarge the interleaver and grow it to length 7. To do this, we need to examine all the possible values that the seventh transposition can assume, and form
(6)
In the transition from iteration 6 to iteration 7, one new phase of each error pattern must additionally be examined, so that the error pattern matrices become
(7)
(8)
The last row of each matrix represents the new phase (the first phase, according to our previous definition) of the error pattern. Associated with each error pattern matrix, there is a cost vector that accounts for the contribution of a particular phase of a particular error pattern to the forward component of the global cost function. Since the transposition vector chosen at iteration 6 is in use, the cost vectors associated with the two matrices and their permuted versions are minimal cost vectors at that iteration. Let us denote these cost vectors as
(9) (10)
where each entry denotes the elementary forward component of the cost associated with the corresponding phase of an error pattern at iteration 6, and the symbol "*" is used to highlight the minimal value of the cost, which is associated with the transposition minimizing the global cost function at that iteration. Suppose we are at iteration 7 and wish to examine the cost vectors associated with the choice of the seventh transposition, resulting in the enlarged transposition vector
Application of the first transposition forms the permuted error pattern matrices as follows:
(11)
and lower RSC. If this encoding is performed sequentially, we have an additional O(k) operations to perform. This brings the complexity of the algorithm at iteration k+1 to cubic order in k. The global complexity, if the goal is to grow the interleaver to length N, is

(18)

(12)
where the highlighted columns are the columns affected by the first transposition (note that, in our queue model of the FSP, data enters from the left, and the head of the queue is at the rightmost position). After the application of the first transposition, an element is ejected (i.e., the last columns of the matrices are ejected) and the rest of the transpositions are applied to the remaining columns of the matrices, defining
(13)
The key observation that can be used to reduce the complexity order is as follows. Note that the highlighted rows of the permuted matrices are the same as those found in the permuted matrices of the previous iteration. Furthermore, for these rows, the first element that was ejected after the application of the new transposition was a zero. Since beginning zeros injected into the RSC do not impact the values of the cost function associated with a given error event aside from a scaling effect (i.e., they just delay the start of an error event), we conclude that the elementary cost values associated with the highlighted rows and their permuted versions must equal (19) (20)
(14)
After the application of the first transposition, the rows of the two matrices and the rows of the permuted matrices associated with its application are encoded by the RSCs, and two cost vectors are generated (15) (16)
where the factor appearing there is a scaling factor that arises from the definition of the elementary cost function. Hence, these values do not need to be recalculated at iteration 7. Indeed, it is noted that, for any given error pattern at a given iteration, the component of the cost due to the first phase must be computed, and there is at most a limited number of phases, on the order of the error pattern weight, that are affected by the application of the first transposition; hence, the component of the cost function for these phases must be recomputed. The component of the cost function associated with the unaffected phases of the error patterns need not be recomputed, and may be obtained by scaling the optimal cost values available from the previous iteration. Indeed, when the interleaver length is large, most phases of the error patterns are unaffected by any given choice of the first transposition. This suggests the following alternate expression:
The total cost associated with the forward component of the cost function for the choice of the seventh transposition is then the sum of the entries in the two cost vectors, denoted as
(21)
(17)
where 1 denotes a column vector of ones of appropriate size. First, consider the raw complexity of this algorithm, which is effectively IGA in its original formulation. There are k phases of each error pattern to be examined at iteration k+1. Let the number of error patterns used for the formulation of the cost function be fixed (for large k, it is small compared to k). There are k+1 candidate transpositions, for each of which we must generate cost vectors. Generation of each element of a cost vector requires encoding k bits at the inputs of the upper
where the last term is the forward component of the cost function associated with the best transposition vector of length 6. This expression shows how the value of the cost function from a previous iteration is used in a new iteration. This modification reduces the complexity order when error events are sequentially encoded from O(N^4) to O(N^3), or, if table lookup is used (in practice, this is impractical), from O(N^3) to O(N^2). Having presented the crux of the argument through a simple example, the following is the formal description of the modified IGA.
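The bookkeeping behind this incremental reuse can be sketched as follows. The cost model and the class interface are illustrative stand-ins (in the real algorithm, recomputation means encoding the affected phases through the RSC, and the scaling factor comes from the definition of the elementary cost function (1)):

```python
class PhaseCostCache:
    """Reuse per-phase cost components across IGA iterations.

    Unaffected phases are rescaled lazily through a single running
    multiplier; only the affected phases (a small set) are recomputed.
    """

    def __init__(self, phases, recompute, scale):
        self.recompute = recompute      # expensive encoder-based cost (stand-in)
        self.scale = scale              # per-iteration scaling factor (stand-in)
        self.mult = 1.0                 # lazy global multiplier
        self.stored = {p: recompute(p) for p in phases}

    def next_iteration(self, affected):
        self.mult *= self.scale         # rescales every unaffected phase at once
        for p in affected:              # O(|affected|) work, not O(k)
            self.stored[p] = self.recompute(p) / self.mult

    def total(self):
        return self.mult * sum(self.stored.values())

# Check against direct computation: phases 0..4 with cost p + 1, scale 1/2,
# and phase 2 affected at the next iteration.
cache = PhaseCostCache(range(5), recompute=lambda p: p + 1.0, scale=0.5)
cache.next_iteration(affected=[2])
direct = 0.5 * (1 + 2 + 4 + 5) + 3      # scaled unaffected + recomputed phase 2
assert abs(cache.total() - direct) < 1e-12
```

The design point is that the unaffected phases are never touched individually: a single multiplier stands in for the per-phase scaling, which is what removes one factor of k from the per-iteration work.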
1) Set the iteration index k = k0, which corresponds to the initial size of the interleaver. Initialize the recursion by exhaustively solving

(22)
for a manageable value of k0.
2) For each candidate value of the first transposition and for all error patterns, find the sets of indexes of the affected phases of each error pattern (excluding the first phase) at the current iteration due to the application of the first transposition. Let the transposition vector of the inverse permutation associated with the permutation induced by the candidate vector be given (the first transposition of the inverse permutation is denoted separately). For each candidate value and for all error patterns, find the sets of indexes of the affected phases of each error pattern (excluding the first phase) at the current iteration due to the application of the first transposition of the inverse permutation. We note in passing that, for a fixed set of error patterns, the sets of indexes noted here can be obtained analytically.
3) For each candidate value of the first transposition, compute the components of the cost function for the forward cost for the first phase, and for all the affected phases of all the error patterns (23) (24)
Fig. 2. State transition diagram of the RSC encoder of Fig. 1. Transitions are labeled by the input bit and the corresponding parity output bit.
(27)
where the sums over the complementary sets of indexes correspond to the unaffected phases. The following simple manipulations bring to light the iterative nature of the computation of the global cost function and the dependence of the complexity on the cardinality of the affected sets:
Similarly, compute the components of the cost function for the reverse cost for the first phase, and for all the affected phases of all the error patterns (25) (26)
4) Compute the total cost associated with each candidate transposition as follows:
(28)
(29)
TABLE I
THE CONSTANT m(r, ·) AS A FUNCTION OF THE STARTING STATE AND THE REMAINDER r. THE TERMINAL STATE AFTER THE ENCODING OF THE ZERO RUN IS ALSO LISTED

5) Find the minimizing first transposition from
where m(r, ·) is a constant that depends on the remainder r and the starting state. Table I summarizes the information about this constant and the resulting terminal state after the run of zeros. As an example of the application of this concept to the determination of the output weight of an encoded sequence, suppose we have an input sequence described by its zero run-lengths. What is the weight of the encoded parity bits at the output? The following transitions summarize how we may obtain this weight:
where the minimization is performed over the candidate first transpositions

(30)

and form the new transposition vector.
6) Set k = k + 1 and iterate step 2 to grow the transposition vector to the desired size N.
Simulation results are provided later in the paper confirming the reduction in the complexity order of the IGA via the application of the proposed modification.
Having reduced the complexity by one order, we present another novel modification that further reduces the complexity by one order, changing the complexity of IGA to O(N^2). We noted earlier that the permuted sequences associated with a given transposition vector at iteration k can be sequentially encoded by the RSC. This requires roughly k operations. The key observation is that, in the formulation of the elementary cost functions, we do not need the actual sequences at the output of the RSC; we only need their weights. Since the weight w of a given permuted sequence at the input of the RSC is often much smaller than its length, the input sequence to the RSC is largely composed of zeros. From an information-theoretic point of view, such sequences have a very simple description. In particular, a given input sequence of weight w can be described via the run-lengths of zeros, without the need to even consider the run-lengths of ones. The run-length description is extremely useful, since it allows us to compute the weight of the corresponding sequence at the output of the RSC in a number of steps that is proportional to the input weight of the sequence, independently of its length.
Once again, to better understand this concept, consider the turbo code of Fig. 1. The constituent RSC of this turbo code has the state transition diagram shown in Fig. 2. The states of the RSC are identified by the contents of the memory cells in Fig. 1, whereby the most significant bit is taken to be associated with the rightmost cell. Since the RSC is systematic, we have only shown, for each transition, the input bit and the corresponding parity bit.
In order to determine the weight of the output sequence associated with a given input sequence described using its zero run-length information, we need to know the parity contribution of a run of r zeros from a given starting state, and the resulting terminal state. Clearly, if the starting state is the all-zero state, the parity contribution is zero regardless of r, and the terminal state is the all-zero state as well. Hence, consider the other starting states. The formula for the parity weight, which can be directly obtained from the observation of the state transition diagram, is (31)
(32)
Each transition is marked by a starting state and a terminal state, which becomes the starting state of the next transition below it. The transition itself is labeled by the input bit and the output weight of the transition. When there are zero runs, (31) in conjunction with Table I is used to calculate the parity weight. Note that the end state associated with this sequence is not the all-zero state. However, at the end of the data block the trellis is terminated, and the weight of the output during termination can be easily obtained from the knowledge of the end state. It is evident from this simple example that computation of the weight of the output sequence has a complexity that is on the order of the input weight, independent of the sequence length.
Next is the question of how we obtain a description of the permuted sequence in terms of its zero run-lengths. We need to demonstrate that this operation, too, can be performed with a complexity on the order of the sequence weight, independently of its length. To this end, suppose we are at iteration k+1 and are examining a candidate first transposition. Consider an affected phase of an error pattern of weight w. Since we are looking at an affected phase, by definition the first element ejected by the FSP is a one. Hence, one bit is mapped to position one at the output. There remain w-1 ones whose permuted positions we need to know. Note that, after the application of the first transposition, the previous transposition vector is applied to the remainder of the sequence. If, at the end of iteration k, the transposition vector is converted to the permutation vector, then, knowing the locations of the w-1 ones at the input, we can simply read off their permuted locations. Upon sorting the permuted positions of the ones at the output, we easily obtain the description of the permuted sequence in terms of the run-lengths of zeros, as
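The run-length shortcut can be sketched for a concrete code. We assume here a memory-2 RSC with feedback polynomial 1+D+D^2 and parity polynomial 1+D^2 (a common choice for figures like Fig. 1; the paper's exact generators may differ), for which the zero-input state sequence from any nonzero state is periodic with period 3, playing the role of Table I:

```python
def rsc_step(state, bit):
    """One trellis transition of the assumed memory-2 RSC (1+D+D^2, 1+D^2)."""
    d1, d2 = state
    fb = bit ^ d1 ^ d2              # feedback taps 1 + D + D^2
    return (fb, d1), fb ^ d2        # parity taps 1 + D^2

def zero_run(state, r):
    """Parity weight and end state after r zero inputs, in O(1) per run."""
    if state == (0, 0) or r == 0:
        return 0, state
    states, pars, s = [], [], state
    for _ in range(3):              # one full period of the zero-input cycle
        states.append(s)
        s, p = rsc_step(s, 0)
        pars.append(p)
    full, rem = divmod(r, 3)
    return full * sum(pars) + sum(pars[:rem]), states[rem]

def sparse_parity_weight(ones, length):
    """Parity weight of a length-`length` input given sorted one-positions.

    Work is proportional to the number of ones, not to `length`.
    """
    state, w, prev = (0, 0), 0, -1
    for pos in ones:
        dw, state = zero_run(state, pos - prev - 1)
        state, p = rsc_step(state, 1)
        w += dw + p
        prev = pos
    dw, _ = zero_run(state, length - prev - 1)
    return w + dw

# Cross-check against plain bit-by-bit encoding.
def dense_parity_weight(bits):
    state, w = (0, 0), 0
    for b in bits:
        state, p = rsc_step(state, b)
        w += p
    return w

bits = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
ones = [i for i, b in enumerate(bits) if b]
assert sparse_parity_weight(ones, len(bits)) == dense_parity_weight(bits)
```

For a different constituent code, only `rsc_step` and the zero-input period (here 3) change; the sparse traversal itself is code-independent.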
in the example above. Once again, given the permutation vector, all the operations involved have a complexity on the order of the sequence weight, independent of its length. Conversion from the transposition vector to the permutation vector, which has a complexity on the order of the sequence length, is performed once per iteration; hence, overall, this operation leads to a quadratic complexity in the final interleaver length.
Two additional minor modifications that can be used to speed up the algorithm are: 1) using the description of a trellis section in matrix form for encoding, rather than literally sequentially encoding bits based on the RSC implementation model; this trivial modification can significantly enhance the processing speed, since it reduces the encoding process to table lookup (i.e., memory access operations); and 2) embedding a stopping criterion in the computation of the sequence weights when they become too large.
As an example of the application of the modifications and the resulting impact on complexity, consider the interleaver design problem for a turbo code with the following specifications: 1)
16-state ator
RSC
constituent
codes
with
gener-
; 46 single error patterns (i.e., sequences that cause a divergence and reemergence in the trellis of the RSC code once) used for the formulation of the global cost ). function (i.e., Fig. 3 depicts the actual CPU time required to design the interleaver using: a) the original IGA algorithm with sequential encoding of the permuted error patterns whereby the fourth power dependence of required CPU time on the interleaver length is evident; b) the IGA algorithm using the first modification whereby the third power dependence of the required CPU time on the interleaver length is evident; and c) the IGA algorithm using both modifications changing the complexity order and CPU execution time to have a quadratic dependence on the interleaver length. In cases (b) and (c), we have used the minor modifications for algorithm speed-up referenced earlier. To confirm that the algorithms practically produce identical results, the left-most curve in Fig. 5 depicts the global cost as a function of the interleaver length whereby the curves of the cost functions for all three cases coincide. We have additionally examined the resulting transposition vectors in all three cases and have confirmed that they too are identical. For the turbo code specified earlier, we have designed an interleaver of length 1280 using the IGA with both modifications and employing error pattern feedback [19]. Table II provides and in the a summary of the results. The parameters table denote the starting and final interleaver lengths during a given feedback iteration, the number of added error patterns used during the design and the total number of error patterns are also listed in the table. The last column provides information about the actual values of the free distance, its multiplicity and input information weight for the designed turbo code. 
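The first minor modification above reduces encoding to memory accesses. The idea can be sketched as follows, using a hypothetical 4-state RSC with generator $(1+D^2)/(1+D+D^2)$ rather than the paper's 16-state code; the function names and table layout are illustrative assumptions, not the paper's implementation.

```python
def build_tables():
    """Next-state and parity lookup tables for a small example RSC with
    generator (1 + D^2)/(1 + D + D^2); the state is (a[k-1], a[k-2])."""
    nxt, par = {}, {}
    for s1 in (0, 1):
        for s2 in (0, 1):
            for u in (0, 1):
                a = u ^ s1 ^ s2          # feedback taps: 1 + D + D^2
                v = a ^ s2               # forward taps:  1 + D^2
                nxt[(s1, s2), u] = (a, s1)
                par[(s1, s2), u] = v
    return nxt, par

def parity_weight(bits, nxt, par):
    """Encode by table lookup only and return the parity Hamming weight."""
    state, w = (0, 0), 0
    for u in bits:
        w += par[state, u]
        state = nxt[state, u]
    return w
```

Once the tables are built, no shift-register simulation is performed per bit; each trellis step is a pair of dictionary (or array) reads, which is what enhances the processing speed.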
To the best of our knowledge, this turbo code has the highest minimum distance, with the lowest multiplicity and the lowest Hamming weight of the input associated with the minimum distance, ever reported in the literature for the specified interleaver length and constituent code. As a comparison, for the same turbo code configuration, in [23] the authors obtained the corresponding statistics over an ensemble of random interleavers,
Fig. 3. CPU time required for interleaver growth versus interleaver length for three cases: the original IGA (left-most curve); the IGA with the first modification (second curve from the left); and the IGA with both modifications (right-most curve).
where the reported quantities are the best free distance and cumulative input weight over 10 000 random interleavers, together with the mean and variance of the free distance over the test ensemble. In the same paper, the authors reported on the free distance of the CCSDS 16-state turbo code with interleaver length equal to 1784. Note that we achieve an equal free distance with an interleaver length of 1280, which corresponds to a lower end-to-end delay.

As a second example, consider the interleaver design for the UMTS standard. We have designed an interleaver of length 1280 using IGA with both modifications and with error pattern feedback. The interleaver is implicitly prunable and, given its iterative construction, exhibits excellent distance spectra at various block lengths. In particular, in Table III we report the following: 1) the free distance of our pruned interleavers at the block lengths of interest for the UMTS standard (second column); 2) to the best of our knowledge, the best free distance and the associated input weight term reported in the literature [24] for the eight-state turbo code at the noted block length (third column); 3) the mean value and variance of the free distance obtained from the examination of 10 000 randomly chosen interleavers, as reported in [24] (fourth column); and 4) the minimum distance and input Hamming weight of the turbo code for the UMTS standard, and the minimum distance and input Hamming weight of the UMTS turbo code employing the best spread interleaver found over an ensemble of 1000 spread interleavers, as reported in [23] (fifth column).

IV. DELAY CONSTRAINED AND TRANSPOSITION VALUE SET CARDINALITY LIMITED DESIGNS

In [5] and [20], the authors presented a simple method of limiting the interleaving delay during the design process. It is shown in [1] that the interleaver delay in its FSP realization is related to the maximum element of the transposition vector of the permutation it implements. In particular, for a transposition vector
TABLE II SUMMARY OF THE RESULTS OF INTERLEAVER DESIGN EMPLOYING ERROR PATTERN FEEDBACK FOR THE TURBO CODE EMPLOYING 16-STATE IDENTICAL RSCS WITH GENERATOR G(D) = [1, (1 + D + D + D + D)/(1 + D + D)]
TABLE III PERFORMANCE COMPARISON OF OUR IMPLICITLY PRUNABLE UMTS INTERLEAVER DESIGNED USING THE MODIFIED IGA WITH ERROR PATTERN FEEDBACK WITH RESULTS REPORTED IN THE LITERATURE. THE UMTS TURBO CODE EMPLOYS 8-STATE CONSTITUENT RSC CODES WITH GENERATOR G(D) = [1, (1 + D + D^3)/(1 + D^2 + D^3)] AND A VARIABLE BLOCK LENGTH INTERLEAVER
of size $N$, the interleaving delay is determined by the maximum element of the vector. This observation suggests that a simple method of limiting the delay of an interleaver, independently of its length, is that of putting an upper limit on the maximum value of the transpositions that may be examined during the interleaver growth process. In doing so, not only is the delay of the resulting interleavers limited; the complexity of the growth algorithm itself is obviously affected as well. Here is a summary of the delay constrained design presented in [5] and [20].

1) Grow the transposition vector from some initial length to some desired delay-limited threshold $N_d$ using IGA.
2) From iteration $N_d$ to the final length $N_{\max}$, the upper limit on the value of the transpositions to be examined at each iteration is set to a fixed cap $t_{\max}$. Experimental results presented in [5] and [20] suggest that good results, in terms of the performance of the resulting interleavers, are obtained with a suitable choice of this cap.

The complexity of the resulting algorithm using IGA in its original formulation is
$O\left(t_{\max} N_{\max}^{3}\right) \qquad (33)$

Hence, for a fixed cap $t_{\max}$, the algorithm complexity becomes cubic in $N_{\max}$. If, on the other hand, we use the modified IGA presented in this paper, the algorithm complexity becomes

$O\left(t_{\max}\, w_{\max}\, N_{\max}\right) \qquad (34)$
Fig. 4. Cost function saturation effect observed in delay constrained designs. The cost function asymptotes match very well the contribution of the internal minimum distance terms, which saturate. The straight lines depict this asymptote, estimated from the contribution of the internal minimum distance term.
where $w_{\max}$ is the maximum Hamming weight of the error patterns used for the formulation of the cost function. Indeed, imposition of a delay constraint renders the resulting algorithm complexity ultimately linear in the final interleaver size.

One major drawback of the delay constrained designs, however, is the global cost function saturation effect that is observed very shortly after the delay threshold is reached. An example of this effect, which we have observed in numerous simulations, is depicted in Fig. 4. This example is associated with the turbo code with the specifications given at the end of the previous section. The cost function asymptotes can be very accurately estimated based on the internal minimum distance [1] of the code, which saturates.
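The delay constrained candidate selection of [5] and [20] can be sketched as follows. This is a schematic only: the function name and the 1-based indexing of transposition values are assumptions, and the IGA cost evaluation that ranks the candidates is omitted.

```python
def candidate_transpositions(n, delay_threshold, t_max):
    """Transposition values examined at iteration n (schematic).

    Below the delay threshold the growth is unconstrained, as in plain
    IGA; beyond it, only values up to the fixed cap t_max are examined,
    which bounds the delay of the FSP realization.
    """
    if n <= delay_threshold:
        return range(1, n + 1)      # all transpositions: unconstrained growth
    return range(1, t_max + 1)      # capped candidate set: bounded delay
```

Since the per-iteration candidate count stops growing with `n` once the threshold is passed, the overall work grows only linearly with the remaining iterations, at the price of the saturation effect discussed above.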
Fig. 5. Plots of the global cost as a function of the interleaver length for transposition value set cardinality constrained designs. The left-most curve is for the unconstrained IGA in original and modified formulations (the cost function for original and modified versions of IGA coincide). There is a gradual increase in the value of the cost function as the cardinality is reduced. This gradual increase is due to the degradation of the internal distance spectrum.
In effect, after the saturation, the interleaver performance as measured by its internal distance spectrum is no longer improving. This suggests that the delay constrained designs could have an inferior performance, in comparison to the original formulation, at lengths larger than the delay threshold. The main problem with the delay constrained designs leading to the saturation effect is that, since the range of action of the transpositions is limited to the beginning of the FSP, error pattern traps are created. By error pattern traps we mean a scenario whereby error patterns leading to low-distance codewords at a given iteration simply remain unaffected by future transpositions, which never "hit" the buffer locations where the error patterns may exist.

The observations made in connection with the limiting behavior of the delay constrained designs suggest that it is important that, at iteration $n$, the transposition be able to hit locations throughout the FSP buffer. Given this fact, one method of implementing the modified IGA in order to render the algorithm complexity ultimately linear in the maximum length $N_{\max}$ is that of random selection of the transpositions to be examined at iteration $n$, with a uniform distribution throughout the range of permitted transpositions at that iteration. Indeed, in our experimental results this technique has been quite successful, and no saturation effect of the cost function or the internal minimum distance has been observed.

An example of the application of this technique is depicted in Figs. 5 and 6. The turbo code specification for the designs is the same as that reported above (i.e., 16-state RSCs and 46 error patterns). Fig. 5 depicts the value of the global cost function as a function of the interleaver length for various values of the transposition value set cardinality limit. It is evident that there is performance degradation, in comparison to the original IGA formulation, as we use various cardinality limits, with a desirable gradual performance degradation as the cardinality limit is decreased. Fig. 6 depicts the plots of the actual CPU time required for the interleaver growth as a function of the interleaver length and the cardinality limit.
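The random cardinality constrained selection can be sketched as follows. The function name and the 1-based indexing convention for transposition values are illustrative assumptions; the point is only that the candidates are drawn uniformly over the full range rather than from a capped prefix.

```python
import random

def random_candidates(n, T, rng=random):
    """Cardinality constrained candidate set at iteration n (schematic).

    Draws min(T, n) transposition values uniformly, without replacement,
    from the full range 1..n, so that every FSP buffer location can
    still be "hit" and no error pattern traps are created.
    """
    return sorted(rng.sample(range(1, n + 1), min(T, n)))
```

Because only a fixed number `T` of candidates is evaluated per iteration, the per-iteration cost no longer grows with `n`, which is what yields the ultimately linear behavior in the interleaver length.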
Fig. 6. Complexity as measured by the CPU time versus interleaver length for the IGA with both modifications presented in the paper (left-most curve) and transposition value set cardinality limited designs. For the latter, as the cardinality is reduced, the execution time reduces.
It is evident from the figures that we may obtain a performance versus execution-speed tradeoff for the design of interleavers for turbo codes. Note that, regardless of the cardinality limit, the modified IGA with transposition value set cardinality limits ultimately exhibits a linear behavior in the interleaver length.

V. CONCLUSION

In this paper, we have presented two novel modifications of an original algorithm developed in [1] for the systematic design of implicitly prunable interleavers tailored to the constituent RSCs of the turbo code. The modifications change the complexity order of the interleaver design from $O(N_{\max}^{4})$ to $O(N_{\max}^{2})$, where $N_{\max}$ is the target interleaver length. These modifications, in addition to several minor modifications presented in the paper, significantly extend the range of applicability of IGA to larger interleavers. It can be reasonably assumed that the modified IGA can be used to design interleavers in the range from several hundred to several thousand within very reasonable CPU times. Finally, we have presented a novel yet very effective modification, inspired by our observations of the behavior of the delay constrained interleaver designs presented in [5] and [20], that ultimately changes the complexity of the interleaver design algorithm to linear in the interleaver length.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the anonymous reviewers for many useful suggestions that have improved the quality of our paper.

REFERENCES

[1] F. Daneshgaran and M. Mondin, "Design of interleavers for turbo codes: Iterative interleaver growth algorithms of polynomial complexity," IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 1845–1859, Sep. 1999.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. IEEE Int. Conf. Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
[3] S. Benedetto and G. Montorsi, "Unveiling turbo codes: Some results on parallel concatenated coding schemes," IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 409–428, Mar. 1996.
[4] D. Divsalar and F. Pollara, "Turbo codes for PCS applications," in Proc. IEEE Int. Conf. Communications, Seattle, WA, May 1995, pp. 54–59.
[5] M. Campanella, G. Garbo, and S. Mangione, "Simple method for limiting delay of optimized interleavers for turbo codes," IEE Electron. Lett., vol. 36, no. 14, pp. 1216–1217, Jul. 2000.
[6] J. Hokfelt, O. Edfors, and T. Maseng, "A turbo code interleaver design criterion based on the performance of iterative decoding," IEEE Commun. Lett., vol. 5, no. 2, pp. 52–54, Feb. 2001.
[7] S. Dolinar and D. Divsalar, "Weight distribution of turbo codes using random and nonrandom permutations," Jet Propulsion Lab., Pasadena, CA, JPL TDA Progress Rep. 42-122, 1995.
[8] S. Crozier, "New high-spread high-distance interleavers for turbo codes," in Proc. 20th Biennial Symp. Communications, Kingston, ON, Canada, May 2000, pp. 3–7.
[9] F. Daneshgaran and M. Mondin, "Optimized turbo codes for delay constrained applications," IEEE Trans. Inf. Theory, vol. 48, no. 1, pp. 293–305, Jan. 2002.
[10] F. Said, A. H. Aghvami, and W. G. Chambers, "Improving random interleaver for turbo codes," IEE Electron. Lett., vol. 35, no. 25, pp. 2194–2195, Dec. 1999.
[11] C. Fragouli and R. D. Wesel, "Semi-random interleaver design criteria," in Proc. IEEE Global Communications Conf., vol. 5, Dec. 1999, pp. 2352–2356.
[12] H. Herzberg, "Multilevel turbo coding with a short latency," in Proc. IEEE Int. Symp. Information Theory, Ulm, Germany, Jun. 1997, p. 112.
[13] J. Hokfelt and T. Maseng, "Methodical interleaver design for turbo codes," in Proc. Int. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 1997, pp. 212–215.
[14] P. Robertson, "Illuminating the structure of code and decoder of parallel concatenated recursive systematic (turbo) codes," in Proc.
IEEE Global Communications Conf., Dec. 1994, pp. 1298–1303.
[15] O. Takeshita and D. Costello, Jr., "New classes of algebraic interleavers for turbo-codes," in Proc. Int. Symp. Information Theory, Cambridge, MA, Aug. 1998, p. 419.
[16] K. Andrews, C. Heegard, and D. Kozen, "Interleaver design methods for turbo codes," in Proc. Int. Symp. Information Theory, Cambridge, MA, Aug. 1998, p. 420.
[17] J. Yuan, B. Vucetic, and W. Feng, "Combined turbo codes and interleaver design," IEEE Trans. Commun., vol. 47, no. 4, pp. 484–487, Apr. 1999.
[18] H. Ogiwara and Y.-J. Wu, "Code matched interleaver for parallel concatenated trellis coded modulation," in Proc. IEEE Int. Conf. Communications, New York, Apr. 28–May 2, 2002.
[19] F. Daneshgaran, M. Laddomada, and M. Mondin, "Interleaver design for serially concatenated convolutional codes: Theory and application," IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1177–1188, Jun. 2004.
[20] G. Garbo and S. Mangione, "Some results on delay constrained interleavers for turbo codes," in Proc. SoftCOM, Workshop on Channel Coding Techniques, Split–Dubrovnik, Croatia, and Ancona–Bari, Italy, Oct. 2001, pp. 17–24.
[21] M. Hall, Jr., The Theory of Groups, 2nd ed. New York: Chelsea, 1976.
[22] P. A. Meyer, Martingales and Stochastic Integrals I. Berlin, Germany: Springer-Verlag, 1972.
[23] R. Garello, P. Pierleoni, and S. Benedetto, "Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications," IEEE J. Sel. Areas Commun., vol. 19, no. 5, pp. 800–812, May 2001.
[24] R. Garello, F. Chiaraluce, P. Pierleoni, M. Scaloni, and S. Benedetto, “On error floor and free distance of turbo codes,” in Proc. Int. Conf. Communications, vol. 1, 2001, pp. 45–49.
Fred Daneshgaran (S’84–M’84) received the B.S. degree in electrical and mechanical engineering from California State University, Los Angeles (CSLA) in 1984, the M.S. degree in electrical engineering from CSLA in 1985, and the Ph.D. degree in electrical engineering from University of California, Los Angeles (UCLA), in 1992. From 1985 to 1987, he was an Instructor with the Department of Electrical and Computer Engineering (ECE) at CSLA. From 1987 to 1993, he was an Assistant Professor, from 1993 to 1996, he was an Associate Professor, and since 1997, he has been a full Professor with the ECE Department at CSLA. Since 1989, he has been the Chairman of the Communications Group of the ECE Department at CSLA. Additionally, from 1999 to 2001, he acted as the Chief Scientist for TechnoConcepts, Inc., where he directed the development of a prototype software-defined radio system, managed the hardware and software teams, and orchestrated the entire development process. In 2000, he co-founded EuroConcepts s.r.l., an R&D company specializing in the design of advanced communication links and software radio, where he is currently acting as the Chief Executive Officer and the Chief Technology Officer. In 1996, he founded Quantum Bit Communications, LLC, a consulting firm specializing in wireless communications. Since 1992, he has been a Research Consultant to the TLC group of the EE Department of the Politecnico di Torino, Torino, Italy, where he consulted on a variety of projects for both national and European-funded contracts, and conducted joint research on wavelets, coded modulation, channel coding, etc., with members of the Signal Analysis and Simulation (SAS) group. Dr. Daneshgaran is currently serving as the Associate Editor of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS in the areas of modulation and coding, multirate and multicarrier communications, broadband wireless communications, and software radio.
Massimiliano Laddomada (S'00–M'04) was born in 1973. He received the degree in electronics engineering in 1999 and the Ph.D. degree in communications engineering in 2003, both from the Politecnico di Torino, Torino, Italy. He is currently an Assistant Professor with the Politecnico di Torino. From June 2000 to March 2001, he was a Visiting Researcher at California State University, Los Angeles, and a Consultant Engineer with Technoconcepts, Inc., Los Angeles, CA, a start-up company specializing in software radio. His research is mainly in wireless communications, especially modulation and coding, including turbo codes and, more recently, network coding. Dr. Laddomada is currently serving as a member of the Editorial Board of IEEE Communications Surveys and Tutorials. He was awarded a five-year open-ended fellowship by the Ente per il Diritto allo Studio (E.D.S.U.) in recognition of his university career as an electronics engineer. In 2003, he was awarded the Premio Zucca per l'Innovazione nell'ICT from the Unione Industriale of Turin, Italy.