Multiple-Valued Duplex Asynchronous Data Transfer Scheme for Interleaving in LDPC Decoders
Naoya Onizawa, Akira Mochizuki, Takahiro Hanyu, and Vincent C. Gaudet
Research Institute of Electrical Communication, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan. E-mail: onizawa, pico, hanyu@ngc.riec.tohoku.ac.jp
Department of Electrical and Computer Engineering, 2nd Floor ECERF Building, University of Alberta, Edmonton, AB, T6G 2V4, Canada. E-mail: [email protected]
Abstract
A novel duplex asynchronous data-transfer scheme based on multiple-valued encoding is proposed for interleaving in Low-Density Parity-Check (LDPC) decoders, where high-throughput interleavers between variable and check nodes without clock-distribution problems are highly advantageous. Since control signals and data from mutual nodes are multiplexed using a multi-level dual-rail codeword, the number of communication steps can be greatly reduced, which results in high-speed communication without any additional wires. The hardware is simply implemented by utilizing a multiple-valued current-mode circuit because all the information can be superposed on the same line. The advantages of the proposed asynchronous data-transfer scheme are discussed in comparison with corresponding synchronous and conventional asynchronous schemes.
1 Introduction
Due to their capacity-approaching performance, Turbo codes [1] and Low-Density Parity-Check (LDPC) codes [2] have been incorporated into or proposed for numerous recent data communications standards such as DVB-S2, IEEE 802.16, and 10GBASE-T. Decoding of capacity-approaching codes is typically performed by iteratively refining a posteriori probabilities on loopy graphs known as factor graphs [3], which describe the structure of a code. Iteratively decoding LDPC codes typically requires 30 to 100 iterations, in which each graphical vertex performs a number of computations, and in which each graphical edge transmits a message from one vertex to the other. The set of edges is often referred to as the permutation or interleaver.
Codes used in modern standards typically require about 1000 to 50000 vertices and 3000 to one million edges. Although the decoding algorithm can easily be performed in parallel by every vertex, fully parallel hardware implementations tend to be routing dominated: throughput is limited by the available die area and by the ability of the computational kernels to transmit data to each other over the interleaver. Many iterative decoding algorithms, such as the LDPC decoding algorithm, have been successfully implemented in analog VLSI chips [4]-[7]; these implementations encode probabilities as currents and require fewer wires per graphical edge. However, only small codes have been implemented because of the limited operation speed. On the other hand, digital approaches have been reported as a fully parallel implementation of a length-1024 code [8] and as serial implementations [9], which suffer from high routing congestion and lower operating speed, respectively. In the fully parallel implementation, clock speed and hence throughput are limited by the interleaver portion; higher speeds are prevented by clock-distribution and clock-skew issues. One approach to solving this problem is to use an asynchronous system, which does not rely on a clock. This paper presents a high-speed duplex asynchronous communication scheme and its circuit implementation for high-speed data transfer between variable nodes and check nodes. Mutual control signals are implicit in the proposed encoding, which makes it possible to reduce the communication cycle time. Since both control signals and mutual data are merged into a single multi-level dual-rail codeword, the duplex communication can be realized without any additional wires. Although all the information is multiplexed in a multi-level dual-rail codeword, valid data-arrival states can be easily detected by observing the difference of the dual-rail codewords. A multiple-valued current-mode cir-
Proceedings of the 35th International Symposium on Multiple-Valued Logic (ISMVL’05) 0195-623X/05 $ 20.00 IEEE
onto each of the node's 3 edges. The message sent on a specific edge is equal to a soft XOR function of the incoming messages on the other two edges. In the soft XOR, the parity check is performed on the probabilities of the incoming messages. The soft XOR function is described by Equation (2) as
[Figure 1 labels: channel output; messages P(X=1), P(Y=1), P(Z=1) to check nodes and to variable nodes. P(A=1): probability that A takes the logic value "1".]
Figure 1. Bipartite graph: (a) Overall structure, (b) Variable node, and (c) Check node.
Fully parallel LDPC decoder architectures dedicate a computational element to each variable and check node, and dedicate wires to the specific connections in the factor graph, so that a decoding iteration can be performed within two clock cycles. For large codes, the routing congestion and the length of the longest wires become the limiting factors in decoder performance. To address these limitations, we propose a full-duplex asynchronous data-transfer methodology based on multiple-valued encoding and current-mode logic circuitry [11]-[13]. The multiple-valued method is used for the following reasons:
2 Low-Density Parity-Check Codes
2.1 The message passing algorithm for LDPC decoding
(a) The method is full duplex, allowing two codewords to be decoded simultaneously, one by the variable nodes and one by the check nodes. Messages can then be passed in both directions using the same set of wires, so decoding throughput is doubled.
A thorough overview of the message passing algorithm for LDPC codes is provided in [10]. Figure 1(a) shows the factor graph for an LDPC code, which is a bipartite graph containing two types of nodes: variable nodes and check nodes. Figure 1(b) shows a 2-edge variable node, which is also connected to its channel input/output. In a single iteration, a message is sent onto each of the node's 2 edges. The message sent on a specific edge is equal to the normalized product of the message incoming on the other edge and the message from the channel, which is described in Equation (1) as
2.2 Architectures for LDPC decoders
cuit technique is utilized for an efficient circuit implementation. Since current-mode linear summation is realized directly by wiring, without any active devices, the interface for the proposed asynchronous data-transfer scheme becomes simple. As a result, the proposed asynchronous scheme solves the clock problems while maintaining a high data-transfer speed. Moreover, throughput is improved because the two kinds of passing messages can be computed simultaneously by using the duplex communication with just two wires for each bit of each edge.
$$P(Z=1) = P(X=1)\,\bigl(1 - P(Y=1)\bigr) + \bigl(1 - P(X=1)\bigr)\,P(Y=1) \qquad (2)$$
Decoding is performed iteratively. First, variable node outputs are initialized to probability messages conditioned on the channel outputs, and these messages are sent to the check nodes. The check nodes then calculate their outputs from their inputs as in Equation (2). These outputs are sent to the variable nodes, which update their outputs based on their inputs and the channel outputs, as in Equation (1). This procedure is repeated for a fixed number of iterations, or until a satisfactory solution is obtained.
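The iteration schedule described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming the standard probability-domain forms of the normalized product (variable nodes) and the soft XOR (check nodes); the function names, the parity-check matrix `H`, and the early-stopping test are assumptions made for the example and are not taken from the paper.

```python
from functools import reduce


def variable_update(probs):
    """Normalized product of incoming probabilities P(bit = 1)."""
    p1 = reduce(lambda a, b: a * b, probs, 1.0)
    p0 = reduce(lambda a, b: a * b, [1.0 - p for p in probs], 1.0)
    return p1 / (p1 + p0)  # assumes the inputs are not contradictory (p1 + p0 > 0)


def soft_xor(p_a, p_b):
    """Probability that the XOR of two independent bits equals 1."""
    return p_a * (1.0 - p_b) + (1.0 - p_a) * p_b


def check_update(probs):
    """Soft XOR folded over all incoming probabilities."""
    return reduce(soft_xor, probs, 0.0)


def decode(H, channel, iterations=30):
    """Message passing on the bipartite graph defined by parity-check matrix H."""
    edges = [(c, v) for c, row in enumerate(H) for v, h in enumerate(row) if h]
    v2c = {e: channel[e[1]] for e in edges}      # initialise from the channel
    hard = [1 if p > 0.5 else 0 for p in channel]
    for _ in range(iterations):
        # Check nodes: soft XOR of the messages arriving on the other edges.
        c2v = {}
        for c, v in edges:
            others = [v2c[(cc, u)] for cc, u in edges if cc == c and u != v]
            c2v[(c, v)] = check_update(others)
        # Variable nodes: normalized product of the other edges and the channel.
        for c, v in edges:
            others = [c2v[(d, u)] for d, u in edges if u == v and d != c]
            v2c[(c, v)] = variable_update(others + [channel[v]])
        # Tentative decision using all incoming messages plus the channel.
        hard = []
        for v in range(len(channel)):
            incoming = [c2v[e] for e in edges if e[1] == v]
            p = variable_update(incoming + [channel[v]])
            hard.append(1 if p > 0.5 else 0)
        if all(sum(h * b for h, b in zip(row, hard)) % 2 == 0 for row in H):
            break  # every parity check is satisfied
    return hard
```

For example, with the length-3 repetition code `H = [[1, 1, 0], [0, 1, 1]]`, calling `decode(H, [0.1, 0.6, 0.2])` pulls the unreliable middle bit towards the value supported by its two neighbours.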
(b) The asynchronicity allows a certain tolerance to delays over wires of varying lengths. Also, the requirements on clock distribution and clock skew can be significantly loosened.
(c) The high speed of the multiple-valued current-mode logic circuit compared to other circuits would allow novel decoding algorithms, such as the stochastic decoding algorithm [14], to be fully exploited, while also reducing the number of wires in the interconnection network.
$$P(Z=1) = \frac{P(X=1)\,P(Y=1)}{P(X=1)\,P(Y=1) + \bigl(1 - P(X=1)\bigr)\bigl(1 - P(Y=1)\bigr)} \qquad (1)$$
where P(A=1) is the probability that A takes the logic value "1". Figure 1(c) shows a 3-edge check node. In a single iteration, a new message is computed and sent
The following section describes the multiple-valued circuit technique and its application to LDPC decoding.
Logic values and dual-rail codewords (A from the primary module, B from the secondary module, C = A + B):

Color   Primary   Secondary   A       B       C
ODD     "0"       "0"         (0,1)   (0,1)   (0,2)
ODD     "0"       "1"         (0,1)   (1,2)   (1,3)
ODD     "1"       "0"         (1,2)   (0,1)   (1,3)
ODD     "1"       "1"         (1,2)   (1,2)   (2,4)
EVEN    "0"       "0"         (1,0)   (1,0)   (2,0)
EVEN    "0"       "1"         (1,0)   (2,1)   (3,1)
EVEN    "1"       "0"         (2,1)   (1,0)   (3,1)
EVEN    "1"       "1"         (2,1)   (2,1)   (4,2)

Figure 3. Duplex asynchronous data encoding. Valid states: x - x' = -2 (ODD) and x - x' = 2 (EVEN), detected with the thresholds -1.5 and 1.5.
tor $(x, x')$. In the duplex asynchronous data encoding, the codeword for "data" $(x_D, x_D')$ is defined as
Figure 2. Model of duplex asynchronous data transfer protocol and its modes: (a) Overall structure, (b) Request mode, and (c) Acknowledge mode.
(0,0): logic value "0", (1,1): logic value "1".
3 Full-duplex asynchronous data transfer using multiple-valued encoding and circuit technique
The codeword for "color information" $(x_C, x_C')$ is defined as (0,1): ODD, (1,0): EVEN.
3.1 Principle of duplex asynchronous data transfer
The bundled dual-rail codeword A $= (x_P, x_P')$, with $x_P, x_P' \in \{0, 1, 2\}$, sent from the primary module is the component-wise sum of the dual-rail codewords for data and color information defined above. For example, if the logic value is "1" and the color is EVEN, the bundled dual-rail codeword becomes (2,1), obtained by summing (1,1) and (1,0) in every component. The bundled dual-rail codeword B $= (x_S, x_S')$, with $x_S, x_S' \in \{0, 1, 2\}$, sent from the secondary module is formed in the same way. The dual-rail codeword C $= (x, x')$, with $x, x' \in \{0, 1, 2, 3, 4\}$, is defined as the sum of $(x_P, x_P')$ and $(x_S, x_S')$ in every component as
Asynchronous data transfer requires control signals because timing is managed locally. We define color information as the control signals [15]. To explain the duplex asynchronous data transfer, we use a model in which a color detector is virtually inserted between the primary and secondary modules, as shown in Figure 2(a). The duplex asynchronous data transfer protocol has two modes, a request mode and an acknowledge mode. In the request mode shown in Figure 2(b), the dual-rail codeword A, which bundles both data and color information from the primary module, is sent to the color detector, while the dual-rail codeword B, which bundles both data and color information from the secondary module, is also sent to the color detector. The acknowledge mode is shown in Figure 2(c). The color detector determines whether both colors are equal or not, and the result is sent back to the primary module and the secondary module as the dual-rail codeword C. After receiving the dual-rail codeword C, both the primary module and the secondary module set their next data and color information.
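The two modes can be illustrated with a small behavioural model. The sketch below is not the authors' circuit: it assumes that the color alternates between ODD and EVEN on successive transfers, that both modules idle in the EVEN phase with data "0", and that the primary module always updates slightly before the secondary one. All function names are hypothetical.

```python
# Codewords as defined in the text; the helper names are hypothetical.
DATA = {0: (0, 0), 1: (1, 1)}
ODD, EVEN = (0, 1), (1, 0)


def bundle(bit, color):
    """Bundle one data bit with color information (component-wise sum)."""
    return (DATA[bit][0] + color[0], DATA[bit][1] + color[1])


def color_detector(a, b):
    """Model of the color detector: report the common color, or None."""
    x, x_prime = a[0] + b[0], a[1] + b[1]   # codeword C = A + B
    if x - x_prime == -2:
        return "ODD"
    if x - x_prime == 2:
        return "EVEN"
    return None                              # colors differ: still in transition


def simulate(primary_bits, secondary_bits):
    """Return what the color detector reports after every module update."""
    a = b = bundle(0, EVEN)                  # assumed idle state of both modules
    color, log = ODD, []
    for p_bit, s_bit in zip(primary_bits, secondary_bits):
        a = bundle(p_bit, color)             # request mode: primary updates first
        log.append(color_detector(a, b))
        b = bundle(s_bit, color)             # secondary updates a little later
        log.append(color_detector(a, b))     # acknowledge mode: colors agree
        color = EVEN if color == ODD else ODD
    return log
```

In this model the detector reports `None` whenever only one side has moved to the new color, so neither module proceeds until both codewords of the current phase are present on the line.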
$$(x, x') = (x_P, x_P') + (x_S, x_S').$$ Figure 3 shows the relationship between the six signal states and the dual-rail codeword $(x, x')$. In this encoding, the valid state can be detected by calculating the difference $x - x'$. That is, $x - x'$ takes its minimum value, $-2$, if both colors are ODD, and its maximum value, $2$, if both colors are EVEN. Otherwise, $x - x'$ takes an intermediate value between $-2$ and $2$ during the transition. Since $x - x'$ increases or decreases monotonically during the transition, the valid values $-2$ and $2$ can be easily detected by using two thresholds, $-1.5$ and $1.5$.
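As a quick check of the encoding, the following sketch enumerates every combination of color and data bits, forms C = A + B, and applies the two thresholds to the difference. It is a minimal illustration under the codeword definitions given in the text; the function names are hypothetical, and real current levels are modelled simply as numbers.

```python
from itertools import product

DATA = {0: (0, 0), 1: (1, 1)}            # data codewords
COLOR = {"ODD": (0, 1), "EVEN": (1, 0)}  # color-information codewords


def add(u, v):
    """Component-wise sum of two dual-rail codewords."""
    return (u[0] + v[0], u[1] + v[1])


def bundle(bit, color):
    """Codeword sent by one module: data plus color information."""
    return add(DATA[bit], COLOR[color])


def classify(x, x_prime, threshold=1.5):
    """Threshold detection on the difference x - x'."""
    diff = x - x_prime
    if diff < -threshold:
        return "ODD"
    if diff > threshold:
        return "EVEN"
    return None  # intermediate value: transition still in progress


def valid_states():
    """Map (color, primary bit, secondary bit) to the summed codeword C."""
    return {
        (color, p, s): add(bundle(p, color), bundle(s, color))
        for color, p, s in product(COLOR, (0, 1), (0, 1))
    }
```

Enumerating `valid_states()` gives eight entries but only six distinct codewords, because the two mixed-data cases of each color coincide; this matches the six signal states mentioned in the text.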
3.2 Duplex asynchronous data encoding
Assume that the codeword corresponding to the multi-level dual-rail signals $x$ and $x'$ is represented by a code vec-
EC: Encoder, CD: Color detector, DC: Decoder.
Figure 6. Color detector.
Figure 7. Decoder.
Figure 5. Encoder.
3.3 Hardware implementation using multiple-valued current-mode logic circuitry
Figure 4 shows an interface for the duplex asynchronous data transfer. This interface consists of an encoder (EC), a color detector (CD), a decoder (DC), and a Muller C-element. The function of the encoder is to generate the dual-rail currents that correspond to the components of the codeword. The circuit diagram of the encoder is shown in Figure 5. In the encoder, a one-bit binary voltage is transformed into the current signals $(I_{x_D}, I_{x_D'})$ corresponding to the components of the codeword for data $(x_D, x_D')$. The current signals $(I_{x_C}, I_{x_C'})$ corresponding to the components of the codeword $(x_C, x_C')$ for color information are generated through the pass transistors controlled by a voltage signal. Then the current signals $(I_{x_P}, I_{x_P'})$ corresponding to the components of the codeword $(x_P, x_P')$ are produced by summing up the currents $(I_{x_D}, I_{x_D'})$ and $(I_{x_C}, I_{x_C'})$ in the primary module. Similarly, the current signals $(I_{x_S}, I_{x_S'})$ corresponding to the components of the codeword $(x_S, x_S')$ are produced in the same way. As a result, the codeword based on the duplex asynchronous encoding appears.
The function of the color detector is to detect the color information of the codeword: ODD or EVEN. The circuit diagram of the color detector is shown in Figure 6. The dual-rail codeword $(x, x')$ is recovered by making the input dual-rail codeword twice using nMOS current mirrors. The threshold operations on the difference $x - x'$ are described as
$$x - x' < -1.5 \quad \text{and} \quad x - x' > 1.5.$$
These equations can be rewritten as
$$x + 1.5 < x' \quad \text{and} \quad x > x' + 1.5.$$
The threshold operations can be performed by comparing $x$ with $x'$ directly, where the transition of $x$ and the transition of $x'$ have the differential relation. In Stage I, the
current $I_{x+1.5}$ is calculated by summing up the input current $I_x$ and the current source $I_{1.5}$, and is then converted into a voltage using an I-V converter. The other voltages, corresponding to $x'$, $x$ and $(x' + 1.5)$, are obtained in the same way. In Stage II, if the difference $(x - x')$ is less than $-1.5$, $(x + 1.5)$ is less than $x'$ and $x$ is also less than $(x' + 1.5)$. Hence, both resulting voltages of the threshold operations become low (0). On the other hand, if $(x - x')$ is greater than 1.5, $x$ is greater than $(x' + 1.5)$ and $(x + 1.5)$ is also greater than $x'$. As a result, both voltages become high (1). In Stage III, these two voltages are the inputs of the Muller C-element. The Muller C-element is a state-holding element: when both inputs are 0 the output is set to 0, and when both inputs are 1 the output is set to 1. For the other input combinations, the output does not change. The output voltage of the Muller C-element represents the result of detection. If the two input voltages are (0, 0), the output is set to low, which indicates that both colors are ODD. On the other hand, if they are (1, 1), the output is set to high, which indicates that both colors are EVEN. For the other combinations, the output holds its previous state.
Figure 7 shows the circuit diagram of the decoder. The summed current $I_{x+x'}$, which is generated by wiring, is converted into the voltage $V_{x+x'}$ using an I-V converter. The voltage $V_{x+x'}$ is then compared with one of two threshold voltages, $V_{th30}$ and $V_{th50}$, which is selected according to a one-bit binary voltage input from the binary CMOS. For example, if the input from the primary module is the logic value "1", corresponding to the data (1,1), and the input from the secondary module is the logic value "0", corresponding to the data (0,0), in the ODD phase, the dual-rail signal $(x, x')$ becomes (1,3), which is calculated by summing up (1,2) and (0,1) in every component. As a result, in the primary module, the logic value "0" is obtained by comparing the voltage $V_{x+x'}$ corresponding to the summed current with the threshold voltage $V_{th50}$. Similarly, in the secondary module, the logic value "1" is obtained by comparing the voltage $V_{x+x'}$ with the threshold voltage $V_{th30}$. To control data transfer between the proposed duplex asynchronous interface and the binary CMOS module, the TC, RC and READY signals are used.

Table 1. Approximate evaluation of data transfer methods for interleavers.

Architecture              Synchronous   Asynchronous   Multiple-valued duplex asynchronous
#wires                    2             6              2
Global clock              Required      No clock       No clock
Throughput                ------        1x             2x
Avoidance of clock-skew   +             +++            +++
Power saving              +             ++             +++
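The behaviour of the color detector and the decoder in Section 3.3 can be modelled in software as follows. This is a behavioural sketch only, not the current-mode circuit: the class and function names are hypothetical, currents are treated as plain numbers in unit-current steps, and the decoder thresholds of 3 and 5 are an assumption chosen so that the worked example with the codeword (1,3) is reproduced.

```python
class MullerC:
    """State-holding element: the output follows the inputs only when they agree."""

    def __init__(self, state=0):
        self.state = state

    def update(self, in1, in2):
        if in1 == in2:
            self.state = in1
        return self.state


def color_detector(c_element, x, x_prime):
    """Stages II and III: two comparisons feeding a Muller C-element.

    Returns 0 when both colors are ODD, 1 when both are EVEN, and the
    previous output while the codeword is still in transition.
    """
    out_a = 1 if x + 1.5 > x_prime else 0   # 0 only if x - x' < -1.5
    out_b = 1 if x > x_prime + 1.5 else 0   # 1 only if x - x' > 1.5
    return c_element.update(out_a, out_b)


def decode_other_bit(x, x_prime, own_bit):
    """Recover the bit sent by the other module from the summed level x + x'.

    In a valid state x + x' = 2 * (own bit + other bit) + 2, so the threshold
    is picked according to the module's own bit (assumed values 3 and 5).
    """
    threshold = 5 if own_bit else 3
    return 1 if x + x_prime > threshold else 0
```

With the codeword (1,3) from the example, `decode_other_bit(1, 3, own_bit=1)` yields 0 in the primary module and `decode_other_bit(1, 3, own_bit=0)` yields 1 in the secondary module.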
3.4 Evaluation
Table 1 summarizes the comparison of synchronous data transfer, conventional asynchronous data transfer, and the multiple-valued duplex asynchronous data transfer. Synchronous data transfer makes it difficult to transmit the data at high throughput, because it requires a high clock frequency, which results in clock-distribution and clock-skew problems. Conventional asynchronous data transfer can be applied to solve these problems. However, the number of wires becomes three times larger than in synchronous data transfer, because six wires are required to transmit the mutual data. The proposed multiple-valued duplex asynchronous data transfer achieves a high throughput while maintaining the same number of wires as synchronous data transfer. As mentioned in the previous section, the mutual data are transmitted asynchronously and simultaneously, so a high throughput is available without clock-distribution and clock-skew problems. Furthermore, the control signals and data from mutual nodes can be multiplexed on the same line using multiple-valued current-mode logic circuitry, so just two wires are required to send the mutual messages simultaneously. As a result, the throughput becomes two times higher than that of conventional asynchronous data transfer. Moreover, the proposed scheme reduces the power dissipated in clock-distribution buffering.
4 Conclusion
In this paper, a duplex asynchronous data-transfer scheme based on multiple-valued encoding has been proposed for interleaving in LDPC decoders. Since control signals and data are multiplexed using a multi-level dual-rail codeword, high-speed communication is realized without any additional wires. The proposed asynchronous scheme can potentially solve the problems that synchronous systems face at high clock frequencies. Moreover, since two different data streams can be processed simultaneously at the decoder, bidirectional data superposed on the same lines can double the throughput. As a future project, we will design a compact LDPC decoder VLSI chip by combining the proposed scheme with stochastic computation, which can easily implement the check and variable nodes [14].
References
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes,” IEEE Int. Conf. on Communications, Geneva, pp. 1064-1070, May 1993.
[2] R. G. Gallager, “Low-Density Parity-Check Codes,” IRE Trans. on Information Theory, vol. IT-8, pp. 21-28, Jan. 1962.
[3] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor Graphs and the Sum-Product Algorithm,” IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
[4] V. C. Gaudet and G. Gulak, “A 13.3Mbps 0.35um CMOS Analog Turbo Decoder IC with a Configurable Interleaver,” IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 2010-2015, Nov. 2003.
[5] C. Winstead, J. Dai, S. Yu, C. Myers, R. Harrison, and C. Schlegel, “CMOS analog MAP decoder for (8,4) Hamming code,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 122-131, Jan. 2004.
[6] M. Moerz, T. Gabara, R. Yan, and J. Hagenauer, “An Analog 0.25um BiCMOS Tailbiting MAP Decoder,” Proc. IEEE ISSCC, San Francisco, CA, pp. 356-357, Feb. 2000.
[7] D. Vogrig, A. Gerosa, A. Neviani, A. Graell i Amat, G. Montorsi, and S. Benedetto, “A 0.35 um CMOS Analog Turbo Decoder for the 40-bit, rate 1/3, UMTS Channel Code,” accepted for publication in IEEE J. Solid-State Circuits, Oct. 2004.
[8] A. Blanksby and C. Howland, “A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Decoder,” IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 404-412, Mar. 2002.
[9] E. Yeo, P. Pakzad, B. Nikolic, and V. Anantharam, “VLSI Architectures for Iterative Decoders in Magnetic Recording Channels,” IEEE Trans. on Magnetics, vol. 37, no. 2, pp. 748-755, Mar. 2001.
[10] C. Schlegel and L. Perez, Trellis and Turbo Coding, IEEE/Wiley, 2004, ISBN 0-471-22755-2.
[11] T. Hanyu and M. Kameyama, “A 200MHz Pipelined Multiplier Using 1.5V-Supply Multiple-Valued MOS Current-Mode Circuits with Dual-Rail Source-Coupled Logic,” IEEE J. Solid-State Circuits, vol. 30, no. 11, pp. 1239-1245, Nov. 1995.
[12] T. Ike, T. Hanyu, and M. Kameyama, “Fully Source-Coupled Logic Based Multiple-Valued VLSI,” Proc. 32nd IEEE Int. Symp. on Multiple-Valued Logic, pp. 270-275, May 2002.
[13] T. Hanyu, A. Akira, and M. Kameyama, “Multiple-Valued Dynamic Source-Coupled Logic,” Proc. 33rd IEEE Int. Symp. on Multiple-Valued Logic, pp. 207-212, May 2003.
[14] V. C. Gaudet and A. C. Rapley, “Iterative Decoding Using Stochastic Computation,” Electronics Letters, vol. 39, no. 3, pp. 299-301, Feb. 6, 2003.
[15] T. Hanyu, T. Takahashi, and M. Kameyama, “Bidirectional Data Transfer Based Asynchronous VLSI System Using Multiple-Valued Current Mode Logic,” Proc. 33rd IEEE Int. Symp. on Multiple-Valued Logic, pp. 99-104, May 2003.