
Burst Error Detection Hybrid ARQ with Crosstalk-Delay Reduction for Reliable On-Chip Interconnects

Bo Fu and Paul Ampadu
Dept. of Electrical and Computer Engineering
University of Rochester, Rochester, NY 14627
{bofu, ampadu}@ece.rochester.edu

Abstract

We present a hybrid ARQ (HARQ) scheme using single-error correcting burst-error detecting (SEC-BED) codes to address multiple errors in nanoscale on-chip interconnects. For a given residual flit error rate requirement, the proposed HARQ method yields a 20% energy improvement over other burst error correction schemes. When further integrated with skewed transitions, the proposed HARQ method efficiently improves resilience against burst errors and also reduces the delay uncertainty caused by capacitive coupling. The low overhead of our approach makes it suitable for implementation in reliable and energy-efficient on-chip communication.

1. Introduction

As technology scales, on-chip interconnects become more sensitive to noise sources [1][2]. Error control schemes, such as automatic repeat request (ARQ), forward error correction (FEC) and hybrid ARQ (HARQ), have been applied to improve the reliability of nanoscale interconnects [3]–[10]. The HARQ scheme requests a retransmission only when the number of errors exceeds its error-correction capability and can achieve a good balance of reliability and energy consumption [4][6][7]. Single-error correcting and double-error detecting (SEC-DED) codes (e.g. extended Hamming) have been widely used in conventional HARQ schemes [4][6][7]. As spatial burst errors (errors in multiple adjacent wires) become more prevalent in nanoscale technologies [2]–[4], these previous approaches become less effective in ensuring reliable on-chip communication.

In nanoscale technologies, the propagation delay of on-chip interconnects is greatly affected by crosstalk coupling [5][11]. Depending on the switching behavior of a wire and its neighbors, the delay of a wire in the middle of a bus varies from τ0 to (1 + 4λ)τ0, where λ is the ratio of total coupling capacitance to total ground capacitance and τ0 is the delay when there is no crosstalk. The worst-case delay (1 + 4λ)τ0 occurs when there is a transition 010→101 (101→010) on three adjacent wires [11]. With the increased crosstalk coupling effect in nanoscale technology, these delay variations (also referred to as delay uncertainty) can greatly degrade the performance of on-chip interconnects.

To achieve reliable on-chip communication, error correction and crosstalk-induced delay reduction should be considered simultaneously. In this paper, we propose a HARQ scheme using single-error correcting and burst-error detecting (SEC-BED) codes. In this approach, single random errors are corrected in the first transmission and a retransmission is requested when double errors or b-bit burst errors (b≥3) are detected.
Further, the proposed HARQ scheme is combined with skewed transitions to reduce the delay uncertainty induced by capacitive coupling.

The remainder of the paper is organized as follows: Section 2 describes the proposed HARQ scheme using SEC-BED codes. A method combining error control with skewed transitions is presented in Section 3. The performance of the proposed scheme is evaluated in Section 4. Conclusions are presented in Section 5.
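To make the delay range quoted in the introduction concrete, the following minimal Python sketch evaluates a first-order coupling model in which each neighbor adds 0, λ, or 2λ to the delay factor of a switching wire, depending on whether it switches in the same direction, stays quiet, or switches in the opposite direction. This reproduces the τ0 and (1 + 4λ)τ0 extremes stated above. The function name delay_factor and the value λ = 2.0 are illustrative assumptions, not values taken from the paper.

```python
def delay_factor(prev, new, lam):
    """Delay of the middle wire, in units of tau_0, for a 3-wire transition.

    prev and new are 3-character bit strings (left, middle, right).
    Each neighbor adds lam * (1 - d_mid * d_nbr), where d is the transition
    direction: +1 rising, -1 falling, 0 no change.
    """
    d = [int(n) - int(p) for p, n in zip(prev, new)]
    if d[1] == 0:
        return 0.0  # the middle wire does not switch, so there is no delay to model
    return 1.0 + lam * ((1 - d[1] * d[0]) + (1 - d[1] * d[2]))


lam = 2.0  # illustrative coupling ratio (assumption)
best = delay_factor("000", "111", lam)   # all wires switch together: 1.0
worst = delay_factor("010", "101", lam)  # neighbors switch against the middle wire: 1 + 4*lam
print(best, worst)
```

With λ = 2.0 the two calls give delay factors of 1.0 and 9.0, i.e. τ0 and (1 + 4λ)τ0.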

2. Hybrid ARQ using SEC-BED codes

Figure 1. (a) HARQ scheme using SEC-BED codes; (b) block diagram of the SEC-BED decoder; (c) double and burst error detection.

SEC-BED codes have been studied for fault-tolerant memory design [12]. In this paper, we apply SEC-BED codes in an HARQ scheme to improve the reliability of on-chip interconnects. Let E1 be the error set consisting of all single errors and E2 the error set consisting of b-bit burst errors (b≥2), so that E1∩E2 = ∅ (the empty set). If a linear block code can correct all single random errors and detect double random errors and b-bit burst errors, the parity check matrix H of this code must meet the following requirements [12]:

1) ∀e ∈ E1 ∪ E2, s = e⋅H^T ≠ 0
2) ∀ei, ej ∈ E1 with i≠j, ei⋅H^T + ej⋅H^T ≠ 0
3) ∀ei ∈ E1 and eb ∈ E2, ei⋅H^T ≠ eb⋅H^T
4) ∀ei, ej, ek ∈ E1 with i, j, k pairwise distinct, ei⋅H^T + ej⋅H^T + ek⋅H^T ≠ 0

where e is the error vector and s is the syndrome vector. In requirement 1), the non-zero syndrome value is used to detect errors. To meet requirements 2) and 4), any three columns of the H matrix should be linearly independent. The code constructed using this H matrix has a minimum Hamming distance dmin≥4 and can correct single errors and detect double random errors. Requirement 3) is used to separate the syndromes of single errors from those of burst errors. Based on these requirements, a parity check matrix H of size R×N can be constructed as in (1),

\[
H_{R \times N} =
\begin{bmatrix}
I_b & I_b & \cdots & I_b & I_b \\
M_1 & M_2 & \cdots & M_{L-1} & M_L
\end{bmatrix}
\tag{1}
\]

where R (R > b) is the number of parity check bits in the codeword; N = b⋅(2^(R−b) − 1) is the codeword length; N−R is the input message length; Ib is the b×b identity matrix; and Mi (1≤i≤L, L = 2^(R−b) − 1) is an (R−b)×b matrix whose columns are b copies of the binary representation of the integer i. It is easy to verify that this parity check matrix meets all the code requirements specified above. Any error vector e from E1 or E2 results in a non-zero value in the first b bits of the syndrome. A single random error can be corrected, because no two columns of the matrix are the same. When a burst error occurs, more than one “1” exists in the first b bits of the syndrome; on the other hand, only a single “1” occurs in the first b bits of the syndrome for a single random error. If there are two random errors, the first b bits of the syndrome are either all zero or contain two “1”s, so they do not equal the first b bits of any column of the matrix. As an example, consider b=4 and R=9. A SEC-BED(124,115) code can be constructed using the H matrix in (1). This code can be further shortened to any message length, e.g. to the shortened SEC-BED(73,64).

Fig. 1(a) shows the implementation of the proposed HARQ scheme. The input flit is encoded by a SEC-BED encoder and simultaneously saved into a transmitter buffer. In the receiver, an acknowledgement/negative acknowledgement (ACK/NACK) signal is sent back to the transmitter based on whether or not the error is correctable. If a NACK is received in the transmitter, the saved flit is used to perform a retransmission. Fig. 1(b) shows the implementation of the proposed SEC-BED decoder. The realization of double or burst error detection is very simple. To detect double random errors or b-bit burst errors, the number of “1”s in the first b bits of the syndrome is counted. If there is more than one “1” in the first b bits of the syndrome, there is a b-bit burst error or a double-bit random error. If the first b bits of the syndrome are zero and the last (R−b) bits of the syndrome are non-zero, there are two random errors. The implementation of the proposed error detection circuits is shown in Fig. 1(c).

Figure 2. A unified coding framework combining error control with CACs.
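The construction in (1) and the syndrome-based decision described for Fig. 1(c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy, the helper names build_h and decode are hypothetical, the full (unshortened) SEC-BED(124,115) code with b=4 and R=9 is used, and the decoder is driven by the error pattern alone, which is sufficient because the syndrome of a received word depends only on the error vector.

```python
import numpy as np


def build_h(b, r):
    """Parity check matrix of (1): L = 2**(r-b) - 1 blocks, each [I_b ; M_i]."""
    blocks = []
    for i in range(1, 2 ** (r - b)):
        bits = [(i >> k) & 1 for k in reversed(range(r - b))]   # binary form of i, MSB first
        m_i = np.tile(np.array(bits).reshape(-1, 1), (1, b))    # b identical columns
        blocks.append(np.vstack([np.eye(b, dtype=int), m_i]))
    return np.hstack(blocks)


def decode(h, b, error):
    """Classify a syndrome: ('ok', None), ('correct', bit_index) or ('nack', None)."""
    s = (h @ error) % 2
    head, tail = s[:b], s[b:]
    if not s.any():
        return ("ok", None)
    if head.sum() == 1 and tail.any():
        # Exactly one "1" in the first b bits: the syndrome equals one column of H.
        block = int("".join(map(str, tail)), 2)     # index i of the block M_i
        return ("correct", (block - 1) * b + int(np.argmax(head)))
    # Zero or several "1"s in the first b bits: double or burst error, request retransmission.
    return ("nack", None)


h = build_h(4, 9)
print(h.shape)  # 9 parity checks, 124 code bits
```

Under these assumptions, every single-bit error maps back to its own bit position, while every double error and every contiguous burst of 2 to b bits produces a NACK, which matches the behavior described in the text.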

3. Crosstalk reduction

A number of solutions have been proposed to reduce the delay uncertainty induced by capacitive coupling, such as shielding [13], crosstalk avoidance codes [14] and skewed transitions [15][16]. In order to reduce crosstalk-induced delay uncertainty and correct on-chip interconnect errors simultaneously, a combination of error control with crosstalk avoidance codes (CACs) is proposed in [5][14]. In this method, the input data is first encoded using nonlinear CACs, such as forbidden overlap condition (FOC) codes, forbidden transition condition (FTC) codes, or forbidden pattern condition (FPC) codes. These CACs involve a nonlinear mapping from data to codeword but require less area overhead than linear CACs such as duplication and half shielding. The output of the nonlinear CAC is encoded using error control codes. The additional parity bits generated by the ECC need to be encoded through a linear CAC. The encoding process of this unified coding framework is shown in Fig. 2. The combination of error control codes with CACs results in a large number of interconnect wires, which increases the energy consumption of on-chip interconnects.

In this paper, we propose a method combining error control with skewed transitions to correct errors and also reduce the crosstalk-induced delay uncertainty. Fig. 3 shows the concept of skewed transitions. In skewed transitions [15][16], a relative delay ΔT is introduced between adjacent bus lines to avoid simultaneous switching transitions in opposite directions. This method can reduce the worst-case delay without increasing the routing area. In traditional skewed transitions [15], delay elements are inserted at the beginning of alternate bus lines to generate the relative delay. This approach limits the benefits we can achieve from skewed transitions, because a large relative delay will increase the overall link delay.
Instead of inserting delay elements, we propose using timing borrowing to offset the transitions in each pair of adjacent interconnect lines. In the error control encoding process, parity check bits are calculated from the original data. Thus, there is a timing slack between data and parity check bits. Fig. 4 shows the path delay distribution of a Hamming(71,64) encoder. The encoder is synthesized using TSMC 65 nm technology. The results show that the delay difference between data paths and parity check bit paths can be up to 420 ps. This means that the data is ready to transmit much earlier than the parity check bits are available.

Figure 3. Concept of skewed transitions.

Figure 4. Path delay distribution in Hamming(71,64) encoder.

In a traditional parallel encoding process, the data and parity check bits are transmitted simultaneously once the calculation of the parity check bits is completed. Maximum capacitive coupling can occur when we send the data and parity check bits at the same time. In the proposed timing borrowing method, we take advantage of this timing slack between the data and parity check bits in an error control encoder and send part of the data before the parity check bits are available. Fig. 5 shows our proposed method combining error control with skewed transitions. Two clocks are used (CLK1 and CLK2), with CLK1 arriving ahead of CLK2. The difference between these two clocks depends on the timing slack between the data and parity check bits. These two clocks are used alternately to offset the transitions in each pair of adjacent interconnect lines. A mapping is needed for the parity check bits, because they can only be sent out using CLK2. Unlike previous skewed transition methods, the introduced delay between two adjacent interconnect lines does not increase the overall link delay. Thus, we can achieve more benefit from skewed transitions. Also, sending the data slightly early does not hurt the encoder performance, because the data is not on the critical path of the error control encoder.
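The two constraints stated above (adjacent wires alternate between CLK1 and CLK2, and parity check bits may only use the later clock CLK2) can be illustrated with a small Python sketch. The paper does not give the actual bit-to-wire mapping, so the function assign_clocks, the signal labels d0, p0, ..., and the choice to place parity bits on the first available CLK2 wires are all hypothetical. The sketch also ignores the extra wire mentioned in Section 4.2 and any effect of the mapping on burst error detection.

```python
def assign_clocks(n_data, n_parity):
    """Hypothetical wire plan: alternate CLK1/CLK2 and keep parity bits on CLK2.

    Returns a list of (signal, clock) tuples, one per wire, in bus order.
    """
    n_wires = n_data + n_parity
    clk2_slots = n_wires // 2                 # odd-indexed wires use the late clock
    if n_parity > clk2_slots:
        raise ValueError("not enough CLK2 wires for the parity check bits")
    data = [f"d{i}" for i in range(n_data)]
    parity = [f"p{i}" for i in range(n_parity)]
    plan = []
    for wire in range(n_wires):
        if wire % 2 == 0:
            plan.append((data.pop(0), "CLK1"))    # early clock carries data only
        elif parity:
            plan.append((parity.pop(0), "CLK2"))  # late clock: parity bits first
        else:
            plan.append((data.pop(0), "CLK2"))    # remaining late slots carry data
    return plan


plan = assign_clocks(n_data=64, n_parity=9)
print(plan[:4])
```

For a 64-bit flit with 9 parity check bits this produces 73 wires in which no two neighbors share a clock and no parity bit is launched on CLK1.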

Figure 5. Proposed method combining error control with skewed transitions: (a) block diagram; (b) skewed transitions with timing borrowing.

4. Performance evaluation

In this section, the performance of the proposed error control scheme is evaluated in terms of reliability, energy, codec delay and area for a 64-bit input link. The proposed HARQ scheme using SEC-BED(73,64) is compared to the Hamming H(71,64) code, conventional HARQ using the extended Hamming EH(72,64) code, the dual rail DR(129,64) code [5][8], and the interleaving method in [3]. Four groups are used in the interleaving scheme and each group is encoded using a Hamming H(21,16) code. To improve the system clock frequency, the encoder, link, and decoder are pipelined. Go-back-N [17] retransmission is applied to increase the throughput of the HARQ scheme.

Figure 6. Multiple adjacent errors caused by a noise source.

4.1 Performance of proposed HARQ using SEC-BED codes

4.1.1 Reliability

On-chip communication errors can be attributed to voltage perturbations induced by noise from many sources. In [18], the error probability ε of a single wire is modeled by a Gaussian pulse function,

\[
\varepsilon = Q\!\left(\frac{V_{swing}}{2\sigma_N}\right)
= \int_{V_{swing}/(2\sigma_N)}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy
\tag{2}
\]

where Vswing is the link swing voltage and σN is the standard deviation of the noise voltage. The model in (2) assumes that the probability of error in each wire is independent. As technology scales, the probability of a noise source inducing errors in multiple neighboring wires increases [2]–[4]; thus, a more realistic error model that includes spatial burst errors should be considered. The model in (2) can be extended to include burst errors. Instead of affecting only one wire, the noise source is modeled as also affecting the neighbors of that wire. The effect of a noise source on neighboring wires can be described by a coupling probability Pn, as shown in Fig. 6. The higher Pn is, the more likely the noise source is to cause errors in neighboring wires. When the coupling probability Pn is considered, the probability of the noise source causing an error only in wire dn (a single-bit error) can be expressed as (3) below,

\[
P(b = 1) = (1 - P_n)^2 \cdot \varepsilon
\tag{3}
\]

where ε is the probability of a single wire being erroneous if the coupling probability Pn is 0. The probabilities of two- and three-bit burst errors caused by the same noise source, P(b=2) and P(b=3), can be expressed as (4) and (5), respectively,

\[
P(b = 2) = 2 \cdot P_n (1 - P_n) \cdot \varepsilon
\tag{4}
\]
\[
P(b = 3) = P_n^2 \cdot \varepsilon
\tag{5}
\]
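As a numerical illustration of (2) through (5), the following Python sketch evaluates the single-wire error probability and splits it into one-, two- and three-bit burst probabilities. It relies on the standard identity Q(x) = 0.5·erfc(x/√2). The function names and the noise deviation σN = 0.1 V are illustrative assumptions; only Vswing = 1.1 V and Pn = 10^-2 are values that appear in the paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def wire_error_probability(v_swing, sigma_n):
    """Equation (2): epsilon = Q(V_swing / (2 * sigma_N))."""
    return q_function(v_swing / (2.0 * sigma_n))


def burst_probabilities(eps, p_n):
    """Equations (3)-(5): probability of a 1-, 2- and 3-bit burst from one noise source."""
    return {
        1: (1.0 - p_n) ** 2 * eps,
        2: 2.0 * p_n * (1.0 - p_n) * eps,
        3: p_n ** 2 * eps,
    }


eps = wire_error_probability(v_swing=1.1, sigma_n=0.1)  # sigma_n is an assumed value
probs = burst_probabilities(eps, p_n=1e-2)
# The three cases partition the event "the noise source hits d_n", so they sum to eps.
print(eps, probs, math.isclose(sum(probs.values()), eps))
```

The three probabilities sum to ε for any Pn, since (1 − Pn)^2 + 2Pn(1 − Pn) + Pn^2 = 1. This is a quick consistency check on the model.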

The probability of the same noise source at dn also affecting dn+2 and dn-2 is usually much smaller than Pn, so we ignore burst errors of four bits or more, i.e. P(b≥4|dn) = 0. The residual flit error probability Presidual is used to evaluate the reliability of different error control schemes. The residual flit error rate of the HARQ scheme can be expressed as in (6) below,

\[
P_{residual} = P_e + P_{d\_nc} \cdot P_e + P_{d\_nc}^{2} \cdot P_e + \dots
\tag{6}
\]

where Pe is the probability of an undetectable error pattern, and Pd_nc is the probability that an error is detected but is not correctable. In our experiments, a comprehensive simulation with different error patterns was performed to determine the Pe and Pd_nc values.
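The series in (6) is geometric: each term corresponds to one more retransmission that was triggered by a detected, uncorrectable error and then ended in an undetected one. For Pd_nc < 1 and an unbounded number of retransmissions, it therefore sums to Pe / (1 − Pd_nc). The Python sketch below compares that closed form with a truncated sum. The function name, the max_retx parameter, and the numeric inputs are illustrative assumptions, not values from the paper's simulations.

```python
import math


def residual_flit_error(p_e, p_d_nc, max_retx=None):
    """Evaluate (6): closed form if max_retx is None, else the sum truncated after max_retx retransmissions."""
    if max_retx is None:
        return p_e / (1.0 - p_d_nc)
    return sum(p_e * p_d_nc ** k for k in range(max_retx + 1))


p_e, p_d_nc = 1e-12, 1e-4          # assumed values for illustration only
closed = residual_flit_error(p_e, p_d_nc)
truncated = residual_flit_error(p_e, p_d_nc, max_retx=3)
print(closed, truncated, math.isclose(closed, truncated, rel_tol=1e-9))
```

When Pd_nc is small, the first term Pe dominates and Presidual is approximately Pe, so the reliability of the HARQ scheme is governed by how rarely an error pattern goes undetected.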

Figure 7. Residual flit error rate of different error control schemes: (a) Pn = 0; (b) Pn = 10^-2.

Fig. 7 shows the residual flit error rate of different error control schemes as a function of the noise voltage deviation at Pn = 0 and Pn = 10^-2. A link swing voltage Vswing of 1.1 V is used here. As expected, the residual flit error rate increases as σN increases. When Pn is equal to 0 (no burst errors considered), the conventional HARQ scheme and the proposed method achieve a much smaller residual flit error rate than the other error control schemes, because they can correct or detect double random errors. The multiple Hamming codes with an interleaver fail to correct double random errors occurring in the same group. As the probability of burst errors increases (e.g. Pn = 10^-2), the residual flit error rate of the conventional HARQ increases greatly, because the conventional HARQ scheme is unable to detect burst errors that affect an odd number of bits greater than or equal to three. The proposed HARQ method and Hamming codes with interleaving achieve a lower residual flit error rate than the other error control schemes, because both of them can detect or correct up to four-bit burst errors. Compared to conventional HARQ, the proposed method achieves a reduction of several orders of magnitude in residual flit error rate when burst errors are considered.

4.1.2 Energy consumption

In [4][5][7], lower link swing voltages are used for more powerful error control schemes to reduce the energy consumption. Fig. 8 compares the link swing voltage of different error control schemes for the same residual flit error rate. The results show that the proposed method requires the smallest link swing voltage to meet the same reliability requirement when burst errors are considered. The proposed method can reduce the link swing voltage by up to 25% and 15% compared to that of the conventional HARQ scheme EH(72,64) and Hamming codes with interleaving, respectively. Fig. 9 shows the energy consumption of different error control schemes for the same reliability requirement (a residual flit error rate of less than 10^-20 [5]). The system frequency is 1 GHz. The energy includes codec energy and link energy. The link energy is measured in Cadence Spectre using the Predictive Technology Model (PTM) CMOS 65 nm technology [19]. A 65 nm global link interconnect model [19] with the parameters in Table 1 is used in the simulation. The results show that the proposed method achieves the least energy consumption compared to other error control schemes when burst errors are considered. The energy advantage of the proposed method results mainly from the reduced link swing voltage and the fewer required link wires. Compared to the burst error correction method using multiple Hamming codes with interleaving, the proposed method achieves a 20% reduction in energy consumption for the same reliability requirement.

Table 1. Parameters used for link model

Width (μm)   Minimum space (μm)   Thickness (μm)   Height (μm)   Dielectric constant
0.45         0.45                 1.20             0.20          2.2

Figure 8. Required link swing voltage.

Figure 9. Energy comparison.

4.1.3 Codec delay and area

Table 2 compares the delay and area of different error correction schemes using a TSMC 65 nm typical library. The calculation of parity check bits and syndromes is implemented as simple XOR trees. The decoder delay is typically much larger than the encoder delay. The dual rail DR(129,64) scheme has the smallest decoder delay of all the error control schemes. The delay of the proposed method is close to that of conventional HARQ EH(72,64). Table 2 also shows the codec area of different error control schemes. The area includes pipeline registers. Using go-back-N retransmission, N flits will be retransmitted if a NACK signal is received. Thus, a transmitter buffer is needed to store these N flits in the conventional HARQ and the proposed schemes. The number N depends on the round-trip transmission delay. In our simulation, N=4. The results show that the Hamming H(71,64) scheme has the smallest codec area, because its encoder and decoder circuits are relatively simple and no transmitter buffers are needed. The conventional HARQ scheme and the proposed HARQ using SEC-BED codes have a larger area because of the transmitter buffer required by the go-back-N algorithm. However, the codec area overhead is relatively small when compared to the area of links and IP cores.

Table 2. Codec delay and area of different error control schemes

Error control scheme      Encoder delay (ns)   Decoder delay (ns)   Area (μm^2)
Hamming (71,64)           0.40                 0.62                 2080.8
HARQ using EH (72,64)     0.42                 0.64                 4283.7
Dual rail DR(129,64)      0.43                 0.52                 2338.4
Hamming + interleaving    0.35                 0.53                 2411.9
Proposed scheme           0.43                 0.65                 4357.5
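The transmitter buffer that accounts for the extra area of the two HARQ schemes in Table 2 can be pictured with the following minimal Python sketch. It only models the storage behavior described in Section 4.1.3 (keep the last N flits, replay them on a NACK, with N=4 as in the paper). It does not model ACK timing, pipelining, or the round-trip delay, and the class and method names are hypothetical.

```python
from collections import deque


class GoBackNTransmitter:
    """Keeps the last N flits sent so that they can be replayed after a NACK."""

    def __init__(self, n=4):
        # A bounded deque drops the oldest flit automatically once N are stored.
        self.buffer = deque(maxlen=n)

    def send(self, flit):
        """Store a copy of the flit and put it on the link."""
        self.buffer.append(flit)
        return flit

    def on_nack(self):
        """Return the flits to retransmit, oldest first."""
        return list(self.buffer)


tx = GoBackNTransmitter(n=4)
for flit in range(6):          # flits 0..5 are sent back to back
    tx.send(flit)
replay = tx.on_nack()          # the last four flits: 2, 3, 4, 5
print(replay)
```

The buffer depth N, and with it the storage area, grows with the round-trip delay, because more flits are in flight before a NACK can reach the transmitter.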

Table 3. Combining proposed HARQ with crosstalk-delay reduction techniques

                            Crosstalk reduction techniques used
Scheme                      Data                 Parity check bits    No. of wires
HARQ+Skewed transitions     Skewed transitions   Skewed transitions   74
HARQ+CACs                   FOC                  Half-Shielding       94
HARQ+CACs                   FTC                  Shielding            124
HARQ+CACs                   FPC                  Shielding            123

4.2 Combining proposed HARQ with skewed transitions

We also compare the performance of combining the proposed HARQ scheme with different crosstalk-delay reduction techniques. Table 3 shows the specific techniques used for data and parity check bits and the required number of wires for each scheme. An extra wire is needed in the proposed method to maintain the burst error detection capability of SEC-BED codes after data mapping. The proposed method requires the fewest wires for crosstalk avoidance, while the other methods need a large number of extra wires. Fig. 10 compares the worst-case signal transition delay of each scheme under the same link routing area constraint. The combination of the proposed HARQ with FTC codes has the largest link routing area because it requires the most additional wires. The link area of the other schemes is normalized to the link area of the FTC scheme by adjusting the wire spacing. Other link parameters are the same as in Table 1. The timing offset ΔT used in the skewed transitions is 300 ps. The delay of each scheme is normalized to the worst-case delay of an uncoded bus with the minimum wire width and spacing. The results show that the crosstalk reduction techniques can greatly reduce the worst-case delay. Compared to an uncoded bus, the use of skewed transitions with timing borrowing achieves up to 45% delay reduction. Compared to other crosstalk-delay reduction techniques, skewed transitions with timing borrowing can achieve about 10% extra delay reduction, because the smaller number of wires allows wider wire spacing. The reduced worst-case delay decreases the delay uncertainty caused by crosstalk coupling, improving the reliability of on-chip interconnects. Fig. 11 compares the link energy consumption of combining the proposed HARQ with different crosstalk-delay reduction techniques. The results show that the use of skewed transitions with timing borrowing achieves the least link energy consumption because it requires the fewest wires. Compared to the FOC scheme, skewed transitions with timing borrowing consume 27% less link energy. Furthermore, the combination of error control with skewed transitions does not need encoder and decoder circuits for crosstalk avoidance, leading to less codec overhead.

Figure 10. Normalized delay of different crosstalk reduction schemes.

Figure 11. Link energy comparison.

5. Conclusions

In this paper, a hybrid ARQ scheme using SEC-BED codes is presented to improve the error resilience against burst errors in on-chip interconnects. Compared to the conventional HARQ scheme using SEC-DED codes, a reduction of several orders of magnitude in residual flit error rate can be achieved using the proposed method. For a given reliability requirement, the proposed error control scheme achieves up to 20% reduction in energy consumption compared to the conventional HARQ scheme. The proposed HARQ scheme can further be combined with skewed transitions to reduce the delay uncertainty caused by capacitive coupling. Compared to previous solutions combining error control with crosstalk avoidance codes, the use of skewed transitions with timing borrowing achieves better performance in delay reduction and energy consumption. The proposed method can be easily implemented in hardware, making it suitable for reliable and energy-efficient on-chip communication.

References

[1] F. Caignet, S. D. Bendhia, and E. Sicard, “The challenge of signal integrity in deep-submicrometer CMOS technology,” Proc. IEEE, vol. 89, Apr. 2001, pp. 556–573.
[2] C. Constantinescu, “Trends and challenges in VLSI circuit reliability,” IEEE Micro, vol. 23, Jul./Aug. 2003, pp. 14–19.
[3] H. Zimmer and A. Jantsch, “A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip,” in Proc. of the CODES-ISSS Conf., Oct. 2003, pp. 188–193.
[4] D. Bertozzi, L. Benini, and G. De Micheli, “Error control schemes for on-chip communication links: the energy-reliability tradeoff,” IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 24, Jun. 2005, pp. 818–831.
[5] S. Sridhara and N. R. Shanbhag, “Coding for system-on-chip networks: a unified framework,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, Jun. 2005, pp. 655–667.
[6] S. Murali, T. Theocharides, N. Vijaykrishnan, M. J. Irwin, L. Benini, and G. De Micheli, “Analysis of error recovery schemes for networks-on-chips,” IEEE Des. Test Comput., vol. 22, Sep./Oct. 2005, pp. 434–442.
[7] A. Ejlali, et al., “Joint consideration of fault-tolerance, energy-efficiency and performance in on-chip networks,” in Proc. of Design, Automation and Test in Europe (DATE'07), Apr. 2007, pp. 1–6.
[8] D. Rossi, A. K. Nieuwland, S. V. E. van Dijk, R. P. Kleihorst, and C. Metra, “Power consumption of fault tolerant busses,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, May 2008, pp. 542–553.
[9] A. Ganguly, P. P. Pande, B. Belzer, and C. Grecu, “Design of low power & reliable networks on chip through joint crosstalk avoidance and multiple error correction coding,” Journal of Electronic Testing: Theory and Applications (JETTA), Special Issue on Defect and Fault Tolerance, June 2008, pp. 67–81.
[10] B. Fu and P. Ampadu, “An energy-efficient multi-wire error control scheme for reliable on-chip interconnects using Hamming product codes,” VLSI Design, vol. 2008, doi:10.1155/2008/109490, article ID: 109490, 2008, pp. 1–14.
[11] P. Sotiriadis and A. Chandrakasan, “Reducing bus delay in sub-micron technology using coding,” in Proc. of IEEE Asia and South Pacific Design Automation Conf. (ASPDAC), 2000, pp. 109–114.
[12] E. Fujiwara, Code design for dependable systems: theory and practical applications, pp. 187–260, Wiley Interscience, 2006.
[13] J. Zhang and E. G. Friedman, “Effect of shield insertion on reducing crosstalk noise between coupled interconnects,” in Proc. IEEE Intl. Symp. Circuits Syst. (ISCAS), 2004, pp. 529–532.
[14] S. R. Sridhara and N. R. Shanbhag, “Coding for reliable on-chip buses: a class of fundamental bounds and practical codes,” IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 26, May 2007, pp. 977–982.
[15] K. Hirose and H. Yasuura, “A bus delay reduction technique considering crosstalk,” in Proc. of Design, Automation and Test in Europe (DATE), 2000, pp. 441–445.
[16] M. Ghoneima, et al., “Reducing the effective coupling capacitance in buses using threshold voltage adjustment techniques,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, Sep. 2006, pp. 1928–1933.
[17] D. Bertozzi and L. Benini, “Xpipes: a network-on-chip architecture for gigascale systems-on-chip,” IEEE Circuits Syst. Mag., vol. 4, Jun. 2004, pp. 18–31.
[18] R. Hegde and N. R. Shanbhag, “Toward achieving energy efficiency in the presence of deep submicron noise,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 4, Aug. 2000, pp. 379–391.
[19] Arizona State Univ., Predictive Technology Model [Online]. Available: http://www.eas.asu.edu/~ptm/