Signal Encoding Schemes for Low-Power Interface Design

Ki-Wook Kim, Kwang-Hyun Baek, and Sung-Mo Kang

Abstract

Coupling effects between on-chip interconnects must be addressed in ultra deep submicron VLSI and system-on-a-chip (SoC) designs. We obtain lower and upper bounds on coupling effects for randomly distributed, independent data streams based on information theory. A novel low-power bus encoding scheme is proposed to minimize coupled switchings, which dominate the on-chip bus power consumption. The coupling-driven bus-invert method employs an encoder and a decoder that are geared to minimize the coupled transition activities. Experimental results indicate that our encoding method saves as much as 30% of the effective switchings in an 8-bit bus with one-cycle redundancy.
1
Introduction
Increased coupling between interconnects in ultra deep submicron technology not only aggravates the power-delay metrics, but also deteriorates signal integrity due to capacitive and inductive crosstalk noise. Conventional approaches to interconnect synthesis aim at optimal interconnect structures in terms of interconnect topology, wire width and spacing, and buffer location and sizes [1]. In this paper, we study a signal encoding scheme to minimize coupling effects between interconnects.

Signal encoding schemes have been proposed to minimize transition activities on buses while ignoring cross-coupled capacitances. When statistical properties are unknown a priori, the bus-invert method [2] and the on-line adaptive scheme [3] can be applied to encode randomly distributed signals. On the other hand, highly correlated access patterns exhibit a spatio-temporal locality which can be exploited for energy reduction [4] in Gray code [5], [6], the T0 method [7], and working-zone encoding [8]. Lower bounds for the minimum achievable transition activity have been derived for noiseless buses in [9] and for noisy buses in [10]. In [11], a segmentation method was introduced to reduce power consumption. Specification transformation approaches were used to reduce the number of memory accesses at the behavioral level [12]. The effectiveness of various encoding schemes was compared at the system level in [13].

Most of the previous bus-encoding schemes were designed to minimize transition activities on each signal line as if each line were isolated from neighboring lines, hence ignoring coupling effects. Such an assumption may be valid for off-chip buses where the impedances of transmission lines are appropriately adjusted.
However, this is not the case for long on-chip buses, which are particularly prevalent in a system-on-a-chip. For example, the wire aspect ratio (namely, wire thickness/width) is expected to be over 2.4 for intermediate wiring (namely, the third and fourth metal layers) in a 0.18 µm seven-layer metal process [14]. Accordingly, coupling has become an important issue with scaled supply voltage when we consider signal integrity and the power dissipated by coupling capacitances, referred to as coupling power. Shielding can avoid the crosstalk problem, but at the cost of area overhead.

In this paper, we propose a new encoding scheme for a static on-chip bus structure to minimize coupling power. The key idea is that coupling effects can be alleviated by transforming the signal sequences traveling on closely placed on-chip buses. Small blocks of encoding and decoding logic are employed at the transmitter and receiver of on-chip buses as shown in Fig. 1. The encoder and decoder (codec) should have a low-complexity architecture so that the power and delay overhead due to the codec circuitry can be compensated by significant savings in switching activities on tightly coupled buses. The authors in [15] proposed a generic bus encoding scheme considering the coupling effect. However, state explosion or complex codec circuitry may limit the practical application of their approach.

Fig. 1. Tightly cross-coupled on-chip buses in a system-level chip design. [Figure labels: CPU, SRAM, DRAM, ASIC core, serial port, buffer, bus interface, encoder/decoder, on-chip bus, large coupling capacitances.]

Ki-Wook Kim is with Pluris, Inc., CA 95014. Kwang-Hyun Baek is with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL 61801. Sung-Mo Kang is with the University of California at Santa Cruz, CA 95064.
2
Interconnect Power Characteristics
The time-averaged charge $\Delta Q_{av}$ provided by the power supply to all interconnect capacitances is given by

$$\Delta Q_{av} = p \cdot C_{tot} \cdot \Delta V = \hat{C}_{tot} \cdot V_{dd} = (\hat{C}_s + \hat{C}_x) \cdot V_{dd} \tag{1}$$

where $p$ denotes the switching probability, and $C_{tot}$ denotes the total lumped capacitance. The effective capacitance $\hat{C}_{tot}$ is defined by both physical capacitances and switching activities. The effective capacitance accounts for the time-averaged charge stored in physical capacitances provided by the power supply. One can model the physical capacitance of a wire with width $w$, length $l$, overlapping segment length $y$, and edge-to-edge distance $d$ such that

$$C_{tot} = (C_a \cdot w + 2 C_f) \cdot l + C_x \cdot \frac{y \cdot d_{min}}{d} \tag{2}$$
Fig. 2. A distributed RC model for the interconnects. [Figure labels: primary net, adjacent net, R, Rd, S1, S2, Cs, Cx, CL.]
where $C_a$ is the unit area capacitance (F/cm$^2$) to the substrate, $C_f$ is the unit length fringe capacitance (F/cm), and $C_x$ is the unit length coupling capacitance (F/cm) at minimum spacing $d_{min}$. To be precise, $C_f$ and $C_x$ are not constant due to complex fringing effects [16], but for our purpose the model is adequate. The area and fringe capacitances are nearly independent of the distance $d$; together they are referred to as the self capacitance $C_s$. Thus, we have a wire capacitance model as shown in Fig. 2. The wire capacitance is composed of the self capacitance $C_s$ and the coupling capacitance $C_x$. For each capacitive component, we define the effective capacitance as

$$\hat{C}_s = Y \cdot C_s \tag{3}$$

$$\hat{C}_x = Z \cdot C_x \tag{4}$$

where $Y$ and $Z$ are the average numbers of effective transitions per cycle for $C_s$ and $C_x$, respectively, which are computed as follows. First, we resolve the self transition activity $Y$ for the self capacitance $C_s$. Let $p^i_{x,y}$ denote the transitional probability that the value of signal $i$ changes from $x$ to $y$, which can be represented by the signal probability $p(i^n = x)$ at the end of the $n$-th cycle and the conditional probability $p(i^{n+1} = y \mid i^n = x)$ such that $p^i_{x,y} = p(i^n = x) \cdot p(i^{n+1} = y \mid i^n = x)$, where $x, y \in \{0, 1\}$. If we assume that there is no glitch in the signals, the self transition activity $Y$ is given by

$$Y = p^i_{0,1} \tag{5}$$
since the capacitance $C_s$ will be charged up only when a low-to-high transition takes place. Next, the coupled transition activity $Z$ is computed according to the correlated switching between physically adjacent interconnects. There are four types of possible transitions when we consider dynamic charge distribution over coupling capacitances, as illustrated in Fig. 3. We suppose that there are two parallel wires placed with minimum spacing. A type I transition occurs when one of the signals switches while the other stays unchanged, so that the coupling capacitance is charged up to $k_1 C_x V_{dd}$, where the coefficient $k_1$ is introduced as a reference for the other types of transition. In a type II transition, one bus line switches from low to high while the other switches from high to low. The effective capacitance is larger than that of a type I transition by a factor of $k_2/k_1$, which is usually two. In a type III transition, both signals switch simultaneously in the same direction and $C_x$ will not be charged. However, because of possible misalignment of the two transitions, the amount of power consumption varies according to the dynamic characteristics by a factor of $k_3$. In a type IV transition, there is no dynamic charge distribution over the coupling capacitance. Thus, we set $k_4$ to zero. Throughout this paper, we assume $k_2/k_1 = 2$ and $k_3 = k_4 = 0$.

Fig. 3. Transition types: (a) Single line switching; (b) both lines switching in opposite direction; (c) both lines switching in the same direction; (d) no switching. [Panels (a) Type I, (b) Type II, (c) Type III, and (d) Type IV are labeled with the charges $k_1 \Delta Q$, $k_2 \Delta Q$, $k_3 \Delta Q$, and $k_4 \Delta Q$ on $C_x$, respectively.]

Let $p^{ij}_{xy,qr}$ denote the joint transitional probability defined by $p^{ij}_{xy,qr} = p(i^n = x \wedge j^n = y) \cdot p(i^{n+1} = q \wedge j^{n+1} = r \mid i^n = x \wedge j^n = y)$, where $x, y, q, r \in \{0, 1\}$. Each type of transition contributes to the effective coupling capacitance between a wire $i$ and a wire $j$ as follows.

$$Z = k_1 (p^{ij}_{00,01} + p^{ij}_{00,10} + p^{ij}_{11,01} + p^{ij}_{11,10}) + k_2 (p^{ij}_{01,10} + p^{ij}_{10,01}) + k_3 (p^{ij}_{00,11} + p^{ij}_{11,00}) \tag{6}$$
We consider only the charging of the coupling capacitance in computing dynamic power dissipation. Suppose that two adjacent lines have a signal transition from 11 to 10. In the initial state of 11, both signals are high, thus there is no potential across the coupling capacitance ($\Delta V = 0$). So, the coupling capacitance contains no charge when the signals start to switch. At the end of the transition, the signals have switched to 10: one signal is high, while the other is low. Such a potential difference across the coupling capacitance ($\Delta V = V_{dd}$) charges up the coupling capacitance. Thus, the signal switching from 11 to 10 is counted toward dynamic power consumption. On the other hand, a discharging transition for the coupling capacitance is not taken into account for dynamic power consumption. One example is the transition from 10 to 11. The initial potential across the coupling capacitance is $V_{dd}$, which decreases to 0 at the end of the transition. In other words, the coupling capacitance is discharged when such input vectors are applied. Because the power supply is used as the reference for power consumption, we do not consider discharging events in dynamic power computation. The total lumped capacitance for a bus can be computed according to Equations (5) and (6). Accordingly, the dynamic power consumed by the interconnects and drivers is given by

$$P_{dyn} = (Y (C_s + C_L) + Z C_x) \cdot V_{dd}^2 \cdot f_c \tag{7}$$

for a terminated bus. We define the capacitance ratio $\eta = C_x / (C_s + C_L)$. We assume that the capacitance ratio $\eta$ is four in 0.18 µm technology in this work. The capacitance ratio increases as the aspect ratio of the interconnect increases.
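As a rough illustration of Equations (5) to (7), the sketch below counts charging events on a single pair of adjacent lines and plugs the resulting activities into Equation (7). The function names, the sample stream, and the numeric capacitance, voltage, and frequency values are assumptions made for illustration only (the capacitances are merely chosen so that η = 4); the weights k1 = 1, k2 = 2, k3 = k4 = 0 follow the text. Here Y is taken as the total number of rising edges per cycle over both lines, which is one possible reading of the model.

```python
# Sketch of Eqs. (5)-(7) for two adjacent lines i and j.
# Assumptions (not from the paper): function names, the sample stream, and the
# numeric values of Cs, CL, Cx, Vdd, fc. Weights follow the text: k2/k1 = 2, k3 = k4 = 0.

K1, K2 = 1, 2

def coupling_units(prev, nxt):
    """Charge drawn by Cx, in units of k1, for one transition of a line pair.

    prev and nxt are (i, j) bit tuples. Only transitions that charge Cx count.
    """
    (a0, b0), (a1, b1) = prev, nxt
    if a1 == b1:                  # no voltage across Cx at the end: nothing is charged
        return 0
    if a0 == b0:                  # type I: Cx charged from 0 to Vdd
        return K1
    if (a0, b0) != (a1, b1):      # type II: polarity of Cx reversed
        return K2
    return 0                      # type IV: neither line switches

def activities(stream):
    """Return (Y, Z): rising edges and coupling units per cycle for the pair."""
    cycles = len(stream) - 1
    pairs = list(zip(stream, stream[1:]))
    rises = sum((a1 > a0) + (b1 > b0) for (a0, b0), (a1, b1) in pairs)
    units = sum(coupling_units(p, n) for p, n in pairs)
    return rises / cycles, units / cycles

def dynamic_power(y, z, cs, cl, cx, vdd, fc):
    """Equation (7): P_dyn = (Y (Cs + CL) + Z Cx) * Vdd^2 * fc."""
    return (y * (cs + cl) + z * cx) * vdd ** 2 * fc

stream = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 0)]
Y, Z = activities(stream)
print(Y, Z)
print(dynamic_power(Y, Z, cs=10e-15, cl=15e-15, cx=100e-15, vdd=1.8, fc=100e6))
```

In this toy stream the 11 to 10 step contributes one coupling unit while the 10 to 11 step contributes none, mirroring the charging-only convention described above.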
3
Information Theory Preliminaries
We define an information source, called an ensemble $X$, as a collection of discrete symbols $x$, each representing an indication, an event, or an indivisible and definable quantity or operation that the source transmits. Each symbol in an ensemble is associated with a probability mass function $p(x) = \Pr\{X = x\}$, $x \in X$ [17]. Thus the information source can transmit any one of the different symbols with a given probability. The information content of each symbol is quantified in such a way that the length of a binary codeword grows with the inverse of the probability of occurrence, because the more unexpected (namely, the more surprising) an event is, the more information it conveys [18]. The quantity of information in a symbol is represented by the self-information

$$I(x) = \log_2 \frac{1}{p(x)} \text{ bits.} \tag{8}$$

The self-information can be readily derived in the special case of equi-probable events. Suppose that there are a total of $q$ symbols in the alphabet; then the number of bits $N$ needed to represent all $q$ symbols satisfies $q = 2^N$. With equi-probable symbols we have $p(x_i) = 1/q$, so the number of bits to represent the symbol (the quantity of information content) is $N = I(x) = \log_2 \frac{1}{p(x)}$ bits. The definition of self-information can be extended to quantify the information of the source by considering all the symbols in the alphabet. The entropy, namely the average information, of an ensemble $X$ is defined by

$$H(X) = -\sum_{x \in X} p(x) \cdot \log_2 p(x). \tag{9}$$
The entropy is expressed in bits, with the convention that for $p(x) = 0$, $0 \cdot \log (1/0) = 0$. The entropy measures the information content or uncertainty of the information source $X$. Equation (9) originates from the entropy of statistical thermodynamics, hence the term entropy. As extensions to an information source, the joint entropy is derived for multiple different source alphabets that operate jointly or concurrently. The joint entropy $H(X, Y)$ of a pair of discrete random variables $(X, Y)$ with a joint distribution $p(x, y)$ is defined as

$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \cdot \log p(x, y). \tag{10}$$

The joint entropy $H(X, Y)$ represents the average bits of information per joint pair of symbols $(x_i, y_i)$. The conditional entropy $H(Y|X)$ of a pair of discrete random variables $(X, Y)$ with a conditional distribution $p(y|x)$ is defined as

$$H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \cdot \log p(y|x). \tag{11}$$
Fig. 4. The entropy function H(p). [Plot of H (entropy) against p over the interval from 0 to 1, annotated with H(x), H^{-1}(y), and 1 − H^{-1}(y).]
A chain rule holds for the entropy, given by $H(X, Y) = H(X) + H(Y|X)$. As a natural extension of the entropy to a Markov source, the entropy rate of a stochastic process $\{X_i\}$ is defined by

$$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \cdots, X_n) \tag{12}$$

when the limit exists. The function $H(x)$ is defined on the real interval $[0, 1]$ as $H(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ bits, as plotted in Fig. 4. The function $H(x)$ maps the probability of a binary-valued, independent variable to its entropy. The inverse function $H^{-1}(y)$ of $H$ is defined on the real interval $[0, 1]$ as $H^{-1}(y) = x$ if $y = H(x)$ and $x \in [0, 0.5]$. The function $H^{-1}(y)$ maps the entropy of a binary-valued, independent variable to a probability value that lies between 0 and 0.5.
4
Achievable Bounds on Coupling
We assume that the input signals $X$ to the interconnects come from a first-order Markov information source which is ergodic and stationary. The source is also assumed to be memoryless, such that the probability distributions for subsequent signals are independent of the previous signals. The second extension $X^2$ of the input symbols is employed, in which a block of symbols (two symbols in this case) is considered at a time rather than individual symbols, in order to reflect the coupling effects involving multiple interconnects. The following theorem bounds the transitional probability $p$ of a signal [9].
Fig. 5. Asymptotic coupling charge variation δ(1) when one bit out of N bits switches. [(a) Switching in in-between lines: charge shift in two coupling capacitances, (N−2) lines × 2 coupling capacitances. (b) Switching in border lines: charge shift in one coupling capacitance, 2 border lines × 1 coupling capacitance. δ(1) = 2(N−2)/N + 2/N = 2(N−1)/N.]
Theorem 4.1: For a random process $\{X_i\}$, let the symbols be encoded in a uniquely decodable manner into $\{B_i\}$ with an expected number of $R$ bits/symbol. The transitional probability $p_i$ of $B_i$ is bounded by

$$H^{-1}\left(\frac{H}{R}\right) \le p_i \le 1 - H^{-1}\left(\frac{H}{R}\right) \tag{13}$$

where $H$ is the entropy rate of $\{X_i\}$. The bounds in Equation (13) are asymptotically achievable if $\{X_i\}$ is a stationary and ergodic process.

Definition: The asymptotic coupling charge variation $\delta(m)$ is the cycle-averaged charge variation in coupling capacitance caused by transitions on $m$ bit lines out of $N$ bits. The asymptotic coupling charge variation is represented as multiples of $k_1$.

In order to derive bounds on the coupled transition activity, we will employ Lemmas 4.1 and 4.2 below. Lemma 4.1 provides the asymptotic coupling charge variation when $m$ bits out of $N$ bits switch during a cycle. Lemma 4.2 gives the coupled transition activity when each bit switches, independently of the other bits, with the same transitional probability $p$. Theorem 4.2 bounds the coupled transition activity based on Theorem 4.1 and Lemma 4.2 for an independent data source.

Lemma 4.1: Let $\{B_i\}$, $0 \le i \le N - 1$, be $N$-bit data items. If $m$ bits out of the $N$ bits switch, the asymptotic coupling charge variation $\delta(m)$ is given by

$$\delta(m) = 2m \cdot \frac{N - 1}{N} \tag{14}$$
Proof: As a basis, Fig. 5 shows the asymptotic coupling charge variation when one bit out of $N$ bit lines switches. There are two types of transitions in view of the coupling charge variation. On one hand, when a bit line $l_i$ ($1 \le i \le N - 2$) that is located between two lines switches, as shown in Fig. 5(a), either charging or discharging takes place in two coupling capacitances, namely $C_x^{i-1,i}$ and $C_x^{i,i+1}$. Given a uniform distribution of switching line selections, the probability that the switching line is one of these inner lines is $(N - 2)/N$. Because two coupling capacitances are associated with this type of transition, a coupling charge variation of $2(N - 2)/N$ takes place on average. On the other hand, if the switching bit line ($l_0$ or $l_{N-1}$) is neighbored by only one bus line, then there is only one coupling capacitance whose charge is affected by the switching, as shown in Fig. 5(b). Since there are two border bus lines, the probability of such events is $2/N$. Thus, their contribution to the asymptotic coupling charge variation is $2/N$. To sum up, the asymptotic coupling charge variation for one bit switching is $\delta(1) = 2(N - 2)/N + 2/N = 2(N - 1)/N$.

Now we can generalize the number of switching bits to $m$, such that $0 \le m \le N$. The number of combinations for choosing $m$ bits out of $N$ bits is $\binom{N}{m}$. When $m$ bits switch, $2m$ coupling capacitances are involved in charge redistribution unless signal transitions occur on the border bus lines and switching lines are adjacent. Considering the cases in which the border bus lines switch and in which adjacent lines switch, we have the asymptotic coupling charge variation $\delta(m)$ such that

$$\delta(m) = \frac{2m \cdot \binom{N}{m} - 2\binom{N-2}{m-1} - 2\binom{N-2}{m-2}}{\binom{N}{m}} = \frac{2(N - 1)\binom{N-1}{m-1}}{\binom{N}{m}} = \frac{2m \cdot (N - 1)}{N} \tag{15}$$
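Lemma 4.1 can be checked by brute force for a small bus. The sketch below is an illustrative check, not part of the original proof: it enumerates every choice of m switching lines and averages the number of coupling capacitances touched, counting one per neighbor of each switching line (so a capacitance between two switching lines is counted once per line, which is the counting that reproduces the closed form). Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def delta_bruteforce(n, m):
    """Average over all m-subsets of n lines of the neighbor count of the switching lines."""
    subsets = list(combinations(range(n), m))
    total = 0
    for lines in subsets:
        for i in lines:
            # An inner line touches two coupling capacitances, a border line one.
            total += (i > 0) + (i < n - 1)
    return Fraction(total, len(subsets))

def delta_closed_form(n, m):
    """Equation (14): delta(m) = 2m (N - 1) / N."""
    return Fraction(2 * m * (n - 1), n)

N = 8
for m in range(N + 1):
    assert delta_bruteforce(N, m) == delta_closed_form(N, m)
print([str(delta_closed_form(N, m)) for m in range(N + 1)])
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt in the comparison.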
In this work, dynamic power consumption is our concern. Thus, charging up the coupling capacitance has to be accounted for in power consumption, while a discharging event, such as a 01 → 00 transition, is not considered. Consequently, only half of the asymptotic coupling charge variation, namely the charging of the coupling capacitance, is considered in computing the coupled transition activity. Therefore, the coupled transition activity $Z$ is represented by $Z = \sum_{m=1}^{N} \frac{\delta(m)}{2} \cdot q(m)$, where $q(m)$ denotes the probability of an $m$-bit transition out of $N$ bits.

Lemma 4.2: Let $\{B_i\}$ be a 0-1 valued random process, in which a bit $B_i$ ($0 \le i \le N - 1$) switches independently of other bits with transitional probability $p_i = p$ ($0 \le i \le N - 1$). Then the coupled transition activity $Z$ per bus cycle is

$$Z = (N - 1) \cdot p \tag{16}$$

if transitions on $\{B_i\}$ follow the binomial distribution.

Proof: If transitions on $\{B_i\}$ follow the binomial distribution, the probability of $m$-bit transitions is given by $q(m) = \binom{N}{m} p^m (1 - p)^{N-m}$. Hence, the coupled transition activity $Z$ is

$$Z = \sum_{m=1}^{N} \frac{\delta(m)}{2} \binom{N}{m} p^m (1 - p)^{N-m} = \frac{N - 1}{N} \sum_{m=1}^{N} m \binom{N}{m} p^m (1 - p)^{N-m} = (N - 1) \cdot p \tag{17}$$
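The sum in the proof of Lemma 4.2 collapses because the mean of a binomial distribution is Np. A short numerical check, assuming the binomial form of q(m) and δ(m)/2 = m(N−1)/N from Lemma 4.1 (the function name is illustrative):

```python
from math import comb

def coupled_activity_sum(n, p):
    """Evaluate sum_{m=1}^{n} (delta(m)/2) * C(n, m) * p^m * (1 - p)^(n - m)."""
    total = 0.0
    for m in range(1, n + 1):
        half_delta = m * (n - 1) / n      # delta(m)/2 from Lemma 4.1
        total += half_delta * comb(n, m) * p ** m * (1 - p) ** (n - m)
    return total

# Compare the explicit sum with the closed form (N - 1) * p for an 8-bit bus.
for p in (0.1, 0.25, 0.5):
    print(p, coupled_activity_sum(8, p), (8 - 1) * p)
```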
Theorem 4.2: Let $\{X_i\}$ be a random process with entropy rate $H$, whose symbols are encoded in a uniquely decodable manner into $\{B_i\}$ with an expected number of $R$ bits/symbol. Suppose $B_i$ switches independently with transitional probability $p_i = p$ ($0 \le i \le N - 1$). Then, the coupled transition activity per bus cycle is bounded by

$$(N - 1) \cdot H^{-1}\left(\frac{H}{R}\right) \le Z \le (N - 1) \cdot \left(1 - H^{-1}\left(\frac{H}{R}\right)\right) \tag{18}$$

The bounds in Equation (18) are asymptotically achievable if $\{X_i\}$ is a stationary and ergodic process.

Proof: Theorem 4.1 and Lemma 4.2 lead to Equation (18).

Example 4.1: The transition activities and corresponding coupling effects can be mitigated by introducing redundancies. Two types of redundancy, namely spatial redundancy and temporal redundancy, can be used to mitigate transition activities. Spatial redundancy reduces the density of transitions by introducing additional bus lines. For instance, the bus-invert algorithm can be implemented by adopting spatial redundancy, wherein one extra bus line is used to notify the receiver of the phase of the transferred data. Temporal redundancy, on the other hand, uses extra cycles to make the transition density sparse. In a bus-invert scheme based on temporal redundancy, the inversion of the following data bits is determined by an additional control bit transferred ahead of the data bits. If we use one additional cycle as temporal redundancy in an 8-bit bus ($N = 8$), then the entropy rate is $H = 8$ and $R = H + 1 = 9$ for randomly distributed independent data. According to Theorem 4.2, the lower limit of couplings per bus cycle is 2.1442 couplings/symbol. For a bus line, we have 0.2680 coupling/bit in a bus cycle.
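The numbers in Example 4.1 can be reproduced numerically. The sketch below evaluates the bounds of Equation (18) for N = 8, H = 8, and R = 9; the helper functions repeat a bisection-based inverse entropy so that the block runs on its own, and all names are illustrative assumptions.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with 0 * log 0 taken as 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def inverse_entropy(y, tol=1e-12):
    """Branch of the inverse entropy on [0, 0.5], by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if binary_entropy(mid) < y else (lo, mid)
    return (lo + hi) / 2

def coupling_bounds(n, h, r):
    """Equation (18): (lower, upper) bounds on Z for an n-bit bus."""
    p_min = inverse_entropy(h / r)
    return (n - 1) * p_min, (n - 1) * (1 - p_min)

z_min, z_max = coupling_bounds(n=8, h=8, r=9)
print(z_min, z_max)      # lower and upper limits in couplings/symbol
print(z_min / 8)         # lower limit per bus line
```

The lower limit printed here agrees with the 2.1442 couplings/symbol quoted above to about three decimal places.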
5
Low Power Encoding Schemes
5.1
General Codec Architecture
Fig. 6 illustrates a generic codec architecture for two bit signals. The encoder consists of two components: a predictor and an encoding function block E. The prediction $\hat{x}(n)$ is a function of past input values given by $\hat{x}(n) = f(x(n-1), x(n-2), \cdots, x(n-K))$. We consider $K = 1$ for a low-complexity codec architecture. The combinational function E reduces the average number of self transitions and coupled switchings between $y_i(n)$ and $y_j(n)$. The encoding function E differs from the architectures proposed in [3], [19] in that it takes as input the data on adjacent buses to account for the coupling effect. In general, an original input signal $x_i(n)$ on the $i$-th bus line is encoded to $y_i(n)$ at cycle $n$ based on the encoding function given by $y_i(n) = E(x_i(n), \hat{x}_i(n), x_{i-1}(n), \hat{x}_{i-1}(n), x_{i+1}(n), \hat{x}_{i+1}(n))$, where $x_{i-1}(n)$ and $x_{i+1}(n)$ are the signals on the bus lines adjacent to $x_i(n)$. The input data $x_i(n)$ and the prediction $\hat{x}_i(n)$ account for the reduction of self transition activities in $y_i(n)$. Signal integrity and coupling power depend on both the current values of the neighboring signals ($x_{i-1}(n)$ and $x_{i+1}(n)$) and the transition histories ($\hat{x}_{i-1}(n)$ and $\hat{x}_{i+1}(n)$). As a mirror of the encoder, the decoder consists of two components: a decoding function block D and a register to keep the prediction $\hat{x}(n)$. The decoding logic function is given by $x_i(n) = D(y_i(n), \hat{x}_i(n), y_{i-1}(n), \hat{x}_{i-1}(n), y_{i+1}(n), \hat{x}_{i+1}(n))$. The decoder D realizes the inverse function of the encoder E.

Fig. 6. Low-power encoder-decoder framework. [Figure labels: encoder blocks E and decoder blocks D for bus lines i and j, inputs $x_i(n)$ and $x_j(n)$ with their predictions, and encoded signals $y_i(n)$ and $y_j(n)$ on Bus(i) and Bus(j).]

In choosing a codec scheme, we need to take into account two major issues. The first criterion is the tradeoff between architecture complexity and encoding efficiency. Rent’s rule [20] states that there is a simple power-law relationship between the number of I/O terminals of a logic block and the number of gates contained in that block for a given degree of parallelism. It means that taking adjacent input pins into consideration is likely to add logic gates to a codec functional block. An increased number of logic gates in turn can induce overhead power consumption and longer propagation delay. Therefore, it should be ensured that the benefits from data encoding are large enough to compensate for the overhead of the codec architecture. Secondly, the codec system should guarantee the unique decodability constraints, even in the presence of physical noise. Signal integrity can be ensured by asserting spatial redundancy (extra control lines) or temporal redundancy (extra clock cycles), or by selecting an appropriate supply voltage, transmitter and receiver size, and clock frequency.
Fig. 7. 8-bit bus encoder for the coupling-driven bus-invert scheme. [Figure labels: inputs x0(n) to x7(n), x0(n-1) to x7(n-1), and inv_in; coupling encoders (CE) with outputs ql_i and qh_i; coupling counter; 8-out-of-15 majority voter; control signal inv; encoder (E) outputs y0(n) to y7(n).]
5.2
The Coupling-Driven Bus-Invert Scheme
The bus-invert method [2] is limited to reducing transition activities while assuming that the coupling power contribution can be ignored. However, coupling power becomes a dominant component of dynamic power as wires become thinner and taller. To reflect this technology trend appropriately, we propose a coupling-driven bus-invert method to tackle the coupling power reduction problem. The bus-invert method flips the data signal when the number of switching bits is more than half of the number of signal bits. In the same context, we invert the input vector when the coupling effect of the inverted signals is less than that of the original signals. The problems are then how to accurately account for the coupling effect, and how to effectively implement the scheme with low hardware overhead.

Before addressing these issues, we state our assumptions. First, synchronous latches are located at the transmitter side, thus all the transitions take place at the same time on the bus. The simultaneous transitions exclude type III transitions by setting $k_3 = 0$. It means that the results we achieve are on the lower end of power saving. Second, statistics on the information source are not given in advance. Hence this scheme is suitable for data bus encoding, where it is difficult to extract accurate probabilistic information off-line.

An enumeration method is employed to represent the coupling effect. If a bus line $l_i$ is located between two other lines, a signal transition on $l_i$ can trigger charge shifts on both coupling capacitances connected to $l_{i-1}$ and $l_{i+1}$, respectively. In other words, at most two couplings can be initiated by a signal transition. Thus, $2(N - 1)$ bits are sufficient to represent the whole set of couplings in an $N$-bit bus per bus cycle, because there are $(N - 1)$ coupling capacitances and each capacitance can experience zero, one, or two units of coupling if the coupling intensity is assumed to be digitized. The encoder architecture is shown in Fig. 7. According to the types of correlated transition between neighboring buses, the coupling encoder generates a codeword as follows: 00 for a type III or IV transition, 01 for a type I transition, and 11 for a type II transition. The reason that we assign 11 to a type II transition is that switching in different directions requires changing the polarity of the charge stored in the coupling capacitance, hence consuming about twice the amount of charge required for a type I transition. The codeword 11, instead of 10, helps to make a decision on data inversion using a majority voter, because the majority voter outputs high when at least eight of its fifteen input lines are high. The majority voter is implemented by using full-adder circuitry [2].

The control signal inv can be transmitted to the receiver using extra bus lines or extra transfer cycles. One problem of additional bus lines for control is the area overhead, which may not be allowed due to physical constraints. In some cases, widening the space between signal bus lines can reduce the coupling effects more effectively than introducing extra control lines, because the coupling capacitance is inversely proportional to the net spacing. Temporal redundancy is an alternative that uses extra clock cycles to transfer control signals. We assume that the input stream is transmitted in burst mode, which enables us to accommodate temporal redundancy [21].

Fig. 8. The CBI scheme versus the BI scheme. [(a) Transmission without inversion; (b) transmission with inversion. Lines $l_0$ to $l_3$ and $l_{inv}$ are annotated with the charges $C_s V_{dd}$, $C_x V_{dd}$, and $2 C_x V_{dd}$.]

Example 5.1: Fig. 8 illustrates the difference between the CBI scheme and the BI scheme. Suppose we have a 4-bit bus and one additional control line $l_{inv}$ to notify the receiver of the inversion.
Given an input vector transition from 0001 to 0010, the self capacitance $C_s$ of $l_2$ is charged up, which corresponds to $(C_s \cdot V_{dd})$, when the data is transmitted without inversion as shown in Fig. 8(a). The coupling capacitance between $l_1$ and $l_2$ is also charged up, which corresponds to $(C_x \cdot V_{dd})$, since this is a type I transition. The type II transition between $l_2$ and $l_3$ accounts for $(2 C_x \cdot V_{dd})$. Meanwhile, data inversion leads to a vector transition from 0001 to 1101 along with control line switching, as shown in Fig. 8(b). There are three self capacitances to be charged up, and one coupling capacitance experiences a type I transition. In the conventional BI scheme, only the self capacitance is considered in determining inversion. The amount of charge shift for the raw data transfer is $(C_s \cdot V_{dd})$ as shown in Fig. 8(a). On the other hand, data inversion leads to a greater charge shift on the self capacitance, namely $(3 C_s \cdot V_{dd})$. Thus, the BI scheme sends data without inversion as shown in Fig. 8(a), because it is restricted to the charge shifts in self capacitance. However, the CBI scheme makes a different decision because the charge shift due to coupling capacitance is also taken into account. Raw data transfer as in Fig. 8(a) consumes $(C_s + 3 C_x) V_{dd}$, while inverted data transfer as in Fig. 8(b) consumes $(3 C_s + C_x) V_{dd}$. Considering that the ratio $C_x / C_s$ is significant (namely, four) in deep submicron technology, data inversion is preferable to raw data transfer in terms of power saving. Thus the CBI scheme inverts the data as shown in Fig. 8(b).

The following theorem gives the coupled transition activity for an independent data source, where each bit switches independently with transitional probability $p$, when the coupling-driven bus-invert algorithm is applied.

Theorem 5.1: Let $\{B_i\}$ be an $N$-tuple of 0-1 valued random variables such that $B_i = \langle b_0, b_1, \cdots, b_{N-1} \rangle$, in which a bit $b_k$ ($0 \le k \le N - 1$) switches independently of other bits with transitional probability $p_k = p$ ($0 \le k \le N - 1$). Suppose $\{B_i\}$ are encoded with the coupling-driven bus-invert algorithm with one-cycle redundancy. Then, the coupled transition activity $Z$ per bus cycle is given by

$$Z = \frac{N - 1}{2} - \frac{N \cdot (2N - 3)}{2^N \cdot (N - 2)} \binom{N-2}{N/2} \tag{19}$$

if transitions on $\{B_i\}$ follow the binomial distribution.

Proof: The coupling-driven bus-invert method flips the input signal when the number of couplings in an input vector is greater than that of the inverted input. The number of couplings for each number of switching bits is shown in Fig. 9(b).
If the number of switching bits $m$ is greater than $N/2$, then data inversion is favorable, which results in $(N - m)$-bit switching. Since the occurrence probability of $m$-bit switching and that of $(N - m)$-bit switching are the same, as shown in Fig. 9(a), the average number of couplings in $(N - m)$-bit switching doubles by flipping the input stream, as illustrated by the “CBI encoded data” columns in Fig. 9(b). Meanwhile, if $m$ is $N/2$, then we need to consider the transitions for each case. There is no gain from inverting signals when at least one of the border lines (line 0 or line $N - 1$) is switching, because transitions in a border bus line have only half the coupling effect compared to the transitions in inner bus lines asymptotically. However, if all the switching takes place in inner bus lines, signal inversion reduces the coupled transition activity by $\frac{1}{2^N} \binom{N-2}{N/2}$. Therefore,

$$Z = \frac{1}{2^N} \left[ 2 (N - 1) \sum_{m=1}^{N/2 - 1} \binom{N-1}{m-1} + (N - 1) \binom{N-1}{N/2 - 1} - \binom{N-2}{N/2} \right] = \frac{N - 1}{2} - \frac{N \cdot (2N - 3)}{2^N \cdot (N - 2)} \binom{N-2}{N/2} \tag{20}$$

Fig. 9. (a) Occurrence probability; (b) number of couplings. [Both panels are plotted against the number of switching bits, from 0 to 8; panel (b) compares raw data with CBI encoded data.]
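To make the inversion rule concrete, the following sketch replays Example 5.1 in software. It is only an illustration under stated assumptions: the function names are hypothetical, the control line $l_{inv}$ is modeled as an extra line next to $l_3$ (spatial redundancy), and the decision simply compares coupling units of the raw and inverted words rather than modeling the coupling encoders and majority voter of Fig. 7.

```python
# Sketch of the coupling-driven inversion decision (Example 5.1).
# Assumptions: names are illustrative; l_inv sits next to l3; the rule
# "invert if the inverted word causes fewer coupling units" follows the text.

def coupling_units(prev, nxt):
    """Coupling charge over all adjacent line pairs, in units of Cx * Vdd."""
    units = 0
    for k in range(len(prev) - 1):
        before = (prev[k], prev[k + 1])
        after = (nxt[k], nxt[k + 1])
        if after[0] == after[1]:
            continue              # type III/IV or discharge: no charge drawn
        if before[0] == before[1]:
            units += 1            # type I (codeword 01)
        elif before != after:
            units += 2            # type II (codeword 11)
    return units

def self_units(prev, nxt):
    """Number of low-to-high transitions, in units of Cs * Vdd."""
    return sum(1 for a, b in zip(prev, nxt) if (a, b) == (0, 1))

def cbi_encode(prev_bus, data):
    """Return data bits plus inv bit, choosing the word with fewer coupling units."""
    raw = list(data) + [0]
    inverted = [1 - b for b in data] + [1]
    if coupling_units(prev_bus, inverted) < coupling_units(prev_bus, raw):
        return inverted
    return raw

prev_bus = [0, 0, 0, 1, 0]            # l0..l3 = 0001, l_inv = 0
data = [0, 0, 1, 0]                   # next input vector 0010
raw_word = data + [0]
inv_word = [1, 1, 0, 1, 1]            # inverted data 1101 with l_inv = 1
print(self_units(prev_bus, raw_word), coupling_units(prev_bus, raw_word))
print(self_units(prev_bus, inv_word), coupling_units(prev_bus, inv_word))
print(cbi_encode(prev_bus, data))
```

The two printed pairs correspond to the charges (Cs + 3Cx)Vdd for the raw transfer and (3Cs + Cx)Vdd for the inverted transfer discussed in Example 5.1.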
Example 5.2: The average number of couplings per bus cycle for an 8-bit bus is evaluated to be 2.484, which is close to the lower limit $Z_{min} = 2.144$ with one-cycle redundancy as computed in Example 4.1. According to Equation (16), the average number of couplings per bus cycle for randomly distributed, independent data is 3.5, since $p = 0.5$ and $N = 8$. Hence, the percentage reduction in couplings over raw data transmission when we use the coupling-driven bus-invert method is 29%, which is calculated as $\frac{3.5 - 2.484}{3.5} \times 100$. The coupling-driven bus-invert scheme can employ multiple cycles to notify the receiver whether the transmitted data are inverted or not. Multiple extra cycles for control signals imply that the granularity of the unit codeword to be processed becomes finer, hence fewer couplings are likely to occur on the bus. However, this benefit comes at the cost of extra clock cycles.
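The figures quoted in Example 5.2 can be reproduced by evaluating Theorem 5.1 directly. The sketch below assumes the summation form of Equation (20) reads Z = 2^{-N} [2(N−1) Σ_{m=1}^{N/2−1} C(N−1, m−1) + (N−1) C(N−1, N/2−1) − C(N−2, N/2)] and the closed form of Equation (19) reads Z = (N−1)/2 − N(2N−3) C(N−2, N/2) / (2^N (N−2)); function names are illustrative.

```python
from math import comb

def cbi_activity_sum(n):
    """Summation form of Equation (20), for even n."""
    half = n // 2
    head = 2 * (n - 1) * sum(comb(n - 1, m - 1) for m in range(1, half))
    middle = (n - 1) * comb(n - 1, half - 1) - comb(n - 2, half)
    return (head + middle) / 2 ** n

def cbi_activity_closed(n):
    """Closed form of Equation (19), for even n greater than 2."""
    return (n - 1) / 2 - n * (2 * n - 3) * comb(n - 2, n // 2) / (2 ** n * (n - 2))

n = 8
z_cbi = cbi_activity_sum(n)
z_raw = (n - 1) * 0.5                 # Equation (16) with p = 0.5
reduction = 100 * (z_raw - z_cbi) / z_raw
print(z_cbi, cbi_activity_closed(n))
print(reduction)
```

Both forms agree for N = 8, and the computed reduction is consistent with the 29% stated in the example.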
6
Experimental Results
The proposed encoding scheme was implemented, and power consumption was measured using HSPICE. All the parasitics were extracted from the physical layout in 0.18µm technology.
TABLE I
Percentage reduction in transition activity compared to raw data transmission using the bus-invert method (BI) and the coupling-driven bus-invert method (CBI) with one-cycle redundancy for 8-bit bus.

Data    BI [2]                  CBI
        XP      SP      TP      XP      SP      TP
V1      22.2    25.5    22.7    30.1    16.9    28.2
V2      24.6    26.2    24.8    31.7    18.8    30.0
V3      23.4    25.1    23.6    31.2    17.1    29.4
A1      26.8    25.8    26.7    33.7    19.5    32.0
A2      26.6    26.7    26.6    34.0    20.4    32.3
A3      27.0    25.5    26.8    34.2    19.5    32.4
P1      24.2    23.8    24.2    32.4    16.9    30.5
P2      23.3    21.7    23.1    30.6    16.0    28.8
P3      21.4    19.0    21.1    29.0    13.2    27.0
Avg.    24.3    24.2    24.3    31.8    17.5    30.0
The applied data streams consist of MPEG video files (V1, V2 and V3 in Table I), MP3 audio files (A1, A2 and A3) and PDF format files (P1, P2 and P3). Table I compares the proposed coupling-driven bus-invert (CBI) method with the conventional bus-invert (BI) method. All the values in Table I are given as the percentage reduction compared to raw data transmission without any encoder/decoder. To compare the CBI scheme with the BI scheme fairly, we use one extra cycle to transfer control signals per eight data cycles, so transmission of eight bytes takes nine cycles. We implemented the majority voter by using full-adder circuitry as in [2]. The reason that we compare the CBI scheme with the BI method is that the BI method does not require probabilistic information in advance. The results under the column SP present the percentage reduction in self transition activity, which is computed by (SP(X) − SP(X′))/SP(X) × 100, where SP(X) denotes the number of self transitions for raw input streams and SP(X′) denotes the number of self transitions for encoded data. The results under the columns XP and TP present the percentage reduction in coupled transition activity and total transition activity, respectively. The total transition activity accounts for both self transition activity and coupled transition activity with a capacitance ratio η = 4. The CBI scheme yields better results than the BI method in terms of coupled transition activity, while the BI scheme shows better reduction in self transition activity. Because of the significant capacitance ratio η, the total savings in transition activity in the CBI scheme are greater than in the BI scheme. The CBI results show 31.8% average coupling reduction, which is in accordance with our theoretical bound of 29% in Example 5.2. The deviation comes from the locality and correlations in input data streams, which are assumed to be negligible in the theoretical analysis.

TABLE II
Power consumed by interconnects P(I) in µW and codec circuitry P(E) in µW under various capacitance values in pF.

Cx      Cs      Raw     CBI                     % Red
                P(I)    P(I)    P(E)    P(T)
1.0     0.25    111     66      19      86      22.5
3.0     0.75    319     191     20      210     34.2
5.0     1.25    526     315     21      336     36.1

Table II presents the power consumed by both the codec circuitry and the bus lines for the CBI scheme. Using 0.18µm technology, power is measured for random data streams using HSPICE with a 1.6V power supply and the capacitance ratio η = 4, which is a realistic value [14]. The results under column P(I) correspond to the power in µW consumed by bus lines, and the results under column P(E) correspond to the power overhead in µW due to encoder/decoder circuits. In raw data transmission, the total power consumption corresponds to P(I). On the other hand, in our encoding scheme, the total power consumption under column P(T) is the sum of the interconnect power P(I) and the encoder/decoder power P(E). It should be pointed out that within a realistic range of coupling capacitance, the power overhead P(E) due to the codec is relatively small. The percentage power savings obtained by using the coupling-driven bus-invert encoding scheme are shown in the column “% Red”.
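As a sanity check on Table II, the sketch below recomputes the “% Red” column from the tabulated powers, assuming (our reading of the table) that the reduction is measured as the CBI total P(T) against the raw P(I). The variable and function names are ours. The tabulated P(T) differs from P(I) + P(E) by at most 1 µW, presumably because of rounding.

```python
# Rows of Table II: (Cx [pF], Cs [pF], raw P(I), CBI P(I), CBI P(E), CBI P(T)); power in uW.
TABLE_II = [
    (1.0, 0.25, 111, 66, 19, 86),
    (3.0, 0.75, 319, 191, 20, 210),
    (5.0, 1.25, 526, 315, 21, 336),
]


def percent_reduction(raw_power: float, encoded_total: float) -> float:
    """Percentage saving of the encoded bus (including codec) over the raw bus."""
    return 100.0 * (raw_power - encoded_total) / raw_power


for cx, cs, raw_pi, cbi_pi, cbi_pe, cbi_pt in TABLE_II:
    reduction = percent_reduction(raw_pi, cbi_pt)
    codec_share = 100.0 * cbi_pe / cbi_pt  # codec overhead as a share of total CBI power
    print(f"Cx={cx} pF: reduction={reduction:.1f}%, codec share of P(T)={codec_share:.1f}%")
```

The codec share of the total falls as Cx grows, which is consistent with the observation that the overhead P(E) is relatively small for realistic coupling capacitances.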
7
Conclusion
Tightly coupled on-chip buses in a system-on-a-chip impose new requirements for interconnect power reduction and signal integrity. We obtained the lower and upper bounds on coupling between adjacent interconnects for randomly distributed, independent data streams using information theory. A novel bus encoding scheme was proposed for reducing the power consumed by on-chip buses by decreasing coupled switching. The coupling-driven bus-invert scheme reduces power consumption by about 30% with one-cycle redundancy. HSPICE simulation results indicate that the portion of power consumed by the encoder/decoder logic block for the scheme is fairly small in state-of-the-art technology. Therefore, the overhead due to encoder/decoder circuitry is compensated by significant interconnect power savings.
References
[1] J. Cong, “An interconnect-centric design flow for nanometer technologies,” in Int. Symp. VLSI Technology, Systems, and Applications, June 1999, pp. 54–57.
[2] M. R. Stan and W. P. Burleson, “Bus-invert coding for low-power I/O,” IEEE Transactions on VLSI Systems, vol. 3, no. 1, pp. 49–58, Mar. 1995.
[3] L. Benini, A. Macii, E. Macii, M. Poncino, and R. Scarsi, “Synthesis of low-overhead interfaces for power-efficient communication over wide buses,” in Proc. ACM/IEEE Design Automation Conf., 1999, pp. 128–133.
[4] P. R. Panda and N. D. Dutt, “Reducing address bus transitions for low power memory mapping,” in Proc. European Design and Test Conf., Mar. 1996, pp. 63–37.
[5] H. Mehta, R. M. Owens, and M. J. Irwin, “Some issues in Gray code addressing,” in Proc. the Great Lakes Symp. VLSI, Mar. 1996, pp. 178–180.
[6] C. L. Su, C. Y. Tsui, and A. M. Despain, “Saving power in the control path of embedded processors,” IEEE Design and Test of Computers, vol. 11, no. 4, pp. 24–30, 1994.
[7] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, “Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems,” in Proc. the Great Lakes Symp. VLSI, 1997, pp. 77–82.
[8] E. Musoll, T. Lang, and J. Cortadella, “Working-zone encoding for reducing the energy in microprocessor address buses,” IEEE Transactions on VLSI Systems, vol. 6, no. 4, pp. 568–572, Dec. 1998.
[9] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, “Information-theoretic bounds on average signal transition activity,” IEEE Transactions on VLSI Systems, vol. 7, no. 3, pp. 359–368, Sept. 1999.
[10] R. Hegde and N. R. Shanbhag, “Energy-efficiency in presence of deep submicron noise,” in Proc. IEEE/ACM Int. Conf. Computer Aided Design, 1998, pp. 228–234.
[11] Y. Zhang, W. Ye, and M. J. Irwin, “An alternative architecture for on-chip global interconnect: Segmented bus power modeling,” in Asilomar Conf. on Signals, Systems, and Computers, 1998, pp. 1062–1065.
[12] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. De Man, “Global communication and memory optimizing transformations for low power signal processing systems,” in VLSI Signal Processing VII, 1994, pp. 178–187.
[13] W. Fornaciari, D. Sciuto, and C. Silvano, “Power estimation for architectural exploration of HW/SW communication on system-level buses,” in Int. Workshop on Hardware/Software Codesign, 1999, pp. 152–156.
[14] Semiconductor Industry Association, “International technology roadmap for semiconductors,” 1999.
[15] P.-P. Sotiriadis and A. Chandrakasan, “Bus energy minimization by transition pattern coding (TPC) in deep submicron technologies,” in Proc. IEEE/ACM Int. Conf. Computer Aided Design, 2000, pp. 322–327.
[16] J. Cong and L. He, “Theory and algorithm of local-refinement-based optimization with application to device and interconnect sizing,” IEEE Transactions on Computer-Aided Design, vol. 18, no. 4, pp. 406–420, Apr. 1999.
[17] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York, NY: John Wiley & Sons, 1991.
[18] R. Togneri, Information Theory and Coding, University of Western Australia, 1999.
[19] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, “A coding framework for low-power address and data buses,” IEEE Transactions on VLSI Systems, vol. 7, no. 2, pp. 212–221, June 1999.
[20] S. Landman and R. L. Russo, “On a pin versus block relationship for partitions of logic paths,” IEEE Transactions on Computers, vol. C-20, pp. 1469–1479, 1971.
[21] M. R. Stan and W. P. Burleson, “Two-dimensional codes for low-power,” in Proc. Int. Symp. Low Power Electronics and Design, Aug. 1996, pp. 335–340.
Response to Reviewers’ Comments
We appreciate the comments and efforts of the associate editor and the reviewers on this paper. We agree with most of the review comments and have answered them and modified the contents accordingly. The different fonts are used as follows. Bold face: the original comments from the reviews; plain: our answers to the review comments; italic face: the modified/added contents in the paper.
Answers to Reviewer 1
C 1.1: The authors should be commended on preparing a well-written and edited manuscript. That being said, there are still several minor typographical/grammatical errors that could be cleaned up with a thorough proof-reading of the manuscript. A 1.1: We fully agree with this comment and have revised the manuscript accordingly.
C 1.2: What is meant by “cycle redundancy”? This term is used very extensively throughout the paper but is never adequately defined. This is particularly troubling as a naive interpretation can lead to questions regarding the ideas being presented here. This needs to be clarified in the introductory part of the paper. A 1.2: We agree with your comment and added the following sentences to the paper. (Page 9) Two types of redundancy, namely, spatial redundancy and temporal redundancy, can be used to mitigate transition activities. Spatial redundancy reduces the density of transitions by introducing additional bus lines. For instance, the bus-invert algorithm can be implemented by adopting spatial redundancy, wherein one extra bus line is used to notify a receiver of the phase of the transferred data. Temporal redundancy, on the other hand, uses extra cycles to make the transition density sparse. In a bus-invert scheme based on temporal redundancy, the inversion of the following data bits is determined by an additional control bit transferred ahead of the data bits.
C 1.3: The information theoretical discussion may be out of the comfort zone for the average TVLSI reader. The ideas and terms are defined only using equations and definitions that will give the reader little, if any, intuition into the ideas trying to be expressed. A better introduction would be nice.
A 1.3: According to your comments, we clarified the definition and added comments. (Page 5) We define an information source, called an ensemble X, as a collection of discrete symbols, x, each representing an indication, an event or an indivisible and definable quantity or operation that the source transmits. Each symbol in an ensemble is associated with a probability mass function p(x) = Pr{X = x}, x ∈ X [17]. Thus the information source can transmit any one of the different symbols with a given probability. The information content of each symbol is quantified in such a way that the length of a binary codeword grows with the logarithm of the inverse of the probability of occurrence, because the more unexpected (namely, the more surprising) an event is, the more information it conveys [18]. The quantity of a symbol is represented by self-information such that

$$I(x) = \log_2 \frac{1}{p(x)} \ \text{bits}. \qquad (21)$$

The self-information can be readily derived in the special case of equi-probable events. Suppose that there are a total of q symbols in the alphabet; then the number of bits, N, needed to represent all q symbols satisfies q = 2^N. With equi-probable symbols we have p(x_i) = 1/q, so the number of bits to represent the symbol (quantity of information content) is N = I(x) = log_2 (1/p(x)) bits. The definition of self-information can be extended to quantify the information of the source by considering all the symbols in the alphabet. The entropy, namely, average information, of an ensemble X is defined by

$$H(X) = -\sum_{x \in X} p(x) \cdot \log_2 p(x). \qquad (22)$$

The entropy is expressed in bits with the convention that for p(x) = 0, 0 · log (1/0) = 0. The entropy measures the information content or uncertainty of the information source X. Equation (9) originates from the entropy of statistical thermodynamics, hence the term entropy. As an extension to a single information source, the joint entropy is derived for multiple different source alphabets that operate jointly or concurrently. The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with a joint distribution p(x, y) is defined as

$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \cdot \log p(x, y). \qquad (23)$$

The joint entropy H(X, Y) represents the average bits of information per joint pair of symbols, (x_i, y_i). The conditional entropy H(Y|X) of a pair of discrete random variables (X, Y) with a conditional distribution p(y|x) is defined as

$$H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \cdot \log p(y|x). \qquad (24)$$

A chain rule holds for the entropy, given by H(X, Y) = H(X) + H(Y|X). As a natural extension of the entropy to a Markov source, the entropy rate of a stochastic process {X_i} is defined by

$$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \cdots, X_n) \qquad (25)$$

when the limit exists.
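To give some intuition for these definitions, the sketch below evaluates the self-information, entropy, joint entropy and conditional entropy for a small joint distribution and checks the chain rule numerically. The distribution and all function names are hypothetical, chosen by us purely for illustration, and base-2 logarithms are assumed throughout so that all quantities are in bits.

```python
from math import log2


def self_information(p: float) -> float:
    """I(x) = log2(1 / p(x)) in bits."""
    return log2(1.0 / p)


def entropy(pmf) -> float:
    """H(X) = -sum p(x) log2 p(x), with the convention 0 * log(1/0) = 0."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def marginal_x(joint: dict) -> dict:
    """Marginal p(x) from a joint distribution given as {(x, y): p(x, y)}."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px


def conditional_entropy(joint: dict) -> float:
    """H(Y|X) = -sum p(x, y) log2 p(y|x), where p(y|x) = p(x, y) / p(x)."""
    px = marginal_x(joint)
    return -sum(p * log2(p / px[x]) for (x, _), p in joint.items() if p > 0)


# Hypothetical joint distribution over two binary symbols.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

h_x = entropy(marginal_x(joint).values())
h_xy = entropy(joint.values())
h_y_given_x = conditional_entropy(joint)

print(f"I(x) for one of 8 equi-probable symbols: {self_information(1 / 8)} bits")
print(f"H(X) = {h_x:.4f}, H(Y|X) = {h_y_given_x:.4f}, H(X,Y) = {h_xy:.4f}")
```

With eight equi-probable symbols the self-information is 3 bits, matching q = 2^N with N = 3, and the printed values satisfy H(X, Y) = H(X) + H(Y|X).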
C 1.4: The abstract implies that several encoding schemes will be described but there appears to be only one (CBI). This should be clarified. A 1.4: We agree with this comment and clarified the abstract. (Page 1) Coupling effects between on-chip interconnects must be addressed in ultra deep submicron VLSI and system-on-a-chip (SoC) designs. We obtain the lower and upper bounds on coupling effects for randomly distributed, independent data streams based on the information theory. Novel low-power bus encoding scheme is proposed to minimize coupled switchings which dominate the on-chip bus power consumption. The coupling-driven bus invert method employs encoder and decoder that are geared to minimize the coupled transition activities. Experimental results indicate that our encoding method saves effective switchings as much as 30% in an 8-bit bus with one-cycle redundancy.
C 1.5: Define the wire aspect ratio (its meaning should be obvious but nonetheless it should be defined in terms of length and width). A 1.5: We added the definition of the aspect ratio to avoid possible confusion. (Page 2) For example, the wire aspect ratio (namely, wire thickness/width) is expected to be over 2.4.
C 1.6: The model derived in section 2 is a bit confusing. You seem to have ignored the effect of spacing on self-capacitance as the fringing contribution will be a function of the spacing due to its distribution between self- and coupling-capacitance components. A 1.6: Thanks to this comment, we made the notation clearer as follows. (Page 2) One can model the physical capacitance of a wire with width w, length l, overlapping segment length y, and edge-to-edge distance d such that

$$C_{tot} = (C_a \cdot w + 2C_f) \cdot l + C_x \cdot \frac{y \cdot d_{min}}{d} \qquad (26)$$

where Ca is the unit area capacitance (F/cm2) to the substrate, Cf is the unit length fringe capacitance (F/cm) and Cx is the unit length coupling capacitance (F/cm) at the minimum spacing d_min. To be precise, Cf and Cx are not constant due to complex fringing effects [16], but for our purpose the model is adequate. The area and fringe capacitances are nearly independent of the distance d, and their sum is referred to as the self-capacitance Cs. Thus, we have a wire capacitance model as shown in Fig. 2.
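The capacitance model of Equation (26) can be written as a small helper, shown below as a sketch only. The function name and the numerical values in the usage lines are hypothetical placeholders chosen by us, not extracted technology data; the split into a self part and a coupling part follows the description above.

```python
def wire_capacitance(w, l, y, d, d_min, c_a, c_f, c_x):
    """Return (C_self, C_coupling) of Equation (26) for one wire.

    w, l, y, d, d_min are lengths in cm; c_a is in F/cm^2; c_f and c_x are in F/cm.
    """
    c_self = (c_a * w + 2.0 * c_f) * l   # area plus fringe part, independent of d
    c_coupling = c_x * y * d_min / d     # scales inversely with the spacing d
    return c_self, c_coupling


# Hypothetical numbers, for illustration only.
params = dict(w=1.0, l=2.0, y=2.0, d_min=0.5, c_a=1.0, c_f=0.25, c_x=1.0)
cs_min, cx_min = wire_capacitance(d=0.5, **params)
cs_wide, cx_wide = wire_capacitance(d=1.0, **params)
print(f"at d_min: Cs={cs_min}, Cx={cx_min}; at 2*d_min: Cs={cs_wide}, Cx={cx_wide}")
```

Doubling the spacing halves the coupling term while leaving the self term unchanged, which is the property the model is meant to capture.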
C 1.7: The initial description of Type II transitions should mention that they are larger than k1 by a factor of (k2/k1), not k2. A 1.7: We rectified the sentence as the comment pointed out. (Page 3) The effective capacitance will be larger than k1 by a factor of (k2/k1), the value of which is usually two.
C 1.8: The initial definition of the transition encoding needs to be improved: the symbols of “0” and “1” are doing the encoding of the transitions and not the other way around. A 1.8: Transition encoding was originally introduced to explain the generic architecture of the encoder/decoder framework, but it is not specifically adopted in our implementation. So, to improve the readability of the paper, we omitted the transition encoding.
C 1.9: The definition of delta-prime(m) needs to be improved, a figure would definitely help with the definition (and probably shorten the text). A 1.9: As advised in this comment, we simplified the definitions and made the paper consistent. (Page 7) As a basis, Fig. 5 shows the asymptotic coupling charge variation when one bit out of N bit lines switches. There are two types of transitions in view of the coupling charge variation. On one hand, when a bit line l_i (1 ≤ i ≤ N − 2) that is located between two lines switches, as shown in Fig. 5(a), either charging or discharging takes place in two coupling capacitances, namely Cx(i−1,i) and Cx(i,i+1). Given a uniform distribution of switching line selections, the probability that the switching line is one of these inner lines is (N − 2)/N. Because two coupling capacitances are associated with this type of transition, 2(N − 2)/N coupling charge variation takes place on average. On the other hand, if the switching bit line (l_0 or l_{N−1}) is neighbored by only one bus line, then there is only one coupling capacitance whose charge is affected by the switching, as shown in Fig. 5(b). Since there are two border bus lines, the probability of such events is 2/N. Thus, the asymptotic coupling charge variation for this case is 2/N. To sum up, the asymptotic coupling charge variation for one bit switching, δ(1), is 2(N − 1)/N.
C 1.10: Is there an error in equation 16? When I try to re-derive it using your methods I get a different answer (Z = (N − 1)p + (N − 1)p^N). A 1.10: We believe there is no error in the original derivation. As an example, let us suppose that the probability is p = 1/2 and N = 2. Then $Z = \sum_{m=1}^{N} \frac{\delta(m)}{2} \cdot \binom{N}{m} \cdot p^m \cdot (1-p)^{N-m}$ becomes 1/2 (corresponding to our derivation (N − 1) · p), while the reviewer’s expression Z = (N − 1)p + (N − 1)p^N gives 3/4.
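The check in A 1.10 can be repeated for other values of N and p. The sketch below assumes, as in the definition of Z used in the response to Reviewer 2, that Z sums δ(m)/2 weighted by the binomial probability of an m-bit transition, with δ(m) = 2m(N − 1)/N. The function names are ours, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def delta(m: int, n: int) -> Fraction:
    """Asymptotic coupling charge variation, delta(m) = 2m(N - 1)/N."""
    return Fraction(2 * m * (n - 1), n)


def coupled_activity(n: int, p: Fraction) -> Fraction:
    """Z = sum over m of delta(m)/2 * C(N, m) * p^m * (1 - p)^(N - m)."""
    return sum(
        delta(m, n) / 2 * comb(n, m) * p ** m * (1 - p) ** (n - m)
        for m in range(1, n + 1)
    )


def reviewer_expression(n: int, p: Fraction) -> Fraction:
    """The alternative expression suggested in C 1.10."""
    return (n - 1) * p + (n - 1) * p ** n


half = Fraction(1, 2)
print(coupled_activity(2, half), reviewer_expression(2, half))
print(coupled_activity(8, half), (8 - 1) * half)
```

For N = 2 and p = 1/2 the sum evaluates to 1/2 and the alternative expression to 3/4, and for N = 8 the sum equals (N − 1)p = 7/2.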
C 1.11: What do the in-line equations given in Section 5.1 for yi(n) and xi(n) mean? Where does the nomenclature that is being used in their expression come from? Is it a typo? A 1.11: We corrected the notation as follows. (Page 10) In general, an original input signal $x_i(n)$ on the i-th bus line is encoded to $y_i(n)$ at cycle n based on the encoding function given by $y_i(n) = E(x_i(n), \hat{x}_i(n), x_{i-1}(n), \hat{x}_{i-1}(n), x_{i+1}(n), \hat{x}_{i+1}(n))$, where $x_{i-1}(n)$ and $x_{i+1}(n)$ are the signals on the bus lines adjacent to $x_i(n)$.
C 1.12: Why are 2(N-1) bits sufficient to represent the whole set of couplings in a N-bit bus per bus cycle? A 1.12: It is because there are (N − 1) coupling capacitances and each capacitance can experience zero, one or two units of coupling if the coupling intensity is assumed to be digitized. (Page 11) Thus, 2(N − 1) bits are sufficient to represent the whole set of couplings in an N-bit bus per bus cycle, because there are (N − 1) coupling capacitances and each capacitance can experience zero, one or two units of coupling if the coupling intensity is assumed to be digitized.
C 1.13: How does the notion of using a resistor string and voltage comparator fit into a low-power framework? A 1.13: Actually, we implemented the majority voter using full-adder circuitry, the results of which are shown in Table II. As you pointed out, it may not be appropriate to use a resistor string and voltage comparator for low-power applications; thus, we omitted that sentence.
C 1.14: What is the setup of your experimental results? Were parasitics extracted on the logic encoders? I would assume that they were but this is never stated in the text. A 1.14: Our experimental setup is specified as advised. (Page 14) The methods proposed were implemented and power consumptions were measured using HSPICE. All the parasitics were extracted from the physical layout in 0.18µm technology.
C 1.15: What is the eta ratio for the data in Table 1? A 1.15: In this paper, we assumed that the capacitance ratio η is four. (Page 5) We assume that the capacitance ratio η is four in 0.18 µm technology in this work.
Answers to Reviewer 2
C 2.1: The analysis is many times pedantic and formal when a simpler explanation may suffice, while the opposite is also true many times, many terms are used without explanation, etc. Here is an example: in Lemma 4.1 the terms “interactions” and “couplings” are used without definition and the formula that “couplings” are 1/2 of “interactions” is used. I couldn’t follow what this means and why it is obvious. A 2.1: Thanks to the reviewer’s comment, we clarified and simplified our theorem and definitions as follows. (Page 7) Definition: The asymptotic coupling charge variation δ(m) is the cycle-averaged charge variation in the coupling capacitances caused by transitions in m bit lines out of N bits. The asymptotic coupling charge variation is represented as multiples of k1. In order to derive bounds on the coupled transition activity, we will employ Lemmas 4.1 and 4.2 below. Lemma 4.1 provides the asymptotic coupling charge variation when m out of N bits switch during a cycle. Lemma 4.2 gives the coupled transition activity when each bit switches, independently of the other bits, with the same transitional probability p. Theorem 4.2 bounds the coupled transition activity based on Theorem 4.1 and Lemma 4.2 for an independent data source. Lemma 4.1: Let {B_i}, 0 ≤ i ≤ N − 1, be N-bit data items. If m bits out of the N bits switch, the asymptotic coupling charge variation δ(m) is given by

$$\delta(m) = 2m \cdot \frac{N-1}{N} \qquad (27)$$
Proof: As a basis, Fig. 5 shows the asymptotic coupling charge variation when one bit out of N bit lines switches. There are two types of transitions in view of the coupling charge variation. On one hand, when a bit line l_i (1 ≤ i ≤ N − 2) that is located between two lines switches, as shown in Fig. 5(a), either charging or discharging takes place in two coupling capacitances, namely Cx(i−1,i) and Cx(i,i+1). Given a uniform distribution of switching line selections, the probability that the switching line is one of these inner lines is (N − 2)/N. Because two coupling capacitances are associated with this type of transition, 2(N − 2)/N coupling charge variation takes place on average. On the other hand, if the switching bit line (l_0 or l_{N−1}) is neighbored by only one bus line, then there is only one coupling capacitance whose charge is affected by the switching, as shown in Fig. 5(b). Since there are two border bus lines, the probability of such events is 2/N. Thus, the asymptotic coupling charge variation for this case is 2/N. To sum up, the asymptotic coupling charge variation for one bit switching, δ(1), is 2(N − 1)/N. Now we can generalize the number of switching bits to m, such that 0 ≤ m ≤ N. The number of combinations for choosing m bits out of N bits is $\binom{N}{m}$. When m bits switch, 2m coupling capacitances are involved in charge redistribution unless signal transitions occur on the border bus lines or the switching lines are adjacent. Considering the cases in which the border bus lines switch and in which adjacent lines switch, we have the asymptotic coupling charge variation δ(m) such that

$$\delta(m) = \frac{2m \cdot \binom{N}{m} - 2\binom{N-2}{m-1} - 2\binom{N-2}{m-2}}{\binom{N}{m}} = \frac{2(N-1)}{\binom{N}{m}} \binom{N-1}{m-1} = \frac{2m \cdot (N-1)}{N} \qquad (28)$$

In this work, the dynamic power consumption is our concern; thus, charging up the coupling capacitance has to be accounted for in the power consumption, while a discharging event, such as a 01 → 00 transition, is not considered. Thus, only half of the asymptotic coupling charge variation, namely the part due to charging up the coupling capacitance, is considered in computing the coupled transition activity. Therefore, the coupled transition activity Z is represented by $Z = \sum_{m=1}^{N} \frac{\delta(m)}{2} \cdot q(m)$, where q(m) denotes the probability of an m-bit transition out of N bits.
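The closed form of Lemma 4.1 can be checked by brute force. The sketch below is based on our reading of the proof: for every choice of m switching lines, each switching line contributes the number of coupling capacitances attached to it (one for a border line, two for an inner line), and the result is averaged over all choices. The function names are ours.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def attached_capacitances(line: int, n: int) -> int:
    """Coupling capacitances attached to a line: 1 for border lines, 2 for inner lines."""
    return 1 if line in (0, n - 1) else 2


def delta_enumerated(m: int, n: int) -> Fraction:
    """Average, over all m-subsets of switching lines, of the attached capacitances."""
    total = sum(
        attached_capacitances(line, n)
        for subset in combinations(range(n), m)
        for line in subset
    )
    return Fraction(total, comb(n, m))


def delta_closed(m: int, n: int) -> Fraction:
    """Closed form of Lemma 4.1: delta(m) = 2m(N - 1)/N."""
    return Fraction(2 * m * (n - 1), n)


n = 8
for m in range(n + 1):
    print(m, delta_enumerated(m, n), delta_closed(m, n))
```

For an 8-bit bus the enumerated average agrees with 2m(N − 1)/N for every m from 0 to 8.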
C 2.2: There seems to be a lot of confusion between the authors about several important decisions: Fig. 5 seems to suggest that “correlators” and “decorrelators” are used for transition signaling but the coupling encoder in fig. 6 makes me think that transition signaling should not be used. Which is it? If you *are* using transition signalling then I don’t think fig. 6 works, for example if you go from 00 → 11 before transition signaling, then after signaling this could mean that you go from 00 → 11 (OK), or it could mean that you go from 01 → 10 (bad). So which is it, are you using transition signaling or not? A 2.2: The concept of transition encoding and corresponding correlator/decorrelator is intended to introduce the generic architecture of bus encoding schemes. However, as you pointed out, we do not employ transition encoding in practice. To avoid possible ambiguities, we clarified the figure and eliminated the paragraph describing the transition encoding.
C 2.3: There are statements in the paper that only “charging” capacitances are counted as transitions, not discharging, because you are only interested in power consumption. While there is nothing wrong in principle with this, for coupling capacitances it may not make sense. For example in equation (6) you count 11 → 10 as a “coupling transition” although it does not require any current from Vdd, but you don’t count 01 → 11 although it does require current from Vdd. Even more important it seems that your Coupling encoder in figure 6 actually counts *all* transitions! So again, which is it, are you counting all transitions or not? A 2.3: We consider only up-charging transitions with respect to the coupling capacitance. (Page 4) We consider only charging up the coupling capacitance in computing dynamic power dissipation. Suppose that two adjacent lines have a signal transition from 11 to 10. In the initial state of 11, both signals are high, thus there is no potential across the coupling capacitance (∆V = 0). So, the coupling capacitance contains no charge when the signals start to switch. At the end of the transition, the signals have switched to 10. One signal is high, while the other signal is low. Such a potential difference across the coupling capacitance (∆V = Vdd) charges up the coupling capacitance. Thus, the signal switching from 11 to 10 is accounted for in the dynamic power consumption. On the other hand, a discharging transition for the coupling capacitance is not taken into account for dynamic power consumption. One example is the transition from 10 to 11. The initial potential across the coupling capacitance is Vdd, which decreases to 0 at the end of the transition. In other words, the coupling capacitance is discharged when such input vectors are applied. Because the power supply is used as the reference for power consumption, we do not consider discharging events in the dynamic power computation.
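The counting rule described in A 2.3 can be tabulated for all sixteen two-line transitions. The sketch below encodes our reading of the rule: a coupling capacitance is charged only when the final potential difference across it is non-zero, a type I charging event counts one unit, and a polarity reversal (type II) counts two units, following the assumption k2/k1 = 2 and k3 = k4 = 0 stated in A 2.7. The function name is ours.

```python
from itertools import product


def coupling_units(before, after) -> int:
    """Charging units (multiples of Cx * Vdd) for one coupling capacitance.

    before and after are (a, b) logic levels of two adjacent lines.
    """
    dv_before = before[0] - before[1]
    dv_after = after[0] - after[1]
    if dv_after == 0:
        return 0  # capacitance ends uncharged: discharging or same-direction switching
    if dv_before == 0:
        return 1  # type I: charged from 0 to Vdd
    if dv_before == dv_after:
        return 0  # no transition across the capacitance
    return 2      # type II: polarity reversal


states = list(product((0, 1), repeat=2))
for before, after in product(states, repeat=2):
    label = "".join(map(str, before)) + " -> " + "".join(map(str, after))
    print(label, coupling_units(before, after))
```

Under this rule, 11 → 10 counts as one unit while 10 → 11 counts as zero, in line with the two examples given in A 2.3.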
C 2.4: Fig. 1 is wrong, the direction for the bus lines should not be alternating left/right. A 2.4: We rectified the figure as advised.
C 2.5: Many sections can be easily removed:
• equation (1)
• section 3 (this has been published multiple times in all the authors’ papers)
• figure 4.
• in general the analysis part can be easily trimmed down.
A 2.5: We appreciate these comments and trimmed some parts of the paper. However, after reading Reviewer 1’s comment, we think that the introductory part on information theory is still necessary for this paper, so that part has been clarified instead.
C 2.6: the paper is hard to read at times because of odd expressions:
• abstract: slim (?) logic blocks
• page 1: noises
• page 3: k2 the value of which is usually two.
• page 5: one in which where
• page 7: trasition
• page 10: swithcing
• page 12: cotrol
• page 12: multiple sentences on this page are odd!!!
A 2.6: The sentences are clarified as pointed out in this comment.
C 2.7: In section 2 you should state that in the rest of the paper K2 = 2 and K3 = 0 A 2.7: We added the statement as follows. (Page 4) Throughout this paper, we assume k2/k1 = 2, and k3 = k4 = 0.
C 2.8: figure 6 has a few problems:
• the input order for the CE boxes in the coupling counter and the expanded CE box are different.
• the expanded coupling counter has inv in as input but the box in the Encoder does not.
• bits 0 and 7 are “treated” differently, I’m not sure I understand the difference between them. If it has to do with the inv. line then you probably need to also count “couplings” there.
A 2.8: Following the suggestion, we made the input signal order consistent across the blocks. As the reviewer commented, bits 0 and 7 are handled differently because they are border lines; in other words, each of them has just one adjacent line, so the coupling effects caused by bit line 0 or 7 have to be accounted for differently from those of bits 1 through 6.
C 2.9: Fig. 7 also has problems:
• from the figure 7 and from the text (switching bits > N/2) it seems that you invert when you have > 4 switching bits? This is exactly what the original bus invert does, so what is the difference? Or what is wrong?
• The worst case couplings are also taken care of by the regular bus invert. Actually I was trying to understand in which case the regular bus invert would invert and the coupling one would not, and vice-versa? I would like to see two such examples where the two “bus inverts” take opposite decisions to better understand the difference between them.
A 2.9: A new example figure and an explanation differentiating the CBI scheme from the conventional BI scheme are added in the revised paper as follows. (Page 12) Fig. 8 illustrates the difference between the CBI scheme and the BI scheme. Suppose we have a 4-bit bus and one additional control line l_inv to notify the inversion. Given an input vector transition from 0001 to 0010, the self capacitance Cs of l_2 is charged up, which corresponds to (Cs · Vdd), when the data is transmitted without inversion as shown in Fig. 8(a). The coupling capacitance between l_1 and l_2 is also charged up, which is associated with (Cx · Vdd) since this is a type I transition. The type II transition between l_2 and l_3 accounts for (2Cx · Vdd). Meanwhile, data inversion leads to a vector transition from 0001 to 1101 along with control line switching, as shown in Fig. 8(b). There are three self capacitances to be charged up, and one coupling capacitance experiences a type I transition. In the conventional BI scheme, only the self capacitance is considered in determining inversion. The amount of charge shift for the raw data transfer is (Cs · Vdd), as shown in Fig. 8(a). On the other hand, data inversion leads to more charge shift on the self capacitance, namely (3Cs · Vdd). Thus, the BI scheme sends data without inversion as shown in Fig. 8(a). However, the CBI scheme makes a different decision because the charge shift due to the coupling capacitance is also taken into account. Raw data transfer as in Fig. 8(a) consumes (Cs + 3Cx)Vdd, while inverted data transfer as in Fig. 8(b) consumes (3Cs + Cx)Vdd. Considering that the ratio Cx/Cs is significant (namely, four) in deep submicron technology, data inversion is favorable to raw data transfer in terms of power saving. Thus the CBI scheme inverts the data as shown in Fig. 8(b).
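The example in A 2.9 can be replayed in a few lines. This is only an illustrative sketch of the charge accounting in the text, not the encoder hardware: the leftmost bit is taken as l_0, only rising lines are counted for the self capacitance, the coupling units follow the type I / type II rule with k2/k1 = 2, and the control line l_inv is assumed (by us) to sit next to l_3. All names are ours.

```python
def coupling_units(a0, b0, a1, b1) -> int:
    """Charging units of the capacitance between two adjacent lines (a, b)."""
    dv0, dv1 = a0 - b0, a1 - b1
    if dv1 == 0:
        return 0          # capacitance ends uncharged
    if dv0 == 0:
        return 1          # type I charging
    return 0 if dv0 == dv1 else 2  # unchanged, or type II reversal


def charge_units(prev, nxt):
    """Return (self units of Cs*Vdd, coupling units of Cx*Vdd) for one bus cycle."""
    self_units = sum(1 for p, n in zip(prev, nxt) if (p, n) == (0, 1))
    coupling = sum(
        coupling_units(prev[i], prev[i + 1], nxt[i], nxt[i + 1])
        for i in range(len(prev) - 1)
    )
    return self_units, coupling


ETA = 4                              # capacitance ratio Cx / Cs
prev = [0, 0, 0, 1, 0]               # l0..l3 = 0001, l_inv = 0
data = [0, 0, 1, 0]                  # next data word 0010
raw = data + [0]                     # sent as is, l_inv stays 0
inverted = [1 - b for b in data] + [1]  # sent as 1101, l_inv rises

raw_self, raw_cpl = charge_units(prev, raw)
inv_self, inv_cpl = charge_units(prev, inverted)

bi_inverts = inv_self < raw_self                                   # self capacitance only
cbi_inverts = inv_self + ETA * inv_cpl < raw_self + ETA * raw_cpl  # self plus coupling
print(f"raw: {raw_self}Cs + {raw_cpl}Cx, inverted: {inv_self}Cs + {inv_cpl}Cx")
print(f"BI inverts: {bi_inverts}, CBI inverts: {cbi_inverts}")
```

The script reproduces the (Cs + 3Cx)Vdd and (3Cs + Cx)Vdd figures of the example, and the two schemes reach opposite decisions for η = 4.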
C 2.10: Experimental results:
• how did you implement the majority voter? Need more details.
• why in Table II, the P(E) column is first decreasing and then increasing (20, 19, 21)?
• fig. 8: how can the asymptotic reduction be 40% if theory says 29%? In the last sentence, I was completely confused by that statement.
A 2.10: As advised, we added a comment about the majority voter and fixed a typo in Table II. To improve the readability and avoid possible confusion, as the reviewer pointed out, Fig. 8 is omitted.
C 2.11: What is the overhead of this “coupling” bus invert compared to regular bus invert? The difference in transition activity savings between the two is not very much and if the overhead is much more for the coupling case it may not be worth the extra complexity. A 2.11: The overhead of the coupling-driven bus-invert scheme consists of the Coupling Encoder (CE) and a bigger majority voter (8-out-of-15 for the CBI scheme vs. 5-out-of-9 for the BI scheme), as shown in Fig. 7. As indicated in Table II, this overhead is relatively insignificant and is increasingly compensated as the capacitance ratio increases.