ILP Modelling of the Common Subexpression Sharing Problem
Oscar Gustafsson and Lars Wanhammar
Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, SWEDEN
E-mail: {oscarg, larsw}@isy.liu.se
ABSTRACT
Subexpression sharing is an important implementation issue when one data value is multiplied by many constants or when a sum of products is computed. By modelling the subexpression sharing problem using integer linear programming (ILP), an optimal solution can be found. Further, the model can be incorporated directly into the design of algorithms that have linear design constraints, e.g., linear-phase FIR filters. The proposed method is compared with previously reported algorithms. It produces results that are as good as or better than those of other subexpression sharing methods, although it does not yet match the optimal method based on graph representation. However, the possibility of expanding the ILP model beyond subexpression sharing is discussed. Such an extension would produce results identical to those of the optimal adder graph method.
1. INTRODUCTION
In many DSP algorithms one data value is multiplied by several constants. One typical example is the transposed form FIR filter, where one input value is multiplied by all the filter coefficients, as shown in Fig. 1. By expressing the multiplications using shifts, additions, and subtractions, a multiplierless realization is obtained that is efficient to implement. As additions and subtractions are similar operations, we consider them equal and use the term addition for both. The hardware cost can be further decreased by utilizing redundancy between the coefficients [1]–[7]. This is known as multiple constant multiplication (MCM).

The previous work in this area can be divided into two techniques. The first technique is based on pattern matching [3]–[7], and the result depends on the initial representation of the constants. These algorithms are often referred to as subexpression sharing or subexpression elimination. The second technique is independent of the representation and only considers the values after each addition [1], [2]. The algorithm presented in [2] utilizes a graph representation and is shown to be optimal in terms of the minimum number of additions.

Most of the algorithms based on pattern matching can be applied to subexpression sharing applications other than multiplication, such as Hadamard matrix evaluation. In this paper only the MCM aspects are covered. However, a similar approach can be used for other subexpression sharing areas. Note that by transposing the signal flow graph a sum-of-products network is obtained, i.e., the sum-of-products problem, e.g. for a direct form FIR filter, can be solved in the same way.

Fig. 1. Transposed form FIR filter.

In this work we formulate the subexpression sharing problem as an integer linear programming (ILP) problem. Thus, we can incorporate the number of additions required into the design process of systems that can be described using linear constraints, e.g. FIR filters [8], to obtain an optimal solution. This work initially belongs to the first group of algorithms, but we also discuss how to extend the problem formulation to the extra degrees of freedom obtained by not using pattern matching. In [7] an ILP problem was also formulated, but only subexpressions with two non-zero bits were used. Further, no discussion on how to improve the results was presented.
2. SUBEXPRESSION SHARING
If a constant can be expressed using F non-zero bits, F–1 additions are required to realize a multiplication by that constant. However, if the same bit pattern (or subexpression) is repeated at multiple locations or across coefficients, the number of additions can be decreased by computing the subexpressions once and using them to realize the multiplication. If the constant is then expressed using a total of G subexpressions and remaining non-zero bits, G–1 additions are required. Additional additions are required to form the subexpressions. Thus, we want to state an ILP problem that minimizes the number of subexpression terms used plus the number of additions used to form the subexpressions.

Clearly, the fewer non-zero bits a constant can be represented by, the fewer additions are required. By using canonic signed digit (CSD) representation, on average 33% of the bits are non-zero, compared with 50% using binary representation. In CSD a B-bit number is represented as

$$ c = \sum_{i=0}^{B-1} a_i 2^i \tag{1} $$

where $a_i \in \{-1, 0, 1\}$ and $a_i a_{i+1} = 0$, $i \in [0, B-2]$, i.e., no two adjacent bits can be non-zero [9].
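The CSD definition in (1) can be made concrete with a short conversion routine. The following is a minimal Python sketch using the standard non-adjacent-form recoding; the function name `to_csd` is hypothetical and not taken from the paper.

```python
def to_csd(n):
    """Return the CSD digits of a positive integer n, most significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n:
        if n & 1:
            # Choose +1 when n mod 4 == 1 and -1 when n mod 4 == 3, so that
            # the next digit is guaranteed to be zero.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1]


if __name__ == "__main__":
    for value in (3, 13, 25):
        print(value, to_csd(value))
```

For the three constants used in the example that follows, the routine gives two, three, and three non-zero digits, respectively.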
Consider a simple problem with three constants, 3 = $10\bar{1}$, 13 = $10\bar{1}01$, and 25 = $10\bar{1}001$, where $\bar{1}$ denotes –1. A direct realization would require five additions, as shown in Fig. 2(a). However, by computing 3 = $10\bar{1}$ first, this subexpression can be used to realize the other constants, so that only three additions are needed in total, as shown in Fig. 2(b). The goal of common subexpression sharing can be formulated as first identifying patterns that occur multiple times in the coefficient set, then removing these patterns and calculating them only once. By writing the problem as an ILP problem we can find an optimal solution, as opposed to earlier work where heuristic algorithms were proposed. It should be noted, however, that there is work not based on subexpression sharing where better results are obtained [2].
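The addition counts in this example can be checked with plain shift-and-add arithmetic. The sketch below is illustrative only; the function names `direct` and `shared` are hypothetical, and the structures are one reading of the two realizations described above.

```python
def direct(x):
    """Each product is built separately from its CSD digits."""
    y3 = (x << 2) - x                # 3x  = 4x - x          (1 addition)
    y13 = (x << 4) - (x << 2) + x    # 13x = 16x - 4x + x    (2 additions)
    y25 = (x << 5) - (x << 3) + x    # 25x = 32x - 8x + x    (2 additions)
    return (y3, y13, y25), 5


def shared(x):
    """The subexpression 3x is computed once and reused."""
    y3 = (x << 2) - x                # 3x  = 4x - x          (1 addition)
    y13 = (y3 << 2) + x              # 13x = 4 * 3x + x      (1 addition)
    y25 = (y3 << 3) + x              # 25x = 8 * 3x + x      (1 addition)
    return (y3, y13, y25), 3


if __name__ == "__main__":
    print(direct(1))
    print(shared(1))
```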
Fig. 2. Multiplication with 3, 13, and 25 without (a) and with (b) subexpression sharing.

3. ILP MODELLING
For a subexpression consisting of two non-zero bits, one addition is required. If a subexpression with C non-zero bits is always formed from a subexpression with C–1 non-zero bits, each introduced subexpression requires one new addition. A binary (0/1) variable, $e_q$, is introduced for each possible subexpression. If $e_q$ is one, we are allowed to use the corresponding subexpression determined by q. For a B-bit subexpression with C non-zero bits, q denotes an index consisting of 2(C–1) variables,

$$ q = (b_1, s_2, b_2, \ldots, s_{C-1}, b_{C-1}, s_C) \tag{2} $$

where $s_i \in \{-1, 1\}$, $b_j > b_{j+1} + 1$, $b_1 < B$, and $b_{C-1} \ge 2$. The index q corresponds to a one at bit $b_1$, $s_i$ bits at positions $b_i$ for $i \in [2, C-1]$, and an $s_C$ bit at the LSB. The weight of a subexpression is

$$ v_q = 2^{b_1} + s_2 2^{b_2} + \ldots + s_{C-1} 2^{b_{C-1}} + s_C \tag{3} $$

For example, q = (4, –1, 2, 1) corresponds to $10\bar{1}01$ and $v_q = 13$. We define a set $Q_{b,c}$ which contains all subexpressions with c non-zero bits using at most b bits. The number of used subexpressions is then

$$ K_s = \sum_{c=2}^{C} \sum_{q \in Q_{B,c}} e_q \tag{4} $$

For $K_s$ to correspond to the number of additions required to form the subexpressions, each subexpression must be derived from one of the lower order subexpressions. For example, the subexpression 100101 can be derived using one addition from the subexpressions 101, 1001, or 100001. Each lower order subexpression can be found by removing one pair of $s_i$ and $b_i$ terms in q for $i \in [2, C-1]$. When $b_1$ is removed, $s_2$ should also be removed and all remaining $s_i$ should be multiplied by $s_2$. Finally, when $s_C$ is removed, $b_{C-1}$ should also be removed and $b_{C-1}$ should be subtracted from all remaining $b_i$. For a subexpression q with C non-zero bits we define a set $L_q \subset Q_{B,C-1}$ which consists of the lower order terms that can form the subexpression. The relationship can be modelled as

$$ \sum_{q_l \in L_q} e_{q_l} \ge e_q \tag{5} $$
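The index q, its weight $v_q$, and the set $L_q$ of lower order subexpressions can be illustrated in a few lines of Python. This is a sketch under stated assumptions: q is stored as a tuple laid out as in (2), the helper names `to_digits`, `from_digits`, `weight`, and `lower_order` are hypothetical, and the three removal rules are implemented uniformly as "drop one non-zero digit, then renormalize so that the leading digit is +1 and the lowest digit sits at position 0".

```python
def to_digits(q):
    """Expand q = (b1, s2, b2, ..., s_{C-1}, b_{C-1}, s_C) to [(position, sign)]."""
    b1, rest = q[0], q[1:]
    digits = [(b1, 1)]                       # leading digit is always +1
    for i in range(0, len(rest) - 1, 2):     # middle entries are (s_i, b_i) pairs
        digits.append((rest[i + 1], rest[i]))
    digits.append((0, rest[-1]))             # s_C sits at the LSB
    return digits


def from_digits(digits):
    """Renormalize a digit list (at least two digits) and pack it as an index q."""
    shift = digits[-1][0]                    # move the lowest digit to position 0
    lead = digits[0][1]                      # make the leading digit +1
    norm = [(p - shift, s * lead) for p, s in digits]
    q = [norm[0][0]]
    for p, s in norm[1:-1]:
        q += [s, p]
    q.append(norm[-1][1])
    return tuple(q)


def weight(q):
    """Weight v_q of the subexpression, as in (3)."""
    return sum(s * 2 ** p for p, s in to_digits(q))


def lower_order(q):
    """Set L_q of subexpressions with one non-zero digit fewer (needs C >= 3)."""
    digits = to_digits(q)
    if len(digits) < 3:
        return set()
    return {from_digits(digits[:i] + digits[i + 1:]) for i in range(len(digits))}


if __name__ == "__main__":
    print(weight((4, -1, 2, 1)))
    print(sorted(lower_order((5, 1, 2, 1))))
```

For q = (5, 1, 2, 1), i.e. 100101, the lower order set corresponds to 101, 1001, and 100001, in agreement with the example in the text.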
This constraint states that for $e_q$ to be one, at least one of the $e_{q_l}$ must be one, i.e., for the subexpression q to be used, one of the lower order subexpressions $q_l$ must be used.

To use a subexpression, q, in a constant, a binary variable $a_{n,s,t,q}$ is defined, where n is the constant number, $1 \le n \le N$, $s \in \{-1, 1\}$ is the sign, t is the position, and q is the subexpression index. The weight of the subexpression is now $w_{s,t,q} = s 2^t v_q$. If B bits are used, t is limited as $t \le B - b_1$, where $b_1$ is the index from q. All possible subexpressions with at most c non-zero bits spanning over b bits for a constant n form a set $A_{n,b,c}$. This set is also defined for c = 1, where $q = \emptyset$ and $v_\emptyset = 1$, so that single bits are also included in the set. The value of the b-bit constant, $c_n$, can now be written as

$$ c_n = \sum_{(s,t,q) \in A_{n,b}} a_{n,s,t,q} \, w_{s,t,q} \tag{6} $$
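As a small worked instance of (6), take the constants 13 and 25 from Section 2 and assume that the two-digit subexpression $q = (2, -1)$, i.e. $10\bar{1}$ with $v_q = 2^2 - 1 = 3$, is available, and that a single bit at position t with sign s has weight $s 2^t$. One valid choice of non-zero variables (an illustration, not necessarily the assignment a solver would return) is then

$$
\begin{aligned}
13 &= w_{1,2,q} + w_{1,0,\emptyset} = 2^2 \cdot 3 + 2^0 = 12 + 1, \\
25 &= w_{1,3,q} + w_{1,0,\emptyset} = 2^3 \cdot 3 + 2^0 = 24 + 1.
\end{aligned}
$$

Each of the two constants is formed from two terms, so each needs a single addition once the subexpression itself has been computed.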
i.e., the sum of all subexpressions used, $a_{n,s,t,q}$, times their corresponding weights, $w_{s,t,q}$. The total number of subexpression terms used to form the N constants is

$$ K_c = \sum_{n=1}^{N} \sum_{c=1}^{C} \sum_{(s,t,q) \in A_{n,b}} a_{n,s,t,q} \tag{7} $$
To use a subexpression q in a constant, the corresponding variable $e_q$ must be one. We define a set $D_{b,q}$ that contains all possible combinations (s, t) of s and t for a b-bit subexpression using q. The corresponding constraint can be written as

$$ a_{n,s,t,q} \le e_q, \quad (s, t) \in D_{b,q}, \quad n \in [1, N] \tag{8} $$

for each subexpression q. Now, we can state an ILP problem as

$$ \text{minimize } K_c + K_s \quad \text{subject to (5), (6), and (8)} \tag{9} $$
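To make the objective $K_c + K_s$ tangible, the following Python sketch solves a heavily restricted version of the problem by exhaustive search rather than with an ILP solver. It is an illustration under explicit assumptions: only two-digit subexpressions are considered (C = 2, so (5) is not needed), the constants are fixed to their CSD digits (pure pattern matching), and all function names are hypothetical. A subexpression is identified by the distance between its two digits and their relative sign.

```python
from itertools import combinations


def csd_digits(n):
    """Return {position: digit} for the CSD form of a positive integer n."""
    digits, pos = {}, 0
    while n:
        if n & 1:
            d = 2 - (n & 3)          # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            digits[pos] = d
            n -= d
        n >>= 1
        pos += 1
    return digits


def pattern(digits, hi, lo):
    """Two-digit subexpression as (distance, relative sign), leading digit +1."""
    return (hi - lo, digits[hi] * digits[lo])


def min_terms(digits, enabled):
    """Fewest terms (subexpression uses plus single bits) that cover all digits."""
    positions = sorted(digits)
    if not positions:
        return 0
    first, rest = positions[0], positions[1:]
    # Option 1: the lowest digit is used as a single bit.
    best = 1 + min_terms({p: digits[p] for p in rest}, enabled)
    # Option 2: pair the lowest digit with another digit via an enabled pattern.
    for other in rest:
        if pattern(digits, other, first) in enabled:
            left = {p: digits[p] for p in rest if p != other}
            best = min(best, 1 + min_terms(left, enabled))
    return best


def solve(constants):
    """Exhaustively minimize K_c + K_s; returns (objective, K_c, K_s, enabled)."""
    all_digits = [csd_digits(c) for c in constants]
    candidates = sorted({pattern(d, hi, lo)
                         for d in all_digits
                         for lo, hi in combinations(sorted(d), 2)})
    best = None
    for k in range(len(candidates) + 1):
        for enabled in combinations(candidates, k):
            k_c = sum(min_terms(d, set(enabled)) for d in all_digits)
            if best is None or k_c + k in (range(best[0])):
                best = (k_c + k, k_c, k, enabled)
    return best


if __name__ == "__main__":
    objective, k_c, k_s, enabled = solve([3, 13, 25])
    print(objective, k_c, k_s, enabled)
    print("additions:", k_c - 3 + k_s)
```

Since a constant formed from G terms needs G–1 additions, the number of additions in this restricted setting is $K_c - N + K_s$. For the constants 3, 13, and 25 the search enables only the pattern with distance 2 and relative sign –1, i.e. $10\bar{1}$, which matches the three-addition realization of Fig. 2(b). The search grows exponentially with the number of candidate patterns and is only meant for toy inputs.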
This problem can be solved using, e.g., branch-and-bound techniques. The solution time will, however, be long due to the many variables and many degrees of freedom. To decrease the execution time, the CSD property that no two adjacent bits can be non-zero at the same time is used. We define a set $F_{n,p} \subset A_{n,b}$ such that $a_{n,s,t,q}$ belongs to $F_{n,p}$ if there is a non-zero bit at position p. An additional constraint

$$ \sum_{(s,t,q) \in F_{n,p}} a_{n,s,t,q} + \sum_{(s,t,q) \in F_{n,p+1}} a_{n,s,t,q} \le 1 \tag{10} $$
where $0 \le p \le B-2$ can be added to significantly decrease the search space. Further, when the constants are known, only variables where all non-zero bits of the subexpression have corresponding non-zero bits in the constants need to be considered. For example, considering the coefficient 10101, only the variables corresponding to 10000, 00100, 00001, 00101, 10001, 10100, and 10101 need to be included.

3.1. Further Improvements
Although it increases the solution time, the results can be improved by not restricting the constants to CSD representation. A simple example is the two coefficients 21 = 10101 and 7 = $0100\bar{1}$. Using pattern matching, no common subexpressions can be utilized and three additions are required, as shown in Fig. 3(a). However, if (10) is not included, a solution using two additions can be found using the subexpressions $100\bar{1}00$ and $00\bar{1}001$, i.e., 21 = 28 – 7, as shown in Fig. 3(b). For most large problems, however, the solution time will be very long.

In a similar way, the derivation of higher order subexpressions can be improved. For example, 10101 can be derived not only from 101 and 10001, as obtained from pattern matching, but also from $10\bar{1}$ and $100\bar{1}$ using one addition, as shown in Fig. 4. For more than three non-zero bits it is also possible to derive a subexpression with C non-zero bits using one subexpression with D non-zero bits and one with C – D non-zero bits. For the subexpressions 83 = $101010\bar{1}$ and 53 = $10\bar{1}0101$, five additions are needed using the original formulation including (10), as shown in Fig. 5(a). However, if a subexpression can be derived from two lower order subexpressions, only four additions are required, as shown in Fig. 5(b).
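The addition counts claimed for these two examples can be verified with shift-and-add arithmetic. The sketch below is illustrative: the function names are hypothetical, and the four-addition structure for 83 and 53 is one possible realization consistent with the text, which may differ in detail from the one drawn in Fig. 5(b).

```python
def seven_and_21(x):
    """7x and 21x with two additions, using 21 = 28 - 7."""
    y7 = (x << 3) - x          # 7x  = 8x - x
    y21 = (y7 << 2) - y7       # 21x = 4 * 7x - 7x
    return (y7, y21), 2


def eighty_three_and_53(x):
    """83x and 53x with four additions, each built from two 2-digit terms."""
    y5 = (x << 2) + x          # 5x  = 4x + x
    y3 = (x << 2) - x          # 3x  = 4x - x
    y83 = (y5 << 4) + y3       # 83x = 16 * 5x + 3x
    y53 = (y3 << 4) + y5       # 53x = 16 * 3x + 5x
    return (y83, y53), 4


if __name__ == "__main__":
    print(seven_and_21(1))
    print(eighty_three_and_53(1))
```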
Fig. 3. Obtained solutions for 7 and 21 using (a) original CSD solution and (b) without (10).

Fig. 4. Possible ways to derive 21 from lower order subexpressions.

Fig. 5. Possible ways to obtain subexpressions 53 and 83 using (a) original CSD solution and (b) proposed improvement.

4. RELATIONSHIP TO OTHER METHODS
The method proposed in [5] uses only two types of subexpressions. These are $101$ and $10\bar{1}$, which are shown to be the statistically most common subexpressions. By only including variables for these two subexpressions, the method in this work will produce identical results. As our method enables other subexpressions to be used, it will always produce results that are as good or better.

The work in [6] can be seen as an extension of the work in [3] and [4]. The proposed algorithm uses pattern matching and identifies all patterns with a given number of non-zero bits. The pattern with the highest frequency is selected and replaced. When no pattern with the given number of non-zero bits occurs more than once, the number of non-zero bits is decreased and the algorithm is started again. The algorithm can be seen as a greedy method of solving the problem stated in this work when (10) is used.

The work in [7] introduced an ILP model of subexpression sharing using subexpressions with at most two non-zero bits. The results produced will be identical to those in this work if only two non-zero bits are used for each subexpression. As more non-zero bits can be used in the method proposed in our work, the results will be as good or better. The authors also introduce a greedy algorithm which is identical to that in [6] when starting with two non-zero bits for the subexpressions.

The work presented in [2] can be seen as an extension of the work in [1]. It is shown to be optimal in terms of the number of additions. By introducing the new ways to derive higher order subexpressions described in Section 3.1, the method proposed in our work will produce identical results. However, this has not yet been implemented. All possible ways to derive higher order subexpressions have been discussed in [10], [11].
Table 1: Comparison of proposed method with recent results. [6] and [7] are based on subexpression sharing, while [2] is based on adder graphs.

                          Number of additions
Example                   This work   Pasko [6]   Yurdakul [7]   Dempster [2]
FIR filter 2 in [12]      23          23          23             22
FIR filter 3 in [12]      5           6           5              5
FIR filter in [13]        21          23          24             18
5. RESULTS
To evaluate the performance of the proposed method, a number of examples from the literature have been optimized and compared with the work in [2], [6], [7]. The results are shown in Table 1. For the algorithm in [6], either results reported in the paper or a local implementation of the algorithm is used. For the algorithm in [7], our proposed method with at most two non-zero bits for each subexpression is used, as there are no reproducible examples given in [7].

As expected, the proposed method performs as well as or better than the other methods based on subexpression sharing ([6] and [7]) on all three examples. Compared with the optimal method in [2], our method still requires up to three more additions on two of the three examples. This difference stems from the ways in which the constants and subexpressions are derived, as discussed in Section 3.1.
6. CONCLUSIONS
In this work an integer linear programming (ILP) model for subexpression sharing has been introduced. By using standard methods for ILP, e.g. branch and bound, the problem can be solved optimally. Further, for any algorithm that can be designed using linear constraints, e.g. linear-phase FIR filters [8], the models can be combined to obtain an optimal solution in terms of the minimum number of additions. Only subexpression sharing related to multiple constant multiplication was considered, but similar approaches can be used for other application areas, e.g. Hadamard matrices or CRC generation.

The proposed method was compared with recently introduced methods, and the results showed that our method performs as well as or better than previously proposed methods based on subexpression sharing. However, methods based on adder graphs may still produce realizations with fewer additions. How to modify the problem to include adder graphs was also discussed.
7. REFERENCES
[1] D. R. Bull and D. H. Horrocks, “Primitive operator digital filters,” IEE Proc. G, vol. 138, pp. 401–412, June 1991.
[2] A. G. Dempster and M. D. Macleod, “Use of minimum-adder multiplier blocks in FIR digital filters,” IEEE Trans. Circuits Syst. II, vol. 42, pp. 569–577, Sept. 1995.
[3] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, “Synthesis of multiplierless FIR filters with minimum number of additions,” Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Los Alamitos, CA, 1995, pp. 668–671.
[4] M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, “Multiple constant multiplication: Efficient and versatile framework and algorithms for exploring common subexpression elimination,” IEEE Trans. Computer-Aided Design, vol. 15, pp. 151–161, Feb. 1996.
[5] R. I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Trans. Circuits Syst. II, vol. 43, pp. 677–688, Oct. 1996.
[6] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, “A new algorithm for elimination of common subexpressions,” IEEE Trans. Computer-Aided Design, vol. 18, pp. 58–68, Jan. 1999.
[7] A. Yurdakul and G. Dündar, “Multiplierless realization of linear DSP transforms by using common two-term expressions,” J. VLSI Signal Processing, vol. 22, pp. 163–172, Sept. 1999.
[8] O. Gustafsson, H. Johansson, and L. Wanhammar, “An MILP approach for the design of linear-phase FIR filters with minimum number of signed-power-of-two terms,” European Conf. Circuit Theory Design, Espoo, Finland, Aug. 28–31, 2001.
[9] K. Hwang, Computer Arithmetic: Principle, Architecture, and Design, New York: Wiley, 1979.
[10] A. G. Dempster and M. D. Macleod, “Constant integer multiplication using minimum adders,” IEE Proc. Circuits Devices Syst., vol. 141, no. 6, pp. 407–413, Oct. 1994.
[11] O. Gustafsson, A. G. Dempster, and L. Wanhammar, “Extended results for minimum-adder constant integer multipliers,” IEEE Int. Symp. Circuits Syst., Phoenix, AZ, May 26–29, 2002.
[12] Y. C. Lim and S. R. Parker, “Discrete coefficient FIR digital filter design based upon an LMS criteria,” IEEE Trans. Circuits Syst., vol. 30, pp. 723–739, Oct. 1983.
[13] A. G. Dempster, S. S. Demirsoy, and I. Kale, “Designing multiplier blocks with low logic depth,” IEEE Int. Symp. Circuits Syst., Phoenix, AZ, May 26–29, 2002.