Global Node Reduction of Linear Systems Using Ratio Analysis
Michael Sheliga, Edwin Hsing-Mean Sha
Dept. of Computer Science & Engineering, University of Notre Dame, Notre Dame, IN 46556
Abstract
Linear systems are widely used in mathematics and engineering. Constructing a minimal directed acyclic graph (DAG) that corresponds to a given linear system is important in high-level synthesis. This problem is shown to be NP-complete in this paper. Ratio analysis, a novel multi-step algorithm for constructing a small sized DAG, is presented. Ratio analysis attempts to minimize the total number of nodes in a DAG by maximizing the sharing of nodes between equations. The first part of the algorithm considers the ratio of terms in different equations. The second part looks at the difference between these ratios and the final equations. The third part generates the final DAG. Results are shown that illustrate the efficiency of the algorithm as well as the savings that are possible from its application.

1 Introduction
As the demand for more and more complex and powerful algorithms leads to larger and larger VLSI and ULSI circuits, the need for better algorithms to reduce the number of circuits has also increased. High level synthesis is one area that has shown much promise for reducing the number of circuits needed for a given algorithm. One area of high level synthesis that has not received much attention is that of minimizing the number of nodes needed to produce a DAG for an arbitrary set of linear equations using only multiplications and additions.

Theoretical results for efficiently executing a DAG by using associativity and distributivity on the PRAM model were presented in [8, 11]. These algorithms perform transformations locally and are hard to apply to the problem of constructing a minimal sized DAG. The technique of retiming can effectively reduce the length of the critical path by overlapping iterations [3], but it does not change the topology of a given data flow graph. Retiming can, however, be applied after a small sized DAG has been generated by our algorithm. The algorithm proposed in [2] obtains hardware savings by decomposing the binary representation of numbers. By reducing the number of ones in the binary representation of a coefficient, the authors are able to reduce the complexity of dedicated multipliers. Our method does not consider the complexity of multipliers; rather, it attempts to minimize the number of nodes by reusing nodes in different equations. High level synthesis of linear systems has also been studied in [9, 7]; however, those results are concerned with critical path minimization and speed-up as opposed to node minimization. Nevertheless, the algorithms presented in [9] give us an upper bound on the number of nodes required for a given set of equations.

An algorithm to reduce the number of nodes for a given DAG is applicable in many situations including high level synthesis of VLSI circuits and compiler design [1, 5]. It is also possible to decompose a non-linear system into linear and non-linear parts. Unfortunately, the node minimization problem is NP-complete even for the simplified case of all coefficients being unity. Nevertheless, significant reductions in the number of nodes may be obtained using the multi-step algorithm that is presented. This algorithm is based on the ratio of terms in a given equation compared to the corresponding ratio in other equations. Section 2 presents the basic idea of the algorithm while section 3 illustrates the effectiveness of the algorithm using a well known benchmark. Finally, section 4 draws conclusions from the results obtained and discusses possible improvements.
2 The basic idea of ratio analysis
Constructing a minimal DAG for a set of linear equations is a difficult problem because it requires node sharing between equations. From examining the linear equations it is not easy to determine which nodes should be generated so that they can be shared by many equations. The problem is shown to be NP-complete by reducing the ensemble computation problem to it. The ensemble computation problem is shown to be NP-complete in [6]. We present the following theorem without proof.
Figure 1: An Example DAG Generated with the Simple Canonical Method (A) and Ratio Analysis (B), for OUT1 = 30A + 15C and OUT2 = 40A + 20C
[Figure 2 graphic, panels (A) and (B): OUT1 = 30A + 33B + 15C + 3F, OUT2 = 40(A + B) + 24C]
Theorem 2.1 The decision problem of producing a minimum DAG for a given linear system is NP-complete.

In [9] an algorithm is introduced for generating a DAG from a set of linear equations. The algorithm consists of multiplying each term in each equation by its coefficient, and then adding the results together in a tree structure. We refer to this method as the simple canonical method. The algorithm does not permit node sharing except for transforming terms such as 40A + 40B into 40(A + B), and then eliminating common subexpressions that result from this transformation. Therefore, while the critical path of the resulting DAG is minimal, the node count is not. An example of a DAG that was generated by this method is shown in figure 1(A). Our method, known as ratio analysis, attempts to maximize the number of shared nodes between equations as in figure 1(B). In figure 1 the DAG generated by the simple canonical method requires four multiplication nodes and two addition nodes, while the DAG generated by ratio analysis requires only two multiplication nodes and two addition nodes. This is a common result of ratio analysis: the number of addition nodes stays nearly the same while the number of multiplication nodes decreases.

Ratio analysis consists of a number of steps, some of which are NP-complete. Nevertheless the main algorithm may be divided into three parts. First, we determine which ratios between terms are common to different equations using ratio determination. Since most equations are not as simple as the ones in figure 1, other steps are required after ratio determination. In the next step of ratio analysis we use ratio adjustment to modify the ratios based on the equations that they are associated with. Figure 2 shows a second DAG that has been generated using both the simple canonical method and ratio analysis. In this step we adjust the ratios that were generated during ratio determination in order to generate the final equations. The DAG generated using the simple canonical method requires six multiplication nodes and five addition nodes while ratio analysis requires only three multiplication nodes and six addition nodes. If we assume a cost of one for adders and three for multipliers, the DAG generated using the simple canonical method costs 23, but the DAG generated using ratio analysis costs only 15. We complete the DAG construction using DAG completion. This final step uses each adjusted ratio to generate each equation it is associated with, generates partial equations which have been used but not yet generated, and eliminates excess nodes. These steps are shown in figure 3. The idea behind each of these steps is described below; a detailed explanation of the algorithm, which appears in the full version, is not given here due to space considerations. Also, for clarity of explanation we assume that each coefficient is a nonnegative integer.

Figure 2: An Example DAG that Requires Ratio Adjustment

Our Algorithm:
1. Determine Ratios
2. Adjust Ratios to Match Equations
3. Complete DAG
   3.1 Complete Equations
   3.2 Generate Partial Equations
   3.3 Eliminate Excess Nodes

Figure 3: The Basic Algorithm
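The node counts and costs quoted for figures 1 and 2 can be reproduced with a small sketch. The function names and the dictionary encoding of a linear system are illustrative assumptions, and the counting model is a simplification: it groups inputs that share a coefficient inside one equation, as the simple canonical method is described as doing, but it does not model common subexpressions shared between different equations.

```python
from collections import defaultdict


def canonical_counts(system):
    """Return (multiplications, additions) for a per-equation canonical DAG.

    `system` maps an output name to {input: coefficient}. Inputs that share a
    coefficient are summed first (40A + 40B -> 40(A + B)). Sharing of nodes
    between different equations is deliberately not modelled (assumption).
    """
    mults = adds = 0
    for coeffs in system.values():
        groups = defaultdict(list)
        for name, c in coeffs.items():
            if c != 0:
                groups[c].append(name)
        for c, names in groups.items():
            adds += len(names) - 1        # sum the inputs inside one group
            if c != 1:
                mults += 1                # one multiplier per distinct coefficient
        adds += max(len(groups) - 1, 0)   # adder tree over the groups
    return mults, adds


def weighted_cost(mults, adds, mult_cost=3, add_cost=1):
    """Total cost with the adder and multiplier weights used in the text."""
    return mult_cost * mults + add_cost * adds


figure1 = {"OUT1": {"A": 30, "C": 15}, "OUT2": {"A": 40, "C": 20}}
figure2 = {"OUT1": {"A": 30, "B": 33, "C": 15, "F": 3},
           "OUT2": {"A": 40, "B": 40, "C": 24}}

print(canonical_counts(figure1))   # (4, 2)
print(canonical_counts(figure2))   # (6, 5)
print(weighted_cost(6, 5))         # 23, canonical DAG of figure 2
print(weighted_cost(3, 6))         # 15, ratio analysis DAG of figure 2
```

The ratio analysis node counts (three multiplications and six additions) are taken from the text rather than computed, since producing them requires the full algorithm.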
Figure 4: A DAG Generated by Ratio Analysis, for OUT1 = 17B + 23C + 2F, OUT2 = 31A + 33B + 15C + 3F and OUT3 = 40(A + B) + 24C
2.1 Ratio determination
We formulate each of N linear equations consisting of M terms as follows.

$$O_n = \sum_{m=1}^{M} c_{mn} Y_m, \qquad 1 \le n \le N, \quad 1 \le m \le M, \quad c_{mn} \in \mathbb{N} \qquad (1)$$

where $\mathbb{N}$ is the set of nonnegative integers. Figure 4 shows an example DAG that has been generated by ratio analysis. This DAG corresponds to a set of three linear equations. These equations are also referred to as final equations to distinguish them from partial equations, which will be defined shortly. The terms of equation 1 are 17B, 23C, and 2F. The input terms associated with equation 1 are B, C, and F. An equation is less than another equation only if each term in the first equation is less than the corresponding term in the second equation.

The first step of the algorithm determines a common ratio between terms in an equation. These ratios are then merged with corresponding ratios from other equations. They often prove useful in generating DAGs with relatively few nodes because they can be used to produce nodes that may be used by many equations. In figure 4, equation 2 consists of approximately twice as much A and B as C. Therefore the approximate ratio of 2(A + B) to C is generated. This is normally written as 2(A + B):C. The actual ratio is 31A + 33B to 15C. Two is the numerator of the ratio while A and B are the members of the set of numerator terms. Likewise, one is the denominator of the ratio while C is the sole member of the set of denominator terms. The ratio also has a strength that indicates the likelihood that the approximate ratio will lead to a DAG with fewer nodes. Finally, the approximate ratio is associated with a set of equations, referred to as the equation list. Since equation 3 also contains approximately twice as much A and B as C, it is added to the equation list. Therefore, for this example the approximate ratio 2(A + B):C is associated with equation 2 and equation 3. When equation 3 was added to the approximate ratio its strength was increased.

The value of the strength is based upon how close the approximate ratio is to the actual ratio, the number of input terms associated with the ratio, and the number of equations that are associated with this ratio. If an approximate ratio is close to an actual ratio, it is likely that a DAG produced with it will require fewer nodes than a DAG produced with an approximate ratio that is not as close to the actual ratio. For example, fewer nodes are required to produce 20A + 10B than 21A + 11B using a ratio of 2A:1B. Likewise a DAG produced with a ratio that may be used by many equations, each with many terms, is likely to require fewer nodes than a DAG produced with a ratio associated with only a few terms and only a few equations.

Note that a 2(A + B) + C node¹ exists in the DAG. This node consists of the terms 2A, 2B and C. The node's left parent in figure 4 would be the node A + B while its right parent would be the node A + B + C. A multiplication node's left parent would also be another node, but its right parent would be a constant. The equation 2(A + B) + C is also referred to as a partial equation. A partial equation is any equation that does not represent a completed, or final, equation. An ungenerated partial equation corresponds to an ungenerated node. An ungenerated node is a node whose terms we know, but whose parents are unknown. For example, the equation 2(A + B) + C is an ungenerated partial equation when it is produced by ratio determination. Should its left parent be an A + B + C node and its right parent be an A + B node, or should its left parent be a 2(A + B) node and its right parent be C? Ungenerated partial equations are created during ratio determination and other parts of the algorithm.
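The vocabulary introduced above (numerator, numerator terms, denominator, denominator terms, equation list) can be pictured as a small record. The sketch below is only an illustration: the class name, field names and methods are assumptions, and the strength is left out because the text describes what it depends on but not how it is computed.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class ApproximateRatio:
    """Hypothetical container for an approximate ratio such as 2(A + B):C."""
    numerator: int
    numerator_terms: frozenset
    denominator: int
    denominator_terms: frozenset
    equation_list: list = field(default_factory=list)

    def ratio_equation(self):
        """The partial equation implied by the ratio, e.g. 2A + 2B + C."""
        eq = {t: self.numerator for t in sorted(self.numerator_terms)}
        eq.update({t: self.denominator for t in sorted(self.denominator_terms)})
        return eq

    def actual_ratios(self, equation):
        """Exact coefficient ratio of each numerator term to each denominator term."""
        return {
            (n, d): Fraction(equation[n], equation[d])
            for n in sorted(self.numerator_terms)
            for d in sorted(self.denominator_terms)
        }


# Final equations 2 and 3 of figure 4.
FINAL = {
    "OUT2": {"A": 31, "B": 33, "C": 15, "F": 3},
    "OUT3": {"A": 40, "B": 40, "C": 24},
}

ratio = ApproximateRatio(2, frozenset({"A", "B"}), 1, frozenset({"C"}))
for name in ("OUT2", "OUT3"):
    ratio.equation_list.append(name)
    # Every printed value is near 2, which is why 2(A + B):C is a candidate.
    print(name, {k: float(v) for k, v in ratio.actual_ratios(FINAL[name]).items()})

print(ratio.ratio_equation())   # {'A': 2, 'B': 2, 'C': 1}
```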
¹ For simplicity of explanation, "the node", "the 2(A + B) + C node" and "the node whose output is 2(A + B) + C" are used interchangeably.

2.2 Ratio adjustment
The next step of ratio analysis is adjusting the approximate ratio to more closely match the equations it is associated with. The process by which this is accomplished lacks an intuitive description, so it will be explained using two examples. Our first example uses a ratio that has only one equation associated with it. We will use equation 2 of figure 4: OUT2 = 31A + 33B + 15C + 3F. The ratio 2(A + B):1C is associated with this final equation. The ratio equation for this ratio is 2(A + B) + C. First we calculate the maximum constant by which the ratio equation may be multiplied such that the resulting equation is not greater than the final equation. For our example it may be verified that this value is 15. Multiplying the ratio equation by the maximum constant and subtracting the result from the final equation gives us 1A + 3B + 3F. This is referred to as the difference equation. Comparing the difference equation to the ratio equation tells us that we need a "little more" B and F and a little less A and C than the ratio equation supplies. Multiplication by 15 would generate 30A + 30B + 15C. We could then add in the terms 1A, 3B, and 3F to produce the final equation. Unfortunately, for the case where several equations are associated with the same ratio, this solution multiplies the ratio equation by a different amount for each equation that it is associated with. Therefore this solution is not useful in most cases.

In most cases we only allow adjustments by one after multiplication. That is, after we have multiplied the ratio equation by a constant, we are only allowed to add in terms such as 1D or 1G. Therefore we multiply by 5 in the above example. We may see that 5 is the correct value by noting that when 15 is the maximum multiplier, 3 is the maximum coefficient of the difference equation. After we have multiplied by 5 we will add 1B + 1F and then, during DAG completion, multiply by 3. This is equivalent to multiplying by 15 and adding 3B + 3F. The value 5 is referred to as the adjustment multiplier. Multiplication by 5 produces 10(A + B) + 5C. This equation is referred to as the partially adjusted ratio.

The above example illustrates the process for one equation. However, we must consider the process for more than one equation. First we complete the above process for all equations. If, after we have adjusted all equations, two or more adjusted ratios are the same, we set aside these ratios and repeat the ratio adjustment process for these equations once we have finished with all other equations. This step ensures that ratios that are very similar will be grouped together. Next we repeat the above process for all other equations up to the point where the adjustment multiplier is calculated. We then find the minimum of the adjustment multipliers and multiply the ratio equation by this amount.
This process is valuable since it generates an intermediate equation that requires only additions to adjust, as opposed to the more expensive multiplication operation. Next, for each final equation associated with the ratio, we calculate the terms to add to the partially adjusted ratio so that the resulting adjusted ratio is as close as possible to the ratio needed for the corresponding final equation, without being too large. Let us add the equation 73A + 77B + 36C to the above example. Note that the maximum multiplier for the new equation is 36, while the difference equation is 1A + 5B, and the adjustment multiplier is 7. Therefore the minimum adjustment multiplier of all equations is still 5. In order to generate 73A + 77B + 36C we will need to add B to the partially adjusted ratio and then, during DAG completion, multiply by seven and add in 3A + C.
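The quantities used in the two examples can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the rule for the adjustment multiplier (maximum multiplier divided by the largest coefficient of the difference equation, rounded down) is inferred from the two worked examples, since the text does not state it as a general formula. Term-wise floor division gives a maximum multiplier of 36 for 73A + 77B + 36C, which is the value consistent with the difference equation 1A + 5B.

```python
def max_multiplier(ratio_eq, final_eq):
    """Largest integer k such that k * ratio_eq is term-wise <= final_eq."""
    return min(final_eq.get(t, 0) // c for t, c in ratio_eq.items() if c > 0)


def difference(final_eq, ratio_eq, k):
    """final_eq - k * ratio_eq, dropping terms whose coefficient becomes zero."""
    diff = {}
    for t in sorted(set(final_eq) | set(ratio_eq)):
        value = final_eq.get(t, 0) - k * ratio_eq.get(t, 0)
        if value != 0:
            diff[t] = value
    return diff


def adjustment_multiplier(ratio_eq, final_eq):
    # ASSUMPTION (inferred from the worked examples, not stated as a rule):
    # floor(maximum multiplier / largest coefficient of the difference equation).
    k = max_multiplier(ratio_eq, final_eq)
    largest = max(difference(final_eq, ratio_eq, k).values(), default=1)
    return k // largest


ratio_eq = {"A": 2, "B": 2, "C": 1}             # 2(A + B) + C
out2 = {"A": 31, "B": 33, "C": 15, "F": 3}      # equation 2 of figure 4
extra = {"A": 73, "B": 77, "C": 36}             # equation added in the text

for eq in (out2, extra):
    k = max_multiplier(ratio_eq, eq)
    print(k, difference(eq, ratio_eq, k), adjustment_multiplier(ratio_eq, eq))
# 15 {'A': 1, 'B': 3, 'F': 3} 5
# 36 {'A': 1, 'B': 5} 7

# The smallest adjustment multiplier scales the ratio equation.
m = min(adjustment_multiplier(ratio_eq, eq) for eq in (out2, extra))
partially_adjusted = {t: m * c for t, c in ratio_eq.items()}
print(partially_adjusted)   # {'A': 10, 'B': 10, 'C': 5}
```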
2.3 DAG completion
DAG completion consists of three parts. The first part, known as equation completion, involves using each adjusted ratio to generate each equation that is associated with it. Equation completion first multiplies the adjusted ratio by the maximum amount possible, with the limitation that the resulting partial equation is not greater than the final equation. For example, in figure 4 we multiply the 10A + 11B + 5C + F node by three, producing 30A + 33B + 15C + 3F. Next we subtract this equation from the final equation, producing a difference equation of 1A. We then add each of the terms in the difference equation to the ungenerated partial equation list.

The second part of DAG completion, known as partial equation generation, generates the ungenerated partial equations. This is done one partial equation at a time. Each partial equation is generated using a recursive process. Each recursion in the process reduces the partial equation by transforming it to a simpler equation. The transformation consists of either subtracting a second equation from the partial equation being generated or dividing it by a constant. The transformation that produces the simplest resulting equation is chosen. Partial equation generation is then applied to the result of the transformation.

The third part of DAG completion, known as node reduction, eliminates redundant equations and replaces nodes that are not minimal. Due to the way ratios are generated, it is possible that the same final equation will be produced more than once. In order to eliminate the nodes associated with redundant final equations we must determine which final equation is the "smallest". We do this by calculating which final equation has the fewest nodes, while adjusting for the fact that some nodes are shared by more than one equation and that multiplication nodes are more difficult to generate than addition nodes. We also replace the nodes associated with a final equation by the nodes that would result from applying DAG completion to the original unadjusted ratio if this decreases the total node count.
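Equation completion as described above can be sketched in a few lines. The helper name and the dictionary representation are assumptions made for illustration; the sketch covers only the multiply-and-subtract step, not partial equation generation or node reduction.

```python
def complete_equation(adjusted_ratio, final_eq):
    """Return (multiplier, partial equation, leftover terms) for one final equation.

    The multiplier is the largest integer for which the scaled adjusted ratio
    stays term-wise not greater than the final equation.
    """
    k = min(final_eq.get(t, 0) // c for t, c in adjusted_ratio.items() if c > 0)
    partial = {t: k * c for t, c in adjusted_ratio.items()}
    leftover = {}
    for t in sorted(set(final_eq) | set(partial)):
        rest = final_eq.get(t, 0) - partial.get(t, 0)
        if rest:
            leftover[t] = rest      # goes to the ungenerated partial equation list
    return k, partial, leftover


# Adjusted ratio 10A + 11B + 5C + F against OUT2 = 31A + 33B + 15C + 3F.
k, partial, leftover = complete_equation(
    {"A": 10, "B": 11, "C": 5, "F": 1},
    {"A": 31, "B": 33, "C": 15, "F": 3},
)
print(k, partial, leftover)
# 3 {'A': 30, 'B': 33, 'C': 15, 'F': 3} {'A': 1}

# Adjusted ratio 10A + 11B + 5C against 73A + 77B + 36C (section 2.2).
print(complete_equation({"A": 10, "B": 11, "C": 5}, {"A": 73, "B": 77, "C": 36}))
# (7, {'A': 70, 'B': 77, 'C': 35}, {'A': 3, 'C': 1})
```

The leftover terms 1A and 3A + C match the amounts that the text says are added in during DAG completion.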
3 Experimental results
This section briefly presents the experimental results for a linear system based on a well known benchmark, the fifth order wave elliptical filter. A table of results for other benchmarks is also presented at the end of the section. Our algorithm is coded on a Sparc 10 and runs in only seconds for this example. Figure 5(A) shows the hand crafted fifth order wave elliptical filter with the weighting factors for multiplications a, b, c, d, e, f, g, h set to 2, 3, 4, 4, 6, 5, 2 and 2, respectively (these values were chosen arbitrarily). Figure 5(B) shows the DAG generated for these weighting factors using the ratio analysis program. The equations represented by figure 5(A) are

(1) O1 = 1A
(2) O2 = 125A + 126B + 112C + 56(D + G + H)
(3) O3 = 60(A + B) + 57C + 30(D + G + H) + 4E
(4) O4 = 7(A + B + C + G + H) + 6D
(5) O5 = 40(A + B) + 38C + 20(D + G + H) + 3E
(6) O6 = 30(A + B + C + D) + 3F + 48G + 50H
(7) O7 = 45(A + B + C + D) + 4F + 72G + 75H
(8) O8 = 72(A + B + C + D) + 120G + 131H
(9) O9 = 60(A + B + C + D) + 100G + 110H

Figure 5: (A) The fifth order wave elliptical filter. (B) The DAG generated by ratio analysis.
There are 9 multiplication nodes and 30 addition nodes in the DAG generated with ratio analysis, while the corresponding DAG generated using the simple canonical method requires 27 multiplications and 20 additions and the hand crafted DAG of figure 5(A) uses 8 multiplications and 26 additions. If we assume a weighting factor of three for multiplications and one for additions then W(ratio analysis) = 3(9) + 1(30) = 57 while W(canonical) = 3(27) + 1(20) = 101 and W(hand crafted) = 3(8) + 1(26) = 50. The critical path for the hand crafted DAG contains 11 adders and 3 multipliers while the critical path of the DAG generated with ratio analysis contains 13 adders and 1 multiplier. Using the same weighting factors as above we have CP(ratio analysis) = 3(1) + 1(13) = 16 while CP(hand crafted) = 3(3) + 1(11) = 20.

First, ratio analysis attempts to locate the distributable terms for the above equations, such as 40(A + B). The distributivity and common subexpression elimination process continues until all equations have been examined. The next step in ratio analysis is nontrivial ratio determination as explained in section 2.1. First the equations are examined for ratios that are approximately one to one. The first approximate one to one ratio found is in equation 2. After finding all approximate 1 to 1 ratios, the algorithm continues to look for ratios of the form $a/b$ with $1 \le a \le M_a$ and $a < b \le M_b$, where $M_a$ and $M_b$ are set to 3 and 15 in our experiments. For example, the approximate ratio 5 to 3 is used to generate 5(G + H):3(A + B + C + D) for equations 6, 7, 8 and 9. The next step of ratio analysis, ratio adjustment, is applied as explained in section 2.2. As an example, we will look at how ratio adjustment is applied to 5(G + H):3(A + B + C + D). First, for each equation associated with this ratio we determine the maximum multiplier. For equations 6, 7, 8 and 9 the maximum multipliers are 9, 14, 24 and 20 respectively. We use these to produce a set of difference equations.
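The maximum multipliers quoted for equations 6 to 9 can be reproduced by term-wise floor division against the ratio equation 3(A + B + C + D) + 5(G + H). The sketch below is illustrative; the helper names and the dictionary encoding are assumptions, and the difference equations it prints are computed from the benchmark equations listed earlier rather than taken from the paper.

```python
RATIO_EQ = {"A": 3, "B": 3, "C": 3, "D": 3, "G": 5, "H": 5}


def shared(coefficient, names="ABCD"):
    """Expand a term such as 30(A + B + C + D) into per-input coefficients."""
    return {n: coefficient for n in names}


EQUATIONS = {
    6: {**shared(30), "F": 3, "G": 48, "H": 50},
    7: {**shared(45), "F": 4, "G": 72, "H": 75},
    8: {**shared(72), "G": 120, "H": 131},
    9: {**shared(60), "G": 100, "H": 110},
}


def max_multiplier(ratio_eq, final_eq):
    """Largest integer k such that k * ratio_eq is term-wise <= final_eq."""
    return min(final_eq.get(t, 0) // c for t, c in ratio_eq.items() if c > 0)


def difference(final_eq, ratio_eq, k):
    """final_eq - k * ratio_eq with zero terms dropped."""
    diff = {}
    for t in sorted(set(final_eq) | set(ratio_eq)):
        value = final_eq.get(t, 0) - k * ratio_eq.get(t, 0)
        if value:
            diff[t] = value
    return diff


multipliers = {n: max_multiplier(RATIO_EQ, eq) for n, eq in EQUATIONS.items()}
print(multipliers)   # {6: 9, 7: 14, 8: 24, 9: 20}

for n, eq in EQUATIONS.items():
    print(n, difference(eq, RATIO_EQ, multipliers[n]))
# equation 8 leaves {'H': 11} and equation 9 leaves {'H': 10}
```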
These are then used to produce the appropriate adjusted ratios. Finally, the DAG construction is completed by multiplying each adjusted ratio by its maximum multiplier and adding in any missing terms. For example, the adjusted ratio for equation 8 is multiplied by 12 and 11H is added to the result, producing the final equation 72(A + B + C + D) + 120G + 131H.

Figure 6 shows the results for the fifth order wave elliptical filter, three other common filters, and two micro-masks used in image processing to extract texture characteristics [4]. The resultant DAGs generated for these filters can be found in the full version of this paper [10]. As noted previously, ratio analysis tends to decrease the number of multipliers while the number of adders remains the same or increases slightly. It may also be seen that the greatest savings are obtained for the largest filters while little, if any, savings are obtained for very small filters. In large filters a large amount of node sharing is possible, but in small filters almost no nodes may be shared.

Input Filter/System      Ratio Analysis              Canonical Method
                         AN   MN   Cost1   Cost2     AN   MN   Cost1   Cost2
Fifth Order Wave         30    9      57      66     20   27     101     128
All-Pole Lattice          6    3      15      18      3    4      15      19
4-Stage Lattice          28    3      37      40     14   14      56      70
2-Cascaded Biquad        17    8      41      49     14   12      50      62
LAW Micro Mask 3         11    3      20      23      8    8      32      40
LAW Micro Mask 1          7    0       7       7      6    6      24      30

Cost1: Total Cost if AC = 1, MC = 3
Cost2: Total Cost if AC = 1, MC = 4
AC: Addition Cost, MC: Multiplication Cost
AN: Addition Nodes, MN: Multiplication Nodes

Figure 6: Results for Several Well Known Linear Systems.

4 Conclusion
Linear systems are widely used in mathematics and engineering. Constructing a minimal DAG that corresponds to a given linear system is NP-complete. This paper presented an effective algorithm, ratio analysis, for constructing a small sized DAG. Ratio analysis attempts to minimize the total number of nodes in a DAG by maximizing the sharing of nodes between equations. It is particularly useful in situations where the cost of multiplication is greater than the cost of addition, since the algorithm tends to significantly decrease the number of multiplications. Results are shown that illustrate the savings which are possible with ratio analysis. Though the current experiments show satisfactory results, there are still many opportunities for improvement. The most crucial step in ratio analysis is ratio adjustment. Adding more "intelligence" to this step of the process would often help to reduce the final node count. Another interesting area for exploration is allowing ratios with more than two coefficients, such as 4A:3B:2C.

References
[1] A.V. Aho, R. Sethi and J.D. Ullman, "Compilers: Principles, Techniques and Tools". Addison-Wesley, Reading, MA, 1985.
[2] A. Chatterjee and R. Roy, "An Architectural Transformation Program for Optimization of Digital Systems by Multi-level Decomposition". Proc. 1993 ACM/IEEE Design Automation Conference, 1993, pp. 343-348.
[3] L.-F. Chao, A. LaPaugh, and E. H.-M. Sha, "Rotation Scheduling: A Loop Pipelining Algorithm". Proc. 1993 ACM/IEEE Design Automation Conference, Dallas, TX, June 1993, pp. 566-572.
[4] J.-L. Chen and A. Kundu, "Automatic Unsupervised Texture Segmentation Using Hidden Markov Model". Proc. 1993 International Conference on Acoustics, Speech and Signal Processing, Minneapolis, MN, April 1993, Vol. 5, pp. 21-23.
[5] C.N. Fischer and R.J. LeBlanc, "Crafting a Compiler". The Benjamin/Cummings Publishing Co., Menlo Park, CA, 1985.
[6] Michael R. Garey and David S. Johnson, "Computers and Intractability - A Guide to the Theory of NP-Completeness". W.H. Freeman and Company, New York, NY, 1979.
[7] Z. Iqbal, M. Potkonjak, S. Dey and A. Parker, "Critical Path Minimization Using Retiming and Algebraic Speed-up". Proc. 1993 ACM/IEEE Design Automation Conference, 1993, pp. 573-577.
[8] Gary L. Miller, Vijaya Ramachandran and Erich Kaltofen, "Efficient Parallel Evaluation of Straight-Line Code and Arithmetic Circuits". SIAM Journal on Computing, August 1988, Vol. 17, No. 4, pp. 687-695.
[9] Miodrag Potkonjak and Jan Rabaey, "Maximally Fast and Arbitrarily Fast Implementation of Linear Computations". Proc. 1992 International Conference on Computer Aided Design, August 1992, pp. 304-308.
[10] M. Sheliga and E.H.-M. Sha, "Ratio Analysis: A DAG Construction Algorithm for Linear Computations". Technical Report 94-001, Department of Computer Science and Engineering, University of Notre Dame, 1994.
[11] L.G. Valiant, S. Skyum, S. Berkowitz and C. Rackoff, "Fast Parallel Computation of Polynomials Using Few Processors". SIAM Journal on Computing, November 1983, Vol. 12, No. 4, pp. 641-644.