A POWER AWARE SYSTEM-LEVEL DESIGN SPACE EXPLORATION FRAMEWORK
Yannick Le Moullec, Jean-Philippe Diguet LESTER, UBS, France
[email protected]
Peter Koch Aalborg University, Denmark
[email protected]
Abstract. More than ever, design methodologies for embedded systems need to support design space exploration in order to experiment with various potential HW/SW solutions. Our exploration tool, "Design Trotter", was designed to evaluate the area (A) and time (T) parameters of generic ASIC architectures. However, since flexibility and energy (E) have become important parameters, we propose in this paper an extension which allows (A,T,E) exploration of programmable processors. Our method, "HW/SW cost estimation", employs power-related information from the target technology, thus providing the salient feature of estimating the energy consumption of software executing on fixed or application-specific architectures.
1
Introduction
The starting point of this work is our framework called "Design Trotter". The main goal of this framework is to explore the solution space at the system level for embedded systems. Because of the A, T, E constraints, good performance can be obtained only if the intrinsic application parallelism is correctly exploited; this is a core feature of Design Trotter. Basically, our framework aims to produce hardware-based alternatives. However, choosing a software solution can be more efficient or flexible, for instance when implementing control-oriented functions or when an RTOS is needed. Therefore, we have added a new feature to our existing framework. The goal of this feature, named "Hardware/Software cost estimation", is to compare a variety of HW/SW solutions (i.e., algorithmic alternatives running on different programmable processors) by providing estimates such as the execution time and the power consumption of an application mapped to a processor model.
2
Related work
Energy estimation methods based on transistor-level and gate-level approaches are usually precise, but their computational complexity is prohibitive for finding accurate solutions within limited time bounds. RT-level power estimations based either on table-lookup techniques or on analytical models [1] are faster but are still too slow for system design space exploration. Microarchitecture-level power estimation methods such as [2] are based on cycle-by-cycle performance simulators, often "SimpleScalar". The activity of units or blocks is recorded for every cycle and a power model is associated with each unit. The power models used are either analytical, transition-sensitive, or a combination thereof. Instruction-level power modeling methods [3] try to evaluate the power consumption of a program by using a cost attached to each instruction of the processor. Inter-instruction activity and other contributions such as exceptions are also taken into account. The above methods, however, suffer from several drawbacks: 1) they are too computationally complex, 2) they require extensive measurements, or 3) they rely on specific architectural models. Our method is fast enough to explore a large design space within reasonable time bounds since it operates at the system level. Absolute accuracy is not mandatory since we suggest using relative energy estimates in order to select promising solutions. Moreover, we use a processor description language, which makes it possible to address a wide variety of architectural alternatives.
3
Extending the Design Trotter Framework
In order to improve the usability of "Design Trotter", we have introduced several new features which pave the way for experimenting with programmable processors and estimating their power consumption. Fig. 1(A) illustrates the original framework and its inputs and outputs.
[Figure 1 diagram. Block labels: 'C' parser + architecture-independent optimizations; HCDFG (internal representation); ARMOR description of the processor; Armor compilation; instruction-set file; resources usage file; compaction matrix file; delay matrix file; mapping; pseudo assembly code; scheduling; schedule; resources activity (Gantt chart); User Abstract Rules (UAR); system-level exploration; behavioral characterization; intra-function estimation; inter-function estimation; execution time estimates; hardware cost estimation; Hardware/Software cost estimation; functional units power models; Lib 1 ... Lib n; target technological information; energy estimation; energy estimates.]
(A) The Design Trotter framework is extended by the "HW/SW cost estimation"
(B) The overall System-Level HW/SW estimation trajectory
Figure 1: Design flows
The entry point of the framework is the application described in 'C'. This specification is translated into an internal representation format called "HCDFG" (Hierarchical Control and Data Flow Graph). Besides specifying the application, the user must parameterize the generic architecture used in the exploration stage by defining a set of rules named UAR (User Abstract Rules). The processing part is characterized by the types of available resources (e.g., ALU, MAC) and the operations they can perform. A number of cycles is associated with every type of operator. For the memory part, the user defines the number of levels (1..N) of the hierarchy; each level is characterized by its size and its access cycles. The behavioral characterization and intra-function estimation steps, described in [4], result in trade-off curves that represent the quantity of each type of resource needed for several time constraints. During the third step, inter-function estimation, the overall application is considered. This is done by using the trade-off curves to assign specific time slots to the individual functions.
Now, in order to broaden the scope of the Design Trotter framework, we describe an extension supporting hardware as well as software cost estimation. The overall extension is illustrated in Fig. 1(B). Basically, we expand the scope beyond the generic architecture such that experiments with arbitrary programmable architectures are also supported in our framework. The target processor is described using the "Armor" language [5]. This language makes it possible to describe a processor through its instruction set and its units (functional units, memories, etc.). The compiler provided with the Armor language compiles the description of the processor and generates several files, as seen in Fig. 1(B). The first step in our trajectory is to map the operation nodes found in the HCDFG to instructions of the processor.
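The mapping step can be pictured as a table lookup from operation nodes to processor instructions. The following is only a minimal sketch under stated assumptions: the node representation, the `Instruction` record, the mnemonics and the `map_nodes` helper are all hypothetical and do not reflect the actual HCDFG or Armor file formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Hypothetical record for one processor instruction."""
    mnemonic: str  # illustrative mnemonic, not taken from an Armor file
    unit: str      # functional unit occupied while the instruction executes
    cycles: int    # number of cycles the unit is busy


# Hypothetical instruction table, standing in for the instruction-set file.
INSTRUCTION_SET = {
    "add": Instruction("add", "alu", 1),
    "sub": Instruction("sub", "alu", 1),
    "shift_right": Instruction("sra", "shifter", 1),
    "mul": Instruction("mult", "mult", 1),
}


def map_nodes(operation_nodes, instruction_set=INSTRUCTION_SET):
    """Map (node_id, operation) pairs to instructions, keeping their order."""
    mapped = []
    for node_id, operation in operation_nodes:
        if operation not in instruction_set:
            raise ValueError(
                f"no instruction implements {operation!r} (node {node_id})"
            )
        mapped.append((node_id, instruction_set[operation]))
    return mapped


# Hypothetical operation nodes, listed in dependency order.
nodes = [("n1", "shift_right"), ("n2", "shift_right"), ("n3", "sub"), ("n4", "add")]
mapped = map_nodes(nodes)
for node_id, instruction in mapped:
    print(node_id, instruction.mnemonic, instruction.unit, instruction.cycles)
```

Attaching the occupied unit and cycle count to each instruction is a design choice of this sketch; it simply makes the information a scheduler needs available alongside the mapping.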
The second step schedules the instructions using the RUI, CMX and DMX information. The basic idea here is to use scheduling algorithms adapted to the functions' behavioral characteristics. The time-constrained algorithms used during the system-level exploration have been transformed into resource-constrained scheduling algorithms for the Hardware/Software cost estimation, in order to take into account the limited amount of resources available in the processor. The "processing-first" and "memory-first" algorithms, which we have detailed in [4], are used to schedule processing-oriented and memory-oriented functions, respectively. The result of the resource-constrained scheduling process is a Gantt chart that shows, for every cycle, the instructions scheduled and the units being used, including processing units, buses, memories and registers.
4
Energy estimation
In this work we assume the use of CMOS technology. We have opted for a very flexible exploration environment that can be loaded with information on various types of target technologies, making it significantly more widely applicable than existing environments based on a specific technology. The activity of the units is "recorded" for every cycle and a target-specific power model is associated with each unit. The first input to the energy estimator is the Gantt chart. The second input is the unit-specific power information. The power models are obtained from a pool of libraries that provide the power consumption of the units for the actual target technology. At the moment we consider the average power consumption of each unit, which is consistent with the abstraction level of the Design Trotter framework. However, our tool is flexible enough to support other models, such as transition-sensitive models. Finally, the energy is calculated. The general formulation for the energy is $E = P \cdot t$, where $P$ is the power and $t$ the execution time. The energy consumed by the application executing on the processor is estimated as the sum of the energy consumed by all units of the processor. Therefore,

$$E_{application} \simeq \sum_{k=1}^{N} P_k \cdot T_k$$

where $P_k$ is the power consumed by unit $k$, $T_k$ is the overall activity of unit $k$ as specified by the Gantt chart, and $N$ is the number of units.
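The summation above can be sketched in a few lines. This is a minimal illustration, assuming that the Gantt chart is available as one set of active unit names per cycle and that each unit has a single average power figure; the schedule, the power values and the function names are hypothetical and are not part of Design Trotter.

```python
from collections import Counter


def unit_activity(gantt):
    """Return, for each unit, the number of cycles in which it is active."""
    activity = Counter()
    for active_units in gantt:
        activity.update(set(active_units))
    return activity


def estimate_energy_pj(gantt, avg_power_mw, clock_period_ns):
    """Sum P_k * T_k over all units; mW multiplied by ns gives pJ."""
    activity = unit_activity(gantt)
    return sum(
        avg_power_mw[unit] * cycles * clock_period_ns
        for unit, cycles in activity.items()
    )


# Hypothetical three-cycle schedule: the units that are busy in each cycle.
gantt = [
    {"control", "pc_next", "reg_bank"},
    {"control", "alu", "reg_bank"},
    {"control", "shifter"},
]

# Illustrative average power per unit in mW (placeholder values).
avg_power_mw = {"control": 5, "pc_next": 8, "reg_bank": 97, "alu": 13, "shifter": 11}

energy_pj = estimate_energy_pj(gantt, avg_power_mw, clock_period_ns=50.0)
print(f"estimated energy: {energy_pj:.0f} pJ")
```

Here $T_k$ is obtained as the number of active cycles of unit $k$ multiplied by the clock period, which is one possible reading of "overall activity"; a transition-sensitive model would need a different per-cycle cost function.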
5
Experimental Results
In this section we present an example which demonstrates the applicability of the Hardware/Software cost estimation extension. The target processor, the MIPS-lite, implements a simplified version of the MIPS III+ instruction set. Its units have been synthesized for a Xilinx XCV300e FPGA using the Xilinx "ISE WebPACK" tool. According to the timing analysis, the processor can operate at 20 MHz. The power consumption of the units has then been derived using the Xilinx Virtex power estimate worksheet. The estimated average power and energy consumption per operation are summarized in Fig. 2(A). We have experimented with an algorithm for the square-root approximation of the sum of two squared signed integers. It is defined as follows: $\sqrt{a^2 + b^2} \approx \max((0.875x + 0.5y), x)$, where $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$. The energy consumption for this application has been estimated for two architecturally different configurations. In the first one, the two multiplications $0.875x$ and $0.5y$ are computed by using shifts and a subtraction, i.e., $x$ is shifted three positions to the right to obtain $0.125x$, and $y$ is shifted one position to the right to obtain $0.5y$. Then $0.125x$ is subtracted from $x$ to obtain $0.875x$. In the second version we have enabled the multiplier unit of the processor. In that case the multiplications are performed directly by the multiplier and the subtraction is excluded. The estimated execution times and energy consumptions for the two different HW/SW implementations can be seen in Fig. 2(B).
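The two algorithmic variants can be sketched as follows. This is a functional illustration only, not the code that was run on the MIPS-lite: the function names are hypothetical, and the multiplier variant uses Python floating-point constants as a stand-in for whatever constant representation the processor actually uses.

```python
import math


def split_max_min(a, b):
    """Return x = max(|a|, |b|) and y = min(|a|, |b|)."""
    return max(abs(a), abs(b)), min(abs(a), abs(b))


def magnitude_shift_sub(a, b):
    """Variant 1: 0.875x and 0.5y obtained with shifts and one subtraction."""
    x, y = split_max_min(a, b)
    x_eighth = x >> 3  # 0.125x, truncated to an integer
    y_half = y >> 1    # 0.5y, truncated to an integer
    return max((x - x_eighth) + y_half, x)


def magnitude_multiply(a, b):
    """Variant 2: 0.875x and 0.5y obtained with two multiplications."""
    x, y = split_max_min(a, b)
    return max(0.875 * x + 0.5 * y, x)


for a, b in [(3, 4), (-16, 16), (0, 8)]:
    print(a, b, magnitude_shift_sub(a, b), magnitude_multiply(a, b), math.hypot(a, b))
```

Because the shifts truncate, the first variant can differ slightly from the second for inputs that are not multiples of 8 and 2; for example, $x = 10$, $y = 9$ gives 13 with shifts and 13.25 with multiplications.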
Functional   Total number    Total number of          Total number of        Average      Energy per
unit name    of CLB slices   flip/flops or latches    shift register LUTs    power (mW)   operation (pJ)
alu          203             0                        398                    13           650
bus-mux      96              0                        190                    6            300
control      82              0                        162                    5            250
mem_ctrl     132             65                       227                    9            450
mult         723             147                      1383                   48           2400
pc_next      122             30                       237                    8            400
reg_bank     1660            1025                     2287                   97           4850
shifter      177             0                        348                    11           550
(A) Power and energy estimates for the MIPS-lite units
Algorithm                                           Architecture                    Estimated execution   Estimated energy
                                                                                    time (ns)             consumption (nJ)
Square-root approximation without multiplication    MIPS-LITE without multiplier    1190                  190
Square-root approximation with multiplications      MIPS-LITE with multiplier       1140                  206
(B) Execution time and energy estimates for two algorithmic/architectural configurations
Figure 2: Power and energy estimates.
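The energy-per-operation figures in Fig. 2(A) appear consistent with the average power of each unit multiplied by one clock period at 20 MHz. The short check below illustrates this; the relation is an inference from the reported numbers (together with the statement that the operations execute within one clock cycle) and is not stated explicitly in the text, and the variable and function names are hypothetical.

```python
CLOCK_HZ = 20e6            # operating frequency reported for the MIPS-lite
CYCLE_NS = 1e9 / CLOCK_HZ  # one clock period in nanoseconds (50 ns)

# Average power per unit in mW, as listed in Fig. 2(A).
avg_power_mw = {
    "alu": 13, "bus-mux": 6, "control": 5, "mem_ctrl": 9,
    "mult": 48, "pc_next": 8, "reg_bank": 97, "shifter": 11,
}


def energy_per_operation_pj(power_mw, cycle_ns=CYCLE_NS):
    """Assumes a single-cycle operation: mW multiplied by ns gives pJ."""
    return power_mw * cycle_ns


energy_pj = {unit: energy_per_operation_pj(p) for unit, p in avg_power_mw.items()}
for unit, value in energy_pj.items():
    print(f"{unit:9s} {value:7.0f} pJ")
```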
First of all, we observe that adding a multiplier to the architecture reduces the overall execution time of the program. This was expected because shift, subtraction and multiplication are all executed within one clock cycle, so substituting a multiplication for a shift and a subtraction lowers the overall cycle count. A reduction of 4% was observed. More surprising, however, is the fact that reducing the execution time by introducing the multiplier does not at the same time decrease the energy consumption. Despite the lower cycle count, we see an almost 5% increase in energy consumption when the HW/SW configuration is modified to perform real multiplications. This example illustrates the salient feature of our extension of Design Trotter: the designer can obtain the (A,T,E) performance of various alternatives simply by tuning a few design parameters.
6
Conclusion
In modern embedded systems design, exploring the huge solution space efficiently can be done only by applying a suitable tool. Our tool "Design Trotter" was initially designed for area (A) and execution time (T) evaluation of generic ASIC architectures. In this paper we have presented an extension to Design Trotter which enables combined HW/SW solutions to be estimated with respect to A and T, as well as energy consumption E. The extension includes an energy estimator which combines calculated unit activity and target technology information (the inputs for HW and SW being ARMOR and C descriptions). Compared to other estimation tools, we are able to change both the target architecture and the technology information, thus providing an even more flexible framework which can quickly and conveniently produce (A, T, E) estimates for different platforms, e.g., FPGAs from various vendors.
References
[1] H. Mehta, R. M. Owens, and M. J. Irwin, "Energy characterization based on clustering," in DAC, June 1996, pp. 702-707.
[2] N. Vijaykrishnan, M. T. Kandemir, M. J. Irwin, H. S. Kim, and W. Ye, "Energy-driven integrated hardware-software optimizations using SimplePower," in ISCA, 2000, pp. 95-106.
[3] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: a first step towards software power minimization," in IEEE Int. Conf. on Computer-Aided Design, San Jose, USA, Nov. 1994, pp. 384-390.
[4] Y. Le Moullec, J.-P. Diguet, D. Heller, and J.-L. Philippe, "Fast and adaptive data-flow and data-transfer scheduling for large design space exploration," in GLSVLSI'02, 2002.
[5] F. Charot and V. Messe, "A flexible code generation framework for the design of application specific programmable processors," in 7th Int. Work. on H/S Codesign, Roma, Italy, May 1999.