Ultra-Fast and Efficient Algorithm for Energy Optimization by Gradient-Based Stochastic Voltage and Task Scheduling

BITA GORJIARA and NADER BAGHERZADEH, University of California, Irvine, USA
PAI H. CHOU, University of California, Irvine, USA and National Tsing Hua University, Taiwan

This paper presents a new technique, called Adaptive Stochastic Gradient Voltage-and-Task Scheduling (ASG-VTS), for power optimization of multicore hard real-time systems. ASG-VTS combines stochastic and energy-gradient techniques to simultaneously solve the slack distribution and task reordering problems. It produces very efficient results with few mode transitions. Our experiments show that ASG-VTS reduces the number of mode transitions by 4.8 times compared to traditional energy-gradient-based approaches. Moreover, our heuristic algorithm quickly finds a solution that is as good as the optimal one for a real-life GSM encoder/decoder benchmark. ASG-VTS runs 150 times and 1034 times faster than energy-gradient-based and optimal ILP algorithms, respectively. Because its runtime is so low, ASG-VTS is ideal for design space exploration in system-level design tools. We have also developed a web-based interface for the ASG-VTS algorithm.

Categories and Subject Descriptors: C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD)

General Terms: Algorithms, Design, Performance

Additional Key Words and Phrases: Power management, voltage and task scheduling, slack distribution

ACM Reference Format: Gorjiara, B., Bagherzadeh, N., and Chou, P. H. 2007. Ultra-fast and efficient algorithm for energy optimization by gradient-based stochastic voltage and task scheduling. ACM Trans. Des. Automat. Electron. Syst. 12, 4, Article 39 (September 2007), 21 pages.

The research was sponsored in part by DARPA under contract 4500942474 and in part by NSF under grant CCR-0205712. A portion of this article appeared in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), August 2004, pp. 381–386. This is an expanded version that includes over 40% new materials and new results.
Authors' address: Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697-2625; email: {bgorjiar,nader,chou}@ece.uci.edu.


1. INTRODUCTION

Embedded systems are becoming more challenging to design. On one hand, the applications are demanding more features such as multimedia or baseband processing in real time; on the other hand, the underlying architecture must deliver higher performance using lower power. Fortunately, advances in circuit design, higher-level integration, and new types of hardware components are offering new options. Designers can now choose from low-power, voltage-scalable general-purpose microprocessors, digital signal processors (DSPs), hardwired and reconfigurable hardware accelerators, and many more. Designing a system thus entails choosing a particular configuration of the architecture that can best meet the objectives for power, performance, cost, and size.

1.1 Design Space Exploration

It is often difficult to extrapolate the system-level performance from a collection of components alone due to many mutually enabling mapping and scheduling options. The same set of application tasks can be mapped onto different or the same processing elements (PEs), and different scheduling schemes can result in different resource slacks that can be further applied towards voltage/frequency scaling. In short, the design space grows rapidly with the rich set of implementation options. To achieve the target performance and to meet constraints on power and cost, researchers have proposed tools and methodologies that support design space exploration (DSE) at the system level. To explore different points in the design space, several steps are performed: resource allocation to determine the architecture, task mapping to enable the timing and power estimation, and task/voltage scheduling to meet the real-time deadlines and save power. Figure 1 shows an example flow of such a DSE process [Dick and Jha 1999]. It uses two nested genetic algorithms (GAs) to generate various system configurations with different resource allocations and task mappings. Note that the task scheduling and voltage/frequency scaling algorithms are in the innermost loop of this iterative process and therefore must have a low runtime complexity in order to explore a large design space. The goal of this work is to make this inner-loop step generate high-quality solutions as fast as possible.

1.2 Related Work

Voltage and frequency scaling (VFS) and task scheduling (TS) are mutually dependent. A given schedule induces a set of slack intervals, and slack distribution among the tasks enables more effective VFS. At the same time, VFS opportunities are limited by the available slack in the schedule, and thus different task schedules must also be explored in order to optimize energy consumption [Schmitz et al. 2002].


Fig. 1. The design space exploration process proposed in Dick and Jha [1999].

The VFS policy must consider the time/power overhead of mode transitions [Zhang et al. 2003] as well as the energy consumption of system peripherals and idle components [Jejurikar and Gupta 2004], to avoid deadline violations and energy waste caused by excessive VFS.

To address the slack distribution problem, Gruian and Kuchcinski [2001] propose an algorithm that partitions the slack time among the tasks based on whether they are located on the critical path. Luo and Jha [2003] and Schmitz et al. [Schmitz and Al-Hashimi 2001; Schmitz et al. 2002] compute the energy gradient of the tasks and iteratively assign slack to the task with a higher energy-gradient value. These algorithms solve the problem using continuous voltage levels and use dithering to convert them to discrete levels. Bambha et al. [2001] solve slack distribution using a hybrid scheme, where simulated heating in the outer loop controls the global search by parameterizing the local search using Monte Carlo and hill climbing. The authors' own previous work [Gorjiara et al. 2004] suggests a stochastic algorithm to solve this problem. Among these approaches, Luo and Jha [2003], Schmitz and Al-Hashimi [2001], Schmitz et al. [2002], and Gorjiara et al. [2004] yield the same energy efficiency and outperform the other two [Gruian and Kuchcinski 2001; Bambha et al. 2001].

Zhang et al. [2002] and Leung et al. [2004] use Integer Programming and Mixed Integer Nonlinear Programming (MINLP), respectively, to formulate the voltage scaling problem. In Zhang et al. [2002], each task is partitioned into several subtasks, each of which runs at a different voltage and frequency. The main disadvantage of energy-gradient-based approaches [Luo and Jha 2003; Schmitz and Al-Hashimi 2001; Schmitz et al. 2002] and ILP/MINLP approaches [Zhang et al. 2002; Leung et al. 2004] is the high number of mode transitions, which lead to longer delays and higher energy overheads. Zhang et al. [2003] and Andrei et al. [2005] consider mode transition overhead in their formulations. Andrei et al. [2005] prove that the problem of selecting optimal discrete voltages for multiprocessors under timing constraints, with or without transition overhead, is NP-hard. They formulate the optimization problem using Integer Linear Programming (ILP). The long runtime of the optimal formulation makes it less practical to run the voltage-scaling algorithm in the inner loop of the DSE process.

Among the above approaches, Luo and Jha [2003], Schmitz et al. [2002], Leung et al. [2004], and Gruian and Kuchcinski [2001] also modify the ordering of the tasks during scheduling to generate schedules that best suit the voltage scaling algorithm. Genetic algorithms and simulated annealing are used to steer the exploration of different task orderings, although both approaches tend to have long runtimes.


1.3 Contributions

This article presents an efficient voltage and task scheduling algorithm called Adaptive Stochastic Gradient Voltage-and-Task Scheduling (ASG-VTS). It combines stochastic and energy-gradient techniques to simultaneously solve the slack distribution and task reordering problems. Our algorithm can optimize the energy overhead of idle components and peripherals as well as the mode transition overheads of DVS. The main contribution of this work is its ability to generate high-quality solutions while keeping the algorithm runtime very low. In fact, it consistently finishes within seconds even for large examples, with solution quality similar to that of published techniques that can take substantially longer. This is made possible by our novel data structures that maintain complex dependencies using very low-complexity operations. Our experiments show that ASG-VTS reduces the number of mode transitions by 4.8 times compared to traditional energy-gradient-based approaches. Even though ASG-VTS is a heuristic, it actually finds a solution that is as good as the optimal for a real-life GSM encoder/decoder benchmark. Most importantly, ASG-VTS runs 150 times and 1034 times faster than energy-gradient-based and optimal ILP algorithms, respectively. This combination of speed and high-quality solutions makes it an ideal core algorithm for system-level DSE tools. We have also developed a web-based interface for the ASG-VTS algorithm [Gorjiara 2004]. Any user can try out this algorithm by uploading a system description in XML format.

This article is organized as follows. Section 2 formulates voltage and task scheduling as an optimization problem. Section 3 uses an example to discuss the limitations of traditional energy-gradient-based voltage scaling algorithms. The ASG-VTS algorithm is described in Section 4, followed by the experimental results and analysis in Section 5.

2. SIMULTANEOUS VOLTAGE AND TASK SCHEDULING

This article investigates the voltage and task scheduling aspect of the system design process. We assume that proper processing elements have been allocated and that tasks have already been mapped to them. We model a system in terms of its architecture and the application. The architecture is represented as a set of processing elements P = {p1, ..., pk} and communication channels. Processing elements (PEs) include general-purpose processors, DSPs, and custom IP blocks. A PE may operate at different voltage levels and hence consume different amounts of power. These valid combinations of voltage and frequency settings are called voltage modes. The set of voltage modes for a PE pj is denoted by VMj = {mj,1, ..., mj,max}, sorted by the corresponding voltage levels. Each voltage mode m has its own frequency freq(m) and power consumption Pwr(m):

    freq(m) = K × (V(m) − Vt)² / V(m),                                (1)
    Pwr(m) = CL × freq(m) × V(m)²,                                    (2)

where V(m) is the supply voltage in mode m, CL is the switching capacitance, K is a circuit-dependent constant, and Vt is the threshold voltage.
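To make the mode model concrete, the following short Python sketch derives the frequency and dynamic power of each voltage mode from Equations (1) and (2). It is an illustration written for this article; the constants K, CL, and Vt below are placeholders, not values taken from the paper.

```python
# Sketch of the voltage-mode model of Equations (1) and (2).
# K, C_L, and V_t are illustrative constants, not values from the paper.
from dataclasses import dataclass

@dataclass
class VoltageMode:
    voltage: float   # V(m), supply voltage in volts
    freq: float      # freq(m) in Hz
    power: float     # Pwr(m) in watts

def make_mode(v_supply, k=80e6, c_load=1e-9, v_threshold=0.8):
    """Derive a voltage mode from its supply voltage using Eqs. (1)-(2)."""
    freq = k * (v_supply - v_threshold) ** 2 / v_supply   # Eq. (1)
    power = c_load * freq * v_supply ** 2                 # Eq. (2)
    return VoltageMode(v_supply, freq, power)

# A PE's mode set VM_j, sorted by voltage (slowest to fastest).
modes = [make_mode(v) for v in (1.7, 1.8, 2.0)]
for m in modes:
    print(f"V={m.voltage:.1f} V  f={m.freq/1e6:.1f} MHz  P={m.power*1e3:.2f} mW")
```

Choosing K so that the highest mode runs at the nominal frequency fixes the relative frequencies of the slower modes, which is how the derived (non-bold) entries of Table I in Section 3 are obtained.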


The application is represented by a periodic task graph TG, a directed acyclic graph G(T, C, α, δ), where T = {τ1, τ2, ..., τn} represents tasks, C = {ci | ci ∈ T × T} represents data dependencies between pairs of tasks, α is TG's arrival time, and δ is its deadline. We also define Ts as the set of sink tasks of the graph: Ts = {τi ∈ T | (τi, τj) ∉ C, ∀τj ∈ T}. Each task τ is mapped to a processing element proc(τ). ET(M), the total energy consumption of task set T in a selected mode vector M = (m1, ..., mn), is calculated by:

    ET(M) = Σ i=1..n  Pwr(τi, mi) × texec(τi, mi),                    (3)

where mi is the mode of proc(τi) during the execution of τi, while Pwr(τi, mi) and texec(τi, mi) are the power and delay of τi in mode mi, respectively:

    Pwr(τi, mi) = switchingActivity(τi) × Pwr(mi),                    (4)
    texec(τi, mi) = cycleCount(τi, mi) / freq(mi),                    (5)

where switchingActivity(τi) is the average number of times that capacitance CL is charged per cycle during execution of task τi, and cycleCount(τi, mi) is the total number of cycles of task τi in mode mi. These parameters can be determined by profiling the power and runtime of the tasks. Esys(M), the total energy consumption of the system for a given mode vector M, is calculated by:

    Esys(M) = ET(M) + Eidle(M) + Eov(M),                              (6)

where Eidle(M) is the energy consumption of processing elements during their idle intervals. The value of Eidle(M) also depends on the availability of standby and shutdown modes for each processing element. In Equation (6), Eov(M) is the sum of the energy overheads of voltage transitions. The energy overhead of a transition from mode mi to mj is given by Martin et al. [2002]:

    Eov(mi, mj) = Cr × (V(mi) − V(mj))²,                              (7)

where Cr denotes the power rail capacitance. The delay overhead of the transition is given by:

    dov(mi, mj) = Cd × |V(mi) − V(mj)|,                               (8)

where Cd is a technology-dependent constant. Furthermore, the overall deadline violation of task set T in mode vector M is calculated by:

    χT(M) = max i∈[1,n] χ(τi, mi),                                    (9)

where χ(τi, mi) is the deadline violation of task τi in mode mi:

    χ(τi, mi) = tstart(τi) + texec(τi, mi) − δ,                       (10)

where tstart denotes the task start time assigned by the scheduler. Note that a positive value of χ indicates the amount of deadline violation, while a negative value of χ represents the amount of slack time available after task execution.

The goal of the optimization algorithm is to find a mode vector M such that the cost function Φ is minimized:

    Φ(M) =  Esys(M)   if χT(M) ≤ 0,
            ∞         if χT(M) > 0.                                   (11)
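The cost model of Equations (3)-(11) can be summarized in a few lines of code. The sketch below assumes a schedule has already assigned start times to the tasks, and that idle and transition energies are supplied as precomputed totals; the task and mode records are illustrative, not the authors' data structures.

```python
# Illustrative sketch of the cost model in Equations (3)-(11).
# Task and mode records are hypothetical; only the formulas follow the text.
INFINITY = float("inf")

def task_energy(tasks, modes):
    """E_T(M): Eq. (3), with Eqs. (4)-(5) for per-task power and delay."""
    total = 0.0
    for t, m in zip(tasks, modes):
        power = t["switching_activity"] * m["power"]        # Eq. (4)
        delay = t["cycle_count"][m["name"]] / m["freq"]     # Eq. (5)
        total += power * delay                              # Eq. (3)
    return total

def transition_overhead(v_from, v_to, c_rail, c_delay):
    """Energy and delay overhead of one voltage transition, Eqs. (7)-(8)."""
    e_ov = c_rail * (v_from - v_to) ** 2
    d_ov = c_delay * abs(v_from - v_to)
    return e_ov, d_ov

def deadline_violation(tasks, modes, deadline):
    """chi_T(M): Eq. (9) using Eq. (10); negative values mean slack."""
    chis = []
    for t, m in zip(tasks, modes):
        delay = t["cycle_count"][m["name"]] / m["freq"]
        chis.append(t["start_time"] + delay - deadline)     # Eq. (10)
    return max(chis)                                        # Eq. (9)

def cost(tasks, modes, deadline, e_idle, e_ov):
    """Phi(M): Eq. (11) over E_sys(M) of Eq. (6)."""
    e_sys = task_energy(tasks, modes) + e_idle + e_ov       # Eq. (6)
    return e_sys if deadline_violation(tasks, modes, deadline) <= 0 else INFINITY
```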


3. MOTIVATING EXAMPLE

This section discusses the fundamentals of energy-gradient based approaches and their limitations. To optimize energy consumption, energy-gradient-based DVS algorithms iteratively sort tasks based on their energy gradient ∂E/∂t and slightly slow down the task with the highest energy savings. Suppose that we have a system that runs three dependent tasks (τ1, τ2, τ3) on two PEs. Figure 2 shows the timing diagram of such a system.

Fig. 2. Timing diagram of a system without voltage scaling.

Each PE can operate in three voltage levels: 2V, 1.8V and 1.7V in modes m3, m2, and m1, respectively. Assuming a threshold voltage of 0.8V and the operational frequency of 100MHz in mode m3, the operational frequencies of the PEs in m2 and m1 can be computed using Equation (1). Table I shows the voltage and frequency of each voltage mode, and Table II shows the power consumption and delay of each task in each voltage mode. Assuming the power consumption of each task in mode m3 is given, the power consumption in modes m2 and m1 can be computed using Equations (2) and (4). The delay values are calculated using Equation (5), assuming the cycle counts of all tasks remain fixed in different modes.

Table I. Voltage Modes of the System (bold entries are given, while non-bold entries are derived)

    Mode    Voltage (V)    Freq (MHz)
    m3      2.0            100
    m2      1.8            78
    m1      1.7            66

Table II. Power Consumption and Delay of the Tasks in Different Voltage Modes (bold: given; non-bold: derived)

                Pwr(τi, mi) (W)           texec(τi, mi) (msec)
    task/mode   m3      m2      m1        m3      m2      m1
    τ1          3.20    2.00    1.53      6.00    7.78    9.10
    τ2          3.00    1.88    1.44      4.00    5.18    6.00
    τ3          3.00    1.88    1.44      4.00    5.18    6.00


In mode m3, the total execution delay of all tasks is 14 ms, and the total energy consumption is 43.2 mJ. In this example, the deadline for finishing all tasks is given to be 17.2 ms. In Tables I and II, the values in bold are given as inputs, and the other values are derived using Equations (1), (2), (4), and (5).

A typical energy-gradient approach produces voltage levels 1.83 V, 1.85 V, and 1.85 V for τ1, τ2, and τ3, respectively. Such a voltage selection would correspond to 16% energy savings. However, none of these voltage levels is valid, and they must be mapped to valid ones. One approach is to perform voltage dithering [Schmitz et al. 2002], which entails breaking each task into two subtasks and duty cycling between the two closest voltage modes (e.g., m2 and m3). This approach requires four mode transitions: three intra-task dithering transitions plus one between τ2 and τ3. Even though the mode reordering optimization [Zhang et al. 2003] can eliminate one transition, three transitions are still required. If the transition overhead is nonnegligible, the deadline may be violated.

To avoid the violation, the energy-gradient approach must be applied in the discrete domain. In this example, the energy gradient of τ1 is higher than those of τ2 and τ3. If τ1 is slowed down to mode m2, then it consumes most of the slack time, leaving little slack for the other tasks. In this case, the energy saving is 8.5% and no mode transition is required. Alternatively, a smarter nongreedy solution is to set τ2 and τ3 to m2 and m1, respectively, achieving 13% energy savings at the cost of one mode transition. Note that since the slack time is not completely utilized, the mode transition overhead can be easily tolerated without violating the deadline.

The purpose of this example is to illustrate that slack distribution must be applied in the discrete domain to avoid deadline violations caused by mode transition overheads. Furthermore, it shows that a greedy slack distribution scheme based solely on the energy gradient does not necessarily yield efficient solutions in the discrete domain. To solve this problem, we develop a nongreedy stochastic algorithm that simultaneously considers the energy gradient and execution delay of tasks for efficient slack distribution.
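The figures quoted in this example can be reproduced from Tables I and II. The short script below is an illustrative check written for this article, not part of the authors' tool; it assumes the three dependent tasks execute back to back (so the finish time is the sum of their delays), and it uses the nongreedy assignment (τ2 in m2, τ3 in m1) that matches the quoted 13% figure.

```python
# Reproducing the motivating example from Tables I and II.
# Power in W, delay in ms; dependent tasks run back to back, deadline 17.2 ms.
power = {  # Pwr(tau_i, m) from Table II
    "t1": {"m3": 3.20, "m2": 2.00, "m1": 1.53},
    "t2": {"m3": 3.00, "m2": 1.88, "m1": 1.44},
    "t3": {"m3": 3.00, "m2": 1.88, "m1": 1.44},
}
delay = {  # t_exec(tau_i, m) from Table II
    "t1": {"m3": 6.00, "m2": 7.78, "m1": 9.10},
    "t2": {"m3": 4.00, "m2": 5.18, "m1": 6.00},
    "t3": {"m3": 4.00, "m2": 5.18, "m1": 6.00},
}
DEADLINE = 17.2  # ms

def evaluate(assignment):
    energy = sum(power[t][m] * delay[t][m] for t, m in assignment.items())
    finish = sum(delay[t][m] for t, m in assignment.items())
    return energy, finish

baseline, _ = evaluate({"t1": "m3", "t2": "m3", "t3": "m3"})  # 43.2 mJ, 14 ms
for name, assign in [
    ("greedy (t1 -> m2)",              {"t1": "m2", "t2": "m3", "t3": "m3"}),
    ("nongreedy (t2 -> m2, t3 -> m1)", {"t1": "m3", "t2": "m2", "t3": "m1"}),
]:
    energy, finish = evaluate(assign)
    print(f"{name}: {100 * (baseline - energy) / baseline:.1f}% savings, "
          f"finish at {finish:.2f} ms (deadline {DEADLINE} ms)")
```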

4. THE ASG-VTS APPROACH

Figure 3 shows the flow of the ASG-VTS algorithm that iteratively distributes the slack time. It initially selects the fastest mode vector and then derives a new mode vector by slack distribution. Next, it computes the execution delays of the tasks based on Equation (5) and assigns the new priority values to them. Then the algorithm performs list-based scheduling. If no real-time deadline is violated, then the current mode vector is updated to the new one for the next iteration of slack distribution. Otherwise, the new mode vector is either discarded or engaged in a slack recovery process. The slack distribution and recovery are performed by adding a deviation vector ΔM to the current mode vector. The elements Δm of the deviation vector ΔM are assigned the following values:

    Δm ←  −1  to slow down by transitioning from mi to mi−1,
           0  no change, or
           1  to speed up by transitioning from mi to mi+1.           (12)
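A minimal sketch of how such a deviation vector is applied, assuming modes are represented by their index into each PE's sorted mode list (an illustrative representation, not the authors' data structure). The entries of ΔM themselves are chosen stochastically by the SLOWDOWN and SPEEDUP procedures described in Section 4.3.

```python
def apply_deviation(mode_vector, deviation, max_mode_index):
    """Apply a deviation vector Delta-M (Eq. (12)) to a mode vector.

    Modes are represented by their index into the PE's sorted mode list
    (0 = slowest), so -1 is a one-level slowdown and +1 a one-level speedup.
    """
    return [min(max(idx + dm, 0), max_mode_index)
            for idx, dm in zip(mode_vector, deviation)]

# Example: slow down tasks 0 and 2 by one mode, leave task 1 unchanged.
print(apply_deviation([2, 2, 1], [-1, 0, -1], max_mode_index=2))  # -> [1, 2, 0]
```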


Fig. 3. ASG-VTS approach (A and B are the same as in Figure 1).

These values are stochastically selected based on the slowdown and speedup probabilities (SDP and SUP). This section first describes the details of slack distribution and explains how SDP and SUP are calculated. Second, we present our task selection heuristic used in slack distribution and recovery. Third, we explain the steps of the ASG-VTS algorithm.

4.1 Slack Distribution Heuristic

Generally, in stochastic gradient search approaches, the probability of changing a variable in each iteration is calculated based on the gradient of the cost function with respect to that change [Spall 2003]. In other words, the changes that can decrease the cost function more are given a higher chance of occurrence. In our problem, the cost function is defined using Equation (11), and the gradient of the cost function Φ with respect to the change of a mode mi ∈ M is calculated using:

    ΔΦ/Δmi =  ΔEsys(M)/Δmi   if χT(M) + ΔχT(M) ≤ 0,
              ∞               if χT(M) + ΔχT(M) > 0,                  (13)

where ΔEsys(M)/Δmi is the gradient of energy consumption with respect to changing mode mi, and ΔχT(M) is the variation in the amount of slack time (or deadline violation) corresponding to the mode change. Note that Δmi is −1 for a one-level slowdown (transition from mi to mi−1) and 1 for a one-level speedup (transition from mi to mi+1). The slowdown and speedup probabilities are defined such that they favor the decrease of Φ(M). The slowdown probability of task τi can be defined as follows:

    SDP(τi) =  0               if ΔΦ/Δmi ≥ 0,
               norm(ΔΦ/Δmi)    if ΔΦ/Δmi < 0,                         (14)

where norm() is a normalizing function. Here, norm() divides the energy gradient with respect to Δmi by the maximum energy gradient with respect to Δm ∈ ΔM. That is,

    norm(ΔΦ/Δmi) = (ΔΦ/Δmi) / max Δm∈ΔM (ΔΦ/Δm).                      (15)

According to Equation (14), tasks that save the most energy and do not cause any deadline violation will be assigned a higher slowdown probability. Although calculating the energy gradient of the tasks (i.e., ET(M) defined in Equation (3)) has a low cost, calculating the energy gradients of Eidle(M) and Eov(M) as well as ΔχT(M) is very costly, because it requires rerunning the scheduling algorithm. To reduce this expensive operation, we define the slowdown probability based on the variation in ET(M) and a delay factor that is representative of potential deadline violations. Thus:

    SDP(τi) = energyCnst × norm(−ΔET) + delayCnst × delayFctr(τi),    (16)
    delayFctr(τi) = 1 − norm(texec(τi, mi) / aveTexec).               (17)

The aveTexec is the average execution delay of all tasks. The first term of Equation (16) means that the tasks whose slowdown saves the most energy are assigned a higher probability of slowdown. The second term means the tasks with relatively high execution delays are assigned a lower probability of slowdown. Note that this term reduces the probability of slowing down the tasks with long execution delays. The constants energyCnst and delayCnst are used to adjust the effect of each term. We also define the speedup probability as follows:

    SUP(τi) = 1 − SDP(τi).                                            (18)
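Equations (16)-(18) translate directly into code. The sketch below is a hedged illustration: the exact normalization and the values of energyCnst and delayCnst are not specified in the paper, so they appear here as plain parameters.

```python
def slowdown_probabilities(delta_e, exec_delays, energy_cnst=0.5, delay_cnst=0.5):
    """SDP per Eqs. (16)-(17); the weights are illustrative, not from the paper.

    delta_e[i]     -- change in task energy E_T if task i is slowed one mode
                      (negative when slowing down saves energy)
    exec_delays[i] -- current execution delay of task i
    """
    ave_texec = sum(exec_delays) / len(exec_delays)
    # norm(): divide by the largest magnitude so each term falls in [0, 1].
    max_gain = max(abs(d) for d in delta_e) or 1.0
    max_ratio = max(d / ave_texec for d in exec_delays) or 1.0

    sdp = []
    for d_e, t_ex in zip(delta_e, exec_delays):
        energy_term = max(-d_e, 0.0) / max_gain               # norm(-dE_T)
        delay_fctr = 1.0 - (t_ex / ave_texec) / max_ratio     # Eq. (17)
        sdp.append(energy_cnst * energy_term + delay_cnst * delay_fctr)
    return sdp

def speedup_probabilities(sdp):
    """SUP per Eq. (18)."""
    return [1.0 - p for p in sdp]
```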

Note that although SDP and SUP are defined based on ET (as opposed to Esys), the algorithm presented in Section 4.3 evaluates the solutions based on the total energy consumption of the system, Esys (see Equation (6)). Therefore, if a solution reduces ET at the cost of increasing Esys, then the algorithm does not select that solution.

To understand the effectiveness of our SDP formulation, let us consider the following example: assume that we have two tasks τ1 and τ2, where τ2 consumes more power than τ1 while both have the same execution delay. To maximize energy savings, τ2 should have a higher chance of getting more slack time than τ1. Figure 4(a) shows how the definition of SDP helps achieve this goal. Here, the slack distribution is performed by iteratively reducing the voltage modes. After each slowdown of a task, its SDP decreases slightly as a result of an increase in execution delay and a small decrease in its energy gradient. As the optimization proceeds, several tasks, including τ2 and others not shown, are slowed down and hence aveTexec will increase (Figure 4(b)). As a result, the SDP of τ1 gradually increases. At some point, τ1 will also be slowed down.


Fig. 4. (a) SDP of two tasks and (b) aveTexec in different iterations.

However, overall τ2 is slowed down more often and therefore consumes a greater portion of the available slack time compared to τ1. Note that even though τ2 has a higher SDP, SDP is only a probability, and it does not guarantee that τ2 will always be assigned a greater slack portion. In the end, the algorithm distributes slack based on the overall system-level energy efficiency.

4.2 Task Selection Heuristic for Slack Distribution/Recovery

As mentioned earlier, the algorithm identifies the tasks that have violated their deadlines (χ > 0) and those that have some slack time (χ < 0). To eliminate the deadline violation time of a task, ASG-VTS must randomly speed up some of the tasks, known as relative tasks, that have caused the deadline violation. We define the relatives of a task τ to be the set of all tasks whose execution delay affects the finish time of τ. In a schedule, if a task violates its deadline, then by definition, a necessary condition for correcting this problem is to speed up a subset of its relatives, assuming the application is schedulable in the fastest mode. The set of relatives of a task τ includes its predecessors, denoted as Preds(τ), as well as its resource-based relatives, denoted as R(τ). The set of predecessors is recursively defined by:

    Preds(τ) = Parents(τ) ∪ ( ⋃ s∈Parents(τ) Preds(s) ),              (19)

where Parents(τ) are the tasks on which τ is immediately dependent:

    Parents(τ) = {τi | (τi, τ) ∈ C}.                                  (20)

We define the resource-based relatives of a task τ as those tasks that are mapped to the same resource as τ and are finished between the arrival and start of τ:

    R(τ) = {τi | proc(τi) = proc(τ) ∧ tarrival(τ) ≤ tfinish(τi) ≤ tstart(τ)},   (21)

where tarrival(τ) is the time when all parents of τ finish their executions:

    tarrival(τ) = max s∈Parents(τ) tfinish(s).                        (22)

Note that the set of resource-based relatives of tasks may change in different iterations, because the slack distribution and recovery affect the execution delay of the tasks and their schedule. We recursively define the relatives of a task τ as:

    relatives(τ) = {τ} ∪ ( ⋃ s∈Preds(τ)∪R(τ) relatives(s) ).          (23)

Whenever a task misses its deadline, the set of its relative tasks becomes the candidates for speedup. Also, the slack time after a task can be distributed among its relatives. Note that each task is considered a relative of itself and is therefore a candidate for speedup or slowdown. Figure 5 shows an example of a system that runs a task graph on two processing elements. For the given schedule, the sets Parents, Preds, R, and relatives are shown for task d.

Fig. 5. An example of relative and time-based relative tasks.

Most of the techniques proposed so far extract the set of relatives of a task for slack distribution. However, extracting the set of relatives is complex and time-consuming, because keeping track of resource-based relatives requires an additional data structure that captures and updates the links between consecutive tasks mapped to the same resource. Note that the links may change as the delay of tasks changes during iterative slack distribution. To avoid the overhead of constructing and updating the links between the relative tasks, we approximate the above relationship with one that is easier to compute: the time-based relationship. We define the set of time-based relatives of a task τ, TBR(τ), as the set of all tasks whose finish times lie within the live interval of τ. The live interval of a task τ is the interval between the arrival of the host TG and τ's finish time:

    TBR(τ) = {τi | α ≤ tfinish(τi) ≤ tfinish(τ)}.                     (24)

This set includes some of the real relatives, such as predecessors and resource-based relatives, and possibly other nonrelative tasks. The advantage of this approximation is that extracting time-based relatives is simple and fast. Using the time-based relationship reduces the run-time complexity of the slack distribution algorithm at least by a factor of n, where n is the total number of tasks. Furthermore, this approximation does not affect the energy efficiency of the algorithm, because time-based relatives tend to highly overlap with the real relatives (as shown in Figure 5). Also, if selecting a nonrelative task causes any deadline violation in one iteration, then the subsequent iterations will reverse the problem.
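Because TBR(τ) depends only on finish times, it can be extracted with a simple scan, or with one sort followed by binary searches, instead of maintaining dependency links. The sketch below is an illustration of Equation (24), not the authors' data structure; it assumes each task record exposes a .finish attribute taken from the current schedule, and that all tasks finish no earlier than the task graph's arrival time α.

```python
from bisect import bisect_right

def time_based_relatives(task, all_tasks, tg_arrival):
    """TBR(tau): all tasks whose finish time lies in tau's live interval.

    Follows the prose definition of the live interval, [alpha, t_finish(tau)].
    """
    return [t for t in all_tasks if tg_arrival <= t.finish <= task.finish]

def tbr_query(all_tasks):
    """Pre-sort by finish time so each TBR lookup is a binary search plus a slice."""
    ordered = sorted(all_tasks, key=lambda t: t.finish)
    finishes = [t.finish for t in ordered]
    return lambda task: ordered[:bisect_right(finishes, task.finish)]
```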


Fig. 6. ASG-VTS Algorithm.

4.3 ASG-VTS Algorithm

Figure 6 shows the pseudo-code of our voltage selection algorithm. It starts by selecting the fastest mode vector, in which all tasks must be schedulable. The CALCULATEEXECDELAY function, called on lines 2 and 6, calculates the new execution delays of all tasks based on the selected mode vector. The new execution delays are used by the SCHEDULE algorithm to generate a new task schedule. Scheduling of dependent tasks on a multiple-processor system is an NP-hard problem [Garey and Johnson 1979]. Our ASG-VTS algorithm uses the priority-based list scheduling algorithm, a well-known heuristic that runs in polynomial time. After generating the initial schedule for the fastest mode (lines 1–3), in each iteration of the loop (lines 4–11), a new mode vector is generated by evolving the previous one (line 5) and is used to produce a new schedule (line 7). If the evolved mode vector is more energy efficient than the best mode so far (optMode), and the new schedule is valid, then the evolved mode is selected for the next iteration. Otherwise (line 12), the selection of the next candidate mode vector is based on a probability function. This function gives a higher chance of selection to the better mode vector. The algorithm terminates when the number of iterations (i.e., noOfIter) exceeds a given maximum or if the algorithm cannot improve optMode beyond a preset iteration limit, as counted by noOfUselessIter.
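The loop of Figure 6 can be paraphrased as follows. This is a structural sketch only, not the authors' code: evolve, make_schedule, and cost stand in for the EVOLVE, SCHEDULE, and cost-evaluation steps named in the figure, and the acceptance rule for worse candidates is simplified to a fixed probability because the exact probability function is not reproduced here.

```python
import random

def asg_vts(fastest_modes, evolve, make_schedule, cost,
            max_iter=500, max_useless_iter=100, accept_worse_prob=0.1):
    """Structural sketch of the ASG-VTS loop of Figure 6 (not the authors' code).

    evolve, make_schedule, and cost are callables supplied by the caller.
    """
    current = fastest_modes
    best_modes = current
    best_cost = cost(current, make_schedule(current))
    useless = 0

    for _ in range(max_iter):
        candidate = evolve(current)                      # Figure 7: EVOLVE
        cand_cost = cost(candidate, make_schedule(candidate))
        if cand_cost < best_cost:                        # better and deadline-feasible
            best_modes, best_cost, current = candidate, cand_cost, candidate
            useless = 0
        else:
            useless += 1
            # Occasionally continue from a worse candidate to escape local minima;
            # the paper uses a probability function favoring the better vector.
            if random.random() < accept_worse_prob:
                current = candidate
        if useless > max_useless_iter:
            break
    return best_modes
```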


Fig. 7. EVOLVE Algorithm.

Fig. 8. (a) SLOWDOWN procedure; (b) SPEEDUP procedure.

Figure 7 shows the EVOLVE algorithm, which calculates the mode deviation vector ΔM and returns the evolved mode vector. It starts by calculating the speedup and slowdown probabilities of all tasks using Equations (16) and (18). If no real-time deadline is violated, then it stochastically slows down the entire task set T. In case of a deadline violation, it performs slack distribution and recovery by using the sink tasks as starting points. For each sink task τ ∈ Ts, if the amount of deadline-miss (respectively, slack) time is relatively small, then the speed-up (resp., slowdown) operation is performed only on its predecessors, Preds(τ). However, if the amount of deadline-miss (resp., slack) time is large, then the mode transitions will be performed on TBR(τ), the time-based relatives of τ. In our experience, a deadline-miss (resp., slack) time is considered small if it is less than 20% of the total execution time of the corresponding task graph.

Figure 8 shows the SLOWDOWN() procedure, which stochastically distributes the slack among the elements of S, a set of tasks. As long as the slack is greater than zero, a task τ is selected stochastically based on its SDP, and its voltage mode is reduced by one to signify slowdown. To do so, a random number between 0 and 1 is generated, and if it is less than SDP(τ), then τ is slowed down. In our heuristic, after the slowdown of a task τ, the remaining slack is reduced by the amount of the increase in the execution delay of τ. In reality, the actual reduction may be more or less than this amount. The SPEEDUP() procedure is analogous to the SLOWDOWN() procedure and is shown in Figure 8(b).
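The SLOWDOWN() procedure of Figure 8(a) can be sketched as below; SPEEDUP() is symmetric, consuming deadline-miss time and using SUP instead of SDP. The mode representation and the per-task delay lookup are illustrative assumptions, and candidates that fail the random draw are dropped only so that this simplified sketch always terminates.

```python
import random

def slowdown(candidates, slack, modes, sdp, exec_delay):
    """Sketch of SLOWDOWN() in Figure 8(a): stochastically spend slack on candidates.

    modes[t]         -- current mode index of task t (0 = slowest)
    sdp[t]           -- slowdown probability of task t (Eqs. (16)-(17))
    exec_delay(t, m) -- execution delay of task t in mode index m (Eq. (5))
    """
    candidates = list(candidates)
    while slack > 0 and candidates:
        t = random.choice(candidates)
        if modes[t] > 0 and random.random() < sdp[t]:
            old_delay = exec_delay(t, modes[t])
            modes[t] -= 1                      # one-level slowdown
            # Heuristic bookkeeping: charge the increase in delay against the slack.
            slack -= exec_delay(t, modes[t]) - old_delay
        else:
            # Already at the slowest mode, or the draw failed: drop the candidate
            # (a simplification so the sketch terminates).
            candidates.remove(t)
    return modes
```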


In the ASG-VTS algorithm, except for EVOLVE(), all of the procedures in the loop have linear complexity. EVOLVE() calls the SPEEDUP() and SLOWDOWN() procedures for all members of Ts. In the worst case, the SPEEDUP() and SLOWDOWN() procedures must examine all of the predecessors or time-based relatives of a task. Therefore, the complexity of ASG-VTS is O(i · ns · n), where i is the number of iterations, ns is the number of sink tasks, and n is the total number of tasks. The number of iterations for each of our benchmarks is shown in Table VII.

5. EXPERIMENTAL RESULTS

In this section we first demonstrate how the model presented in Section 2 can be extracted from a real embedded application. Then we compare the performance of our algorithm with EE-GLSA [Schmitz et al. 2002] in terms of energy efficiency, algorithm runtime, and the number of mode transitions using a public suite of 25 task-graph benchmarks. Section 5.3 shows the results of our algorithm under different mode transition overheads. Finally, Section 5.4 compares the efficiency and runtime of our algorithm with those of an optimal ILP algorithm [Andrei et al. 2005].

5.1 Voltage Optimization of a GSM Vocoder

The GSM Vocoder [European Telecommunication Standards Institute (ETSI) 1996] is a standard algorithm used in cell phones to encode voice samples. The coder receives an input stream of 13-bit speech samples at a rate of 8 kHz. It produces an output stream of encoded parameters at a bit rate of 12.2 kbit/s. Von Weymarn described the algorithm in the SpecC language [von Weymarn 2001] to enable the use of the SpecC design methodology and exploration [Gajski et al. 2000]. SpecC, similar to SystemC, allows hierarchical description of modules, called behaviors, that can communicate with each other through channels and shared variables.

Fig. 9. (a) Vocoder SpecC model; (b) Task graph model of the Vocoder example.

Figure 9(a)(i) shows the top-level behaviors of the Vocoder algorithm, and Figure 9(a)(ii) shows an expanded view of the CODER behavior in more detail. The arched arrows show the loops, while the other straight arrows show the data exchange between different behaviors.


Table III. Power and Performance Characteristics of the Processing Elements: (a) Processor [Intel 2007], (b) Hardware Accelerator

    (a) Processor
    Mode    V (V)    freq (MHz)    Pwr (mW)
    m0      0.75     150           50
    m1      1        400           150
    m2      1.3      600           400
    m3      1.6      800           870
    m4      1.8      1000          1600

    (b) Hardware Accelerator
    Mode    V (V)    freq (MHz)    Pwr (mW)
    m0      0.75     50            25
    m1      1        133           70
    m2      1.3      200           200
    m3      1.6      266           435

Table IV. Execution Delay of the Behaviors

    Behavior        Delay (μs)    PE
    PRE-PROCESS     288           Processor
    LP ANALYSIS     4563          Processor
    OPEN-LOOP1      438           Processor
    OPEN-LOOP2      656           HW
    CLOSED-LOOP     638           HW
    CODEBOOK        869           HW
    UPDATE          156           Processor
    POST-PROCESS    56            Processor

A speech frame arrives every 20 ms, and the coding algorithm must finish within that period. All the behaviors of the Vocoder run sequentially due to data dependencies. To meet the real-time constraint, some of the behaviors may need to run on hardware accelerators. To process each frame, the CLOSED-LOOP, CODEBOOK, and UPDATE behaviors run four times, the OPEN-LOOP1 and OPEN-LOOP2 behaviors run twice, and the rest of the behaviors run only once. Figure 9(b) shows the task graph model of the Vocoder implementation, where every execution of the behaviors is considered a distinct task and is allowed to run at a different rate. The arrows show the data dependencies constructed based on the channels and shared variables.

The application is mapped to a processor and a custom hardware accelerator. Table III shows the power and performance characteristics of the processing elements. To estimate the execution delays of the behaviors as well as their communication traffic, we use the SCE environment [Abdi et al. 2003], which is integrated with profiling capabilities [Cai et al. 2003]. Table IV shows the execution delay of the behaviors as well as their mapping to the processors. In this implementation, we map the OPEN-LOOP2, CLOSED-LOOP, and CODEBOOK behaviors to a hardware accelerator and the rest to the processor. The simulation result of running the algorithm shows a slack time of 6.2 ms over a period of 20 ms for such a mapping. We apply our voltage scaling algorithm to the task graph model of the Vocoder and generate different voltage and frequency values for the tasks.

Table V. Characteristics of the Tasks Before and After DVS

                             Before DVS                   After DVS
    Task    Mapping          Delay (μs)   Power (mW)      Delay (μs)   Power (mW)
    n0      Processor        288          1600            720          150
    n1      Processor        4563         1600            5703         870
    n2      Processor        438          1600            729          400
    n3      HW               656          435             874          200
    n4      HW               638          435             1276         70
    n5      HW               869          435             1158         200
    n6      Processor        156          1600            259          400
    n7      HW               638          435             638          435
    n8      HW               869          435             1158         200
    n9      Processor        156          1600            259          400
    n10     Processor        438          1600            547          870
    n11     HW               656          435             874          200
    n12     HW               638          435             850          200
    n13     HW               869          435             1158         200
    n14     Processor        156          1600            259          400
    n15     HW               638          435             1276         70
    n16     HW               869          435             1158         200
    n17     Processor        156          1600            390          150
    n18     Processor        56           1600            56           1600
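As a quick sanity check (an illustration added here, not part of the published toolflow), the task energy of Equation (3) can be recomputed directly from the Table V data; idle and transition energies are ignored.

```python
# Recompute task energy (Eq. (3)) from Table V; delay in us, power in mW -> energy in nJ.
before = [(288, 1600), (4563, 1600), (438, 1600), (656, 435), (638, 435),
          (869, 435), (156, 1600), (638, 435), (869, 435), (156, 1600),
          (438, 1600), (656, 435), (638, 435), (869, 435), (156, 1600),
          (638, 435), (869, 435), (156, 1600), (56, 1600)]
after = [(720, 150), (5703, 870), (729, 400), (874, 200), (1276, 70),
         (1158, 200), (259, 400), (638, 435), (1158, 200), (259, 400),
         (547, 870), (874, 200), (850, 200), (1158, 200), (259, 400),
         (1276, 70), (1158, 200), (390, 150), (56, 1600)]

def energy_mj(rows):
    return sum(delay * power for delay, power in rows) / 1e6  # nJ -> mJ

e_before, e_after = energy_mj(before), energy_mj(after)
print(f"E_T before DVS: {e_before:.2f} mJ, after DVS: {e_after:.2f} mJ, "
      f"savings: {100 * (1 - e_after / e_before):.0f}%")
```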

Table V shows the characteristics of the tasks before and after DVS. Our results show that ASG-VTS achieves 39% energy savings over the baseline. This example is available in our on-line tool [Gorjiara 2004].

5.2 Comparing ASG-VTS with EE-GLSA

We compare our algorithm with EE-GLSA in terms of energy savings, the number of mode transitions, and algorithm runtime. Schmitz et al. [2002] present the results of EE-GLSA on a set of benchmarks with tight deadlines using a Pentium III/750 MHz PC. We also run the same set of benchmarks on a similar PC (PIII/700 MHz) to produce the results of ASG-VTS. The second column of Table VI presents the characteristics of the benchmarks in terms of the numbers of tasks and edges in the task graphs. In this experiment, we assume that the processing elements can operate at four voltage modes and that the mode transition overhead is negligible. The effect of nonnegligible transition overhead is discussed in Section 5.3.

The third column shows the energy savings achieved by EE-GLSA using the voltage dithering technique to convert continuous voltage levels to discrete ones. The fourth column shows the energy savings of ASG-VTS given four discrete voltage modes. As shown, ASG-VTS achieves very similar energy efficiency to EE-GLSA. Columns 5 and 6 show the numbers of mode transitions required by EE-GLSA and ASG-VTS, respectively. Since EE-GLSA relies on dithering for converting continuous voltage values to discrete ones, it requires both intertask and intratask mode transitions. The reordering optimization [Zhang et al. 2003] reduces the number of intertask mode transitions. However, it does not affect the intratask mode transitions.


Table VI. Comparing ASG-VTS with EE-GLSA

    Benchmarks             #tasks/    Energy Savings %       No. of mode trans.
    [Schmitz et al. 2002]  edges      EE-GLSA    ASG-VTS     EE-GLSA    ASG-VTS
    tgff 1                 8/9        71.05      66.67       9          2
    tgff 2                 26/43      26.79      26.7        27         2
    tgff 3                 40/77      69.18      66.72       47         14
    tgff 4                 20/33      12.99      11.83       24         7
    tgff 5                 40/77      17.14      18          47         14
    tgff 6                 20/26      1.61       1.53        23         6
    tgff 7                 20/27      29.90      28.29       23         6
    tgff 8                 18/26      13.83      13.56       18         0
    tgff 9                 16/15      24.85      19.19       21         10
    tgff 10                16/21      35.77      33.9        17         2
    tgff 11                30/29      16.96      16.04       33         5
    tgff 12                36/50      5.11       4.31        41         10
    tgff 13                37/36      20.71      19.37       44         13
    tgff 14                24/33      28.12      26.92       27         5
    tgff 15                40/63      4.15       4.3         44         7
    tgff 16                31/56      29.88      28.12       33         3
    tgff 17                29/56      22.20      20.76       31         4
    tgff 18                12/15      23.44      20.48       13         2
    tgff 19                14/19      27.84      26.16       16         3
    tgff 20                19/25      52.30      45.43       25         11
    tgff 21                70/99      19.45      19.89       79         17
    tgff 22                100/135    29.10      33.14       106        11
    tgff 23                84/151     23.20      21.75       104        39
    tgff 24                80/112     8.53       7.81        80         0
    tgff 25                49/92      20.16      20.28       57         15
    Average                           24.3       23.1        39.6       8.32

In this experiment, although we applied the reordering optimization to EE-GLSA, it still requires 4.8 times more mode transitions compared to ASG-VTS on average. If the mode transition overhead is not trivial, then these transitions will result in energy waste and deadline violations.

Table VII shows the runtime of ASG-VTS and the number of iterations required for convergence of the algorithm. The worst-case runtime of ASG-VTS is 0.12 s, while it is 17.99 s for EE-GLSA [Schmitz et al. 2002]. Therefore, ASG-VTS is 150 times faster than EE-GLSA. Such a low runtime is possible because our SDP heuristic (Section 4.1) significantly reduces the number of iterations in the slack distribution cycles. Furthermore, each iteration in ASG-VTS has a low runtime complexity, partly due to our low-cost heuristic for task selection, as described in Section 4.2. The task selection heuristic improves the runtime of the algorithm by 2.1 times on average.

5.3 Impact of Mode Transition Overhead on ASG-VTS

As shown in Table VI, ASG-VTS has the attractive property that it generates solutions that make few mode transitions. Effectively, this means the transition overhead has little impact on the efficiency of such solutions. Table VIII shows the average energy savings and the average number of mode transitions for the benchmarks of Section 5.2 given different values of Cr and Cd (see Equations (7) and (8)). Row 2 shows that when the overhead is negligible, an energy saving of 23.1% is achieved with 8.3 transitions.

Table VII. ASG-VTS Runtime and Maximum Number of Iterations

    Test Benches    Number of Tasks/edges    Run Time (seconds)    Number of Iterations (i)
    tgff1           8/9                      0.01                  179
    tgff2           26/43                    0.02                  114
    tgff3           40/77                    0.04                  125
    tgff4           20/33                    0.01                  161
    tgff5           40/77                    0.03                  122
    tgff6           20/26                    0.02                  171
    tgff7           20/27                    0.03                  209
    tgff8           18/26                    0.02                  129
    tgff9           16/15                    0.01                  125
    tgff10          16/21                    0.02                  174
    tgff11          30/29                    0.02                  158
    tgff12          36/50                    0.03                  143
    tgff13          37/36                    0.02                  169
    tgff14          24/33                    0.02                  153
    tgff15          40/63                    0.04                  147
    tgff16          31/56                    0.03                  150
    tgff17          29/56                    0.03                  155
    tgff18          12/15                    0.01                  131
    tgff19          14/19                    0.02                  159
    tgff20          19/25                    0.02                  171
    tgff21          70/99                    0.05                  111
    tgff22          100/135                  0.12                  124
    tgff23          84/151                   0.11                  153
    tgff24          80/112                   0.04                  108
    tgff25          49/92                    0.04                  137
    Average                                  0.032                 147

Table VIII. Average Energy Savings and Number of Mode Transitions under Different Transition Overheads

    Cd (μs/V)    Cr (μF)    avg. savings %    avg. no. of mode trans.
    0            0          23.1              8.32
    1            1          22.8              7.92
    10           10         22.3              7.76
    100          100        20.0              6.1

Rows 3 and 4 show that for a small or moderate transition overhead, energy savings decrease by less than 1%, while the number of transitions remains almost fixed. Row 5 shows that with a high transition overhead, the energy savings decrease by 3%, while the number of mode transitions decreases by 27% (from 8.32 to 6.1). This experiment shows that ASG-VTS can effectively maintain a low number of mode transitions and high energy efficiency even in the presence of transition overhead.

5.4 Comparing ASG-VTS with the Optimal Solution

This subsection compares the energy efficiency and runtime of ASG-VTS with the optimal ILP solution [Andrei et al. 2005] formulated for discrete voltage levels with consideration for mode transition overhead. The GSM encoder/decoder benchmark chosen for our comparison consists of 87 tasks and 137 edges. The tasks are mapped to three PEs, which can operate at two voltage modes.


Note that the search space of this benchmark is relatively large, comprising 2^87 ≈ 1.54742 × 10^26 solutions. Therefore, an exhaustive search for the optimal solution is not viable. The energy overhead of a transition from each mode to the other is 52 μJ. More details about this benchmark can be found in Schmitz et al. [2004]. The energy consumption of the system without DVS optimization is 1.961 mJ. We ran both algorithms on this benchmark using a 2.8 GHz Pentium 4 machine with 1 GB RAM. The optimal solution generated by the ILP consumes 1.476 mJ, exactly the same as the solution produced by ASG-VTS. Perhaps the most interesting comparison is the algorithm runtime: 10 minutes for the ILP algorithm vs. just 0.58 seconds for ASG-VTS. This shows that not only could ASG-VTS find a solution as good as the optimal for this particular real-life application, it does so 1000 times faster than the optimal ILP solver.

6. CONCLUSIONS

To achieve energy efficiency at the system level, one of the key problems to solve is simultaneous voltage selection and task scheduling on a heterogeneous multiple-processor system under deadline constraints. As the problem is NP-hard, we propose a stochastic heuristic called Adaptive Stochastic Gradient Voltage-and-Task Scheduling (ASG-VTS), which combines slack distribution and iterative adjustment of task ordering. Through cycles of slack recovery and distribution, the algorithm quickly converges to high-quality solutions. To reduce the runtime complexity of slack distribution, we define a novel, lightweight data structure that tracks time-based relationships instead of tracking full dependencies among tasks. Experimental results over publicly available benchmarks and our own design have shown that our ASG-VTS algorithm can quickly and consistently find near-optimal solutions with few mode transitions. With an average 4.8× reduction in the number of mode transitions, the solutions generated by ASG-VTS can tolerate relatively high mode transition overheads. On a real-life GSM encoder/decoder benchmark, ASG-VTS finds a solution as good as the one solved by the optimal ILP algorithm, but takes about 0.5 seconds vs. over 10 minutes. It is also 150 times faster than previous energy-gradient techniques. These properties make ASG-VTS one of the best choices for design space exploration of increasingly complex systems. We have also developed a Web-based interface for the ASG-VTS algorithm [Gorjiara 2004].

REFERENCES

ABDI, S., PENG, J., YU, H., SHIN, D., GERSTLAUER, A., DOEMER, R., AND GAJSKI, D. 2003. System-on-Chip Environment (SCE Version 2.2.0 Beta): Tutorial. Tech. rep. CECS-TR-03-41, CECS, University of California, Irvine.
ANDREI, A., SCHMITZ, M., ELES, P., PENG, Z., AND AL-HASHIMI, B. 2005. Overhead-conscious voltage selection for dynamic and leakage energy reduction of time-constrained systems. IEE Proceedings—Computers and Digital Techniques 152, 1, 28–38.
ANDREI, A., SCHMITZ, M., ELES, P., PENG, Z., AND AL-HASHIMI, B. M. 2004. Overhead-conscious voltage selection for dynamic and leakage energy reduction of time-constrained systems. In Proceedings of DATE. IEEE Computer Society, Los Alamitos, CA.
BAMBHA, N. K., BHATTACHARYYA, S. S., TEICH, J., AND ZITZLER, E. 2001. Hybrid global/local search strategies for dynamic voltage scaling in embedded multiprocessors. In Proceedings of CODES. ACM Press, New York, NY, 243–248.


CAI, L., GERSTLAUER, A., AND GAJSKI, D. 2003. Retargetable profiling for rapid, early system-level design space exploration. Tech. rep. CECS-TR-04-04, CECS, University of California, Irvine. October.
DICK, R. P. AND JHA, N. K. 1999. MOCSYN: Multiobjective core-based single-chip system synthesis. In Proceedings of Design, Automation and Test in Europe. IEEE Computer Society, Los Alamitos, CA, 263.
EUROPEAN TELECOMMUNICATION STANDARDS INSTITUTE (ETSI). 1996. Digital cellular telecommunications system; enhanced full rate (EFR) speech transcoding (GSM 06.60).
GAJSKI, D. D., ZHU, J., DÖMER, R., GERSTLAUER, A., AND ZHAO, S. 2000. SpecC: Specification Language and Methodology. Kluwer Academic Publishers, Boston, MA.
GAREY, M. R. AND JOHNSON, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, NY.
GORJIARA, B. 2004. http://www.ece.uci.edu/~bgorjiar.
GORJIARA, B., BAGHERZADEH, N., AND CHOU, P. 2004. An efficient voltage scaling algorithm for complex SoCs with few number of voltage modes. In Proceedings of ISLPED. IEEE Computer Society, Los Alamitos, CA, 381–386.
GORJIARA, B., CHOU, P., BAGHERZADEH, N., JENSEN, D., AND RESHADI, M. 2004. Fast and efficient voltage scheduling by evolutionary slack distribution. In Proceedings of ASP-DAC. IEEE Computer Society, Los Alamitos, CA, 381–386.
GRUIAN, F. AND KUCHCINSKI, K. 2001. LEneS: Task scheduling for low-energy systems using variable supply voltage processors. In Proceedings of ASP-DAC. ACM Press, New York, NY, 449–455.
INTEL. 2007. Intel XScale microarchitecture. http://developer.intel.com/design/intelxscale.
JEJURIKAR, R. AND GUPTA, R. 2004. Dynamic voltage scaling for system-wide energy minimization in real-time embedded systems. In Proceedings of ISLPED. IEEE Computer Society, Los Alamitos, CA, 78–81.
LEUNG, L.-F., TSUI, C.-Y., AND KI, W.-H. 2004. Minimizing energy consumption of multiple-processor-core systems with simultaneous task allocation, scheduling and voltage assignment. In Proceedings of ASP-DAC. ACM Press, New York, NY.
LUO, J. AND JHA, N. K. 2003. Power-profile driven variable voltage scaling for heterogeneous distributed real-time embedded systems. In Proceedings of the International Conference on VLSI Design (VLSI'03). IEEE Computer Society, Los Alamitos, CA, 369–375.
MARTIN, S. M., FLAUTNER, K., MUDGE, T., AND BLAAUW, D. 2002. Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads. In Proceedings of ICCAD. ACM Press, New York, NY, 721–725.
SCHMITZ, M. AND AL-HASHIMI, B. 2001. Considering power variations of DVS processing elements for energy minimisation in distributed systems. In Proceedings of ISSS. ACM Press, New York, NY, 250–255.
SCHMITZ, M. T., AL-HASHIMI, B. M., AND ELES, P. 2002. Energy-efficient mapping and scheduling for DVS enabled distributed embedded systems. In Proceedings of DATE. IEEE Computer Society, Los Alamitos, CA.
SCHMITZ, M. T., AL-HASHIMI, B. M., AND ELES, P. 2004. System-Level Design Techniques for Energy-Efficient Embedded Systems. Kluwer Academic Publishers, Boston, MA.
SPALL, J. C. 2003. Introduction to Stochastic Search and Optimization. John Wiley & Sons, Inc., New York, NY.
VON WEYMARN, M. 2001. Development of a specification model of the EFR vocoder. Tech. rep. ICS-TR-01-35, University of California, Irvine.
ZHANG, Y., HU, X. S., AND CHEN, D. Z. 2002. Task scheduling and voltage selection for energy minimization. In Proceedings of DAC. ACM Press, New York, NY.
ZHANG, Y., HU, X. S., AND CHEN, D. Z. 2003. Energy minimization of real-time tasks on variable voltage processors with transition energy overhead. In Proceedings of ASP-DAC'03. ACM Press, New York, NY, 65–70.

Received January 2006; revised August 2006; February 2007; accepted March 2007
