Optimizing Mode Transition Sequences in Idle Intervals for Component-Level and System-Level Energy Minimization ∗ Jinfeng Liu, Pai H. Chou Center for Embedded Computer Systems University of California, Irvine, CA 92697-2625, {jinfengl, phchou}@uci.edu
Abstract New embedded systems offer rich power management features in the form of multiple operational and non-operational power modes. While they offer mechanisms for better energy efficiency, they also complicate power management decisions in the presence of realtime constraints. Traditional dynamic power management techniques based on localized break-even-time analysis with simple on/off power controls often yield suboptimal if not incorrect results globally. To address these problems, this paper presents two core algorithms for reducing idle energy consumption at the component level and system level. The first algorithm discovers the optimal sequence for mode transition over multiple power modes under timing constraints. It assists the second algorithm that performs a sophisticated global search strategy to aggressively explore system-wide energy savings by correctly interpreting the constraints across all subsystems. Experimental results show that in an embedded radio system where idle energy cost matches or exceeds the active energy consumption, our technique can further reduce the idle energy by 50–70%, which translates into 30–50% of overall system energy compared to existing techniques.
1 Introduction
As power becomes a primary concern for embedded systems, components are being built to support a richer set of power management options in the form of more power modes. Previously, a device may have had only on, off, and possibly standby modes; nowadays, a device can have many more operational and non-operational modes. By running the device at a different voltage/frequency combination or communication data rate, the system can save energy by better adapting its performance to the workload. In contrast to earlier dynamic power management (DPM) techniques that handle only simple on/off modes, recent DPM techniques have been extended to handle more power modes. They model multiple power modes as a state machine. To change power mode, the power manager follows the state transition arcs, going through a number of intermediate modes if necessary.

Unfortunately, today's power management techniques have not been able to take full advantage of such a rich set of options, for several reasons. First, they do not handle hard timing constraints. Second, the decision is based on a simple break-even time, possibly steered by a stochastic model constructed from profiling, but these are heuristics that have not been rigorously proven optimal. Third, these DPM techniques manage peripherals as independent units based on localized views, without considering dependencies among them.

This paper proposes a systematic treatment of the problem of selecting power mode sequences to minimize energy at the system level under timing constraints. Previous approaches either perform direct mode changes without sequencing, or they do mode sequencing by functional necessity without exploiting power optimization opportunities. Moreover, their decisions are based on simple break-even time without timing constraints. In this paper, our approach is two-fold: we first present a technique that optimizes the transition sequence for an individual device; second, we generalize this to the system level by simultaneously optimizing for multiple devices. Experimental results show that our algorithm can discover new mode transition sequences that are otherwise too intricate for designers to discover manually. Our sequencing approach also has the effect of smoothing out the power consumption curve, making the system friendlier to batteries. Equally important, the low runtime complexity of our algorithms makes them amenable to online power management.

∗ This research was sponsored in part by DARPA contract F33615-00-11719 and by National Science Foundation under grant CCR-0205712.
2 Related Work
Dynamic voltage scaling (DVS) and dynamic power management (DPM) are two classes of techniques commonly used in reducing energy consumption in embedded systems. DVS works by lowering the voltage and the frequency such that the same amount of work can be performed at not only a lower power level but also less total energy despite the longer execution time, thanks to the quadratic scaling of power. Many variations of DVS have been proposed on processors [1, 2] and power/datarate scaling on communication links [3]. On the other hand, DPM works by shutting down idle subsystems [4, 5]. The shutdown and power-up decision can be based on fixed idle times (the break-even time), or an adaptive timeout scheme based on learning, profiling or prediction [6, 7]. While many low-power techniques are processor-oriented and mainly focus on active tasks (e.g., scheduling), the focus of this work is on the system level, where the processor is one of the components, and communication and I/O could cost much energy, even when idle. DPM techniques target these subsystems, but the time-out or break-even-time approaches are not suitable for systems whose components have multiple operational and nonoperational modes. Moreover, many such power managers cannot handle system-level constraints when they manage the system as a set of individual devices. We observe that the shutdown decisions, including the shutdown time, duration, and sequences on different components, are mutually dependent and can affect each other’s power management opportunities. If not carefully coordinated, localized power management decisions can lead to lower energy efficiency globally. To address this problem, a few recent studies model multiple power modes at the system level. 
Li et al. propose a model of dependencies between power modes across subsystems [8] and attempt to optimize the mode transition sequence with shortest-path algorithms to reduce energy [9], though our study shows that such a solution is not optimal.
[Figure 1: The block diagram of SDR. Each of channels 1 to N comprises an antenna with a radio power amplifier (PA), a transceiver (XC), a modem (MD), and a base-band processor (BP); the channels share the central processors (CP) and a power manager (PM).]

[Figure 2: Multiple mode transitions during idle intervals (power vs. time). (a) Break-even with optimal two mode transitions. (b) Multiple mode transitions with less time and energy. (c) Multiple mode transitions with more time but less energy.]

3 Motivating Example
Our motivating example is a software defined radio (SDR). It is a complex embedded system consisting of radio antennae, power amplifiers (PA), A/D and D/A converters, signal processors, data communication links, and general-purpose processors running real-time operating systems. Its block diagram is shown in Fig. 1. This SDR is organized into multiple channels, each of which contains a front-end radio antenna with a power amplifier (PA) to send and receive RF signals. The analog signal is strengthened by a transceiver (XC) before a modem (MD) converts it to base-band digital signals. Each channel is equipped with a dedicated base-band processor (BP). Each BP is networked with a pool of globally shared central processors (CP) that are equipped with more signal processing capabilities, including voice/image/video processing, encryption, compression, etc. The power manager (PM) is a separate processor that controls the power levels of all components by issuing commands to set their power modes.

The activities on each channel contain a mix of periodic and sporadic messages. The external messages can either arrive at fixed times or their arrival times can be negotiated. Not all messages need to be handled by the central processors. For example, each channel regularly performs handshaking with remote parties, and this can be completely handled by the base-band processor. If a message requires additional processing, e.g., encryption or voice, then one of the central processors will process it and send the result back to the remote party.

Many components have multiple power modes. For example, the power amplifier PA can operate at 372W, 52W, and 9W for sending/receiving signals in a range of different signal strengths and signal/noise ratios. In addition, it can be put into a low-power standby mode. The transceiver and modem both support on/off/standby modes. The processors support on/off/sleep modes.
Reducing Idle Energy with Multiple Modes

Among the components in the SDR, the power amplifier presents the most power management opportunities due to its high power consumption and its multiple power levels. Because message arrivals are sparse, it is possible to either turn off the PA or set it to standby mode to reduce energy consumption. Traditional DPM techniques would perform break-even time analysis. The break-even time is the duration that yields the same energy consumption whether the device remains in the same operational power mode the entire time or changes to a lower-power mode temporarily and back to operational. If the idle interval is longer than the break-even time, then changing to low-power mode and back can reduce energy. Break-even analysis usually yields the
[Figure 3: Multiple transitions to reach different low-power states (power vs. time). (a1) Break-even can only reach $m_{low1}$ but not $m_{low2}$. (a2) Multiple mode transitions can reach $m_{low2}$ with less energy. (b1) Break-even can reach $m_{low2}$ as optimal sequence. (b2) Multiple mode transitions stop at $m_{low1}$ (higher power) with less energy.]
greatest energy savings on idle intervals when the component has only one operational mode. However, many new electronic devices offer many more modes. For example, a hard drive has over a dozen modes; network interface cards support multiple bit rates and therefore multiple power levels. Multiple modes offer new power management opportunities.

Fig. 2(a) shows a sequence that first puts a component into low-power mode $m_{low}$ after an active task finishes execution in operational mode $m_i$; the component remains in mode $m_{low}$ for some time before reverting to $m_i$ just before the next active task starts execution. Break-even analysis would consider this solution optimal, but the actual optimal solution is another mode transition sequence $m_i \to m_j \to m_k \to m_{low}$. The solution in Fig. 2(b) consumes less time and energy, while the one in Fig. 2(c) takes more time but consumes less energy. Moreover, more options are available when reverting from $m_{low}$ back to $m_i$, because the return path does not have to be the exact reversal of the first sequence $m_i \to m_j \to m_k \to m_{low}$.

Many opportunities are available with multiple modes, but they may not be obvious. In Fig. 3(a) the component has two low-power modes $m_{low1}$ and $m_{low2}$. By break-even analysis we can reach only mode $m_{low1}$ but not $m_{low2}$ (Fig. 3(a1)), either because the duration is not long enough for the two transitions $m_i \to m_{low2} \to m_i$, or because the duration is shorter than the break-even time. With multiple modes, an alternative path may reach the lower-power mode $m_{low2}$ with reduced energy (Fig. 3(a2)). Perhaps even more counterintuitive is the choice between different low-power modes. Suppose break-even analysis allows changing to the lowest-power mode $m_{low2}$ (Fig. 3(b1)). However, a better solution might be a sequence that reaches the higher power level $m_{low1}$ instead (Fig. 3(b2)).

Our third example shows that multiple-mode transition sequences are more sensitive to time. In Fig. 4(a), we assume the sequence is optimal for the idle interval with delay $T$. However, if the idle interval has a delay $T_1 = T + \Delta T$ as in Fig. 4(b), then the optimal sequence for Fig. 4(a) is no longer optimal. The extra time $\Delta T$ enables an alternative path with lower energy but a
[Figure 4: Optimal mode transition sequences vary with duration (power vs. time). (a) Optimal sequence with the duration of idle interval $T$. (b) Optimal sequence or low-power mode may change with a duration of idle interval $T_1 = T + \Delta T$.]

[Figure 5: Energy trade-off on idle intervals across subsystems (timelines of PA and CP with messages B1 and R1, "recv data" and "send result" events, and tasks IP0 and IP1). (a) Extra overhead due to system-level constraints across PA and CP. (b) Reducing overhead on CP requires extra overhead on PA. Annotations: "CP cannot be turned on and off quickly, it must remain with full power between IP1 & IP0"; "extra overhead on PA"; "delay R1 to reduce idle energy on CP".]
longer delay, which is not possible in (a). We should anticipate that the optimal sequence may change again with another delay $T_2$, possibly with a different target low-power mode. This is in contrast to break-even analysis, where the optimal sequence is always the same for all $T$ greater than the break-even time.

Another new feature with multiple modes is that the starting mode can differ from the destination mode, making it an $O(M^2)$ problem given a total of $M$ modes, and the optimal solution of each sub-problem varies with the delay parameter $T$. As a result, the analysis of multiple modes and multiple-mode transition sequences subsumes break-even analysis, which is a special case with two different transition sequences.

Previous work has modeled modes and transitions as a directed graph, and it was suggested that optimal mode sequencing be solved as a shortest energy or time path problem in this graph. However, the optimality of a multiple-mode transition sequence is related to neither the shortest energy path nor the shortest delay path. Even the partial sequences $m_i \to m_j \to m_k \to m_{low}$ and a reverse path $m_{low} \to m_p \to m_q \to m_i$ are orthogonal to the shortest delay/energy paths. This property will be discussed in the next section. Finally, in the general case, the delay and energy costs of a mode transition towards a low-power mode do not scale monotonically with the power level of the starting mode. For example, the assumption "turning off from a high power level is more difficult" may not be true, depending on specific design constraints of the device.

Correlated Idle Intervals on the Entire System

The SDR example highlights another limitation of previous techniques, including break-even analysis: they do not handle correlated idle intervals. One common strategy is to combine multiple active intervals into one to reduce the fragmentation of idle intervals and the number of mode changes.
Processor caches and hard disk buffers actually fall into this category, although they were originally aimed at performance. Such techniques are applicable to independent or loosely constrained subsystems. In reality, the lengths of the idle intervals of all components are closely connected by system-level constraints, including dependency, deadline, and resource sharing across the entire system. Therefore, we must go one step further by taking all idle intervals, as well as all active intervals, into a system-level optimization framework.

Fig. 5 illustrates one scenario where channel 1 of the SDR receives a request R1 for image processing on CP. Assuming fixed active intervals have already been allocated for regular handshaking signals, R1 can be scheduled to arrive at any time within a certain window, as long as the channel is free (i.e., in an idle interval). However, once R1 is received, CP must compute the result as soon as possible and send it back. The power manager's job is to pick the right arrival time of the request to reduce the energy consumption of the system. During the window of R1's possible arrival times, a time slot has already been allocated to message B1. By scheduling R1 right next to B1, mode change overhead on both the power amplifier PA and other components on channel 1 can be reduced. However, since the image processing task IP1 must start immediately upon receiving R1, this requires additional activities to change the power modes on CP. CP consumes much less power than PA (30W vs. 372W), but it takes much longer to execute its power-down and power-up sequences than PA, which is an analog device (2-3 seconds vs. 100 ms). In fact, the extra energy cost on CP exceeds the total energy savings on the biggest power consumer PA plus all other components on channel 1.

A better approach is to allow IP1 to be close enough to any pre-allocated activities on CP (e.g., another task IP0) such that the overhead to power up CP is minimized, although it involves additional overhead on PA by not scheduling R1 close to B1. This may appear somewhat contrived, but this example is actually simple compared to real-life embedded applications. Many surprises could occur if system designers overlooked important system-level constraints and dependencies. In this example, targeting the biggest power consumer can actually lead to an inferior solution.
4 Component-Level Energy Optimization
This section first defines our component model in terms of resources, modes, mode transitions, and mode transition sequences. We then pose the energy minimization problem as finding the optimal mode transition sequence between a pair of modes given an overall timing constraint on delay. We formulate our problem with a geometric representation and present an optimization algorithm that solves a more general problem by optimizing for all pairs of modes and all delay constraints in a single run.

4.1 Component Model
Definition 1 (Resource and Mode) An execution resource $\mathcal{M}$ consists of a set of $M$ modes $m_i$, $1 \le i \le M$. Each mode $m_i$ has an idle power level $P_{0i} \ge 0$. A mode $m_i$ can be either an operational mode or a non-operational mode (such as idle, sleep, and off). The operational power may depend on the active tasks and need not equal $P_{0i}$.

Definition 2 (Mode Transition) A mode transition $mt_{(i\to j)}$ refers to the procedure to change $\mathcal{M}$'s mode from $m_i$ to $m_j$. It has two parameters, $d_{(i\to j)} \ge 0$ and $e_{(i\to j)} \ge 0$, that specify the delay and energy consumption of this process, respectively. Mode transitions can be represented by either a mode transition diagram or a mode transition matrix. Not all transitions $mt_{(i\to j)}$ are valid, due to various design constraints. For example, it might not be possible to directly shut off a device from a high-power mode. If a mode $m_i$ cannot be directly changed to mode $m_j$, then $d_{(i\to j)} = e_{(i\to j)} = +\infty$.

Definition 3 (Mode Transition Sequence) A mode transition sequence $MTS_{(i\to j)}$ consists of sequential mode transitions $mt_{(i\to k_1)}, mt_{(k_1\to k_2)}, \ldots, mt_{(k_{N-1}\to j)}$. The first transition starts from mode $m_i$ and the last transition finishes at mode $m_j$. There are $N-1$ intermediate modes $m_{k_1}, m_{k_2}, \ldots, m_{k_{N-1}}$ with a total of $N$ mode transitions. Each transition is followed immediately by the next. Let $k_0 = i$, $k_N = j$. The overall delay and energy consumption of mode transition sequence $MTS_{(i\to j)}$ are

$$D_{(i\to j)} = \sum_{n=0}^{N-1} d_{(k_n\to k_{n+1})} \quad\text{and}\quad E_{(i\to j)} = \sum_{n=0}^{N-1} e_{(k_n\to k_{n+1})}.$$
Definition 4 (Delay-Constrained Mode Transition Sequence) A delay-constrained mode transition sequence $MTS_{(i\to j)}(T)$ is an $MTS_{(i\to j)}$ that requires the resource to start in mode $m_i$ at time 0 and to be in mode $m_j$ by time $T$. $T \ge D_{(i\to j)}$ is called the delay constraint. During a delay-constrained mode transition sequence, each transition $mt_{(k_n\to k_{n+1})}$ may follow the previous transition $mt_{(k_{n-1}\to k_n)}$ after some non-zero delay while remaining in mode $m_{k_n}$. We call all such modes $m_{k_n}$ standing modes $M_{STD}$. Without loss of generality, the ensuing text assumes only one standing mode for simplicity, although there can be multiple standing modes. We denote the single standing mode by overloading the symbol $m_s$ as if it were the $s$th mode, that is, $M_{STD} = \{m_s\}$. The energy consumption of $MTS_{(i\to j)}(T)$ is

$$E_{(i\to j)}(T) = E_{(i\to j)} + P_{0s} \times (T - D_{(i\to j)}) \qquad (1)$$
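As an illustration of Definitions 2 to 4 and Equation (1), the following is a minimal Python sketch, not the authors' implementation. The function names `sequence_cost` and `constrained_energy` and the three-mode device (with its power, delay, and energy values) are hypothetical, chosen only for the example. The sketch assumes a single standing mode and picks the lowest-idle-power mode on the path as that standing mode.

```python
import math

INF = math.inf

def sequence_cost(path, d, e):
    """Total delay D and energy E of the transitions along `path` (a list of mode indices)."""
    hops = list(zip(path, path[1:]))
    return sum(d[a][b] for a, b in hops), sum(e[a][b] for a, b in hops)

def constrained_energy(path, d, e, idle_power, T):
    """E(T) = E + P0s * (T - D); math.inf when T < D or a transition is invalid."""
    D, E = sequence_cost(path, d, e)
    if D == INF or T < D:
        return INF
    # Assumption: the standing mode is the lowest-idle-power mode on the path.
    p_s = min(idle_power[m] for m in path)
    return E + p_s * (T - D)

# Hypothetical device: 0 = on (10 W), 1 = standby (1 W), 2 = off (0.1 W).
idle_power = [10.0, 1.0, 0.1]
d = [[0.0, 0.1, INF], [0.1, 0.0, 0.5], [INF, 0.5, 0.0]]  # transition delays (s)
e = [[0.0, 0.5, INF], [0.5, 0.0, 2.0], [INF, 2.0, 0.0]]  # transition energies (J)

# Energy of three candidate sequences from mode 0 back to mode 0 with T = 2 s.
stay = constrained_energy([0], d, e, idle_power, 2.0)
via_standby = constrained_energy([0, 1, 0], d, e, idle_power, 2.0)
via_off = constrained_energy([0, 1, 2, 1, 0], d, e, idle_power, 2.0)
```

With these made-up numbers, going through standby costs less than either staying on or going all the way to off for a 2-second interval, since the deeper sequence spends most of the interval in transitions.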
Note that lowercase symbols $d_{(i\to j)}$ and $e_{(i\to j)}$ represent the delay and energy of a single mode transition $mt_{(i\to j)}$, while uppercase symbols $D_{(i\to j)}$ and $E_{(i\to j)}$ refer to the delay and energy of a sequence of mode transitions. Finally, $D_{(i\to j)}(T)$ and $E_{(i\to j)}(T)$ denote the delay and energy of a mode transition sequence, or simply sequence for short, constrained by $T$. Sequence $MTS_{(i\to j)}$ is a special case of sequence $MTS_{(i\to j)}(T)$ with $T = D_{(i\to j)}$ and $M_{STD} = \emptyset$.

4.2 Problem Statement: Energy-Optimal Sequence
Problem 1 (Optimal Mode Transition Sequence) Given: (a) a resource $\mathcal{M}$ with $M$ modes $m_1, m_2, \ldots, m_M$, (b) a source mode $m_i$ and a target mode $m_j$, (c) a delay constraint $T$, Find: the optimal delay-constrained sequence $MTS_{(i\to j)}(T)$ that minimizes energy consumption $E_{(i\to j)}(T)$.

Before seeking an algorithm to solve Problem 1, we first examine a few interesting properties of this problem.

Property 1 (Min-Power Standing Mode) If $MTS_{(i\to j)}(T)$ is the optimal sequence with minimum energy $E_{(i\to j)}(T)$, then its standing mode $m_s$ must consume the least amount of idle power among all modes $m_{k_0} = m_i, m_{k_1}, m_{k_2}, \ldots, m_{k_{N-1}}, m_{k_N} = m_j$. That is,

$$P_{0s} = \min_{0 \le n \le N} P_{0k_n}$$
Property 1 can be directly derived from Equation (1). It must be noted that the shortest delay path from mi to m j (with the minimum D(i→ j) ) does not lead to the minimum energy. The shortest delay path can only decide whether a valid sequence exists: if T < min{D(i→ j) }, then no valid sequence is possible. Property 1 also indicates that the minimum energy path is not relevant. A minimum energy path is associated with the minimal E(i→ j) . However, the second term P0s × (T − D(i→ j) ) could be much more dominant; or it may not be a valid sequence if T < D(i→ j) . As a result, even if the shortest delay path happens to be the same as the minimum energy path, it still does not necessarily yield the minimum energy consumption.
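The point that neither the shortest-delay path nor the minimum-transition-energy path decides the optimum can be checked numerically. The sketch below is illustrative only: each candidate sequence is summarized by a hypothetical triple (D, E, P) of total delay, total transition energy, and standing-mode idle power, and the helper names `energy_at` and `best_sequence` are assumptions of this example rather than part of the paper.

```python
import math

def energy_at(seq, T):
    """Equation (1) for a sequence summarized as (D, E, P); inf if T < D."""
    D, E, P = seq
    return E + P * (T - D) if T >= D else math.inf

def best_sequence(candidates, T):
    """Name of the valid candidate with minimum energy at delay T, or None if none is valid.

    `candidates` must be a non-empty dict mapping a name to its (D, E, P) triple.
    """
    name = min(candidates, key=lambda n: energy_at(candidates[n], T))
    return name if energy_at(candidates[name], T) < math.inf else None

# Hypothetical candidates for returning to the same operational mode.
candidates = {
    "stay": (0.0, 0.0, 10.0),        # shortest delay and least transition energy
    "via_standby": (0.2, 1.0, 1.0),
    "via_off": (1.2, 5.0, 0.1),      # largest transition energy, lowest standing power
}

winners = [best_sequence(candidates, T) for T in (0.1, 2.0, 10.0)]
```

In this made-up example the sequence with zero delay and zero transition energy wins only for very short intervals, and the sequence with the largest transition energy wins for long ones, because the term $P_{0s} \times (T - D_{(i\to j)})$ dominates as $T$ grows.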
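[Figure 6: The geometrical representation of a sequence S. In the T-E space for (i → j), S is the straight line E = E_S + P_S × (T − D_S) with slope P_S, starting at the end point (D_S, E_S).]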
Property 2 (Non-Simple Path) A path from mode $m_i$ to mode $m_j$ is called non-simple if it contains more than one visit to a given mode $m_n$. An optimal sequence $MTS_{(i\to j)}(T)$ may contain a non-simple path, i.e., a mode $m_n$ can be visited twice if the min-power standing mode $m_s$ is visited between the two visits to $m_n$.

Property 2 indicates that the resource can be changed from the source mode $m_i$ to a low-power mode $m_{low}$ through a sequence of mode transitions that visit $m_n$. Then, another sequence will progress towards the target mode $m_j$ by visiting $m_n$ for the second time. Property 2 also implies that Problem 1 cannot be solved solely by shortest-path based algorithms that consider only simple paths.

4.3 Geometric Representation of the Solution Space
In order to assist power management decisions in a wide range of operational conditions, we must be able to provide a series of optimal solutions given different values of $T$. We observe that given two close values of delay $T$ and $T + \delta t$, the two optimal solutions are likely to contain the same (unconstrained) sequence. The only difference is that in the latter case, the resource will stay an extra $\delta t$ time in the standing mode. We now define a geometric representation of the solution space of Problem 1 with all possible delay constraints $T \in [0, +\infty)$. Let us reexamine Equation (1) in a two-dimensional $T$-$E$ space for each $(i \to j)$ pair. We rewrite the equation as

$$E = E_S + P_S \times (T - D_S) \qquad (2)$$

where $E_S$ and $D_S$ are the energy consumption and delay of a sequence $S$ from mode $m_i$ to mode $m_j$, and $P_S$ is the idle power level of its min-power standing mode (with $(i \to j)$ omitted for brevity). $E_S$, $D_S$ and $P_S$ are all constants for a given sequence $S$. Equation (2) is represented by a straight line in $T$-$E$ space, shown in Fig. 6. It starts from point $(D_S, E_S)$ and extends toward $+\infty$ along the $T$ axis, with a slope equal to $P_S$. This straight line represents all possible values of energy consumption $E$ with delay constraint $T \ge D_S$ when sequence $S$ is selected. When $T < D_S$, no solution exists for sequence $S$. Without ambiguity, we use symbol $S$ to refer to both the sequence $S$ and the straight line that represents $S$ in the ensuing text.

Multiple sequences can start from mode $m_i$ and end in $m_j$ for each pair $(i \to j)$. Fig. 7 shows four cases with two sequences $S_1$ and $S_2$ under the conditions that $S_1$ and $S_2$ rule out each other. When a line $S_1$ (or one of its segments) is closer to the $T$ axis, it always produces smaller $E$ values than the line $S_2$ above it. As a result, the corresponding sequence $S_1$ becomes the optimal solution, which is highlighted with solid line segments. From Fig. 7, it can be seen that break-even analysis is a special case with only two lines representing two sequences: $(m_i \to m_i)$ with no change of mode, or changing to a lower-power mode $m_{low}$ with $(m_i \to m_{low} \to m_i)$. The break-even point is exactly the intersection of the two lines.

The scenario becomes more complex for multiple lines, as shown in Fig. 8. The optimal solutions are represented by the line segments that are closest to the $T$ axis $\forall T \in [0, +\infty)$. By our convention, we refer to the set of these line segments as the lower outline.
[Figure 7 (caption not recovered): sequences S1: E = E_S1 + P_S1 × (T − D_S1) and S2: E = E_S2 + P_S2 × (T − D_S2) plotted in T-E space, under conditions where one rules out the other.]

UPDATE-OUTLINE(L[1 : |L|], S)
1  if L = ∅ then L := {⟨D_S, +∞, S⟩}; return; (0∗)
2  if T_0 > D_S then insert ⟨D_S, T_0, S⟩ into the beginning of L; (1∗)
3  for ⟨T_i, T_{i+1}, S_i⟩ ∈ L do
4    if S_i ≠ S and E_S + P_S × (T_i − D_S) ≥ E_Si + P_Si × (T_i − D_Si) and P_S ≥ P_Si then break; (2∗)
5    if S_i ≠ S and E_S + P_S × (T_i − D_S) ≥ E_Si + P_Si × (T_i − D_Si) and E_S + P_S × (T_{i+1} − D_S) ≥ E_Si + P_Si × (T_{i+1} − D_Si) then continue; (3∗)
6    if S_i ≠ S and E_S + P_S × (T_i − D_S) ≤ E_Si + P_Si × (T_i − D_Si) and E_S + P_S × (T_{i+1} − D_S) ≤ E_Si + P_Si × (T_{i+1} − D_Si) then replace ⟨T_i, T_{i+1}, S_i⟩ with ⟨T_i, T_{i+1}, S⟩ in L; (4∗)
7    else do
8      D′ := the intersection of line S and S_i within interval [T_i, T_{i+1});
9      if E_S + P_S × (T_i − D_S) < E_Si + P_Si × (T_i − D_Si) then (5∗)
10       modify ⟨T_i, T_{i+1}, S_i⟩ with ⟨D′, T_{i+1}, S_i⟩ in L;
11       insert ⟨T_i, D′, S⟩ before ⟨D′, T_{i+1}, S_i⟩ in L;
12     else (6∗)
13       modify ⟨T_i, T_{i+1}, S_i⟩ with ⟨T_i, D′, S_i⟩ in L;
14       insert ⟨D′, T_{i+1}, S⟩ after ⟨T_i, D′, S_i⟩ in L;
15 for ⟨T_i, T_{i+1}, S_i⟩ ∈ L[1 : |L| − 1] do (7∗)
16   if S = S_i = S_{i+1} = S_{i+2} = . . . = S_{i+k} then merge ⟨T_i, T_{i+1}, S_i⟩, ⟨T_{i+1}, T_{i+2}, S_{i+1}⟩, . . . , ⟨T_{i+k}, T_{i+k+1}, S_{i+k}⟩ into one item ⟨T_i, T_{i+k+1}, S⟩ in L;
Revised Problem Statement and Algorithm
Problem 2 (Optimal Sequences for All Source-Target Modes) Given: (a) a resource $\mathcal{M}$ with $M$ modes $m_1, m_2, \ldots, m_M$, (b) any pair of modes $m_i$ and $m_j$, $\forall\, 1 \le i, j \le M$, (c) any delay constraint $T \in [0, +\infty)$, Find: all optimal sequences $MTS_{(i\to j)}(T)$ defined by Problem 1.

Now we construct an algorithm to solve the more general Problem 2. We first introduce a utility algorithm that updates the lower outline $L$ when adding a new line $S$. The set of lines in the lower outline is $L : \{\langle T_i, T_{i+1}, S_i \rangle\}$, $i = 0, 1, \ldots, |L| - 1$, $T_{|L|} = +\infty$. $[T_i, T_{i+1})$ is the interval of each line segment, and $S_i$ is the optimal sequence for the interval $[T_i, T_{i+1})$. $S_i$'s parameters are $E_{S_i}, D_{S_i}, P_{S_i}$. $L$ is sorted by the start of intervals $T_i$. With a new line characterized by $S : \langle E_S, D_S, P_S \rangle$, the lower outline $L$ is updated by Algorithm UPDATE-OUTLINE.

The algorithm first checks whether $L$ is empty; if so, $S$ becomes the only element in $L$ (0∗, line 1) and the algorithm terminates. Otherwise, if $S$ provides a smaller beginning time $D_S$, $S$ is inserted as the first element of $L$ in order to keep $L$ sorted (1∗, line 2). Then, the full list $L$ is examined by the following cases. For each element $\langle T_i, T_{i+1}, S_i \rangle$ in $L$, if $S$ is completely ruled out by $S_i$ for $T \ge T_i$, it is not necessary to check the remaining items in $L$ (2∗, line 4). However, if $S_i$ rules out $S$ only in interval $[T_i, T_{i+1})$, $S$ may still be optimal in another interval, so the algorithm continues with the next interval (3∗, line 5). On the other hand, if $S$ rules out $S_i$ in interval $[T_i, T_{i+1})$, then $S$ becomes the optimal solution in this interval (4∗, line 6). In all cases other than 2∗, 3∗ and 4∗, $S$ intersects $S_i$ at some $D' \in (T_i, T_{i+1})$. Then, the algorithm modifies $L$ and inserts a new element accordingly in cases 5∗ (lines 9–11) and 6∗ (lines 12–14). Finally, if $S$ replaces adjacent elements in $L$, they are merged into a single element (7∗, lines 15–16). Fig. 11 shows the scenarios of cases 0∗ to 7∗.

Based on Algorithm UPDATE-OUTLINE, the global optimization algorithm to solve Problem 2 is Algorithm COMPUTE-OUTLINE. It computes all lower outlines $L[i \to j]$ by updating all valid paths between $(i \to j)$. It appears to be expensive since it enumerates non-simple paths between all pairs of $(i \to j)$. However, based on Properties 1 and 2, many invalid paths are eliminated immediately. Algorithm UPDATE-OUTLINE can also efficiently rule out any "bad" sequences that result in higher energy consumption. In addition, it computes the entire matrix $L$ for all pairs in a single run, instead of optimizing separately for each of the $\Theta(M^2)$ pairs of $(i \to j)$. Once it completes, the result can be permanently saved into a lookup table that contains all the optimal sequences under all conditions for a given resource.

COMPUTE-OUTLINE(d[1 : N → 1 : N], e[1 : N → 1 : N])
for i := 1 to N do
  for j := 1 to N do L[i → j] := ∅;
S := [1, 1];
while S ≠ ∅ do
  i := S[1], j := S[|S|];
  if S is valid (Properties 1 and 2) then
    UPDATE-OUTLINE(L[i → j], S);
    append 1 to the end of S;
  else if j ≠ N then S[|S|] := j + 1;
  else
    while |S| ≠ 0 and S[|S|] = N do remove last element of S;
    if |S| = 1 then S[1] := i + 1, append 1 to the end of S;
return L[1 : N → 1 : N];

Figure 10: Algorithm COMPUTE-OUTLINE.

Note that given the lower outline $L[i \to j]$ and any delay constraint $T$, it normally takes $O(\log |L|)$ time to find the optimal sequence by searching the sorted list $L$ in the lookup table. An alternative is to limit $T$ to integer values and use discrete $T$ values as indices into the lookup table. Searching the optimal mode transition sequence table with $T$ then takes only $\Theta(1)$ time. This is illustrated in Fig. 12. From now on we assume that $T$ is an integer variable, and the optimal sequence table for a resource $\mathcal{M}$ is $S0_{\mathcal{M}(i\to j)}[0 : \max(T)]$.
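To make the lower outline and its lookup concrete, here is a small Python sketch. It is a deliberately simple brute-force construction, not Algorithm UPDATE-OUTLINE: it collects every start point and every pairwise intersection as a candidate breakpoint, then probes each interval to find the cheapest line. Sequences are again summarized by hypothetical (D, E, P) triples, and `lower_outline` and `lookup` are names assumed for this example. The lookup uses binary search over the sorted breakpoints, which corresponds to the $O(\log |L|)$ search described above.

```python
import bisect
import math

def energy_at(seq, T):
    """Equation (2) for a line summarized as (D, E, P); inf if T < D."""
    D, E, P = seq
    return E + P * (T - D) if T >= D else math.inf

def lower_outline(seqs):
    """Return [(T_start, name), ...] sorted by T_start: the cheapest line per interval."""
    lines = list(seqs.values())
    points = {D for D, _, _ in lines}
    for a in range(len(lines)):
        for b in range(a + 1, len(lines)):
            (Da, Ea, Pa), (Db, Eb, Pb) = lines[a], lines[b]
            if Pa != Pb:
                # Intersection of E = (Ea - Pa*Da) + Pa*T and E = (Eb - Pb*Db) + Pb*T.
                t = ((Ea - Pa * Da) - (Eb - Pb * Db)) / (Pb - Pa)
                if t > max(Da, Db):  # only where both lines are defined
                    points.add(t)
    points = sorted(points)
    outline = []
    for k, start in enumerate(points):
        end = points[k + 1] if k + 1 < len(points) else start + 2.0
        probe = (start + end) / 2.0  # no crossings occur strictly inside an interval
        best = min(seqs, key=lambda n: energy_at(seqs[n], probe))
        if not outline or outline[-1][1] != best:  # merge adjacent equal winners
            outline.append((start, best))
    return outline

def lookup(outline, T):
    """Optimal sequence name for delay T by binary search, or None if T is too small."""
    starts = [s for s, _ in outline]
    k = bisect.bisect_right(starts, T) - 1
    return outline[k][1] if k >= 0 else None

# Hypothetical sequences for one (i -> j) pair.
seqs = {"stay": (0.0, 0.0, 10.0), "via_standby": (0.2, 1.0, 1.0), "via_off": (1.2, 5.0, 0.1)}
outline = lower_outline(seqs)
```

The brute-force pass is quadratic in the number of lines and is meant only to show what the outline contains; the incremental update in UPDATE-OUTLINE avoids recomputing intervals that a new line cannot affect.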
[Figure 11: Geometric illustration to Algorithm UPDATE-OUTLINE (E vs. T, showing L before and after each update). (0) S is the only element in L. (1) Insert S in the front of L. (2) S is ruled out completely; break. (3) S is ruled out only in [T_i, T_{i+1}]; continue. (4) S replaces S_i in [T_i, T_{i+1}]. (5), (6) S intersects with S_i at D′; update the entry and insert a new entry. (7) Merge S in adjacent entries.]
[Figure 12 (caption not recovered): E vs. T plots of sequences S1 to S9 over breakpoints T0 to T4. (a) Before: search for the interval where Ti ...]

5.1

... 0. A larger value of $EG(x,t)$ is preferable for more aggressive energy savings.

Definition 6 (Maximum Energy Gain) $MEG(x)$, the maximum energy gain of a task $x$, is the maximum value of all energy gains:

$$MEG(x) = \max_{t \in \Delta(x)} EG(x,t)$$
If $MEG(x)$ is achieved with a start time $t_m$, that is, $MEG(x) = EG(x, t_m)$, we denote $t_m$ as $t_{MEG}(x)$. For maximum energy savings with a single task move, we pick a task $u$ with the maximum $MEG(u)$ among all tasks. Then, we shift task $u$ to $t_{MEG}(u)$ to achieve this level of savings. At this point, all affected tasks $v$ must update their slacks $\Delta(v)$ (Property 3), as well as their energy gain functions $EG(v,t)$ and $MEG(v)$. According to Property 3, the change is limited to a small set of tasks. For other tasks, the slacks and energy gain functions remain the same. Now, we can pick another task $u'$ with the maximum $MEG(u')$ and repeat this process until no positive energy gain is possible. This forms the basis of our system-level energy optimization algorithm.

5.3 System-Level Optimization Algorithm
Algorithm OPTIMIZE-SYSTEM first calls OPTIMIZE-IDLE to optimize all existing idle intervals. Then, it computes slacks and energy gains for all tasks and constructs a heap-based priority queue Q in
[Figure 15 graphic not reproduced. It gives one mode transition graph per component (PA, XC, MD, BP/CP), with a power level on each mode and a pair of numbers on each transition. Legible PA modes: HIGH (374W), MID (42W), LOW (9W), STANDBY (0.1W), OFF (0.01W). The other components show ON and OFF modes plus a standby or sleep mode.]
Figure 15: The mode transition graphs (tables) in SDR.
which each entry contains a record ⟨x, tMEG(x)⟩, with MEG(x) as the key (lines 1–6). We find the record ⟨u, tMEG(u)⟩ with the maximum key value MEG(u) as the energy gain (line 1). This leads to the maximum energy savings by shifting u to tMEG(u) (lines 8–11). Then, MEG(u) and tMEG(u) are updated (line 12). Q must also update the record ⟨u, tMEG(u)⟩ and adjust itself as a heap (line 13). For any task v that is affected by shifting u (determined by Property 3), v’s slack ∆(v), energy gain EG(v, t), MEG(v) and tMEG(v) must be updated (lines 15–17). As a result, Q must also update its records and adjust itself accordingly (lines 15–17). This process continues until the maximum energy gain is no longer positive.
The runtime complexity of the algorithm is analyzed as follows. The execution time of each shift is decided by the inner loop (lines 14–18), within which the update to ∆(v) can be very efficient (details in [10]). As a result, EG(v, t), MEG(v) and tMEG(v) can be quickly updated by lookup tables. We estimate that the update to MEG(v) and tMEG(v) takes O(1) time, and it takes O(lg N) time to adjust the heap Q. Therefore, each shift takes O(|v| lg N) time, where |v| is the number of tasks in the set of all affected tasks v. Usually |v| ≪ N.
Note that Fig. 14 only sketches the concept of Algorithm OPTIMIZE-SYSTEM. In reality, the outer loop may still continue when no savings seem available (GAIN ≤ 0), because positive gains may appear after a few more shifts. To proceed when GAIN ≤ 0, the algorithm must take additional measures to avoid oscillation, that is, cases where one or more tasks are selected to be shifted back and forth repeatedly. We evaluated a few effective alternative heuristics in practice.
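The greedy loop described above can be sketched with a binary heap. This is a simplified illustration under stated assumptions, not the listing in Fig. 14: the callbacks `meg` and `affected`, the version counters used to discard stale heap records, and the `max_shifts` cap (a crude stand-in for the oscillation-avoidance heuristics, which the text does not detail) are all hypothetical.

```python
import heapq


def optimize_system(tasks, start, meg, affected, max_shifts=1000):
    """Greedy shifting loop.

    tasks:    iterable of task ids
    start:    dict task -> current start time (mutated in place)
    meg:      callable (task, start) -> (MEG, t_MEG) under the current schedule
    affected: callable task -> iterable of tasks whose slack depends on it
    Returns (total energy gain, number of shifts performed).
    """
    version = {x: 0 for x in tasks}
    heap = []

    def push(x):
        gain, t = meg(x, start)
        version[x] += 1
        # heapq is a min-heap, so the negated gain serves as the key.
        heapq.heappush(heap, (-gain, version[x], x, t))

    for x in version:
        push(x)

    total, shifts = 0.0, 0
    while heap and shifts < max_shifts:
        neg_gain, ver, u, t = heapq.heappop(heap)
        if ver != version[u]:
            continue          # stale record, superseded by a later push
        if -neg_gain <= 0:
            break             # no positive gain left
        start[u] = t          # shift u to its best start time
        total += -neg_gain
        shifts += 1
        push(u)               # refresh u's own record
        for v in affected(u):
            if v != u:
                push(v)       # refresh every task whose slack changed
    return total, shifts


# Toy model: each task saves energy equal to its distance from a preferred
# start time, and every shift is assumed to affect all other tasks.
pref = {"A": 3, "B": 5}
start = {"A": 0, "B": 5}
total, shifts = optimize_system(
    pref,
    start,
    meg=lambda x, s: (abs(s[x] - pref[x]), pref[x]),
    affected=lambda u: [x for x in pref if x != u],
)
print(total, shifts, start)
```

Pushing a fresh record and skipping stale ones on pop costs O(lg N) per affected task, which matches the per-shift bound given in the text while avoiding an in-place heap update.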
6
Experimental Results
The mode transition diagrams of PA, XC, MD and BP/CP are shown in Fig. 15. Without loss of generality, we assume BP and CP are the same type of processor for simplicity, although they can be different types of processors, ASICs or DSPs. We present two case studies to compare our techniques with break-even analysis. We assume the PA must operate at the highest power level (372W) for long-range transmission. When PA is sending or receiving signals, all components on the same channel must be active simultaneously. However, the SDR system can negotiate with the remote party to set up the exact time slot for each communication instance as a way to save energy.
Fig. 16 illustrates the first case study. The PA receives a periodic signal every 400ms. The signal requires processing activities on BP (but not on CP). Both receiving and sending activities take 20ms, and the base-band processing takes 160ms. The result must be sent back before the next signal arrives. Fig. 16(a) shows the power management scheme with break-even analysis. During the four idle intervals on PA, break-even analysis can reduce the power level to 42W at the mid-power level. With Algorithm OPTIMIZE-IDLE, we are able to reach the standby mode at 0.1W during the first and the third intervals within the 160ms duration, as shown in Fig. 16(b). We can also completely shut down PA
[Figure 16 graphic not reproduced. Each panel is a timeline from 0 to 800 with rows for PA, XC, MD and BP, annotated with the PA power mode in each idle interval.]
(a) energy reduction with break-even analysis: 160.6J (total), 44.6J (active), 112.8J (idle)
(b) energy reduction with idle transition optimization: 114.8J (total), 44.6J (active), 70.2J (idle)
(c) energy reduction with global optimization: 80.8J (total), 44.6J (active), 36.2J (idle)
Figure 16: Experimental results with the first case study.
[Figure 17 graphic not reproduced. Each panel is a timeline from 0 to 6000 with rows for PA, XC, MD, BP and CP, showing beacons B1–B4, request R1, and tasks IP1 and IP0 (fixed).]
(a) greedily optimizing for PA: 690.6J (total), 580.3J (active), 70.3J (idle)
(b) greedily optimizing for CP: 630.9J (total), 580.3J (active), 50.6J (idle)
(c) global optimization: 630.7J (total), 580.3J (active), 50.4J (idle)
Figure 17: Experimental results with the second case study.
in the second and the fourth intervals with 200ms idle time. As a result, the idle energy is reduced from 112.8J to 70.2J, a 38% improvement, which translates to 29% overall energy savings on the channel. In Fig. 16(c), the system-level optimization Algorithm OPTIMIZE-SYSTEM further cuts the idle energy to 36.2J, a 68% reduction compared to (a), and reduces total energy by 50%.
Fig. 17 illustrates the second scenario. The PA receives a beacon signal once every two seconds from the remote party in fixed time slots. No acknowledgment signal is required. However, after the second beacon signal, an image processing request R1 will come in. The image processing activity IP1 must run on CP, which is idle most of the time except at time 5s, when a pre-allocated task IP0 must wake up CP and keep it active for 0.5s. The power manager can decide the arrival time of R1 ahead of time. If no special treatment is taken, R1 will arrive after B2, and the idle periods can be optimized by Algorithm OPTIMIZE-IDLE, as shown in Fig. 17(a). However, this schedule requires turning on CP at least two seconds before IP1 starts. There is not enough time to turn CP off before IP0 arrives. As a result, CP must stay idle at 30W for a few seconds. An alternative solution is shown in Fig. 17(b). Since CP is slow in mode transition, its transition energy surpasses PA’s, although PA’s power level is 12× higher than CP’s. We therefore greedily optimize for the transition energy of CP by scheduling task IP1 together with the pre-allocated task IP0. This measure reduces the idle energy from 70.3J to 50.6J. This solution can still be improved by Algorithm OPTIMIZE-SYSTEM. In Fig. 17(c), R1 is shifted next to B3. This move reduces transition energy on PA, but it forces CP to remain idle for an extra period of time. Such a trade-off improves the energy savings by an extra 0.2J.
Although 0.2J is trivial compared to the overall energy cost in this case study, a similar scenario in other applications could have a different impact. The interesting issue with this example is that greedily optimizing for the major power or energy consumers often contradicts the goal of system-level optimization.
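The percentages quoted for the first case study follow directly from the idle and total energies reported in the panel titles of Fig. 16. As a quick check (the helper name `reduction` is ours; the joule values are the ones reported in the figures):

```python
def reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Figure 16 (first case study), energies in joules.
idle_a, idle_b, idle_c = 112.8, 70.2, 36.2
total_a, total_b, total_c = 160.6, 114.8, 80.8

case1 = {
    "idle, (b) vs (a)": reduction(idle_a, idle_b),
    "total, (b) vs (a)": reduction(total_a, total_b),
    "idle, (c) vs (a)": reduction(idle_a, idle_c),
    "total, (c) vs (a)": reduction(total_a, total_c),
}
for label, frac in case1.items():
    print(f"{label}: {100 * frac:.1f}%")

# Figure 17 (second case study): idle energy in panels (a), (b), (c).
extra_gain = round(50.6 - 50.4, 1)   # (c) over (b), in joules
print(extra_gain)
```

Rounded to whole percentages, these come out to the 38%, 29%, 68% and 50% figures given in the text, and the extra saving of global optimization over the CP-greedy schedule is 0.2J.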
7
Conclusion
This paper presents a new approach to optimizing the mode-transition sequence for systems with rich power management options while operating under a wide dynamic range. We propose two core algorithms, one for deriving the optimal mode transitions for each component locally, and the other for extending it to modeling and optimizing the timing behaviors of all idle intervals globally. We demonstrated the effectiveness of our algorithms on a commercial software-defined radio system where idle periods dominate the total energy cost. Our combined algorithms are capable of achieving additional energy savings that are not possible with existing techniques.
References
[1] Gang Quan and Xiaobo (Sharon) Hu. Energy efficient fixed-priority scheduling for real-time systems on variable voltage processors. In Proc. Design Automation Conference, pages 828–835, June 2001.
[2] Jiong Luo and N. K. Jha. Power-conscious joint scheduling of periodic task graphs and aperiodic tasks in distributed real-time embedded systems. In Proc. International Conference on Computer-Aided Design, pages 357–364, November 2000.
[3] L. Shang, L.-S. Peh, and N. K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In Proc. International Symposium on High-Performance Computer Architecture, pages 91–102, February 2003.
[4] L. Benini, A. Bogliolo, and G. De Micheli. A survey of design techniques for system-level dynamic power management. IEEE Transactions on VLSI Systems, 8(3):299–316, June 2000.
[5] Qinru Qiu, Qing Wu, and Massoud Pedram. Dynamic power management in a mobile multimedia system with guaranteed quality-of-service. In Proc. Design Automation Conference, pages 834–839, June 2001.
[6] Yung-Hsiang Lu and G. De Micheli. Adaptive hard disk power management on personal computers. In R. J. Lomax and P. Mazumder, editors, Proceedings Ninth Great Lakes Symposium on VLSI, pages 50–53, 1999.
[7] Eui-Young Chung, Luca Benini, and Giovanni De Micheli. Dynamic power management using adaptive learning tree. In Proc. International Conference on Computer-Aided Design, pages 274–279, 1999.
[8] Dexin Li, Pai H. Chou, and Nader Bagherzadeh. Mode selection and mode-dependency modeling for power-aware embedded systems. In Proc. Asian and South Pacific Design Automation Conference, pages 697–704, January 2002.
[9] Dexin Li, Qiang Xie, and Pai H. Chou. Scalable modeling and optimization of mode transitions based on decoupled power management architecture. In Proc. Design Automation Conference, pages 119–124, June 2003.
[10] Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh, and Fadi Kurdahi. Power-aware scheduling under timing constraints and slack analysis for mission-critical embedded systems. Technical Report IMPACCT-01-03-01, University of California, Irvine, March 2001.