Optimizing Mode Transition Sequences in Idle Intervals for Component-Level and System-Level Energy Minimization ∗

Jinfeng Liu, Pai H. Chou
Center for Embedded Computer Systems, University of California, Irvine, CA 92697-2625
{jinfengl, phchou}@uci.edu

Abstract

New embedded systems offer rich power management features in the form of multiple operational and non-operational power modes. While they offer mechanisms for better energy efficiency, they also complicate power management decisions in the presence of real-time constraints. Traditional dynamic power management techniques based on localized break-even-time analysis with simple on/off power controls often yield suboptimal if not incorrect results globally. To address these problems, this paper presents two core algorithms for reducing idle energy consumption at the component level and the system level. The first algorithm discovers the optimal sequence of mode transitions over multiple power modes under timing constraints. It assists the second algorithm, which performs a global search to aggressively explore system-wide energy savings by correctly interpreting the constraints across all subsystems. Experimental results show that in an embedded radio system where idle energy cost matches or exceeds the active energy consumption, our technique can further reduce the idle energy by 50–70%, which translates into 30–50% of overall system energy compared to existing techniques.

1 Introduction

As power becomes a primary concern for embedded systems, components are being built to support a richer set of power management options in the form of more power modes. Previously, a device might have had only on, off, and possibly standby modes; nowadays, a device can have many more operational and non-operational modes. By running the device at a different voltage/frequency combination or communication data rate, the system can save energy by better adapting its performance to the workload.

In contrast to earlier dynamic power management (DPM) techniques that handle only simple on/off modes, recent DPM techniques have been extended to handle more power modes. They model multiple power modes as a state machine. To change power mode, the power manager follows the state transition arcs, going through a number of intermediate modes if necessary.

Unfortunately, today's power management techniques have not been able to take full advantage of such a rich set of options, for several reasons. First, they do not handle hard timing constraints. Second, the decision is based on a simple break-even time, possibly steered by a stochastic model constructed from profiling, but these are heuristics that have not been rigorously proven optimal. Third, these DPM techniques manage peripherals as independent units based on localized views, without considering dependencies among them.

This paper proposes a systematic treatment of the problem of selecting power mode sequences to minimize energy at the system level under timing constraints. Previous approaches either perform direct mode changes without sequencing, or they do mode sequencing by functional necessity without exploiting power optimization opportunities. Moreover, their decisions are based on simple break-even time without timing constraints. In this paper, our approach is two-fold: we first present a technique that optimizes the transition sequence for an individual device; second, we generalize this to the system level by simultaneously optimizing for multiple devices. Experimental results show that our algorithm can discover new mode transition sequences that are otherwise too intricate for designers to discover manually. Our sequencing approach also has the effect of smoothing out the power consumption curve, making the system friendlier to batteries. Equally important, the low runtime complexity of our algorithms makes them amenable to online power management.

∗ This research was sponsored in part by DARPA contract F33615-00-11719 and by National Science Foundation under grant CCR-0205712.

2 Related Work

Dynamic voltage scaling (DVS) and dynamic power management (DPM) are two classes of techniques commonly used in reducing energy consumption in embedded systems. DVS works by lowering the voltage and the frequency such that the same amount of work can be performed at not only a lower power level but also less total energy despite the longer execution time, thanks to the quadratic scaling of power. Many variations of DVS have been proposed on processors [1, 2] and power/datarate scaling on communication links [3]. On the other hand, DPM works by shutting down idle subsystems [4, 5]. The shutdown and power-up decision can be based on fixed idle times (the break-even time), or an adaptive timeout scheme based on learning, profiling or prediction [6, 7]. While many low-power techniques are processor-oriented and mainly focus on active tasks (e.g., scheduling), the focus of this work is on the system level, where the processor is one of the components, and communication and I/O could cost much energy, even when idle. DPM techniques target these subsystems, but the time-out or break-even-time approaches are not suitable for systems whose components have multiple operational and nonoperational modes. Moreover, many such power managers cannot handle system-level constraints when they manage the system as a set of individual devices. We observe that the shutdown decisions, including the shutdown time, duration, and sequences on different components, are mutually dependent and can affect each other’s power management opportunities. If not carefully coordinated, localized power management decisions can lead to lower energy efficiency globally. To address this problem, a few recent studies model multiple power modes at the system level. 
Li et al. propose a model of dependencies between power modes across subsystems [8] and attempt to optimize the mode transition sequence with shortest-path algorithms to reduce energy [9], though our study shows that such a solution is not optimal.

Figure 1: The block diagram of SDR. Channels 1 to N each comprise an antenna/radio with power amplifier (PA), a transceiver (XC), a modem (MD), and a base-band processor (BP); the block diagram also shows the central processors (CP) and the power manager (PM).

Figure 2: Multiple mode transitions during idle intervals (axes: power vs. time). (a) Break-even with optimal two mode transitions. (b) Multiple mode transitions with less time and energy. (c) Multiple mode transitions with more time but less energy.

3 Motivating Example

Our motivating example is a software defined radio (SDR). It is a complex embedded system consisting of radio antennae, power amplifiers (PA), A/D and D/A converters, signal processors, data communication links, and general-purpose processors running real-time operating systems. Its block diagram is shown in Fig. 1. This SDR is organized into multiple channels, each of which contains a front-end radio antenna with a power amplifier (PA) to send and receive RF signals. The analog signal is strengthened by a transceiver (XC) before a modem (MD) converts it to base-band digital signals. Each channel is equipped with a dedicated base-band processor (BP). Each BP is networked with a pool of globally shared central processors (CP) that are equipped with more signal processing capabilities, including voice/image/video processing, encryption, and compression. The power manager (PM) is a separate processor that controls the power levels of all components by issuing commands to set their power modes.

The activities on each channel contain a mix of periodic and sporadic messages. The external messages can either arrive at fixed times or have their arrival times negotiated. Not all messages need to be handled by the central processors. For example, each channel regularly performs handshaking with remote parties, and this can be handled completely by the base-band processor. If a message requires additional processing, e.g., encryption or voice, then one of the central processors will process it and send the result back to the remote party.

Many components have multiple power modes. For example, the power amplifier PA can operate at 372W, 52W, and 9W for sending/receiving signals in a range of different signal strengths and signal/noise ratios. In addition, it can be put into a low-power standby mode. The transceiver and modem both support on/off/standby modes. The processors support on/off/sleep modes.
Reducing Idle Energy with Multiple Modes

Among the components in the SDR, the power amplifier presents the most power management opportunities because of its high power consumption and its multiple power levels. Because message arrivals are sparse, it is possible either to turn off the PA or to set it to standby mode to reduce energy consumption. Traditional DPM techniques would perform break-even time analysis. The break-even time is the duration that yields the same energy consumption whether the device remains in the same operational power mode the entire time or changes to a lower-power mode temporarily and back to operational. If the idle interval is longer than the break-even time, then changing to low-power mode and back can reduce energy. Break-even analysis usually yields the greatest energy savings on idle intervals when the component has only one operational mode. However, many new electronic devices are offering many more modes. For example, a hard drive has over a dozen modes; network interface cards support multiple bit rates and therefore multiple power levels.

Multiple modes offer new power management opportunities. Fig. 2(a) shows a sequence that first puts a component into low-power mode $m_{low}$ after an active task finishes execution in operational mode $m_i$; the component remains in mode $m_{low}$ for some time before reverting back to $m_i$ just before the next active task starts execution. Break-even analysis would consider this solution optimal, but the actual optimal solution is another mode transition sequence $m_i \to m_j \to m_k \to m_{low}$. The solution in Fig. 2(b) consumes less time and energy, while the one in Fig. 2(c) takes more time but consumes less energy. Moreover, more options are available when reverting from $m_{low}$ back to $m_i$, because the return path does not have to be the exact reversal of the first sequence $m_i \to m_j \to m_k \to m_{low}$.

Many opportunities are available with multiple modes, but they may not be obvious. In Fig. 3(a) the component has two low-power modes $m_{low1}$, $m_{low2}$. By break-even analysis we can reach only mode $m_{low1}$ but not $m_{low2}$ (Fig. 3(a1)), either because the duration is not long enough for two transitions $m_i \to m_{low2} \to m_i$, or because the duration is shorter than the break-even time. With multiple modes, an alternative path may reach the lower-power mode $m_{low2}$ with reduced energy (Fig. 3(a2)). Perhaps even more counterintuitive is the choice between different low-power modes. Suppose break-even analysis allows changing to the lowest power mode $m_{low2}$ (Fig. 3(b1)). However, a better solution might be a sequence that reaches the higher power level $m_{low1}$ instead (Fig. 3(b2)).

Figure 3: Multiple transitions to reach different low-power states (axes: power vs. time). (a1) Break-even can only reach $m_{low1}$ but not $m_{low2}$. (a2) Multiple mode transitions can reach $m_{low2}$ with less energy. (b1) Break-even can reach $m_{low2}$ as optimal sequence. (b2) Multiple mode transitions stop at $m_{low1}$ (higher power) with less energy.

Our third example shows that multiple-mode transition sequences are more sensitive to time. In Fig. 4(a), we assume the sequence is optimal for the idle interval with delay $T$. However, if the idle interval has a delay $T_1 = T + \Delta T$ as in Fig. 4(b), then the optimal sequence for Fig. 4(a) is no longer optimal. The extra time $\Delta T$ enables an alternative path with lower energy but a longer delay, which is not possible in (a). We should anticipate that the optimal sequence may change again with another delay $T_2$, possibly with a different target low-power mode. This is in contrast to break-even analysis, where the optimal sequence is always the same for all $T$ greater than the break-even time.

Figure 4: Optimal mode transition sequences vary with duration (axes: power vs. time). (a) Optimal sequence with the duration of idle interval $T$. (b) Optimal sequence or low-power mode may change with a duration of idle interval $T_1 = T + \Delta T$.

Another new feature with multiple modes is that the starting mode can be different from the destination mode, making it an $O(M^2)$ problem given a total of $M$ modes, and the optimal solution of each sub-problem varies with the delay parameter $T$. As a result, the analysis of multiple modes and multiple-mode transition sequences subsumes break-even analysis, which is a special case with two different transition sequences.

Previous work has modeled modes and transitions as a directed graph, and it was suggested that optimal mode sequencing be solved as a shortest energy or time path problem in this graph. However, the optimality of a multiple-mode transition sequence is related to neither the shortest energy path nor the shortest delay path. Even the partial sequences $m_i \to m_j \to m_k \to m_{low}$ and a reverse path $m_{low} \to m_p \to m_q \to m_i$ are orthogonal to the shortest delay/energy paths. This property will be discussed in the next section.

Finally, in the general case, the delay and energy costs of a mode transition towards a low-power mode do not scale monotonically with the power level of the starting mode. For example, the assumption "turning off from a high power level is more difficult" may not be true, depending on the specific design constraints of the device.

Correlated Idle Intervals on the Entire System

The SDR example highlights another limitation of previous techniques, including break-even analysis: they do not handle correlated idle intervals. One common strategy is to combine multiple active intervals into one to reduce the fragmentation of idle intervals and the number of mode changes. Processor caches and hard disk buffers actually fall into this category, although they were originally aimed at performance. Such techniques are applicable to independent or loosely constrained subsystems. In reality, the lengths of the idle intervals of all components are closely connected by system-level constraints, including dependency, deadline, and resource sharing in an entire system. Therefore, we must go one step further by taking all idle intervals, as well as all active intervals, into a system-level optimization framework.

Fig. 5 illustrates one scenario where channel 1 of the SDR receives a request R1 for image processing on CP. Assuming fixed active intervals have already been allocated for regular handshaking signals, R1 can be scheduled to arrive at any time within a certain window, as long as the channel is free (i.e., in an idle interval). However, once R1 is received, CP must compute the result as soon as possible and send it back. The power manager's job is to pick the right arrival time of the request to reduce the energy consumption of the system. During the window of R1's possible arrival times, a time slot has already been allocated to message B1. By scheduling R1 right next to B1, mode change overhead on both the power amplifier PA and other components on channel 1 can be reduced. However, since the image processing task IP1 must start immediately upon receiving R1, this requires additional activities to change the power modes on CP. CP consumes much less power than PA (30W vs. 372W), but it takes much longer to execute the power-down and power-up sequences than PA, which is an analog device (2–3 seconds vs. 100 ms). In fact, the extra energy cost on CP exceeds the total energy savings on the biggest power consumer PA plus all other components on channel 1. A better approach is to allow IP1 to be close enough to any pre-allocated activities on CP (e.g., another task IP0) such that the overhead to power up CP is minimized, although it involves additional overhead on PA by not scheduling R1 close to B1.

Figure 5: Energy trade-off on idle intervals across subsystems. (a) Extra overhead due to system-level constraints across PA and CP. (b) Reducing overhead on CP requires extra overhead on PA. Annotations in the figure: "CP cannot be turned on and off quickly, it must remain with full power between IP1 & IP0"; "extra overhead on PA"; "delay R1 to reduce idle energy on CP".

This may appear somewhat contrived, but this example is actually simple compared to real-life embedded applications. Many surprises could occur if system designers overlooked important system-level constraints and dependencies. In this example, targeting the biggest power consumer can actually lead to an inferior solution.
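The break-even analysis described in this section can be made concrete with a small calculation. The sketch below equates the energy of staying in the operational mode for the whole idle interval with the energy of one round trip to a low-power mode. The function name, its parameters, and the numbers in the example are illustrative assumptions, not values from the SDR.

```python
def break_even_time(p_active, p_low, d_down, e_down, d_up, e_up):
    """Return the idle duration at which a round trip to a low-power mode
    costs the same energy as staying in the operational mode.

    Staying costs     p_active * T.
    Switching costs   (e_down + e_up) + p_low * (T - (d_down + d_up)).
    All arguments are hypothetical inputs: powers in watts, delays in
    seconds, energies in joules.
    """
    if p_active <= p_low:
        return None  # the "low-power" mode never pays off
    d_total = d_down + d_up
    e_total = e_down + e_up
    t = (e_total - p_low * d_total) / (p_active - p_low)
    # The round trip is infeasible for intervals shorter than its own delay.
    return max(t, d_total)


# Illustrative numbers only: 10 W active, 2 W standby,
# 0.5 s and 15 J for each direction of the transition.
T_BE = break_even_time(10.0, 2.0, 0.5, 15.0, 0.5, 15.0)
```

With these numbers both options cost 35 J at the returned duration, so any longer idle interval favors the round trip. This two-option comparison is exactly what the multiple-mode analysis generalizes.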

4 Component-Level Energy Optimization

This section first defines our component model in terms of resources, modes, mode transitions, and mode transition sequences. We then pose the energy minimization problem as finding the optimal mode transition sequence between a pair of modes given an overall timing constraint on delay. We formulate our problem with a geometric representation and present the optimization algorithm to solve a more general problem by optimizing for all pairs of modes for all delay constraints in a single run.

4.1 Component Model

Definition 1 (Resource and Mode) An execution resource $\mathcal{M}$ consists of a set of $M$ modes $m_i$, $1 \le i \le M$. Each mode $m_i$ has an idle power level $P^0_i \ge 0$. A mode $m_i$ can be either an operational mode or a non-operational mode (such as idle, sleep, and off). The operational power may depend on the active tasks and need not equal $P^0_i$.

Definition 2 (Mode Transition) A mode transition $mt_{(i\to j)}$ refers to the procedure that changes $\mathcal{M}$'s mode from $m_i$ to $m_j$. $mt_{(i\to j)}$ has two parameters, $d_{(i\to j)} \ge 0$ and $e_{(i\to j)} \ge 0$, that specify the delay and the energy consumption of this process, respectively. Mode transitions can be represented by either a mode transition diagram or a mode transition matrix. Not all transitions $mt_{(i\to j)}$ are valid, due to various design constraints. For example, it might not be possible to directly shut off a device from a high-power mode. If a mode $m_i$ cannot be directly changed to mode $m_j$, then $d_{(i\to j)} = e_{(i\to j)} = +\infty$.

Definition 3 (Mode Transition Sequence) A mode transition sequence $MTS_{(i\to j)}$ consists of sequential mode transitions $mt_{(i\to k_1)}, mt_{(k_1\to k_2)}, \ldots, mt_{(k_{N-1}\to j)}$. The first transition starts from mode $m_i$ and the last transition finishes at mode $m_j$. There are $N - 1$ intermediate modes $m_{k_1}, m_{k_2}, \ldots, m_{k_{N-1}}$ with a total of $N$ mode transitions. Each transition is followed immediately by the next. Let $k_0 = i$, $k_N = j$. The overall delay and energy consumption of mode transition sequence $MTS_{(i\to j)}$ are

$$D_{(i\to j)} = \sum_{n=0}^{N-1} d_{(k_n\to k_{n+1})} \qquad \text{and} \qquad E_{(i\to j)} = \sum_{n=0}^{N-1} e_{(k_n\to k_{n+1})}.$$

Definition 4 (Delay-Constrained Mode Transition Sequence) A delay-constrained mode transition sequence $MTS_{(i\to j)}(T)$ is an $MTS_{(i\to j)}$ that requires the resource to start with mode $m_i$ at time 0 and to be in mode $m_j$ by time $T$. $T \ge D_{(i\to j)}$ is called the delay constraint. During a delay-constrained mode transition sequence, each transition $mt_{(k_n\to k_{n+1})}$ may follow the previous transition $mt_{(k_{n-1}\to k_n)}$ after some non-zero delay while remaining in mode $m_{k_n}$. We call all such modes $m_{k_n}$ standing modes $M_{STD}$. Without loss of generality, the ensuing text assumes only one standing mode for simplicity, although there can be multiple standing modes. We denote the single standing mode by overloading the symbol $m_s$ as if it were the $s$th mode, that is, $M_{STD} = \{m_s\}$. The energy consumption of $MTS_{(i\to j)}(T)$ is

$$E_{(i\to j)}(T) = E_{(i\to j)} + P^0_s \times (T - D_{(i\to j)}) \tag{1}$$

Note that the lowercase symbols $d_{(i\to j)}$ and $e_{(i\to j)}$ represent the delay and energy of a single mode transition $mt_{(i\to j)}$, while the uppercase symbols $D_{(i\to j)}$ and $E_{(i\to j)}$ refer to the delay and energy of a sequence of mode transitions. Finally, $D_{(i\to j)}(T)$ and $E_{(i\to j)}(T)$ denote the delay and energy of a mode transition sequence, or simply sequence for short, constrained by $T$. Sequence $MTS_{(i\to j)}$ is a special case of sequence $MTS_{(i\to j)}(T)$ with $T = D_{(i\to j)}$ and $M_{STD} = \emptyset$.

4.2 Problem Statement: Energy-Optimal Sequence

Problem 1 (Optimal Mode Transition Sequence)
Given: (a) a resource $\mathcal{M}$ with $M$ modes $m_1, m_2, \ldots, m_M$, (b) a source mode $m_i$ and a target mode $m_j$, (c) a delay constraint $T$,
Find: the optimal delay-constrained sequence $MTS_{(i\to j)}(T)$ that minimizes energy consumption $E_{(i\to j)}(T)$.

Before seeking an algorithm to solve Problem 1, we first examine a few interesting properties of this problem.

Property 1 (Min-Power Standing Mode) If $MTS_{(i\to j)}(T)$ is the optimal sequence with minimum energy $E_{(i\to j)}(T)$, then its standing mode $m_s$ must consume the least amount of idle power among all modes $m_{k_0} = m_i, m_{k_1}, m_{k_2}, \ldots, m_{k_{N-1}}, m_{k_N} = m_j$. That is,

$$P^0_s = \min_{0 \le n \le N} P^0_{k_n}$$

Property 1 can be directly derived from Equation (1). It must be noted that the shortest delay path from $m_i$ to $m_j$ (with the minimum $D_{(i\to j)}$) does not lead to the minimum energy. The shortest delay path can only decide whether a valid sequence exists: if $T < \min\{D_{(i\to j)}\}$, then no valid sequence is possible. Property 1 also indicates that the minimum energy path is not relevant. A minimum energy path is associated with the minimal $E_{(i\to j)}$. However, the second term $P^0_s \times (T - D_{(i\to j)})$ could be much more dominant; or the path may not be a valid sequence if $T < D_{(i\to j)}$. As a result, even if the shortest delay path happens to be the same as the minimum energy path, it still does not necessarily yield the minimum energy consumption.
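To illustrate Definitions 3 and 4 together with Property 1, the following sketch evaluates Equation (1) for a few candidate sequences of a made-up three-mode device. The device, its transition matrices, and the candidate list are hypothetical; the sketch only assumes that the standing mode is the lowest-power mode on the path, as Property 1 states.

```python
import math

# Hypothetical 3-mode device (illustrative numbers, not from the SDR):
# mode 0 = on, mode 1 = standby, mode 2 = off.
IDLE_POWER = [10.0, 2.0, 0.1]      # idle power per mode, in watts
DELAY = [[0.0, 0.5, 2.0],          # d(i->j), in seconds
         [0.5, 0.0, 1.0],
         [2.0, 1.0, 0.0]]
ENERGY = [[0.0, 2.5, 10.0],        # e(i->j), in joules
          [2.5, 0.0, 4.0],
          [10.0, 4.0, 0.0]]


def sequence_cost(path):
    """Return (D, E, P) for a path of mode indices: total transition delay,
    total transition energy, and the idle power of the min-power mode."""
    hops = list(zip(path, path[1:]))
    total_delay = sum(DELAY[a][b] for a, b in hops)
    total_energy = sum(ENERGY[a][b] for a, b in hops)
    standing_power = min(IDLE_POWER[m] for m in path)
    return total_delay, total_energy, standing_power


def constrained_energy(path, T):
    """Equation (1): E + P * (T - D); infinite if the path cannot fit in T."""
    D, E, P = sequence_cost(path)
    return E + P * (T - D) if T >= D else math.inf


def best_path(candidates, T):
    """Pick the feasible candidate with the least constrained energy."""
    best = min(candidates, key=lambda p: constrained_energy(p, T))
    return best if constrained_energy(best, T) < math.inf else None


# Candidate on -> ... -> on sequences for one idle interval.
CANDIDATES = [(0,), (0, 1, 0), (0, 2, 0), (0, 1, 2, 1, 0)]
```

In this toy device the path `(0,)` has both the shortest delay and the smallest transition energy, yet it is optimal only for very short intervals. For a long interval the winner is the five-mode path through standby, which also happens to reach the off mode with less delay and energy than the direct `(0, 2, 0)` path.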

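Figure 6: The geometrical representation of a sequence S. In the T-E space for (i → j), S is the straight line $E = E_S + P_S \times (T - D_S)$ with slope $P_S$, starting at the end point $(D_S, E_S)$.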

Property 2 (Non-Simple Path) A path from mode $m_i$ to mode $m_j$ is called non-simple if it contains more than one visit to a given mode $m_n$. An optimal sequence $MTS_{(i\to j)}(T)$ may contain a non-simple path; i.e., a mode $m_n$ can be visited twice if the min-power standing mode $m_s$ is visited between the two visits to $m_n$.

Property 2 indicates that the resource can be changed from the source mode $m_i$ to a low-power mode $m_{low}$ through a sequence of mode transitions that visit $m_n$. Then, another sequence will progress towards the target mode $m_j$ by visiting $m_n$ for the second time. Property 2 also implies that Problem 1 cannot be solved solely by shortest-path based algorithms that consider only simple paths.

4.3 Geometric Representation of the Solution Space

In order to assist power management decisions in a wide range of operational conditions, we must be able to provide a series of optimal solutions given different values of $T$. We observed that given two close values of delay $T$ and $T + \delta t$, the two optimal solutions are likely to contain the same (unconstrained) sequence. The only difference is that in the latter case, the resource will stay an extra $\delta t$ time in the standing mode. We now define a geometric representation of the solution space of Problem 1 with all possible delay constraints $T \in [0, +\infty)$. Let us reexamine Equation (1) in a two-dimensional T-E space for each $(i \to j)$ pair. We rewrite the equation as

$$E = E_S + P_S \times (T - D_S) \tag{2}$$

where $E_S$, $D_S$, and $P_S$ are the energy consumption, the delay, and the idle power level of the min-power standing mode of a sequence $S$, starting from mode $m_i$ to mode $m_j$ (with $(i \to j)$ omitted for brevity). $E_S$, $D_S$, and $P_S$ are all constants for a given sequence $S$. Equation (2) is represented by a straight line in T-E space, shown in Fig. 6. It starts from point $(D_S, E_S)$ toward $+\infty$ along the $T$ axis, with a slope equal to $P_S$. This straight line represents all possible values of energy consumption $E$ with delay constraint $T \ge D_S$ when a sequence $S$ is selected. When $T < D_S$, no solution exists for sequence $S$. Without ambiguity, we use the symbol $S$ to refer to both the sequence $S$ and the straight line that represents $S$ in the ensuing text.

Multiple sequences can start from mode $m_i$ and end in $m_j$ for each pair $(i \to j)$. Fig. 7 shows four cases with two sequences $S_1$ and $S_2$ under the conditions that $S_1$ and $S_2$ rule out each other. When a line $S_1$ (or one of its segments) is closer to the $T$ axis, it always produces smaller $E$ values than $S_2$, which stays atop. As a result, the corresponding sequence $S_1$ becomes the optimal solution, which is highlighted with solid line segments. From Fig. 7, it can be seen that break-even analysis is a special case with only two lines representing two sequences: $(m_i \to m_i)$ with no change of mode, or changing to a lower power mode $m_{low}$ with $(m_i \to m_{low} \to m_i)$. The break-even point is exactly the intersection of the two lines.

The scenario becomes more complex for multiple lines, as shown in Fig. 8. The optimal solutions are represented by the line segments that are closest to the $T$ axis $\forall T \in [0, +\infty)$. By our convention, we refer to the set of these line segments as the lower outline.

[Figure 7: four cases with two sequences in T-E space, $S_1$: $E = E_{S_1} + P_{S_1} \times (T - D_{S_1})$ and $S_2$: $E = E_{S_2} + P_{S_2} \times (T - D_{S_2})$. The conditions labeling each case are truncated in the source.]

Algorithm UPDATE-OUTLINE(L[1 : |L|], S)
 1  [line lost in the source; per the description in the text, if L is empty then S becomes the only element of L and the algorithm terminates] (0∗)
 2  if T_0 > D_S then insert ⟨D_S, T_0, S⟩ into the beginning of L; (1∗)
 3  for ⟨T_i, T_{i+1}, S_i⟩ ∈ L do
 4      if S_i ≠ S and E_S + P_S × (T_i − D_S) ≥ E_{S_i} + P_{S_i} × (T_i − D_{S_i}) and P_S ≥ P_{S_i} then break; (2∗)
 5      if S_i ≠ S and E_S + P_S × (T_i − D_S) ≥ E_{S_i} + P_{S_i} × (T_i − D_{S_i}) and E_S + P_S × (T_{i+1} − D_S) ≥ E_{S_i} + P_{S_i} × (T_{i+1} − D_{S_i}) then continue; (3∗)
 6      if S_i ≠ S and E_S + P_S × (T_i − D_S) ≤ E_{S_i} + P_{S_i} × (T_i − D_{S_i}) and E_S + P_S × (T_{i+1} − D_S) ≤ E_{S_i} + P_{S_i} × (T_{i+1} − D_{S_i}) then replace ⟨T_i, T_{i+1}, S_i⟩ with ⟨T_i, T_{i+1}, S⟩ in L; (4∗)
 7      else do
 8          D′ := the intersection of line S and S_i within interval [T_i, T_{i+1});
 9          if E_S + P_S × (T_i − D_S) < E_{S_i} + P_{S_i} × (T_i − D_{S_i}) then (5∗)
10              modify ⟨T_i, T_{i+1}, S_i⟩ with ⟨D′, T_{i+1}, S_i⟩ in L;
11              insert ⟨T_i, D′, S⟩ before ⟨D′, T_{i+1}, S_i⟩ in L;
12          else (6∗)
13              modify ⟨T_i, T_{i+1}, S_i⟩ with ⟨T_i, D′, S_i⟩ in L;
14              insert ⟨D′, T_{i+1}, S⟩ after ⟨T_i, D′, S_i⟩ in L;
15  for ⟨T_i, T_{i+1}, S_i⟩ ∈ L[1 : |L| − 1] do (7∗)
16      if S = S_i = S_{i+1} = S_{i+2} = ... = S_{i+k} then merge ⟨T_i, T_{i+1}, S_i⟩, ⟨T_{i+1}, T_{i+2}, S_{i+1}⟩, ..., ⟨T_{i+k}, T_{i+k}, S_{i+k}⟩ into one item ⟨T_i, T_{i+k}, S⟩ in L;
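The lower outline can be pictured with a brute-force sketch. It is not the incremental UPDATE-OUTLINE procedure: it simply collects every candidate breakpoint (each start point and each pairwise intersection) and picks the cheapest line in between. The `Line` type, the function names, and the sample lines are assumptions made for illustration.

```python
import math
from typing import NamedTuple


class Line(NamedTuple):
    """A sequence S seen as the ray E = E_S + P_S * (T - D_S) for T >= D_S."""
    name: str
    D: float  # delay of the unconstrained sequence
    E: float  # energy of the unconstrained sequence
    P: float  # idle power of the min-power standing mode


def energy(s, T):
    """Energy of sequence s under delay constraint T (infinite if infeasible)."""
    return s.E + s.P * (T - s.D) if T >= s.D else math.inf


def lower_outline(lines):
    """Return [(T_start, T_end, name), ...] describing the lower outline."""
    points = {s.D for s in lines}
    for a in lines:
        for b in lines:
            if a.P != b.P:
                # Solve a.E + a.P*(T - a.D) == b.E + b.P*(T - b.D) for T.
                t = ((b.E - b.P * b.D) - (a.E - a.P * a.D)) / (a.P - b.P)
                if t > max(a.D, b.D):  # both rays must exist at t
                    points.add(t)
    cuts = sorted(points) + [math.inf]
    outline = []
    for lo, hi in zip(cuts, cuts[1:]):
        # No ordering change happens strictly inside (lo, hi).
        probe = lo + 1.0 if hi == math.inf else (lo + hi) / 2
        best = min(lines, key=lambda s: energy(s, probe))
        if outline and outline[-1][2] == best.name:
            outline[-1] = (outline[-1][0], hi, best.name)  # merge neighbors
        else:
            outline.append((lo, hi, best.name))
    return outline


# Illustrative lines only: stay in the active mode, a shallow round trip,
# and a deep round trip.
LINES = [Line("stay", 0.0, 0.0, 10.0),
         Line("shallow", 1.0, 5.0, 2.0),
         Line("deep", 4.0, 20.0, 0.1)]
OUTLINE = lower_outline(LINES)
```

For these three lines the outline has three segments: the no-change sequence until the shallow round trip becomes feasible, the shallow round trip after that, and the deep round trip beyond the point where the two round-trip lines cross. With only the first two lines, the single breakpoint plays the role of the break-even time.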

Revised Problem Statement and Algorithm

Problem 2 (Optimal Sequences for All Source-Target Modes)
Given: (a) a resource $\mathcal{M}$ with $M$ modes $m_1, m_2, \ldots, m_M$, (b) any pair of modes $m_i$ and $m_j$, $\forall\, 1 \le i, j \le M$, (c) any delay constraint $T \in [0, +\infty)$,
Find: all optimal sequences $MTS_{(i\to j)}(T)$ defined by Problem 1.

Now we construct an algorithm to solve the more general Problem 2. We first introduce a utility algorithm that updates the lower outline $L$ when adding a new line $S$. The set of lines in the lower outline is $L : \{\langle T_i, T_{i+1}, S_i \rangle\}$, $i = 0, 1, \ldots, |L| - 1$, $T_{|L|} = +\infty$. $[T_i, T_{i+1})$ is the interval of each line segment, and $S_i$ is the optimal sequence for the interval $[T_i, T_{i+1})$. $S_i$'s parameters are $E_{S_i}$, $D_{S_i}$, $P_{S_i}$. $L$ is sorted by the start of intervals $T_i$. With a new line characterized by $S : \langle E_S, D_S, P_S \rangle$, the lower outline $L$ is updated by Algorithm UPDATE-OUTLINE.

If $L$ is empty, $S$ becomes the only element in $L$ (0∗, line 1) and the algorithm terminates. Otherwise, if $S$ provides a smaller beginning time $D_S$, $S$ is inserted as the first element of $L$ in order to keep $L$ sorted (1∗, line 2). Then, the full list $L$ is examined by the following cases. For each element $\langle T_i, T_{i+1}, S_i \rangle$ in $L$, if $S$ is completely ruled out by $S_i$ for $T \ge T_i$, it is not necessary to check the remaining items in $L$ (2∗, line 4). However, if $S_i$ rules out $S$ only in interval $[T_i, T_{i+1})$, $S$ may still be optimal in another interval, and the algorithm continues with the next interval (3∗, line 5). On the other hand, if $S$ rules out $S_i$ in interval $[T_i, T_{i+1})$, then $S$ becomes the optimal solution in this interval (4∗, line 6). In all cases other than 2∗, 3∗, and 4∗, $S$ intersects $S_i$ at $D' \in (T_i, T_{i+1})$. Then, the algorithm modifies $L$ and inserts a new element accordingly in cases 5∗ (lines 9–11) and 6∗ (lines 12–14). Finally, if $S$ replaces adjacent elements in $L$, they are merged into a single element (7∗, lines 15–16). Fig. 11 shows the scenarios of cases 0∗–7∗.

Algorithm COMPUTE-OUTLINE(d[1 : N → 1 : N], e[1 : N → 1 : N])
    for i := 1 to N do
        for j := 1 to N do
            L[i → j] := ∅;
    S := [1, 1];
    while S ≠ ∅ do
        i := S[1], j := S[|S|];
        if S is valid (Properties 1 and 2) then
            UPDATE-OUTLINE(L[i → j], S); append 1 to the end of S;
        else if j ≠ N then S[|S|] = j + 1;
        else
            while |S| ≠ 0 and S[|S|] = N do remove last element of S;
            if |S| = 1 then S[1] = i + 1, append 1 to the end of S;
    return L[1 : N → 1 : N];

Figure 10: Algorithm COMPUTE-OUTLINE.

Based on Algorithm UPDATE-OUTLINE, the global optimization algorithm to solve Problem 2 is Algorithm COMPUTE-OUTLINE. It computes all lower outlines $L[i \to j]$ by updating all valid paths between $(i \to j)$. It appears to be expensive since it enumerates non-simple paths between all pairs of $(i \to j)$. However, based on Properties 1 and 2, many invalid paths are eliminated immediately. Algorithm UPDATE-OUTLINE can also efficiently rule out any "bad" sequences that result in higher energy consumption. In addition, it computes the entire matrix $L$ for all pairs in a single run, instead of optimizing separately for each of the $\Theta(M^2)$ pairs of $(i \to j)$. Once it completes, the result can be permanently saved into a lookup table that contains all the optimal sequences under all conditions for a given resource.

Note that given the lower outline $L[i \to j]$ with any delay constraint $T$, it normally takes $O(\log |L|)$ time to find the optimal sequence by searching the sorted list $L$ in the lookup tables. An alternative is to limit $T$ to integer values and use discrete $T$ values as indices to the lookup table. Searching the optimal mode transition sequence table with $T$ then takes only $\Theta(1)$ time. This is illustrated in Fig. 12. From now on we assume that $T$ is an integer variable, and the optimal sequence table for a resource $\mathcal{M}$ is $S^0_{\mathcal{M}(i\to j)}[0 : \max(T)]$.
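The two lookup strategies described above can be sketched as follows. The outline is assumed to be a list of `(T_start, T_end, name)` tuples sorted by `T_start`; that layout, the helper names, and the sample outline are illustrative choices rather than the paper's data structures.

```python
import bisect
import math


def make_lookup(outline):
    """Build an O(log |L|) lookup over a sorted lower outline."""
    starts = [lo for lo, _hi, _name in outline]

    def lookup(T):
        # Index of the last segment whose start is <= T.
        k = bisect.bisect_right(starts, T) - 1
        return outline[k][2] if k >= 0 else None  # None: no feasible sequence

    return lookup


def build_table(lookup, t_max):
    """Precompute the answer for every integer T in [0, t_max] so that a
    later query is a plain list index, i.e. constant time."""
    return [lookup(T) for T in range(t_max + 1)]


# Hypothetical outline for one (i -> j) pair.
SAMPLE_OUTLINE = [(0.0, 1.0, "stay"),
                  (1.0, 8.0, "shallow"),
                  (8.0, math.inf, "deep")]
LOOKUP = make_lookup(SAMPLE_OUTLINE)
TABLE = build_table(LOOKUP, 10)
```

The table trades memory proportional to max(T) for constant-time queries, which suits an online power manager that consults it on every idle interval.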

Figure 11: Geometric illustration to Algorithm UPDATE-OUTLINE. Each panel shows L before and after the update in T-E space: (0) S is the only element in L; (1) insert S in the front of L; (2) S is ruled out completely, break; (3) S is ruled out only in $[T_i, T_{i+1}]$, continue; (4) S replaces $S_i$ in $[T_i, T_{i+1}]$; (5) S intersects with $S_i$, update and insert S as a new entry; (6) S intersects with $S_i$, update and insert $S_i$ as a new entry; (7) merge S in adjacent entries.

[Figure 12: energy E versus T, with sequence labels S1–S9 and breakpoints T0–T4. Panel (a) before: search for the interval where Ti …]

5.1

[…] > 0. A larger value of EG(x,t) is preferable for more aggressive energy savings.

Definition 6 (Maximum Energy Gain) MEG(x), the maximum energy gain of a task x, is the maximum value of all energy gains:

MEG(x) = \max_{t \in \Delta(x)} EG(x,t)
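Definition 6 can be illustrated with a short sketch. It assumes that the slack ∆(x) is a finite set of integer start times and that EG(x,t) is available as a function of t. The helper name `max_energy_gain` and the sample gain values are hypothetical.

```python
def max_energy_gain(energy_gain, slack):
    """Return (MEG, t_best): the largest energy gain over the slack and
    a start time that achieves it.

    energy_gain: callable t -> EG(x, t) for one fixed task x (assumed given)
    slack: iterable of candidate start times, standing in for ∆(x)
    """
    candidates = list(slack)
    if not candidates:
        raise ValueError("slack must contain at least one start time")
    t_best = max(candidates, key=energy_gain)
    return energy_gain(t_best), t_best


# Hypothetical energy gains (in joules) for start times 10..13.
gains = {10: 0.0, 11: 1.5, 12: 4.0, 13: 2.5}
meg, t_best = max_energy_gain(gains.get, range(10, 14))
print(meg, t_best)
```

With lookup tables for the idle-interval energies, each call to `energy_gain` would be a constant-time table access, so the scan is linear in the size of the slack.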

If MEG(x) is achieved with a start time tm, that is, MEG(x) = EG(x, tm), we denote tm as tMEG(x). For maximum energy savings with a single task move, we pick a task u with the maximum MEG(u) among all tasks. Then, we shift task u to tMEG(u) to achieve this level of maximum savings. At this point, all affected tasks v must update their slacks ∆(v) (Property 3), as well as their energy gain functions EG(v,t) and MEG(v). According to Property 3, the change is limited to a small set of tasks. For other tasks, the slacks and energy gain functions remain the same. Now, we can pick another task u′ with the maximum MEG(u′) and repeat this process until no positive energy gain is possible. This forms the basis of our system-level energy optimization algorithm.

5.3

System-Level Optimization Algorithm

[Figure 14 (fragment): OPTIMIZE-IDLE({M}, {S′M(i→j)[0 : max(T)]});]

[Figure 15: The mode transition graphs (tables) in SDR, for PA (modes HIGH, MID, LOW, STANDBY, OFF), XC, MD and BP/CP. Each mode is labeled with its power level and each transition arc with a pair of numbers.]

Algorithm OPTIMIZE-SYSTEM first calls OPTIMIZE-IDLE to optimize all existing idle intervals. Then, it computes slack and energy gains for all tasks and constructs a heap-based priority queue Q in which each entry contains a record ⟨x, tMEG(x)⟩, with MEG(x) as the key (lines 1–6). We find the record ⟨u, tMEG(u)⟩ with the maximum key value MEG(u) as the energy gain (line 1). This leads to the maximum energy savings by shifting u to tMEG(u) (lines 8–11). Then, MEG(u) and tMEG(u) are updated (line 12). Q must also update the record ⟨u, tMEG(u)⟩ and adjust itself as a heap (line 13). For any task v that is affected by shifting u (determined by Property 3), v’s slack ∆(v), energy gain EG(v,t), MEG(v) and tMEG(v) must be updated (lines 15–17). As a result, Q must also update its records and adjust itself accordingly (lines 15–17). This process continues until the maximum energy gain < 0.

The runtime complexity of the algorithm is analyzed as follows. The execution time of each shift is decided by the inner loop (lines 14–18), within which the update to ∆(v) can be very efficient (details in [10]). As a result, EG(v,t), MEG(v) and tMEG(v) can be quickly updated by lookup tables. We estimate that the update to MEG(v) and tMEG(v) takes O(1) time, and it takes O(lg N) time to adjust the heap Q. Therefore, each shift takes O(|v| lg N) time, where |v| is the number of tasks in the set of all affected tasks v. Usually |v| ≪ N.

Note that Fig. 14 only sketches the concept of Algorithm OPTIMIZE-SYSTEM. In reality, the outer loop may still continue if no savings seem available (GAIN ≤ 0), because positive gains may appear after a few shifts. To proceed when GAIN ≤ 0, the algorithm must take additional measures to avoid oscillation, which refers to cases where one or more tasks are always selected to be shifted back and forth repeatedly. We evaluated a few effective alternative heuristics in practice.
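The greedy loop described above can be sketched as follows. This is an illustration under stated assumptions, not the listing of Fig. 14: the callbacks `meg`, `shift` and `affected` are hypothetical stand-ins for the MEG lookup, the task move, and Property 3, and the toy cost table is invented. Python's `heapq` has no in-place key update, so the sketch substitutes lazy invalidation with version counters for the heap adjustment described in the text. It stops at the first non-positive gain and does not model the anti-oscillation heuristics.

```python
import heapq


def optimize_system(tasks, meg, shift, affected):
    """Greedy task shifting driven by a max-priority queue keyed on MEG.

    meg(x) -> (gain, t): current MEG(x) and a start time achieving it
    shift(x, t): move task x to start time t
    affected(x): tasks whose slack changes when x moves (assumed given)
    Returns the total energy gain collected.
    """
    version = {x: 0 for x in tasks}
    heap = []

    def push(x):
        gain, t = meg(x)
        # heapq is a min-heap, so the gain is negated to pop the maximum.
        heapq.heappush(heap, (-gain, x, t, version[x]))

    for x in tasks:
        push(x)

    total = 0.0
    while heap:
        neg_gain, u, t, ver = heapq.heappop(heap)
        if ver != version[u]:
            continue  # stale record superseded by a later update
        if -neg_gain <= 0:
            break  # no positive energy gain remains
        shift(u, t)
        total += -neg_gain
        # Refresh the records of u and of every task affected by the move.
        for v in {u} | set(affected(u)):
            version[v] += 1
            push(v)
    return total


# Toy model (hypothetical): energy cost of each task per start time.
start = {"a": 0, "b": 4}
cost = {"a": {0: 5.0, 1: 3.0, 2: 4.0}, "b": {4: 2.0, 5: 2.5}}
neighbours = {"a": ["b"], "b": []}


def meg(x):
    t = min(cost[x], key=cost[x].get)
    return cost[x][start[x]] - cost[x][t], t


def shift(x, t):
    start[x] = t


saved = optimize_system(list(start), meg, shift, neighbours.get)
print(saved, start)
```

Each shift pushes one fresh record per affected task at O(lg N) per push, which matches the O(|v| lg N) cost per shift estimated above, at the price of leaving stale records in the heap until they are popped.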

6

Experimental Results

The mode transition diagrams of PA, XC, MD and BP/CP are shown in Fig. 15. Without loss of generality, we assume BP and CP are the same type of processor for simplicity, although they can be different types of processors, ASICs or DSPs. We present two case studies to compare our techniques with break-even analysis. We assume the PA must operate at the highest power level (372W) for long-range transmission. When PA is sending or receiving signals, all components on the same channel must be active simultaneously. However, the SDR system can negotiate with the remote party to set up the exact time slot for each communication instance as a way to save energy.

Fig. 16 illustrates the first case study. The PA receives a periodic signal every 400ms. The signal requires processing activities on BP (but not on CP). Both receiving and sending activities take 20ms, and the base-band processing takes 160ms. The result must be sent back before the next signal arrives. Fig. 16(a) shows the power management scheme with break-even analysis. During the four idle intervals on PA, break-even analysis can reduce the power level to 42W at the mid-power level. With Algorithm OPTIMIZE-IDLE, we are able to reach the standby mode at 0.1W during the first and the third intervals within the 160ms duration, as shown in Fig. 16(b). We can also completely shut down PA in the second and the fourth intervals with 200ms idle time. As a result, the idle energy is reduced from 112.8J to 70.2J, a 38% improvement, which translates to 29% overall energy savings on the channel. In Fig. 16(c), the system-level optimization Algorithm OPTIMIZE-SYSTEM further cuts the idle energy by 68% to 36.2J compared to (a), and the total energy is reduced by 50%.

[Figure 16: Experimental results with the first case study. Timelines for PA, XC, MD and BP over a time axis from 0 to 800. (a) Energy reduction with break-even analysis: 160.6J total, 44.6J active, 112.8J idle; PA idles at MID @ 42W. (b) Energy reduction with idle transition optimization: 114.8J total, 44.6J active, 70.2J idle; PA reaches STANDBY @ 0.1W and OFF @ 0.01W. (c) Energy reduction with global optimization: 80.8J total, 44.6J active, 36.2J idle; PA reaches OFF @ 0.01W with reduced transitions.]

[Figure 17: Experimental results with the second case study. Timelines for PA, XC, MD, BP and CP over a time axis from 0 to 6000, with beacons B1–B4, request R1, task IP1 and the fixed task IP0. (a) Greedily optimizing for PA: 690.6J total, 580.3J active, 70.3J idle; reduced overhead on PA, extra overhead on CP. (b) Greedily optimizing for CP: 630.9J total, 580.3J active, 50.6J idle; reduced overhead on CP, extra overhead on PA. (c) Global optimization: 630.7J total, 580.3J active, 50.4J idle; trading off overhead on PA with CP.]

Fig. 17 illustrates the second scenario. The PA receives a beacon signal once every two seconds from the remote party in fixed time slots. No acknowledgment signal is required. However, after the second beacon signal, an image processing request R1 will come in. The image processing activity IP1 must run on CP, which is idle most of the time except at time 5s, when a pre-allocated task IP0 must wake up CP and remain active for 0.5s. The power manager can decide the arrival time of R1 ahead of time. If no special treatment is taken, R1 will arrive after B2, and the idle periods can be optimized by Algorithm OPTIMIZE-IDLE, as shown in Fig. 17(a). However, this schedule requires turning on CP at least two seconds before IP1 starts. There is not enough time to turn CP off before IP0 arrives. As a result, CP must stay idle at 30W for a few seconds. An alternative solution is shown in Fig. 17(b). Since CP is slow in mode transition, its transition energy surpasses that of PA, although PA’s power level is 12× higher than CP’s. We therefore greedily optimize for the transition energy of CP by scheduling task IP1 together with the pre-allocated task IP0. This measure reduces the idle energy from 70.3J to 50.6J. This solution can still be improved by Algorithm OPTIMIZE-SYSTEM. In Fig. 17(c), R1 is shifted next to B3. This move reduces transition energy on PA, but it forces CP to remain idle for an extra period of time. Such a trade-off improves the energy savings by an extra 0.2J.
Although 0.2J is trivial compared to the overall energy cost in this case study, a similar scenario in other applications could have a larger impact. The interesting point of this example is that greedily optimizing for the major power or energy consumers often contradicts the goal of system-level optimization.
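The savings quoted in the two case studies can be rechecked from the energy values reported with Figs. 16 and 17. The short sketch below only redoes that arithmetic; the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# First case study (Fig. 16): idle and total energy in joules.
idle_1 = {"a": 112.8, "b": 70.2, "c": 36.2}
total_1 = {"a": 160.6, "b": 114.8, "c": 80.8}

idle_b = reduction_percent(idle_1["a"], idle_1["b"])     # OPTIMIZE-IDLE vs. break-even
total_b = reduction_percent(total_1["a"], total_1["b"])
idle_c = reduction_percent(idle_1["a"], idle_1["c"])     # OPTIMIZE-SYSTEM vs. break-even
total_c = reduction_percent(total_1["a"], total_1["c"])

# Second case study (Fig. 17): idle energy in joules.
idle_2 = {"a": 70.3, "b": 50.6, "c": 50.4}
extra_saving = round(idle_2["b"] - idle_2["c"], 1)       # global vs. greedy-for-CP

print(round(idle_b), round(total_b), round(idle_c), round(total_c), extra_saving)
```

Rounded to whole percentages, these reproduce the 38%, 29%, 68% and 50% figures of the first case study and the extra 0.2J of the second.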

7

Conclusion

This paper presents a new approach to optimizing the mode-transition sequence for systems with rich power management options while operating under a wide dynamic range. We propose two core algorithms, one for deriving the optimal mode transitions for each component locally, and the other for extending it to modeling and optimizing the timing behaviors of all idle intervals globally. We demonstrated the effectiveness of our algorithms on a commercial software-defined radio system where idle periods dominate the total energy cost. Our combined algorithms are capable of achieving additional energy savings that are not possible with existing techniques.

References

[1] Gang Quan and Xiaobo (Sharon) Hu. Energy efficient fixed-priority scheduling for real-time systems on variable voltage processors. In Proc. Design Automation Conference, pages 828–835, June 2001.
[2] Jiong Luo and N.K. Jha. Power-conscious joint scheduling of periodic task graphs and aperiodic tasks in distributed real-time embedded systems. In Proc. International Conference on Computer-Aided Design, pages 357–364, November 2000.
[3] L. Shang, L.-S. Peh, and N.K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In Proc. International Symposium on High-Performance Computer Architecture, pages 91–102, February 2003.
[4] L. Benini, A. Bogliolo, and G. De Micheli. A survey of design techniques for system-level dynamic power management. IEEE Transactions on VLSI Systems, 8(3):229–316, June 2000.
[5] Qinru Qiu, Qing Wu, and Massoud Pedram. Dynamic power management in a mobile multimedia system with guaranteed quality-of-service. In Proc. Design Automation Conference, pages 834–839, June 2001.
[6] Yung-Hsiang Lu and G. De Micheli. Adaptive hard disk power management on personal computers. In R.J. Lomax and P. Mazumder, editors, Proceedings Ninth Great Lakes Symposium on VLSI, pages 50–53, 1999.
[7] Eui-Young Chung, Luca Benini, and Giovanni De Micheli. Dynamic power management using adaptive learning tree. In Proc. International Conference on Computer-Aided Design, pages 274–279, 1999.
[8] Dexin Li, Pai H. Chou, and Nader Bagherzadeh. Mode selection and mode-dependency modeling for power-aware embedded systems. In Proc. Asian and South Pacific Design Automation Conference, pages 697–704, January 2002.
[9] Dexin Li, Qiang Xie, and Pai H. Chou. Scalable modeling and optimization of mode transitions based on decoupled power management architecture. In Proc. Design Automation Conference, pages 119–124, June 2003.
[10] Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh, and Fadi Kurdahi. Power-aware scheduling under timing constraints and slack analysis for mission-critical embedded systems. Technical Report IMPACCT-01-03-01, University of California, Irvine, March 2001.
