Battery - friendly design of signal processing algorithms - Signal

4 downloads 0 Views 363KB Size Report
In this paper we present an approach to designing battery- friendly .... shown in dashed lines. It can be .... this simple example we see that splitting the subtrees.
BATTERY - FRIENDLY DESIGN OF SIGNAL PROCESSING ALGORITHMS Praveen Raghavan & Chaitali Chakrabarti Department of Electrical Engineering Arizona State University, Tempe, AZ - ,85287 (praveenr. chaitali}@asu.edu we demonstrate how frequency scaling can be used to prolong battery life. We show the effect of frequency scaling on different slack distribution schemes for a complex algorithm such as MPEG2. A slack distribution scheme that generates a load profile with decreasing order of load currents does substantially better than other orderings.

ABSTRACT

Portable systems today are designed with lowering the energy consumption as the primary design metric. This is unfortunate since maximizing battery lifetime is a more appropriate inetric, and lowering energy does not necessarily mean improving battery lifetime. In this paper we first s h o w h o w to design battery-friendly implementations of common signal processing kernels such as FIR filters and FFT. The basic idea is to generate a load profile that results in better battery behavior. Next, we demonstrate how frequency scaling can be used effectively to improve the battery behavior of an application such as MPEG2.

_--

This paper is organized as follows: Section 2 motivates the approach and the setup used for the experiments. Section 3 discusses the battery model and its properties. Battery friendly FIR filters are described in section 4. Section 5 gives details o n battery efficient FFT decomposition. Section 6 explains how slack utilization can be done for MPEG2. Finally section 7 concludes the paper.

1. INTRODUCTION

Use of portable devices in many signal processing and communication applications is on the rise. These designed with reduction in. energy consumption as the prime design metric. The assumption is that reducing the energy consumption of the system will increase the lifetime of the battery. This is not necessarily true, since two implementations that have comparable energy consumption can have very different battery performance. In this paper we present an approach to designing batteryfriendly versions of signal processing algorithms so as to maximize the battery lifetime. In recent years, there has been some work in analyzing the battery characteristics [ l , 21, and using these characteristics to design battery-conscious task scheduling algorithms (In the context of operating system) [3, 4, 51. There has been no work done in the area of design of batteiy-efficient signal processing algorithms.

I n this paper we first look at how common signal processing kernels like FIR, FFT can be modified so as to make [lie kernels more battery-friendly. While many of the techniques are borrowed from energy-efficient algorithm design, some of the techniques capitalize on the nonlinearities of the battery and are unique to battery-operated systems. For instance, while changing the order of the computation mildly changes the energy consumption, i t has a significant effect on the battery performance. Next,

0-7803-7795-8/03/$17.00@ 2003 IEEE

2. BACKGROUND

2.1 Motivational Example In the design of portable systems, the assumption is that reducing the energy consumption will increase the lifetime of the portable device. This is not necessarily true, since two implementations that have comparable energy consumption can have very different battery performance. To illustrate this phenomenon consider the example shown in table 1.

E

Current

V

250

304

2400

100

24 I

2400 I

2250

4.04

1

18.70 24.00

Table I : Same energy consumption is not equivalent to same battery behavior

The first four cases (I to IV) have different load profiles (current vs. time) hut have the same energy consumption. Simulations on DualFoil, a low-end battery simulator, show that case IV that has the lowest average load current has the lowest reduction in battery voltage, and thus the best battery performance.

more than a factor of 2. An effective way of decreasing the load current is by lowering the frequency of operation. Lowering the frequency by a factor of s results in lowering the voltage. by a factor of. sI =f(s)=3.33s'-0.24s2+0.49s+0.42, (obtained empirically) and lowering the current by a factor of s x s,. The significant drop in the load current results in enhanced battery performance - a feature that should be exploited in systems that have slack. Current

Now consider cases IV and V, which have different energy consumption. Note that case V has a lower energy consumption as compared to case IV. However, the drop i n battery voltage i n case V is 24.00% as compared to 18.70% for case IV. The lower drop in battery voltage is because case IV has a lower load current than case V. This demonstrates that lower energy consumption does not necessarily mean that the implementation is battery efficient.

f \

I \ c

2.2 Simulation Environment The battcry simulator used in this paper to estimate the reduction i n battery voltage is DUALFOIL a low-level Liion simulator [ 6 ] . The estimation error is within 5%. The energy and current estimation for our programs was done using Joule Track [ 7 ] , a web-based tool for sothvare energy profiling of a StrongARM based processor. Energy prediction by JouleTrack is within 3% of error. The frequency of the StrongARM processor can be varied in I I discrete steps from 206MHz to 59MHz.

Time L Fig 1. Battely voltage vs. Discharge time

2L Time

In the design of battery-friendly algorithms, it is important to consider the load profile (current vs. time) generated and choose the one that results in better battery performance. In our earlier work in [3], we have demonstrated that the performance of a battery is related to the order of the loads that it is subjected to. We illustrate this with the help of the example in Fig. 2 where three tasks have been ordered in three different ways. Case I that corresponds to increasing order of loads has the highest drop in battery voltage while case 111 that corresponds to tasks ordered in decreasing order of loads has the lowest drop in battery voltage and thus the best performance. Thus, tasks should be ordered in decreasing order of load currents.

3. BATTERY PROPERTIES

The two most important properties of a battery are the rate capacity effect and recovery. The rate capacity states that the charge capacity of battery is dependent on the rate of discharge; the capacity increases as the load current decreases. The recovery effect is due to self-recharging which happens when the battery is allowed to rest for sometime. Fig. I illustrates the rate capacity effect by plotting the battery voltage as a function of discharge time for two different loads. Note that decreasing the load by a factor of 2 results in the battery lifetime increasing by

4. BATTERY-FRIENDLY F I R FIR filtering is one of the most commonly used signal processing operations. An N-tap FIR filter is defined by: N-I

y[nI- C x [ f l - k l h [ k l k-0

t

4

,I

Case - 11

Case - i

Case- 111 X Drop in Battery

Voltage = 18.16%

Voltage = 14.35%

Load (mA

Load (mA)

Time in min

-.

60

40

20

Timeinmin

,

Figure 2: Battery behavior,for different load profiles

305

20

Timeinmin

35 -b

45

FIR filtering is essentially a set of MAC (Multiply and Accumulate) operations. Efforts to reduce energy consumption i n FIR filters include choice of coefficients so that the Hamming distance between successive cocfficients is reduced, use of differential coefficients [SI, use of canonical sign digit representation [9], etc. Also Energy-Quality tradeoffs have been demonstrated for such filters in [IO]. While these.techniques can be used for lowering tlie energy consumption, additional gains in battery lifetime call be obtained by shaping its current profile. Consider an FIR filter with the variation in coefficients as shown i n Fig. 3. The corresponding current profile is shown i n dashed lines. It can be seen that for larger coefficient values, tlie current drawn is larger. Hence if we can reorder the coefficients as shown in Fig. 4, we can have a non-increasing current profile, which would give rise to a better battery performance. The accuracy of the result is also plotted along side the current profile with a dotted line. A battery-friendly FIR filter computation would thus coiisist of reordering the coefficients in decreasing order and computing the larger-valued coefficients first.

than the remaining coefficients. The inputs to the FIR filter are correlated. Table 2 shows the energy consumption in the original filter and the reordered filter, and the percentage drop in battery voltages for these examples. These values are obtained after repeating the filter computation 6 x IO6 times, which corresponds to few tens of minutes. While the energy consumption is almost identical, the drop in battery voltage for the reordered filter is about 1.5 - 2% less.

I

Old Arraneement

1

1

New Arranccment

I

%Drop In

Energy(1)

% Drop In Battery

Energy(J)

128.34

13.82%

129.84

Filler

Battery

,

13.10%

2

I

64.26

1

9.73%

I

65.48

1

9.46%

3

I

5922

1

830%

I

6126

1

860%

Table 2 : FIR Filter Results at f = 206MHz (when program was executed 6 x I O 6 times) Now consider the case where the filters are approximated by using fewer taps at the expense of quality degmdation as in [IO]. Fig. 5 shows the battety voltage drops for the original, re-ordered filter with all taps and re-ordered filter with twenty taps (that causes a small quality degradation of 3.21%) for Filter I in Table 2. Thus, reducing the number of taps results in a smaller drop in battery voltage (7.24% instead of 13.1%) with very little quality degradation. Note that as the frequency of operation reduces, the difference in the drop in battery voltage between the original and reordered filter is smaller. This is because the variation in the current in the original and the reordered arrangements reduces with frequency.

I ' - Coefftcienls

.............................

rime

Current

Accuracy

___,

Figure 3: FIR Filter (Initial Arrangement) ...............................................

...

Time

Coefficients - - - . Current ...........

.I

Accuracy

-.

Figure 4: FIR Filter (New Arrangement)

s"qYmcy

I.Ongl".l

m R ~ ~ O d ~ r d OR..",dErrd ~A1,~

(20 CarR)I

Figure 5 : Drop in battery voltage for original and reordered filter at different frequencies for Filter I

We illustrate this with the following three filter examples: The first example is a 128 tap complex filters while second and the third is a 64 tap complex filter. All the filters have coefficient values ranging from 0 to lx106; about twenty coefficients have values that are much higher

306

____ Time (in inin)

4.905

5.934

6.113

6.035

5.435

1.699

Energy (in

10.242

12.36

12.72

12.60

11.36

16.08

3.219

3.597

3.622

3.644

3.41

4.18

Joules)

'%Drop

iii

battery voltage

Tabie 3: Results for 26 point FFT (when executed 6 x I O ' times) 5. BATTERY-FRIENDLY FFT I n many areas of mobile computing, there is a great demand for signal-processing algorithms based on the Discrete Fourier Transform (DFT). An efficient way to implement DFT is the Cooley-Tukey algorithm referred to as the Fast Fourier Transfonn (FFT) [II].This reduces the arithmetic cost of the computation of DFT of size N fKom O(N') to O(N log(N)). There are many ways in which a FFT can he decomposed. All decompositions have the same arithmetic cost hut depending on the data access patterti, the time required for computation differs dramatically [12]. Interestingly, t h e different decompositions have very different energy consumption and vety different battery performance.

5.2. Energy and Battery Voltage Results

The results for Z6 point FFT for various decompositions are shown in Table 3. All the results are for the case when the code is executed 6 x IO' times. Note that there is a significant difference in both the execution time and energy consumption for even such a small example. The drop in battery voltage is not an eye-catcher since the execution time is of the order of few minutes and the drop is very small in the first few minutes of operation. From this simple example we see that splitting the subtrees equally is not the best strategy. In fact, for the StrongARM platform, it is preferable to do a right hand tree decomposition as opposed to left hand tree decomposition. However, the optimal decomposition would differ from machine to machine.

5.1. Mathematical Framework

Consider a 2" point FFT decomposed into two FFTs of sizes 2' and 2'" where II = I + i n L = 2', M = 2'", N = 2" and N = LM. This FFT can be expressed as a double sum over the elements of the rectangular array multiplied by the corresponding phase factors. Hence we have:

The calculation of equation 1 can be subdivided into three steps: I . Computing M-Point DFTs for each of the L rows. 2.

3.

Table 4 'shows the time, energy consumption and percentage drop in battery voltage for the top five decompositions for various point FFTs mentioned in [12]. It can be clearly seen that for the top five cases, there is a wide variation in the time required to compute the FFT, energy consumed for the computation as well as the battery voltage. For the 1024-point FFT, the time taken for a (2'~(2~x2'))decomposition is 13.19% higher than that of a (Z4x(Z1xZ5)decomposition resulting in a 20% difference in drop of battery voltage when executed 6 x IO' times. Note that the decomposition that results in minimum time is tlie same as the decomposition that results in minimum energy, which is the same as the decomposition that results in higher battely performance. This is because for the StrongARM processor, the variation in current as it executes the different algorithms is negligible. Since the load profiles of different algorithm depend only on the execution time, as the execution time increases, energy increases and the battery voltage decreases.

Scaling with the twiddle factor Finally, computing L-point DFTs for each of the

M Columns

307

Figure 6 ; Battery performance at different frequencies for FFTs of size n = 2? to n = 2''

6. BATTERY - FRIENDLY MPEG Video compression is most vital for storage and transmission of video data. MPEGZ offers one of the most efficient ways to compress video data. The block diagram for MPEG2 is show in Fig. 7. We have divided the MPEG2 kernel into 3 main parts as shown by the dotted blocks, viz. DCT, Motion Estimation (ME) and VLC Encoding. Table 5 describes the time, current and energy for these blocks while processing the I frame at 206 MHz. The overhead due to the other parts is ignored compared to these three blocks. Table 4: Results for 5 best decompositions in [I21 for n to n = 2"' point FFT repeated 6 x IO' times

=

1

2'

Next we analyze the drop in battely voltage for different frequency settings of the StrongARM processor. As the frequency reduces, the time increases, hut the current reduces significantly. The large drop in the load current results i n significantly better battery performance. Fig. 6 shows the drop in battery voltage when the best decomposition (from Table 4) is executed at 3 different frequencies. Clearly, if there is sufficient timing slack, the proccssor should be operated at a lower frequency.

I

Motion Estimation

I

~~

Kc

Time (in Clsec) Current (in A)

46.1 0.2669

584.9 0.2565

242.5 0.2680

Energy (in VJ)

18.68

225.50

99.00

...........................................................................................

~

.................................

Memory

f

VLC-Block

...............................

DCT - Block

Malian Vectors

............

................................................... Motion Compensation

Memory

Estimation

................................................... Figure 7: MPEG Block Diagram

308

r

output

I

Configuration 2 does better than configuration 1 since the 3’d task (VLC),is allocated a larger chunk of the slack resulting in the current profile being steeper.

The DCT has been implemented using the Chen-Wang algorithm and motion estiinatioi? has been implemented using the full search method. The block size is 8 x 8 and the frame size is 352 x 288 (CIF). Clearly motion estimation is the most energy-consuming block and appi-oximations in it will lead to a longer battery life.

I. CONCLUSION In this paper we have looked at ways in which signal processing algorithms can be made more battery-efficient. We first looked at how reordering and frequency scaling affects battery lifetime in case of FIR filters. Next we looked at how various decompositions for FFT affect the battery life. Finally, examined how slack can be utilized in a battery-friendly way, in case of larger signal processing algorithms, like MPEGZ.

The time required to run all three blocks at full speed (206 MHz) is 864.1psec. Consider the case when the I frames arc run at full speed and the P and B frames are run at lower frcquciicy based on available slack. Assume that it is possible to operate the different blocks (DCT, Motion Estimatioii, VLC) at different frequencies. Next we show various ways to distribute the available slack and their effect 011 batteiy pcrforinance. Table 6 shows the drop in battery voltage for different distributioiis of available slack. The slack, time and energy are listed for one iteration but the drop iii battery voltage is quoted for 6 x 10’ iterations, which corresponds to few tens of minutes.

8. REFERENCES [ I ] T. Martin, “Balancing Batteries, Power and Performance:

System Issues in CPU Speed Setting for Mobile Computing,” Ph.D. Dissertation, Aug. 1999.

For instance when tlie slack available is 309psec, the first arrangement has a n energy consumption of 235.14pJ compared to 252.38pJ for the second arrangement. Hence we would expect the battery voltage for the second arrangement to drop more. But this is not the case; the drop in voltage iii the first arrangement is larger than the second arrangement. The main reason for this is that the cuireiits are arranged in increasing order in the first arrangement and i n decreasing order in the second arrangement. Mot.

Slack

mc

Est. (in

MHz) 192

78.9

(in MHz) I

I

206

, 1

Time (in &sec)

121 M. Pedram and Q. Wu, ‘’ Design Considerations for Battery Powered Electronics,” Proc. of DAC, 1999. [3] P . Chowdhury, C. Chakrabarti, “Battery-aware Task Scheduling for System-on-Chip Using VoltageIClock Scaling,” Signal Processing Syslems. IEEE Workshop on, pp. 201-206, 2002. [4] D.Rakhmatov, S. Vrudhula and C. Chakrabarti, “ BatteryConscious Task Sequencing for Portable Devices including VoltagelClock Scaling,” Proc. of DAC. pp - 189-194, 2002. [5] I. Luo and N. Jha, “Battery Aware Static Scheduling for Distributed Real Time Embedded Systems”. Proc. of DAC, 2001

%Drop in Battery

Energy (in pl)

Voltage

,

[6] T. Fuller, M. Doyle, and 1. Newman, “Simulation and Optimization of the Dual Li-ion Insertion Cell,” Jour. Eleclrochemical Society, 14(1), 1994.

I

943.3 1304.16

1

17.85

[7] A. Sinha, A. Chandrakasan, “JouleTrack - A Web Based Tool for Software Energy Profiling,” Proc. of DAC. pp 220225, 2001. ~

133

89

1173

252.38

12.33

1520.8

173.12

9.98

111 fact tlie slack distribution algorithm should be such that the later tasks get a larger share of slack i.e., operate at a lower frequency. Table 7 shows two configurations with decreasing load profiles, equal energy consumption, and comparable execution times that have a 1% difference in battery voltage drop even when executed for tens of

we4 567.9

[9] R. Hartley, “Optimization of CSD Multiplier for Filter Design,” IEEE Inrl. Symposium on Circuits and Syslems. Vol. 4, pp 1992-1995, 1991. ~

1 1 F~ ;:1 i 1 1

iniiiiite

Suggest Documents