Conforming Inverted Data Store for Low Power Memory

You-Sung Chang, Bong-Il Park and Chong-Min Kyung
Dept. of EE, KAIST, Taejon, Korea

Abstract

In this paper, we propose a 'conforming inverted data store' scheme for reducing the power consumption of memory components. The scheme reduces power consumption by conforming the memory contents to the precharging value of the memory: it selectively stores normal or inverted data so as to reduce the total number of accessed bits that differ from the precharging value. In this way, bitline toggling during memory access is minimized, which ultimately reduces power consumption. We develop two practical implementations of the proposed method, the vertical strip and horizontal strip inversion schemes. Simulation results indicate that the strip-based inversion schemes reduce power by up to 50%.

1 Introduction

Investigations of the power consumption of individual processor sub-blocks have shown that memory components consume the largest share of total processor power [1, 2]. The StrongARM processor, one of the leading low-power processors, dissipates over 40% of its total power in cache memory. Similarly, an x86-compatible CISC processor core called ACCENT, which has no internal cache, consumes 35% of its total power in its µ-code ROM, which occupies 8% of the whole chip area. The significant power consumption and large chip area of memory components warrant design methods that limit their power usage. In addition, the development of integrated systems based on Memory Merged Logic (MML) technology indicates that the performance and power consumption of single-chip systems will be increasingly determined by memory.

Architecture design, circuit and logic design, process development, and CAD algorithms have all been researched as means of reducing the power consumption of microprocessors. For memory devices, this work has focused predominantly on circuit and architecture design. More specifically, the low-power memory techniques reported in the literature include special cell structures [3, 4], charge recycling [5], limited bitline swing [6], and selected block activation [7]. However, all of these schemes reduce power only from the physical point of view. While they demonstrate a correlation between physical memory cell design and power consumption, the relationship between memory contents and power consumption has not been addressed.

In this paper, we establish a relationship between the contents of a memory and its power consumption. We propose a novel concept, the 'conforming inverted data store', which alters the memory contents to minimize the power consumed by memory components.

2 Concept of Conforming Inverted Data Store

To achieve high-speed memory access, most memories use sense-amplifying schemes with precharging. Under the precharging scheme, each bitline is charged to a reference voltage every clock cycle in preparation for the next operation. For a memory read request, the memory controller activates the corresponding wordline and lets the storage elements drive the bitlines. The storage elements can be either passive elements, such as capacitors in DRAM, or active elements, such as transistors in ROM and latches in SRAM. Once a small difference develops between the bitline voltage and the reference voltage, the sense amplifier amplifies it and accelerates the bitline evaluation. In large memory arrays, the majority of power is consumed in driving the high capacitance of the bitlines and wordlines. Moreover, the power consumed by a memory component is dominated by the power required to drive the bitlines [6].

Given this precharging mechanism, the 'conforming inverted data store' reduces power efficiently by decreasing the number of bitline toggles. The key idea is to make the majority of the memory contents conform to the preset precharging bit value. When data are stored in a memory, the scheme selectively inverts them so that the resulting data coincide maximally with the precharging bit value. For ease of explanation, we define the 'conforming bit value' as the bit value equal to the precharging bit value, and the 'unconforming bit value' as the opposite bit value. By this definition, a conforming bit value does not discharge the bitline during a memory access, and the precharged value on the bitline remains until the next precharging operation. No dynamic power is dissipated when accessing conforming bit values. Consequently, the number of accessed bits holding the unconforming bit value represents the amount of dynamic power dissipated by bitline toggling. The more accessed bits we change to the conforming bit value, the lower the power consumption. Of course, this scenario assumes single-bitline evaluation and a precharging value of Vdd or Vgnd.

[Figure 1 shows a memory array, with its address decoder and sense amplifier, divided into eight blocks A to H, before and after conforming inversion. Block fidelities before inversion: (A) 30%, (B) 60%, (C) 20%, (D) 35%, (E) 75%, (F) 45%, (G) 15%, (H) 95%. Block fidelities after conforming inversion: (A) 30%, (B) 40%, (C) 20%, (D) 35%, (E) 25%, (F) 45%, (G) 15%, (H) 5%.]

Example Parameters

| Block | $p_i$ | $C_i$ | $\alpha_i$ |
|-------|-------|-------|------------|
| A     | 0.2   | 12    | 0.050      |
| B     | 0.3   | 12    | 0.033      |
| C     | 0.2   | 12    | 0.050      |
| D     | 0.1   | 4     | 0.100      |
| E     | 0.6   | 18    | 0.017      |
| F     | 0.6   | 6     | 0.017      |
| G     | 0.4   | 24    | 0.025      |
| H     | 0.2   | 12    | 0.050      |

Figure 1: Concept of conforming inversion of a memory consisting of 8 blocks, each shown with its block fidelity, $N_i^a / N_i$.

Given memory statistics, the conforming process can be viewed as a problem of block definition and selective inversion. Fig. 1 shows a conceptual view of the block inversion of a memory. For a quantitative illustration of the block inversion in Fig. 1, the power consumed by the memory bitline array can be expressed in terms of the block index i as follows:

$$P = \sum_i \left( C_i \cdot \alpha_i \cdot \frac{N_i^a}{N_i} \cdot p_i \right) \cdot V^2 \cdot f \qquad (1)$$

where $C_i$ is the sum of all bitline capacitances in block i, $p_i$ is the block activity factor of block i, V is the voltage swing, and f is the operating frequency. In addition, Eq. 1 introduces two new parameters for block i: the bitline activation factor, $\alpha_i$, and the block fidelity, $N_i^a / N_i$. The bitline activation factor is the ratio of the number of bitlines activated by a block access to the number of all bitlines in the block. The block fidelity is the ratio of the number of unconforming, i.e., switching, bits to the number of all bits accessed in the block. Once a proper rectangular block division of a memory and the memory access traces of a target application are obtained, all the parameters in Eq. 1 can be calculated easily.

A block inversion changes the value of the block fidelity. If a block is inverted, its block fidelity changes from $N_i^a / N_i$ to $1 - N_i^a / N_i$. This means that a block whose fidelity is over 50% should be inverted to lower the power consumption. In the example of Fig. 1, each block is tagged with its block fidelity. To lower the power consumption, blocks 'B', 'E' and 'H' should be inverted. In the example, this inversion leads to an overall power reduction of 48.5%.

Now, the remaining problem in applying the conforming inversion is how to divide a memory into sub-blocks. Ideally, the blocks could have any shape and size, as in Fig. 1. In practice, however, the choices are restricted by the overhead of separating the blocks and storing their inversion information.
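The 48.5% figure quoted for the Fig. 1 example can be checked with a few lines of arithmetic. The sketch below evaluates the bracketed sum of Eq. 1 before and after inverting every block whose fidelity exceeds 50%. It assumes that the example parameters listed with Fig. 1 are read per block as (p_i, C_i, α_i); the factor V²f is common to all blocks and cancels in the ratio. The names `BLOCKS`, `relative_power` and `conform` are illustrative and do not come from the paper.

```python
# Illustrative check of Eq. 1 on the Fig. 1 example (all names are hypothetical).
# Each entry: block -> (p_i, C_i, alpha_i, fidelity), with fidelity = N_i^a / N_i.
BLOCKS = {
    "A": (0.2, 12, 0.050, 0.30),
    "B": (0.3, 12, 0.033, 0.60),
    "C": (0.2, 12, 0.050, 0.20),
    "D": (0.1, 4, 0.100, 0.35),
    "E": (0.6, 18, 0.017, 0.75),
    "F": (0.6, 6, 0.017, 0.45),
    "G": (0.4, 24, 0.025, 0.15),
    "H": (0.2, 12, 0.050, 0.95),
}


def relative_power(blocks):
    """Sum of C_i * alpha_i * fidelity_i * p_i; the common factor V^2 * f is omitted."""
    return sum(c * a * fid * p for (p, c, a, fid) in blocks.values())


def conform(blocks):
    """Invert every block with fidelity above 50% (fidelity becomes 1 - fidelity)."""
    return {
        name: (p, c, a, 1 - fid if fid > 0.5 else fid)
        for name, (p, c, a, fid) in blocks.items()
    }


inverted = sorted(name for name, (_, _, _, fid) in BLOCKS.items() if fid > 0.5)
reduction = 1 - relative_power(conform(BLOCKS)) / relative_power(BLOCKS)

print("blocks to invert:", inverted)
print(f"power reduction: {reduction:.1%}")
```

With the parameters read this way, the blocks selected are B, E and H, and the computed reduction agrees with the 48.5% stated in the text.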

3 Conforming Vertical Strip Inversion

The simplest choice of block for data inversion is the whole plane of the memory array. If the target memory consists of multiple banks or arrays, those are also good candidates for blocks. However, the decision for plane inversion is heavily biased. For most applications, typical memory bit patterns contain more '0's than '1's, while the common precharging value in mask ROM and on-chip cache SRAM is logic '1'. In most cases, a trivial whole-plane inversion is therefore the best solution for simple plane-based inversion.

In the 'vertical strip inversion' scheme, we choose a finer-grained block candidate: a vertical strip of the memory array. Vertical strip inversion is well suited to ROM and is shown in Fig. 2, where a vertical strip is a column of basic cells. A vertical strip is stored with inverted bit values if it produces more unconforming than conforming bit readings in a simulation of the target application. Of course, the strip inversions could be decided by static analysis of the bit patterns alone if the ROM addresses were known to be accessed uniformly.

[Figure 2 shows an array of basic cells on wordlines WL 0 to WL n and bitlines BL 0 to BL m, with the candidate strip to be inverted marked, before and after vertical inversion.]

Figure 2: Vertical strip inversion in a mask ROM

The overhead of applying the vertical strip inversion is negligible. We only need to select the polarity of each output buffer according to the inversion decision for its strip: the output buffer must be an inverting buffer for inverted strips and a normal buffer for strips that are not inverted. In addition to reducing the switching probability of the bitline, the inversion reduces the number of drain contacts on the bitline and thus its total capacitance, as seen in Fig. 2. Therefore, the effect of the conforming inverted store is in practice larger than what would be expected from the decrease in the number of unconforming bit readings alone.

Table 1 shows the simulated power reduction obtained by vertical strip inversion in various mask ROMs of the ACCENT [2] and MARCIA [8] processors, which are 80386- and Pentium-compatible microprocessors, respectively, designed at KAIST. We gathered the statistics from the mask ROMs by running test programs on the processors. The power reduction is measured relative to a baseline in which whole-plane inversion has already been properly performed. For all the test examples, vertical strip inversion yields additional power reduction. The reduction ranges from 8.2% to 26.8%, with negligible timing and area overheads.

Table 1: Power reduction obtained by applying the vertical strip inversion to on-chip mask ROMs

| Processor | On-chip mask ROM      | Power Reduction |
|-----------|-----------------------|-----------------|
| ACCENT    | Microcode µ-code ROM  | 17.0%           |
| MARCIA    | IU X-Pipe µ-code ROM  | 8.2%            |
| MARCIA    | IU Y-Pipe µ-code ROM  | 14.3%           |
| MARCIA    | FPU X-Pipe µ-code ROM | 26.8%           |
| MARCIA    | FPU Y-Pipe µ-code ROM | 11.7%           |
| MARCIA    | FPU Constant ROM      | 26.4%           |

4 Conforming Horizontal Strip Inversion

Since the data on a given bitline of a RAM change dynamically over time, the static vertical strip inversion introduced in the previous section is not adequate for RAM. In this section, we propose another form of strip-based inversion, called horizontal strip inversion. It is applicable to RAM as well as ROM, though its overhead is larger than that of vertical strip inversion. As its name suggests, horizontal strip inversion takes a horizontal strip as the block for inversion, where a horizontal strip can be a word or double word that has a logical meaning. Fig. 4 illustrates the horizontal strip inversion of a RAM.

In the horizontal inversion scheme, we choose dynamically whether to store inverted or normal data. Because the inversion decision varies and must be updated whenever the content of a horizontal strip changes at run time, an inversion indicator must be stored alongside the data of each strip. To do this, the input stage includes multiplexing logic that selects and stores the normal or inverted data according to the inversion decision, as shown in Fig. 4(a). The output stage mirrors the input stage: it inverts the bitline data, or not, according to the inversion indicator of the horizontal strip being addressed, as shown in Fig. 4(b).

Whenever a memory write occurs, the inversion decision is based on the number of unconforming bits the horizontal strip would hold if the requested data were written as they are. If that number is larger than $\lceil N/2 \rceil$ (where N is the number of bits in the strip), the inversion indicator is set to the unconforming value and the inverted image of the data is stored. With this strategy, the cost of storing the inversion indicator never outweighs the power benefit of inversion: whenever a strip is inverted, the number of unconforming bits, including the inversion indicator bit, decreases. Fig. 3 shows examples of strip inversion and the resulting change in the number of unconforming bits.

| Raw data | # of UCB |   | II | Stored data | # of UCB |
|----------|----------|---|----|-------------|----------|
| 0111     | 1        | → | 1  | 0111        | 1        |
| 0011     | 2        | → | 1  | 0011        | 2        |
| 0001     | 3        | → | 0  | 1110        | 2        |
| 0000     | 4        | → | 0  | 1111        | 1        |

Figure 3: Representation of data including the inversion indicator (II) in the horizontal strip inversion, which reduces the number of unconforming bits (UCB), assuming the unconforming bit value is '0'
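The rows of Fig. 3 can be reproduced with a short sketch of the write-time decision. It assumes the rule stated in the text (invert when the number of unconforming bits exceeds ⌈N/2⌉) and a precharging value of '1', so that '0' is the unconforming value as in the figure. The function names `encode_strip`, `decode_strip` and `ucb_count` are hypothetical and chosen only for illustration.

```python
from math import ceil


def invert(bits):
    """Bitwise complement of a string of '0'/'1' characters."""
    return "".join("1" if b == "0" else "0" for b in bits)


def encode_strip(raw, precharge="1"):
    """Return (inversion_indicator, stored_bits) for one horizontal strip."""
    unconforming = invert(precharge)
    if raw.count(unconforming) > ceil(len(raw) / 2):
        # Too many unconforming bits: store the inverted image and flag it
        # with an indicator that itself holds the unconforming value.
        return unconforming, invert(raw)
    return precharge, raw


def decode_strip(indicator, stored, precharge="1"):
    """Recover the original data on a read."""
    return stored if indicator == precharge else invert(stored)


def ucb_count(indicator, stored, precharge="1"):
    """Unconforming bits that are read, including the indicator bit."""
    return (indicator + stored).count(invert(precharge))


for raw in ("0111", "0011", "0001", "0000"):
    ii, stored = encode_strip(raw)
    assert decode_strip(ii, stored) == raw  # the read path restores the data
    print(raw, "->", ii, stored, "UCB:", ucb_count(ii, stored))
```

For the four raw values of Fig. 3, the sketch yields indicator bits 1, 1, 0, 0 and stored data 0111, 0011, 1110, 1111, with 1, 2, 2 and 1 unconforming bits respectively.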

[Figure 4 shows a RAM array in which each wordline stores an inversion indicator bit followed by an 8-bit strip, with the strip being accessed marked. The stored rows (indicator, data) are: 1 10110111, 0 11010011, 1 01110110, 1 11100011, 1 11011100 and 0 11100011. In (a), decision logic selects the normal or inverted image of the input data 00101100 for writing; in (b), the output stage restores the output data 00101100 on reading.]

Figure 4: Horizontal strip inversion scheme in a RAM, which selects the inverting or non-inverting buffer according to the inversion flag of each wordline, for (a) memory write and (b) memory read. It assumes that the strip size is the same as the accessing word size.
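The write and read paths of Fig. 4 can be mimicked with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit: the class name `HorizontalStripRAM` and its methods are hypothetical, the strip size equals the word size (8 bits), the precharging value is '1', and every strip is assumed to start out holding all-conforming data.

```python
class HorizontalStripRAM:
    """Behavioural model of a RAM with one inversion indicator per word (strip)."""

    def __init__(self, n_words, strip_bits=8, precharge=1):
        self.strip_bits = strip_bits
        self.precharge = precharge
        self.mask = (1 << strip_bits) - 1
        # Assumed initial state: all bits conforming, no strip inverted.
        self.data = [self.mask if precharge else 0] * n_words
        self.indicator = [precharge] * n_words

    def _unconforming(self, word):
        ones = bin(word).count("1")
        return self.strip_bits - ones if self.precharge else ones

    def write(self, addr, word):
        """Input stage: store the normal or inverted image of the word."""
        word &= self.mask
        if self._unconforming(word) > (self.strip_bits + 1) // 2:  # > ceil(N/2)
            self.data[addr] = ~word & self.mask
            self.indicator[addr] = 1 - self.precharge
        else:
            self.data[addr] = word
            self.indicator[addr] = self.precharge

    def read(self, addr):
        """Output stage: undo the inversion if the indicator says so."""
        stored = self.data[addr]
        if self.indicator[addr] != self.precharge:
            return ~stored & self.mask
        return stored

    def toggles_on_read(self, addr):
        """Unconforming bits driven onto bitlines, indicator bit included."""
        flagged = int(self.indicator[addr] != self.precharge)
        return self._unconforming(self.data[addr]) + flagged


ram = HorizontalStripRAM(n_words=6)
ram.write(1, 0b00101100)  # the input data shown in Fig. 4
print(ram.indicator[1], format(ram.data[1], "08b"))
print(format(ram.read(1), "08b"), ram.toggles_on_read(1))
```

Writing the input data 00101100 stores the inverted image 11010011 with the indicator cleared, which matches one of the rows shown in Fig. 4, and reading the same address returns 00101100. The read then involves four unconforming bits instead of the five in the raw word.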

Fig. 5 shows the simulation results for the horizontal strip inversion. To obtain the RAM access traces for the simulation, we used the 'Shade' tool of Sun Microsystems with the SPECint95 benchmark suite. We tried four fixed horizontal strip sizes: 32-bit, 16-bit, 8-bit and 4-bit. With a real implementation in mind, we designed the simulation models so that a memory access that stores data narrower than the strip size does not change the inversion indicator. The inversion indicators are included in the simulation model and in the power estimation, but the input/output multiplexing logic is excluded from the power estimation, because we limited the power measurement to the memory 'array', which dominates the total power consumption. In Fig. 5, power reduction denotes the improvement over proper whole-plane inversion. To gain more insight, we traced the RAM accesses separately for code RAM and data RAM and estimated the power reduction for each.

The simulation results show a steady increase in power reduction as the strip size becomes smaller. However, the relative area overhead increases as the strip size decreases, because a larger fraction of additional storage is required for the inversion indicators. In Fig. 5, the power reduction of code RAM is almost flat across the SPECint95 benchmarks. In contrast, the power reduction of data RAM reflects the characteristics of each benchmark. For 'compress' and 'ijpeg', which are arithmetic-intensive, the power reduction of data RAM is very large, even exceeding 50%, while 'li' and 'm88ksim', which perform intensive logical operations, show less power reduction in data RAM than in code RAM. Finally, we notice two irregular points in Fig. 5: 'compress' and 'ijpeg' show exceptional increases in power reduction when the horizontal strip size decreases from 8-bit to 4-bit. This means that, for these two benchmark programs, the inclusion of inversion indicators begins to become a burden at the strip size of 8-bit.

[Figure 5 consists of eight panels: (a) 099.go, (b) 124.m88ksim, (c) 126.gcc, (d) 129.compress, (e) 130.li, (f) 132.ijpeg, (g) 134.perl and (h) 147.vortex. Each panel plots Power Reduction (%), on a scale from 0 to 50, against the size of the horizontal strip (32-bit, 16-bit, 8-bit, 4-bit) for data RAM and code RAM.]

Figure 5: Power reduction achieved by applying the horizontal strip inversion to an embedded RAM for various benchmarks

5 Conclusion

In this paper, we proposed an effective way to reduce the power consumption of memory components. The proposed scheme, the conforming inverted data store, reduces memory power by selectively storing normal or inverted data so that the majority of accessed bits hold the precharging value, which leads to less bitline toggling and lower power consumption. With practical implementation in embedded systems in mind, we developed the scheme into two realistic solutions, vertical strip and horizontal strip inversion. The former relies on statistics from application traces, whereas the latter works without any statistics. Our simulation results show that the strip-based inversions achieve power reductions of up to 50%, even when the memory contents are assumed to be already well biased for power. This improvement is substantial given the minor implementation overheads. In addition, the generality of the proposed schemes and their compatibility with previous circuit-oriented approaches make them more promising.

References

[1] J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor". IEEE Journal of Solid-State Circuits, pages 1703–1714, November 1996.
[2] Y.-S. Chang et al., "ACCENT: A CISC-Type Configurable Processor Core". Proceedings of the 3rd International Conference on ASIC, November 1998.
[3] M. Ukita et al., "A Single Bitline Cross-Point Cell Activation Architecture for Ultra Low Power SRAMs". International Solid-State Circuits Conference, pages 252–253, February 1994.
[4] H. Mizuno and T. Nagano, "Driving Source-Line (DSL) Cell Architecture for Sub-1-V High-Speed Low-Power Applications". 1995 Symposium on VLSI Circuits, pages 25–26, June 1995.
[5] T. Kawahara et al., "A Charge Recycle Refresh for Gb-Scale DRAM's in File Applications". IEEE Journal of Solid-State Circuits, pages 715–722, June 1994.
[6] B. S. Amrutur and M. Horowitz, "Techniques To Reduce Power In Fast Wide Memories". Proceedings of the 1994 Symposium on Low Power Electronics, pages 92–93, 1994.
[7] A. P. Chandrakasan, A. Burstein, and R. W. Brodersen, "A Low-Power Chipset for a Portable Multimedia I/O Terminal". IEEE Journal of Solid-State Circuits, pages 1415–1428, December 1994.
[8] Y.-S. Chang et al., "Verification of a Microprocessor Using Real World Applications". Proceedings of the 36th Design Automation Conference, June 1999.
