Power Consumption Awareness in Cache Memory Design with SystemC Smail NIAR*, Samy MEFTALI, Jean-Luc DEKEYSER INRIA-FUTURS, DART Project, University of Lille, France [niar, meftali, dekeyser]@lifl.fr
Abstract
This study presents the development of a cache memory module for a component library designed for fast and synthetic embedded system simulation. It also demonstrates that an existing analytical power consumption model can be integrated into a SystemC description at the cycle-accurate register-transfer level (RTL).
Keywords: SystemC, power consumption, processor, cache.
* Also at the University of Valenciennes, France
1. Introduction
Energy consumption plays an increasingly important role in the design of new digital electronic systems [1]. This change in designer attitude is motivated primarily by the desire to extend battery life in embedded and mobile systems, by the need to account for the thermal issues that affect the cooling, packaging and reliability of embedded and high-performance systems, and by the need to manage the environmental impact of mobile computer systems. In addition, integrated circuit technology has made important progress over the last few years. High-performance, low-cost embedded systems are now designed using a system-on-chip (SoC) approach [2]. A side effect of this development is that SoCs have become increasingly complex and require high-level tools (for simulation, performance estimation and synthesis) during the design phase.
The SystemC language is a C++ library whose aim is to facilitate complex system design by supporting hardware and system-level modelling [3]. Although many research projects over the last few years have been devoted to improving and facilitating simulation with SystemC, very little attention has been paid to power consumption evaluation in SoC design using this language [4][5]. To fill this gap, we have designed a SystemC module library that serves as a framework for a new design methodology dedicated to embedded systems. These modules allow easy estimation of performance (execution time) and energy consumption. With our design methodology, SoC descriptions are synthetic, modular and accurate. These three characteristics are important for the following reasons:
• Synthetic descriptions permit a better functional and structural understanding of the SoC. They make it possible to have several abstraction levels in the same project, thus offering a compromise between precision and speed during simulation.
• Modular descriptions make it possible to reuse existing modules to design new SoCs, the only cost being the separation of the SoC's "implementation" aspects from its "functional" aspects.
• Accurate and detailed descriptions guarantee that the performance measured by simulation is equivalent to that of the final SoC hardware, and they prevent any ambiguity in the final implementation phase.
This paper provides a detailed description of one of the library modules, the cache memory module, at the cycle-accurate RTL. Briefly, this cache module has the following features:
• Modular power consumption evaluation, based on an analytical model.
• Modular SystemC-based specifications, for module reuse.
• Cache memory configuration exploration, for determining the best cache configuration of the SoC for each application.
2. Function and importance of cache memory for an embedded system
New applications such as multimedia, image processing, telecommunication and network applications are memory-centric and must process more and more data in less and less time. For this reason, a growing number of embedded hardware platforms integrate ever larger caches. For instance, the Intel XScale embedded processor core has two 32-way set-associative 32 KB caches (one for instructions and one for data). The size of the instruction and data caches in the new MIPS32 architecture can range from 256 bytes to 4 Mbytes. This tendency will most likely continue in future embedded processors because of the new applications' needs in terms of memory bandwidth.
One consequence of this trend is that the number of transistors and the area devoted to caches have also increased. In some embedded hardware platforms (such as the Intel StrongARM), the cache memory can occupy up to 50% of the total core area and account for up to 50% of the total power consumption of the microprocessor system. In addition to their impact on performance, most cache structures are independent of the processor architecture and the instruction sets. Given this context, we chose to present an example describing an on-chip cache.
To evaluate the access time and the power consumption of the cache module in our library, we used an existing analytical cache model, namely Cacti. This model is an integrated access time, power consumption and chip area model for on-chip cache memories, and it supports multibanked caches. In the Cacti model, each bank is composed of several units: arrays for storing tags and data, tag comparators, and multiplexers for selecting a word (typically 8 bytes) out of a cache line consisting of B bytes. In this paper, only one bank is considered. In order to evaluate the access time, the per-access energy consumption and the chip area using Cacti, the user must specify the following parameters:
• S: total size in bytes
• B: block size in bytes
• Assoc: associativity
• T: technology size (0.1 µm by default)
• Pread: the number of input (read) ports
• Pwrite: the number of output (write) ports
• Pread_write: the number of input-output (read/write) ports.
In this study, Pread=Pwrite=0 and Pread_write=1. Using these parameters, Cacti determines the best layout (or configuration), that is, the one that optimizes both the access time and the energy consumption. More details about the power consumption model used by Cacti are given in [6,7].
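The parameters S, B and Assoc also fix how an address is split into tag, index and block offset. The following is a minimal sketch in plain C++, not taken from Cacti or from our library: the type name `CacheGeometry` and its member names are hypothetical, and the split uses the usual set-associative relation, number of sets = S / (B × Assoc).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Cacti or of the paper's library):
// derives the address split implied by the parameters S, B and Assoc.
struct CacheGeometry {
    std::uint32_t size_bytes;   // S: total size in bytes
    std::uint32_t block_bytes;  // B: block size in bytes
    std::uint32_t assoc;        // Assoc: associativity

    // Number of sets = S / (B * Assoc).
    std::uint32_t sets() const {
        assert(block_bytes > 0 && assoc > 0);
        assert(size_bytes % (block_bytes * assoc) == 0);
        return size_bytes / (block_bytes * assoc);
    }

    // Byte offset inside the block.
    std::uint32_t offset(std::uint32_t addr) const {
        return addr % block_bytes;
    }

    // Set selected for this address.
    std::uint32_t index(std::uint32_t addr) const {
        return (addr / block_bytes) % sets();
    }

    // Tag stored with the block and compared on each access.
    std::uint32_t tag(std::uint32_t addr) const {
        return (addr / block_bytes) / sets();
    }
};
```

As an illustration, a 32 KB, 32-way cache with an assumed block size of 32 bytes, `CacheGeometry{32768, 32, 32}`, has 32 sets, and address 0x12345 maps to offset 5, index 26 and tag 72.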
3. Cache memory with SystemC
The SystemC library is object-oriented and allows a clear separation between the structure and the behaviour of architectural components. It also permits hierarchical designs (hierarchical sc_module) and offers design possibilities at several abstraction levels, since it contains both high-level and low-level data types. The latter allow bit-accurate, cycle-accurate specifications that give accurate performance estimations. For all these reasons, we decided to specify our library using SystemC.
Figure 1 shows the position of our module as a level 1 (L1) cache. The figure also shows the cache's communication interfaces with the processor as well as with the next memory level, which may be either the second cache level or the main memory. The same protocol is used for processor-cache and cache-nextLevelMemory transfers. It is an asynchronous protocol that uses 3 control signals (request, write, and ack) and two buses (address and data). Transfers are initiated either by the processor (when executing a memory instruction, i.e. a load or a store) or by the L1 cache (when a cache miss occurs).
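[Figure 1: block diagram. Processor, L1 Cache and Next Memory Level connected in a chain. Processor side: address, $req, write, Data, Ack. Memory side: address, memReq, memWrite, Data, Ack.]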
Figure 1. The cache module as a level 1 (L1) cache and its transfer protocols.
When the processor decodes a memory instruction, the request signal is asserted. If the referenced block is present, the Ack signal from the L1 cache to the processor is asserted, and the operation, either read (write=0) or write (write=1), is performed in the cache. Otherwise, the block is first transferred from the next memory level, and only then is the Ack signal asserted. More details about the data transfer protocols are presented in figure 2, which illustrates 3 data transfers in a system. Due to space limitations, the memory access latency is fixed at zero (Lat=0). A cache block is twice as wide as the cache-to-memory bus (BlocSize=2), so a block transfer takes two bus cycles. In the first transaction, there is a miss at address 0 (3 cycles). In the second transaction, there is a hit (1 cycle). The third transaction in figure 2 shows the beginning of a cache miss at address "1000", which conflicts with block 0. This block must then be saved (3 cycles) before the new block is loaded (3 cycles).
Figure 3 depicts the internal structure of the cache. It consists of 4 unit types: the decoder, Assoc banks, the replacement policy logic, and the cache controller logic. One SystemC method process (SC_METHOD) is associated with each of these units. Connections between these units are implemented by signals (sc_signal) through ports. Each bank unit stores both tags and data, and its comparator logic checks whether the requested block matches the block selected in the bank. After this comparison, a hit signal is sent to the cache controller logic, which sends the Ack signal to the CPU. The replacement policy logic holds the block histories and, in the case of a conflict, determines which block to evict from the cache. Several policies are available: FIFO, LRU, and random.
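As an illustration of what the replacement policy logic has to decide, the following minimal sketch models a single set of Assoc blocks under the LRU policy. It is written in plain C++ rather than SystemC so that it compiles without the library; the class name `LruSet` and the timestamp-based bookkeeping are assumptions made for this example and are not the implementation of our module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one cache set under LRU replacement.
class LruSet {
public:
    explicit LruSet(std::size_t assoc) : ways_(assoc) { assert(assoc > 0); }

    // Returns true on a hit. On a miss, an invalid way is filled if one
    // exists; otherwise the least recently used way is overwritten.
    bool access(std::uint32_t tag) {
        ++clock_;
        std::size_t victim = 0;
        for (std::size_t w = 0; w < ways_.size(); ++w) {
            if (ways_[w].valid && ways_[w].tag == tag) {
                ways_[w].last_used = clock_;  // refresh the block history
                return true;
            }
            // Invalid ways keep last_used == 0, so they are chosen first.
            if (ways_[w].last_used < ways_[victim].last_used) {
                victim = w;
            }
        }
        ways_[victim].valid = true;
        ways_[victim].tag = tag;
        ways_[victim].last_used = clock_;
        return false;
    }

private:
    struct Way {
        bool valid = false;
        std::uint32_t tag = 0;
        std::uint64_t last_used = 0;  // 0 means "never used"
    };
    std::vector<Way> ways_;
    std::uint64_t clock_ = 0;  // counts accesses to this set
};
```

With two ways, accessing tags 1, 2, 1, 3 in that order evicts tag 2 on the last access, because tag 1 was used more recently. A FIFO variant would record the time of insertion instead of the time of last use.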
[Figure 2: timing diagram. Reading bloc 0: 1+Lat+BlocSize cycles. Hit: 1 cycle. Saving bloc 0: 1+Lat+BlocSize cycles.]
Figure 2. Three data transfers in the cache
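The cycle counts of figure 2 can be reproduced with a small cost function. This is a sketch under stated assumptions: the names `Transfer` and `transfer_cycles` are hypothetical, BlocSize is taken to be the number of bus cycles needed to move one block, and the save of the evicted block and the load of the new block are assumed to be strictly sequential.

```cpp
#include <cassert>

// Hypothetical classification of the three transfers shown in figure 2.
enum class Transfer { Hit, Miss, MissWithSave };

// lat:       memory access latency in cycles (Lat)
// bloc_size: bus cycles needed to move one block (BlocSize), assumed
inline unsigned transfer_cycles(Transfer kind, unsigned lat, unsigned bloc_size) {
    const unsigned block_move = 1 + lat + bloc_size;  // one block over the bus
    switch (kind) {
        case Transfer::Hit:
            return 1;
        case Transfer::Miss:
            return block_move;
        case Transfer::MissWithSave:
            return 2 * block_move;  // save the old block, then load the new one
    }
    assert(false && "unknown transfer kind");
    return 0;
}
```

With Lat=0 and BlocSize=2, the function gives 1 cycle for the hit, 3 cycles for the miss at address 0, and 3+3=6 cycles for the conflicting miss.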
[Figure 3: block diagram. Inputs: Address, CacheReq, write, Req. Units: Decoder (index, tag), banks holding Tag+Data, one Comparator per bank (Hit 0 to Hit assoc-1), Replacement Policy (Bank2Rep, init), Cache Controller (Ack), Power Consum. (Cacti). Link to/from main memory.]
Figure 3. Internal structure of the cache memory
The cache uses a write-allocate policy to deal with write misses. Power consumption is evaluated by attaching the Cacti model to the cache controller: when the cache is declared in the SystemC description, its configuration parameters are used to evaluate the access time and the energy consumption of each cache access. The cache module stores these two values. At the end of the simulation, they are combined with the activity statistics of the cache module (number of hits, misses, external bus accesses, etc.) to evaluate the total execution time in cycles and the total energy consumed by the cache.
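The end-of-simulation computation can be sketched as follows. This is a deliberately simplified illustration and not the formula used by our module: the names `CacheStats`, `total_cycles` and `total_energy_nj` are hypothetical, every cache access is assumed to cost the same per-access energy, and the energy of external bus accesses is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical activity counters gathered during simulation.
struct CacheStats {
    std::uint64_t hits = 0;
    std::uint64_t misses = 0;       // each miss loads one block
    std::uint64_t block_saves = 0;  // evicted blocks written to the next level
};

// Total time in cycles: 1 cycle per hit, and 1+Lat+BlocSize cycles per
// block moved between the cache and the next memory level.
inline std::uint64_t total_cycles(const CacheStats& s, unsigned lat, unsigned bloc_size) {
    const std::uint64_t block_move = 1u + lat + bloc_size;
    return s.hits + (s.misses + s.block_saves) * block_move;
}

// Total cache energy in nJ, assuming a uniform per-access energy
// (the value an analytical model would supply) and no bus energy.
inline double total_energy_nj(const CacheStats& s, double energy_per_access_nj) {
    assert(energy_per_access_nj >= 0.0);
    return static_cast<double>(s.hits + s.misses) * energy_per_access_nj;
}
```

For example, 90 hits, 10 misses and 4 block saves with Lat=0 and BlocSize=2 give 90 + 14×3 = 132 cycles. The per-access energy passed to `total_energy_nj` is a placeholder input here, not a value produced by Cacti.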
4. Using the cache module in a SystemC SoC description
Our cache modules can be used in two different ways. First, they can be used separately to analyze the cache performance of a given application. In this case, the cache is activated by the following command:
sc-cacheAnal -f -config
where sc-cacheAnal is the SystemC cache name, the argument of -f is the file containing the list of memory access addresses generated by memory tracing during functional simulation, and the argument of -config is the cache configuration file. The configuration file contains the following parameters: -nlines -bsize -assoc -readPorts -writePorts -readWritePorts
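To make the trace-driven use concrete, the following sketch counts hits and misses over a list of addresses. It is not the sc-cacheAnal tool: the function `run_trace` and the type `TraceResult` are hypothetical, the trace is passed as an in-memory vector because the trace file format is not described here, and only the direct-mapped case (assoc = 1) is modelled, with `nlines` and `bsize` interpreted as the number of cache lines and the block size in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result of a trace-driven run.
struct TraceResult {
    std::size_t hits = 0;
    std::size_t misses = 0;
};

// Direct-mapped cache (assoc = 1) driven by a list of byte addresses.
inline TraceResult run_trace(const std::vector<std::uint32_t>& addresses,
                             std::uint32_t nlines, std::uint32_t bsize) {
    assert(nlines > 0 && bsize > 0);
    std::vector<bool> valid(nlines, false);
    std::vector<std::uint32_t> tags(nlines, 0);
    TraceResult result;
    for (std::uint32_t addr : addresses) {
        const std::uint32_t block = addr / bsize;   // block number
        const std::uint32_t line = block % nlines;  // line selected by the index
        const std::uint32_t tag = block / nlines;   // remaining high-order part
        if (valid[line] && tags[line] == tag) {
            ++result.hits;
        } else {
            ++result.misses;  // load the block, replacing any previous one
            valid[line] = true;
            tags[line] = tag;
        }
    }
    return result;
}
```

With 4 lines of 16 bytes, the trace 0, 4, 64, 0 produces one hit (address 4 falls in the block loaded by address 0) and three misses, since addresses 0 and 64 map to the same line and evict each other.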