50 ELECTRONIC SYSTEMS AND CONTROL DIVISION RESEARCH 2003
Compressed Memory Core

Zulkifli M Yusof, David J Mulvaney, Jose L Nunez-Yanez and Vassillios A Chouliaras

This work was supported by the Malaysian Government. Z. M. Yusof is a PhD student in the Electronic System Design Group (email: [email protected]). J. L. Nunez-Yanez is a Research Fellow in the Electronic System Design Group and joint supervisor for the research (email: [email protected]). V. A. Chouliaras is a Lecturer and joint supervisor for the research (email: [email protected]). D. Mulvaney is a Senior Lecturer and joint supervisor for the research (email: [email protected]).

Abstract— Several state-of-the-art technologies are combined to establish an architecture for a low-cost, high-performance memory system that more than doubles the effective size of the installed main memory without significant added cost. A compressed memory core is an effective approach to increasing memory density in an embedded system, or in any system with a limited amount of memory. This paper presents the architecture and memory management of a compressed memory core. An additional memory level is introduced to hide the latency caused by compression and decompression activity. A sophisticated memory management technique dynamically allocates main memory storage in small cells to accommodate variable-size compressed data without significant space wasted through fragmentation.

Index Terms— Compressed Memory, Data Compression

I. INTRODUCTION

We come into contact with embedded systems in many aspects of modern life, be it in the form of mobile phones, personal digital assistants (PDAs) or the electronics in the cars we drive. These systems are becoming increasingly complex and resource demanding. Embedded microprocessor code and data memories are important to investigate, since the program memory requirements of these systems are doubling each year [1]. The development of new semiconductor memory technology faces a number of constraints relating to the fabrication process, the development cost and the yield of larger dies. Compression has been proposed as a way to increase the effective memory capacity rather than physically increasing the available memory [2]. Fabrication trends have seen dynamic memory embedded into VLSI devices, making a compressed memory core a viable alternative to integrating more memory [3]. In our research we propose a memory core that includes a compression/decompression engine and a compressed memory management unit, transparent to the rest of the system architecture. The compressed memory core will be developed as soft IP (Intellectual Property), so that it can be reused in any type of system-on-chip design to fully utilize the embedded memory.
[Figure: mean compression ratio (0.55 to 0.85) plotted against compression block length (256 to 8192 bits) for the application, canterbury, executable, general, memory and user data sets.]
Fig. 1. Compression ratio of Xmatch Pro (16-word dictionary size)
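The compression ratio in Fig. 1 is compressed size divided by original size, so the effective capacity of a memory of physical size S is roughly S divided by the ratio. A minimal illustration of this relationship (the function name and the example figures are ours, not from the paper):

```c
#include <assert.h>

/* Effective capacity of a compressed memory: physical size divided by the
 * compression ratio (compressed/original, so lower is better). */
static double effective_capacity(double physical_kb, double ratio)
{
    return physical_kb / ratio;
}
```

For example, at the best ratio in Fig. 1 (about 0.55), a 1024 KB memory would hold roughly 1860 KB of uncompressed data, while a ratio of 0.5 would double the effective capacity.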
The Xmatch Pro compression engine will be used in this work due to its capability of compressing small blocks of data [4]. It requires a dictionary of only 16 words to obtain an adequate compression ratio in a memory context, as shown in Fig. 1, and it is also known as a Gbit/s-throughput compression engine. The investigation of memory data has shown that significant compression can be achieved even for such a small compression block size, and a small compression block size reduces random access latency.

II. COMPRESSED MEMORY ARCHITECTURE

The overall organization of the compressed memory core and its data-flow model is shown in Fig. 2. When a cache miss occurs, the block of data is searched for in RAM(1), the decompression buffer and RAM(2), one after another. When a miss occurs in RAM(1) and the data item is found in the decompression buffer, the corresponding RAM(1)-sized block is copied into RAM(1) at the same time as it is sent to the cache. If the requested data item is not found in the decompression buffer but is present in RAM(2), the compressed block is first decompressed by the decompressor. The decompressed block is then stored in the decompression buffer, and the requested block of RAM(1) block size is sent to RAM(1) and the cache simultaneously. When a modified RAM(1) block is replaced, its uncompressed block is searched for and modified in the decompression buffer first. When a miss occurs in the decompression buffer, the compressed RAM(2) block is decompressed and modified in the output buffer of the
Department of Electronic and Electrical Engineering, Loughborough University, LE11 3TU, UK
decompressor. Then, the modified RAM(2) block in the decompression buffer is transferred to the compressor and re-compressed. A detailed description of the management of the compressed RAM(2) follows in the next section.
Fig. 2. Architecture of the compressed memory core
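The read path described above can be sketched as a toy model: on a cache miss the block is looked up in RAM(1), then in the decompression buffer, then in RAM(2), filling the faster levels on the way back. All names and sizes here are our assumptions, and decompress() is a pass-through stand-in for the Xmatch Pro decompressor, not its real interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { BLOCK = 8, RAM1_BLOCKS = 4, RAM2_BLOCKS = 16 };

static unsigned char ram2[RAM2_BLOCKS][BLOCK];    /* compressed store (toy) */

static void decompress(unsigned blk, unsigned char *out)
{
    memcpy(out, ram2[blk], BLOCK);    /* stand-in: real hardware would expand data */
}

/* decompression buffer: one entry with a validity (invalidation) bit */
static struct { bool valid; unsigned tag; unsigned char data[BLOCK]; } dbuf;

/* RAM(1): small uncompressed store of recently used blocks */
static struct { bool valid; unsigned tag; unsigned char data[BLOCK]; } ram1[RAM1_BLOCKS];

/* Fetch one block for the cache: search RAM(1), then the decompression
 * buffer, then RAM(2), decompressing only on a buffer miss. */
static const unsigned char *fetch(unsigned blk)
{
    unsigned slot = blk % RAM1_BLOCKS;
    if (ram1[slot].valid && ram1[slot].tag == blk)
        return ram1[slot].data;                    /* RAM(1) hit */
    if (!(dbuf.valid && dbuf.tag == blk)) {        /* buffer miss: go to RAM(2) */
        decompress(blk, dbuf.data);
        dbuf.tag = blk;
        dbuf.valid = true;
    }
    ram1[slot].valid = true;                       /* fill RAM(1) on the way back */
    ram1[slot].tag = blk;
    memcpy(ram1[slot].data, dbuf.data, BLOCK);
    return ram1[slot].data;                        /* ...and deliver to the cache */
}
```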
The main purpose of the decompression buffer is to reduce decompression overhead while guaranteeing the inclusion property with RAM(2). Therefore, one fundamental rule for managing the decompression buffer is that it can store only those blocks that are held compressed in RAM(2). When a compressed RAM(2) block is replaced by another block, the corresponding block in the decompression buffer is invalidated. To achieve this, there is one invalidation bit for each entry in the decompression buffer. Consistency between the decompression buffer and RAM(2) is maintained automatically, because the buffer is updated before RAM(2).

III. COMPRESSED MEMORY MANAGEMENT

To manage the compressed memory efficiently, the Link-Fit method is proposed [5]. A logical block is stored in a number of basic building blocks, which we call cells. A cell is a multiple of contiguously addressed words. Cells need not be allocated contiguously, but are linked together to form a logical entity. Link-Fit relies on the (almost) uniform access delay of solid-state memory, which does not incur a large penalty for accessing cells spread over the address space. The Link-Fit method is based on simple linear linked lists. There is one linked list of cells for each logical block stored, and one free list linking together all the free cells. The physical allocation unit is the cell, and allocation requests are rounded up to the nearest whole number of cells.

Fig. 3 shows the memory layout and the data structures used in Link-Fit. There is an entry in the logical-to-physical mapping table for each logical block. This entry holds pointers to the first and last cells used for storing the compressed data. Having a pointer to the last cell in the block improves deallocation speed. The "Head" of the free list is the entry point to the free cells. Storage is allocated from the "Head" and returned to the "Head", so the free list is maintained in a LIFO manner. The "Tail" is a marker effectively notifying when memory is exhausted.

Fig. 3. Link-Fit data structure

Link-Fit is initialized by linking all the cells together. When compressed data is written, the compressed memory manager routes the data to the first cell on the free list, and every time the current cell is filled the subsequent data is
routed to the next cell on the free list. When the entire compressed logical block has been written, the cells used are taken off the free list and the addresses of the first and last cells used are entered in the mapping table. The cells storing the compressed data are already linked together, so removing them from the free list is simple: the link pointer in the last allocated cell is used as the new "Head" of the free list.

IV. CONCLUSION

The effects of the uncompressed memory block size, the compressor dictionary size and the compressed memory cell size on latency and capacity must be analyzed further. A cycle-accurate model will be used to investigate these effects. Latency depends on the compression engine throughput, the memory access time and the memory management time. Storage capacity increases as the compression ratio decreases.

ACKNOWLEDGEMENT

The members of the Electronic System Design Group (ESDG) have been supportive in most of this work. This environment has helped the progress of this research.
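As an illustration of the Link-Fit scheme of Section III, the core allocation and deallocation steps can be sketched in a few lines of C. The structure and names below are our own sketch under the assumptions stated in the comments, not the authors' implementation:

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_CELLS = 8 };                /* toy pool size (assumed) */

/* A cell carries a link pointer; the payload words would follow it. */
typedef struct cell { struct cell *next; } cell;

static cell pool[NUM_CELLS];
static cell *free_head;                /* "Head": allocation entry point */

static struct { cell *first, *last; } map[4];  /* logical-to-physical table */

static void linkfit_init(void)         /* link all cells into the free list */
{
    for (int i = 0; i < NUM_CELLS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[NUM_CELLS - 1].next = NULL;   /* "Tail": memory-exhausted marker */
    free_head = &pool[0];
}

/* Allocate n cells for logical block blk: the cells on the free list are
 * already linked, so we just walk n-1 links and record first and last. */
static int linkfit_alloc(int blk, int n)
{
    cell *first = free_head, *last = free_head;
    for (int i = 1; i < n; i++) {
        if (!last || !last->next)
            return -1;                 /* free list exhausted */
        last = last->next;
    }
    if (!last)
        return -1;
    free_head = last->next;            /* link in last cell is the new Head */
    last->next = NULL;
    map[blk].first = first;
    map[blk].last  = last;
    return 0;
}

/* Deallocate in O(1): splice the whole chain back at the Head (LIFO),
 * which is why the mapping table keeps a pointer to the last cell. */
static void linkfit_free(int blk)
{
    map[blk].last->next = free_head;
    free_head = map[blk].first;
    map[blk].first = map[blk].last = NULL;
}
```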
REFERENCES

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann, 1996.
[2] A. Wolfe and A. Chanin, "Executing compressed programs on an embedded RISC architecture," in Proc. 25th Annual International Symposium on Microarchitecture, December 1992.
[3] K. Yanagisawa and J. Sato, "DRAM module for system on LSI," Hitachi Review, vol. 47, no. 4, pp. 107-114, 1998.
[4] J. L. Nunez-Yanez, "Gbit/second lossless data compression hardware," PhD thesis, Department of Electrical Engineering, Loughborough University, Loughborough, 2001, p. 169.
[5] M. Kjelso and S. Jones, "Memory management in flash-memory disks with data compression," in Proc. IWMM'95, LNCS, vol. 986, 1995, pp. 399-413.