4S-2012 – Code Compression


Low Power Embedded System Design Using Code Compression

Quynh Ngoc Do†, Thong Chi Le‡
[email protected], [email protected]

Abstract— Code compression is a technique that packs several microprocessor (MPU) instructions into one compressed word, so that after compression each cell of program memory holds several instructions. Its main purpose here is to reduce power consumption, which this paper achieves by reducing the number of program memory accesses: the energy cost of a program memory access is much larger than that of buffering an instruction internally and reading it out again. Because the code size shrinks, the program memory can shrink correspondingly, which in turn allows a smaller, thinner device.

Index Terms—Code compression, decompression, low power embedded system

I. INTRODUCTION

Power consumption is a primary concern in embedded systems today. As devices become smaller and thinner, they cannot integrate a large battery for long operating times, yet complex applications with large code sizes require large amounts of power to execute. Embedded systems must therefore be designed to consume as little power as possible. Power can be reduced in either hardware or software. In hardware, the following techniques are widely used: clock gating [1], which stops the clock to regions that no longer need to work; multiple threshold voltage levels [2] for blocks with different operating speeds; power gating [2], which switches off blocks that are no longer working; and dynamic frequency/voltage scaling (DFVS) [3], which scales frequency and voltage to match the workload. In software, solutions typically focus on optimizing compilers, which produce small and highly efficient executable code.

Since the 1980s, computer scientists have researched compressed storage of program code in memory. When a microprocessor unit (MPU) requests a new instruction, the data read from memory is unpacked back into the original instruction, which is then supplied to the decode unit of the MPU. Two compressed instruction sets are common in commercial products: IBM's CodePack [4] and ARM's Thumb [5]. This paper studies only the code compression technique itself. The reason for this choice is that the decompression module can be implemented outside the processor, with no change to the MPU core: a decompression block is simply inserted between the MPU and the program memory.

† Quynh Ngoc Do is a digital designer at the Integrated Circuit Design Research and Education Center (ICDREC). Address: Floor 7, 6 Quarter, Vietnam National University Ho Chi Minh City Building, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam. (E-mail: [email protected]).

‡ Thong Chi Le is Lecturer, Faculty of Electrical and Electronics Engineering, and Deputy Head, Postgraduate Study Office, Ho Chi Minh City University of Technology. Address: 268 Ly Thuong Kiet St., Dist. 10, Ho Chi Minh City, Vietnam. (E-mail: [email protected]).

II. CODE COMPRESSOR

A. Code compression flowchart

[Fig. 1. Compressor process flowchart: BEGIN → READ new instruction code → if the instruction is already in the dictionary list, INCREASE its occurrence counter, otherwise ADD it to the dictionary list and SET its counter to 1 → at the end of the program code, SORT the counters in decreasing order, SUPPLY a new set of opcodes, WRITE the ROM initial file and the Dictionary initial file → END]

The program counts the occurrences of each distinct instruction; each instruction in the list is then encoded with a compression code, in order from most to least frequent. It includes the following steps:

1. Start the compression flow.
2. Read a new instruction code.
3. Check whether the instruction already occurs:
   - Yes: increase the counter of this instruction.
   - No: add this instruction to the dictionary list and set its counter to 1.
4. Check whether the end of the file is reached:
   - Yes: go to step 5.
   - No: go back to step 2. This loop repeats until the program reaches the end of the file.
5. Sort all counters in decreasing order, replace the original code with the compressed code, and write the contents of the ROM and Dictionary initial files.
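The steps above can be sketched in a few lines of Python. This is a minimal illustration of the counting-and-ranking scheme, not the authors' Visual Basic tool; the function and variable names are hypothetical:

```python
from collections import Counter

def build_dictionary(instructions):
    """Count occurrences of each distinct instruction word and assign
    new opcodes in order of decreasing frequency (hypothetical helper)."""
    counts = Counter(instructions)
    # Sort distinct instructions by decreasing occurrence count.
    ranked = [inst for inst, _ in counts.most_common()]
    # Assign each dictionary entry a new opcode: 0 for the most
    # frequent instruction, 1 for the next, and so on.
    opcodes = {inst: code for code, inst in enumerate(ranked)}
    # Replace each original instruction with its compressed opcode.
    compressed = [opcodes[inst] for inst in instructions]
    return opcodes, compressed

# Example: a tiny "program" in which one instruction word repeats.
program = [0xE3A00001, 0xE2800001, 0xE3A00001, 0xE3A00001]
dictionary, rom = build_dictionary(program)
```

Here `dictionary` plays the role of the Dictionary initial file (original instruction per entry) and `rom` the role of the ROM initial file (compressed opcodes in program order).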

B. Compression tool

The compression software is built with the Visual Basic programming language. Its input is the native instruction file in .BIN format, which is read, compressed, and exported as a compressed code file in .HEX format.

[Fig. 3. The GUI of the code compressor tool]

The four sections of the GUI are as follows:

1. Lists all instruction codes of the program, arranged in the order in which they appear in program memory.
2. A list box with three columns: the distinct instructions, the number of occurrences of each instruction, and the new compressed opcode assigned to it.
3. Settings: "Dictionary entry" is the number of dictionary entries used to store decompressed instructions; "Data radix" is the radix of all output files; "ROM file name" receives the program code in compressed form; "Dictionary file name" receives the dictionary contents in native form.
4. Displays status messages of the compression process and the compression results. The Open button selects the program source code (*.BIN binary files); the Analyze button starts the compression process.

III. CODE DECOMPRESSOR

A. Integration into a system

[Fig. 2. Code decompressor in a system: Program memory → Code Decompressor → Fetch Unit → Processor Core, driven by the instruction address]

Code decompression is performed on-the-fly by a hardware decompression module while the program executes. This module is placed between the instruction memory and the Fetch unit of the MPU, as shown in Fig. 2. Decompression is cache- (or memory-) block oriented: a block is fetched and decompressed in order, one symbol at a time, and the entire uncompressed memory block is placed in the instruction buffer. When a code is fetched, the module reverses the compression process to recover exactly the original code as defined by the instruction set of the MPU. Fetching and unpacking run continuously in real time (on-the-fly) to maintain the MPU's performance. In addition, compressed code allows the MPU to load multiple instructions at the same time, providing enough bandwidth [6] for a superscalar MPU, which can execute more than one instruction per clock cycle.

B. Decompression flowchart

[Fig. 4. Decompressor process flowchart: BEGIN → on an instruction request, check whether the requested address is already buffered → if so, ACCESS the dictionary, USE bit 2 of the instruction address request to select the instruction, and SET instruction ready → otherwise READ the ROM at the requested address and LATCH it into the previous-address register → when the read is valid, LATCH the instruction buffer, ACCESS the dictionary, USE bit 2 of the previous address to select the instruction, and SET instruction ready → loop while new requests arrive, else END]
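The on-the-fly decompression just described can be modeled behaviorally in software. The sketch below is a hypothetical model, not the RTL: it assumes each compressed ROM word expands, via the dictionary, to a pair of original instructions, with bit 2 of the byte address selecting which of the pair to return (as the flowchart's use of address bit [2] suggests):

```python
class Decompressor:
    """Behavioral sketch of the hardware decompressor (hypothetical
    model): one compressed word per 8-byte block of original code."""

    def __init__(self, rom, dictionary):
        self.rom = rom                # compressed program memory
        self.dictionary = dictionary  # opcode -> (inst0, inst1)
        self.prev_block = None        # latched previous block address
        self.buffer = None            # decompressed instruction pair

    def fetch(self, addr):
        block = addr >> 3             # one ROM word per 8-byte block
        if block != self.prev_block:  # miss: access external ROM
            opcode = self.rom[block]
            self.buffer = self.dictionary[opcode]
            self.prev_block = block   # latch address for reuse
        # Bit 2 of the byte address selects the first or second
        # instruction of the buffered pair; a hit needs no ROM access.
        return self.buffer[(addr >> 2) & 1]

# Example: two compressed words covering four original instructions.
rom = [0, 1]
dictionary = {0: (0xE3A00001, 0xE2800001), 1: (0xE5901000, 0xE1A00000)}
dec = Decompressor(rom, dictionary)
```

The buffering mirrors the flowchart: consecutive requests inside the same block are served from the instruction buffer, so only one program memory access is made per pair of instructions.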

The flowchart reverses the compression steps to return the original code, which the ARM MPU can understand and execute. It includes the following steps:

1. Get an instruction request from the MPU core.
2. Check whether this address is buffered:
   - Yes: access the Dictionary and return the original instruction.
   - No: send the request to ROM.
3. Read program memory.
4. Check whether the read data is valid:
   - No: wait until the data is valid.
   - Yes: go to step 5.
5. Latch the data into the instruction buffer, where it can be read again for the next request; access the dictionary; use bit 2 of the previous address to select an instruction; set the instruction ready.
6. Check whether a new request occurs:
   - Yes: go to step 2.
   - No: go to the IDLE state.

IV. ARM MICROPROCESSOR CORE

A. ARM microprocessor model

[Fig. 5. ARM microprocessor core: the 32-bit ARM MPU connects to ROM through cpu_inst_req, prg_addr [31:0], prg_data [31:0], and dec_inst_ready, and to RAM through ram_we, ram_addr [31:0], ram_datai [31:0], and ram_datao [31:0], with clk and rst_n as system inputs]

This ARM core runs the program code output by the code decompressor block. Correct execution of the program by the ARM MPU proves that the compression and decompression are implemented exactly.

Table 1 Pin descriptions

Name              In/Out   Description
clk               Input    System clock (rising edge)
rst_n             Input    System reset (active low)
dec_inst_ready    Input    Instruction ready from decompressor
prg_data [31:0]   Input    Instruction code from program memory
ram_datai [31:0]  Input    Data from SRAM
cpu_inst_req      Output   Instruction request to program memory
prg_addr [31:0]   Output   Instruction address request to program memory
ram_addr [31:0]   Output   Data address request to SRAM
ram_we            Output   Write enable control signal to SRAM
ram_datao [31:0]  Output   Write data to SRAM

B. Instruction process flow

The ARM Processor Core executes instructions in five pipeline stages:

1. Instruction memory (IM): uses the content of the instruction address register to address program memory and fetch an instruction code, which is supplied to the Instruction Decode (ID) stage.
2. Instruction Decode (ID): receives the instruction from the IM stage and decodes it into control signals for the following stages; all operands are also read in this stage and supplied to the ALU.
3. Execution (EX): receives control signals and operands from the ID stage and uses them to select the appropriate block in the EX stage to perform the function of the instruction. For branch instructions, the EX result is the branch target; for memory access instructions, it is the data memory address for the load/store.
4. Data memory (DM): used by load/store instructions. For a load, the address from the EX stage is supplied to data memory and the read data is latched at the output of the memory; for a store, the address, write enable signal, and write data are supplied in the same cycle and the data is written to data memory. For instructions that are not loads/stores, the control signals and EX result simply pass through this stage.
5. Write back (WB): writes the result from the DM stage to the register file. After this stage, an instruction is completely done.

[Fig. 6. Pipeline flow control: PC → IM stage → IR → ID stage (ID, RF) → Operands → ALU (EX stage) → ALU result or effective address (DM stage, RAM) → Result (WB stage), all clocked by clk]

Each stage executes in one clock cycle; when the pipeline is full, on average one instruction completes per clock cycle.
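The one-instruction-per-cycle figure for a full pipeline follows from simple fill arithmetic. A minimal sketch, assuming an idealized pipeline with no stalls or branch penalties (the function name is hypothetical):

```python
def pipeline_cycles(num_instructions, stages=5):
    """Total cycles through an ideal pipeline: the first instruction
    takes `stages` cycles to fill the pipeline, after which one
    instruction completes per cycle."""
    return stages + (num_instructions - 1)

# 100 instructions through the 5-stage pipeline take 104 cycles,
# i.e. close to one instruction per clock once the pipeline is full.
total = pipeline_cycles(100)
```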

V. EVALUATION RATIOS

The goal of this research is to reduce the power consumption of the MPU. The quality of compression is evaluated using four factors:
1. Compression ratio: static efficiency in program code size.
2. Relative ratio: the potential traffic reduction.
3. Energy ratio: energy efficiency, computed as equivalent switching activity per instruction.
4. Fetch time ratio: decompression speed efficiency.
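As a worked example of the first factor, the compression ratio is commonly defined as compressed size over original size, counting the dictionary as part of the compressed output (a standard definition; the paper does not spell out its exact formula, and the numbers below are hypothetical):

```python
def compression_ratio(compressed_bits, original_bits):
    """Static code-size efficiency: compressed size over original
    size. Values below 1 mean the code shrank."""
    return compressed_bits / original_bits

# Hypothetical program: 1000 32-bit instructions compressed into
# 1000 16-bit opcodes plus a dictionary of 200 32-bit entries.
ratio = compression_ratio(1000 * 16 + 200 * 32, 1000 * 32)  # 0.7
```

When few instructions repeat, the dictionary grows toward the original code size and the ratio can exceed 1, which is exactly the behavior reported in the table below.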

Table 2 Evaluation results

Evaluation ratio        Add/Sub   Logic     Subroutine   Basic
Compression Ratio (%)   66.06     123.04    124.00       118.18
Relative ratio          0.5263    0.5208    0.5231       0.5317
Energy ratio            0.5827    0.5779    0.5798       0.5875
Fetch time ratio        0.7632    0.7604    0.7615       0.7659
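As noted in the discussion that follows, Huffman encoding is a popular way to shorten the compressed opcodes: more frequent instructions receive shorter variable-length codes. A minimal sketch of the textbook algorithm (an illustration of the suggested refinement, not the paper's fixed-length scheme; the instruction mnemonics and counts are hypothetical):

```python
import heapq

def huffman_code_lengths(freqs):
    """Return the Huffman code length (in bits) for each symbol:
    more frequent instructions get shorter opcodes."""
    # Each heap entry: (total frequency, tiebreaker, {symbol: depth}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code beneath them.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Hypothetical occurrence counts for four distinct instructions.
lengths = huffman_code_lengths({'MOV': 50, 'ADD': 30, 'B': 15, 'LDR': 5})
```

The most frequent instruction ends up with a 1-bit opcode and the rarest with a 3-bit opcode; the price, as the discussion observes, is that variable-length codes make reading the code length and decompressing harder.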

All ratios are below 0.77 except the compression ratio: the relative ratio is approximately 0.53, the energy ratio approximately 0.59, and the fetch time ratio approximately 0.77. These ratios show that the compression technique is effective at reducing power consumption and fetch time, but the compressed code size is still greater than the original. According to the table above, this code compression method does not give a good compression ratio: most programs have a compression ratio equal to or greater than 1. When instructions repeat many times in the program, the compression ratio is good because the number of original instructions in the dictionary is small; otherwise, the compression ratio is poor. To improve this ratio, the length of the compression code must be reduced. A popular algorithm is Huffman encoding, which makes the opcodes as short as possible; however, because the compressed code length is then no longer fixed, reading the code length and decompressing the instructions becomes more difficult.

VI. CONCLUSION

Code compression brings major benefits in reducing the energy consumed by the program memory. In the examples examined here, the reduction in power consumption on the program memory can reach 40%; this power covers fetching compressed code from program memory, accessing internal memory, and the combinational circuits of the decoder. Since the compressed instruction file is smaller, it needs less memory space to store, so the cost of the completed system is lower. The number of program memory accesses (relative ratio) and the fetch time are also reduced, which helps increase the performance of the whole system. Because each compressed instruction contains several original instructions, each program memory access can serve several MPU instruction requests: once a compressed instruction is decompressed, the recovered instructions are buffered in internal memory, so subsequent MPU requests may be served from this buffer and long-latency external memory accesses become unnecessary. Code compression is thus a suitable way to reduce energy consumption without changing the architecture of the MPU. The system builder need not care which MPU is used, which makes the design independent of the device vendor, and the compression algorithm can be developed separately from the MPU itself, making the technique easy to adopt.

ACKNOWLEDGMENT

I would like to express my sincere gratitude to my advisor, Dr. Thong Chi Le, for his attention, enthusiasm, and knowledge; his guidance helped me throughout the research and writing of this paper. I would also like to thank Prof. Mo Luong Dang, advisor of VNU-HCM and ICDREC, for his advice and insightful comments, and my colleagues at ICDREC for stimulating discussions and useful advice on implementation methods.

REFERENCES

[1] F. Emnett and M. Biegel, "Power Reduction Through RTL Clock Gating," SNUG San Jose, 2000.
[2] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, Low Power Methodology Manual for System-on-Chip Design. Springer, 2007.
[3] Jeabin Lee, Byeong-Gyu Nam, and Hoi-Jun Yoo, "Dynamic Voltage and Frequency Scaling (DVFS) Scheme for Multi-Domains Power Management," IEEE Asian Solid-State Circuits Conference (ASSCC '07), pp. 360-363, 2007.
[4] IBM, PowerPC Code Compression Utility User's Manual, 3rd ed., 1998.
[5] S. Segars, K. Clarke, and L. Goudge, "Embedded control problems, Thumb and the ARM7TDMI," IEEE Micro, vol. 15, no. 5, pp. 22-30, Oct. 1995.
[6] Y.-Y. Tsai et al., "Code Compression Architecture for Memory Bandwidth Optimization in Embedded Systems," CASLab, 2006.