2010 5th International Conference on Industrial and Information Systems, ICIIS 2010, Jul 29 - Aug 01, 2010, India
OPTIMIZATION OF AN APPLICATION SPECIFIC INSTRUCTION SET PROCESSOR USING APPLICATION DESCRIPTION LANGUAGE
Vicky Rameshlal Dodani, Nikhil Kumar, Umakanta Nanda, Kamalakanta Mahapatra
Electronics & Communication Engineering Department, National Institute of Technology, Rourkela, India
[email protected]
Abstract— Application Specific Instruction Set Processors (ASIPs) are becoming essential to convergent System on Chip (SoC) design. There are usually two approaches to designing an ASIP: one works at Register Transfer Level (RTL), and the other at a level just above RTL known as Electronic System Level (ESL). Application Description Languages (ADLs) have recently become popular because they allow a design to converge quickly on an optimal solution during ASIP design. This paper presents the implementation and optimization of an ASIP using an ADL known as Language for Instruction Set Architecture (LISA) and the CoWare Processor Designer environment. The CoWare Processor Designer (PD) generates software development tools and synthesizable RTL for the processor. The generated RTL can be synthesized using Cadence Encounter.
Keywords- LISA, ASIP, RTL, HDL, CoWare, Profiling
This work was supported by the Ministry of Communication and Information Technology, Government of India.
978-1-4244-6653-5/10/$26.00 ©2010 IEEE
I.
INTRODUCTION
Nowadays the ASIP is considered an important member of the processor family because it offers domain-specific performance and flexibility at low cost. The specialization of an ASIP provides a tradeoff between the flexibility of a general purpose CPU and the performance of an ASIC. This flexibility can be achieved with many ADLs [1-2], such as LISA, EXPRESSION and MIMOLA. The different phases of processor design are distributed among designers in their respective fields, so there must be some means of communication between the groups of design engineers and between the phases of the design. Of the languages above, LISA [3-4] is preferable because of its software development and HDL generation capability.
VHDL and Verilog HDL are widely used to design and simulate a processor with hardware implementation in mind. However, these models are not suitable for cycle-based or instruction-level processor simulation, because they carry a very high level of hardware implementation detail that is not required for performance evaluation, cycle-based simulation and software verification [5-6]. In this paper, the implementation of an ASIP using LISA is shown. The implemented processor has a small instruction set, and each instruction of that instruction set is described in the CoWare platform. A brief description of LISA is given in the next section.
II.
LISA
The Language for Instruction Set Architecture (LISA) helps to close the gap between traditional processor design in VHDL or Verilog and instruction set languages for architecture exploration. Its syntax is flexible enough to describe the instruction sets of processors with complex pipelines (RISC, VLIW, DSP, ASIPs and special-purpose co-processors).
Generally, a LISA processor model consists of two sections: the Resource section and the Operation section [6]. Processor resources include the internal storage elements of the processor as well as dedicated input/output pins and global variables. The internal storage elements are the registers and internal memories. The Operation section describes the complete transition function of the processor, including pipeline stages such as fetch, decode, execute and write back. It generally consists of three subsections: behavior, syntax and coding. The behavior subsection describes the transition function of the processor, the coding subsection describes the binary image of the instruction word, and the syntax subsection describes the syntax of that instruction in assembly programs.
This language is well suited to the processor design tool CoWare Processor Designer [5], which offers advanced and flexible features such as:
• Automatic generation of synthesizable RTL with both control and datapath.
• Accurate profiling capabilities for high speed instruction set simulator.
• Compatible with extensively used synthesis tools like Synopsys Design Compiler and Cadence Encounter.
• Software development tool generation like assembler, linker, debugger, C-compiler.
• Ensures compatibility of instruction set simulator (ISS), software development tools and RTL implementation.
• Integrated profiling [5-6] helps to optimize instructions for the target architecture.
• Enables the design team to develop flexible and reusable ASIPs rapidly.
III.
DESIGN FLOW OF COWARE
Figure 1 shows the design flow of CoWare.
Figure 1. Design flow of CoWare
IV.
ARCHITECTURE DESIGN
Architecture development draws on two fields. The development tools are realized using a high-level language that describes the target architecture, while hardware description languages [3] are used to model the underlying hardware for implementation. It is advantageous to combine the tool development process and the HDL description; the LISA compiler can generate both, as depicted in Figure 2.
Figure 2. Exploration and implementation
After design exploration and application design, the target architecture needs to be implemented, which is discussed in the next part of this paper.
V.
ARCHITECTURE IMPLEMENTATION
The LISA compiler must derive all the necessary information from the given LISA description, since the generated HDL model does not have any predefined components. The components of the generated HDL model can then be compared with those of the LISA model, as shown in Figure 3:
Figure 3. Comparison of HDL and LISA model
• The memory configuration, which summarizes the registers and the memory sets, is derived from the LISA memory model.
• The resource model [3] determines the structure of the architecture, such as pipeline stages and pipeline registers.
• Functional units are generated either as empty frames or with full functionality, depending on the HDL used.
• The decoders result from the coding information in the instruction set model together with the timing model.
• The pipeline controller is also generated from this information.
The designer has full control over the generated HDL model and all its components. The generated HDL model can be analyzed with respect to power, area and timing constraints and, where optimization is needed, parts of it can be replaced with HDL code handwritten by experienced designers.
VI.
IMPLEMENTATION RESULT
First, the processor was described in LISA. The processor has 3 pipeline stages, namely Fetch (FE), Decode (DC) and Execute (EX), and a total of 19 instructions can run on it. They are addition (add), subtraction (sub), logical and (and), logical or (or), logical xor (xor), 1's complement (alu1op), multiply (mul), multiply and accumulate (mac), shift left (shl), shift right (shr), increment (incr), decrement (decr), no operation (nop), load to the register (load), load to the memory (ldm), jump (jump), jump if not equal (jne), move contents to the register (mov), move immediate value to the register (movi), move contents to the memory (mvm).
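To make the coding, syntax and behavior view of an instruction concrete, the following Python sketch models three of the mnemonics listed above (add, sub, mac). It is not LISA code and not the authors' description: the 16-bit instruction word, the 4-bit opcode values, the 16-bit register width and the `rN` operand syntax are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MASK = 0xFFFF  # assumed 16-bit register width (not stated in the paper)


@dataclass(frozen=True)
class Operation:
    name: str     # syntax: the assembly mnemonic
    opcode: int   # coding: hypothetical 4-bit opcode field
    behavior: Callable[[List[int], int, int, int], None]  # behavior: state update


def _add(regs, rd, ra, rb):
    regs[rd] = (regs[ra] + regs[rb]) & MASK


def _sub(regs, rd, ra, rb):
    regs[rd] = (regs[ra] - regs[rb]) & MASK


def _mac(regs, rd, ra, rb):
    # multiply and accumulate into the destination register
    regs[rd] = (regs[rd] + regs[ra] * regs[rb]) & MASK


# Opcode values are invented for this sketch.
OPERATIONS = [
    Operation("add", 0x1, _add),
    Operation("sub", 0x2, _sub),
    Operation("mac", 0x8, _mac),
]
BY_NAME: Dict[str, Operation] = {op.name: op for op in OPERATIONS}
BY_OPCODE: Dict[int, Operation] = {op.opcode: op for op in OPERATIONS}


def encode(name: str, rd: int, ra: int, rb: int) -> int:
    """Coding: pack the opcode and three 4-bit register indices into one word."""
    return (BY_NAME[name].opcode << 12) | (rd << 8) | (ra << 4) | rb


def _fields(word: int):
    """Split an instruction word back into operation and register indices."""
    return BY_OPCODE[word >> 12], (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def disassemble(word: int) -> str:
    """Syntax: recover the assembly text from an instruction word."""
    op, rd, ra, rb = _fields(word)
    return f"{op.name} r{rd}, r{ra}, r{rb}"


def execute(word: int, regs: List[int]) -> None:
    """Behavior: decode the word and apply its state transition to the registers."""
    op, rd, ra, rb = _fields(word)
    op.behavior(regs, rd, ra, rb)


if __name__ == "__main__":
    regs = [0] * 16
    regs[1], regs[2] = 3, 4
    word = encode("add", 0, 1, 2)
    print(hex(word), disassemble(word))
    execute(word, regs)
    print(regs[0])
```

Keeping the three aspects in one record per instruction is what lets a single description drive an assembler (syntax), a decoder (coding) and a simulator (behavior) consistently.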
The LISA description of the processor is then compiled, and various software development tools are built using the CoWare Processor Generator. The software development tool suite includes an assembler, linker and simulator as well as a graphical debugger frontend. The tools are enhanced versions of those used for architecture exploration. The enhancements to the software simulator include the ability to graphically visualize the debugging process of the application under test. Tools such as the assembler and linker can be enhanced in functionality as well. The assembler supports more than 30 assembler directives, labels and symbols. The LISA debugger frontend is a generic graphical user interface for the generated LISA simulator, as shown in Figure 4. It visualizes the internal state of the simulation process: the C source code, the disassembly of the application, and all the configured memories and (pipeline) registers are displayed. All contents can be changed in this frontend at run time of the application.
Figure 4. LISA debugger window
The processor debugger provides extensive hardware and software profiling capabilities, such as register profiling, memory profiling, resource profiling and operation profiling. Memory profiling reports the access statistics for the memories contained in the processor model. Similarly, resource profiling shows the access statistics for all resources modeled in the LISA model with a resource specifier of register, program counter or control register. Operation profiling reports the executions of all operations, divided among the pipeline stages.
Figure 5 shows the profiling results for the FIR filter. The operation profiling window shows how many times each operation is called. For instance, it shows that this particular application calls the 'mac' operation twice and the 'decr' operation once. This profiling information is essential for optimizing the design.
Figure 5. Operation profiling window
Based on the profiling results, the processor was optimized with regard to resources, memory and operations. The new processor includes only those operations required to calculate the convolution using FIR. Thus, the development tools, together with the extensive profiling capabilities of the debugger, enabled analysis and exploration of the application-specific processor's instruction set architecture to determine the optimal instruction set for the target application domain, i.e. convolution using FIR.
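The operation profiling step described in this section, counting how often each operation executes and keeping only the ones the application uses, can be sketched in a few lines of Python. The trace below is hypothetical: it is built only to match the two counts reported in the text ('mac' twice, 'decr' once), and the function names are illustrative rather than part of any CoWare tool.

```python
from collections import Counter
from typing import Iterable, List


def profile_operations(trace: Iterable[str]) -> Counter:
    """Count how many times each operation appears in an execution trace."""
    return Counter(trace)


def prune_instruction_set(full_set: List[str], counts: Counter) -> List[str]:
    """Keep only the operations executed at least once, in their original order."""
    return [op for op in full_set if counts[op] > 0]


# A subset of the paper's mnemonics, used here as a stand-in instruction set.
FULL_SET = ["add", "sub", "mul", "mac", "incr", "decr", "nop", "load", "jne", "movi"]

# Hypothetical trace of executed operations (not taken from the paper).
TRACE = ["movi", "movi", "load", "mac", "load", "mac", "decr", "jne"]

if __name__ == "__main__":
    counts = profile_operations(TRACE)
    for op in FULL_SET:
        print(f"{op:5s} {counts[op]}")
    print("kept:", prune_instruction_set(FULL_SET, counts))
```

An operation with a count of zero is a candidate for removal, which is the same criterion the optimized processor is checked against: every remaining operation executes at least once.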
The application is an assembly program that implements the FIR filter; when run on the processor, it calculates the convolution of two sequences.
VII.
RTL SYNTHESIS RESULT
At first, a general purpose processor with 19 possible instructions was developed. Later an application, the FIR filter, was targeted, for which only 8 of the 19 instructions are used. The filter is a two-tap, first-order filter with two coefficients. In the process, a novel ASIP is therefore developed. Both processors execute the same FIR filtering algorithm, and a comparative assessment is presented.
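For reference, the FIR filtering algorithm for a two-tap, first-order filter reduces to y[n] = h[0]·x[n] + h[1]·x[n-1], i.e. the convolution of the input sequence with the two coefficients. The Python sketch below shows that computation; it is an illustration of the arithmetic, not the authors' assembly program, and the function name and sample values are assumptions.

```python
from typing import List, Sequence


def fir_two_tap(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """Full convolution of x with a two-coefficient filter h.

    y[n] = h[0] * x[n] + h[1] * x[n - 1], with samples outside x taken as zero.
    """
    if len(h) != 2:
        raise ValueError("a two-tap filter needs exactly two coefficients")
    y = []
    # The full convolution has len(x) + len(h) - 1 output samples.
    for n in range(len(x) + 1):
        acc = 0
        if n < len(x):
            acc += h[0] * x[n]      # first multiply-accumulate step
        if n >= 1:
            acc += h[1] * x[n - 1]  # second multiply-accumulate step
        y.append(acc)
    return y


if __name__ == "__main__":
    print(fir_two_tap([1, 2, 3], [1, 1]))
```

Each output sample needs at most two multiply-accumulate steps, which is why an operation such as mac is a natural fit for this workload.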
Processor      Number of Cells   Area (in µm²)   Power (in nW)
Processor 1    24682             75489           3870366.052
ASIP           6856              23611           1847776.752
Table 2. Synthesis Results
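The relative savings implied by Table 2 follow from simple arithmetic: reduction = (baseline − optimized) / baseline. The short Python sketch below applies that formula to the table's values; the dictionary keys and function name are illustrative choices, not terms from the paper.

```python
# (Processor 1, ASIP) pairs copied from Table 2.
TABLE_2 = {
    "cells": (24682, 6856),
    "area_um2": (75489, 23611),
    "power_nw": (3870366.052, 1847776.752),
}


def reduction_percent(baseline: float, optimized: float) -> float:
    """Percentage by which the optimized value is lower than the baseline."""
    return 100.0 * (baseline - optimized) / baseline


if __name__ == "__main__":
    for name, (baseline, optimized) in TABLE_2.items():
        print(f"{name}: {reduction_percent(baseline, optimized):.1f}% lower")
```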
Figure 6 shows the operation profiling window of the optimized processor. The figure shows that every operation included in the optimized processor is executed at least once.
Figure 6. Optimized Result
Table 1 compares the resources and instruction set of both the processors:
Processor      Memory Locations   Instructions   GPRs
Processor 1    65536              19             32
ASIP           256                8              16
Table 1. Processor Resources and Instruction Set
In this way an ASIP has been implemented by optimizing the previous processor with regard to:
• Data and program memory
• Instruction set
• Number of general purpose registers
The Processor Generator tool provided in the Processor Designer generated the synthesizable RTL for both the processors. The RTL was synthesized using Cadence Encounter and the results are given in Table 2.
The library used for the synthesis is TSMC (65nm). Table 2 shows a drastic reduction in area and power requirement: the ASIP needs roughly 69% less area and 52% less power than Processor 1.
VIII.
CONCLUSION
In this paper, a processor has been implemented using LISA and CoWare Processor Designer. The same model was then optimized into an ASIP for a specific application, an FIR filter in our case. Guided by the profiling results, the optimization covered resources such as data memory, program memory, the instruction set and the number of general purpose registers. The RTL for both processors was generated and synthesized. Comparing the synthesis results, the ASIP was found to be much better than the original processor in terms of power and area.
Thus the CoWare design flow has been explored. Using profiling in the same way, other ASIPs can be implemented and optimized taking our original processor as a reference.
REFERENCES
[1] Anupam Chattopadhyay, Arnab Sinha, Diandian Zhang, Rainer Leupers, Gerd Ascheid, Heinrich Meyr, "Integrated verification approach during ADL driven processor design", Microelectronics Journal 40 (2009), pp. 1111-1123.
[2] Weihua Sheng, Jianjiang Ceng, Manuel Hohenauer, Hanno Scharwächter, Rainer Leupers, Heinrich Meyr, "A novel approach for flexible and consistent ADL driven ASIP design", DAC'04, June 7-11, 2004, San Diego, California, USA.
[3] Andreas Hoffmann, Tim Kogel, Achim Nohl, Gunnar Braun, Oliver Schliebusch, Oliver Wahlen, Andreas Wieferink, Heinrich Meyr, "A novel methodology for the design of application specific instruction set processors (ASIPs) using a machine description language", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 11, Nov. 2001.
[4] O. Schliebusch, A. Chattopadhyay, E. M. Witte, D. Kammler, G. Ascheid, R. Leupers, H. Meyr, "Optimization techniques for ADL driven RTL processor synthesis", in IEEE Workshop on Rapid System Prototyping (RSP), Montreal, Canada, June 2005.
[5] CoWare, The ESL design Leader reference manuals, Product version V2007.1.2, June 2008.
[6] CoWare, Inc., http://www.coware.com
[7] U. K. Nanda, K. K. Mahapatra, "Design of a pipelined FIR filter using Architecture Description Language", National Conference on Wireless Communication and VLSI Design 2010, GEC, Gwalior, India.