International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976-6464 (Print), ISSN 0976-6472 (Online), Volume 3, Issue 3, October-December (2012), pp. 333-340, © IAEME: www.iaeme.com/ijecet.asp
Journal Impact Factor (2012): 3.5930 (Calculated by GISI), www.jifactor.com

DESIGN OF 16 BIT LOW POWER PROCESSOR USING CLOCK GATING TECHNIQUE

Khaja Mujeebuddin Quadry, Research Scholar, JNTU Anantapur, A.P., India.
Dr. Syed Abdul Sattar, Professor & Dean of Academics, Royal Institute of Technology & Science, Chevella, R. R. Dist., A.P., India.
Dr. K. Soundara Rajan, Professor, Dept. of ECE, JNTU Anantapur, A.P., India.

ABSTRACT
Low power design is gaining importance because of the growing need for battery-operated portable devices with high computing capability. The reliability of an integrated circuit depends on the heat dissipated in it, and the cooling systems needed to remove that heat increase the cost of the system. A large fraction of the power consumed by synchronous logic is due to the clock distribution network and the high switching activity at its nodes. Clock gating is a well-known technique for reducing clock power. In this paper we present the design of a 16 bit processor in 90nm technology. The clock gating principle is applied at the fine-grained level to minimize power dissipation.

I. INTRODUCTION
Clock gating is a technique used to reduce power dissipation in the clock distribution network. It works by shutting down the clock of any component whenever that component is not being used or accessed. Combinational logic is inserted along the clock path to prevent unnecessary switching of sequential elements. Shutting down idle units in this way keeps the circuit from consuming unnecessary power. A portion of the clock tree can also be shut down by masking off the clock at an internal node of the tree using an AND gate.
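The standard first-order model of CMOS dynamic power shows why gating the clock pays off. This is a textbook relation and is not stated in the paper. In the model, α is the average number of 0→1 transitions per clock cycle at a node (α = 1 for an ungated clock net), C_L is the switched capacitance, V_DD is the supply voltage and f is the clock frequency. Consider a clock branch of capacitance C_clk behind a gate whose enable is asserted in a fraction p_en of cycles. The power of that branch scales roughly with p_en, plus the gating logic's own power P_gate. The symbols p_en, C_clk and P_gate are introduced here for illustration only.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
P_{\mathrm{clk,gated}} \approx p_{\mathrm{en}} \, C_{\mathrm{clk}} \, V_{DD}^{2} \, f + P_{\mathrm{gate}}
```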


Masking the clock in this way prevents wasteful switching in the clock tree. It saves power in the tree itself as well as in the functional units the tree feeds. In modern processors and SoCs, the clock distribution network is responsible for an increasing fraction of dynamic power consumption [15]. Figure 1 shows the breakdown of power consumption for a high-performance microprocessor [16].

Figure 1. Processor Power Breakdown [16]

Clock power is expected to increase as circuit complexity and operating frequency keep growing with technology scaling [11]. Clock tree design has therefore become critical for power as well as performance. One of the challenges EDA engineers currently face is developing modeling capabilities and synthesis techniques that keep clock tree power under control [13]. Various solutions for minimizing clock tree power have been investigated in the recent past. In this paper, we present the design of a 16 bit processor that applies the clock gating technique for power optimization at the gate and RT levels.

The rest of this paper is organized as follows. Section II briefly reviews previous work on minimizing power using clock gating. Section III describes the design of the 16 bit processor and how clock gating is applied to it. Section IV discusses simulation results. Finally, Section V concludes the paper.

II. PREVIOUS WORK
Many authors have addressed the problem of minimizing the power dissipated by clock distribution networks, and their work is briefly reviewed below. In [14] Jaewon Oh and Massoud Pedram presented a zero-skew gated clock routing technique for VLSI circuits.
They built a clock-tree topology from the locations and activation frequencies of the modules, and placed the internal nodes of the clock tree using a dynamic programming approach. In [11] Hans Jacobson et al. examined the power reduction benefits of two newly invented schemes, transparent pipeline clock gating and elastic pipeline clock gating. They also bounded the practical limits of clock gating efficiency in future microprocessors. In [10] Jochen Preiss et al. introduced fine-grained clock gating schemes for fused multiply-add-type floating-point units (FPUs). The gating in these schemes is based on instruction type, precision and operand values.


In [9] Donno et al. presented a methodology that obtains low-power clock trees through aggressive exploitation of clock gating. In [7] M. Kamaraju and K. Lal Kishore presented a power-optimized ALU for an efficient datapath using clock gating, which saved 33.3% of the power dissipation. In [8] M. Kamaraju and K. Lal Kishore presented an FPGA-based, power-optimized programmable embedded controller that dissipates 15 mW at an operating frequency of 15 MHz. In [5] Khaja Mujeebuddin Quadry and Syed Abdul Sattar presented an FPGA-based design of a low power 16 bit processor. It dissipates 25 mW at 30.931 MHz, a 21% saving obtained after applying various low power techniques. In [6] N. Sivasankara Reddy presented a low power 16 bit processor that dissipates 1.37 mW, a 29% saving obtained with low power techniques. In [4] Samiappa Sakthikumaran et al. proposed a 16-bit non-pipelined RISC processor in 90nm technology with 329.3 µW power dissipation and a total area of 65012 nm². In [3] Jagrit Kathuria et al. reviewed existing clock gating techniques. In [2] Ali Elkateeb presented a practical introduction to soft-core processor design that integrates the processor's components step by step. In [1] Shmuel Wimer and Israel Koren presented a probabilistic model of the clock gating network that quantifies the expected power savings and the implied overhead. They derived expressions for the power savings in a gated clock tree. They also derived the optimal gater fan-out from flip-flop toggling probabilities and process technology parameters. Their resulting clock gating methodology saves 10% of the total clock tree switching power. However, these approaches pay little attention to integration with existing design flows.

III. PROCESSOR WITH CLOCK GATING
A 16 bit RISC processor was designed and clock gating was applied to the design. The processor has 24 basic instructions covering arithmetic, logical, data transfer, branching and control operations. It contains the 16 bit registers R0 to R7, PC, IR, RegY and Add_reg R. A 16 bit ALU performs the arithmetic and logic operations. The control unit generates control signals according to the instruction being executed. Its state machine has four basic states: idle, fetch, decode and execute. The processor has two flag bits, zero and carry.
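The four-state control sequence can be sketched in Python as follows. This is a minimal illustration, not the authors' RTL. The `start` and `halt` inputs and the transition rule out of EXECUTE are assumptions. Each state is also modeled as a single cycle, whereas Section IV reports instructions taking 3 to 4 cycles.

```python
from enum import Enum, auto


class State(Enum):
    """The four basic control-unit states named in Section III."""
    IDLE = auto()
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()


def next_state(state: State, start: bool, halt: bool) -> State:
    """Hypothetical transition rule; `start` and `halt` are illustrative inputs."""
    if state is State.IDLE:
        return State.FETCH if start else State.IDLE
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return State.EXECUTE
    # EXECUTE: return to IDLE on halt, otherwise fetch the next instruction.
    return State.IDLE if halt else State.FETCH


def trace(inputs):
    """States visited from reset (IDLE) for a list of per-cycle (start, halt) inputs."""
    state = State.IDLE
    visited = [state]
    for start, halt in inputs:
        state = next_state(state, start, halt)
        visited.append(state)
    return visited


print([s.name for s in trace([(True, False)] * 5)])
```

With `start` held high and no halt, the machine leaves IDLE and then cycles through fetch, decode and execute.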

Figure 2. Processor Architecture


The architecture of the processor is shown in Figure 2. It has two buses, Bus_1 and Bus_2, driven by Mux_1 and Mux_2 respectively. Mux_1 is an 8 to 1 multiplexer whose inputs are registers R0 to R6 and the PC. A 3 bit select line chooses one of these inputs, and the output of Mux_1 drives Bus_1. Bus_1 feeds the ALU, Bus_2 and memory. Mux_2 is a 4 to 1 multiplexer that uses a 2 bit select line to choose the ALU output, Bus_1 or the memory word, and its output drives Bus_2. Data on Bus_2 is loaded into any one of the 16 bit registers by the corresponding load_x signal from the control unit. The instruction format of the processor is shown in Figure 3. The source and destination registers are each specified by a 3 bit address. The opcode is 5 bits wide, so a total of 32 instructions are possible.

Figure 3. Instruction format (fields: opcode, source, destination)

Four bits are left for future use, so the number of instructions and registers can be increased. If the source or destination is a memory location, its address is given in the second word of the instruction. The program counter holds the address of the current instruction to be executed. Its contents are transferred to the address register through Bus_1 and Bus_2. The memory word pointed to by the address register is then transferred to the instruction register through Bus_2, and the program counter is incremented. The control unit decodes the instruction and generates the control signals needed to perform the operation. Once the processor was designed, its functionality was verified for all instructions.

The processor was then optimized for power dissipation by applying clock gating at the fine-grained level. Clock gating disables the clock to a circuit. This removes power dissipation on the gated clock network and prevents unnecessary activity in the logic modules. The units whose clock could be gated were identified in the processor architecture, and the gating condition was evaluated separately for each module. In the register file, only the source and destination registers are used during execution and the other registers are idle. The clock of the idle registers is therefore masked with an AND gate driven by an enable signal.
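The register-file gating described above can be approximated at the cycle level by counting the clock edges that reach R0 to R7. The sketch assumes that a register needs a clock edge only in cycles where its load_x signal is asserted, because reading a source register does not require clocking it. This is an assumption, not something the paper states. `load_enables` and `clock_pulses` are hypothetical helper names.

```python
from typing import Optional, Sequence

NUM_REGS = 8  # general-purpose registers R0..R7 (Section III)


def load_enables(dest: Optional[int]) -> list:
    """One-hot load_x vector for a cycle; all False when no register is written."""
    return [dest == i for i in range(NUM_REGS)]


def clock_pulses(dests: Sequence[Optional[int]], gated: bool) -> int:
    """Count clock edges delivered to the register file over a run of cycles.

    Each entry of `dests` is the index of the register loaded in that cycle,
    or None when no register is written.
    """
    total = 0
    for dest in dests:
        if gated:
            # AND(clk, load_x): only the register being loaded receives the edge
            total += sum(load_enables(dest))
        else:
            # ungated: every register is clocked in every cycle
            total += NUM_REGS
    return total


cycles = [3, None, None, 5]  # two register writes in four cycles
print(clock_pulses(cycles, gated=False), clock_pulses(cycles, gated=True))
```

For two register writes in four cycles, the ungated register file receives 32 clock edges and the gated one receives 2.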

Figure 4. Clock gating at fine grained level


Figure 4 shows how the various modules are connected to the clock through a masking AND gate with an enable signal. The activation condition of each module's enable signal was found by carefully analyzing the functionality and timing diagram of the processor.
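The masking AND gate of Figure 4 can be sketched with sampled waveforms, as below. The waveform encoding and function names are illustrative. The sketch also checks one timing condition that a plain AND gate requires: the enable may only change while the clock is low, otherwise the gated clock can be truncated or glitch.

```python
def gate_clock(clk, en):
    """Sample-by-sample AND of the clock with a module enable."""
    return [c & e for c, e in zip(clk, en)]


def rising_edges(wave):
    """Number of 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


def enable_changes_safely(clk, en):
    """True if every enable change happens while the clock stays low.

    With a plain AND gate, an enable that changes while clk is high can
    truncate a clock pulse or produce a glitch on the gated clock.
    """
    for i in range(1, len(en)):
        if en[i] != en[i - 1] and (clk[i] or clk[i - 1]):
            return False
    return True


clk = [0, 0, 1, 1] * 4                # four clock periods, 4 samples each
en = [1] * 5 + [0] * 4 + [1] * 7      # module idle during the second period
gclk = gate_clock(clk, en)
print(rising_edges(clk), rising_edges(gclk), enable_changes_safely(clk, en))
```

In this example the module misses one of four clock edges while its enable is low. Both enable changes fall in the low phase of the clock.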

Figure 5. Clock Enable signal for Flip Flop

Figure 5 shows how a clock enable signal is derived for a flip-flop. The enable clk_en is the XOR of D and Q, so the clock is masked whenever the present input equals the previous output. The price of this power saving is the overhead of the extra logic. The carry and zero flag registers are implemented with this method. For the 16 bit registers, which are groups of flip-flops, a group enable signal is generated. Instruction-based clock gating is also incorporated. The CPU registers are carefully partitioned into blocks, and each block gets a common clock enable signal that turns that register group off independently [9].

IV. SIMULATION RESULTS
Figure 6 shows the simulation results from 485ns to 715ns of the simulation time:
SUB from 485ns to 525ns (4 cycles), BRZ from 525ns to 555ns (3 cycles), ADD from 555ns to 595ns (4 cycles), AND from 595ns to 635ns (4 cycles), OR from 635ns to 675ns (4 cycles), and XOR from 675ns to 715ns (4 cycles).
The control signals that execute these instructions can be seen in the figure. The processor was tested by running a number of test programs from the test bench, and its functionality was verified.
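The data-driven enable of Figure 5 (clk_en = D XOR Q) and the group enable used for the 16 bit registers can be modeled at the cycle level as follows. This is a behavioral sketch with hypothetical function names. It ignores the timing of clk_en relative to the clock edge, which a real implementation must respect.

```python
def ff_clock_enables(d_seq, q0=0):
    """Cycle-level model of a flip-flop clocked through AND(clk, D XOR Q).

    Returns the per-cycle clk_en values and the final Q.
    """
    q = q0
    enables = []
    for d in d_seq:
        clk_en = d ^ q          # clock only when the new value differs from Q
        enables.append(clk_en)
        if clk_en:
            q = d               # the gated edge captures D
        # otherwise Q already equals D, so skipping the edge loses nothing
    return enables, q


def group_enable(d_word, q_word, width=16):
    """Group clock enable for a register: clock it if any of its bits would change."""
    mask = (1 << width) - 1
    return ((d_word ^ q_word) & mask) != 0


enables, q = ff_clock_enables([0, 0, 1, 1, 1, 0])
print(enables, q)
print(group_enable(0x1234, 0x1234), group_enable(0x1234, 0x1235))
```

For the six-cycle input, the flip-flop is clocked in only two cycles; in the other four, Q already equals D.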

Figure 6. Simulation results
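The instruction windows listed for Figure 6 can be cross-checked with a short calculation. Assume each cycle within a window has the same length. Then every window implies a 10 ns clock period (100 MHz) in simulation, which is below the 226 MHz operating frequency reported in Section V. The 230 ns span covers 23 cycles.

```python
# (start ns, end ns, cycles) for each instruction, as listed for Figure 6
WINDOWS = {
    "SUB": (485, 525, 4),
    "BRZ": (525, 555, 3),
    "ADD": (555, 595, 4),
    "AND": (595, 635, 4),
    "OR": (635, 675, 4),
    "XOR": (675, 715, 4),
}


def implied_periods(windows):
    """Clock period (ns) implied by each window, assuming equal-length cycles."""
    return {name: (end - start) / cycles for name, (start, end, cycles) in windows.items()}


periods = implied_periods(WINDOWS)
total_cycles = sum(c for _, _, c in WINDOWS.values())
print(periods)
print(total_cycles, 1e3 / periods["SUB"])  # cycles in the window, simulation clock in MHz
```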


Figure 7. Net power usage

Table 1. Power and Area report

16 bit Processor  | Cells | Area (nm²) | Leakage power (nW) | Dynamic power (nW) | Total power (nW)
w/o low power     | 1123  | 34378      | 170.928            | 347131.269         | 347302.197
With low power    | 1141  | 34638      | 179.588            | 281869.759         | 282049.348
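The totals and savings can be recomputed directly from the entries of Table 1 with a small Python calculation. The dictionary layout and helper name are our own. On these numbers, the 23.15% saving quoted in the text appears to correspond to the drop in dynamic power divided by the post-gating dynamic power. Measured against the ungated baseline, dynamic power falls by about 18.80% and total power by about 18.79%. Leakage rises slightly, from 170.928 nW to 179.588 nW, alongside the 18 additional cells (1123 to 1141).

```python
# Table 1 values, in nW
BASELINE = {"leakage": 170.928, "dynamic": 347131.269}
GATED = {"leakage": 179.588, "dynamic": 281869.759}


def saving_pct(before, after, reference):
    """Percentage reduction (before - after) expressed relative to `reference`."""
    return 100.0 * (before - after) / reference


total_base = sum(BASELINE.values())
total_gated = sum(GATED.values())

# Same reduction in dynamic power, measured against two different references
dyn_vs_gated = saving_pct(BASELINE["dynamic"], GATED["dynamic"], GATED["dynamic"])
dyn_vs_base = saving_pct(BASELINE["dynamic"], GATED["dynamic"], BASELINE["dynamic"])
total_vs_base = saving_pct(total_base, total_gated, total_base)

print(f"totals: {total_base:.3f} nW -> {total_gated:.3f} nW")
print(f"dynamic vs gated: {dyn_vs_gated:.2f}%  dynamic vs baseline: {dyn_vs_base:.2f}%")
print(f"total vs baseline: {total_vs_base:.2f}%")
```

The recomputed gated total (282049.347 nW) differs from the table's 282049.348 nW only in the last rounded digit.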

Figure 7 shows the net power usage report generated by Cadence Encounter(R) RTL Compiler. Table 1 lists the number of cells, cell area, leakage power and dynamic power of the processor with and without the clock gating technique. We observed a 23.15% power saving after applying clock gating.

V. CONCLUSION
The 16 bit processor was designed in 90nm technology, simulated and functionally verified. Its power was then optimized by applying clock gating at the fine-grained level. Instruction-level clock gating was implemented by grouping the modules according to the instructions. The activation functions that enable and disable the clock for each group of flip-flops were evaluated carefully. Power and area were evaluated before and after applying clock gating, as shown in Table 1, and an overall power saving of 23.15% was observed. The absolute power dissipation of the designed processor is 282049.348 nW (about 282 µW). This is lower than the 329.3 µW reported for the 90nm 16 bit processor in [4]. The operating frequency of the designed processor is 226 MHz. Leakage power dissipation will increase as technology scales down. It can be reduced by combining power gating with clock gating.


REFERENCES
[1] Shmuel Wimer, Israel Koren, "The Optimal Fan-Out of Clock Network for Power Minimization by Adaptive Gating", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 10, pp. 1772-1780, October 2012.
[2] Ali Elkateeb, "A Processor Design Course Project: Creating Soft-Core MIPS Processor Using Step-by-Step Components Integration Approach", International Journal of Information and Education Technology, vol. 1, no. 5, pp. 432-440, December 2011.
[3] Jagrit Kathuria, M. Ayoub Khan, Arti Noor, "A Review of Clock Gating Techniques", International Journal of Electronics and Communication Engineering, vol. 1, no. 2, pp. 106-114, August 2011.
[4] Samiappa Sakthikumaran, S. Salivahanan, V. S. Kanchana Bhaaskaran, "16-Bit RISC Processor Design for Convolution Application", Proceedings of the IEEE International Conference on Recent Trends in Information Technology (ICRTIT 2011), 978-1-4577-0590-8.
[5] Khaja Mujeebuddin Quadry, Syed Abdul Sattar, "Design of 16 bit low power processor", International Journal of Computer Science and Information Security (IJCSIS), vol. 10, no. 6, pp. 67-71, June 2012, ISSN 1947-5500.
[6] N. Sivasankara Reddy, "Minimization of power dissipation in 16 bit processor using low power techniques", Asian Journal of Applied Sciences, 4(6): 657-662, 2011, ISSN 1996-3343.
[7] M. Kamaraju, K. Lal Kishore, A. V. N. Tilak, "Power Optimized ALU for Efficient Datapath", International Journal of Computer Applications (0975-8887), vol. 11, no. 11, pp. 39-43, December 2010.
[8] M. Kamaraju, K. Lal Kishore, A. V. N. Tilak, "Power Optimized Programmable Embedded Controller", International Journal of Computer Networks & Communications (IJCNC), vol. 2, no. 4, pp. 97-107, July 2010.
[9] Monica Donno, Enrico Macii, Luca Mazzoni, "Power-Aware Clock Tree Planning", Proceedings of the 2004 International Symposium on Physical Design, pp. 138-147, New York, NY, USA, 2004, ISBN 1-58113-817-2.
[10] Jochen Preiss, Maarten Boersma, Silvia Melitta Mueller, "Advanced Clockgating Schemes for Fused-Multiply-Add-Type Floating-Point Units", 19th IEEE International Symposium on Computer Arithmetic, pp. 48-56, 2009.
[11] Hans Jacobson, Pradip Bose, Zhigang Hu, Rick Eickemeyer, Lee Eisen, John Griswell, "Stretching the Limits of Clock-Gating Efficiency in Server-Class Processors", Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA-11), 2005.
[12] D. Duarte, V. Narayanan, M. J. Irwin, "Impact of Technology Scaling in the Clock System Power", IEEE Computer Society Annual Symposium on VLSI, pp. 52-57, Pittsburgh, PA, April 2002.
[13] D. Duarte, V. Narayanan, M. J. Irwin, "A Clock Power Model to Evaluate Impact of Architectural and Technology Optimizations", IEEE Transactions on VLSI Systems, vol. 10, no. 6, pp. 844-855, December 2002.
[14] Jaewon Oh, Massoud Pedram, "Gated Clock Routing for Low-Power Microprocessor Design", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 6, pp. 715-722, June 2001.
[15] T. Mudge, "Power: A First-Class Architectural Design Constraint", IEEE Computer, vol. 34, no. 4, pp. 52-58, April 2001.
[16] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, F. Baez, "Reducing Power in High-Performance Microprocessors", DAC-35: ACM/IEEE Design Automation Conference, pp. 732-737, San Francisco, CA, June 1998.
[17] Raj Kumar Tiwari, Santosh Kumar Agrahari, "Low Power Arm Processor Based Embedded System", International Journal of Electronics and Communication Engineering & Technology (IJECET), vol. 3, no. 2, pp. 369-374, 2012, published by IAEME.
[18] B. K. V. Prasad, P. Satishkumar, B. Stephencharles, T. Prasad, "Low Power Design Of Wallance Tree Multiplier", International Journal of Electronics and Communication Engineering & Technology (IJECET), vol. 3, no. 3, pp. 258-264, 2012, published by IAEME.
[19] P. Sreenivasulu, Krishnna Veni, K. Srinivasa Rao, A. Vinaya Babu, "Low Power Design Techniques Of CMOS Digital Circuits", International Journal of Electronics and Communication Engineering & Technology (IJECET), vol. 3, no. 2, pp. 199-208, 2012, published by IAEME.

AUTHORS PROFILE

Khaja Mujeebuddin Quadry (Member IEEE) is presently working as Associate Professor and Head of the Department of ECE, RITS, Chevella, Hyderabad, A.P., India. He obtained a Diploma in Electronics and Communication Engineering from the State Board of Technical Education and Training, A.P., India, in 1993. He then received a BE in Electronics and Communication Engineering from Osmania University in 1997 and an ME in VLSI & Embedded System Design from Osmania University in 2007. He is presently a research scholar at JNTUA, Anantapur, A.P., India. He is a life member of the Institution of Electronics and Telecommunication Engineers (IETE), India. He has 6 years of industrial experience and 8 years of teaching experience.

Dr. Syed Abdul Sattar is presently working as Dean of Academics and Professor in the ECE department, RITS, Chevella, Hyderabad. He completed his B.E. in ECE at Marathwada University, Aurangabad, in 1990 and his M.Tech. in DSCE at JNTU Hyderabad in 2002. He received his first Ph.D., in Computer Science, from Golden State University, USA, in 2004, and his second Ph.D., in ECE, from JNTU Hyderabad, A.P., India, in 2007. He is a fellow member of the Institution of Electronics and Telecommunication Engineers, India, and a life member of the Indian Society for Technical Education. His areas of specialization are wireless communications and image processing. He has about 21 years of combined teaching and industry experience. He received the national award of Engineering Scientist of the Year 2006 from NESA, New Delhi, India. He has about 73 publications in international and national journals and conferences. He is presently guiding research scholars in ECE and Computer Science from different universities. He is a member of the Board of Studies of a central university and serves as a reviewer, editorial member and chief editor for national and international journals.

Dr. K. Soundara Rajan obtained his Master's degree and Ph.D. from IIT Roorkee. He has more than 30 years of teaching experience and 12 years of research experience. He has successfully guided 10 Ph.D. scholars, and 10 more are presently under his guidance. He has more than 79 publications at the national and international level. He is a member of an International Research Journal of Higher Education, a life member of ISTE, and Regional Coordinator for NAFEN (National Foundation of Indian Engineers, New Delhi). He is a former Principal and Rector of JNTUA and is presently OSD to the Vice Chancellor at JNTUA, Anantapur, A.P., India.
