Integrated Heterogenous Modelling for Power Estimation of Single Processor based Reconfigurable SoC Platform

Prakash Srinivasan 1, Ali Ahmadinia 1, Ahmet T Erdogan 1,2, Tughrul Arslan 1,2
Email: {P.Srinivasan, A.Ahmadinia, T.Arslan}@ed.ac.uk, [email protected]

1 School of Engineering and Electronics, The University of Edinburgh, Edinburgh, UK

Abstract—Various instruction-based and transaction-based power estimation techniques for processors and on-chip buses have been proposed in the past. In this paper, we propose a heterogeneous power model to estimate the power utilized by a complete processor-based reconfigurable System-on-Chip (SoC) platform. The proposed model estimates the power consumed by the SoC platform using an instruction-based model as well as a transaction-based model. In addition, we estimate the power consumed by various bus arbitration policies used in the on-chip communication.

I. INTRODUCTION

Energy consumption of software has recently emerged as an important factor of system performance with the increasing requirement for low power reconfigurable SoC design. Most system-level power modeling techniques have focused on individual system components such as processors, memories, on-chip buses and peripherals. Because each component in the SoC exhibits diverse operational characteristics, a heterogeneous power model is necessary (e.g., an instruction-level power modeling technique for the processor and a transaction-level model for the on-chip bus). The models found in the literature each use a specific type of power model depending on the component. For example, an instruction-based power model for the processor was proposed in [2], and transaction-based power models for on-chip communication were proposed in [3] and [4]. Very few papers, such as [1], have been published on energy models for a complete integrated SoC.

This paper first provides a technique to estimate the power consumed by a single-processor-based reconfigurable SoC platform for low power applications. In addition, it analyses the power consumed by different arbitration policies of the on-chip communication. To illustrate this, we consider an integrated platform based on a RISC processor for power estimation, as it is ideally suited to applications requiring RISC performance from a compact and power-efficient processor.

1-4244-0921-7/07 $25.00 © 2007 IEEE.

2 Institute of System Level Integration, Livingston, UK

We first discuss previous work on power modeling. We then discuss the heterogeneous power model used for power estimation. Next, we describe the power analysis of different arbitration policies that can be used in on-chip communication for a reconfigurable SoC. Finally, we provide simulation results for the various power models and a conclusion.

II. PREVIOUS WORK

Work on power estimation, optimization and modeling for low power SoC design has been conducted at different levels of abstraction. In this paper, we focus only on work that deals with high level approaches, i.e., gate-level or higher, as it is most relevant to our work. A methodology for predicting the system-level energy efficiency when integrating IP modules into a SoC platform based around the AMBA bus was proposed in [2], without considering the power utilized by the peripherals. The work in [5] introduces an instruction-based technique for power evaluation of peripheral cores. An energy-aware architectural design exploration and analysis tool for ARM-based system-on-chip design was developed in [6], but it did not consider the power consumed by on-chip communication. A case study was carried out in [7] to evaluate the power consumption of the processor-to-memory communication on system-level buses. A simulation framework was set up to demonstrate how the simultaneous presence of bus encoding techniques and a memory hierarchy can impact the overall system power budget. The work presented in [8] focuses on developing a quantitative understanding of the relative contributions of different communication architecture components to its power consumption, and the factors on which they depend, without considering the power consumption of the processor. The work presented in [9] proposes an energy model based on the individual instructions executed on a processor along with the switching activity on the buses. However, this model was developed without detailed information about the internal structure, i.e., it uses physical power measurement rather than HDL-based power estimation.

In contrast to these approaches, we estimate the overall power consumption of a SoC platform, which includes the processor, the on-chip bus and the peripherals. Our model is similar to the model proposed in [1], but we estimate power at gate level rather than system level to get more accurate results. We have also analyzed the power consumption of the arbitration policies used by the on-chip communication of a reconfigurable SoC platform.

III. PLATFORM DEFINITION

The platform architecture used in this work is shown in figure (1). The main features of the platform include: a RISC processor core [10], the Advanced Microcontroller Bus Architecture (AMBA) with AHB (Advanced High Performance Bus) and APB (Advanced Peripheral Bus) [11], an AHB arbiter, an AHB to APB bridge, on-chip SRAM, dual timers, a UART (Universal Asynchronous Receiver/Transmitter), and a GPIO (General Purpose Input/Output) port. AMBA AHB is used as the on-chip bus architecture for the platform integration. It also supports multiple bus masters and pipelined and burst transfers.

Fig 1: SoC Platform Architecture (block labels: RISC Processor, SRAM, Arbiter, AMBA AHB Bus, System Control, AHB_APB Bridge, AMBA APB Bus, Timer, UART, GPIO)

IV. HETEROGENEOUS POWER MODEL FOR INTEGRATED PLATFORM

This section briefly describes the methods used for the complete integrated platform. In the heterogeneous model, we allocate two types of models to the SoC platform. An instruction-based model is used to estimate the power utilized by the processor, and transaction-based power estimation is used to estimate the power utilized by the on-chip bus communication.

A. INSTRUCTION BASED POWER ESTIMATION

It is important to analyze the power consumed by individual instructions executing on the processor in order to calculate the power cost of a software component, as an instruction is considered to be the fundamental unit of a microprocessor. Although an instruction-level energy model (a data-dependent approach) gives better estimates, it lacks general applicability because it is developed only for specific applications (e.g., an embedded processor). For illustration, we have analyzed the instructions shown in Table.1. Each available instruction is placed in a loop and executed on the RISC processor. During this process, the switching activity of the instruction in a single cycle is captured to estimate the power.

TABLE 1 INSTRUCTION TABLE

Instructions (a sample only)    Power Utilized (µW)
MOV                             4.53
ADD                             4.71
CLR                             4.45
INC                             4.62
XOR                             4.72

B. TRANSACTION BASED POWER ESTIMATION

The term Transaction Level Modeling (TLM) refers to an abstraction level in the description of a system that models the communication between the elements describing the functional behavior of the system. In a TLM model, the focus is on the data that is passed between two modules rather than on the way the transfer is accomplished. The proposed transaction-based model consists of M1 (RISC Processor), M2 (SRAM), B (Ahb_Apb Bridge), P1 (UART), P2 (Timer) and P3 (GPIO). Figure (2) shows the communication between the cores and the peripherals. In order to estimate the transaction-based power, three scenarios are considered. Each scenario was run for 4 cycles.

Scenario 1: Processor communicates data to the UART via the bridge.
Scenario 2: Processor communicates with the timer via the bridge.
Scenario 3: SRAM communicates with the UART via the bridge.

Fig 2: Transaction based model

TABLE 2 TRANSACTION TABLE

Transaction (Data Path)             Total Power (µW)
Processor core to Ahb_Apb Bridge    100.03
Ahb_Apb Bridge to UART              75.36
Ahb_Apb Bridge to Timer             57.51
SRAM to Ahb_Apb Bridge              102.89
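As an illustration only, the two halves of the heterogeneous model can be sketched in Python using the values of Table 1 and Table 2. The function names, the count-weighted averaging of instruction power (which assumes that every instruction occupies one cycle) and the summation of segment powers along a data path are assumptions made for this sketch; they are not taken from the proposed model.

```python
# Values copied from Table 1 (per-instruction power) and Table 2
# (per-segment transaction power), both in microwatts.
INSTRUCTION_POWER_UW = {
    "MOV": 4.53, "ADD": 4.71, "CLR": 4.45, "INC": 4.62, "XOR": 4.72,
}
SEGMENT_POWER_UW = {
    ("Processor", "Bridge"): 100.03,
    ("Bridge", "UART"): 75.36,
    ("Bridge", "Timer"): 57.51,
    ("SRAM", "Bridge"): 102.89,
}

# The three scenarios of Section IV-B, written as data paths.
SCENARIOS = {
    "Scenario 1": ["Processor", "Bridge", "UART"],
    "Scenario 2": ["Processor", "Bridge", "Timer"],
    "Scenario 3": ["SRAM", "Bridge", "UART"],
}


def average_instruction_power(mix):
    """Count-weighted mean power (uW) of an instruction mix.

    Assumption: every instruction occupies exactly one cycle, so the
    average power is the mean of the Table 1 entries weighted by count.
    """
    total = sum(mix.values())
    if total == 0:
        raise ValueError("empty instruction mix")
    return sum(INSTRUCTION_POWER_UW[op] * n for op, n in mix.items()) / total


def path_power(path):
    """Sum the Table 2 segment powers (uW) along a data path.

    Assumption: the power of a bridged transfer is the sum of its
    AHB-side and APB-side segments.
    """
    return sum(SEGMENT_POWER_UW[(src, dst)] for src, dst in zip(path, path[1:]))


if __name__ == "__main__":
    for name, path in SCENARIOS.items():
        print(f"{name}: {path_power(path):.2f} uW")
    mix = {"MOV": 2, "ADD": 1, "XOR": 1}  # hypothetical instruction mix
    print(f"Average instruction power: {average_instruction_power(mix):.4f} uW")
```

Keeping the two lookup tables separate mirrors the idea of allocating a different model to each component: the processor is characterised per instruction, while the bus is characterised per data path segment.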

Fig 3: Power Distribution chart

V. ARBITRATION FOR RECONFIGURABLE PLATFORM

From the power distribution chart shown in figure (3), it is important to notice that the power contribution of the AHB_APB Bridge is high compared to the other components. This is because the bridge's AHB slave interface is always selected, which activates the bridge and the APB master logic and causes higher power consumption. The module with the next highest consumption is the arbiter, but its power consumption tends to vary with the arbitration policy. To demonstrate this, we opted to try the traditional TDMA and priority-based arbitrations. The former can be used for real-time requirements, and its time slot allocation can be adjusted to meet bandwidth needs, while the latter, with split transfers, performs efficiently. The following three arbitration policies have been analyzed:

• Time Division Multiplexing Arbitration (TDMA)
• Fixed Priority Arbitration (FA)
• Fixed Priority Arbitration with Hold Control (FAWHC)

A. Time Division Multiplexing Arbitration

Time Division Multiplexing Arbitration divides access time on a bus into time slots and then allocates these slots to the masters, giving each a predefined and predictable bandwidth. If the master that owns the current time slot does not issue a request, the slot is wasted. The scheme therefore depends strongly on the time alignment of communication requests with the slot allocation, and hence on the probability of dynamic variations in the request patterns.

B. Fixed Priority Arbitration

The fixed arbiter arbitrates between the highest and lowest priority masters, each of which is capable of generating bus request signals. For example, consider 5 masters with AHB Master 0 as the default master. In addition, under certain circumstances, the arbiter may allocate ownership of the bus to the dummy master. Table.3 shows the fixed priority scheme used to determine the priority between the alternate masters.

TABLE 3 PRIORITY SCHEME

Priority       HSPLIT   HMASTER   Description
1 (highest)    4        0100      AHB master 4
2              3        0011      AHB master 3
3              2        0010      AHB master 2
4              1        0001      AHB master 1
5              0        0000      AHB master 0
6 (lowest)     X        1111      dummy master

C. Fixed Priority Arbitration with Hold Control

This arbitration is the same as the fixed arbitration, except that during a fixed-length burst transfer a master may de-assert its grant request, but the arbiter will not change the currently selected master until the penultimate transfer of the burst. If the burst is terminated early, due to a split/retry transfer or the master ending the burst, then the grant outputs will change as normal.

VI. SIMULATIONS AND RESULTS

All the designs were described in Verilog HDL and synthesized using UMC 0.18 micron technology. Power results were obtained using Synopsys Power Compiler by monitoring the switching activity of each net in the SoC design.

Table.1 reveals that the costs of the different operations, such as MOV, ADD, CLR or XOR, do not show much variation. It may well be that the differences in circuit activity between these instructions are small relative to the circuit activity common to all instructions. For instance, MOV and XOR show little difference in power consumption. The reason for the similarity of the power consumptions most likely has to do with the way ALUs are designed. A common bank of inputs feeds all the different ALU modules, and thus all the modules switch and consume power, even though on any given cycle only one of the modules computes useful results.

Table.2 shows the results of the three scenarios that were considered for the transaction-based model. The power results were split according to the bus architecture. For example, the data transfer from the Processor to the UART is split, for simplicity, into a transfer from the Processor to the Bridge and a transfer from the Bridge to the UART. The power consumed on the AMBA AHB is higher than on the APB, due to its high frequency operation.

For the simulation of the three conventional arbiters, test benches were written with variations in the bus masters, and the bus request profiles were observed. All the implementations had arbitration logic that featured 64-cycle slots during which a master can issue incrementing burst transfers. Of the three policies, TDMA is the simpler communication protocol, as the whole contention management procedure is arbiter driven. TDMA might outperform the other two arbitration policies in proprietary communication architectures. TDMA guarantees a constant bandwidth to all masters in the pipe, but its overall performance is lower than that of FA. TDMA's poor performance can be explained by its inability to support the communication handshake between two masters, which is necessary for high-level inter-master message passing in the system. This handshake involves a ping-pong interaction between the two tasks and is inefficiently accommodated in TDMA-based policies, wherein only one master is active during each slot. This results in a higher latency for the interaction with respect to the fixed arbitration case. Despite the lower performance, TDMA-based arbitration is attractive in many real-time applications where predictability is a critical requirement. TDMA reserves a slot for each processor regardless of the current workload, so the bandwidth perceived by each master is constant in time and independent of the traffic generated by the other masters.
In terms of power consumption, TDMA is the lowest of the three policies, assuming that the time slots are not wasted by the masters. The total power consumed by FAWHC is drastically increased by the hold control, which allows one particular master to retain the bus until the penultimate transfer of the current burst is completed. The power consumed by FA is higher than that of TDMA because of the split transfers used in the arbitration. From the plot in figure (4), it is clear that operating each arbitration policy with the same number of masters under similar operating conditions results in different power utilization. There exists a trade-off between contention-avoidance bus arbitration policies and contention-resolution bus protocols. Even though commercial standards provide degrees of freedom for performance optimization, the performance achievable by contention-avoidance policies implemented within contention-resolution protocols cannot be fully exploited, because of their different characteristics.
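The grant decisions of the three arbitration policies compared above can be sketched in a few lines of Python. This is a behavioural illustration only, not the Verilog used for the results. The function names, the round-robin assignment of TDMA slots to masters, the convention that `beats_remaining` counts the current beat, and the rule that the dummy master is selected only when no unsplit master (including the default master) is available are all assumptions of this sketch. The 64-cycle slot length, the default master and the dummy master code follow the text and Table 3.

```python
DEFAULT_MASTER = 0      # AHB master 0 is the default master (Table 3)
DUMMY_MASTER = 0b1111   # HMASTER code of the dummy master (Table 3)
SLOT_CYCLES = 64        # slot length used by the simulated arbiters


def tdma_grant(cycle, requests, n_masters, slot_cycles=SLOT_CYCLES):
    """TDMA: the slot owner gets the bus only if it is requesting.

    Assumption: slots are handed to masters 0..n_masters-1 in turn.
    Returns None when the owner is idle, i.e. the slot is wasted.
    """
    owner = (cycle // slot_cycles) % n_masters
    return owner if owner in requests else None


def fixed_priority_grant(requests, split=frozenset()):
    """FA: grant the highest-numbered requesting master that is not split.

    Assumption: with no eligible request the default master is granted,
    and the dummy master is used only if the default master is split.
    """
    candidates = set(requests) - set(split)
    if candidates:
        return max(candidates)  # master 4 has priority 1 in Table 3
    return DEFAULT_MASTER if DEFAULT_MASTER not in split else DUMMY_MASTER


def hold_control_grant(current, beats_remaining, requests, split=frozenset()):
    """FAWHC: hold the current master until the penultimate burst transfer.

    beats_remaining counts the current beat, so the penultimate beat has
    beats_remaining == 2; from that beat on, FA decides the next grant.
    """
    if current is not None and beats_remaining > 2:
        return current
    return fixed_priority_grant(requests, split)


if __name__ == "__main__":
    print(tdma_grant(64, {0, 2}, n_masters=3))   # slot of idle master 1
    print(fixed_priority_grant({1, 3}))
    print(hold_control_grant(1, 4, {3}))
```

In this sketch a wasted TDMA slot shows up as a `None` grant, while the hold control appears as a grant that ignores higher-priority requests until the burst nears its end.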

VII. CONCLUSION

In this paper we have presented a model to estimate the power consumed by software, by the processor and by the on-chip communication in a SoC design. We also demonstrated the importance of judiciously allocating different models to different components. We have analyzed the power utilized by the different arbitration policies for a reconfigurable system, which allows designers to easily explore different system architectures while taking the power dimension into account.


Fig 4: Total Power utilized by different arbitration Policies

REFERENCES

[1] Nikhil Bansal, Kanishka Lahiri, Anand Raghunathan, and Srimat T. Chakradhar, "Power Monitors: A Framework For System-Level Power Estimation Using Heterogeneous Power Models," Proceedings of the 18th International Conference on VLSI Design (VLSI'05), IEEE, pp. 579-585, 2005.
[2] Kristian Hildingsson, Tughrul Arslan, and Ahmet T. Erdogan, "Energy Evaluation Methodology for Platform based System-on-Chip Design," Proceeding of the IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI System Design (ISVLSI'04), pp. 61-65, Feb 2004.
[3] Ikhwan Lee, Hyunsunk, Peng Yang, Sungjoo Yoo, Eui Young Chung, Kyu-Myung Choi, Jeong-Taek Kong, and Soo-Kwan Eo, "Power ViP: SoC Power Estimation Framework at Transaction Level," Proc of 2006 Conference on Asia South Pacific design automation, pp. 551-558, 2006.
[4] Nagu Dhanwada, Ing-Chao Lin, and Vijay Narayana, "A Power Estimation Methodology for System C Transaction Level Models," CODES+ISSS'05, ACM, pp. 142-147, Sept 2005.
[5] Tony D. Givargis, Frank Vahid, and Jorg Henkel, "Instruction based System-level Power Evaluation of System-on-a-Chip Peripherals Cores," IEEE Transaction on VLSI systems, vol. 10, pp. 856-863, Dec 2002.
[6] Dan Crisu, Sorin Dan Cotofana, Stamatis Vassiliadis, and Petri Liuha, "High-Level Energy Estimation for ARM-Based SOCs," SAMOS 2004, LNCS 1333, pp. 168-177, Springer-Verlag, Berlin Heidelberg, 2004.
[7] William Fornaciari, Donatella Sciuto, and Cristina Silvano, "Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study," Proc of 1999 IEEE International Conference on Computer Design, pp. 131-136, Nov 1999.
[8] Kanishka Lahiri and Anand Raghunathan, "Power Analysis of System-Level On-Chip Communication Architectures," International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS, ACM, pp. 236-241, Sept 2004.
[9] Stefan Steinke, Markus Knauer, Lars Wehmeyer, and Peter Marwedel, "An Accurate and Fine Grain Instruction Level Energy Model Supporting Software Optimizations," Proc of PATMOS, pp. 3.2.1-3.2.10, Sept 2001.
[10] http://www.mindspring.com/~tcoonan/index.html
[11] AMBA Specification Overview, http://www.arm.com/
