Circuit and Systems
2006
Mekelweg 4, 2628 CD Delft The Netherlands http://ens.ewi.tudelft.nl/
MSc THESIS

An Advanced Cache Power Model for An Embedded Processor using SLEEP Methodology

Jia Chen

Abstract

Low power design of digital SoCs (systems on a chip) is currently an active research area. Power estimation techniques can be applied at all abstraction levels of the low power design flow, of which the system level is the highest. Power estimation at this level can save the designer much of the effort and time of descending to the lower design levels. However, at system level, detailed information about the circuit of the system cannot be extracted, so guaranteeing accuracy is the critical issue for system level power estimation.

The system level power estimation methodology (SLEEP) developed at Philips Research aims at faster simulation than RTL/gate-level approaches and better accuracy than the Excel spreadsheet approach. It is based on SystemC Transaction Level Modeling (TLM) for system simulation and on gate-level power values for accuracy. Seven different types of on-chip components are distinguished for power estimation. The processor core is one of the most important components in a digital SoC. Therefore an accurate power estimation of the processor is of great importance to that of the whole system.

In this thesis, the accuracy of the SLEEP processor power model is improved by adding the cache behavior. Simulation results on an ARM based SoC show that cache power modeling improves average power estimation by 8% compared to the existing SLEEP processor power model and has a peak power estimation inaccuracy of about 15%. A design space exploration in terms of power with SLEEP using cache power modeling shows that this model can help to capture the power behavior of software programs on a SoC quickly and accurately.

CAS-MS-2006-01
Faculty of Electrical Engineering, Mathematics and Computer Science
An Advanced Cache Power Model for An Embedded Processor using SLEEP Methodology THESIS
submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in
COMPUTER ENGINEERING by
Jia Chen born in Xi’an, China
Circuit and Systems Department of Electrical Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology
Laboratory: Circuit and Systems
Codenumber: CAS-MS-2006-01

Committee Members:
Advisor: Rene van Leuken, CAS, TU Delft
Advisor: Yijun (Sue) Xu, Philips Research, ED&T
Member: Arjan van Genderen, CE, TU Delft
Member: Allejan van der Veen, CAS, TU Delft
Member: Nick van der Meijs, CAS, TU Delft
To my parents for their endless love ...
Contents

List of Figures
List of Tables
Acknowledgements

1 Introduction
  1.1 Related work
  1.2 Motivation
  1.3 Thesis goals
  1.4 Thesis Organization

2 Background Discussion
  2.1 Power estimation in low power design flow
  2.2 System level power estimation
  2.3 Summary

3 SLEEP Methodology
  3.1 The whole design flow of SLEEP methodology
  3.2 Power model and characterizations
    3.2.1 Power model and power parameters
    3.2.2 Characterizations
  3.3 A SLEEP demonstrator
  3.4 Accuracy in SLEEP
    3.4.1 Accuracy requirement
    3.4.2 Accuracy factors
    3.4.3 Accuracy parameters and requirement for processor
  3.5 Summary

4 Processor Power Modeling
  4.1 Processor power modeling
    4.1.1 Mode based processor power model
    4.1.2 Function level processor power model
    4.1.3 Instruction level processor power model
    4.1.4 Pipeline state aware processor power model
  4.2 Processor power modeling in SLEEP and characterization
    4.2.1 Modeling
    4.2.2 Instrumentation/simulation
    4.2.3 Characterization
    4.2.4 Advantages and Disadvantages
  4.3 The improvement of SLEEP processor power model
    4.3.1 Cache power modeling
    4.3.2 Processor power modeling with grouping instruction set
  4.4 Summary

5 Simulation
  5.1 Validation method
    5.1.1 Hardware architecture setup
    5.1.2 Software development
  5.2 Accuracy analysis of SLEEP processor power model
    5.2.1 Simulation for model characterization
    5.2.2 Simulation for analysis
    5.2.3 Power analysis
  5.3 Experiment with cache power model for ARM1176jzfs processor
    5.3.1 Rationale by simulation result
    5.3.2 Characterization
    5.3.3 Result
  5.4 A design exploration with SLEEP
    5.4.1 Cache configuration and characterization
    5.4.2 Simulation result
  5.5 Summary

6 Conclusions and future work

Bibliography
List of Figures

2.1 A top-down digital SoC design flow.
2.2 Gate-level power estimation flow.
2.3 Power estimation in different levels
2.4 A system level power estimation framework

3.1 The whole design flow of SLEEP
3.2 SystemC instrument and simulation
3.3 SLEEP power model
3.4 Power view over time and a state machine of power states
3.5 Block diagram of SLEEP setup
3.6 SLEEP power view for each block - Standby tester
3.7 SLEEP global power view global - Standby tester
3.8 Percentage of total energy within view blocks - Standby tester

4.1 Power state machine
4.2 Function level processor power model power estimation
4.3 Function level processor power model power estimation
4.4 ARM1176jzfs cache block diagram [24]

5.1 Validation flow
5.2 RTL setup block diagram
5.3 Diesel simulation, SW=Dhrystone, f = 25 MHz
5.4 Diesel simulation power curve, SW=Mp3, f = 25 MHz
5.5 Diesel simulation power curve, SW=Mp4, f = 25 MHz
5.6 Diesel simulation power curve of the core
5.7 Diesel simulation power curve of the core
5.8 Diesel simulation power curve of the core
5.9 SLEEP simulation power curve, SW=Mp3, f = 25 MHz
5.10 SLEEP simulation power curve, SW=Mp3, f = 25 MHz
5.11 Block diagram with the configuration parameters
5.12 SLEEP simulation power curve, SW=Mp3, f = 25 MHz, 4k I/D cache
5.13 SLEEP simulation power curve, SW=Mp4, f = 25 MHz, 64k I/D cache
5.14 SLEEP simulation, SW=Dhrystone, f = 25 MHz (a) cycle time (b) energy
5.15 SLEEP simulation, SW=Mp3, f = 25 MHz (a) cycle time (b) energy
5.16 SLEEP simulation, SW=Mp4, f = 25 MHz (a) cycle time (b) energy
List of Tables

3.1 Characterization
3.2 Accuracy Requirement

4.1 Processor power models
4.2 Instruction group example (ARM1176jzfs)

5.1 IP configuration
5.2 Average power value with Dhrystone benchmark from Diesel simulation
5.3 Comparison table between Diesel and SLEEP result, SW=Mp3 & Mp4
5.4 Power table of two parts SW=Dhry, Mp3, Mp4
5.5 Cache power model accuracy, Mp3, Mp4
Acknowledgements
I would like to thank Prof. Rene van Leuken, my supervisor in the TU Delft Circuit and Systems group, for creating this opportunity at Philips Research. I am grateful to him for his advice on my graduation project and the final thesis. I gratefully thank Yijun (Sue) Xu, my supervisor at Philips Research ED&T, for her advice and guidance on both academic and non-academic issues. I would also like to thank Philippe Soulard and Bas Arts from Philips Research ED&T. I am very grateful for our project discussions, in which I was given the opportunity to benefit from the wealth of their insight and experience. Thanks to Ad Vaassen and Marino Strike from Philips Semiconductors RTG for their help on my project. Thanks to Silpa Bhattaram, who spent every coffee break with me. And thanks to all the colleagues in ED&T. Thanks to Georgi Gaydadjiev, my program coordinator, and all the professors in the Computer Engineering group and the Circuit and Systems group who have given me unconditional support during the two-year program here in Delft. I would also like to say thank you to all my friends in Eindhoven and Delft, who make my life interesting. Finally, I am very thankful to my parents for their constant support and encouragement.
Jia Chen Eindhoven June 26, 2006
1 Introduction

Power consumption has become an important design parameter in the low power design flow for digital SoCs. As SoC complexity increases, it is important to estimate power consumption at an early stage of the design cycle. System level design is the top level of the digital SoC design flow. Significant opportunities exist at this level for optimizing the system architecture and improving power efficiency. However, since the detailed circuit design is not yet complete at this level, the main issue is how to guarantee the accuracy of power estimation.
1.1 Related work

Many methods have been proposed for estimating power at the system level. Some of the techniques are dedicated to particular components of the system. For processor cores, [1] monitors the activity of the various components in the processor's micro-architecture and uses this information to estimate power consumption. [2] is a mode based approach, in which different modes (e.g. active mode, idle mode) are distinguished. [3] proposes an instruction level power modeling technique. Power models for on-chip buses are described in [4]. [5] proposes a system level approach for power modeling of networks on chip. For components with regular implementations, such as memories, analytical models have been proposed to estimate power consumption under given access patterns [6]. [7][8] propose a framework for power estimation of a whole system.
1.2 Motivation

SLEEP, a system level power estimation methodology developed at Philips Research, also provides a power estimation framework for digital SoCs. In SLEEP, seven different component types are distinguished for power estimation. The embedded processor is one of the most important components, hence accurate estimation of the processor's power usage is of great importance. In SLEEP, the processor power model is mode based: different processor power modes are considered and each is assigned its own power value. This makes characterization simple and efficient. However, the mode based processor power model has low accuracy, because it does not take into account the different power behavior of software applications within each mode.
1.3 Thesis goals

Therefore the following questions will be answered in this thesis:
• Can the SLEEP processor power model meet the accuracy requirement?
• How can the power estimation accuracy for the processor be improved if the SLEEP processor power model is not accurate enough?
• How does SLEEP help in the low power design flow of a digital SoC?
1.4 Thesis Organization

This thesis is organized as follows.
• Chapter 2 presents the basic concepts of the low power design flow and some power estimation techniques at different design abstraction levels.
• Chapter 3 introduces the SLEEP methodology, the estimation design flow, and the power modeling and characterization methods in SLEEP. A demonstrator is used to show the basic functionality of the SLEEP methodology. Finally, we discuss the accuracy requirements and accuracy factors of SLEEP.
• Chapter 4 presents different power modeling techniques for embedded processors and introduces the SLEEP processor power model. After a detailed analysis, an advanced cache power model is presented for accuracy improvement in SLEEP. An instruction based processor power model is also proposed for further improvements.
• Chapter 5 presents a simulation method for model validation and compares the power estimation results of a processor without cache modeling and one with a cache model. Results of a design space exploration that includes changing the cache size and cache function parameters are provided at the end of the chapter.
• Finally, Chapter 6 draws the conclusions and discusses future work.
2 Background Discussion

The purpose of this chapter is to present the background concepts related to high level power estimation of digital SoCs. Section 2.1 introduces the concepts of a low power design flow and power estimation, and discusses the typical power estimation approaches at the different abstraction levels of the design flow. System level power estimation is the main topic of this thesis. Finally, Section 2.2 briefly introduces SystemC transaction level modeling and the SLEEP project.
2.1 Power estimation in low power design flow

The emergence of portable or mobile computing and communication devices, such as laptop and palmtop computers, mobile phones, wireless modems and hand held video games, is probably the most important factor driving the need for low power designs. Portable devices are battery-driven. Unfortunately, advances in battery technology have not kept up with the growth in the energy requirements of the various system components. Thus, power dissipation has become an important design parameter alongside performance and area. A top-down digital SoC low power design flow [9] is illustrated in Figure 2.1. It includes power estimation and optimization techniques at each level.
Figure 2.1: A top-down digital SoC design flow.
A set of power optimization techniques is applied at each design level to meet the design goals. Power estimation is concerned with calculating power or energy dissipation, to within a given accuracy, at every abstraction level. Its purpose is to increase confidence in the design by ensuring that the power dissipation goals are not violated. Power estimation also provides feedback to the optimization phase of the design, enabling the exploration of multiple design alternatives. The effectiveness of the power optimizations depends on the accuracy of the power estimation. In this thesis, we focus on power estimation techniques. This section gives an overview of various power estimation techniques at different abstraction levels, highlighting their accuracy and efficiency. The accuracy of power estimation depends strongly on how well the frequency, capacitance and switching activity of the circuit are known. Estimates are generally more accurate at lower levels, since more detailed power information can be extracted there.
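The dependence on frequency, capacitance and switching activity can be illustrated with the standard first-order expression for dynamic CMOS power, P = alpha × C × Vdd² × f. The thesis does not state this formula at this point, so the sketch below is an assumption added for illustration, and all numeric values are hypothetical.

```python
def dynamic_power_w(alpha, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic CMOS power in watts: alpha * C * Vdd^2 * f.

    alpha          switching activity (fraction of capacitance switched per cycle)
    capacitance_f  total switched capacitance in farads
    vdd_v          supply voltage in volts
    freq_hz        clock frequency in hertz
    """
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical values: 10% activity, 1 nF, 1.2 V supply, 25 MHz clock.
p_w = dynamic_power_w(alpha=0.1, capacitance_f=1e-9, vdd_v=1.2, freq_hz=25e6)
print(f"dynamic power: {p_w * 1e3:.2f} mW")
```

An error in any one of the three inputs propagates linearly into the estimate, which is why the lower abstraction levels, where these quantities are better known, tend to give better accuracy.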
• Transistor-level power estimation
Power estimation at the transistor level can be performed by keeping track of the current drawn from the power supply during a transistor level simulation. The simulation result is close to the real power consumption, since very few approximations are made. The main limitation in accuracy is due to incomplete knowledge and modeling of parasitics and interconnect. This problem can be solved when a circuit layout is available and the wiring parasitics can be extracted. SPICE [11], PowerMill [12] and IRSIM [13] can be used to perform power estimation at the transistor level.

• Gate-level power estimation
Gate-level power estimation can be performed either probabilistically or by simulation. Probabilistic gate-level power estimation focuses on estimating switching activity from a-priori information about the input stream. Gate-level simulation-based power analysis is based on measuring the switching activity of every net of the circuit during a gate-level simulation. A power estimation tool (such as Synopsys DesignPower) combines the switching activity with information from the technology library to compute the total power dissipation, as shown in Figure 2.2. Philips Diesel [14] is a gate-level power estimation tool as well. The accuracy of power estimation at this level depends on a high-quality library (e.g. Philips Diesel uses edt lib, a Philips power library) and a good simulation model for interconnect, glitching and gate delays. In addition, high accuracy requires a large set of input vectors and consequently a long simulation time. The power estimation results at this level can be used as a reference for estimations at higher levels of abstraction.

Figure 2.2: Gate-level power estimation flow.

• Register-transfer level (RTL) power estimation
The power estimation approaches at this level model the power consumption of more abstract modules, such as muxes, adders, multipliers and registers. They have satisfactory accuracy, but their computation time is much longer than that of system level power estimation, especially for complicated systems. The power factor approximation (PFA) [15] uses a weighting factor, determined through experiments, to model the average power consumed by a specific module. [16] provides a cycle based power estimation approach at RTL.

• System level power estimation
The system level approaches estimate power consumption based on high-level descriptions of the system's behavior and its intended application. Power estimation at this level has the lowest accuracy, since the detailed circuit design is not yet complete. Section 2.2 discusses system level power estimation in detail.
Figure 2.3: Power estimation in different levels
Figure 2.3 lists the typical power reduction opportunities and estimation times at the various levels of the design hierarchy [17]. The power optimization opportunities are significantly larger at the higher levels, and the iteration time needed for power estimation is shorter. However, the absolute accuracy of power estimation tends to be lower. System level power estimation is the first step in the low power design flow. Designers can take advantage of it to accelerate the whole low power design procedure. How to improve the power estimation accuracy at system level remains an open issue for researchers.
2.2 System level power estimation

At system level, the behavior of a system is modeled, and the architecture and function of the system are the major design issues. The actual circuit has not yet been determined. The major problem for power estimation at system level is therefore relating the functional behavior of the system to real circuit events. Because the components of a system have different characteristics, different characterization methods are developed for different modules. These methods affect the accuracy of the estimate for the whole system.
Figure 2.4: A system level power estimation framework
Figure 2.4 shows a typical framework for system level power estimation [7]. The framework acts as a generic wrapper around the system. Each of the components has its own power model. The monitor observes the model's execution and extracts the data for power estimation. Power analyzers then compute the power consumption of each module, from which the total power consumption of the system is obtained. Processor, on-chip bus, memories and application specific hardware are distinguished.

A significant amount of research has focused on developing power models for individual system components. For the processor core, instruction-level power models, which associate power consumption with the sequence of instructions executed by the processor, are widely used. They can estimate power usage dynamically and relatively accurately. However, obtaining the complete power table for every instruction is time consuming. Functional models and state based models are also used for processor cores. A more detailed explanation of processor power modeling is given in Chapter 4. For components with regular implementations, such as memories and caches, various analytical models have been proposed to estimate power consumption under given access patterns. A number of models have been proposed for estimating the power consumption of on-chip buses, such as transmission line models [19]. For application specific hardware, as well as standard components such as memory controllers, timers and other peripherals, power is estimated at the cycle accurate functional and behavioral levels [20].
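The instruction-level approach mentioned above can be sketched as a table lookup over an executed instruction trace. This is only a minimal illustration: the instruction names, the per-instruction energies and the one-cycle-per-instruction timing are all hypothetical assumptions, not values or rules taken from the thesis or from any characterized processor.

```python
from collections import Counter

# Hypothetical per-instruction energy table in nanojoules (not measured values).
ENERGY_NJ = {"add": 0.8, "mul": 1.5, "ldr": 2.1, "str": 2.0, "b": 0.9}


def trace_energy_nj(trace, table=ENERGY_NJ):
    """Total energy of a trace: sum of the table energy of every executed instruction."""
    counts = Counter(trace)
    return sum(table[op] * n for op, n in counts.items())


def trace_average_power_mw(trace, freq_hz, table=ENERGY_NJ):
    """Average power in mW, assuming (for simplicity) one clock cycle per instruction."""
    if not trace:
        return 0.0
    time_s = len(trace) / freq_hz
    return trace_energy_nj(trace, table) * 1e-9 / time_s * 1e3


trace = ["ldr", "add", "mul", "str", "b"]
print(trace_energy_nj(trace), "nJ")
print(trace_average_power_mw(trace, freq_hz=25e6), "mW")
```

The cost noted in the text shows up in the table itself: every entry of `ENERGY_NJ` would have to be obtained by a separate characterization run.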
SystemC is a hardware description and system modeling language provided as a C++ class library. It combines the strengths of the object oriented programming language C++ with the concurrent timing model of an HDL. SystemC transaction level models (TLM) are increasingly being used for SoC architecture analysis and early embedded software development. In TLM, the details of communication among computation components are separated from the details of the implementation of those components. Communication is modeled by channels, and transaction requests take place by calling the interface functions of these channel models. Unnecessary details of communication and computation are hidden in the TLM and may be worked out later. Transaction level modeling speeds up simulation and allows implementation alternatives to be explored and validated at a higher level of abstraction. Open source modeling libraries are available, and more and more IP providers are beginning to provide such models to their users.
Incorporating power estimation techniques into a SystemC functional model designed to run embedded software is a fast way to obtain power related information during architecture and performance analysis. The System Level powEr Estimation Project (SLEEP), developed at Philips Research, is based on SystemC TLM. It distinguishes seven different component types, each with its own characterization method. The power models in SLEEP try to capture the power related activities, and real power figures are assigned to each power activity. The SLEEP methodology is a simulation based system level power estimation approach. It shares the characteristics of other system level power estimation approaches, and in addition provides some solutions for achieving better accuracy.
2.3 Summary

System level design is the first step of the design trajectory when designing a digital SoC. Power estimation at this level can greatly accelerate the low power design procedure. The main problem for power estimation at system level is how to improve the accuracy to meet the power accuracy requirements of different applications. SLEEP aims to provide a new system level power estimation methodology based on SystemC TLM that improves the tradeoff between accuracy and efficiency of power estimation at system level. The SLEEP methodology is discussed in detail in the next chapter.
3 SLEEP Methodology

The previous chapter introduced the SLEEP methodology. This chapter describes it in detail. The design flow of SLEEP is explained in Section 3.1. The power model and the characterization of each modelled component are the key steps in the flow, and Section 3.2 explains both in detail. A SystemC SoC subsystem is used in Section 3.3 to show the basic functionality of the SLEEP methodology. Finally, the accuracy of the SLEEP methodology is addressed in Section 3.4.
Figure 3.1: The whole design flow of SLEEP
3.1 The whole design flow of SLEEP methodology

Figure 3.1 shows the whole design flow of the SLEEP methodology. At the architecture level, the system architect collects a number of IP cores to be integrated into a system. Most of the IP cores are SystemC TLM modules, and possibly some are RTL modules with SystemC wrappers. Instrumentation, in the form of power models and monitors, is added to each module as shown in Figure 3.2, in order to record the power related transactions. Software programs are cross-compiled for the target processors. The system is then ready to be simulated. During simulation, the power related transactions are recorded and stored in a power event database. The power event database is then post-processed and the results are stored in a power database. During post-processing, power lookup tables or power equations are used to obtain the power results.
Figure 3.2: SystemC instrument and simulation
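The two phases of this flow, recording events during simulation and translating them into energy afterwards, can be sketched as follows. This is a simplified illustration and not the SLEEP implementation: the `Monitor` class, the event tuple layout, the module names and the lookup table values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup table: energy per transaction in nanojoules,
# keyed by (module, transaction type).
POWER_LUT_NJ = {
    ("sram", "read"): 1.2,
    ("sram", "write"): 1.6,
    ("bus", "transfer"): 0.4,
}


class Monitor:
    """Records the power related transactions of one module into a shared event database."""

    def __init__(self, module, event_db):
        self.module = module
        self.event_db = event_db

    def record(self, time_ns, transaction):
        self.event_db.append((time_ns, self.module, transaction))


def post_process(event_db, lut=POWER_LUT_NJ):
    """Translate recorded events into energy per module using the lookup table."""
    energy_nj = defaultdict(float)
    for _time_ns, module, transaction in event_db:
        energy_nj[module] += lut[(module, transaction)]
    return dict(energy_nj)


# Simulation phase: monitors log events while the functional model runs.
event_db = []
sram, bus = Monitor("sram", event_db), Monitor("bus", event_db)
bus.record(0, "transfer")
sram.record(40, "read")
bus.record(80, "transfer")
sram.record(120, "write")

# Post-processing phase: power event database -> power database.
power_db = post_process(event_db)
print(power_db)
```

Keeping the lookup table out of the simulation phase means a change of technology or characterization data only requires the post-processing step to be repeated, not the simulation.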
3.2 Power model and characterizations

The creation of the power models and of the power lookup tables are the key factors in the SLEEP methodology. They determine both the complexity of the SLEEP method and the accuracy of the power estimation result.
3.2.1 Power model and power parameters

A power model records the necessary information for power related transactions, as shown in Figure 3.3. It communicates with a SystemC TLM functional model through specific interfaces in order to trigger its corresponding monitor, which generates the power related transactions.
Figure 3.3: SLEEP power model
In the SLEEP methodology, the power models are based on transactions. A transaction can be a computation operation or a communication access. In addition to transactions, a model can have different power related modes, such as active mode, sleep mode and idle mode, which determine the power consumption of a module and must therefore also be taken into account. A Finite State Machine (FSM) based power model is used to represent these modes/states. The power consumption of each component in a system should take the following parameters into account ($i, j = 1, 2, \ldots, N$). All the parameters and the FSM for a power model are illustrated in Figure 3.4.

• 1) In each state $S_i$, the static power dissipation is indicated by $L(S_i)$. It corresponds mainly to leakage.
• 2) In each state $S_i$, the energy per transaction $O_j$ is indicated by $E(S_i, O_j)$. The transaction can be interpreted as either a communication access or a computation operation. The duration of this transaction is indicated by $T(S_i, O_j)$.
• 3) The energy to switch from state $S_i$ to state $S_j$ is given by $M(S_i, S_j)$, and the time required to switch is given by $T(S_i, S_j)$.
• 4) In each state $S_i$, the average power for all transactions can be given by $P(S_i)$. It is intended as an average of the $E(S_i, O_j)$ mentioned in 2), for use when those detailed power figures are not available.

$E(S_i, O_j)$ or $P(S_i)$ can be data dependent for some IP cores. In other words, if the input data changes, the value of $E(S_i, O_j)$ or $P(S_i)$ can change as well. The example of Figure 3.4 shows that at a certain point in time either $E(S_i, O_j)$ or $P(S_i)$ can be selected. The choice depends mainly on availability and required accuracy. For example, if the model spends time $t_1$ in mode $S_1$ and then time $t_2$ in mode $S_2$, the total energy $E$ sums all four elements over time:

$$E = t_1 \times L(S_1) + \sum_j E(S_1, O_j) + M(S_1, S_2) + t_2 \times \bigl(L(S_2) + P(S_2)\bigr) \qquad (3.1)$$
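Equation 3.1 can be evaluated directly once the four parameters are known. The sketch below is a minimal illustration with hypothetical numbers; the function and argument names are not from the thesis, and consistent units (seconds, watts, joules) are assumed.

```python
def total_energy_j(t1, t2, leak_s1, leak_s2, txn_energies_s1, switch_energy, avg_power_s2):
    """Total energy per Equation 3.1 for a model that is in S1 for t1, then in S2 for t2."""
    return (
        t1 * leak_s1                      # t1 * L(S1): static (leakage) energy in S1
        + sum(txn_energies_s1)            # sum over j of E(S1, Oj): per-transaction energy in S1
        + switch_energy                   # M(S1, S2): energy of the mode switch
        + t2 * (leak_s2 + avg_power_s2)   # t2 * (L(S2) + P(S2)): leakage plus average power in S2
    )


# Hypothetical numbers for illustration only.
e_j = total_energy_j(
    t1=2e-3, t2=8e-3,                       # 2 ms in S1, 8 ms in S2
    leak_s1=1e-3, leak_s2=0.2e-3,           # leakage power in W
    txn_energies_s1=[5e-6, 5e-6, 12e-6],    # three transactions in S1, in J
    switch_energy=1e-6,                     # J
    avg_power_s2=0.5e-3,                    # W
)
print(f"total energy: {e_j * 1e6:.1f} uJ")
```

The example mixes both options described in the text: detailed per-transaction energies $E$ are used in $S_1$, while only the average power $P$ is used in $S_2$.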
Figure 3.4: Power view over time and a state machine of power states
The four power parameters are the key elements that contribute to the power consumption of a power model. They are stored as lookup tables or formulated as equations. Note that all the parameters are technology dependent.
3.2.2 Characterizations

To obtain the above-mentioned power parameters for creating the power lookup table, a characterization method is needed. There are four characterization methods, ordered by increasing accuracy:
• from a C description
• from an RTL description
• from a gate-level description
• from an analogue simulation

Based on the different properties of the components of a general SoC, seven types of blocks are distinguished in SLEEP: processor cores, application specific IPs, memories, on-chip bus, clock trees, clock generators and other peripheral components. Each block type has its own characterization method for extracting power parameters, as shown in Table 3.1.

Component            Characterization method    Parameters
Processor cores      Power modes                P or E, L, M
Memory               Read/Write                 E, L, M
Dedicated HW/IP      Transactions               P or E, L, M
Interconnectivity    Bit toggling               E, L, M
Clock trees          Average power              P, L, M
Clock generators     Average power              P, L, M
I/O pads             Bit toggling               E, L, M

Table 3.1: Characterization

The processor core is one of the most important and complex components on the SoC. Its power behavior greatly affects the power estimation result of the whole system. For modeling the power behavior of a processor, SLEEP uses the average power figure P per "power mode" (e.g. active/sleep/standby/power down mode). The simulation should be timed. Otherwise it is not possible to know how long the processor has been in a certain state, which makes an accurate estimate of the consumed power impossible. The processor power modeling and characterization method is discussed in detail in Chapter 4.
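The need for timing in a mode based model can be seen from a small sketch: the energy is the per-mode average power multiplied by the time spent in each mode, so without durations nothing can be computed. The power values and the timeline below are hypothetical and serve only as an illustration.

```python
# Hypothetical average power P per processor mode, in milliwatts.
MODE_POWER_MW = {"active": 40.0, "standby": 2.0}


def mode_based_energy_uj(intervals, mode_power_mw=MODE_POWER_MW):
    """Energy in microjoules for a list of (mode, duration_us) intervals."""
    # mW * us = nJ, so divide by 1000 to obtain uJ.
    return sum(mode_power_mw[mode] * dur_us for mode, dur_us in intervals) / 1000.0


def mode_based_average_mw(intervals, mode_power_mw=MODE_POWER_MW):
    """Average power in mW over the whole timeline."""
    total_us = sum(dur_us for _mode, dur_us in intervals)
    return mode_based_energy_uj(intervals, mode_power_mw) * 1000.0 / total_us


# Timed simulation result: 300 us active, then 700 us standby.
timeline = [("active", 300.0), ("standby", 700.0)]
print(mode_based_energy_uj(timeline), "uJ")
print(mode_based_average_mw(timeline), "mW")
```

Because every active interval is charged the same power regardless of what the software is doing, this sketch also shows where a purely mode based model loses accuracy.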
3.3
A SLEEP demonstrator
Figure 3.5: Block diagram of SLEEP setup
An ARM11 SoC subsystem virtual prototype in SystemC is built to show the concept of the SLEEP methodology. Architecture exploration, performance analysis and software optimization for low power can also be carried out on this subsystem. The subsystem is based on the AXI (Advanced eXtensible Interface) protocol. All models use the AXI API as the external interface for integration into the subsystem unless otherwise mentioned. For models that are AHB (Advanced High-performance Bus) or VPB (VLSI Peripheral Bus) based, adapters communicate between the AHB/VPB and the AXI APIs. All of the IP models support the PV (Programmer's View) and PV-T (PV with timing) modes and the AXI TLM interfaces for these modes. The block diagram is shown in Figure 3.5. The SLEEP subsystem includes the following components:
• ARM11 processor core
• 64k ROM, 64k SRAM
• Interrupt controller
• VPB timer
• AXI to VPB bridge
The components on this demonstrator represent the 7 different types of blocks on a generic SoC, so it can show how SLEEP can be used in a real SoC design procedure. All the modules are SystemC models. The processor core is an ARM1176 cycle callable model (CCM) surrounded by a SystemC wrapper. Software can be run on this setup, and the system simulation is cycle-accurate. SLEEP power models are added to each block. A power log is generated during the software simulation and is post-processed to obtain the power result per cycle. For the processor core, it is impossible to add instrumentation inside the ARM1176 CCM; instead, a trace file that records the instruction executed in each cycle is generated and post-processed to extract the mode changes. Two power modes are distinguished: active mode and standby mode. The power figures of each block in the power lookup tables are obtained from the characterization step. The processor power values are taken from the Supercharger report [3]. Figure 3.6 shows an example of the SLEEP power view of each block when running the “standby tester” (the simulation frequency is 8 MHz).
The “standby tester” sets the timer and then switches the system to standby mode for about 500 cycles, after which the timer sends an interrupt signal to switch the whole system back to active mode. We can see the power consumption vary on the bus, the memory and the interrupt controller, while the power consumption of the timer is constant. The last line of the power view figure shows the processor power curve, which takes two values, one for the active mode and one for the standby mode. Figure 3.7 gives the global power view of the whole system.
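The post-processing of the processor trace into mode changes can be sketched as follows. This is only an illustration: the real ARM1176 CCM trace format is not described here, so the sketch assumes that the trace has already been reduced to one mode label per cycle. The function name and the example trace are hypothetical; the 8 MHz clock and the 500-cycle standby period are the figures quoted for the “standby tester”.

```python
from itertools import groupby

CLOCK_HZ = 8_000_000  # simulation frequency quoted for the standby tester


def mode_segments(per_cycle_modes, clock_hz=CLOCK_HZ):
    """Collapse a per-cycle mode sequence into (mode, cycles, seconds) runs."""
    segments = []
    for mode, run in groupby(per_cycle_modes):
        cycles = sum(1 for _ in run)  # length of this run of equal modes
        segments.append((mode, cycles, cycles / clock_hz))
    return segments


# Hypothetical trace: active, then about 500 cycles of standby, then active.
trace = ["active"] * 100 + ["standby"] * 500 + ["active"] * 100
segments = mode_segments(trace)
```

With these assumptions, the 500 standby cycles at 8 MHz correspond to 500 / 8,000,000 s = 62.5 µs spent in standby mode.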
Figure 3.6: SLEEP power view for each block-Standby tester
Figure 3.7: SLEEP global power view global-Standby tester
SLEEP can also show the energy consumption per block, as shown in Figure 3.8. We can observe that the processor core consumes most of the energy of the system. The SLEEP demonstrator also gives a concrete idea of how SLEEP works and of the timing of the system simulation. A SLEEP simulation running the “standby tester” finishes in a few seconds, whereas the same application takes about 3 hours with the Diesel tool [2] at gate level. SLEEP thus saves a lot of simulation time while giving a power estimate for the whole system. The architecture of the system can easily be changed to meet the power requirement. SLEEP can also help with software power optimization.
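A per-block energy breakdown of the kind plotted in Figure 3.8 can be derived from per-cycle power logs as in the minimal sketch below. The log layout (one list of per-cycle power samples per block), the function name and the sample values are assumptions made for illustration; they do not reproduce the actual SLEEP power log format or its results.

```python
def energy_shares(power_log_mw, clock_hz):
    """Return each block's share of the total energy in percent.

    power_log_mw maps a block name to its per-cycle power samples in mW.
    Every sample is assumed to last exactly one clock cycle.
    """
    cycle_s = 1.0 / clock_hz
    energy_mj = {
        block: sum(samples) * cycle_s  # mW * s = mJ
        for block, samples in power_log_mw.items()
    }
    total_mj = sum(energy_mj.values())
    if total_mj == 0:
        raise ValueError("total energy is zero; shares are undefined")
    return {block: 100.0 * e / total_mj for block, e in energy_mj.items()}


# Hypothetical two-cycle log for two blocks.
shares = energy_shares(
    {"processor": [90.0, 90.0], "bus": [10.0, 10.0]},
    clock_hz=8_000_000,
)
```

Because every block is sampled on the same clock, the cycle time cancels out in the percentages; it is kept in the sketch so that the intermediate energies carry a physical unit.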
Figure 3.8: Percentage of total energy within view blocks-Standby tester
3.4
Accuracy in SLEEP
As mentioned in the previous chapter, accuracy is critical for system level power estimation. SLEEP is expected to fulfil the accuracy requirement for generic system-on-chip design while keeping its high efficiency. This section discusses accuracy in SLEEP. First, we look at the power accuracy requirements of different applications.
3.4.1
Accuracy requirement
Different applications demand different power accuracy. Several use cases regarding power estimation accuracy are listed in Table 3.2. The values in Table 3.2 reflect the requirements of system architects from the business line. Real silicon is used as the reference. Both the average power value and the maximal power value are considered, so a power estimation tool should report both values within a given inaccuracy range.
3.4.2
Accuracy factors
The accuracy of power estimation in SLEEP mainly depends on three factors:
• Power parameters (L, P, E and M) described in Section 3.2. During characterization of each IP core, the accuracy of its power parameters is identified. Gate-level IP cores are used in characterization to obtain the power parameters.
• Power models, which model all the power-related transactions. Their accuracy also depends on how accurately the SystemC TLM core is modelled.
• Typical embedded SW programs running on the target processors.
In order to verify the accuracy of the SLEEP methodology, special attention should be paid to the three factors above. The accuracy of the power parameters and power models should be checked. In addition, a generic software program should be used as the test case; it should represent the characteristics of typical embedded software.
Table 3.2: Accuracy Requirement
3.4.3
Accuracy parameters and requirement for processor
Since the SLEEP processor model is cycle-accurate, we can get the power consumption of the processor in every cycle. $P(i)$, $i = 1, 2, \ldots, N$, is the power consumption of the $i$-th cycle, where $N$ is the number of cycles needed to execute the program. The average power $P_{avg}$ of one program is the average power over the $N$ cycles:
\[ P_{avg} = \frac{1}{N} \sum_{i=1}^{N} P(i) \]
The average power estimation error is:
\[ \mathrm{Avg.Error} = \left| \frac{P'_{avg} - P_{avg}}{P_{avg}} \right| \cdot 100\% \tag{3.2} \]
The peak power value is $P_{peak} = \max_i P(i)$. The peak power estimation error is:
\[ \mathrm{Peak.Error} = \left| \frac{P'_{peak} - P_{peak}}{P_{peak}} \right| \cdot 100\% \tag{3.3} \]
Sometimes it is useful to estimate the variation of power consumption over time. The absolute cycle power error (ACPE) is:
\[ \mathrm{ACPE}(i) = \left| \frac{P'(i) - P(i)}{P(i)} \right| \cdot 100\% \tag{3.4} \]
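The error measures of equations (3.2) to (3.4), together with the mean of the per-cycle error over all N cycles, can be computed from two per-cycle power traces as in the sketch below. The function and argument names are hypothetical: `p_prime` holds the primed trace and `p` the unprimed one, mirroring the notation of the equations without assuming anything further about where each trace comes from. The sample values are invented for illustration.

```python
def _check(p_prime, p):
    """Both traces must be non-empty and cover the same N cycles."""
    if len(p_prime) != len(p) or len(p) == 0:
        raise ValueError("traces must be non-empty and of equal length")


def avg_error(p_prime, p):
    """Average power estimation error in percent, as in equation (3.2)."""
    _check(p_prime, p)
    avg_prime = sum(p_prime) / len(p_prime)
    avg = sum(p) / len(p)
    return abs((avg_prime - avg) / avg) * 100.0


def peak_error(p_prime, p):
    """Peak power estimation error in percent, as in equation (3.3)."""
    _check(p_prime, p)
    return abs((max(p_prime) - max(p)) / max(p)) * 100.0


def acpe(p_prime, p):
    """Absolute cycle power error per cycle in percent, as in equation (3.4)."""
    _check(p_prime, p)
    return [abs((a - b) / b) * 100.0 for a, b in zip(p_prime, p)]


def mean_acpe(p_prime, p):
    """Mean of ACPE(i) over all N cycles."""
    errors = acpe(p_prime, p)
    return sum(errors) / len(errors)


# Invented four-cycle traces in arbitrary power units.
p = [10.0, 20.0, 30.0, 40.0]
p_prime = [11.0, 18.0, 33.0, 40.0]
```

In this invented example the per-cycle errors partly cancel in the average, so the average power error (2%) is much smaller than the mean per-cycle error (7.5%), which is why a cycle-by-cycle measure is useful alongside the average one.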
The average ACPE (AACPE) over the N cycles is used to measure the accuracy of cycle-by-cycle power estimation. The lower the AACPE, the higher the accuracy. The processor core is one of the most important components of the SoC and is usually its highest power consumer. The power accuracy of the processor core therefore greatly affects the accuracy of the whole system. As shown in Figure 3.8, the processor core consumes over 90% of the total energy in the SLEEP demonstrator, so the accuracy of the processor power estimate largely determines the accuracy of the whole system. Here, we take Table 3.2 as the accuracy requirement for the processor in SLEEP as well. That is Avg.Error