Synthesizing massive parallel simulation systems

Synthesizing massive parallel simulation systems to estimate switching activity in finite state machines

Werner W. Bachmann and Sorin A. Huss

Department of Computer Science, Integrated Circuits and Systems Laboratory
Darmstadt University of Technology
Alexanderstr. 10
64283 Darmstadt, Germany

[email protected], (+49) 6151 / 16-6650, (+49) 177 / 2750430, (+49) 6151 / 144853
[email protected], (+49) 6151 / 16-6649, (+49) 6151 / 16-4810

ABSTRACT
This paper presents a methodology for the automatic generation of massive parallel simulation systems to estimate the power consumption of sequential CMOS circuits. Based on a high-level description of finite state machines (FSMs), C++ simulator code is generated to estimate the power consumption of a CMOS-based implementation. Starting from a symbolic FSM description, a Monte Carlo simulation is used to estimate the average switching activity of all circuit components, exercised according to user-supplied modelling parameters. The automatic generation of highly parallel simulation systems with minimal communication between simulator instances, as proposed in this paper, allows the use of a wide range of target systems, from workstations to supercomputers. Given the complexity of modern integrated circuits, massive parallel simulation systems with up to several thousand microprocessors allow a highly efficient simulation of very large sequential circuits.


1 INTRODUCTION
Power estimation has become an important issue not only for battery-operated devices, where the need for low-power implementations is directly connected to the operating time of the device, but also for high-speed integrated circuits. With the dramatic decrease in device size and the corresponding increase in the number of devices on a chip, power consumption is becoming one of the limiting factors for the number of components that can be placed on a single chip. A comprehensive review of existing power estimation techniques is given in (Tsui 1995) and (Devadas 1995). Usually, the dynamic switching process, which results in the charging and discharging of capacitances, dominates the total power consumption of a CMOS circuit. The main conceptual difficulty in estimating the power consumption of a sequential circuit is that its state values depend on the input signals driving the circuit. The second difficulty is that input signals as well as 'memory' signals may be (highly) correlated, which has to be taken into account when estimating power consumption. Finally, the increasing number of devices results in an increasing number of states in sequential circuits, which in turn results in very long computation times for power estimation (Najm 1995a), (Tsui 1995). In this paper, we present an approach to high-speed switching activity estimation for sequential circuits at different abstraction levels and its implementation as an object-oriented simulation and specification environment.

2 POWER ESTIMATION
Power estimation for sequential circuits can be done at various levels of abstraction. The main choices relate to the model of the combinatorial circuit (state transition function) and the memory representation of the states. To estimate the power consumption of the functions associated with the output lines, the same techniques as for the combinatorial circuit may be used. Choices for modeling the combinatorial circuit are an algebraic model, a zero delay model (used by most power estimation systems), an equal delay model, or a stochastic delay model. The states can be coded in various ways, with different consequences for power consumption (Koegst 1996), (Najm 1995b). All sequential circuits are correlated in some way: obviously due to their memory, possibly due to the assembler language a 'processor' executes, or due to the limited range of functions that will operate on a specific ASIC (Najm 1995c). The power analyzer has to handle every kind of correlation within the sets of input and internal signals. Therefore, the only assumptions made in this approach are that the FSM is live, i.e. every state can be reached from every state within a finite number of transitions, and that the FSM is not periodic, so that after a sufficiently large number of transitions the current state no longer depends on the start state.
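Both assumptions can be checked mechanically on the state transition graph of a small FSM. The sketch below is illustrative only and not part of ASCET: it assumes a hypothetical adjacency list `Graph` in which `succ[s]` lists every state reachable from `s` in one transition under some input. Liveness is tested as strong connectivity (every state is reachable from state 0 both in the graph and in its reverse); aperiodicity uses the standard result that, for a strongly connected graph with BFS levels taken from any root, the period equals the gcd of `level(u) + 1 - level(v)` over all edges u→v.

```cpp
#include <cassert>
#include <cstdlib>
#include <numeric>
#include <queue>
#include <vector>

// Hypothetical FSM graph: succ[s] lists every state reachable from state s
// in one transition (for some input combination).
using Graph = std::vector<std::vector<int>>;

// Breadth-first distances from 'root'; -1 marks states that are not reached.
static std::vector<int> bfsLevels(const Graph& succ, int root) {
    std::vector<int> level(succ.size(), -1);
    std::queue<int> todo;
    level[root] = 0;
    todo.push(root);
    while (!todo.empty()) {
        const int u = todo.front();
        todo.pop();
        for (int v : succ[u])
            if (level[v] < 0) {
                level[v] = level[u] + 1;
                todo.push(v);
            }
    }
    return level;
}

// Live: every state reaches every other state (strong connectivity), i.e.
// all states are reachable from state 0 in the graph and in its reverse.
bool isLive(const Graph& succ) {
    assert(!succ.empty());
    Graph pred(succ.size());
    for (int u = 0; u < static_cast<int>(succ.size()); ++u)
        for (int v : succ[u]) pred[v].push_back(u);
    for (int l : bfsLevels(succ, 0)) if (l < 0) return false;
    for (int l : bfsLevels(pred, 0)) if (l < 0) return false;
    return true;
}

// Aperiodic (meaningful for a live FSM): the period, i.e. the gcd over all
// edges u->v of level(u) + 1 - level(v), equals 1.
bool isAperiodic(const Graph& succ) {
    const std::vector<int> level = bfsLevels(succ, 0);
    int period = 0;
    for (int u = 0; u < static_cast<int>(succ.size()); ++u)
        for (int v : succ[u])
            period = std::gcd(period, std::abs(level[u] + 1 - level[v]));
    return period == 1;
}
```

For example, a two-state FSM that toggles on every clock cycle is live but has period 2; adding a self-loop to either state makes it aperiodic.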

2.1 A Power Dissipation Model
The power dissipation of a single node (i.e. a gate output) can be described by:

$$P = \frac{1}{2}\, C\, V_{DD}^2\, f\, D + Q_{SC}\, V_{DD}\, f\, D + I_{leak}\, V_{DD} \qquad (1)$$

where $P$ denotes the total power, $V_{DD}$ the supply voltage, $D$ the switching activity, $C$ the capacitance of the node, $Q_{SC}$ the charge carried per output transition by the current flowing from supply to ground, $f$ the operating frequency, and $I_{leak}$ the leakage current, so that the last term represents the static power consumption. In Eq. (1), $C$, $V_{DD}$, $Q_{SC}$ and $I_{leak}$ are technological parameters, which are treated as constants in high-level simulation. Some assumptions have to be made to allow this kind of approach to power estimation: the clock frequency $f$ is constant, inputs change their values once and only once per clock cycle, and all combinatorial circuits compute their final outputs within the clock period. These assumptions do not restrict the technological realization of the circuit, because they are enforced by other implementation constraints as well.
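Eq. (1) translates directly into code. A minimal sketch, assuming SI units and a hypothetical `NodeParams` struct for the per-node technology constants; $D$ is taken as the number of output transitions per clock cycle, consistent with the transition density of Eq. (5).

```cpp
#include <cassert>

// Hypothetical per-node technology constants (SI units assumed).
struct NodeParams {
    double C;      // node capacitance [F]
    double Vdd;    // supply voltage [V]
    double Qsc;    // short-circuit charge per output transition [C]
    double Ileak;  // leakage current [A]
};

// Eq. (1): P = 1/2 C Vdd^2 f D + Qsc Vdd f D + Ileak Vdd, with clock
// frequency f [Hz] and switching activity D [transitions per cycle].
double nodePower(const NodeParams& n, double f, double D) {
    assert(f >= 0.0 && D >= 0.0);
    const double dynamic = 0.5 * n.C * n.Vdd * n.Vdd * f * D;
    const double shortCircuit = n.Qsc * n.Vdd * f * D;
    const double leakage = n.Ileak * n.Vdd;
    return dynamic + shortCircuit + leakage;
}
```

For instance, with C = 1 pF, VDD = 2 V, f = 100 MHz and D = 0.5, the dynamic term alone is 0.5 · 10⁻¹² · 4 · 10⁸ · 0.5 = 10⁻⁴ W; only the first two terms scale with the switching activity the simulator estimates.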

2.2 State of the Art
Most existing power estimation techniques either deal with combinatorial circuits only (Najm 1995b), (Lim 1996) or are specialized for sequential circuits (Najm 1995a). Many combinatorial power estimation methods require knowledge of the statistical behavior of all input signals, such as their switching activity. Power estimation methods for sequential systems often assume a zero delay model (Najm 1995a), (Uchino 1995) for the combinatorial circuit. Therefore, the switching activity inside the combinatorial circuit cannot be computed by the sequential power estimation system.

Calculating the convergence conditions often requires a 'global view' of the simulation system, which means that every simulator instance requires status information from all other instances (Najm 1995a). Alternatively, a specialized 'convergence control process' needs information from all parallel simulators. Thus, this kind of simulation system requires a very fast interconnection between the executing processors; otherwise, the simulation speed will be limited by the speed of the messaging system used for interprocess communication. This also limits heterogeneous environments, where processors running at different speeds can slow the whole parallel simulation system down to the speed of the slowest subsystem. Therefore the simulation system should be able to use all available computing power, from workstation to supercomputer, at maximum speed.

The switching activity of any circuit is highly sensitive to the selected delay and circuit model (Chou 1996), (Rabe 1996). Furthermore, the switching activity, together with technological parameters such as $V_{DD}$, is one of the key parameters for power consumption in CMOS circuits. Therefore, a 'high-level' power estimation has to provide a wide range of delay and circuit models to achieve results comparable to gate level estimation techniques.

2.3 FSM Models
We define an FSM as a sequential circuit as

$$FSM = \langle Q, \Sigma, \delta, q_0 \rangle \qquad (2)$$

where $Q$ denotes the states ($Q'$: memory with any coding of $Q$), $\Sigma$ the input alphabet with distribution $F(\Sigma)$ (represented by a set of binary input signals), $\delta$ the state transition function, and $q_0$ the initial state.

The formal representation of Eq. (2) results in model implementations at different abstraction levels, e.g. [14], as depicted in Figure 1 and Figure 2, respectively.

Figure 1: Behavioral FSM model at gate level (input lines, state transition function δ, present state Q′, next state, output lines, clock clk)

Figure 2: Behavioral FSM model at RT level (states 1 to 4; transition conditions over the inputs a, b and c, e.g. a+c, a+b+c, succ)

3 ESTIMATION OF SWITCHING ACTIVITY
We denote for signal $x_i$ and simulation time $k$:

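Switching activity of signal $x_i$:

$$t_i(k) = \begin{cases} 1 & \text{if } x_i(k) \neq x_i(k-1) \\ 0 & \text{else} \end{cases} \qquad (3)$$

Signal probability $P$:

$$P(x_i) = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} x_i(k) \qquad (4)$$

Transition density $D$:

$$D(x_i) = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} t_i(k) \qquad (5)$$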
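In simulation, the limits of Eqs. (4) and (5) are replaced by averages over a finite trace. The sketch below, with a hypothetical `estimate` function, counts the ones and the toggles of a recorded 0/1 trace; normalizing the toggle count by the number of observed transitions (K − 1 for K samples) is a choice made here, since Eq. (3) needs a predecessor value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SignalStats {
    double probability;  // finite-trace estimate of P(x_i), Eq. (4)
    double density;      // finite-trace estimate of D(x_i), Eq. (5)
};

// Estimates Eqs. (3) to (5) from a recorded trace x(0), ..., x(K-1).
// t(k) of Eq. (3) is only defined for k >= 1, so toggles are averaged
// over the K-1 observed transitions.
SignalStats estimate(const std::vector<int>& x) {
    assert(x.size() >= 2);
    std::size_t ones = 0, toggles = 0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        if (x[k] != 0) ++ones;
        if (k > 0 && x[k] != x[k - 1]) ++toggles;  // t(k) = 1
    }
    return {static_cast<double>(ones) / x.size(),
            static_cast<double>(toggles) / (x.size() - 1)};
}
```

For the trace 0, 1, 1, 0 this yields a signal probability of 0.5 and a transition density of 2/3.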

3.1 Advantages of Massive Parallel Simulation Systems
Because the computation of state and transition probabilities from the input signals can be shown to be NP-hard for a sequential circuit (Najm 1995c), the goals of this approach are efficient parallel computation and heuristic estimation of the real state and transition probabilities. A state probability is therefore defined as the probability of the signal $x_i$ for the state $q_i$, where $x_i$ equals 1 if and only if the FSM is in state $q_i$, and 0 otherwise. This is called 'one-hot coding'. Given a user-supplied error tolerance ε and confidence value α, it can be shown (Najm 1995a) that the total number N of simulations required to meet these parameters can be predicted as:

N ≥ max( N 1 2 , N 2 , N 3 2 )

(6)

whereas zα N1 = 2 2ε N2 = N3 =

(7)

z α ⋅ 2ε + 01 . + ( e + 01 . )z α 2

2ε 63 + z α

2

2 ε

2 2

+ 3ε (8) (9)

where $z_{\alpha/2}$ in Eqs. (7) to (9) is such that the probability of a standard normal variable being greater than $z_{\alpha/2}$ equals $\alpha/2$. For example, to reach a confidence of α = 0.05 and an error ε = 0.05, about 500 simulation runs are needed.

4 IMPLEMENTATION
One major problem when dealing with sequential circuit simulation is the huge amount of computing power needed to achieve a low error tolerance. In our approach this is expressed in the large number of simulation runs needed to reach a low error tolerance at a high confidence level.

4.1 Target System Limitations
Supercomputers are typically limited in the amount of memory and I/O bandwidth available. Most supercomputers allow efficient communication only between processors directly connected by hardware links. Clusters of workstations are limited by the capabilities of the network connection, which typically provides 10 to 100 Mbit/s. Workstations typically operate at different speeds, or the speed of an individual workstation is influenced by other processes. A problem-specific difficulty in sequential circuit simulation is determining whether a simulation run has converged. Applying profiling techniques (Schneider 1997) to different convergence criteria has shown that a notable share of computing power and communication overhead is spent on continuous convergence checking.

4.2 Requirements for Simulator Implementation
To obtain a highly efficient and flexible simulation system, the following requirements for the simulator are formulated:
1. The simulation should use its entire history to compute the results.
2. The convergence criterion should be checkable by the simulation without communicating with any other process.
3. The convergence criterion need not be checked after every simulation run to achieve convergence.
4. All data collected should be used to increase the precision.
5. The memory usage should be as low as possible and tunable by the simulator configuration without changing the simulator code.
6. Both interpreted and compiled execution should be supported.

The requirements stated above can be met by means of the approach proposed in the following. We define the average state probability $p_i(k)$ and the average transition density $d_i(k)$ for state $q_i$ at simulation time $k$ as

$$p_i(k) = \frac{1}{k} \sum_{j=1}^{k} q_i(j) \qquad (10)$$

$$d_i(k) = \frac{1}{k} \sum_{j=1}^{k} t_{q_i}(j) \qquad (11)$$

where $q_i(j)$ denotes the one-hot signal of state $q_i$ and $t_{q_i}(j)$ its switching activity according to Eq. (3), and the maximum simulation probability difference (Schneider 1997) for state $q_i$ with history length $l$ at simulation time $k$ as

$$\mathrm{diffp}_i(k,l) = \max_j p_i(j) - \min_j p_i(j) \qquad (12)$$

$$\mathrm{diffd}_i(k,l) = \max_j d_i(j) - \min_j d_i(j) \qquad (13)$$

for $k - \frac{l}{2} < j < k + \frac{l}{2}$. The simulation is said to be convergent at simulation step $k$ with error $\gamma$ at history length $l$ if

$$\forall i: \mathrm{diffp}_i(k,l) \le \gamma \;\wedge\; \forall i: \mathrm{diffd}_i(k,l) \le \gamma \qquad (14)$$

These criteria are chosen heuristically in order to eliminate communication and to fit the requirements. If γ ≪ ε, the accuracy of the simulation is not influenced by this criterion. The convergence criterion shown above can be implemented at computation cost O(1) and without any communication overhead. Profiling the simulator shows that this kind of convergence control uses less than 2 % of the simulation time, even if convergence is checked after every clock cycle. This criterion therefore fits the requirements.

4.3 Simulator Implementation
Because of the complexity of sequential circuit modelling, a powerful graphical user interface has been implemented to give maximum support and feedback to the designer. This Graphical Sequential Circuits Environment, ASCET, is used for all user interaction, whereas the simulators are 'plug-in' components. The ASCET system has been implemented and tested on the following platforms with identical results in convergence and accuracy:

| Platform | Operating system | Performance | Scope |
|---|---|---|---|
| HP PA RISC | HP-UX | | Full ASCET implementation |
| SUN SPARC | Solaris and SunOS | | Full ASCET implementation |
| Intel Pentium and Pentium Pro | SCO OpenServer 5 or Linux | | Full ASCET implementation |
| Parsytec GC/PP (64 x PowerPC 601) | PARIX | 5.12 Gflops | Simulator only |
| Parsytec GCel 1024 (1024 x T805) | PARIX | 4.4 Gflops | Simulator only |
| Parsytec CC (48 x PowerPC 604) | AIX/nK | 12.76 Gflops | Simulator only (available soon) |

Table 1: Supported platforms¹

¹ We would like to thank the Paderborn Center for Parallel Computing (PC²) at the University of Paderborn for providing their resources to support this project.

4.4 Architecture of the Statistical Simulator
In order to implement a design environment that supports any kind of circuit model, a strictly object-oriented approach was chosen. Therefore, all objects used in this design environment and all arithmetic rules are specified as C++ classes and methods. The ASCET system consists of five main hierarchical layers of abstraction (Figure 3).

Figure 3: Architecture of the ASCET system (layers: numeric and gate level descriptions; statistical layer; symbolic FSM representation; graphical specification environment; parallel simulator on the parallel computing platform)

The first layer, called the numeric layer (Figure 3, bit), implements all operations that are performed on bit values. This layer implements Boolean operations such as or, and, and not according to their mathematical definition, as well as operations from a user-supplied cell library such as and3, which describes a 3-input AND gate with a delay model. All bit-valued operations and their scheduling (delay models) are implemented at this level. The grammar of operations is defined at this level as well, as a standard yacc grammar, so it can easily be adjusted to other operations or cell libraries. The second layer is a statistical layer, where random bit-valued variables and distribution functions are provided (Figure 3, fsm). The ASCET system supports uniformly distributed, normally distributed, Weibull distributed and hypergeometrically distributed bit values. Any user-specified distributions or portions of assembler code for input variables are available at this level. The third layer is a symbolic FSM layer (Bfsm), where a fully functional interpreted FSM model can be built. This model is used for ad hoc interpreted simulation, semantic checks and testing. The fourth layer is the layer of graphical interaction (XMfsm), where the user interacts directly with all elements of the FSM specification and simulation. This allows comfortable modeling and testing, which is important given the complexity of many sequential circuits. The fifth layer is devoted to compiled and massive parallel simulation (Cfsm). For every FSM, specialized simulator code is generated automatically as a stand-alone program to perform high-speed simulation (Figure 3).

The ASCET design system uses an OO database agent (DB) with a relational database back-end to store all design data, so import from and export to any other design system can be done easily. The menu system and the dialog-oriented user interaction are provided by an OSF/Motif based user interface located in a separate layer (XDfsm). To allow parallel development and extensive testing, the main ASCET layers (Figure 3) are divided into several 'sub' layers which represent the steps of specialization (Figure 3). For example, BG represents a numeric model of directed graphs, whereas a finite state machine (located in Bfsm) is a specialized directed graph.
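To make the convergence test of Section 4.2 concrete, the following sketch (hypothetical class names, not the ASCET implementation) maintains the running averages of Eqs. (10) and (11) for one-hot states and evaluates Eq. (14) locally, without any communication. Two assumptions are worth stating: the window is the trailing one of the last l steps, so a check at step k corresponds to the centered window of Eqs. (12) and (13) around k − l/2, and transitions are counted from the second cycle on. Sliding-window maxima and minima are kept in monotonic deques, which gives amortized O(1) cost per state and clock cycle, in line with the O(1) cost stated for the criterion.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Max and min over the last 'len' pushed values, amortized O(1) per push.
class WindowSpread {
public:
    explicit WindowSpread(std::size_t len) : len_(len) { assert(len_ > 0); }
    void push(double v) {
        ++t_;
        while (!maxq_.empty() && maxq_.back().second <= v) maxq_.pop_back();
        while (!minq_.empty() && minq_.back().second >= v) minq_.pop_back();
        maxq_.emplace_back(t_, v);
        minq_.emplace_back(t_, v);
        // Evict entries whose index left the window (t_ - len_, t_].
        while (maxq_.front().first + len_ <= t_) maxq_.pop_front();
        while (minq_.front().first + len_ <= t_) minq_.pop_front();
    }
    bool full() const { return t_ >= len_; }
    double spread() const { return maxq_.front().second - minq_.front().second; }

private:
    std::size_t len_;
    std::size_t t_ = 0;
    std::deque<std::pair<std::size_t, double>> maxq_, minq_;
};

// Running averages p_i(k), d_i(k) and the windowed test of Eq. (14).
class ConvergenceMonitor {
public:
    ConvergenceMonitor(std::size_t states, std::size_t l, double gamma)
        : onTime_(states, 0), toggles_(states, 0), prev_(states, false),
          winP_(states, WindowSpread(l)), winD_(states, WindowSpread(l)),
          gamma_(gamma) {}

    // Record one clock cycle; 'active' is the index of the one-hot state.
    void step(std::size_t active) {
        ++k_;
        for (std::size_t i = 0; i < onTime_.size(); ++i) {
            const bool q = (i == active);
            if (q) ++onTime_[i];
            if (k_ > 1 && q != prev_[i]) ++toggles_[i];  // t_{q_i}(k) = 1
            prev_[i] = q;
            winP_[i].push(static_cast<double>(onTime_[i]) / k_);   // Eq. (10)
            winD_[i].push(static_cast<double>(toggles_[i]) / k_);  // Eq. (11)
        }
    }

    // Eq. (14): every windowed spread of p_i and d_i is at most gamma.
    bool converged() const {
        for (std::size_t i = 0; i < winP_.size(); ++i)
            if (!winP_[i].full() || winP_[i].spread() > gamma_ ||
                winD_[i].spread() > gamma_)
                return false;
        return true;
    }
    double p(std::size_t i) const { return static_cast<double>(onTime_[i]) / k_; }
    double d(std::size_t i) const { return static_cast<double>(toggles_[i]) / k_; }

private:
    std::vector<std::size_t> onTime_, toggles_;
    std::vector<bool> prev_;
    std::vector<WindowSpread> winP_, winD_;
    double gamma_;
    std::size_t k_ = 0;
};
```

A monitor for a two-state machine with l = 10 and γ = 0.01 that is fed alternating states reports convergence only once the window is full and the spread of the averages has fallen below γ; its estimate of $p_0$ is then 0.5.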

Figure 4: Hierarchical design structure of ASCET II (axes: specialisation and complexity; layers DB, bit, B, BA, BG, BDT, fsm, Bfsm, Cfsm, XM, XMBA, XMBG, XMfsm, XDfsm)

4.5 Inheritance Schemes
A generalized inheritance scheme for large projects has been developed, which allows the design of very specialized and highly efficient code at high levels of abstraction. This scheme provides capabilities such as dynamic type checking and design rule checking at the higher levels of abstraction, and high-speed numeric capabilities for massive parallel simulation at the lower abstraction levels. It thus provides high performance at a high level of accuracy.

Figure 5: Inheritance scheme in ASCET (classes XM, BG, XMBG, XMBA, fsm, Bfsm, XMfsm, XDfsm, Cfsm, XMPfsm)

The inheritance scheme shown in Figure 5 provides machine-code-equivalent sequences for the basic numeric operations (Figure 4, bit, fsm) on different layers of abstraction. This guarantees identical numeric behavior, in precision and scheduling, of the graphical evaluation and test environment and of the massively parallel simulation system synthesized by ASCET. Furthermore, testing and evaluation of algorithms at a very high (but slow) level of abstraction, using all features of object-oriented programming environments, is supported. The interface to high-speed parallel simulation (Figure 5, Cfsm) is not influenced by features located in different inheritance trees or different design layers, and the parallel simulator remains very compact in terms of memory usage, machine code execution and user interface.

4.6 Simulator Design
The simulation flow of an FSM can be specified by the following diagram.

Figure 6: Simulator steps (I: simulator setup; input generation; II: evaluation of the combinatorial circuit; scheduling of the desired operations; calculate results; convergence checking; state transition (clock))

The highlighted parts in Figure 6 (setup and evaluation) have to be synthesized by one of the appropriate tools, for example by the ASCET graphical user environment, by one of the ASCET netlist import filters² or by a user-provided converter. All other functions are provided by the generalized simulation model, which is located in Bfsm and optimized for compiled execution in Cfsm. It provides the functions for input generation, convergence control, scheduling, and result computation and storage.
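The loop of Figure 6 can be sketched for a single Monte Carlo run as follows. This is an illustration under stated assumptions, not the code generated by ASCET: the FSM is a hypothetical lookup table (`TableFsm`), inputs are independent uniformly distributed bits, the scheduling step is omitted (zero delay), and convergence is tested with a simple stand-in, the largest change of any state probability between two checkpoints, instead of Eq. (14).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical table-driven FSM: next[s][w] is the successor of state s for
// input word w (bit b of w is the value of binary input signal b).
struct TableFsm {
    std::size_t inputs;
    std::vector<std::vector<std::size_t>> next;
};

struct RunResult {
    std::vector<double> p;  // estimated state probabilities
    std::size_t cycles;     // clock cycles simulated
};

// One Monte Carlo run organized along the steps of Figure 6. The checkpoint
// convergence test is a simple stand-in, not the criterion of Eq. (14).
RunResult simulate(const TableFsm& fsm, std::size_t start, unsigned seed,
                   std::size_t checkEvery, double gamma, std::size_t maxCycles) {
    assert(checkEvery > 0 && maxCycles > 0);
    // (I) simulator setup
    std::mt19937 rng(seed);
    std::bernoulli_distribution bit(0.5);  // uniformly distributed input bits
    const std::size_t n = fsm.next.size();
    std::vector<std::size_t> onTime(n, 0);
    std::vector<double> lastP(n, -1.0);
    std::size_t state = start, k = 0;
    while (k < maxCycles) {
        std::size_t word = 0;              // input generation
        for (std::size_t b = 0; b < fsm.inputs; ++b)
            word |= static_cast<std::size_t>(bit(rng)) << b;
        state = fsm.next[state][word];     // (II) evaluation + clock tick
        ++k;
        ++onTime[state];                   // calculate results
        if (k % checkEvery == 0) {         // convergence checking
            double maxChange = 0.0;
            for (std::size_t i = 0; i < n; ++i) {
                const double p = static_cast<double>(onTime[i]) / k;
                maxChange = std::max(maxChange, std::fabs(p - lastP[i]));
                lastP[i] = p;
            }
            if (maxChange <= gamma) break;
        }
    }
    std::vector<double> p(n);
    for (std::size_t i = 0; i < n; ++i) p[i] = static_cast<double>(onTime[i]) / k;
    return {p, k};
}
```

For a two-state FSM that toggles whenever its two inputs differ, `TableFsm{2, {{0, 1, 1, 0}, {1, 0, 0, 1}}}`, the estimated state probabilities approach 0.5, which is the behavior expected for the example in Section 5.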

4.7 Synthesis of Simulator Code
Currently, there are two major kinds of simulators that can be synthesized by ASCET. The first, called the 'symbolic simulator', is optimized for execution speed and calculates state probabilities and transition densities for the states of an FSM only. The ASCET design system optimizes the execution speed by eliminating transitions, and the associated equations, that are not feasible for the current state; such equations are substituted by a constant 0. This simulator is called 'symbolic' because a hardware realization of these equations would always be active, no matter whether the final result depends on the input or not. Thus this kind of simulator is used for estimation at the behavioral RT level. The following example details the setup (Figure 6, I) and evaluation (Figure 6, II) methods as they are synthesized by the ASCET system for the simple sequence detector in Figure 7. This is an FSM model of a scanner that identifies the string 'abcdefghi' in a set of inputs, where each character of the input is represented by one input signal.

² This was done, for example, with the ISCAS benchmarks.
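The essence of the symbolic evaluation, where only the equations of the active one-hot state are evaluated and all others are known to be 0, can be illustrated with a small standalone sketch. The transition semantics are an assumption made here for illustration (a mismatch returns to the start state, and the final state returns to the start state unconditionally); the sketch does not reproduce the ASCET-generated code shown in Figures 8 to 10, and it numbers states from 0 rather than 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical semantics (assumption): one-hot states s[0..9]; s[i] with
// i < 9 waits for letter 'a' + i, a match advances to s[i + 1], any other
// input returns to s[0]; s[9] ("abcdefghi" found) returns to s[0].
class SequenceDetector {
public:
    static constexpr std::size_t kStates = 10;
    static constexpr std::size_t kInputs = 9;  // input signals a..i

    SequenceDetector() { s_.fill(false); s_[0] = true; }

    // One clock tick; in[c] is the input signal of letter 'a' + c.
    void fire(const std::array<bool, kInputs>& in) {
        std::size_t active = 0;
        while (!s_[active]) ++active;  // exactly one state bit is set
        // Only the transition equations of the active state are evaluated;
        // those of the inactive states are known to be 0 and are skipped.
        const std::size_t next =
            (active < kInputs && in[active]) ? active + 1 : 0;
        s_.fill(false);   // clear all state bits ...
        s_[next] = true;  // ... and set the successor: the clk tick
    }
    bool found() const { return s_[kStates - 1]; }

private:
    std::array<bool, kStates> s_{};
};
```

Feeding the letters a to i one per clock cycle drives the detector into its final state; the following cycle returns it to the start state.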

Figure 7: Simple sequence detector (states 1 to 10; transitions labelled with the input signals a to i and their negations ¬a to ¬i)

```cpp
class XfsmSimulator : public CBfsm {
    virtual void Initialize();
    virtual void FireActiveTransition();

    CBfsmState s_01, s_02, s_03, s_04, s_05,
               s_06, s_07, s_08, s_09, s_10;
    CBfsmTransition t_0001, t_0002, t_0003, t_0004, t_0005,
                    t_0006, t_0007, t_0008, t_0009, t_0010,
                    t_0011, t_0012, t_0013, t_0014, t_0015,
                    t_0016, t_0017, t_0018, t_0019, t_0020;
    CBfsmInput a, b, c, d, e, f, g, h, i, j, k;
};
```

Figure 8: C++ class representing the FSM

```cpp
void XfsmSimulator::Initialize() {
    s_01.SetMyFSM( &fsm );                  // state init
    s_01.SetMyName( "s_01" );
    // ....
    t_0001.SetMySourceState( &s_01 );       // transition init
    t_0001.SetMyTargetState( &s_02 );
    t_0001.SetMyConditionFormula( "a" );
    // ....
    a.SetMyName( "a" );                     // input init
    a.SetMyFSM( &fsm );
}
```

Figure 9: FSM setup

```cpp
void XfsmSimulator::FireActiveTransition() {
    if ( s_09 ) {
        s_07 = h;      // change the active state
        s_01 = !h;     // according to setting of h
        s_02 = s_03 = s_04 = s_05 = s_06 = s_08 = s_09 = s_10 = 0;
                       // update the other states;
                       // this emulates the clk tick
        return;
    }
    // .....
}
```

Figure 10: Evaluation for symbolic simulation

4.8 Specification of Estimation Task
An FSM specification can be created with any appropriate design tool, e.g. the GUI of ASCET (Figure 11), and then translated into a compiled simulator in order to use the numeric capabilities of the ASCET system. This design flow was followed for the netlist descriptions of the ISCAS benchmarks [13]. To exploit all features of ASCET, the graphical user interface should be used, which provides comprehensive functionality such as interpreted and animated simulation (Figure 12), convergence monitoring, an activity estimation monitor, and many other features.

Figure 11: Graphical Sequential Circuits Environment

5 EXPERIMENTAL RESULTS
If a uniform distribution is selected for the inputs a and b of the FSM model in Figure 12, the estimated probabilities and transition densities should be very close to 0.5, because this FSM toggles its state for half of the possible inputs and keeps its state otherwise. The simulation is performed as a single simulation run with l = 100 and γ = 0.005. The resulting estimated values are given in Table 2.

Figure 12: FSM model

| State | Probability | Density |
|---|---|---|
| s1 | 0.499167 | 0.507009 |
| s2 | 0.500833 | 0.507009 |

Table 2: Sample simulation

For ε ≤ 0.05, l = 100, γ = 0.001 and confidence α = 0.05, the simulator has to be run 490 times to guarantee the confidence. For ε ≤ 0.005, l = 100, γ = 0.0005 and confidence α = 0.05, 9604 runs are needed. The results of these runs are presented in Table 3 and Table 4.

| | s1 | s2 | error s1 | error s2 |
|---|---|---|---|---|
| Probability | 0.49855737285 | 0.50148070321 | 0.00144262714 | 0.00148070321 |
| Density | 0.501017100251 | 0.501061712634 | 0.001017100251 | 0.001061712634 |

Table 3: Simulation results (490 runs)

| | s1 | s2 | error s1 | error s2 |
|---|---|---|---|---|
| Probability | 0.49987352805 | 0.50013451415 | 0.00012647194 | 0.00013451415 |
| Density | 0.50016289915 | 0.50013432490 | 0.00016289915 | 0.00013432490 |

Table 4: Simulation results

In both cases the actual error is much smaller than the specified ε; this was the case in all examples for which an analytical solution could be found.

5.1 Benchmarks
The following benchmarks were run on a cluster of workstations with Intel Pentium Pro 200 MHz processors running SCO OpenServer 5. This system was chosen because it can be used in single-user mode, which prevents network or other system activity from influencing the benchmark. Due to the design of the system, performance scales linearly with the number of available processors, so the total execution time of the multiple simulation runs needed to reach the specified accuracy remains constant until the number of runs exceeds the number of available processors. All benchmarks were run with history length l = 25 and diffp = diffd = 0.005, compiled by GNU g++ 2.7.2 with all machine-dependent optimizations turned on. These settings allow ε ≤ 0.01 for the confidence interval, compared to 0.05 in (Najm 1995a). The system was tested on netlist representations of some of the circuits of the ISCAS-89 sequential benchmark set. A translation of the entire benchmark set into symbolic representations is under development and will provide much faster simulation times, because the simulator code itself will be optimized by the code synthesizer. This is demonstrated by generic2 and generic11, which are comparable in size and complexity to s27 and s1423 of the ISCAS benchmark circuits.

Single simulation:

| Name | #Inputs | #Gates | #Latches | Compile time | Execution time |
|---|---|---|---|---|---|
| generic2 | 2 | | | 2 sec | 0.002 sec |
| generic11 | 11 | | | 2 sec | 0.095 sec |
| s27 | 4 | 8 | 3 | 3 sec | 0.020 sec |
| s838.1 | 34 | 446 | 32 | 6.7 sec | 0.070 sec |
| s1423 | 17 | 657 | 74 | 12.6 sec | 1.600 sec |
| s5378 | 35 | 2779 | 179 | 33.0 sec | 7.000 sec |
| s9234.1 | 36 | 5597 | 211 | 39.0 sec | 9.000 sec |

Massive parallel simulation, 10 nodes (times in sec):

| ε | α | #Runs | generic2 | generic11 | s27 | s838.1 | s1423 | s5378 | s9234.1 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.10 | 460 | 2.09 | 6.37 | 3.92 | 9.92 | 86.20 | 355.00 | 453.00 |
| 0.05 | 0.05 | 490 | 2.09 | 6.65 | 3.98 | 10.13 | 91.00 | 376.00 | 480.00 |
| 0.01 | 0.10 | 6768 | 3.35 | 66.29 | 16.53 | 54.07 | 1095.48 | 4770.60 | 6130.20 |
| 0.01 | 0.05 | 9604 | 3.92 | 93.23 | 22.20 | 73.92 | 1549.24 | 6755.80 | 8682.60 |

Massive parallel simulation, 25 nodes (times in sec):

| ε | α | #Runs | generic2 | generic11 | s27 | s838.1 | s1423 | s5378 | s9234.1 |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05 | 490 | 2.03 | 3.86 | 3.39 | 8.07 | 43.96 | 170.20 | 215.40 |
| 0.01 | 0.05 | 6768 | 2.54 | 27.72 | 8.41 | 25.65 | 445.75 | 1928.04 | 2475.48 |
| 0.01 | 0.05 | 9604 | 2.76 | 38.49 | 10.68 | 33.59 | 627.25 | 2722.12 | 3496.44 |

Massive parallel simulation, 100 nodes (extrapolated):

| ε | α | #Runs | generic2 | generic11 | s27 | s838.1 | s1423 | s5378 | s9234.1 |
|---|---|---|---|---|---|---|---|---|---|

Table 5: Benchmarks
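The linear scaling reported above rests on the runs being independent: each processor needs only its own seed, writes only its own result, and the results are combined once at the end. A minimal shared-memory sketch with `std::thread` is shown below; it assumes a hypothetical two-state FSM that toggles whenever its two uniformly distributed inputs differ (one possible reading of the Figure 12 model), and hypothetical function names. Seeds depend on the run index rather than on the worker, so the estimate does not depend on the number of workers.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// One independent run of the hypothetical toggle FSM; returns the fraction
// of clock cycles spent in state 0.
double runOnce(unsigned seed, std::size_t cycles) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution bit(0.5);  // uniformly distributed inputs
    int state = 0;
    std::size_t inState0 = 0;
    for (std::size_t k = 0; k < cycles; ++k) {
        const bool a = bit(rng), b = bit(rng);
        if (a != b) state ^= 1;            // toggle for half of the inputs
        if (state == 0) ++inState0;
    }
    return static_cast<double>(inState0) / cycles;
}

// Distributes 'runs' independent runs over 'workers' threads. Workers share
// nothing except distinct output slots, so no locking or messaging is needed
// until the final average.
double parallelEstimate(std::size_t runs, std::size_t workers, std::size_t cycles) {
    assert(runs > 0 && workers > 0 && cycles > 0);
    std::vector<double> results(runs);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w)
        pool.emplace_back([&results, runs, workers, cycles, w] {
            for (std::size_t r = w; r < runs; r += workers)
                results[r] = runOnce(static_cast<unsigned>(1000 + r), cycles);
        });
    for (std::thread& t : pool) t.join();
    double sum = 0.0;
    for (double v : results) sum += v;     // fixed order: reproducible sum
    return sum / runs;
}
```

Because every run is seeded by its index and the results are summed in index order, `parallelEstimate(16, 4, 2000)` and `parallelEstimate(16, 1, 2000)` return the same value; only the wall-clock time changes with the number of workers.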

6 SUMMARY AND CONCLUSIONS
In this paper we discussed the automatic generation of high-speed massive parallel simulation systems for sequential circuits based on a 'high-level' symbolic description of finite state machines. Massive parallel simulators can be generated automatically according to user-supplied modeling parameters. Benchmarks on the ISCAS benchmark circuits show a significant performance improvement, which allows the simulation of very large sequential circuits with low error tolerance. Implementing the design and simulation system on several computing platforms, ranging from workstations to supercomputers, with identical results in precision and response has shown that ASCET is a highly scalable and portable system.

REFERENCES

Brglez, F., D. Bryant and K. Kozminski, Combinational profiles of sequential benchmark circuits, IEEE International Symposium on Circuits and Systems, pp. 1929-1934, 1989

Chou, Tan-Li, Estimation of Activity for Static and Domino CMOS Circuits Considering Signal Correlations and Simultaneous Switching, IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, pp. 1257-1265, Oct. 1996

Devadas, S. and S. Malik, A Survey of Optimization Techniques Targeting Low Power VLSI Circuits, 32nd ACM/IEEE Design Automation Conference, ses. 16/1, 1995

Koegst, M., G. Franke and K. Freske, State Assignment for FSM Power Design, European Design Automation Conference, pp. 22-28, 1996

Lim, Yong Je, Kyung-Im Son, Heung-Joon Park and Mani Soma, A Statistical Approach to the Estimation of Delay-Dependent Switching Activities in CMOS Combinatorial Circuits, Design Automation Conference, pp. 445-451, 1996

Najm, Farid N., Shashank Goel and Ibrahim N. Hajj, Power Estimation in Sequential Circuits, 32nd ACM/IEEE Design Automation Conference, ses. 36/5, 1995a

Najm, Farid N. and Michael Y. Zhang, Extreme Delay Sensitivity and Worst-Case Switching Activity in VLSI Circuits, ACM/IEEE Design Automation Conference, ses. 36/3, 1995b

Najm, Farid N., Feedback, Correlation, and Delay Concerns in the Power Estimation of VLSI Circuits, ACM/IEEE Design Automation Conference, ses. 36/1, 1995c

Najm, F., A Survey of Power Estimation Techniques in VLSI Circuits, IEEE Transactions on VLSI Systems, pp. 446-455, Dec. 1994

Rabe, Dirk and Wolfgang Nebel, New Approach in Gate-Level Glitch Modeling, European Design Automation Conference, pp. 66-72, 1996

Schneider, Petra, Concept and implementation of object-oriented simulation techniques for high speed simulation of sequential circuits, Technical University of Darmstadt, Master thesis, 1997 (in German)

Tsui, Chi-Ying, José Monteiro, Massoud Pedram, Srinivas Devadas, Alvin M. Despain and Bill Lin, Power Estimation Methods for Sequential Logic Circuits, IEEE Transactions on VLSI Systems, pp. 404-416, Sep. 1995

Uchino, Taku, Fumihiro, Takashi Mitsuhashi and Nokuyki Goto, Switching Activity Analysis using Boolean Approximation Method, IEEE/ACM International Conference on CAD, ses. 1B/1, 1995
