Power-Aware Multi-Voltage Custom Memory Models for Enhancing RTL and Low Power Verification
Vijay Kiran Kalyanam, Martin Saint-Laurent
Jacob A. Abraham
Qualcomm Technologies Inc., Austin, TX, USA vijaykir,
[email protected]
CERC, The University of Texas at Austin, USA
[email protected]
Abstract—We describe a methodology to model the low-power and voltage behavior of multi-voltage custom memories in processors. These models facilitate early power-aware verification by abstracting the transistor-level representation of the memory to its power-aware behavioral RTL model. To the best of our knowledge, this is the first attempt at addressing the power-aware RTL model generation problem for custom memories. In our method, we identify voltage crossing points in transistors across channel-connected components and use these crossing points to transform the RTL for power-awareness so that it closely matches its circuit implementation. Without the proposed abstraction technique to generate power-aware RTL, low-power verification of such memories would need to be done using transistor-level simulations, which are prohibitively time-intensive and hence impractical. We check the correctness of the generated power-aware memory models through formal equivalence, symbolic simulation, and assertion- and simulation-based verification. These models are also validated using static power-domain checks. By applying this methodology in a power-aware design and verification framework on a commercial processor, we identified and corrected low-power circuit and RTL bugs prior to tape-out.
I. INTRODUCTION
Standby leakage power and power-gating are trade-offs that need to be carefully designed and architected in a battery-constrained processor's low-energy solution. Every state-of-the-art processor has built-in power management functionality that enables transitions to low-power sleep modes and voltage-frequency scaling for achieving high energy efficiency. Since always-on memories consume high standby leakage power, complex multi-voltage custom memories and advanced power gating with fine-grained sleep controls are becoming ubiquitous in processors [1], [2]. These multi-voltage custom memories can be verified completely only if they are modeled correctly. The transistor behavior represents the power-gated and voltage states, but switch-level simulations cannot be co-simulated in a processor's verification environment due to simulation and memory resource constraints. Hence, there is a need to model and verify the power awareness of these split-grid memory circuits at a higher level of abstraction. There is a lack of tools in industry, and of methodology in the literature, for generating and verifying a power-domain-partitioned custom memory behavioral RTL given a multi-rail circuit implementation. We call this the "RTL power-aware memory modeling design and verification problem". We address this problem by providing a technique to incorporate power-awareness into the behavioral RTL models of custom multi-voltage memories. This facilitates including the low-power aspects of such memories when verifying the low-power properties of power management schemes for processors. We approach this problem by abstracting the transistor implementation to the behavioral RTL level and applying existing electronic design automation (EDA) tools to enhance power-aware verification of these memories embedded in a processor. We believe ours is the first work to demonstrate this and apply it to power-aware design and verification.
Unified Power Format (UPF) [3] is used to augment the RTL with information regarding constraints on power-domain crossings, isolation cells, sleep logic, etc. Commercial EDA tools use RTL files, along with UPF rules, to verify low-power behavior. We outline a strategy to add UPF constraints to RTL behavioral models of custom multi-voltage memories so that they may be plugged into the EDA tool along with the RTL+UPF of the rest of the design. To do this, the RTL of the memories must first be partitioned correctly with respect to its various power-domains. After that, UPF constraints for these memories can be specified at the power-domain crossover boundaries to create a power-aware RTL memory model. We then enable these memory models in a processor, thus facilitating its electrical and functional verification. The main ideas and key contributions of our work in creating these RTL power-aware memory models are:
• Usage of the custom memory circuit's channel-connected components (CCCs) at voltage crossover points for UPF creation of power-domains and power-switches, and for RTL modeling of sleep and isolation logic.
• Application of CCC grouping and dual-voltage circuit template matching in memories, based on specific properties of these circuits, to generate the level shifter and isolation rules.
• Transformation of the custom memory RTL by repartitioning and rewriting the RTL based on inferred power/voltage-domains, resulting in a cohesive RTL and UPF for memories.
978-1-4673-7166-7/15/$31.00 ©2015 IEEE
The power-aware RTL memory models generated by the proposed method will match silicon behavior only if every power-domain of the custom memory RTL is partitioned correctly, matching its transistor behavior. We validate the correctness of these power-aware models using techniques such as static power-domain checks, formal equivalence, and simulation/assertion-based verification. We apply directed and random functional verification tests on these power-aware memory models, which enhances verification by exercising low-power modes such as voltage scaling and power-gating. A power-collapsed region within a memory can inject Xs or simulate random values whose propagation can expose a design bug resulting in functional simulation failures. We also have legal assertion properties written for sleep, isolation enable and the internals of the memories' behavior for various low-power modes. We enable power-aware assertion-based verification using these models to identify and fix electrical design issues.
II. RELATED WORK
Though the focus of this paper is not physical design synthesis of multiple power-domains, which presents challenges as discussed in [4], our proposed method can also be used to generate appropriate RTL+UPF for power-aware synthesis. Much work has been done in the area of functional verification of memories in processors [5], but RTL modeling of the low-power behavior of memories with embedded voltage and power-gating information, and functional verification of low-power memory sleep operation, are in their infancy. Power-aware memory models generated from our work can also be used to enhance the verification techniques proposed in [5]. "Power-aware model" in our work is not to be confused with a similar term used in the context of power estimation, a completely different concept used in works such as [6]. We also clearly distinguish our work from [7], which discusses functional power-aware simulation of low-power islands and retention flop cells but does not address the problem of modeling and verifying power-gated custom memories. The authors of [8] demonstrate a way to statically check transistors for power-domain crossing violations; this does not address our objective of power-aware RTL/UPF model generation and verification for custom memory circuits using CCCs. CCCs have been used for transistor circuits earlier [9], but the concept has not been applied to voltage-domain crossing circuits to generate power-domain-based RTL partitioning. RTL model generation from transistor netlists is described in [10], whereas our main contribution is the transformation of existing RTL and the generation of its UPF. Circuit template mapping is discussed in [11] for circuit recognition, but our work extends it to dual-rail voltage circuits by leveraging their multi-voltage properties.
RTL rewrites have been used for equivalence checking [12], but here we use them to guide our transformation of the RTL, based on our observation that voltage crossings occur across CCCs, which helps in correctly partitioning the RTL with respect to power-domains.
III. PRELIMINARIES
Voltage level shifters (VLS) enable correct functional operation across voltage crossing regions by shifting the voltage of a signal to its correct level. Isolation cells (ISO) enable electrically safe power shutdown by preventing powered-off regions from affecting powered-on regions, fencing off the propagation of non-deterministic logic states. Both VLS and ISO circuitry help prevent circuit malfunction by preventing short-circuit currents (Isc), as shown in Fig. 1. Reducing Isc helps reduce the processor's power consumption, hence increasing battery life and preventing heating and thermal issues. The terms isolation and clamp both refer to electrical ISO cells. A retention-capable memory is defined as any custom memory array that is implemented with power-gated switches that can be controlled to be powered on, powered off or voltage-scaled independently of its surrounding logic. Power-aware RTL refers to implementing the internal voltage and power-domain crossings and power-gating functionality in the RTL. Correctness of RTL power-aware models refers to functionally correct models with the right power-domain partitions of the RTL. A CCC is defined as a circuit component whose transistors can be reached only through source or drain connections, powered by either a power-gated or a non-power-gated supply. Examples of CCCs connected between two voltage rails are shown in Fig. 1. Voltage/power-supply and voltage/power-domain (pdom) are used interchangeably. In the examples discussed in this paper, sleep control is an active-low signal (i.e., when sleep_n=0 the memory is power-gated), but clamp enable is an active-high signal (i.e., when clamp_en=1 the clamping function is active).
IV. MOTIVATION
Fig. 2. Multi-voltage-domains in custom memories
A commercial processor may have many instances of retention-capable custom memories (instruction cache, data cache, L2 cache, TLB, debug RAM, etc.) with multiple voltage rail crossings and power-gated partitions. These high-speed, low-area (power) embedded memories have transistor implementations with internal voltage crossings that use dynamic custom electrical VLS and ISO cells. One example of such a memory is shown in Fig. 2. Adding power-awareness to the RTL models of these embedded custom memories helps enhance early design and verification of the various low-power behaviors of these memories in the processor. But the behavioral register transfer level (RTL) implementation of a processor is power-domain-architecture agnostic and technology independent, i.e., the RTL does not have the VLS and ISO cells modeled and is not power-domain partitioned. An always-on/power-gated logic region is not embedded in any processor's RTL implementation. Power-awareness at RTL is only implemented and verified after generating a power-aware model using a power-intent specification format like UPF on top of the RTL. A power-aware modeling bug could be introduced if a part of the memory is incorrectly modeled as powered-on though the actual transistor behavior is powered-off. This could mask circuit bugs: because the logic in the memory model does not power off, missing ISO cells at a particular interface internal to the memory go undetected. Similarly, modeling parts of the custom memory RTL logic incorrectly as a power-gated module can result in incorrect processor low-power state behavior, resulting in false fails. Also, partitioning RTL logic onto incorrect voltage-domains would result in missing VLS cells, causing the power-domain checks to fail. Hence, correct modeling of power-awareness in memory RTL (consisting of array bit-cells, control logic and memory glue logic) is crucial. This helps in not masking circuit/RTL bugs in functional verification, thus preventing silicon bug escapes. We next describe our method to generate functionally equivalent RTL that models the power and functional behavior matching the circuit behavior.
Fig. 1. Example of short circuit power across CCCs on different power-domains
2015 33rd IEEE International Conference on Computer Design (ICCD)
V. POWER-AWARE MEMORY RTL MODEL GENERATION
Our proposed methodology starts with generating CCCs from the custom memory transistor implementation. When a CCC across a voltage crossing is inferred, it is matched against dual-voltage circuit templates to identify the circuit as a VLS or ISO cell. We use the observation that there exists a partitionable RTL across a crossing with combined groups of neighboring CCCs on different power supplies. We transform the RTL by rewriting it such that unique RTL modules can be created for each partition containing a collection of RTL nodes¹ belonging to the same voltage supply. The transformed RTL from our proposed method is functionally equivalent to its original RTL. This transformed RTL, along with the generated UPF, helps model the voltage and power-awareness of the memory's circuit implementation. We next describe the steps shown in Fig. 3 for obtaining all the intermediate information needed for correct RTL partitioning and UPF generation. This method works for a memory with either a header- or a footer-based switch design.
A. CCC based Voltage Crossing for RTL and UPF
Circuit/RTL Traversal and CCC Generation: We use an un-partitioned original custom memory RTL as an input to our flow. The input RTL is not yet power-aware and hence is not useful for power-aware verification. It is usually flat or partitioned without any notion of voltage or power-domain crossings. Then, circuit traversal is performed. Circuit traversal involves tracing through the edges of the transistor nodes, resulting in the circuit netlist being decomposed into CCCs. RTL traversal is performed by tracing through all the RTL nodes across the edges representing the RTL signal connectivity in the memory's behavioral RTL.
Power-domain (pdom) Creation: For every unique external power pin in the transistor netlist, unique power-domains, supply ports and nets are created in the UPF using the create_power_domain and create_supply_net UPF rules.
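The power-domain creation step can be sketched as a small generator that emits one domain, supply port and supply net per external power pin. This is only an illustration under stated assumptions: the function name power_domain_upf, the PD_ prefix and the pin-to-module mapping are hypothetical, and the emitted command options follow common IEEE 1801 usage rather than the exact rules produced by the authors' flow.

```python
def power_domain_upf(power_pins, elements_of):
    """Emit UPF text with one power domain, supply port and supply net per pin.

    power_pins  : iterable of external power pin names found in the netlist
    elements_of : dict mapping a pin name to the RTL modules it powers
                  (hypothetical input; the paper derives this from CCC tracing)
    """
    lines = []
    for pin in sorted(set(power_pins)):          # one domain per unique pin
        domain = f"PD_{pin}"                     # naming scheme is an assumption
        elements = " ".join(elements_of.get(pin, []))
        lines.append(f"create_power_domain {domain} -elements {{{elements}}}")
        lines.append(f"create_supply_port {pin} -domain {domain}")
        lines.append(f"create_supply_net {pin} -domain {domain}")
        lines.append(f"connect_supply_net {pin} -ports {pin}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(power_domain_upf(["vdd2", "vdd1", "vdd1"],
                           {"vdd1": ["control"], "vdd2": ["decode"]}))
```

Duplicate pins collapse to a single domain, which mirrors the "every unique external power pin" wording of the step.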
¹ RTL node: defined as an RTL variable or a collection of interconnected RTL variables or expressions.
Fig. 3. Power-Aware Custom Memory RTL Model Generation
Get Supply of Sleep Control and Isolation (iso) Enable: For all the sleep logic inputs controlling the power-gates (header switches) in the circuit implementation, the supply of the sleep control logic, the switch's source power supply and its power-gated internal supply are obtained from the netlist. Similarly, for all isolation enable inputs controlling the isolation circuits, the supply of the isolation enable control logic is obtained from the netlist.
Inference of Sleep/iso Control: For every sleep and isolation control, the sleep logic and isolation control cone in the transistor circuit implementation is comprehended. We assume that all the sleep and isolation enable controls are known primary inputs. Their actual logic functionality/transistor implementation may or may not be implemented in the RTL for custom memories. If the logic is already implemented in the RTL, then RTL partitioning of the sleep logic function and isolation control function is performed as described in V-C. If the sleep/iso functionality is not in the RTL, it is added in the RTL or the UPF or both, by tracing forward from the sleep pin and isolation pin. This is accomplished by comprehending the transistor implementation, followed by appropriate partitioning of the RTL. The sleep and isolation logic cone in the RTL is partitioned into unique module(s) functionally matching the circuit implementation.
Generate UPF Rules for Power Switch, Partitioned Isolation Logic and Power-Gated Domains: The logical power-switch(es)² are mapped to the create_power_switch rule in UPF. The isolation control function obtained is used for the set_isolation_control rule with the appropriate isolation power net inferred earlier.
² Logical power-switch(es): The physical switch count need not match the equivalent logical switch count. The mapping can be many-to-one, since a high physical switch count is possible due to current delivery requirements and voltage droop limits. All these switches can be mapped to one equivalent logical switch as long as it has the right logically equivalent sleep controls and electrically equivalent power supplies.
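The many-to-one mapping from physical to logical power switches can be sketched as a grouping by sleep control and supplies. The record fields (name, sleep, source, gated), the port names vin/vout/ctl and the switch naming are hypothetical; the create_power_switch options follow common IEEE 1801 usage and are not taken from the authors' generated UPF. The on-state expression assumes an active-low sleep control, as in the paper's examples.

```python
from collections import defaultdict


def logical_switches(physical_switches):
    """Group physical switches sharing sleep control, source and gated supply."""
    groups = defaultdict(list)
    for sw in physical_switches:
        key = (sw["sleep"], sw["source"], sw["gated"])
        groups[key].append(sw["name"])
    return dict(groups)


def power_switch_upf(groups, domain_of):
    """Emit one create_power_switch rule per logical switch.

    domain_of maps a gated supply name to its power-domain name (assumed input).
    """
    lines = []
    for i, (sleep, source, gated) in enumerate(sorted(groups)):
        lines.append(
            f"create_power_switch sw{i} -domain {domain_of[gated]} "
            f"-input_supply_port {{vin {source}}} "
            f"-output_supply_port {{vout {gated}}} "
            f"-control_port {{ctl {sleep}}} "
            # active-low sleep: the switch conducts when the control is high
            f"-on_state {{on vin {{ctl}}}}"
        )
    return lines


if __name__ == "__main__":
    phys = [
        {"name": "M1", "sleep": "sleep_n", "source": "vdd4", "gated": "vdd2_switch"},
        {"name": "M2", "sleep": "sleep_n", "source": "vdd4", "gated": "vdd2_switch"},
        {"name": "M3", "sleep": "sleep_n", "source": "vdd4", "gated": "vdd2_switch"},
    ]
    groups = logical_switches(phys)
    print("\n".join(power_switch_upf(groups, {"vdd2_switch": "PD_vdd2_switch"})))
```

Three physical header devices with identical controls and supplies collapse into one logical switch here; a device with a different sleep control or supply would form its own group.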
Fig. 4. Multi-voltage Circuit Template Matching
Voltage Crossover Points Identification: For every created power-domain (including the power-gated domains), if the voltage supplies across the channels of two neighboring CCCs are equal, we continue to trace forward through the CCC to the next one. If the voltage supplies across the channels of two neighboring CCCs are not equal, the voltage crossing is marked in the netlist. The supply voltages on both sides of the voltage crossing are noted and the CCC is also marked. This is repeated until all CCCs are covered.
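The CCC decomposition and crossover marking of this subsection can be sketched on a toy netlist as follows. The transistor record format (name, source, drain, gate), the rail sets and all function names are assumptions made for illustration; a production flow would operate on a real transistor netlist and handle power-gated supplies and pass-gate structures that this sketch ignores.

```python
def build_cccs(transistors, rails):
    """Group transistors that share a non-rail source/drain net (union-find)."""
    parent = {t["name"]: t["name"] for t in transistors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    owner = {}                              # first transistor seen on each net
    for t in transistors:
        for net in (t["source"], t["drain"]):
            if net in rails:                # supply/ground nets do not merge CCCs
                continue
            if net in owner:
                parent[find(t["name"])] = find(owner[net])
            else:
                owner[net] = t["name"]

    cccs = {}
    for t in transistors:
        cccs.setdefault(find(t["name"]), []).append(t)
    return list(cccs.values())


def supply_of(ccc, power_rails):
    """Power rails reached through the source/drain terminals of a CCC."""
    return {n for t in ccc for n in (t["source"], t["drain"]) if n in power_rails}


def voltage_crossings(cccs, rails, power_rails):
    """Mark nets driven by one CCC into the gates of a CCC on another supply."""
    marked = []
    for i, driver in enumerate(cccs):
        channel = {n for t in driver for n in (t["source"], t["drain"])} - rails
        for j, load in enumerate(cccs):
            if i == j:
                continue
            shared = sorted(channel & {t["gate"] for t in load})
            src, dst = supply_of(driver, power_rails), supply_of(load, power_rails)
            if shared and src != dst:
                marked.append((shared[0], sorted(src), sorted(dst)))
    return marked


if __name__ == "__main__":
    # Two back-to-back inverters: the first on vdd1, the second on vdd2.
    netlist = [
        {"name": "P1", "source": "vdd1", "drain": "n1", "gate": "a"},
        {"name": "N1", "source": "vss", "drain": "n1", "gate": "a"},
        {"name": "P2", "source": "vdd2", "drain": "z", "gate": "n1"},
        {"name": "N2", "source": "vss", "drain": "z", "gate": "n1"},
    ]
    rails, power = {"vdd1", "vdd2", "vss"}, {"vdd1", "vdd2"}
    print(voltage_crossings(build_cccs(netlist, rails), rails, power))
```

In this toy case net n1 is marked as a crossing from vdd1 to vdd2, which is the point where template matching would then be triggered.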
B. Multi-Voltage Circuit Template Matching
In this section we describe CCC grouping and template matching, which identify the dual-voltage circuitry as a VLS or ISO cell using the voltage crossover points identified and marked in V-A. CCC grouping is the process of iteratively combining the CCCs across a voltage crossover point until they form a CCC group that matches a recognizable topology that can be template matched. For example, in Fig. 1, the CCC grouping process can start with CCCn (the CCC on the destination power rail, vdd2), followed by a backward trace to CCCm (the CCC on the source power rail, vdd1). If the created CCC group is unmatched and there are remaining ungrouped neighboring CCCs, the ungrouped ones continue to be combined recursively until a matching dual-voltage circuit template is found or count ≥ max_count³. As shown in Fig. 5, multi-voltage circuit template matching is applied based on the properties listed below. These properties apply to any entity that is a CCC or CCC group in a multi-voltage circuit.
• CCC voltage-crossing property: There exists at least one voltage crossing edge between a CCC (or CCC group) powered by one voltage supply, vddin, and another CCC (or CCC group) connected to a different voltage supply, vddout.
• ISO CT property: An isolation (ISO) circuit template (CT) is one that satisfies the CCC voltage-crossing property and has isolation enable input pin(s) on the same supply as the destination rail CCC(s), vddout.
• VLS CT property: A level shifter (VLS) circuit template (CT) is one that satisfies the CCC voltage-crossing property, has all its input pins connected to CCCs on the source supply, vddin, and does not have any input pins on the destination supply, vddout.
We use these properties to extract the CTs from the transistor netlist. The templates are matched against a golden database of multi-voltage CTs. This database additionally holds information about each circuit's functionality. This helps determine whether a CT classified as a level shifter is a high-to-low or a low-to-high voltage level shifter. It is also used to classify the isolation cell type as isolate (clamp) to high, to low, or latch. Once the matching template is identified as a VLS circuit or an ISO circuit, the RTL is partitioned as discussed in V-C, accounting for the VLS and ISO cell in both the UPF rule and the RTL. If a matching template is not found and count ≥ max_count, then the circuit behavior at the voltage crossing is analyzed. It is possible that this is identified as a circuit bug due to missing isolation or level shifting circuitry. The legal power state transitions in the UPF's power state table may determine whether a relaxed constraint with waivers is needed. Another possibility is the addition of a new circuit topology to the multi-voltage template database, if it does not already exist.
³ max_count: a user-adjustable maximum count. Its value could be 10. Usually the circuit templates can be obtained with at most 6-7 CCC groups.
Fig. 5. Illustration of Dual Voltage Circuit Properties
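The three properties above and the bounded grouping loop can be sketched as follows. This checks only the pin-supply properties; it does not compare topologies against a template database as the described flow does. The CCC record fields (supply, inputs, enables) and both function names are hypothetical.

```python
def classify(supplies, input_pins, enable_pins, vddin, vddout):
    """Apply the voltage-crossing, ISO CT and VLS CT properties to a CCC group.

    supplies    : set of supplies powering the CCCs in the group
    input_pins  : dict of primary input pin -> supply of the logic it connects to
    enable_pins : isolation-enable pins seen in the group
    """
    crossing = vddin != vddout and {vddin, vddout} <= set(supplies)
    if not crossing:
        return None
    if enable_pins and all(input_pins.get(p) == vddout for p in enable_pins):
        return "ISO"                      # enable pin(s) sit on the destination rail
    if input_pins and all(s == vddin for s in input_pins.values()):
        return "VLS"                      # every input pin sits on the source rail
    return None


def match_template(cccs, vddin, vddout, max_count=10):
    """Grow the group one CCC at a time until a property matches or the
    count reaches max_count. `cccs` is ordered from the destination-rail
    CCC backwards towards the source rail."""
    supplies, inputs, enables = set(), {}, []
    count = 0
    for count, ccc in enumerate(cccs, start=1):
        supplies.add(ccc["supply"])
        inputs.update(ccc["inputs"])
        enables.extend(ccc["enables"])
        kind = classify(supplies, inputs, enables, vddin, vddout)
        if kind is not None or count >= max_count:
            return kind, count
    return None, count


if __name__ == "__main__":
    # Loosely modelled on the isolation example: an output stack on vddout,
    # an enable inverter on vddout, and a data inverter on vddin.
    iso_like = [
        {"supply": "vddout", "inputs": {}, "enables": []},
        {"supply": "vddout", "inputs": {"en": "vddout"}, "enables": ["en"]},
        {"supply": "vddin", "inputs": {"a": "vddin"}, "enables": []},
    ]
    print(match_template(iso_like, "vddin", "vddout"))
```

A group with no enable pin and all inputs on vddin would be reported as a level shifter instead; an unmatched group is returned as None so that the crossing can be analyzed as a possible circuit bug.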
An example of a dual-voltage circuit template is shown in Fig. 4. In this example, as part of the CCC trace and grouping, all the PMOSs (P1, P2) and NMOSs (N1, N2, N3, N4) in Fig. 4I (on the left), which shows the transistor implementation, are grouped together as one CCC group on vddout. This is because P1, N1, N2, belonging to the first CCC, and P2, N3, N4, belonging to the second CCC, can be combined to form a CCC group, since they have the common supply vddout. Since CCC grouping continues recursively until there is a template match, the two inverters R and S on vddout on the output side connecting to pin z are also added to this CCC group. The inverter P connected to pin a is identified as powered by vddin. This introduces a voltage crossing at inverter P's output as per the CCC voltage-crossing property described earlier. Since the en pin has an inverter Q on the destination power rail (vddout), and the other inputs are on the source power rail (vddin), the ISO CT property matches this circuit template to an isolation circuit. During circuit template matching more information is inferred, resulting in the dual-voltage nand+inverter in Fig. 4II (on the right), whose functionality is isolate to low when enable is logic 0.
C. Power-Domain based RTL Transformation
As the information about voltage crossings of the CCCs is being inferred and the multi-voltage templates are generated, the RTL is modified using a series of functionality-preserving transformations. The transistor circuit logic powered by a common voltage rail that is in the transitive fan-in cone of the identified template is marked. For this marked circuit, whose load is the template, the RTL is partitioned and appropriately rewritten such that it is functionally equivalent. This helps to correctly map the voltage-domains in circuits to the corresponding RTL elements. It also results in the RTL of all the functional logic partition(s) being modularized according to power-domains. The circuit templates are modeled with a UPF rule. The remaining unmarked CCCs and their corresponding RTL now become a different partition. This set of steps is applied iteratively across all the voltage crossings. Using RTL rewrites and repartitioning, the RTL is transformed into power-domain partitions, generating unique RTL module(s) that are functionally equivalent to the circuit implementation. Using the UPF, the various partitioned RTL modules are instrumented with appropriate supply ports/nets and power-domains. Based on the result of template matching, the VLS and ISO cells are mapped to the UPF rules create_level_shifter and create_isolation_cell respectively, with appropriate supply nets/sets. Additionally, logical mapping of the voltage crossover points results in modeling of missing isolation enable and sleep logic in the RTL. Here is an example of the original RTL and the transformed RTL for the multi-voltage circuit shown in Fig. 1:
  Original RTL:
    if (C) D = ~(A&B);
    E = ~D;
  Transformed RTL based on power-domains:
    vdd1's RTL: if (C) D = ~(A&B)
    vdd2's RTL: E = ~D
During this RTL transformation phase, the RTL rewrites are performed by applying the associative, distributive, identity, absorption and De Morgan's laws, together with minimization and reverse minimization techniques, as rules on the Verilog. RTL rewrites add new intermediate RTL variables if the voltage crossing necessitates it. Also, in some cases, reverse local rewrites are performed if one group of RTL nodes is part of one power-domain and is connected to another group of RTL nodes on a different power-domain. Some examples necessitating rewrites are listed below:
• Example A: x1&x2|x1&x3 is rewritten as x1&(x2|x3) if the expressions x1 and x2|x3 are implemented on different voltage rails separated by the electrical VLS or ISO circuitry.
• Example B: Assume that the RTL is written using the encoded address to access the array elements. The actual circuit implementation uses the decoded address. The address inputs and the decoded address could be on different supplies, requiring a rewrite.
• Example C: An RTL node can be the result of decoding an address of log2(n) bits. But this RTL node is also equivalent to two nodes, one of which is the pre-decoded address on a source voltage-domain, connected to a post-decoded node generating the n address bits in the destination voltage-domain.
• Example D: In some designs, control signals could flow through a power-collapsible domain powered by vdd1 but communicate from vdd3 to vdd2, where, in terms of being always-on, vdd3 ≥ vdd2 ≥ vdd1. In such cases, low-power design bugs could be introduced due to controllability issues of the control signal when vdd1 is power collapsed. The solution is either to repartition the RTL with a rewrite and re-implement the circuit to avoid signal flow through vdd1, or else to use always-on modules powered by vdd3 for the signal flow implementation in both the circuit and the RTL.
• Example E: In some cases the RTL memory model had to be debugged in a processor-level simulation because of memory-resource capacity issues in the EDA tool. This was root-caused to hundreds of thousands of memory word line isolation cells added in the power-aware memory RTL environment, the result of RTL generated with every bit-blasted bit-cell getting an isolation cell. The resulting large number of ISO cells caused memory-capacity issues in the context of a processor-level power-aware run. It was fixed with an RTL rewrite that slices the RTL into individual word-level slices, thus significantly reducing the number of ISO cells in the power-aware memory RTL.
Fig. 6. From CCC to RTL and UPF
VI. POWER-AWARE MEMORY ENHANCED RTL
In the following section we go through specific examples of how the method proposed in section V is applied to various kinds of memory circuits in memories such as RAMs and CAMs. Our inputs are the memory's RTL and its transistor-level netlist with custom circuit topologies. We generate the transformed memory RTL and the memory UPF. The transformed RTL binds with the UPF to generate a complete power-aware RTL memory model. Fig. 6 gives a high-level overview of the transformation from CCC to RTL and UPF.
A. Model of Array Decoder and Controls
Fig. 7.I shows the array decode logic in one of the memories, with Logic1 for control and Logic2 showing a slice of the address decode implementation. From section V-A, the CCCs for Logic1 and G1 belong to vdd1, and the CCCs for Logic2 and G2 belong to vdd2. Across the marked voltage crossing, T1, T2, T3 and T4, T5, T6 are grouped as CCC1 and CCC2 respectively. Since there is a voltage crossing between inverter G1 and CCC1, this triggers dual-voltage template matching, as it satisfies the CCC voltage-crossing property in section V-B. G1, CCC1, CCC2 and G2 are combined next, which results in a match with the dual-voltage circuit template of a dynamic VLS that can level shift both high and low voltages. This template also satisfies the VLS CT property. The step outputs are shown in Fig. 7. The original RTL does not have module partitions based on power-domains. As described in section V-C, the RTL is transformed such that the control and decode modules are created for Logic1 and Logic2 of the circuit netlist respectively. This is because, in the circuit, the control block with signal ports i, cs, en is on vdd1, while the decode block with signal ports adr, clk is on vdd2. The RTL is partitioned such that a new internal signal (RTL node), wen, is generated on vdd1 before it is combined with signals belonging to vdd2. Both the UPF and the transformed RTL are shown in Fig. 7.II.
Fig. 7. Array Decoder and Controls
B. Isolation and Sleep Model of Power-Gated Array
Fig. 8I shows the circuit implementation of the write word line (WWL) in a memory, and Fig. 8II shows the generated RTL and UPF rule. Using the description in section V-A, unique power supplies and domains are created using UPF rules for vdd3_switch and vdd4. In the original RTL, sleep1_n, sleep2_n, isoen1 and isoen2 are just input ports and are not implemented in the RTL to electrically protect any logic. In the power-aware RTL, the sleep control's AND logic function and the clamp control's OR logic function are each inferred from the transistor implementation and added to the transformed RTL. These logic functions are instantiated as sub-modules within sleep_inst because it can be mapped to the vdd4 power-domain using a create_power_domain UPF rule common to both.
The generated RTL signals sleep_n and isoen_in are then used in the create_power_switch and create_isolation_control UPF rules respectively. The power switch UPF rule infers vdd4 as the external supply net and vdd2_switch as the internal supply net. This triggers the creation of UPF rules on the vdd2_switch power-gated supply, considering it as a separate power-domain. CCC grouping results in the S1, S2, S3, S4 transistors forming a CCC (CCCy).
CCCy's load is another CCC (CCCz). CCCy's input is an adjacent inverter CCC (CCCx) across the voltage crossing. This voltage crossing traversal triggers circuit template matching recursively by grouping CCCy, CCCx and CCCz. This results in a successful circuit template match, with the template classified as a clamp-low isolation cell. It is modeled in the UPF with its enable isoen_in, whose supply is inferred to be vdd4. This UPF rule is applied on the vdd2_switch module's inputs postdec/wwl and postdec/cwl. Here the RTL rewrites from V-C are used to rewrite clear as ~~cwl.
C. Model of Array Bitcell, Read, Bypass Logic
Similarly, the RTL and UPF of the array cell with read/bypass are shown in Fig. 9I,II. The UPF is generated around the partitioned RTL such that it encapsulates all the bitcell slices, which are usually written with a for loop construct for arrays. vdd1 and vdd2_switch are identified as power-domains. A create_power_switch UPF rule is generated for the power-gate of module bitcell. R3, R5, R4, R6 constitute a CCC. R1 and R2 on vdd1 constitute the second CCC. The voltage crossing triggers template matching. Here the template is a CCC consisting of R1, R2 with level shifting behavior. To partition the RTL, one cone of logic is the bit-cell on vdd2_switch. This creates the bitcell module. wwl and cwl are rewritten as a decoded signal for bitcell. A UPF level shifter rule is created for the newly created module array_read on the vdd1 pdom, which generates the read word line. The vdd1 pdom logic consists of read decode and bypass logic. A VLS rule is needed across the voltage crossing from module bitcell on vdd2_switch through the read word line (rwl) to the modules on vdd1.
VII. RESULTS - PRODUCTION FLOW ENABLEMENT
After the memory power-aware models are generated, it is further validated and enabled in production flow for processor power-aware verification. A. Validation of Generated Power-Aware Memory Models A f ormal equivalence tool was used to sign-off on logical equivalence of the transformed memory RTL against the original RTL. A Symbolic simulator was used for equivalence of custom memory transistor implementation and the transformed RTL. Both these checks ensure the RTL re-written
2015 33rd IEEE International Conference on Computer Design (ICCD)
29
Fig. 8.
Write Word Line Isolation and Power-Gated Array Model
Fig. 9.
Array Bitcell, Read, Bypass Logic
output from our proposed method including sleep and isolation functions are correct. A power domain static checker tool that checks the generated UPF against the transformed RTL was run. It checks for violations on voltage crossings in the generated power-aware memory model. Simulation based verif ication of these models was ensured by power-aware simulation of the processor using a power-aware RTL verilog simulator. A sample functional power-collapse benchmark is picked for illustration here. Fig. 10A,C show custom memory circuit simulations. The power-aware RTL memory simulations at the processor level are shown in Fig. 10B,D. The details of the circuit behavior and power-aware memory behavior is explained as follows: Fig. 10A shows the power-gated state when sleep n is de-asserted (it is active low signal). The charge on the clear node leaks due to power-gating, while the power-aware model in 10B shows the clear bit becoming corrupted (crossed lines) which are equivalent. Fig. 10C,D show the retention state of bitcell and read-operation of
the bitcell for the transistor simulation and the matching power-aware model simulation, respectively. Thus, the example shows that the generated power-aware memory RTL model is equivalent in behavior to its memory circuit for that benchmark.
B. Enhancing Processor Low Power Verification
The power-aware modeling of memories also helps in defining low power coverage points during the various low voltage operational modes of memories. Without these generated memory models, low power RTL functionality and its power-awareness could not have been verified. Replacing the power-aware memory RTL with a transistor-level model in a co-simulation framework is impractical due to insufficient abstraction, issues with debuggability, etc. Thousands of power-aware RTL functional tests targeting memory low power features in the instruction cache, data cache, TLB, CAM, tag arrays, L2 cache, etc., using the generated power-aware custom memory models, were enabled in a production suite that
finished in a day or two. By contrast, a SPICE-level simulation exercising a single retention-state leakage scenario for a power-gated array typically takes on the order of at least a few hours. If the power-aware memory models were replaced by transistor-level models in a co-simulation framework, the run time would be months.
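To make concrete the kind of behavior these power-aware tests exercise, the following is a minimal Python sketch of corruption and clamp-low isolation semantics. It is only an illustration under stated assumptions, not the generated memory model or any EDA tool's API: the names `SwitchedDomainSignal`, `clamp_low_isolation` and `isolation_is_legal` are hypothetical, the string `"x"` stands in for Verilog's unknown value, and only the signal roles (`sleep_n` active low, `isoen_in` as the isolation enable) are taken from the text.

```python
from dataclasses import dataclass

X = "x"  # stand-in for Verilog's unknown value, used to mark corruption


@dataclass
class SwitchedDomainSignal:
    """Hypothetical signal driven from a power-gated (switched) domain."""
    value: int = 0

    def sample(self, sleep_n: int):
        # sleep_n is active low: 0 means the domain is collapsed, so the
        # driven value can no longer be trusted and is reported as X.
        return self.value if sleep_n == 1 else X


def clamp_low_isolation(data, isoen_in: int):
    """Clamp-low isolation: force 0 while isoen_in is 1, else pass data."""
    return 0 if isoen_in == 1 else data


def isolation_is_legal(sleep_n: int, isoen_in: int) -> bool:
    """Isolation must be enabled whenever the switched domain is asleep."""
    return sleep_n == 1 or isoen_in == 1


wwl = SwitchedDomainSignal(value=1)
awake = clamp_low_isolation(wwl.sample(sleep_n=1), isoen_in=0)     # value passes through
isolated = clamp_low_isolation(wwl.sample(sleep_n=0), isoen_in=1)  # clamped to 0
leaked = clamp_low_isolation(wwl.sample(sleep_n=0), isoen_in=0)    # X escapes the domain
```

In this sketch, `leaked` carries the corrupted value across the domain boundary, which is the situation a power-aware simulation is meant to expose and a plain RTL simulation would not show.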
Fig. 10. An Illustrative Benchmark: Transistor-level Simulation and Matching Power-Aware RTL Memory Model Simulation

Due to sleep and clamp modeling deep inside the memory, as illustrated in Section VI, power-aware assertion checks can be enabled for assertion-based verification. This would not be possible without correct RTL partitioning of memories. As mentioned earlier, incorrect power-aware memory models result in false failures. For the illustration in Fig. 8, a possible legal check is that isoen_in must be asserted when sleep_n is logic 0:

    assert property (@(posedge clk) (~sleep_n) |-> (isoen_in));

Power-aware partitioned memory models also enable the use of “power-aware memory checkers” that can flag electrical corruption scenarios by probing inside the memories. This is impossible with non-power-aware custom memory models. The check shown below, applied to the example in Fig. 8, will fail either if the partitioning is incorrect or due to a design bug.

    if (wwl[0] === 1'bX) begin
      // wwl is corrupted, so isolation must already be enabled
      if (isoen_in !== 1'b1)
        $error("Error: vdd3_switch domain power collapsed and isoen_in is not enabled");
    end

As part of this custom memory power-aware modeling and verification effort, various multi-voltage custom memory design bugs were identified and fixed, including:
• Isolation enable tied off, resulting in isolation not being active for bit cells in a memory. Flash clear logic that invalidates the contents of memory was incorrectly clamped to logic high. Either of these could have left the memory unable to retain its contents reliably.
• A power switch whose control was connected to a collapsible supply. Similarly, a TLB was connected to the wrong supply, resulting in address translation issues. Address decode logic was also on the wrong power supply in RTL, preventing read/write access. Wrong sleep connectivity was caught that could have resulted in the incorrect memory being retained.
• Reversed and incorrect power-domains in the FIFOs across the asynchronous bus interface, resulting in leakage power dissipation due to short-circuit currents in Iddq tests.

C. Methodology Comparison Against Existing Work
We have proposed a methodology for generating power-aware RTL models of memories and applied it in a processor verification environment. To the best of our knowledge, this has not been addressed in earlier literature. Table I compares our work against two previous works that are closely related to it and can complement our solution.

TABLE I. POWER-AWARE DESIGN AND VERIFICATION SOLUTIONS
Solutions                                          [7]   [8]   Our Work
Power-aware functional verification                Yes   No    Yes
Electrical checks for voltage-crossing             No    Yes   Yes
RTL power-domain partitioning for memories         No    No    Yes
UPF generation for memories                        No    No    Yes
Power-aware custom memory RTL model generation     No    No    Yes
Enhanced power-aware memory RTL/verification       No    No    Yes

VIII. CONCLUSION
With mobile devices becoming increasingly common, low power sleep and voltage scaling features in memories will continue to be envisioned and architected in processors. Our methodology to generate power-aware RTL models for memories embedded in a processor enhances both early power-aware RTL design and the quality of power-aware verification in processors. Our work combines power-domain-aware RTL generation with multi-voltage custom memory circuits, an area where very little work exists even though process technology and energy requirements are driving multi-voltage memories on chip. We applied our technique and the generated models in the production flow environment of a commercial processor, making power-aware verification possible for custom multi-voltage memories. A significant advantage is that the proposed strategy yields models (RTL with UPF rules bound to it) that can be plugged into existing commercial EDA tools. In this process, we identified and corrected low power bugs before tape-out, which resulted in working silicon with low power custom memory features.

REFERENCES
[1] M. Saint-Laurent et al., “A 28nm DSP powered by an on-chip LDO for high-performance and energy-efficient mobile applications,” ISSCC, 2014.
[2] D. Flynn, “An ARM perspective on addressing low-power energy-efficient SoC Designs,” ISLPED, 2012.
[3] IEEE 1801, “IEEE standard for design and verification of low power integrated circuits,” 2013.
[4] S. H. Chen et al., “Experiences of Low Power Design Implementation and Verification,” ASP-DAC, 2008.
[5] N. Krishnamurthy et al., “Validating PowerPC microprocessor custom memories,” IEEE Design and Test of Computers, 2000.
[6] D. M. Brooks et al., “Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors,” IEEE Micro, 2000.
[7] A. Crone et al., “Functional verification of Low power designs at RTL,” PATMOS, 2007.
[8] J. Lescot et al., “Static low power verification at transistor level for soc design,” ISLPED, 2012.
[9] R. E. Bryant, “Extraction of gate level models from transistor circuits by four-valued symbolic analysis,” ICCAD, 1991.
[10] K. Singh et al., “Extracting RTL models from transistor netlists,” ICCAD, 1995.
[11] R. Bartolotti et al., “Constraint management and checking in template-based circuit design,” MTV, 2010.
[12] A. Koelbl et al., “Solver technology for system-level to RTL equivalence checking,” DATE, 2009.