
Design Verification of a Super-scalar RISC Processor

Babu Turumella, Aiman Kabakibo, Manjunath Bogadi, Karunakara Menon, Shalesh Thusoo, Long Nguyen, Nirmal Saxena, Michael Chow

HaL Computer Systems

1315 Dell Avenue, Campbell, CA 95008, U.S.A.

Figure 1 illustrates various components of the HaL R1 Computer System. The processor system, Sparc64, is implemented on a multi-chip module (MCM). The MCM contains a CPU chip, a memory management unit (MMU) chip, and instruction/data cache chips [1].

Abstract

This paper provides an overview of the design verification methodology for HaL's Sparc64 processor development. This activity covered approximately two and a half years of design development time. Objectives and challenges are discussed and the verification methodology is described. Monitoring mechanisms that give high observability to internal design states, novel features that increase the simulation speed, and tools for automatic result checking are described.


Also presented in this paper, for the first time, is an analysis of the design defects discovered during the verification process. Such an analysis is useful in augmenting verification programs to target common design defects.

1 Introduction

Design verification is a critical component in the design development time of high-end microprocessors. Design development, prior to the production release of chips and systems, comprises architecture specification, mapping of the architecture specification to a design description language, and verification to ensure that the mapped design description complies with the architecture specification and is free of design defects.

Figure 1. HaL R1 System Configuration

Design development time is largely influenced by the number of latent defects in the design, the mean time to find and repair design defects, and the speed of the simulators used to verify the design. For complex microprocessors, it is not unusual to have development times in the range of two to three years.

HaL Sparc64 is the first implementation of the 64-bit SPARC-V9 [4] instruction set architecture (Figure 1). The CPU is super-scalar and follows the data-flow principles of out-of-order execution. It supports precise exceptions, branch prediction, and employs register-renaming. The instruction and data caches are four-way set associative and are non-blocking [3]. The MMU provides the important function of translating virtual addresses to physical addresses that reference HaL’s memory system [5].

The number of latent defects is often a function of the completeness of the architectural specification and the complexity of its implementation. The quality of the verification process is reflected both in the fraction of latent defects that are detected (test coverage) and in the mean-time-to-detect. This mean-time-to-detect is also reduced by using faster simulation tools. These are the factors that control the cost and quality of designs.

The use of simulation has emerged in the past decade as a key method for verifying large designs. Typically, simulation is several orders of magnitude slower than the real hardware. This difference in speed puts a limit on what can be realistically verified by simulation. Simulation was a major aspect of our verification process, but we also paid attention to other areas relating to test coverage,

The major theme of this paper is to demonstrate how these challenges were addressed for the advanced 64-bit microprocessor development at HaL Computer Systems.

0731-3071/95 $4.00 © 1995 IEEE


been implemented to make defect isolation efficient. Protocol violation checkers and assertion checkers are some of the tools that implement these mechanisms.

mean-time-to-detect defects, design defect analysis, and tools that enhance productivity of the verification process.

2 Objective and Methodology

The difficulty of analyzing test failures increases with the size of the design. Smaller designs are easier to analyze and also have less impact on simulation speed; in contrast, larger designs not only impact simulation speed but may also require more analysis for defect isolation. To address this issue, a hierarchical approach for verifying the design was adopted (Figure 3). Logic verification was carried out at the block level, unit level, chip level, and processor level. Simulations have been carried out at the system level with the processor, memory subsystem, and input/output subsystem to verify the processor operation in a complete system.

To verify the design of HaL's Sparc64 processor, the verification strategy needed to address the following objectives:

1. Detect all design defects in simulation.
2. Quickly discover and isolate defects.
3. Optimally manage simulation resources.

The following sections detail how these objectives were addressed and describe some of the challenges that needed to be met.

2.1 Exposing Defects


The test space of processor design is so huge that it is very difficult to completely specify it (Figure 2). As a result, directed testing (special hand-coded test cases) alone would not be sufficient to find all the design defects. On the other hand, doing exhaustive random testing is not realistic, since it would require a tremendous amount of simulation resources.

[Figure 3. Verification hierarchy: system, processor, CPU]

2.3 Utilization of Simulation Resources

A number of steps were taken to maximize the utilization of simulation resources. A network load distribution software system was used to keep a pool of about a hundred dedicated simulation machines busy at all times. Special tools have been developed to use dedicated co-processor simulation engines called CoSims [7], which perform best when used to simulate a large number of clock cycles without any interruption.

Figure 2. Test Coverage

The following types of verification programs have been used to cover the test space as effectively as possible.

Some of the tests that required thousands of cycles to bring the processor to a particular state were converted to checkpoint-based tests to save simulation cycles.

1. Directed testing of all architectural and implementation-specific cases that could be specified was used as the principal source for detecting design defects. Coverage provided by these tests was improved by incrementally incorporating feedback from various types of coverage analysis schemes.
2. Random testing of the design has been used to complement the coverage obtained by directed testing. Several metrics, such as the number of cycles between test failures and the defect discovery rate, were used to bound this activity.
3. The kernel portion of the operating system and some software applications, such as compilers and database utilities, were simulated as part of the verification process. Unlike the random tests, these tests tend to exercise the processor in a realistic fashion.

3 Testcase development

Design verification of the processor requires running verification programs (also called testcases) through the processor model and comparing the results produced with the expected results. Testcase development is one of the major components of the verification process.

3.1 Functional Classification

Depending on their function, the testcases have been classified into one of the following three functional categories:

1. Architecture Verification Programs (AVPs) are a set of assembly language testcases that focus on validating the design for conformance with HaL's implementation of the SPARC-V9 architecture.

2.2 Defect Isolation

Quick isolation and repair of defects is a key factor in speeding up the design cycle. Several mechanisms have


2. Implementation Verification Programs (IVPs) focus on verifying the mechanisms employed in the processor micro-architecture. These programs are key to the test coverage, as they focus on exercising all the internal mechanisms in the design. A substantial amount of time was spent in identifying, developing, and maintaining this category of testcases.
3. Random Verification Programs (RVPs) are intended to create complex and hard-to-anticipate conditions in the design. AVPs and IVPs are primarily hand-coded. RVPs are created using random program generators, which are tuned to concentrate on specific aspects of the design.

if (&cpu_issued(&pc_at(loop))) { &gen_interrupt(level1); }
MACRO-END

In this example, an interrupt is issued to the processor when the CPU issues the instruction pointed to by the label loop. cpu_issued and gen_interrupt are library routines provided by the simulation environment.

This category of tests allowed us to develop testcases for mechanisms in the processor that are very difficult to access using just assembly language instructions.

4 Random Verification Program Generators

3.2 Testcase Categories

Multiple program generators, each adopting a different approach, were developed. Some of the techniques that we used yielded very good results, discovering defects caused by situations that were very hard to anticipate. They also gave us feedback on the coverage of the directed tests, which was used to augment the directed tests during later stages of the verification process.

The hierarchical approach to the design verification of Sparc64 required that tests be developed to run on design models at different levels of integration. Testcases have been categorized as follows.

3.2.1 Vector-based testcases

Vector-based testcases are IVPs, mainly used for functional verification at the unit level. These testcases apply input vectors to the logic blocks and check the responses after every simulation cycle. They verify that every individual unit has the required functionality before its integration into a higher-level model.

Some of the kinds of instruction streams used were:

- Random combinations of biased sequences of instructions, which were carefully hand-crafted by studying the internal details of the processor.
- Random opcodes with a mix of branches with random target addresses. The generation algorithm was designed to ensure that infinite loops are not generated.

3.2.2 Assembly language testcases

This category of testcase was developed using SPARC-V9 assembly instructions and consists of AVPs, RVPs, and some IVPs. A mechanism to derive assembly language testcases from higher-level programs, such as C, was developed to ease the testcase development effort. Testcases using this approach are called checkpoint-based tests. A program developed using a high-level language is run on the software behavioral simulator, halsim [6], until the point of interest is reached. A checkpoint containing the complete architectural state is then extracted. This state is transferred to the processor model being simulated.

- Random combinations of blocks of instructions that are specified by the user. Studies were done to extract compiler-generated sequences of instructions and use them as blocks.
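As an illustration of the no-infinite-loop constraint on random branch generation, the sketch below emits only forward branch targets, so the generated program always terminates. The opcode pool, register names, and bias parameter are simplified assumptions, not HaL's actual generator.

```python
import random

# Hypothetical opcode and register pools; the real generators were
# tuned to the internal details of the Sparc64 design.
ALU_OPS = ["add", "or", "xor", "sub"]
REGS = [f"%g{i}" for i in range(1, 8)]

def gen_stream(length, branch_bias=0.2, seed=None):
    """Generate a biased random instruction stream.

    Branches are only ever emitted with forward targets, so the
    generated program cannot loop forever -- one simple way to meet
    the no-infinite-loop requirement described in the text.
    """
    rng = random.Random(seed)
    stream = []
    for pc in range(length):
        if rng.random() < branch_bias and pc < length - 1:
            # Forward target strictly beyond the current instruction.
            target = rng.randint(pc + 1, length - 1)
            stream.append(f"ba L{target}")
        else:
            op = rng.choice(ALU_OPS)
            rs1, rs2, rd = (rng.choice(REGS) for _ in range(3))
            stream.append(f"{op} {rs1},{rs2},{rd}")
    return [f"L{i}: {insn}" for i, insn in enumerate(stream)]

for line in gen_stream(8, seed=42):
    print(line)
```

A production generator would also bias operand choices to provoke register-renaming and bypass conditions; this sketch only shows the termination guarantee.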

5 Result Checking

In order to achieve the highest confidence in the correctness of the state of the processor, a variety of checking mechanisms were used. Most of the checking was embedded into the simulation environment, which improved the efficiency of result checking.

The operating system kernel and other software applications were simulated using this approach. This technology allowed us to simulate a variety of programs, and was an important part of the verification process.
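A minimal sketch of the checkpoint-based flow described above, under the assumption that the behavioral simulator can export its complete architectural state and the detailed model can import it. The class and method names are illustrative, not HaL's tool interfaces.

```python
import copy

class FastSim:
    """Stand-in for a fast behavioral simulator such as halsim (hypothetical API)."""
    def __init__(self):
        self.state = {"pc": 0, "regs": [0] * 8}

    def run_until(self, pc_of_interest):
        # Advance cheaply; a real simulator executes instructions here.
        while self.state["pc"] < pc_of_interest:
            self.state["regs"][self.state["pc"] % 8] += 1
            self.state["pc"] += 1

    def checkpoint(self):
        # Extract the complete architectural state.
        return copy.deepcopy(self.state)

class DetailedModel:
    """Stand-in for the slow, detailed processor model (hypothetical API)."""
    def __init__(self):
        self.state = None

    def restore(self, chk):
        # Transfer the architectural state into the model.
        self.state = copy.deepcopy(chk)

fast = FastSim()
fast.run_until(10_000)            # thousands of cycles on the cheap simulator
model = DetailedModel()
model.restore(fast.checkpoint())  # detailed simulation starts here, not at reset
```

The saving comes from running the uninteresting prefix on the fast simulator and spending detailed-model cycles only after the point of interest.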

5.1 Self checking

Expected results were determined and compared with the observed results as a part of the testcase. Testcases that used this kind of result checking were targeted at very specific problems in the design, or at cases where comparison with a reference simulator is not possible.

3.2.3 Hybrid testcases

This category of tests consists of IVPs. The tests have a mix of assembly language instructions and vectors to access the internal state of the design, using libraries provided by the simulation environment.

5.2 Reference Checking

The following example is a portion of a hybrid testcase that verifies an internal mechanism of the CPU:

Reference checking is a generalized method of checking results at the architectural level. This mechanism, known as halsim-compare, involves running the testcase on a SPARC-V9 behavioral simulator, halsim, in parallel with

add %g0, 1, %g4
loop: or %g1, %g2, %g3
MACRO-START


the processor model, and comparing the architectural state extracted from both simulators at the end of every instruction committed [2] by the processor model.

to be very useful in identifying some of the interface problems in the design.
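A protocol violation checker of the kind just described can be sketched as a small state machine sampled every cycle. The request/grant interface, signal names, and violation rule below are invented for illustration; the actual Sparc64 interface protocols are not specified here.

```python
class HandshakeChecker:
    """Sketch of a protocol violation checker for a simple request/grant
    interface between two units: a grant must never appear without an
    outstanding request (interface and rule are hypothetical)."""
    def __init__(self):
        self.outstanding = 0
        self.violations = []

    def sample(self, cycle, req, gnt):
        # Called once per simulation cycle with the sampled signals.
        if gnt and self.outstanding == 0:
            self.violations.append(f"cycle {cycle}: grant without request")
        if req:
            self.outstanding += 1
        if gnt and self.outstanding > 0:
            self.outstanding -= 1

chk = HandshakeChecker()
trace = [(0, 1, 0), (1, 0, 1), (2, 0, 1)]   # (cycle, req, gnt) samples
for cycle, req, gnt in trace:
    chk.sample(cycle, req, gnt)
print(chk.violations)   # -> ['cycle 2: grant without request']
```

Because the check runs on interface signals rather than architectural results, it catches the stray grant in cycle 2 even though no architecturally visible state may be corrupted by it.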

5.4 Assertion Checkers

5.2.1 Extraction of Architectural State from the Processor Model

Extracting the architectural state of the Sparc64 processor, which has super-scalar, out-of-order, speculative instruction execution capabilities, is a complex task. To extract the values of all the registers each time an instruction completes, halsim-compare extracts the state of the CPU and cancels the effect of all the active instructions that were speculative.
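One plausible way to cancel speculative effects is for each in-flight instruction to record the value it overwrote, so that uncommitted writes can be undone in reverse order. The paper does not describe the extraction at this level of detail, so the scheme below is an assumption, sketched on a toy register file.

```python
class RollbackExtractor:
    """Sketch: recover the committed architectural state from a register
    file that already reflects speculative, uncommitted writes.  Each
    in-flight instruction records the value it overwrote, so its effect
    can be cancelled in reverse order (hypothetical scheme)."""
    def __init__(self, nregs=8):
        self.regs = [0] * nregs
        self.inflight = []            # (reg, old_value), oldest first

    def speculative_write(self, reg, value):
        self.inflight.append((reg, self.regs[reg]))
        self.regs[reg] = value

    def commit_oldest(self):
        self.inflight.pop(0)          # effect becomes architectural

    def architectural_state(self):
        snap = list(self.regs)
        for reg, old in reversed(self.inflight):
            snap[reg] = old           # cancel every uncommitted write
        return snap

cpu = RollbackExtractor()
cpu.speculative_write(1, 11)
cpu.commit_oldest()                   # reg 1 = 11 is now architectural
cpu.speculative_write(2, 22)          # still speculative
print(cpu.architectural_state())      # -> [0, 11, 0, 0, 0, 0, 0, 0]
```

The extracted state shows the committed value of register 1 but not the speculative value of register 2, which is the property halsim-compare needs at each commit point.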

Assertion checkers are a set of programs that look at the internal state of the processor every cycle and check that no violation of a design specification has occurred. When a violation is detected, they suspend execution of the testcase and report errors. These types of checkers were very effective in identifying some of the defects that caused state consistency problems in the design.
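A minimal cycle-by-cycle assertion checker might be structured as below; the invariant, signal names, and reporting style are illustrative, not taken from the Sparc64 environment.

```python
class AssertionChecker:
    """Sketch of a cycle-by-cycle assertion checker: each check is a
    predicate over the visible internal state, and a failing predicate
    stops the testcase and reports the cycle (names are illustrative)."""
    def __init__(self):
        self.checks = []

    def add(self, name, predicate):
        self.checks.append((name, predicate))

    def run_cycle(self, cycle, state):
        for name, pred in self.checks:
            if not pred(state):
                raise AssertionError(f"cycle {cycle}: violated '{name}'")

checker = AssertionChecker()
# Example invariant: the reorder buffer never holds more entries than
# it has slots (a plausible state-consistency check, not HaL's actual rule).
checker.add("rob_not_overfull", lambda s: s["rob_entries"] <= s["rob_size"])

state = {"rob_entries": 3, "rob_size": 8}
checker.run_cycle(0, state)        # passes silently
state["rob_entries"] = 9
try:
    checker.run_cycle(1, state)    # suspends the test with an error report
except AssertionError as e:
    print(e)
```

Raising at the violating cycle, rather than waiting for an architectural mismatch, is what makes such checkers effective at localizing state-consistency defects.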

6 Coverage Analysis

5.2.2 Signature Compare Mechanism

Signatures for the processor architectural state and the instruction completion (commit) sequence are accumulated every quantum of cycles. Using this information, the test is run on halsim and signatures are generated in a similar way. These signatures are compared, and mismatches are flagged as errors (Figure 4).
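The signature idea can be sketched as folding each committed architectural state into a running hash and emitting one signature per quantum. The hash choice and state encoding below are assumptions, not the paper's actual scheme.

```python
import hashlib

def quantum_signatures(commit_stream, quantum):
    """Fold the architectural state after each committed instruction into
    a running hash, emitting one signature every `quantum` commits -- a
    sketch of a signature-compare mechanism (details assumed)."""
    sigs, h = [], hashlib.sha256()
    for i, state in enumerate(commit_stream, start=1):
        h.update(repr(state).encode())
        if i % quantum == 0:
            sigs.append(h.hexdigest())
    return sigs

# Identical committed state on both simulators => identical signatures.
model_states  = [{"pc": i, "g1": i * 2} for i in range(8)]
halsim_states = [{"pc": i, "g1": i * 2} for i in range(8)]
assert quantum_signatures(model_states, 4) == quantum_signatures(halsim_states, 4)

# A single divergent commit flips every signature from that quantum on.
halsim_states[5]["g1"] = 99
a = quantum_signatures(model_states, 4)
b = quantum_signatures(halsim_states, 4)
print([x == y for x, y in zip(a, b)])   # -> [True, False]
```

Comparing short signatures per quantum, instead of full state per instruction, keeps long accelerator runs uninterrupted while still bounding where a divergence occurred.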

Achieving complete test coverage is the ultimate goal of any design verification process, but quantifying this is a difficult problem. We relied to a large extent on manual analysis and feedback to enhance the coverage of the verification programs. Specifications for the testcases were generated by enumerating different scenarios, detailing all the conditions that needed to be tested. Periodic peer review was carried out to increase the verification coverage. In certain cases, to make sure that the testcases adhered to the specification, traces of simulation were recorded and post-processed by tools which identify the events that occurred due to the application of the testcases.


Events that occurred at the functional interfaces were recorded during simulation using event loggers. Matrices of the different possible combinations of events were generated manually, and combinations not exercised by the testcases were identified using the recorded data. A high degree of coverage for interface protocols and state machines was achieved using this scheme.
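The event-matrix analysis can be sketched as a set difference between the full cross product of interface events and the combinations actually logged. The event names below are invented for illustration.

```python
from itertools import product

def unexercised(combos_seen, dims):
    """Build the cross-product matrix of possible event combinations and
    return those never observed in the recorded simulation traces."""
    full = set(product(*dims))
    return sorted(full - combos_seen)

# Hypothetical interface events for one functional boundary.
bus_events   = ["read", "write"]
cache_events = ["hit", "miss"]

# Combinations logged by the event loggers during simulation.
seen = {("read", "hit"), ("read", "miss"), ("write", "hit")}
holes = unexercised(seen, [bus_events, cache_events])
print(holes)   # -> [('write', 'miss')]: a case the testcases never hit
```

Each reported hole is a concrete scenario to add to the directed tests, which matches the feedback loop the text describes.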


Verification program coverage of the design was improved by incorporating the feedback from these analyses.

Figure 4. Signature Compare Mechanism

7 Simulation environment

Asynchronous CPU interrupts are handled by passing the interrupt to halsim. The halsim-compare mechanism provided a comprehensive result-checking strategy for verification. It also enabled us to increase the number of available CoSim accelerator simulation cycles.

The simulation environment for this project had different functional layers, with testcases being applied to the outermost layer. One of the unique features of this environment is that it provides an identical interface to both the simulator and the hardware. This enabled us to re-use most of the verification programs that were developed during the simulation phase of the verification process to verify the actual hardware when it became available.

5.3 Protocol Violation Checkers

Interface protocols between chips and functional units are well specified. It is important that these protocols are never violated. Since not all protocol violations show up as problems at the architectural level, result checking at that level alone would be ineffective. A set of protocol violation checkers was developed; these look at the protocols between functional units every simulation cycle and flag errors whenever a violation occurs. These checkers proved

Tools that monitor the pipeline stages, probes for the internal nets, and event loggers that provide high visibility into the design during simulation are an integral part of the simulation environment. Assertion checkers and halsim-compare are also part of this environment.


8 Design Defect Tracking and Analysis

After a certain point, simulation proves to be an inefficient means of discovering defects. At this point, the design can be fabricated, and verification effort migrates to the real hardware.

Design defects identified during different stages of the verification process were tracked, analyzed, and used as leading indicators of the stability of the design. Information provided by these indicators was used to dynamically refine development strategies, in certain cases.

For the Sparc64 project, a total of 3 billion cycles were simulated on gate-level workstation models comprising almost twenty million transistors. These simulated tests would have taken 1.8 seconds of execution time on the manufactured hardware running at 154 MHz.

8.1 Defect Discovery Rate

The defect discovery rate for the Sparc64 processor, sampled over a period, is shown in Figure 5. Finding the last few defects takes substantial time and resources. Note that this plot represents only a sampling of defects found.


Figure 6. Usage of simulation cycles

8.3 Defect Discovery by Different Types of Testing

Figure 5. Defect discovery rate

Design defects have been categorized by the type of testing that discovered them. We observed that 92% of all the defects were found using directed testing, consisting of hand-coded tests. This shows that directed testing is the most effective way of providing high test coverage. However, directed testing alone is not sufficient to identify all the design defects.

Finding the last few defects is usually the stage where random testing is in effect. It must be noted that determining the manufacturable quality of a design with respect to design defects is not a simple problem, because there is a lack of a priori knowledge of the number of latent defects. Another factor that compounds this problem is that the defect discovery rate must drop to zero and remain there for a long observation period, which is problematic under time-to-market pressures.

8.4 Analysis of Defects Found by Random Testing

A detailed analysis of all the defects found by random tests was performed. They fall into the following categories:

In our experience, some of the spikes in the defect discovery curve are due to the late changes in the architectural specification. One could infer from this that to cut down on the design cycle, design specifications have to be completed early on and should not undergo any major changes during the later stages of the project.

1. Deficiencies in directed testing, due to defects in the testcases.
2. Ambiguous design specification.
3. Complex scenarios that cannot be specified during testcase enumeration.

Our analysis showed that approximately 47% of the defects found by random programs were due to ambiguities in the design specification. Roughly 27% of the design defects escaped detection by the directed tests due to erroneous testcases. The remaining defects were caused by very complex cases which occur only if the processor runs for an extended period of time.

8.2 Simulation Cycles

The utilization of simulation cycles during the design cycle is shown in Figure 6. Although the plot represents simulation cycles versus time, each point on the x-axis represents a multiplicity of tests that cover different functional attributes. The simulation resource requirement steadily increases as the stability of the design improves. The requirement peaks during the final stages of the process, as longer simulations are required to uncover each defect.

This analysis shows that the design verification process currently employed in the industry requires much human


involvement, and has some exposure to human errors. This also makes the design and verification process expensive.

experience that design verification takes a major fraction of the design development process, both in terms of personnel and computational resources. The design defect classification study shows that a majority of design defects in these complex systems are algorithmic, and therefore need special methods to guarantee adequate detection. We have seen that design quality and verification methodology are key factors that determine microprocessor development cost and time.

8.5 Defect Classification

A study was done to examine the nature of the defects found. This study revealed that the majority of defects found in the MCM system were algorithmic (Figure 7). Defects that were due to incorrect or incomplete implementation of the specification, incorrect interface protocol implementation, or misinterpretation of the specification were classified as algorithmic defects. Other defects were due to incorrect boolean logic (logic defects) in the design description language, or to typographical errors (typos). There were also many invalid defects, caused by verification programs that raised false alarms. Figure 7 is a sample cumulative plot of the classified defects. The x-axis of the graph is indexed by defect identification number. Defect identification numbers are ordered by detection time; however, any two successive defects could have varying elapsed times between detections.


10 Acknowledgments

The authors are grateful to Nabil Masri, James Katz, Amro Sirhan, Bharat Baliga-Savel, Mohsin Ali, Adel Alsaadi, Suresh Thirumalaiswamy, Nandini Akula and Srinivas Kondapalli for their contributions to this project. We would also like to thank Marc Spaulding, Rajeev Bharadwaj and Don Hudson for their involvement in the tool development. We are grateful to John Gmuender for his input on methodologies, and to Ushir Shah and his group for their support of the design automation tools. We appreciate the assistance from Takeshi Ibusuki and Hideki Adachi. We are grateful to Richard Simone for providing the leadership. Thanks are due to Peter Freier for reviewing the paper.


11 References


1. Wilcke, W., "Architectural Overview of HaL Systems," Proc. IEEE Compcon-95, March 1995.
2. Shen, G., N. Patkar, et al., "A 64b 4-Issue Out-of-Order Execution RISC Processor," Proc. Intl. Symp. on Solid State Circuits, Feb. 1995.
3. Hennessy, J. L., and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, San Mateo, California, 1990.
4. Weaver, D. L., and T. Germond, The SPARC Architecture Manual, Version 9, Prentice Hall, 1993.
5. Saxena, N. R., D. Chang, et al., "Fault-Tolerant Features in the HaL Memory Management Unit," IEEE Transactions on Computers, Special Issue on Fault-Tolerant Computing, Feb. 1995.
6. Barach, D., I. Kohli, et al., "HALSIM - A Very Fast SPARC-V9 Behavioral Model," Proc. MASCOTS-95, Jan. 1995.
7. AIDA Reference Manual, Vol. 2, Teradyne Inc., 1990.


Figure 7. Cumulative Defect Classification

The purpose of Figure 7 is to demonstrate that the majority of design defects are algorithmic. One of the ways to address the problem of algorithmic defects is to create a repository of design verification tests that almost exhaustively cover the architectural specifications. A manual enumeration of all functional attributes of a specification is bound to be error-prone. In the verification development process several tools (random program generators) were developed that helped generate most of the architectural specification test cases automatically. More work needs to be done in this area.

9 Conclusions

This paper has presented the verification methodology for a commercial processor development project that spanned approximately two and a half years. It is evident from our

