The Software/Hardware Co-Debug Environment with Emulator

Baodong Yu, Xuecheng Zou
Department of Electronic Science and Technology
Huazhong University of Science and Technology, Wuhan, China
Abstract
Debugging the software and hardware of an SoC is challenging because neither the software nor the hardware can be assumed to be error-free. This paper presents a high-speed, easy-to-use software/hardware co-debug environment that combines an emulator with a simulator and introduces a new software debug engine, a new bus status monitor, and a new checkpoint technique.
1. Introduction
An SoC design usually consists of a processor, memory that stores the software program, and other custom logic. To verify both the software and the hardware as early as possible, a software/hardware co-debug environment is required. Work on JHDL debug environments is reported in [1]-[3]. These works provide good observability for hardware debugging, but they target JHDL designs and lack a software debug engine. Extensions of JTAG are used in [4]-[6] to debug hardware or software. However, JTAG bandwidth is limited, so only a small amount of information can be transferred between the host and the emulator. [5] analyzes the main debugging requirements of processor-based applications and presents a Data Bus Monitor (DBM) that collects the status of the data bus. [6] provides a method for debugging software through the JTAG interface. [7] reports a debug engine structure that captures the key registers of the processor, but it does not present a method for debugging the processor itself. In this paper, we present a universal debug environment for both software and hardware, covering the processor as well as the other custom logic.
Software/hardware co-debug faces three challenges. First, a simulator is too slow for software debugging, while an emulation accelerator provides only very limited observability and controllability. Second, the processor itself may not be error-free, so when an error is encountered, not only the software but also the processor and the peripherals must be examined. Third, SoC designs are usually very large, so both the emulator and the simulator need high bandwidth and high capacity.

Our software debug engine can debug software at full emulation speed and can also be used to debug hardware logic. By combining hardware checkpoints, scan chains, and HDL simulations running in parallel, we achieve high speed together with high observability and controllability. With the transaction creation and detection functions of the bus status monitor, software errors can be quickly distinguished from hardware errors. The result is a fast and effective debug environment with high observability and controllability for both software and hardware. The rest of the paper is organized as follows. Section 2 introduces the infrastructure for the software and hardware debug tools. Section 3 presents the universal debug environment and the debug method for hardware and software. Section 4 concludes the paper.
2. Infrastructure for Software/Hardware Co-Debug
With the new software debug engine and the bus status monitor, we build a high-speed, easy-to-use software debug tool. With the new checkpoint technique and an embedded logic analyzer, we build a high-speed, high-observability hardware debug tool.

A. Software Debug Engine
Proceedings of the 9th International Database Engineering & Application Symposium (IDEAS’05) 1098-8068/05 $20.00 © 2005 IEEE
The software debug engine provides controllability and observability for the software debug tool. Its structure is shown in Fig. 1. The command decoder unit (CMD Decoder) of the engine interprets macro debug commands arriving through the JTAG interface, while the instruction decoder unit (ID) of the processor interprets macro debug commands embedded in the program. The condition trigger unit supports single-step debugging and conditional breakpoints, and it issues control signals to the register control unit (REG Ctrl Unit) according to commands from either the CMD Decoder or the processor's instruction decoder. The register control unit captures the values of the key registers of the processor and sends them to the host through a run length encoder (RLE); it can also modify the key registers according to debug commands. In our design, the JTAG interface runs at a 30 MHz clock, giving a bandwidth of 2.0875 MB/s. For example, tracing 64 registers of 4 bytes each requires 256 bytes per cycle, so without compression 2.0875 MB/s divided by 256 bytes per cycle limits the trace rate to roughly 8K cycles/s. Table 1 compares the processor debugging speed of different methods.

Table 1. Comparison of the debug speed for the processor
Debug method                     Speed under debug
SW debug engine without RLE      8K cycles/s
SW debug engine with RLE         800K cycles/s
Seamless (co-simulation tool)    1K-5K cycles/s

B. Bus Status Monitor
The bus status monitor (BSM) monitors the status of the buses, including the system bus and the memory bus. The BSM not only collects the bus status and stores it in the two-port block RAM of the FPGA, but also constructs read and write transactions on the bus, so it can read data from or write data to the modules connected to the bus. The data in the block RAM can be read out to the host, and the read and write commands come from the host through the JTAG interface. Fig. 2 shows the BSM for the memory bus, which consists of the status collect unit, the JTAG interface, the memory access unit, and the transaction compression unit. The status collect unit captures the bus status on every clock. The memory access unit constructs read and write transactions for a specific module according to commands from the host. The transaction compression unit identifies read and write transactions on the bus and compresses each one into a single data object. Because a bus operation spanning several clock cycles is recorded as a single read or write transaction, the transaction compression unit greatly decreases the bandwidth required by the BSM. Table 2 compares the compression ratio of the BSM under different compression types. The memory also supports burst operations. In Table 2, 'I' indicates that address and data use independent wires, and 'S' indicates that address and data share the same wires. Both the data bus and the address bus are 32 bits wide.
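The transaction compression step can be illustrated with a small behavioural model. The sketch below is not the authors' hardware: it assumes a hypothetical bus where each clock either carries one data beat (`valid`) or is idle, and where a burst is a run of beats with the same direction and addresses incrementing by one 32-bit word. The names `BusSample`, `Transaction`, and `compress` are illustrative only, and only the independent-wires ('I') case is modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class BusSample:
    """One clock of a hypothetical memory bus, as recorded by the status collect unit."""
    valid: bool  # a data beat is transferred on this clock
    write: bool  # True for a write, False for a read
    addr: int
    data: int


@dataclass
class Transaction:
    """One compressed object: a whole read or write (possibly a burst)."""
    write: bool
    start_addr: int
    data: List[int] = field(default_factory=list)


def compress(samples: List[BusSample], word_bytes: int = 4) -> List[Transaction]:
    """Merge consecutive beats with the same direction and incrementing
    addresses (assumed burst rule) into single transactions.
    Idle clocks end the current transaction and are not stored."""
    out: List[Transaction] = []
    cur: Optional[Transaction] = None
    for s in samples:
        if not s.valid:
            cur = None
            continue
        if (cur is not None and s.write == cur.write
                and s.addr == cur.start_addr + word_bytes * len(cur.data)):
            cur.data.append(s.data)  # the burst continues
        else:
            cur = Transaction(s.write, s.addr, [s.data])  # a new transaction starts
            out.append(cur)
    return out
```

A three-beat write burst followed by an idle clock and two unrelated reads thus becomes three records instead of six per-clock samples, which is the kind of saving Table 2 quantifies for the real bus.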
Fig. 1. Software debug engine (blocks shown: JTAG I/F, RLE, CMD Decoder, Condition Trigger Unit, and Reg Ctrl Unit in the SW debug engine; IF, ID, and Key Reg in the CPU; Other Logic)

Table 2. Compression ratio of different compression types
Compression type           I       S
Without compression        1       1
Run length compression     0.7     0.65
Transaction compression    0.46    0.51
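Returning to the software debug engine of Fig. 1, the decision made by the condition trigger unit on each cycle can be modelled in a few lines. This is a hypothetical behavioural sketch, not the authors' RTL: the class `ConditionTrigger`, the command names `"step"`, `"run"`, and `"break"`, and the representation of a breakpoint condition as a predicate over a dictionary of key registers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]  # hypothetical snapshot of the processor's key registers
Condition = Callable[[Regs], bool]


@dataclass
class ConditionTrigger:
    """Behavioural model (hypothetical) of the condition trigger unit."""
    single_step: bool = False
    breakpoints: List[Condition] = field(default_factory=list)

    def command(self, kind: str, cond: Optional[Condition] = None) -> None:
        """Apply a debug command, whether it came from the CMD Decoder (JTAG)
        or from a debug instruction embedded in the program."""
        if kind == "step":
            self.single_step = True
        elif kind == "run":
            self.single_step = False
        elif kind == "break":
            if cond is None:
                raise ValueError("a conditional breakpoint needs a condition")
            self.breakpoints.append(cond)
        else:
            raise ValueError(f"unknown debug command: {kind}")

    def should_halt(self, regs: Regs) -> bool:
        """True if the processor must stop after this cycle; in hardware this
        would signal the register control unit to capture the key registers."""
        return self.single_step or any(cond(regs) for cond in self.breakpoints)


def first_halt(trigger: ConditionTrigger, trace: List[Regs]) -> Optional[int]:
    """Return the first cycle of a register trace at which the trigger fires."""
    for cycle, regs in enumerate(trace):
        if trigger.should_halt(regs):
            return cycle
    return None
```

For instance, after `command("break", lambda r: r["pc"] == 0x10)`, a trace whose `pc` advances by 4 per cycle from 0 halts at cycle 4.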
Fig. 2. The BSM for the memory bus (blocks shown: status collect unit, transaction compression, memory access unit, JTAG I/F)

C. Infrastructure for Hardware Debug
The processor itself and the other custom logic must also be verified during debugging. Snapshot techniques are used in [8] and [9] to obtain the full state of the hardware at a specified time. In [8], readback is used to obtain the complete hardware state. Reading the configuration bitstream out of the FPGA is very slow, because the bitstream is large and the JTAG bandwidth is low, and extracting the required signals from the bitstream is also time-consuming. In [9], scan chains are applied to the whole FPGA. To obtain a signal's value at an arbitrary time, the whole design must be dumped at each checkpoint and simulated from the previous checkpoint, which is usually slow and requires a large amount of memory. To decrease the simulation time and the required memory, this paper uses a divide-and-conquer strategy for checkpoints. The whole design is divided into many sub-modules, and only the state elements of the sub-modules of interest are captured at a checkpoint, which greatly decreases the dumping time. Because combinational signals can easily be derived from the state elements together with the sub-module's inputs, only the state elements are needed to construct a snapshot of a sub-module. Scan chains are used to capture the state elements at the checkpoint. Since scanning out a conventional scan chain destroys its internal state, this paper uses a circular scan chain: its output data are not only sent to the host through the JTAG interface but also fed back into the scan input, so after a full rotation the chain again holds its original contents.
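The difference between a conventional and a circular scan chain can be shown with a bit-level model. In this sketch, which assumes a chain represented as a `deque` whose left end is the scan-in pin and whose right end is the scan-out pin, each shift sends one bit to the host; the circular version also loops that bit back to scan-in, while the conventional version is assumed to shift in zeros. Function names are hypothetical.

```python
from collections import deque
from typing import Deque, List


def circular_scan_out(chain: Deque[int]) -> List[int]:
    """Shift a circular scan chain once per state element.

    Each bit leaving the scan-out end goes to the host and is also fed back
    into the scan-in end, so after len(chain) shifts the chain holds its
    original contents and the design can resume."""
    captured: List[int] = []
    for _ in range(len(chain)):
        bit = chain.pop()       # bit leaving the scan-out pin
        captured.append(bit)    # sent to the host over JTAG
        chain.appendleft(bit)   # looped back into the scan-in pin
    return captured


def plain_scan_out(chain: Deque[int]) -> List[int]:
    """Conventional scan-out for comparison: scan-in is assumed to be driven
    with 0s, so the internal state is destroyed."""
    captured: List[int] = []
    for _ in range(len(chain)):
        captured.append(chain.pop())
        chain.appendleft(0)
    return captured
```

Both functions return the same captured bits; only the circular version leaves the state elements intact.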
Fig. 3 shows the scan chain manager unit. Because the circular scan chain is similar to a conventional DFT scan chain, commercial DFT tools can be used to generate the full scan chain; only the scan chain manager unit needs to be designed. The input signals of the sub-module are captured by the embedded logic analyzer (ELA) shown in Fig. 4. The ELA captures the data on every clock and stores them in the two-port block RAM of the FPGA. Because most input signals do not change on every clock, a run length encoder (RLE) can be used to compress the captured data. The maximum run length is 1024 in our experiment. At each active clock edge, the new data is captured and compared with the previous data. If they are the same, the run length is incremented by one; otherwise, the previous data and its run length are stored in the block RAM, and the new data is recorded. The RLE greatly compresses the captured data. Table 3 gives the compression ratios for the sub-modules of the column Reed-Solomon decoder used in DVD.
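The RLE procedure above translates directly into software. This is a minimal sketch, not the ELA hardware: the function names are hypothetical, samples are any comparable values, and we assume that a run reaching the maximum length of 1024 is flushed and a new run started, since the paper does not state what happens at that limit.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")
MAX_RUN = 1024  # maximum run length used in the paper's experiment


def rle_capture(samples: List[T], max_run: int = MAX_RUN) -> List[Tuple[T, int]]:
    """Encode per-clock samples as (value, run_length) records.

    On each clock the new sample is compared with the previous one: an equal
    value extends the run; a change, or a run reaching max_run (assumed flush
    rule), stores the previous value with its run length and starts a new run."""
    records: List[Tuple[T, int]] = []
    if not samples:
        return records
    prev, run = samples[0], 1
    for s in samples[1:]:
        if s == prev and run < max_run:
            run += 1
        else:
            records.append((prev, run))
            prev, run = s, 1
    records.append((prev, run))  # flush the final run when capture ends
    return records


def rle_expand(records: List[Tuple[T, int]]) -> List[T]:
    """Inverse of rle_capture, e.g. when replaying inputs in simulation."""
    return [value for value, run in records for _ in range(run)]
```

For example, 3000 clocks of a constant input followed by three changing values need only five records, with the constant stretch split at the 1024 limit.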
Fig. 3. Scanning out the state elements at a checkpoint
Fig. 4. The ELA with RLE capturing the inputs and outputs of a sub-module
Table 3. Compression ratio of the sub-modules of the column Reed-Solomon decoder
Sub-module       Number of input signals    Compression ratio
Syndrome         12                         0.7
Key equation     267                        0.013
Chien search     131                        0.017

With the state elements captured at the checkpoint and the input signals of the sub-module, the detailed signals of the sub-module can be obtained by simulating it from the previous checkpoint. Because only the sub-modules of interest are dumped and simulated, the dumping time, the simulation time, and the required memory are all greatly decreased.
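The replay step just described can be sketched as follows. This is an illustrative model under stated assumptions: a cycle-level Python function `step` stands in for the HDL simulator, the checkpoint state and the RLE-encoded inputs are plain dictionaries, and `accumulator_step` is a hypothetical toy sub-module (an 8-bit accumulator with an enable) chosen only to make the example concrete.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Inputs = Dict[str, int]
StepFn = Callable[[State, Inputs], Tuple[State, Dict[str, int]]]


def replay_from_checkpoint(state: State,
                           input_records: List[Tuple[Inputs, int]],
                           step: StepFn) -> List[Dict[str, int]]:
    """Rebuild the per-clock signal values of one sub-module.

    `state` holds the state elements scanned out at the checkpoint and
    `input_records` the ELA's run-length-encoded inputs captured from that
    checkpoint on. `step` returns (next_state, combinational_signals)."""
    waveform: List[Dict[str, int]] = []
    for inputs, run in input_records:
        for _ in range(run):
            nxt, comb = step(state, inputs)
            waveform.append({**state, **inputs, **comb})  # all signals this clock
            state = nxt
    return waveform


def accumulator_step(state: State, inputs: Inputs) -> Tuple[State, Dict[str, int]]:
    """Hypothetical toy sub-module: an 8-bit accumulator with an enable input."""
    total = (state["acc"] + inputs["din"]) & 0xFF  # combinational adder output
    nxt = {"acc": total if inputs["en"] else state["acc"]}
    return nxt, {"sum": total}
```

Starting from a checkpoint with `acc = 250` and three enabled clocks of `din = 3`, the replay shows the register wrapping from 253 to 0, a detail that would be invisible from the checkpoint alone.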
3. Universal Co-debug Environment
The debug environments for software and hardware are both based on the emulator, so the software and hardware execute synchronously on the same platform. When the user debugs the hardware, the executing software provides the stimulus for the hardware; conversely, when the user debugs the software, the hardware provides a high-speed execution platform. The software and hardware debug tools are tightly integrated.

A. Software Debug Environment
With the software debug engine, we can set breakpoints in the software program, execute single-step debugging, and view or modify the registers of the processor. The software debug environment is integrated with the hardware debug tool, so when the hardware is suspected, the hardware debug environment can be brought up immediately. First, the software symbol table is established. Debug commands can be embedded in the software program in advance or issued through the JTAG interface while the software is running. The software debug tool sets breakpoints and executes single-step debugging through the software debug engine in the emulator, and it views and modifies the key registers of the processor through the register control unit of the software debug engine. The memory map is established when the co-debug environment is initialized and is kept synchronized with the memory of the emulator by the BSM. The user can view or modify the memory through the memory access unit of the BSM, which constructs read or write transactions on the bus according to the software debug commands.

B. Hardware Debug Environment
By combining the high speed of the emulator with the high observability of the simulator, we provide an effective debug environment for the hardware. The SoC design is partitioned into many sub-modules, and only the sub-modules of interest need to be dumped and simulated. These simulations can run in parallel on different computers, each starting from the checkpoint of its own sub-module. The hardware debug tool therefore runs at emulator speed while retaining the high observability of a software simulator. Table 4 compares the run time and required memory of different debug methods when locating an error in the translation lookaside buffer (TLB) of the 32-bit REX CPU (100 MIPS) that appears at 1 s of execution.

Table 4. Comparison of the run time and required memory
Debug method                    Run time    Required memory (average)
Seamless                        2731 s      1300 MB
Whole design from checkpoint    276 s       1500 MB
Sub-design from checkpoint      113 s       173 MB
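Because each sub-module is replayed only from its own previous checkpoint, the replays are independent and can be dispatched in parallel. The sketch below illustrates that scheduling idea under explicit assumptions: `ReplayJob`, `previous_checkpoint`, and `run_replay` are hypothetical names, `run_replay` is a placeholder that only reports the restart cycle and the number of cycles to simulate instead of launching a real HDL simulator, and Python's `ThreadPoolExecutor` stands in for sending jobs to different computers.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ReplayJob:
    """Hypothetical description of one sub-module replay."""
    name: str
    checkpoints: List[int]  # sorted cycles at which this sub-module was scanned out
    target: int             # cycle of interest, e.g. where the error is observed


def previous_checkpoint(checkpoints: List[int], target: int) -> int:
    """Latest checkpoint at or before `target`; simulation restarts from here."""
    i = bisect.bisect_right(checkpoints, target)
    if i == 0:
        raise ValueError("no checkpoint precedes the target cycle")
    return checkpoints[i - 1]


def run_replay(job: ReplayJob) -> Tuple[int, int]:
    # Placeholder for simulating one sub-module: returns the restart cycle
    # and the number of cycles that would be simulated from it.
    start = previous_checkpoint(job.checkpoints, job.target)
    return start, job.target - start


def replay_all(jobs: List[ReplayJob], workers: int = 4) -> Dict[str, Tuple[int, int]]:
    """Independent replays run concurrently; the thread pool stands in for
    dispatching each job to a different computer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip((j.name for j in jobs), pool.map(run_replay, jobs)))
```

The cost of each job depends only on the distance to its own previous checkpoint and on the size of its sub-module, which is consistent with the lower run time and memory of the sub-design row in Table 4.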
C. Co-Debug of Hardware and Software
Because we cannot confirm that the processor and the other custom logic are error-free, the SoC design requires a co-debug method. Since the software and the hardware both run on the same emulator, they are inherently synchronized. The software debug tool and the hardware debug tool can easily exchange data through a single co-debug kernel.
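One way the BSM's transaction detection and construction could help separate software errors from hardware errors is sketched below. The triage rule is our assumption, not a procedure stated in the paper: for each write the program is expected to perform, check whether it appeared in the BSM's captured transactions, and if so, read the location back with a BSM-constructed read that bypasses the processor. All names are hypothetical, and the sketch assumes each address is written once.

```python
from typing import Callable, Dict, List, Tuple

Write = Tuple[int, int]  # (address, data) of a bus write


def classify_fault(intended: List[Write],
                   observed_writes: List[Write],
                   bsm_read: Callable[[int], int]) -> Dict[int, str]:
    """Hypothetical triage rule, one verdict per intended write:

    - 'software/processor': the intended write never appeared on the bus;
    - 'memory/peripheral': it appeared, but a BSM-constructed read, which
      bypasses the processor, returns different data;
    - 'ok': it appeared and reads back correctly."""
    seen = set(observed_writes)  # writes captured by the BSM status collect unit
    verdict: Dict[int, str] = {}
    for addr, data in intended:
        if (addr, data) not in seen:
            verdict[addr] = "software/processor"
        elif bsm_read(addr) != data:
            verdict[addr] = "memory/peripheral"
        else:
            verdict[addr] = "ok"
    return verdict
```

Here `bsm_read` would be backed by the memory access unit through JTAG; in a test it can simply be a dictionary lookup.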
4. Conclusion
With the software debug engine and the BSM in the emulator, the software debug tool, as part of the co-debug environment, provides software engineers with a high-speed, easy-to-use debug environment. At the same time, with the new checkpoint technique and the run length encoder, the hardware debug tool provides a high-speed, high-observability hardware debug environment with low memory consumption. Because the hardware and software debug tools are based on a single kernel, they are integrated and natively synchronized.
5. References
[1] B. L. Hutchings and B. E. Nelson, "Unifying simulation and execution in a design environment for FPGA systems," IEEE Transactions on VLSI Systems, vol. 9, no. 1, Feb. 2001, pp. 201-205.
[2] B. Hutchings and B. Nelson, "Developing and debugging FPGA applications in hardware with JHDL," Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers, vol. 1, 24-27 Oct. 1999, pp. 554-558.
[3] E. Roesler and B. Nelson, "Debug methods for hybrid CPU/FPGA systems," Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology (FPT), 16-18 Dec. 2002, pp. 243-250.
[4] K. D. Maier, "On-chip debug support for embedded Systems-on-Chip," Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), vol. 5, 25-28 May 2003, pp. 565-568.
[5] G. R. Alves and J. M. M. Ferreira, "From design-for-test to design-for-debug-and-test: analysis of requirements and limitations for 1149.1," Proceedings of the 17th IEEE VLSI Test Symposium, 25-29 Apr. 1999, pp. 473-480.
[6] Dae-Young Jung, Sung-Ho Kwak, and Moon-Key Lee, "Reusable embedded debugger for 32 bit RISC processor using the JTAG boundary scan architecture," Proceedings of the 2002 IEEE Asia-Pacific Conference on ASIC, 6-8 Aug. 2002, pp. 209-212.
[7] Liu Jianhua, Zhu Ming, Bian Jinian, and Xue Hongxi, "A debug sub-system for embedded-system co-verification," Proceedings of the 4th International Conference on ASIC, 23-25 Oct. 2001, pp. 777-780.
[8] K. A. Tomko and A. Tiwari, "Hardware/software codebugging for reconfigurable computing," Proceedings of the IEEE International High-Level Design Validation and Test Workshop, 8-10 Nov. 2000, pp. 59-63.
[9] Zan Yang, Byeong Min, and Gwan Choi, "Si-emulation: system verification using simulation and emulation," Proceedings of the International Test Conference, 3-5 Oct. 2000, pp. 160-169.