ON-CHIP DEBUG SUPPORT FOR EMBEDDED SYSTEMS-ON-CHIP K.D. Maier 1
University of Kent, Department of Electronics, Canterbury, CT2 7NT, UK 2 Infineon Technologies AG, Cores and Modules, Munich, Germany
ABSTRACT
This paper presents an on-chip hardware architecture to support application software development for embedded Systems-on-Chips (SoC). This architecture provides debug support for one manufacturer's complete SoC platform. This includes a significant number of 16-bit and 32-bit microcontroller and DSP cores, spanning a multitude of application-specific systems ranging from wireless systems to engine controllers. The debug support architecture is modular and can be divided into three main components: i) processor-specific debug resources, ii) a serial communication interface to connect the SoC with a debug host computer system and iii) a number of interconnection links to communicate between the serial communication interface and the processor debug resources. This modular approach allows the debug support architecture to be adapted to almost any system configuration (including multiple processor cores), while keeping the impact on silicon real estate for production devices at a minimum.
1. INTRODUCTION
More and more embedded electronic systems consist of just one highly integrated microchip. Due to higher levels of integration technology, these Systems-on-a-Chip (SoC) are taking the place of systems consisting of a number of components hosted on a printed circuit board (PCB) [2]. The increasing level of integration offered by SoCs not only saves cost over PCB-based solutions but also promises a wide range of new possibilities, especially for embedded systems that face hard real-time requirements, such as portable multi-purpose appliances and high-performance embedded computing.
Highly integrated SoCs also result in novel challenges for the development of application software. With PCBs, software development is greatly aided by the possibility of observing internal nodes simply by attaching a logic analyser or oscilloscope to the interconnection structure between the components. Internal states of more complex components such as processors can be observed by replacing the standard production processor with a special debug and emulation version (Bond-Out device) that provides visibility of internal states, records trace data on bus activities and watches for particular events in the processor. The Bond-Out device is connected to a host desktop system and is managed by the software running on this host [12]. Alternatively, a 'monitor program' can be included in the processor's application software. This monitor observes internal system states and communicates them to the outside.

As a SoC is integrated on one single chip, there is no straightforward way to access internal nodes: previously observable interconnection structures and component boundaries are now chip-internal and cannot be read using oscilloscopes or logic analysers. Furthermore, processor cores can no longer be swapped with special debug and emulation hardware. Monitor-based approaches can still be used for processor cores, but offer limited observability of system-internal nodes; in addition, communicating the core-internal states to the outside is intrusive for the system. One way to solve the problem was to develop Bond-Out Emulation (BO) devices of the production SoC. These are altered versions of the whole SoC that have a much higher number of pins than the production device. The additional pins can be used to observe internal nodes of the SoC that would otherwise remain hidden and offer debugging support for the software developer. While these devices allow higher controllability, they suffer from the fact that there are physical differences between a BO and a production device. Particularly with increasing pin counts, higher numbers of gates and higher operating frequencies, these approaches become less and less useful.
The production device itself can have some dedicated circuits for on-chip debug support (OCDS) that can be used to gain access to some of the processors' internal nodes. These devices use a serial debug interface to communicate between the SoC and the host computer of the developer. A dedicated debug interface can be used for this purpose [1, 10]. Alternatively, the IEEE JTAG [6] interface, which is otherwise only used for scan-chain based functional tests during production, can be reused and does not require further ports to be added to the SoC [4, 5]. These debug and emulation resources typically offer features like stopping the processor when certain data addresses are accessed or certain instructions are fetched [13]. The program counter state of the processor can also be traced.

While hardware/software co-design methods significantly aid the overall development process [3, 7, 9], the gap between simulated hardware behavior and the effective behavior of the manufactured silicon keeps widening due to higher clock frequencies and increasing levels of integration as a result of the continuous advancement of semiconductor technology. This is critical in embedded systems, which are typically required to work at maximum performance within the technology used while maintaining real-time performance in sometimes extreme environments. These challenges underline the fact that effective on-chip debug support is required to provide:
0-7803-7761-3/03/$17.00 ©2003 IEEE
- Visibility of internal system state in the SoC. This should not be limited to one processor only but provide access to all processors in the system. Access to other complex peripheral modules should be possible.
- Easy adaptation for application-specific SoCs: the debug support should be generic and easily adaptable to different system architectures.
- Limited cost impact on the SoC: the debug features are present in every produced SoC and therefore affect the silicon real estate significantly.

The importance of effective on-chip debug support is shown by the fact that approximately half of the total system development effort is devoted to verification tasks after first silicon has been returned from the foundry [11]. This also includes the debugging of the application software.

This paper presents the on-chip debug support for one semiconductor manufacturer's complete SoC platform. The debug support has been developed to support a significant number of 16-bit and 32-bit microcontroller and DSP cores, spanning a multitude of possible application-specific system designs ranging from wireless systems to engine controllers. At the same time, the hardware effort for this debug support had to be well balanced between offering maximum flexibility and keeping the impact on production devices at a minimum.
2. DEBUG SUPPORT ARCHITECTURE
Supporting application-specific SoCs that can consist of a number of diverse processor cores and peripheral modules requires a debug support architecture that can be scaled to any system configuration. Besides this scalability, it also has to be flexible enough to be adapted to the specific needs of the application software development. To achieve this, a modular approach has been followed that divides the debug support architecture into three complementary parts: an extended JTAG module that provides the communication interface between the SoC and a debug host computer; two types of modules that connect the debug interface to processors, either via high-speed busses or directly to a processor where busses are not available; and lastly processor-specific on-chip debug support (OCDS) modules that control the actual processor-specific debug resources.
2.1 The Extended JTAG Module
The JTAG (IEEE 1149 standard) [6] port was originally developed to be a dedicated interface for boundary scan and chip-internal tests during chip production. As neither of these applications is used during normal operation of the system, the JTAG port can be reused as a serial interface for chip debug. To be used in application-specific SoCs with diverse architectures using 16-bit or 32-bit microcontroller and DSP cores, a structure had to be found that provides the same generic interface for any architecture. Reusing the JTAG port introduces constraints: chip tests and boundary scans still need to be functional and standard compliant. For this reason the JTAG module holds a JTAG core that is fully compliant with the standard when communicating with the host computer system. To allow chip-internal communication with several processor cores, the JTAG module was extended with specific communication instructions. A number of IO client modules can be attached to the extended JTAG module (see Figure 1). The standard JTAG signals are modified to extended (JTAG+) signals that allow the host to select, configure, and read/write data from an IO client module. This forms an arbitration mechanism to share the same physical JTAG port between several IO clients and debuggers.
Figure 1: JTAG Module and IO Clients. [Block diagram: the JTAG Module connected to IO Client 0, IO Client 1, ..., IO Client N.]
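The sharing of one physical port between several IO clients can be illustrated with a small host-side model. This is only a sketch under assumed names: `IOClient`, `ExtendedJTAGModule` and the `select`, `configure`, `write` and `read` methods are hypothetical stand-ins for the JTAG+ operations, whose actual signal-level encoding is not described here.

```python
class IOClient:
    """Hypothetical IO client: an endpoint with a config word and a data store."""

    def __init__(self, name):
        self.name = name
        self.config = 0
        self.data = {}  # address -> value


class ExtendedJTAGModule:
    """Toy model of one physical port shared between several IO clients."""

    def __init__(self, clients):
        self.clients = list(clients)
        self.selected = None  # index of the client currently addressed

    def select(self, index):
        # Only one client is addressed at a time; this is the arbitration step.
        if not 0 <= index < len(self.clients):
            raise IndexError(f"no IO client {index}")
        self.selected = index

    def _current(self):
        if self.selected is None:
            raise RuntimeError("select an IO client first")
        return self.clients[self.selected]

    def configure(self, value):
        self._current().config = value

    def write(self, address, value):
        self._current().data[address] = value

    def read(self, address):
        # Unwritten addresses read back as 0 (an assumption of this model).
        return self._current().data.get(address, 0)


jtag = ExtendedJTAGModule([IOClient("client0"), IOClient("client1")])
jtag.select(0)
jtag.write(0x10, 0xAB)
jtag.select(1)
jtag.write(0x10, 0xCD)
jtag.select(0)
value_from_client0 = jtag.read(0x10)  # unaffected by the write to client 1
```

In this model every access goes through the currently selected client, so two debuggers addressing different clients cannot disturb each other's data as long as each selects its client before an access.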
2.2 IO Clients
The normal operation of any IO client is to configure debug resources and to read out internal states. The debug resources are accessed using memory-mapped registers, which requires reading from and writing to memory locations. This normal mode of operation is the Read/Write (RW) mode, in which the access is controlled by the debug host system that communicates with the JTAG module. Debugging using monitor programs should also be possible, to provide an alternative to using processor debug resources. This is achieved by a second mode: the Communication / Monitor mode. The host system is also the master in this configuration (due to the JTAG standard requirements): communication takes place by the debug host polling the IO client for debug data.

As indicated above, two types of IO clients are available. Where a high-speed multi-master bus (in this case the proprietary FPI bus) is available, the bus IO client can access the OCDS of processors attached to this bus. By using the FPI bus, a number of processors can be accessed from the same IO client. Without an FPI bus, a processor-specific IO client has to be used for each processor's OCDS.
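The difference between the two modes can be sketched as follows. All names (`Mode`, `IOClientModel`, `host_write`, `host_read`, `monitor_send`, `host_poll`) are hypothetical, and the strict separation of the two modes is an assumption made to keep the example small; the point is only that the host is the master in both cases and that monitor data reaches the host solely through polling.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    READ_WRITE = "rw"
    COMM_MONITOR = "monitor"


class IOClientModel:
    """Toy IO client with a host-driven RW mode and a polled monitor mode."""

    def __init__(self):
        self.mode = Mode.READ_WRITE
        self.registers = {}    # memory-mapped debug registers: address -> value
        self.outbox = deque()  # words queued by an on-chip monitor program

    def _require(self, mode):
        if self.mode is not mode:
            raise RuntimeError(f"operation needs mode {mode.name}")

    # RW mode: the debug host reads and writes memory-mapped locations.
    def host_write(self, address, value):
        self._require(Mode.READ_WRITE)
        self.registers[address] = value

    def host_read(self, address):
        self._require(Mode.READ_WRITE)
        return self.registers.get(address, 0)

    # Communication / Monitor mode: the monitor queues data, the host polls.
    def monitor_send(self, word):
        self._require(Mode.COMM_MONITOR)
        self.outbox.append(word)

    def host_poll(self):
        self._require(Mode.COMM_MONITOR)
        return self.outbox.popleft() if self.outbox else None


client = IOClientModel()
client.host_write(0xF000, 0x1)       # configure a debug resource
readback = client.host_read(0xF000)

client.mode = Mode.COMM_MONITOR
client.monitor_send(0x11)
client.monitor_send(0x22)
polled = [client.host_poll() for _ in range(3)]  # third poll finds nothing
```

A poll that finds no pending data returns `None` in this model, which mirrors the fact that the client can never push data on its own.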
2.2.1 Multi master bus IO client
Figure 2 shows the multi master bus IO client. The external debugger operates this IO client across the JTAG module. With this type of IO client, a single bus IO client can access the OCDS of any processor connected to the bus. This is done by a debug bus interface that can read from and write to the OCDS registers of processors connected to the bus. Furthermore, any memory location accessible through the bus can be read or written. Using the bus IO client could result in a significant load on the bus. This can be avoided by configuring the IO client to have a low bus access priority, so that real-time critical communication in the SoC is served with priority.
Figure 2: Multi Master Bus IO Client. [Block diagram labels: JTAG+ signals, IO Client, Debug Bus Interface.]
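The effect of giving the bus IO client a low bus access priority can be illustrated with a toy fixed-priority arbiter. This is a sketch only: the function `simulate_bus`, the master names and the convention that a lower number means a higher priority are assumptions, and the real FPI bus arbitration is not described here.

```python
def simulate_bus(pending, priority, cycles):
    """Grant the bus once per cycle to the ready master with the best priority.

    pending:  dict master -> number of transfers it still wants to make
    priority: dict master -> int, lower value wins (assumed convention)
    Returns the list of masters granted per cycle (None for an idle cycle).
    """
    pending = dict(pending)  # work on a copy
    grants = []
    for _ in range(cycles):
        ready = [m for m, n in pending.items() if n > 0]
        if not ready:
            grants.append(None)
            continue
        winner = min(ready, key=lambda m: priority[m])
        pending[winner] -= 1
        grants.append(winner)
    return grants


grants = simulate_bus(
    pending={"cpu": 2, "debug_io_client": 2},
    priority={"cpu": 0, "debug_io_client": 7},
    cycles=5,
)
# grants == ['cpu', 'cpu', 'debug_io_client', 'debug_io_client', None]
```

The debug accesses are only served once the higher-priority master has no pending transfers, so debug traffic slows down rather than the real-time traffic.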
2.2.2 CPU specific IO client
For processors not connected to a multi-master bus, a bus IO client cannot be used. In these cases a processor-specific IO client is required that allows direct communication between the OCDS and the host computer using the JTAG module (see Figure 3).

Figure 3: Processor Core specific IO Client. [Block diagram labels: JTAG+ signals, IO Client, Processor core, OCDS Module.]

2.3 OCDS Module
The actual processor debug functionality is specific to each respective core, and its development is part of the general processor core development. Separating the debug communication from the OCDS allows the debug resources to be tailored directly to the associated processor core. The functionality can be divided into two parts: 1) the observation of a debug event occurring and 2) the respective action taken according to the event that has occurred. The principal debug events are:
- Hardware trigger combination: Hardware triggers may be set as a combination of the active task (when using real-time operating systems), the instruction pointer, data addresses of reads and writes, as well as data values.
- Execution of a software break instruction: A software break instruction explicitly generates a debug event. A debugger can use this, for instance, to temporarily patch code held in RAM in order to implement breakpoints.
- Break pin input: An external debug break pin allows the debugger to asynchronously interrupt the processor.
Depending on the action configured for each debug event, the following actions can be taken:
1. Halt the processor: This action causes the system to suspend execution by halting the instruction flow. The halt mode can still be interrupted by higher-priority user interrupts. It then relies on the external debug system to interrogate the target purely through reading and updating through the debug port.
2. Start executing a monitor program: Another possible action is to cause the processor to call / branch to a monitor program.
3. Trigger a transfer: This action is used to transfer data blocks to the debug host system. The OCDS module allows data and instruction flow tracing.
4. Activate pin: An external pin can be activated to show that a specified debug event has occurred.
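The split of the OCDS functionality into observed debug events and configured actions (Section 2.3) can be sketched as a lookup table. The names `DebugEvent`, `DebugAction` and `OCDSModel` are hypothetical, and ignoring events that have no configured action is an assumption of this sketch.

```python
from enum import Enum, auto


class DebugEvent(Enum):
    HARDWARE_TRIGGER = auto()  # task / instruction pointer / data address / data value
    SOFTWARE_BREAK = auto()    # software break instruction executed
    BREAK_PIN = auto()         # external break pin asserted


class DebugAction(Enum):
    HALT = auto()
    RUN_MONITOR = auto()
    TRIGGER_TRANSFER = auto()
    ACTIVATE_PIN = auto()


class OCDSModel:
    """Toy OCDS module: the debug host configures one action per event."""

    def __init__(self):
        self.action_for = {}  # DebugEvent -> DebugAction
        self.log = []         # (event, action) pairs in the order they occurred

    def configure(self, event, action):
        self.action_for[event] = action

    def raise_event(self, event):
        # Events without a configured action are ignored in this sketch.
        action = self.action_for.get(event)
        if action is not None:
            self.log.append((event, action))
        return action


ocds = OCDSModel()
ocds.configure(DebugEvent.SOFTWARE_BREAK, DebugAction.HALT)
ocds.configure(DebugEvent.BREAK_PIN, DebugAction.RUN_MONITOR)

taken = [
    ocds.raise_event(DebugEvent.SOFTWARE_BREAK),    # DebugAction.HALT
    ocds.raise_event(DebugEvent.HARDWARE_TRIGGER),  # None, nothing configured
    ocds.raise_event(DebugEvent.BREAK_PIN),         # DebugAction.RUN_MONITOR
]
```

Keeping the event-to-action mapping as data reflects the idea that the debug host decides, per event, what the core should do, while the event detection itself stays core-specific.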
3. Implementation and Performance
The JTAG+ communication architecture has been implemented as a serial link connecting the JTAG module with the different IO clients. A dedicated 'start' bit-stream segment has been developed to identify the beginning of data being transmitted on the serial link. The start bit-stream is followed by additional instructions to select, configure and write or read data from the different IO clients, which provides an effective mechanism to communicate debug events, trace system states and configure debug resources.

The JTAG module is currently implemented to serve two processor-specific IO clients, for the C166S-V1 and C166S-V2, which are both 16-bit processors, and two multi-master bus clients to suit 16-bit and 32-bit FPI busses. While this implementation supports all in-house processor and DSP cores, it can easily be adapted to suit other cores with debug interfaces (e.g. ARM, MIPS) that can be obtained under a license agreement. One way to integrate such a core would be to develop a processor-specific IO client that integrates the core's debug port into the platform debug concept presented.

The main advantages of this on-chip debug support architecture are the ease of application to specific SoCs with a number of processor cores and the fact that no extra pins besides the JTAG port are required for the SoC. This keeps the cost impact of the flexible multi-core debug support at a minimum compared to solutions with dedicated debug interfaces for each processor core. At the same time it provides access to all cores present in the system.

A serial debug communication interface such as the JTAG port has a limited data rate. JTAG clock rates can be varied over a wide range as required by the user. The maximum rates we have achieved are about 30 MHz. At this clock frequency the maximum net data rate between the IO clients and the debug host system is 16.7 Mbit/s. If additional bandwidth is required (e.g. for tracing instruction and data flow), specific trace compression interfaces can be connected to the OCDS modules. Nevertheless, the configuration of the OCDS modules will still be done through the base debug support architecture presented here.
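The two figures quoted above can be related with a short calculation. A JTAG port shifts one bit per clock cycle, so a 30 MHz clock corresponds to a raw rate of 30 Mbit/s; the ratio to the 16.7 Mbit/s net rate then gives the share of clock cycles that carry payload. Attributing the remainder to protocol overhead is an interpretation, not a statement from the measurements, and the helper `transfer_time_s` is a hypothetical name.

```python
JTAG_CLOCK_HZ = 30e6     # maximum clock rate reported in the text
NET_RATE_BIT_S = 16.7e6  # maximum net data rate reported in the text

# Fraction of clock cycles that carry payload bits (about 0.56).
efficiency = NET_RATE_BIT_S / JTAG_CLOCK_HZ

# Average number of clock cycles spent per payload bit (about 1.8).
clocks_per_payload_bit = JTAG_CLOCK_HZ / NET_RATE_BIT_S


def transfer_time_s(num_bytes, net_rate_bit_s=NET_RATE_BIT_S):
    """Time to move num_bytes of payload at the given net data rate."""
    return num_bytes * 8 / net_rate_bit_s


# Reading out 1 KiB of memory takes roughly half a millisecond.
time_for_1_kib = transfer_time_s(1024)
```

On these numbers a little more than half of the clock cycles move payload, which is why high-volume instruction and data tracing calls for the additional trace interfaces mentioned above.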
4. System on Chip Integration
The modular debug architecture provides straightforward integration into any application-specific SoC. Once the general system architecture is in place, with details like the number and types of processors as well as the internal busses, the debug architecture can be defined as well. For processors sharing a (multi-master) FPI bus, a bus IO client is used to connect the processors to the JTAG module. Only when no FPI bus is available is a direct connection between the JTAG module and the processor OCDS established using a processor-specific IO client. Figure 4 shows a typical example architecture currently under development with three processors. Processor 0 (a 16-bit C166S-V2 core) has no FPI bus connection and is therefore connected via a specific IO client 0. Processor 1 (a 32-bit TriCore) and processor 2 (a 32-bit Carmel DSP core) share access to an FPI bus and are connected to the JTAG module via bus IO client 1.
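The integration rule of this section can be written down as a small planning function. This is a sketch with hypothetical names (`plan_io_clients` and the tuple layout); it only encodes the rule that processors on a shared bus use one bus IO client while every other processor gets its own processor-specific IO client.

```python
def plan_io_clients(processors):
    """Assign IO clients to processors.

    processors: list of (name, bus) pairs, where bus is a bus name or None.
    Returns a list of (client_kind, [processor names]); the position in the
    list serves as the IO client number.
    """
    plan = []
    members_by_bus = {}
    for name, bus in processors:
        if bus is None:
            # No bus available: direct connection to this processor's OCDS.
            plan.append(("processor-specific IO client", [name]))
        elif bus in members_by_bus:
            # Bus already has a bus IO client: reuse it.
            members_by_bus[bus].append(name)
        else:
            members = [name]
            members_by_bus[bus] = members
            plan.append(("bus IO client", members))
    return plan


# The three-processor example described in the text.
example = [("C166S-V2", None), ("TriCore", "FPI"), ("Carmel DSP", "FPI")]
plan = plan_io_clients(example)
# plan == [("processor-specific IO client", ["C166S-V2"]),
#          ("bus IO client", ["TriCore", "Carmel DSP"])]
```

For the example this yields IO client 0 as a processor-specific client and IO client 1 as the shared bus client, matching the numbering used in the description.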
Figure 4: SoC Debug Setup Example with 3 Processor Cores

5. Conclusions
The debug architecture presented is, due to its modular structure, easily adaptable to application-specific SoCs. It provides effective debug support for a complete SoC platform with several processor families using 16-bit and 32-bit busses. The support is not limited to SoCs with single processors; virtually any combination of different processors is possible. Tracing can be done through the debug communication interface and can be further enhanced through dedicated trace compression units. Observability is not limited to processors: any memory location accessible through the FPI bus can be read or written. All these features make the presented debug support architecture an effective and flexible solution to aid application software development.

6. References
[1] R Bannatyne: Debugging Aids for Systems on a Chip, IEEE Conference Proceedings of Wescon '98, 1998, pp. 107-111.
[2] M Birnbaum, H Sachs: How VSIA answers the SoC dilemma, Computer, Vol. 32, No. 6, 1999, pp. 42-50.
[3] B Clement, et al.: Fast prototyping: a system design flow applied to a complex System-On-Chip multiprocessor design, DAC99, New Orleans, Louisiana, 1999, pp. 420-424.
[4] I-J Huang, C-F Kao: Exploration of Multiple ICEs for Embedded Microprocessor Cores in an SoC Chip, Proceedings of the Second IEEE Asia Pacific Conference on ASICs (AP-ASIC 2000), pp. 311-314.
[5] I-J Huang, T-A Lu: ICEBERG: An embedded In-circuit Emulator synthesizer for microcontrollers, DAC99, New Orleans, Louisiana, 1999, pp. 580-585.
[6] IEEE JTAG 1149.3 Standard.
[7] D Kirovski, M Potkonjak, LM Guerra: Cut-based functional debugging for programmable Systems-on-Chip, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 1, 2000, pp. 40-50.
[8] KD Maier: A high performance / low power single cycle 16-bit microcontroller architecture with advanced on-chip debugging features for SoC solutions, ITG Fachbericht 164, Entwurf Integrierter Schaltungen, 10th EIS Workshop, Dresden, VDE Verlag, 2001, pp. 177-181.
[9] I Marantz: Enhanced visibility and performance in functional verification by reconstruction, DAC 98, San Francisco, California, pp. 164-169.
[10] C Melear: Using Background Modes for Testing, Debugging and Emulation of Microcontrollers, IEEE Conference Proceedings of Wescon '97, 1997, pp. 90-97.
[11] Semiconductor Industry Association (SIA): International Technology Roadmap for Semiconductors. http://public.itrs.net/Files/2000UpdateFinal/2kUdFinal.htm
[12] S O'Reilly: Debugging Drivers with Emulators and Logic Analyzers, Embedded Systems Programming, February 1998, pp. 84-95.
[13] M Potkonjak, S Dey, K Wakabayashi: Design-For-Debugging of Application Specific Designs, ICCAD95, 1995, pp. 295-301.
[14] Y Zorian, EJ Marinissen, S Dey: Testing embedded-core-based system chips, Computer, Vol. 32, No. 6, 1999, pp. 52-60.