Embedded Real-Time-Tracer – An Approach with IDE
Babak Rahbaran, Matthias Fuegger, Andreas Steininger
Institute for Computer Engineering, Embedded Computing Systems Group, Vienna University of Technology, Vienna, Austria
{rahbaran,fuegger,steininger}@ecs.tuwien.ac.at
Abstract. Debugging software that runs on highly integrated System-on-Chip devices is complicated because conventional debug tools (such as traditional In-Circuit Emulators and Logic Analyzers) cannot be used with embedded processor cores. To cope with this problem we provide two solutions, an IDE Embedded Real-Time-Tracer and offline monitoring. We analyze the problem of sustaining the required data rate for the interface and the storage media, and elaborate on the limitations of our approach. In an application example we show that the Real-Time-Tracer is also advantageous for debugging the results of fault injection experiments. Experimental fault injection plays a key role in the process of fault tolerance validation. Our Embedded Real-Time-Tracer captures a trace of the injected faults and stores it, together with the reaction of the device under test, on an ATA hard-disk for later analysis.
1 Introduction
Real-time tracing covers a wide range of applications. It is used for debugging real-time programs running on a host computer as well as for tracing states and events in hardware systems. Depending on the application, capturing this data may be a challenging task. All information of interest has to be recorded under real-time constraints for later offline analysis, which becomes increasingly difficult as the amount and rate of the information to be traced grow. For a hardware real-time tracer (for example in an FPGA), commonly used protocols such as RS232 (UART) with a maximum of 115.2 Kbit/s, I2C with 100 Kbit/s, 400 Kbit/s in fast-mode and 3.4 Mbit/s in high-speed mode, CAN with up to 1 Mbit/s, and SPI (Serial Peripheral Interface) are no longer suitable for writing real-time data to non-volatile mass storage. These interfaces are often the bottleneck of a tracer. In contrast, IDE/ATA provides a theoretical throughput of up to 133 MByte/s and the ability to store large amounts of data. Furthermore, the IDE interface needs no host PC between the real-time tracer and the mass storage, which minimizes latency compared to writing to a hard-disk through a software layer such as an operating system. Finally, IDE hard-disks are inexpensive and can easily be read out on a workstation PC by means of a small program.
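To make the bottleneck argument concrete, the following minimal sketch converts the nominal bit rates quoted above into an upper bound on 16 bit trace words per second. It ignores all protocol overhead (start/stop bits, addressing, arbitration), so real rates are lower. The function names are hypothetical, and the 100 Kbit/s, 400 Kbit/s and 3.4 Mbit/s figures are taken here to be the three I2C speed grades, which is this sketch's reading of the text.

```python
# Nominal bit rates quoted in the introduction, in bit/s.
# The "I2C" labels are an assumption of this sketch.
NOMINAL_BIT_RATES = {
    "RS232 (UART)": 115_200,
    "I2C standard-mode": 100_000,
    "I2C fast-mode": 400_000,
    "I2C high-speed mode": 3_400_000,
    "CAN": 1_000_000,
}


def max_word_rate(bit_rate, word_bits=16):
    """Upper bound on trace words per second, ignoring protocol overhead."""
    return bit_rate / word_bits


def can_sustain(bit_rate, samples_per_second, word_bits=16):
    """True if the interface could, at best, carry the given sample rate."""
    return max_word_rate(bit_rate, word_bits) >= samples_per_second


if __name__ == "__main__":
    for name, rate in NOMINAL_BIT_RATES.items():
        print(f"{name}: at most {max_word_rate(rate):.0f} words/s")
```

Even the fastest of these serial links stays far below one 16 bit word per microsecond, which is why a parallel interface to the storage medium is attractive for tracing.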
RAHBARAN, FUEGGER, STEININGER
2 Real-time tracing
2.1 Terminology
A system is said to be a real-time system when its correctness depends not only on the logical results of the computation, but also on the physical time at which the output is generated [1]. In real-time tracing the interest lies in the upper bound of this time, called the deadline. Different classes of real-time systems exist. The most restrictive, and therefore the most challenging, are hard real-time systems: a result produced after its deadline could lead to a catastrophe (injury or loss of human life, fatal effects on the environment or the economy). If a deadline is missed in a firm real-time system, the output has no utility. This is not the case in soft real-time systems, where a result can still be used after its deadline has been missed (although this might degrade another service, such as comfort). An important aspect of real-time systems is predictability in worst-case scenarios, the so-called rare event situations.
Interfaces can be divided into two classes. In an elementary interface the control flow and the data flow point in the same direction (from the sender to the receiver), whereas in a composite interface a control flow from the receiver to the sender exists as well. The latter is often used for explicit flow control such as backpressure. However, slowing down the sender is often not possible in real-time systems, because the sender is not in the sphere of control of the receiver. Elementary interfaces use implicit flow control, which means that the sender and the receiver agree on a bound for the data rate before run time.
2.2 Objectives
In a traditional start/stop debugging environment the processor, or the application, has to be stopped in order to freeze and analyze the state of the system. There are two major problems with this in real-time systems:
1. In safety-critical control systems, it may be impossible to safely stop the processor.
2. Even if it is safe to stop the processor, doing so will change the state of the system. In particular, many bugs in real-time systems depend on the timing of the software relative to external events (e.g. clock interrupts). Stopping the processor changes the relative timing of events, causing the bugs to disappear or to show up as different problems.
The main aim of a real-time tracer is to track the behavior of a process. Tracing can take place at different levels of abstraction. Examples of processes to trace are the interaction of real-time tasks or events inside a CPU. It is important to note that even if the traced process itself is not real-time, the act of tracing this process is still a real-time problem. Another important aim is not to influence the observed system. The system being traced should behave (nearly) identically to the same system without the real-time tracer. Finally, the real-time tracer should not exceed the observed system with respect to complexity and overhead. At the hardware level this means a low number of transistors, memory blocks and pins used by the tracer, whereas at the software level CPU utilization and main memory consumption play a key role. Furthermore, this means that the interface of a real-time tracer should not depend on the device under test, or even on the test itself.
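The implicit flow control of Section 2.1 amounts to a rate bound that is agreed before run time and that the sender must never exceed. The sketch below is only an illustration of how such a bound could be checked offline on recorded sample times. The function name, the sliding-window formulation and the use of integer nanosecond timestamps are assumptions of this example, not something the paper specifies.

```python
def respects_rate_bound(timestamps_ns, max_words, window_ns):
    """Check an agreed rate bound on a sorted list of sample times.

    Returns True if no half-open time window of length window_ns contains
    more than max_words samples. Integer nanoseconds avoid rounding
    surprises at the window edges.
    """
    if window_ns <= 0:
        raise ValueError("window_ns must be positive")
    start = 0
    for end, t in enumerate(timestamps_ns):
        # Drop samples that are a full window (or more) older than t.
        while t - timestamps_ns[start] >= window_ns:
            start += 1
        if end - start + 1 > max_words:
            return False
    return True
```

A sender that passes this check for the bound agreed with the receiver never needs backpressure, which is the property an elementary interface relies on.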
EMBEDDED REAL-TIME-TRACER
This paper concentrates on real-time tracing at the hardware layer. At this level of abstraction real-time tracing is mainly used for debugging prototypes. Real-time tracing in hardware devices (FPGAs) used to be done as shown in Figure 1a: the signals of interest were routed through the complete design up to the top level, where each signal had to be assigned to a separate pin and connected to a logic analyzer. The disadvantage was that the number of pins needed for a test was relatively large or even exceeded the number of pins available on the SOC (FPGA), making the test impossible (see 2.2). Furthermore, the debugging interface depended on the current test, so the test setup (e.g. the logic analyzer signal assignment) had to be altered whenever the signals to be traced were changed. A new approach is the use of a hardware module, the ERT (Embedded Real-Time-Tracer), as shown in Figure 1b. The idea is that the observation is done internally by a module residing on the observed chip itself. This allows a large number of signals to be traced without routing them through pins. Another advantage is that a constant, standardized interface can be used for transferring the observation data, which makes the approach more generic and decreases complexity.
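[Figure 1, panel a) tracing, traditional approach: SOC (FPGA) containing processor and peripheral, whose address and data signals are led through a trace port to a logic analyser. Panel b) Embedded Real-Time-Tracer: SOC (FPGA) containing processor, peripheral and the ERT, which is attached to an ATA-HDD.]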
Figure 1: Real-Time Tracing
Altera Corporation has developed the “SignalTap II Embedded Logic Analyzer” [2], a hardware module residing on the chip to be debugged. It works like a traditional logic analyzer: upon a user-definable trigger condition, samples are taken and stored locally within the device under test. SignalTap allows unused on-chip memory to be used for tracing (up to a maximum of 128k). For further analysis the captured data is streamed out via the JTAG interface and displayed on a workstation running the “Quartus II software waveform display”. The disadvantage, however, is that the data streaming does not occur on-line, which limits the amount of data that can be traced to the free memory of the device. Furthermore, the upload has to be done via the JTAG port, which means very limited data rates and the need for a PC during the tracing process.
3 IDE Embedded Real-Time-Tracer
3.1 Strategy
Tracing can be classified as a firm real-time application: if acquisition, processing and storage of one system state cannot be finished before the next acquisition is due, the trace becomes incomplete and hence useless. When building a tracer we must therefore be able
to capture data ideally at any time and at any rate. This is, however, impossible in practice. Since we do not want to apply backpressure to the observed system (this would imply stopping the processor, which we have already identified as inappropriate), the best we can do is to design the tracer for the highest data rate that can be attained with a given recording medium, determine the limits, and take these limits into account in a given tracing scenario in the sense of implicit flow control. For our purpose we have decided to use a hard-disk, since we expect large amounts of experimental data to be collected, and since the IDE interface seems to provide sufficient bandwidth. In order to approach the bandwidth limits of this medium we decided to access the hard-disk without a file system. This means that the hard-disk appears to the tracer as an array of 16 bit words. The access latency is minimized because the procedure of searching for the correct file on the hard-disk and writing into it is eliminated. A “standardized” hard-disk access would not only require a PC to manage the disk, but would in addition lead to considerable jitter. As an example, writing to a FAT32 file would imply the following steps [3]: First, the correct partition has to be determined. This is done by reading the first sector (containing the so-called master boot record) and stepping through the linked list of partition entries. Next, the tracer would have to read the FAT32 BPB (BIOS Parameter Block), a data structure residing in the first sector of the partition. It contains the BPB_RootClus value [3], which gives the number of the first cluster (see footnote 1) of the root directory. The root directory itself is, like any other directory or file, organized as a linked list of clusters. From there on the directory path has to be followed.
It can be argued that most of the lookup can be done statically at startup, but managing the cluster chain of a file is still an overhead that has considerable impact on the jitter, an important property in real-time systems. Furthermore, supporting file system access would increase the complexity and size of the Real-Time-Tracer.
3.2 Block Diagram
The IDE Real-Time-Tracer is an embedded tracer. It is being developed for use in (even small) FPGAs, so that it can be flexibly integrated into an existing FPGA solution. To keep complexity low and to facilitate further improvements and extensions, the ERT has been split up into functional blocks. The modules have been designed under the constraint of minimizing the required chip area. Writing data to an IDE/ATA hard-disk can be viewed at two layers of abstraction. The lower layer deals with the timing of writing and reading a hard-disk register, which depends on the mode the hard-disk is currently in. In our IDE Real-Time-Tracer we use PIO mode 2. Writing registers is used to set parameters and finally to execute commands (such as WRITE SECTOR(S) [4]), while reading registers is important for obtaining feedback on the state (such as busy, error, ready for data transfer). The upper layer is represented by the PIO Data Out Protocol [5]. It determines which register to read or write in order to write a sector. The IDE ERT is implemented in VHDL. It consists of the modules shown in Figure 2. The modules responsible for writing and reading a hard-disk register are called Register_write and Register_read, respectively. The PIO Data Out Protocol is implemented by the Transfer module. Finally, a ROM contains a signature which is written to the hard-disk at the end of tracing. These four modules have been integrated into the top level design of the ERT.
Footnote 1: A cluster is a group of sectors.
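Section 3.1 treats the disk as a flat array of 16 bit words, while the disk itself is written one sector of 256 words at a time. The following sketch shows the address arithmetic this implies. The function name and the `first_sector` parameter are hypothetical, and the sketch makes no claim about how the ERT numbers its sectors internally.

```python
WORDS_PER_SECTOR = 256  # one sector holds 256 x 16 bit


def locate_word(word_index, first_sector=0):
    """Map a position in the flat word array to (sector, word offset).

    first_sector is a hypothetical parameter for traces that do not
    start at the first sector of the disk.
    """
    if word_index < 0:
        raise ValueError("word_index must be non-negative")
    sector, offset = divmod(word_index, WORDS_PER_SECTOR)
    return first_sector + sector, offset
```

Because the mapping is a single division, no on-disk metadata has to be read or updated while tracing, in contrast to the FAT32 cluster chain discussed above.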
[Figure 2 block diagram: the Embedded Real-Time-Tracer contains the modules SigRom, Register_read, Register_write and Transfer, linked by data and control signals; the external side provides Reset, Data (15:0) and control signals, and the disk side is the IDE bus.]
Figure 2: IDE Embedded Real-Time-Tracer implementation in VHDL
Writing to an IDE/ATA hard-disk can only be done sector by sector and not word by word. When a new sector is begun, no words can be transferred by the Real-Time-Tracer during a certain dead time (see footnote 2). This can be overcome by introducing a FIFO between the “ERT” module and the tracer’s interface. The FIFO size depends on the length of the dead time and the throughput of the hard-disk.
3.3 Throughput
Unfortunately the time required for writing one sector is not bounded in the ATA/ATAPI specification. It is, however, possible to calculate the throughput under certain constraints. The time it takes to write or read a single register is determined by the PIO mode and is called the cycle time $t_0$ (see [5]). It is shorter for PIO mode 2 than for older devices that support only PIO mode 1. Devices supporting PIO mode 3 and above are powered up in PIO mode 0, 1 or 2. These hard-disks report the value of $t_0$ in word number 68 of the IDENTIFY DEVICE parameter list, which can be retrieved by issuing the IDENTIFY DEVICE command. Implementing this command in the ERT would considerably increase complexity and resource requirements, so the timing has to be set manually by the user in the VHDL design. Switching between PIO modes can be done by issuing the SET FEATURES command with the “set transfer mode” subcommand. The duration of a normal PIO mode 2 register transfer can be looked up in the ATA/ATAPI specification. The rate was determined under the assumption of 16 bit transfers, which is the case when writing to the “Data Register” of a hard-disk:
$$t_{\mathrm{PIO2}} = \ldots \qquad (1)$$
$$R_{\mathrm{PIO2}} = \frac{16\ \mathrm{bit}}{t_{\mathrm{PIO2}}} = \ldots \qquad (2)$$
Footnote 2: Time needed to initialize the hard-disk before beginning a new sector.
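The IDE Embedded Real-Time-Tracer uses a slightly different timing, yielding:
$$t_{\mathrm{ERT}} = \ldots \qquad (3)$$
$$R_{\mathrm{ERT}} = \frac{16\ \mathrm{bit}}{t_{\mathrm{ERT}}} = \ldots \qquad (4)$$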
However, this still only specifies the register transfer duration under normal circumstances. The hard-disk may slow down a register operation by asserting the IORDY signal and holding it active up to a specified maximum pulse width. This may be the case for slower hard-disks. The upper bound of the duration of a single register read or write can be calculated from the following timing parameters of [5]:
$t_A$: IORDY setup time
$t_B$: IORDY pulse width
$t_C$: IORDY assertion to release
$t_9$: DIOR-/DIOW- to address valid hold
$t_4$: DIOW- data hold
$t_6$: DIOR- data hold
The value used is always the maximum, independent of the PIO mode. Three of the required timings are identical for all PIO modes; the remaining ones are specified separately for PIO mode 1 and PIO mode 2 [5].
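With these parameters the duration of a maximum-length PIO register transfer can be calculated, again under the assumption of 16 bit transfers without overhead:
$$t_{\mathrm{worst}} = \ldots \qquad (5)$$
$$R_{\mathrm{worst}} = \frac{16\ \mathrm{bit}}{t_{\mathrm{worst}}} = \ldots \qquad (6)$$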
It is important to note that no overhead (e.g. sector addressing) was taken into account when calculating these data rates. To adapt our model, we have to consider the PIO Data Out Protocol, which requires the following procedure. First, all parameter registers are written (device, cylinder high, cylinder low, sector number, sector count, command). Afterwards the status register has to be read at least twice to find out whether the hard-disk is currently busy or an error has occurred. This sums up to the 8 register accesses assumed in (7). A maximum number of status reads is unfortunately not defined. The data itself is written by consecutive write accesses to the data register. After 256 x 16 bit of data have been written to the disk, the procedure starts again from the beginning. With $t$ denoting the duration of a single register access, the lower bound of the time it takes to write a sector can thus be calculated by:
$$t_{\mathrm{sector}} = 8 \cdot t + 256 \cdot t \qquad (7)$$
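The accounting behind (7) can be sketched as follows: six parameter register writes plus at least two status reads give the 8 overhead accesses, followed by 256 data register writes. The function names are hypothetical, and the access time used in the usage line is a purely illustrative value rather than a figure from the specification.

```python
PARAMETER_REGISTERS = (
    "device", "cylinder high", "cylinder low",
    "sector number", "sector count", "command",
)
MIN_STATUS_READS = 2      # "at least two times"; no upper limit is defined
WORDS_PER_SECTOR = 256
WORD_BITS = 16


def sector_accesses(status_reads=MIN_STATUS_READS):
    """Return (overhead accesses, data accesses) for writing one sector."""
    return len(PARAMETER_REGISTERS) + status_reads, WORDS_PER_SECTOR


def min_sector_time_ns(t_access_ns, status_reads=MIN_STATUS_READS):
    """Sector write time if every access takes t_access_ns, as in (7)."""
    overhead, data = sector_accesses(status_reads)
    return (overhead + data) * t_access_ns


def net_rate_mbit_s(t_access_ns, status_reads=MIN_STATUS_READS):
    """Payload rate in Mbit/s: 256 x 16 bit per sector write time."""
    payload_bits = WORDS_PER_SECTOR * WORD_BITS
    # bit per ns times 1000 gives Mbit/s
    return payload_bits / min_sector_time_ns(t_access_ns, status_reads) * 1000


if __name__ == "__main__":
    # 100 ns is an arbitrary illustrative access time, not a spec value.
    print(min_sector_time_ns(100), net_rate_mbit_s(100))
```

Raising `status_reads` shows how additional polling of the status register stretches the sector time, which is why (7) is only a lower bound.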
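For normal PIO mode 2 read/write timings, substituting the register transfer duration from (1) into (7), this sums up to:
$$t_{\mathrm{sector}} = \ldots, \qquad R = \ldots$$
and for maximum-length PIO read/write timings, taking (5) into account:
$$t_{\mathrm{sector}} = \ldots, \qquad R = \ldots$$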
The protocol implemented in the IDE ERT differs slightly from the above calculations. The time it takes to write one sector is given by:
$$t_{\mathrm{sector,ERT}} = \ldots$$
The extra cycle is due to the insertion of an idle state, in which the IDE real-time tracer waits for data from the FIFO. The additional 35 ns originate from reading out the FIFO between the writes to the data register of the hard-disk. For the normal IDE Embedded Real-Time-Tracer PIO timings of (3) this yields:
$$t_{\mathrm{sector,ERT}} = \ldots \qquad (8)$$
$$R = \ldots \qquad (9)$$
and for worst-case PIO timings with (5):
$$t_{\mathrm{sector,ERT}} = \ldots \qquad (10)$$
$$R = \ldots \qquad (11)$$
In conclusion, the ERT is able to sustain data rates between about 11.5 Mbit/s and about 33 Mbit/s, depending on the time the hard-disk takes for register reads and writes. This is equal to writing about 721k to 2M 16 bit words per second, or a sampling interval of about 1.4 µs down to about 0.5 µs for a single word. These data rate boundaries, however, have been calculated under the prerequisite that the PIO Data Out Protocol is processed without additional waiting. According to the ATA/ATAPI specification [5], a hard-disk is allowed to take an indefinitely long time to process the PIO Data Out Protocol. This means that no hard lower bound can be given for the data rate of the ERT. Measurements, however, have shown that the throughput can be bounded by statistical means. Note that in the presence of a transmit buffer this limitation affects the average sample rate of our tracer rather than its temporal resolution. The precision with which the state of the system is captured is independent of these data rates. If desired, the system state can be captured every single nanosecond, but if this high sampling frequency is sustained over an excessive period, the buffer will overrun.
3.4 Setup
The interface has been kept simple on purpose (see 2.2). The ERT mainly appears as an extended FIFO to the device under test. Addressing of the sector which has to be written
is done transparently. The ERT interface can be divided into 3 parts: data, control and the ATA/IDE hard-disk interface (consisting of 25 signals). The control interface is bidirectional, which means that the IDE Embedded Real-Time-Tracer represents a composite interface. In order to avoid the undesired backpressure, the above limitations on HDD data rates have to be considered when planning the experiments. When integrating the IDE ERT into an FPGA design, the following setup steps have to be followed:
1. determine FIFO size: The FIFO size is application specific and has to be calculated with the dead time taken into account. This can be done with an approach based on queuing theory, which is rather complicated (see [6]).
2. integration: The IDE real-time tracer, consisting of the ERT module and the FIFO, is integrated into the FPGA design.
3. initialization: An IDE/ATA hard-disk is connected to the FPGA and the synthesized design is downloaded. After this has been accomplished successfully, the ERT automatically resets and initializes the hard-disk.
4. run: During this step data is written to the hard-disk.
5. offline analysis: The data can be read out for analysis with a simple program running on a normal workstation. Under Linux, the idea is to read directly from the device file of the hard-disk, which makes raw access possible.
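The offline analysis of step 5 could look roughly like the sketch below, which reads a raw byte stream sector by sector and unpacks it into 16 bit words. Several things here are assumptions of the example: the function name, the little-endian byte order of the words, and the caller knowing how many sectors to read (the paper's end-of-trace signature is not decoded because its content is not given).

```python
import io
import struct

WORDS_PER_SECTOR = 256
SECTOR_BYTES = WORDS_PER_SECTOR * 2


def read_trace_words(stream, max_sectors):
    """Yield 16 bit words from a raw byte stream, one sector at a time.

    Little-endian word order is an assumption of this sketch.
    Incomplete trailing sectors are ignored.
    """
    for _ in range(max_sectors):
        sector = stream.read(SECTOR_BYTES)
        if len(sector) < SECTOR_BYTES:
            break
        yield from struct.unpack("<256H", sector)


if __name__ == "__main__":
    # Stand-in for a disk: two sectors of known content in memory.
    fake_disk = io.BytesIO(struct.pack("<256H", *range(256)) * 2)
    words = list(read_trace_words(fake_disk, max_sectors=2))
    print(len(words), words[:4])
```

On a Linux workstation the same function could be handed a device file opened in binary mode instead of the in-memory stand-in; the device path depends on the system and usually requires elevated privileges.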
4 An Application - FIDYCO
In the following we demonstrate the application of our concept to the practical example of fault injection. For this purpose we give a brief introduction to the basics of fault injection and analyze the requirements on the IDE ERT.
4.1 Fault Injection Basics
The problem of fault-tolerance validation is very challenging: it must be shown that the system can tolerate all hypothesized faults in all conceivable situations. Since the effects of physical faults on circuits are too complex to be reduced to manageable models, an experimental fault injection approach is often the only choice. With conventional methods, fault tolerance assessment is still a tedious, expensive and error-prone process. With the availability of very fast and complex FPGAs it has recently become feasible to integrate complete industrial custom chips or even systems-on-a-chip into a programmable hardware platform. This opens up a new dimension with respect to fault injection. The programmable hardware platform can act as a real-time emulation of the ASIC while providing access to all internal nodes, as is often required for the experiments. With this basic idea in mind we have developed a concept for an FPGA-based fault injection tool. In order to prove the practical feasibility of our concept we have implemented it, and we describe the details of the respective tool, FIDYCO, in the next section.
4.2 FIDYCO
We have developed a fault injection toolset called FIDYCO (Flexible on-chip fault Injector for run-time Dependability validation with target specific COmmand language, [7])
that was originally targeted for use in FPGA-based platforms. The basic ideas of FIDYCO are: move the fault injector to the target and move the target to the FPGA. We propose a hardware/software fault injection environment, with the hardware part implemented in VHDL and downloaded to an FPGA. The software part resides on the host side to reduce the chip area overhead. The main aim is the development of a flexible and open system which is able to test nearly every type of component. Ideally, the only restriction is the size of the FPGA.
[Figure 3 block diagram: Host Interface; FIDYCO Control Block with Parser, Local Fault Library ROM, Command RAM, Executor and Analyzer; FI_Vectors, Triggerpattern and Data Generator driving the Device under Test and the Golden Node; Data Collector forming the difference (∆); Embedded Real-Time-Tracer with Register-write layer, Register-read layer and Transfer layer, attached to the ATA-HD.]
Figure 3: FIDYCO Block Diagram [7]
The design of FIDYCO has been kept highly modular. Its main parts are the ControlBlock, the DataCollector, the Embedded Real-Time-Tracer and the Host Interface. The fault injection vectors are transmitted to the ControlBlock, from where they are forwarded to the Device Under Test (DUT) and the Golden Node (GN), whose use is optional (see footnote 3). The outputs of DUT and GN are fed into the DataCollector. The two signals are compared inside the DataCollector and, in the case of a difference, the results are sent to the Embedded Real-Time-Tracer. If the user decides to speed up the experiment, he can integrate both DUT and GN in hardware and reach a high resolution of one clock cycle for fault injection and observation, however at the cost of increased chip area overhead for the GN. The Embedded Real-Time-Tracer captures a trace of the fault injection vectors executed by FIDYCO, running in real time, and stores these fault injection vectors on an ATA hard-disk for later analysis. It is also possible to select a trigger condition in order to inject the fault injection vectors into particular locations (an internal register or signal) or special modules (e.g. the ALU) of the DUT. Complex trigger conditions are also available. When the trigger condition occurs, the FIFO stops capturing the trace data, either immediately or some time later, hence ensuring that the FIFO retains a trace of the system's behavior around the time the trigger condition occurred.
Footnote 3: Depending on his priorities the user can make a trade-off between speed-up and chip area overhead.
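The compare-and-forward behaviour of the DataCollector can be modelled in software as below. This is an illustrative model only, not the VHDL implementation. The record layout follows the 10 byte packet described in Section 4.3 (2 bytes for the address, 32 bits each for DUT and GN), while the field order, the little-endian packing and the function name are assumptions of this sketch.

```python
import struct

# address (2 bytes), DUT value (32 bit), GN value (32 bit) = 10 bytes.
# Field order and byte order are assumptions of this sketch.
RECORD_FORMAT = "<HII"


def collect_differences(samples):
    """Return one packed record per cycle in which DUT and GN disagree.

    samples is an iterable of (address, dut_value, gn_value) tuples,
    one tuple per clock cycle.
    """
    records = []
    for address, dut_value, gn_value in samples:
        if dut_value != gn_value:
            records.append(
                struct.pack(RECORD_FORMAT, address, dut_value, gn_value)
            )
    return records
```

Only mismatching cycles produce data, so the volume handed to the tracer depends on how often the injected faults actually become visible at the outputs.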
4.3 Embedded Real-Time-Tracer in FIDYCO
The IDE Real-Time-Tracer has been developed for FIDYCO, which needed a means to trace and store data at high rates and volumes. The former solution for sending fault injection vectors and receiving the respective results was an RS232 interface, which allows theoretical data rates of up to 115.2 Kbit/s. Sending fault injection vectors to FIDYCO is still done via RS232, because this allows interactive communication with a terminal at a workstation, and far lower data rates are needed for this direction. As mentioned in section 3.4, the FIFO size of the IDE Real-Time-Tracer must be determined for every application. The same applies to FIDYCO. To calculate the maximum possible data rate, the process of fault injection has to be analyzed. At the beginning FIDYCO fetches the first fault injection vector either from the local fault library or from the RAM and injects it immediately after the next trigger event. The user has two options for defining the observation time of a fault injection. The first is to inject the fault and reset the target system after each single fault injection. With this method the user should reset the DUT if a failure occurs. The second method is to proceed with the next fault injection without resetting the target system if no failure has been observed. This method avoids the time-consuming reset and synchronization after every fault injection. If a failure is detected, the observed data is forwarded to the IDE Real-Time-Tracer. If, on the other hand, the observation time is exceeded, FIDYCO exits or starts the process from the beginning, depending on the mode. Between the injection and the occurrence of the failure, data may be traced as well, in order to measure the propagation of the fault. In the high speed mode FIDYCO can collect data every clock cycle (33 ns at a clock rate of 30 MHz). During one clock cycle a result packet of at most 10 bytes (see footnote 4) can be produced. Since the observation time is at most 0.5 s, this means that up to $1.5 \cdot 10^8$ bytes of results are produced during this time. This corresponds to 2400 Mbit/s, which is a very high data transfer rate. Our IDE controller can only handle up to about 33 Mbit/s (see 3.3). To cope with this problem, it is necessary to think about a realistic fault model that does not need to record every clock cycle. Then the Real-Time-Tracer can record all of these results on the hard-disk without missing experiment results. The advantage of our ERT is that an experiment can run automatically over a long period of time. All results are saved on the hard-disk through the ERT. The larger the capacity of the hard-disk, the longer the experiment can run.
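The mismatch between the data produced in high speed mode and the rate the tracer can sustain can be checked with a few lines of arithmetic. The sketch uses the figures from the text (30 MHz clock, 10 byte packets, 0.5 s observation time) and the 33 Mbit/s capture rate from the list of benefits. The function names are hypothetical, and the "minimum spacing" figure is only an average-rate estimate that ignores FIFO depth and disk latency.

```python
import math


def required_rate_bit_s(clock_hz, bytes_per_cycle):
    """Data rate if one packet is produced in every clock cycle."""
    return clock_hz * bytes_per_cycle * 8


def observation_bytes(clock_hz, bytes_per_cycle, seconds):
    """Total data volume produced during one observation window."""
    return clock_hz * bytes_per_cycle * seconds


def min_cycles_between_records(clock_hz, bytes_per_record, sustained_bit_s):
    """Smallest whole number of cycles between records such that the
    average rate stays at or below the sustained tracer rate."""
    peak = required_rate_bit_s(clock_hz, bytes_per_record)
    return math.ceil(peak / sustained_bit_s)


if __name__ == "__main__":
    print(required_rate_bit_s(30_000_000, 10))        # bit/s at full rate
    print(observation_bytes(30_000_000, 10, 0.5))     # bytes in 0.5 s
    print(min_cycles_between_records(30_000_000, 10, 33_000_000))
```

With these numbers a fault model would have to produce, on average, at most one 10 byte record every 73 clock cycles for the trace to remain complete.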
In conjunction with FIDYCO the benefits of our Real-Time-Tracer approach are:
– Non-intrusive trace and debug
– Capture up to 33 Mbit/s
– Observed system runs at full speed
– Traces both instruction and data
– Cycle-accurate trace
– Data can be captured for later analysis and profiling with powerful PC based tools.
– Comprehensive trigger facility, allowing trace to be captured on complex sequential conditions
– Comprehensive filter conditions to control which data is captured
– Low cost. If the chip has an on-chip trace buffer, the trace can be used without any additional hardware beyond that used for normal debugging. Even if the chip does not have an on-chip buffer, the cost of an external trace buffer is considerably less than the cost of a full logic analyzer.
– Access to any desired location within the processor. Since the trace is captured within the chip, the ERT can easily access the processor.
Footnote 4: 2 bytes for the address and 32 bits each for DUT and GN (depending on the DUT and GN width).
5 Conclusion
Real-time and dependable systems are extremely difficult to test in the environment where they are intended to run. Deterministic debugging requires considerable extra communication among the distributed monitoring agents for coordination purposes, and even so there is always an element of nondeterminism in experiments. This extra communication and the additional demands on real-time tracers result in perturbations of the natural behaviour of the system, whose observation then becomes infeasible. The use of the IDE ERT provides new ways of debugging real-time and non-real-time embedded systems. In many cases the availability of a real-time trace allows the easy analysis of bugs that would otherwise be difficult or impossible to analyze. In this context, a fault injector that includes an embedded Real-Time-Tracer can significantly reduce the time taken to debug the resulting systems, and hence produce significant cost savings. In addition, our IDE ERT is one of the techniques that give truly accurate performance profiling information. This allows performance bottlenecks to be found and resolved, and hence eases the development of performance-critical applications.
References
[1] Hermann Kopetz. Real-time systems. Kluwer, Boston, Mass., 2002.
[2] Altera Corporation. SignalTap II Embedded Logic Analyzer. www.altera.com/products/software/pld/design/verification/signaltap2/sig-index.html.
[3] Microsoft Corporation. FAT: General Overview of On-Disk Format. www.microsoft.com/hwdev/download/hardware/fatgen103.pdf, Dec 2000.
[4] Technical Committee T13. Register Delivered Command Set - Logical Register Set, ATA/ATAPI-7 V1, Rev. 4. http://www.t13.org/, Dec 2003.
[5] Technical Committee T13. Parallel Transport Protocols and Physical Interconnect, ATA/ATAPI-7 V2, Rev. 4. http://www.t13.org/, Dec 2003.
[6] David A. Patterson and John L. Hennessy. Computer Organization & Design. Morgan Kaufmann, San Francisco, California, 1998.
[7] Babak Rahbaran, Andreas Steininger, and Thomas Handl. Built-in fault injection in hardware - the FIDYCO example. In Proceedings of the Second IEEE International Workshop on Electronic Design, Test and Applications (DELTA 2004), Perth, Australia, pages 327–332, 2004.