Model Checking a SystemC/TLM Design of the AMBA AHB Protocol

Marcel Pockrandt
Technische Universität Berlin, Berlin, Germany

Paula Herber
International Computer Science Institute, Berkeley, California, USA

[19]. There exist some approaches to formalize the semantics of SystemC, e.g., [17], [5], [7], [23], [15], but they are mostly either limited to the synthesizable subset of SystemC, or they require a tedious manual formalization of a given design. In contrast to purely synchronous hardware design languages, SystemC/TLM supports concurrent processes, dynamic sensitivity and timing, and abstract communication. State-of-the-art formal hardware verification techniques are thus not applicable either.

In previous work, we presented an approach to overcome these problems by formalizing the semantics of SystemC/TLM with the help of UPPAAL timed automata [9], [10]. UPPAAL timed automata [2] have the advantage that their semantics is formally well-defined, and that they come with the UPPAAL tool suite. The UPPAAL tool suite provides means to animate and simulate timed automata models and, most importantly for us, a model checker that enables the fully automatic verification of safety, liveness, and timing properties. In [9], we presented an approach to automatically transform a given SystemC design into a UPPAAL timed automata model, which immediately enables the application of the UPPAAL model checker. This approach was extended in [10] to the TLM 2.0 standard. We showed the applicability of this approach with two small case studies, namely a loosely-timed model that uses a blocking transport and an approximately-timed model that uses a 4-phase non-blocking transport.

In this paper, we present a novel and highly optimized version of our transformation from SystemC/TLM into UPPAAL timed automata. In contrast to our previously proposed approach, we do not aim at a transformation that is as close as possible to the implementation of the TLM core interfaces. Instead, we focus on efficiency, with the general aim of enhancing the scalability of model checking of the resulting timed automata model.
The main challenge when formalizing the semantics of the TLM core interfaces is the payload event queue (PEQ), which is used to maintain a queue of SystemC event notifications, where each notification is associated with a transaction object. This mechanism is used for non-blocking communications, where the communication method immediately returns but a

Abstract—Transaction Level Modeling (TLM) is gaining more and more importance to quickly evaluate design alternatives in multimedia systems and other mixed HW/SW systems. However, the comprehensive and automated verification of TLM models is still a difficult challenge. In previous work, we presented an approach for model checking of SystemC/TLM designs based on a transformation into UPPAAL timed automata. In this paper, we present an optimized version of our previously proposed transformation, and show its effectiveness with experimental results from an industrial case study. The key idea is to generate a UPPAAL model that is especially tailored for being model checked. This significantly reduces the semantic state space and makes model checking considerably faster and less memory-consuming. We demonstrate this by comparing the verification times of both versions for our previously used case study, and by presenting results from a new and larger case study, namely a TLM implementation of the AMBA Advanced High-performance Bus (AHB). The AMBA bus is one of the most popular on-chip bus architectures in IP-based embedded SoCs, and it is used in many multimedia applications. The case study shows that with the proposed optimizations, our approach is applicable to industrial real-world examples. The detection of a serious bug, namely a deadlock situation in a certain scenario, and also the verification of some important safety, liveness, and timing properties provide evidence for the usefulness of our approach.

I. INTRODUCTION

SystemC/TLM [11], [19] is widely used for modeling and simulation in HW/SW co-design. Its main advantages lie in early platform development, fast simulation and evaluation of different design alternatives, and in its flexible integration with pure hardware description languages such as VHDL or Verilog. Transaction level models are the most relevant for design-space exploration and rapid prototyping, as they provide enough details for architecture evaluation but are much faster to simulate than bit- and cycle-accurate models. However, while TLM models are well-suited for architecture evaluation and rapid prototyping, they are difficult to verify. In particular, formal verification and automated testing are typically not supported by TLM design frameworks. For example, for SystemC/TLM, there exist a powerful simulation environment and many tools to ease development with graphical interfaces and static analysis tools, but not even the semantics of SystemC/TLM is defined formally in [11],

978-1-4577-2122-9/11/$26.00 ©2011 IEEE

Sabine Glesner
Technische Universität Berlin, Berlin, Germany


ESTIMedia 2011

further action on the transaction object is typically scheduled for later execution. The postponed execution of actions on transaction objects is difficult to model in the semantics of UPPAAL timed automata if there are multiple concurrent notifications. In [10], we presented a formalization of the PEQ using four different timed automata, which is consistent with the SystemC/TLM reference implementation that uses as many methods and processes. In this paper, however, we present an alternative formalization that only requires two automata. The most important optimization of our new approach is that we model the PEQ in a more abstract way. With that, we reduce the number of clocks and variables, and consequently reduce the size of each semantic state and the overall semantic state space significantly. Additionally, we use a simple live variable analysis to reset all variables while they are unused, which makes it possible for the model checker to detect and use symmetries in the model.

To show the effect of our optimizations, we present experimental results for the approximately-timed model that uses a 4-phase non-blocking transport, which we also used in [10], and compare the verification times. To demonstrate the practical applicability of our approach, we applied it to a new and significantly larger design, namely the TLM 2.0 solution for the AMBA AHB released by Carbon Design Systems¹ in February 2011.

The rest of this paper is structured as follows: In Section II, we summarize related work. In Section III, we briefly introduce SystemC and UPPAAL timed automata. In Section IV, we review our formal semantics for SystemC/TLM as presented in [9], [10]. In Section V, we describe our optimized version for the formalization of the TLM 2.0 standard core interfaces. Finally, we present experimental results in Section VI and conclude in Section VII.

is ignored, and the transformation is performed manually. Besides, the state machine models do not reflect the structure of the underlying SystemC designs. Traulsen et al. [22] proposed a mapping from SystemC to PROMELA, but they only handle SystemC designs at an abstract level, do not model the non-deterministic scheduler, and cannot cope with primitive channels. Zhang et al. [23] introduced the formalism of SystemC waiting-state automata, which are intended to allow a formal representation of SystemC designs at the delta-cycle level. However, the approach is limited to the modeling of delta-cycles; the scheduler and complex interactions between processes are not considered, and the formal model has to be specified manually. In [14], Man presented the formal language SystemCFL, which is based on process algebras and defines the semantics of SystemC processes by means of structural operational semantics style deduction rules. SystemCFL does not take dynamic sensitivity into account and considers only simple communications. The concept of channels is neglected. A tool to automatically transform SystemC to SystemCFL is presented by Man in [15]. However, it does not handle any kind of interaction between processes. Karlsson et al. [12] verify SystemC designs using a Petri-net-based representation. This introduces a huge overhead because interactions between subnets can only be modeled by introducing additional subnets.

To the best of our knowledge, only a few approaches directly target SystemC-TLM designs. In [16], a toolbox for the analysis of transactional SystemC designs is proposed, which is based on a transformation from SystemC to heterogeneous parallel input/output machines (HPIOM). The approach is similar to ours in that it also provides an executable formal semantics and an automatic transformation. However, time is not explicitly considered there.
In particular, the timing behavior is heavily over-approximated, which makes the models difficult to verify. Furthermore, certain aspects of the SystemC semantics are disregarded, for example, the overriding of pending notifications. They do not support the TLM 2.0 standard, but focus on STMicroelectronics’ TAC implementation (Transaction Accurate Communication Channel). In [18], the authors propose to transform SystemC-TLM models into communicating state machines. However, they target the TLM concepts on an abstract level and do not capture the precise semantics of the TLM transport mechanisms or sockets, and the transformation can only be performed manually. In [4], the authors propose a translation from SystemC/TLM into LOTOS, and they use the verification toolbox CADP to import C Code into the LOTOS model. This approach is very expressive and captures a large share of SystemC and C++ constructs. However, the translation has to be done manually and they also do not support the TLM 2.0 standard. In [6], bounded model checking is used on untimed SystemC designs. Though this approach works on the abstraction level of TLM models,

II. RELATED WORK

There have been several approaches to provide a formal semantics for SystemC. For example, a definition of the simulation semantics based on abstract state machines is given by Müller et al. [17] and Ruf [20]. The purpose of their work is to provide a precise description of the SystemC scheduler. However, the system design itself, as built from modules, processes and channels, is not covered and therefore cannot be verified with this approach. Salem [21] presented a denotational semantics for the SystemC scheduler and for SystemC processes, but only for a synchronous subset. Similarly, Große et al. [5] present an approach for formal verification of SystemC designs using Binary Decision Diagrams (BDDs) and bounded model checking, but only for the synthesizable subset. In contrast to our approach, they are not able to cope with dynamic sensitivity or timing. Habibi et al. [8], [7] proposed program transformations from SystemC into equivalent state machines. In these approaches, time

¹ http://www.carbonipexchange.com/


it can neither handle the OSCI TLM 2.0 standard nor any timed SystemC construct. Recently, Bombieri et al. [3] presented an approach for model checking TLM 2.0 IPs by synthesizing RTL IP models from them and applying RTL model checkers to the model. An important advantage of this approach is that they can verify whether existing TLM assertions hold in the synthesized IP. Also, the separation between protocol and functionality presented there is very interesting. However, as they rely on hardware synthesis to construct the IP from the TLM model, the whole model checking approach is restricted to the synthesizable subset of SystemC.

2) Sockets are used to connect initiator and target modules.
3) The generic payload can be used to represent arbitrary transaction objects.
4) The base protocol is a set of rules on how to use the TLM core interfaces to achieve maximal interoperability.

TLM models often use one of the following two coding styles: Loosely-timed models are typically expected to use the blocking transport interface and temporal decoupling. Approximately-timed models are more accurate, and they are typically expected to use the non-blocking transport interface and the payload event queues, which make it possible to postpone the processing of a transaction for later execution.

III. PRELIMINARIES

A. SystemC

SystemC [11] is a system level design language and a framework for HW/SW co-simulation. It allows modeling and executing both hardware and software on various levels of abstraction. The design flow usually starts with approximately timed transaction-level models that are refined to time-accurate models of hardware and software components. SystemC is implemented as a C++ class library, which provides the language elements for the description of both hardware and software, and an event-driven simulation kernel. A SystemC design is a set of communicating processes, triggered by events and interacting through channels. Modules and channels are used to represent structural information. SystemC also introduces an integer-valued time model with arbitrary time resolution. The execution of a SystemC design is controlled by the SystemC scheduler, which controls the simulation time and the execution of processes, handles event notifications, and updates primitive channels. Like typical hardware description languages, SystemC supports the notion of delta-cycles, which impose a partial order on parallel processes. The execution order of these processes is chosen non-deterministically.
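The scheduling loop described above can be illustrated with a small Python sketch. It is not the SystemC kernel: the class `Scheduler`, its method names, and the generator-based processes are assumptions made for this illustration, primitive-channel updates are omitted, and a seeded shuffle stands in for the non-deterministic choice of execution order within a delta-cycle.

```python
import heapq
import random


class Scheduler:
    """Toy event-driven kernel with delta-cycles and integer time (illustrative only)."""

    def __init__(self, seed=0):
        self.now = 0            # integer-valued simulation time
        self.runnable = []      # processes to run in the current delta-cycle
        self.next_delta = []    # processes woken by a delta notification
        self.timed = []         # heap of (wake_time, seq, process)
        self.seq = 0            # tie-breaker so the heap never compares processes
        self.rng = random.Random(seed)
        self.log = []

    def spawn(self, proc):
        self.runnable.append(proc)

    def run(self):
        while self.runnable or self.next_delta or self.timed:
            if not self.runnable:
                if self.next_delta:
                    # Start a new delta-cycle; simulation time does not change.
                    self.runnable, self.next_delta = self.next_delta, []
                else:
                    # Nothing left at this time: advance to the earliest wake-up.
                    self.now = self.timed[0][0]
                    while self.timed and self.timed[0][0] == self.now:
                        self.runnable.append(heapq.heappop(self.timed)[2])
            # Order within one delta-cycle is arbitrary; a shuffle stands in for it.
            self.rng.shuffle(self.runnable)
            batch, self.runnable = self.runnable, []
            for proc in batch:
                try:
                    delay = next(proc)      # run the process until its next wait
                except StopIteration:
                    continue                # process has terminated
                if delay == 0:
                    self.next_delta.append(proc)
                else:
                    self.seq += 1
                    heapq.heappush(self.timed, (self.now + delay, self.seq, proc))


def worker(sched, name, delay):
    """Hypothetical process: logs, waits for `delay` (0 = one delta-cycle), logs again."""
    sched.log.append((sched.now, name, "start"))
    yield delay
    sched.log.append((sched.now, name, "done"))


sched = Scheduler()
sched.spawn(worker(sched, "a", 5))
sched.spawn(worker(sched, "b", 0))
sched.spawn(worker(sched, "c", 5))
sched.run()
```

In this run, process `b` resumes in a later delta-cycle at time 0, while `a` and `c` both resume at time 5 in the same delta-cycle, in an order the sketch deliberately leaves unspecified.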

C. UPPAAL Timed Automata

Timed automata [1] are finite-state machines extended by clocks. A timed automaton is a set of locations connected by directed edges. Two types of clock constraints are used to model time-dependent behavior: Invariants are assigned to locations and enforce progress by restricting the time the automaton can stay in this location. Guards are assigned to edges and enable progress only if they evaluate to true. Networks of timed automata are used to model concurrent processes, which are executed with an interleaving semantics and synchronize on channels.

UPPAAL [2] is a tool suite for modeling, simulation, and verification of networks of timed automata. The UPPAAL modeling language extends timed automata by bounded integer variables, a template mechanism, binary and broadcast channels, and urgent and committed locations. Bounded integer variables are manipulated with a C-like action language. It is possible to declare local or global variables. Global variables can be used to pass values between processes in a network of timed automata. The UPPAAL template mechanism can be used to instantiate timed automata with different variables. In particular, it is also possible to instantiate an automaton with parameter p in a way that p is replaced by a global variable (operations on p will then be applied to the global variable). We will use this for the binding mechanism in our transformation. Binary channels enable a blocking synchronization between two processes, whereas broadcast channels enable non-blocking synchronization between one sender and arbitrarily many receivers. Urgent and committed locations are used to model locations where no time may pass. Furthermore, leaving a committed location has priority over leaving non-committed locations.

A small example UPPAAL timed automaton is shown in Figure 1. The initial location is denoted by ◦, and request? and ack! denote receiving and sending on channels, respectively.
The clock variable x is first set to zero and then used in two clock constraints: the invariant

B. The TLM Standard

Transaction Level Modeling (TLM) is mainly used for early platform evaluation, performance analysis, and fast simulation of HW/SW systems. The general idea is to use transactions as an abstraction for bit-accurate hardware data types, which are transmitted between different modules by abstract function calls rather than pin- and cycle-accurate bus protocols. This enables simulations on different abstraction levels, trading off accuracy and simulation speed. The main goal of the TLM standard [19] is to provide interoperability between different transaction level models. The core of the TLM standard is the interoperability layer, which comprises the TLM core interfaces, sockets, a generic payload and a base protocol.

1) The core interfaces implement standard blocking and non-blocking transport mechanisms.


Figure 1. Example Timed Automaton (labels as extracted from the figure: request?, x=0, x = mintime, value = f(t))
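The roles of invariants and guards can be sketched in a few lines of Python. The comparison operators of the figure did not survive extraction, so this sketch assumes the common pattern of an invariant `x <= maxtime` on a location and a guard `x >= mintime` on its outgoing edge; the class `WaitLocation` and the name `maxtime` are hypothetical, and integer steps stand in for dense time.

```python
from dataclasses import dataclass


@dataclass
class WaitLocation:
    """One location with an assumed invariant x <= maxtime and an
    outgoing edge with an assumed guard x >= mintime."""
    mintime: int
    maxtime: int
    x: int = 0  # clock, taken to be reset to zero on entry

    def can_delay(self, d: int) -> bool:
        # The invariant restricts how long the automaton may stay here.
        return self.x + d <= self.maxtime

    def delay(self, d: int) -> None:
        if not self.can_delay(d):
            raise ValueError("delay would violate the invariant")
        self.x += d

    def edge_enabled(self) -> bool:
        # The guard enables the edge only if it evaluates to true.
        return self.x >= self.mintime


loc = WaitLocation(mintime=2, maxtime=5)
enabled_at_entry = loc.edge_enabled()   # clock is 0, guard does not hold yet
loc.delay(3)
enabled_after_3 = loc.edge_enabled()    # clock is 3, guard holds
can_wait_3_more = loc.can_delay(3)      # 3 + 3 would exceed maxtime
```

Together, the two constraints force the edge to be taken while the clock lies between `mintime` and `maxtime`, which is how an invariant enforces progress and a guard enables it.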


B. SystemC-TLM Semantics


In [10], we presented a formal semantics for the TLM standard core interfaces (i.e., blocking and non-blocking transport) by mapping them to an equivalent UPPAAL timed automata representation. To achieve this, we adopted the formalization we presented in [9] for sockets and transactions, and we presented a set of timed automata models that precisely capture the semantics of the TLM core interfaces. The formalization of sockets and the generic payload requires some restrictions on the set of input designs because some of their characteristics can generally not be transformed into an equivalent timed automata representation. This means that we have to impose the following additional restrictions on a given SystemC/TLM design:

3) We require that sockets are created statically, and that socket binding only takes place before elaboration time.
4) The instantiation of the generic payload with a concrete transaction type has to consist only of (possibly multiple) bounded integers.
5) The number of concurrent non-blocking transport requests must be bounded by a statically determinable maximum.

If these restrictions are met, the transformation of sockets only requires determining which methods are bound to a socket and then using a standard call-return semantics. The transformation of the TLM core interfaces was more challenging, as it requires an additional semantic construct, namely the payload event queue (PEQ). The purpose of PEQs is to enable the independent implementation of the delays of different communication phases in the target and the initiator. To this end, a PEQ is able to manage a time-ordered list of transaction objects. A transaction object is inserted by calling a notify function, whose parameters are the transaction object and a delay. When the delay expires, a predefined callback function peq_cb is called on the transaction object and handles its further processing. The principle of a PEQ is shown in Figure 3.
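The PEQ principle described above can be sketched as a small time-ordered queue in Python. This is an illustration only, not the TLM 2.0 reference implementation: the class `PayloadEventQueue`, the method `advance_to`, and the use of plain strings as transaction objects are assumptions, and entries that expire at the same time are simply served in insertion order.

```python
import heapq


class PayloadEventQueue:
    """Illustrative time-ordered queue of (transaction, delay) notifications."""

    def __init__(self, peq_cb):
        self.peq_cb = peq_cb    # callback invoked when a delay expires
        self.now = 0            # current (integer) time
        self._heap = []         # entries: (due_time, seq, transaction)
        self._seq = 0           # insertion counter, breaks ties between equal due times

    def notify(self, trans, delay):
        """Never blocks: records that peq_cb(trans) is due after `delay` time units."""
        heapq.heappush(self._heap, (self.now + delay, self._seq, trans))
        self._seq += 1

    def advance_to(self, t):
        """Advance time to t and invoke peq_cb for every expired entry, in time order."""
        while self._heap and self._heap[0][0] <= t:
            due, _, trans = heapq.heappop(self._heap)
            self.now = due      # the callback observes the expiry time
            self.peq_cb(trans)
        self.now = t


fired = []
peq = PayloadEventQueue(lambda trans: fired.append((peq.now, trans)))
peq.notify("t_a", 30)
peq.notify("t_b", 10)
peq.advance_to(15)      # only t_b has expired so far
peq.notify("t_c", 5)    # inserted at time 15, so it is due at time 20
peq.advance_to(40)
```

The callbacks are invoked in the order of their expiry times, not in the order of the `notify` calls, which is the behavior any formalization of the PEQ has to preserve.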
To capture the semantics of the PEQ, it is necessary to keep track of a set of concurrent notifications, and to invoke the callback method at the correct times and in the correct order. In [10], we presented four timed automata to capture the semantics of a PEQ: One automaton implementing a time-

Figure 2. Representation of SystemC designs in UPPAAL (components shown in the figure: scheduler, processes, methods, events, primitive channels)

x = mintime enables the corresponding edge at mintime. The symbols ∪ and c depict urgent and committed locations.

IV. FORMAL SEMANTICS OF SYSTEMC/TLM

A. SystemC Semantics

In [9], we presented a formal semantics for SystemC by defining a transformation from SystemC into UPPAAL timed automata. The transformation preserves the (informally defined) behavioral semantics and the structure of a given SystemC design and can be applied fully automatically. It can handle all relevant SystemC language elements, including process execution, interactions between processes, dynamic sensitivity and timing behavior. It only requires two restrictions:

1) We cannot handle dynamic process or object creation.
2) Only bounded integer variables are supported.

The first restriction should hardly narrow the applicability of the approach, as dynamic object and process creation are rarely used in SystemC designs. The second restriction is also acceptable, as most data types used in SystemC designs can be converted to bounded integers. Figure 2 shows how we represent SystemC designs in UPPAAL. Each method is mapped to a single timed automaton template. Process automata are used to encapsulate these methods and take care of the interactions with event objects, the scheduler, and primitive channels. The interactions are modeled using UPPAAL channels. For example, the processes notify events using notify, and the events trigger the processes over a wait channel if they are notified. To formalize the execution semantics of SystemC, we developed predefined timed automata models of the SystemC scheduler, processes, events and other SystemC constructs



for local variables, because the whole timed automaton representing a method or process is only reset to its initial state if all its local variables are reset too.

In our optimized version of the TLM core interface formalization, we made use of all three potential optimization angles. However, the most effective optimization is a reduction in the number of clocks because the model checking effort is exponential in the number of clocks. With a maximum number of n concurrent PEQ notifications and k PEQs in use, our previously proposed formalization of the PEQ required n · k + 1 clocks: n · k in the global elapsed array (n for each PEQ instance) and one global clock that is used for comparisons. In our optimized transformation, we manage the timing for each transaction object in the PEQ locally in a separate process. Using a separate process for each transaction object may at first appear to introduce overhead. However, the UPPAAL model checker can cope much better with processes than with data structures. Eliminating the global clock, which was previously used for comparisons, yields an additional advantage. In particular, by using local clocks, which run independently from each other, we do not need to keep track of the differences between all clocks. Furthermore, the optimized version has no need for sorting the queued events, which also reduces the number of variables used and eliminates the computational overhead.

In the optimized transformation, we formalize the PEQ semantics using only two automata. Merging the PEQ mechanism into only two automata poses two difficulties: First, we want to faithfully respect the PEQ semantics, without losing or adding any behavior (i.e., we do not want to perform any under- or over-approximation). Second, we need to embed the PEQ automata properly into the rest of the SystemC semantics. In particular, it is important to correctly capture the interactions with the SystemC scheduler.
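The idea of managing each pending transaction in its own process with a local clock, instead of keeping a sorted queue, can be illustrated with the following Python sketch. It is not the UPPAAL model: the classes `NotifySlot` and `LocalClockPEQ` and the method `tick` are hypothetical, unit time steps stand in for dense-time clocks, delays are assumed to be at least one time unit, and slots that expire at the same time are served in slot order.

```python
class NotifySlot:
    """One notify process: holds at most one pending transaction and a local clock."""

    def __init__(self):
        self.busy = False
        self.trans = None
        self.delay = 0
        self.c = 0      # local clock, reset to zero on every notification


class LocalClockPEQ:
    """PEQ without a sorted queue: n slots, each timing its own entry."""

    def __init__(self, n, peq_cb):
        self.slots = [NotifySlot() for _ in range(n)]   # n = max concurrent notifications
        self.peq_cb = peq_cb
        self.now = 0

    def notify(self, trans, delay):
        if delay < 1:
            raise ValueError("this sketch assumes delays of at least one time unit")
        slot = next((s for s in self.slots if not s.busy), None)  # any free slot will do
        if slot is None:
            raise RuntimeError("more concurrent notifications than slots")
        slot.busy, slot.trans, slot.delay, slot.c = True, trans, delay, 0

    def tick(self):
        """Let one time unit pass, then fire every slot whose clock has reached its delay."""
        self.now += 1
        for s in self.slots:
            if s.busy:
                s.c += 1
        for s in self.slots:
            if s.busy and s.c >= s.delay:
                s.busy = False          # free the slot before the callback runs
                self.peq_cb(s.trans)


fired = []
peq = LocalClockPEQ(n=2, peq_cb=lambda trans: fired.append((peq.now, trans)))
peq.notify("t_a", 3)
peq.notify("t_b", 1)
peq.tick()              # time 1: t_b fires and frees its slot
peq.notify("t_c", 1)    # reuses the freed slot
peq.tick()              # time 2: t_c fires
peq.tick()              # time 3: t_a fires
```

No entry is ever compared with another one: each slot only checks its own clock against its own delay, which is why neither sorting nor a shared reference clock is needed in this style of encoding.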
The requirements are as follows:

1) For each PEQ entry, the callback method must be invoked exactly at the time when the delay expires.
2) All PEQ entries must be processed in the correct order.
3) A PEQ notification must never be blocked; it must always be accepted immediately.
4) A timed automaton with a local clock must send the scheduler an advancetime signal whenever its local clock expires. This is necessary to ensure that the scheduler starts a new delta-cycle.
5) A timed automaton with a local clock must also synchronize on advancetime as a receiver. This is necessary to ensure that whenever a new delta-cycle is started by the scheduler, all actions that should take place at the same time are executed in the same delta-cycle. This is particularly important to ensure that all possible interleavings of concurrent processes are considered.
6) When invoking the callback method, a PEQ behaves

Figure 3. Payload Event Queue (PEQ). Labels recoverable from the figure: notify(tj, dj), enqueue(), dequeue(), peq_cb(t0), and the ordering condition tj = xj + dj, ti < tj ≤ tk.

ordered list, one modeling the interface of the PEQ (i.e., the notify function the PEQ provides), one for fetching events from the queue and invoking the callback function, and one for the PEQ event. To be able to keep track of a set of timed event notifications, we introduced a global clock array elapsed. In this array, we used one entry for each (possible) PEQ element, i.e., the size of the array was determined by the maximal number of concurrent non-blocking transport requests of a given design. We used the difference between the delay stored in a queue element and its elapsed clock to achieve a time-ordering on PEQ elements in the timed automaton that implements the time-ordered list, and to release PEQ event notifications when the clock reaches the delay.

V. OPTIMIZED TRANSFORMATION

The main goal of the optimized version of our transformation of the TLM core interfaces into UPPAAL timed automata is to reduce the semantic state space, which must be explored during model checking. We perform this reduction without loss of information, i.e., our optimized version does not under- or over-approximate the behavior of the TLM core interfaces. The key idea of our optimization is an encoding of payload event queues that is better suited for model checking. A semantic state of a UPPAAL timed automata model comprises the set of current locations, the values of all data variables, and the clock zone computed from the values of all clock variables. To reduce the semantic state space which is explored by the model checker, there are basically three possibilities:

1) Reduce a single semantic state by reducing the number of locations, variables or clocks.
2) Reduce the number of reachable symbolic states by, for example, reducing the number of clocks or the range of variables.
3) Make it easier for the model checker to detect symmetries in the model.

Note that symmetry detection can, for example, be eased by resetting unused variables. This is in particular helpful


Figure 5. Timed Automaton for Callback Invocation

Figure 6. Timed Automaton of an nb_transport Function

(Edge labels recoverable from the two figures: peq_fetch#ctrl?; m_peq_notify_ctrl! with m_peq_notify_param_trans = tran, m_peq_notify_param_phase = phase, m_peq_notify_param_t = delay; peq_cb#ctrl! with peq_cb#param#tran = peq_fetch#param#tran, peq_cb#param#phase = peq_fetch#param#phase; peq_cb#ctrl?; deactivate! with readyprocs-.)

notification as soon as the callback method is invoked. For every PEQ, we have to create one instance of the callback invocation automaton and n instances of the notify automaton for a maximum number of n concurrent PEQ notifications.

For a better understanding of the interactions between the two PEQ automata and the rest of the UPPAAL model, see Figure 6 and Figure 7. Figure 6 shows how an nb_transport function uses the interface of the PEQ (notify) to enqueue a transaction object together with an associated delay. Figure 7 shows how a callback function receives a transaction object from the PEQ by storing its values into local variables. Note that the four automata are connected by instantiating their parameters with the same global variables. For example, for a given PEQ instance, peq_notify#ctrl in Figure 4 is instantiated with the same global channel as m_peq_notify#ctrl in Figure 6. Similarly, the global variables are used to connect the parameters, e.g., peq_notify#param#trans and m_peq_notify#param#trans are bound to the same global variable. In the reference implementation, the parameters trans and phase are passed by reference, i.e., their values must be copied into local variables within the notify automaton and back to the nb_transport automaton when the execution of notify is finished.

A schematic of the interconnections between the different automata is given in Figure 8. The nb_transport method invokes a peq_notify automaton in a non-blocking fashion, i.e., it continues execution immediately. A binary channel is used for all control channels, which ensures that this is a one-to-one communication and that another peq_notify automaton is non-deterministically chosen for each PEQ entry. The peq_notify automata synchronize themselves with the scheduler through the broadcast channels advancetime and deltadelay, and they invoke peq_fetch through a binary channel if their delay expires.
Finally, peq_fetch invokes the callback method peq_cb, which optionally may invoke another call to nb_transport. Our formalization faithfully respects the semantics of the PEQ implementation in the TLM 2.0 standard with only one exception: in the TLM 2.0 implementation, PEQ entries that expire at the same time are processed in a deterministic order. The sorting algorithm inserts those elements at the last

as a standard SystemC process. In the original implementation, this is ensured by a dedicated SystemC process that fetches transaction objects from the queue and invokes the callback function. The two UPPAAL timed automata that precisely capture the semantics of the PEQ mechanism and also meet these requirements are shown in Figures 4 and 5. The automaton in Figure 4 receives incoming notifications, stores the transaction object (its payload and its phase) in the local variables payload and phase, and the given delay in a local delay variable. Furthermore, it starts a local clock c by resetting it to zero. Then, it changes to a location with an invariant c