Modeling and Verification of a Time-triggered Networking Protocol

G. Leen (1), D. Heffernan (2)
(1) ISERC / CSRC, Foundation Building, University of Limerick, Ireland.
(2) CTVR, Elec. and Comp. Eng. Dept., University of Limerick, Ireland.

Abstract

Analysis estimates that more than 80% of all current innovations within vehicles are based on distributed electronic systems. Critical to the functionality and application domain of such systems is the underlying communication network. Current advances in control networking technology indicate that time-triggered architectures offer improvements in deterministic behaviour, which are particularly appropriate for safety-critical and real-time applications. Here we present novel work on the formal specification and formal verification of a time-triggered protocol: ISO 11898-4 - Time Triggered communication on the Controller Area Network (TTCAN)®. This work has been carried out using the UPPAAL model-checker-based tool set, which is capable of verifying safety properties as formalised by simple reachability properties. These verifiable properties are a subset of those possible in a full realisation of Timed Computation Tree Logic (TCTL). Three TTCAN network automata and a medium automaton were designed. Nine properties, including deadlock, were examined. The results provide a high degree of confidence in the correctness of the TTCAN protocol specification. The formal verification research work described here was conducted in parallel with the preparation of the ISO standard protocol specification for TTCAN.

1. Introduction

Electronic and software content represent an increasing percentage of the manufacturing cost of vehicles, where current estimates are in the order of 20% – 30% [17]. As with all computerised systems, software ‘reliability’ or correctness is of paramount importance. This is particularly the case when the systems involved are safety critical in nature. Currently the automotive industry is pursuing the technology necessary for the widespread deployment of X-by-wire systems in vehicles [22] [14]. Such systems will replace many existing mechanical and hydraulic elements in vehicles. For example, steer-by-wire will replace the steering column and power steering apparatus with a configuration of steering angle sensors, the appropriate communications network and motors to control the position of the road wheels [19]. Similarly, brake-by-wire technology will replace many of the hydraulic and mechanical elements involved with a distributed electronic control solution. In light of the forthcoming X-by-wire critical applications, the industry is taking progressive measures to ensure the reliability and safety of such systems. One such initiative is to use formal methods to verify the correctness of these systems. Using rigorous mathematical techniques, various facets of the systems’ design and behaviour may be examined in an exhaustive manner.

This paper presents the recently completed phase one of a project on the formal verification of the communications protocol ISO 11898-4, Time Triggered Controller Area Network (TTCAN). This work was conducted concurrently with the design of the TTCAN protocol and provides strong evidence to support its correctness. Although TTCAN might not be deployed in strict steer-by-wire or brake-by-wire applications, it may be implemented in such systems which have ‘dormant’ mechanical backup. The formal specification and verification tool used in this work is UPPAAL [5] [3], developed jointly by Uppsala University, Sweden and Aalborg University, Denmark. The properties which may be examined are essentially invariant and reachability properties, as defined by the following abstract syntax: $\varphi ::= \forall\Box\,\beta \mid \exists\Diamond\,\beta$ and $\beta ::= \alpha \mid \beta_1 \wedge \beta_2 \mid \neg\beta$, where $\alpha$ is an atomic formula, i.e. either an atomic clock (or data) constraint or a component location. In addition a number of inductively derived properties may also be tested, e.g. bounded liveness properties,

Proceedings of the International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL’06) 0-7695-2552-0/06 $20.00 © 2006

IEEE

Authorized licensed use limited to: University of Limerick. Downloaded on May 12, 2009 at 04:28 from IEEE Xplore. Restrictions apply.

data frame. The MC schedule is divided into Basic Cycles (BC), where each BC commences with a message known as the Reference Message and terminates with the occurrence of the next Reference Message. A BC is subdivided into time windows. The duration sequence of time windows in all BCs is equal. This regular structure corresponds to a 2-dimensional matrix, in which a row represents a BC and a column, known as a Transmission Column (TC), corresponds in duration to a time window interval. Thus the time windows of a TC are of equal value in the time domain; however, they may differ in the data domain, i.e. they may contain different messages.

[Figure 1 (image not reproduced): the figure shows the CAN data frame (SOF, arbitration and control field, data field of max. 8 bytes, CRC, ACK, EOF, IFS) and an example matrix cycle with Basic_Cycle 0 to 2 as rows and Transmission Columns 0 to 4 as columns, Time_Marks 1 to 4, Ref_Mark, and a Cycle Time axis marked 0, 130, 290, 450, 610 and 780 NTU. Legend: Arbitrating, Exclusive, Free, Reference Message, Tx Enable Window; ID 'X' denotes the CAN frame with identifier 'X'. Abbreviations: IFS: Inter Frame Space; SOF: Start Of Frame; RTR: Remote Transmit Request; IDE: IDentifier Extension; r0: Reserved bit.]

Figure 1 TTCAN matrix cycle

TTCAN defines a further time interval at the start of each TC window (except for TC zero). This interval, known as the Transmission Enable Window (TEW), defines the ‘launch window’ for the message. Messages may commence transmission only during this interval, provided the medium is idle. Such a requirement ensures that messages released in their respective time window do not over-run into the next

1 A real-time branching temporal logic where time elements are expressed implicitly, relative time references are supported and interpreted in terms of an additive continuous time domain, that is non-negative real numbers.
2 At the time of preparing this paper the document ISO11898-4 Road vehicles -- Controller area network (CAN) -- Part 4: Time-triggered communication, was in the Final Draft International Standard (FDIS) stage and registered for formal ISO approval.
3 Note: ‘?’ in the context of this illustration represents ‘or’ and indicates that two or more messages are competing for access to the medium at this time. CAN’s native bitwise arbitration determines which message is successfully transmitted.

[Figure 1 legend, continued: DLC: Data Length Code; CRC: Cyclic Redundancy Check; ACK: ACKnowledge; EOF: End Of Frame; signal levels Logic "1" and Logic "0".]

2. TTCAN

This section provides a concise overview of the TTCAN protocol (see footnote 2); for further information refer to [20] [18] [11]. The TTCAN protocol realises a global static schedule for message transactions based on a TDMA structure. The TTCAN protocol itself is essentially based on the addition of a session layer (OSI layer 5) to the existing CAN protocol stack (OSI layers 1 and 2). Time is divided into time windows and messages are scheduled for transfer within the bounds of these time windows. The schedule itself, known as the Matrix Cycle (MC), defines a finite number of message transactions over a finite time interval. Once the schedule has completed it repeats indefinitely (much like a weekly bus timetable repeats each week). Figure 1 illustrates an example TTCAN MC (see footnote 3) and CAN


etc. As stated in [6] and [13], the simple class of verifiable properties is a subset of those possible in a full realisation of Timed Computation Tree Logic (TCTL; see footnote 1) [4]. The UPPAAL tool set has been used to verify a number of communication protocols, industrial case studies and UML statecharts with real-time extensions [23]. In order to examine properties of the TTCAN protocol, an accurate model reflecting the protocol characteristics was created using the UPPAAL system editor. Models of three TTCAN network nodes and the physical medium were designed, while nine formal properties, including deadlock, were examined using the UPPAAL verification engine.

This paper is organised as follows. Section 2 presents a brief overview of the TTCAN protocol. It is assumed that the reader has some prior knowledge of the underlying CAN protocol; otherwise the reader may refer to [16] [7] [8] [12]. Section 3 discusses the framework for the formal verification of TTCAN. Section 4 describes the system automata. The formal verification of the TTCAN protocol is discussed in Section 5. Finally, Section 6 draws a number of conclusions and discusses proposed further work. For a more in-depth discussion on the formal verification of the TTCAN protocol, the reader is referred to [21].


[Figure 2 (image not reproduced): error state machine with four error levels: S0 (No Error), S1 (Warning), S2 (Error: frame transmissions disabled, ACK still possible, Ref_Trig_Offset := 127, Reference Messages may still be transmitted) and S3 (Severe Error: all CAN bus operations stopped). Transition labels include Tx_Overflow, Tx_Underflow, Tx MSC==7, Rx MSC==7, any Rx/Tx MSC difference > 2, Application WD, Ref. Watch Dog, Config. Error, CPU Action, bus idle for a time at the Tx_Enable window (TE1, TE2; exception: Ref. triggers), Ref_Trig_Offset := Initial_Ref_Trig_Offset, and the absence of errors for one complete matrix cycle.]

Figure 2 TTCAN error state transition diagram

As the occurrence of a Reference Message ensures the progression of the MC, measures have been taken to ensure the presence of the Reference Message on the network. Nodes capable of releasing a Reference Message are known as ‘potential time masters’ and the node currently responsible for the release of Reference Messages is known as the ‘active time master’. If the active time master fails for any reason then a potential time master will release a Reference Message, thereby synchronising the TTCAN network and ensuring the proper execution of the time-triggered communications cycle.

Synchronisation of the member nodes in a TTCAN network is realised through the creation of a global event in time to which all other events are referenced. This event, known as a ‘Ref Mark’, is the frame synchronisation pulse generated at the sampling point of the Start of Frame (SOF) bit of a Reference Message. Conceptually, on the occurrence of this event all member nodes reset a counter which records the passage of time during a BC. This notion of time is referred to as Cycle Time. System events such as the activation of a Tx_trigger or an Rx_trigger are indexed using Cycle Time.

time window and thus prevent corruption of the MC temporal integrity. In the TTCAN Matrix Cycle there are three fundamental types of time window: free windows, exclusive windows and arbitrating windows. Free windows are scheduled bus idle periods; they allow for later system expansion. Exclusive windows are intervals where a single specific message is scheduled to have exclusive transmission rights on the medium, without competition from other nodes on the network. During arbitrating time windows, two or more nodes may arbitrate for medium access. CAN’s native medium access control mechanism is based on a non-destructive bitwise arbitration policy, which resolves conflicts in this situation. When two or more arbitrating windows are sequential they may be appended to form a larger merged arbitrating time window. In this case the TEWs for the merged arbitrating windows are joined, as illustrated in Figure 1 (in BC 1, the time windows TC2 and TC3 are merged).

The TTCAN protocol defines two register sets to control the transmission and reception of messages: Tx_Triggers and Rx_Triggers respectively. Associated with a Tx_Trigger register set are a pointer to the single specific message structure, an index for the TC and BC to define when the message is to be released, and a repeat factor. The repeat factor sets the period within a TC when the message is again released, provided the message is periodic within the scope of a MC TC. Rx_Triggers are similar to Tx_Triggers; however, they record whether or not a given message has been received since the start of a given BC. Associated with each message appearing in an exclusive window is a Message Status Counter (MSC). An MSC has a bounded range of 0-7 and records successful and failed message transactions by being incremented and decremented appropriately. Figure 2 illustrates the error state machine transition behaviour for a TTCAN node.
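The bounded Message Status Counter described above can be sketched as a saturating counter. This is a minimal illustration, not the model used in this work: the class and function names are hypothetical, and the direction of counting (a failed transaction increments, a successful one decrements) is an assumption, since the text only states that the counter is incremented and decremented within the range 0-7. The helper for the spread between the largest and smallest counter mirrors the "Rx/Tx MSC difference > 2" label of Figure 2 without assigning it to a particular error level.

```python
class MessageStatusCounter:
    """Saturating counter for one exclusive-window message (range 0..7)."""

    MSC_MIN = 0
    MSC_MAX = 7

    def __init__(self) -> None:
        self.value = self.MSC_MIN

    def record(self, success: bool) -> int:
        # Assumption: a failed transaction counts up, a successful one
        # counts down; the counter never leaves the range 0..7.
        if success:
            self.value = max(self.MSC_MIN, self.value - 1)
        else:
            self.value = min(self.MSC_MAX, self.value + 1)
        return self.value

    def at_limit(self) -> bool:
        # True when the counter has reached its upper bound (MSC == 7).
        return self.value == self.MSC_MAX


def msc_difference(values) -> int:
    """Spread between the largest and smallest counter of a node."""
    values = list(values)
    return max(values) - min(values) if values else 0
```

A node would hold one such counter per scheduled message and feed the counter values, together with the spread, to its error state machine.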
There are two levels of synchronisation quality in TTCAN: Level 1 and Level 2. Level 2 is an extended version of Level 1. In both implementations system time is measured in units known as Network Time Units (NTU). In Level 1 the NTU is equal in duration to a nominal CAN bit time. In Level 2 the NTU is referenced to a fraction of the physical second. Additionally, Level 2 provides mechanisms to improve the synchronisation quality within the network (Level 2 will not be further discussed here, see [9] [10]).


To draw a simple analogy with the way that TTCAN works, consider for a moment an airport runway, which is a mutually exclusive resource, much like the physical medium in the case of a CAN network. Now consider airplanes to be equivalent to message frames on the CAN network. Then, as in the case of airports, in order to realise the maximum potential of the limited resource a schedule or timetable is enforced for all flight arrivals and departures. Similarly in the case of a TTCAN network a schedule is enforced for message transactions.
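To make the matrix structure concrete, the following sketch encodes a matrix cycle as a list of rows (Basic Cycles) and columns (Transmission Columns), and looks up the active column from the Cycle Time. It is only an illustration under stated assumptions: the function names are hypothetical, the column boundaries are the Cycle Time marks of Figure 1, and the message identifiers follow the example schedule tabulated in Table 2. The arbitration helper relies on the standard CAN rule that the lowest identifier wins.

```python
from bisect import bisect_right

# Column boundaries in NTU (Cycle Time marks 0 .. 780).
TIME_MARKS = [0, 130, 290, 450, 610, 780]

# MATRIX[bc][tc] = identifiers that may be released in that window.
# Column 0 carries the Reference Message; an empty tuple is a free window.
MATRIX = [
    [(1, 2), (5,), (10,), (3,), (6,)],
    [(1, 2), (13,), (4, 7, 11), (4, 7, 11), (8,)],
    [(1, 2), (9,), (7, 11), (), (6,)],
]


def column_at(cycle_time: int) -> int:
    """Return the Transmission Column index active at the given Cycle Time."""
    if not TIME_MARKS[0] <= cycle_time < TIME_MARKS[-1]:
        raise ValueError("cycle time outside the Basic Cycle")
    return bisect_right(TIME_MARKS, cycle_time) - 1


def window_kind(bc: int, tc: int) -> str:
    """Classify a time window by the number of competing identifiers."""
    ids = MATRIX[bc][tc]
    if tc == 0:
        return "reference"
    if not ids:
        return "free"
    return "exclusive" if len(ids) == 1 else "arbitrating"


def arbitration_winner(ids):
    """CAN bitwise arbitration: the lowest identifier wins the medium."""
    return min(ids) if ids else None
```

For example, a Cycle Time of 300 NTU falls in TC 2, which in BC 1 is an arbitrating window contested by identifiers 4, 7 and 11.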

3. Formal description of the protocol

A formal representation of the TTCAN protocol was created using the UPPAAL tool suite. The timed automata used to model the TTCAN protocol are finite state automata decorated with a finite set of real-valued clock variables. For a comprehensive description of the timed automata and associated networks thereof, as applied in the UPPAAL verification tool, refer to [2] [15] [23]. The next section provides a brief explanation of the TTCAN model; for a more detailed description please refer to [21].

3.1. The Formal verification framework

UPPAAL uses finite-state automata composed of edges and vertices, extended with real-valued clock and data variables, to describe real-time systems. Clocks record the progression of system time since they were last reset. All clocks progress at the same rate, while data variables have zero rate and a finite domain. Edges of an automaton are decorated with one or more of three possible types of labels: guards, synchronisation actions, and clock resets or assignments to integer variables. A guard is a conjunction of simple timing and data constraints: a timing constraint is of the form $C \sim n$ or $C - C_1 \sim n$, where $n$ is a natural number, $C$ and $C_1$ are clocks and $\sim\, \in \{<, >, =, \le, \ge\}$. Data constraints are of a similar form, $k \sim n$ or $j - k \sim n$, where $j$, $k$ and $n$ are integers. In the absence of a specific guard label the default guard on an edge is true. Synchronisation labels occur in complementary pairs of the form a! and a?, where a is the name of the synchronisation channel, ! denotes the sending component and ? denotes the receiving component. Absence of a synchronisation label on an edge implies an internal (non-synchronised) transition path.

A system of timed automata consists of a number of individual automata, each in effect similar to a state machine structure. As is the general case in state machines, operation is executed through the progression of action or control from state to state along the enabled connecting edges. In order to synchronise or coordinate the combined operation of two individual automata, a control transition along an edge containing the a! label will force the progression of control along the complementing edge of the second automaton, provided the second automaton is ready to synchronise on this action. To prevent systems from delaying in the case where automata are able to synchronise, a channel can be declared urgent. This label forces the synchronisation action without delay as soon as it is possible. The current release of UPPAAL, 3.4.1, also allows multicast synchronisation in the form of a ‘one to many’ channel. Clock resets are of the form C := n, where C is a clock and n is a natural number. Resets or assignments of integer variables are of the form J := n*J + k, where n and k are integer constants, either positive, zero or negative, and J is an integer variable.

Vertices or states of an automaton may be further decorated as either initial, committed or urgent. Every automaton must have an initial state where it starts at time zero, denoted by the letter O. Committed states, identified by the letter C (i.e. ©), enforce that a transition, synchronised or not, leaving this state must be taken immediately without delay. Urgent locations, identified by the letter U, require that, although other transitions which are open elsewhere in the system may be taken first, this urgent transition must occur without the passage of system time. A further mechanism, invariants, may be applied to enforce discrete transitions and wait states within an automaton. In this case a location is labelled with a clock constraint requiring that a transition leaving the location be taken within a specified time bound.

State invariants may be combined with transition guards to precisely control the temporal progression of control within an automaton. Invariants may also be used as a means of defining a bounded time interval within which control may progress from one state to another. This construct introduces a bounded temporal tolerance on the progression of control.
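The interplay between a location invariant and an edge guard can be illustrated with a small sketch. It is not UPPAAL code and does not use any UPPAAL API: the classes and functions are hypothetical, a single clock is assumed, the invariant is restricted to an upper bound (clock <= bound) and the guard to a lower bound (clock >= bound). Under these assumptions the edge may be taken only within the interval delimited by the guard and the invariant.

```python
from dataclasses import dataclass


@dataclass
class Location:
    name: str
    invariant_upper: float = float("inf")  # invariant: clock <= bound


@dataclass
class Edge:
    source: str
    target: str
    guard_lower: float = 0.0  # guard: clock >= bound
    reset: bool = False       # clock := 0 when the edge is taken


def firing_window(clock: float, loc: Location, edge: Edge):
    """Return (earliest, latest) delay at which the edge may fire, or None."""
    earliest = max(0.0, edge.guard_lower - clock)
    latest = loc.invariant_upper - clock
    if latest < earliest:
        return None  # the invariant expires before the guard is satisfied
    return (earliest, latest)


def take(clock: float, delay: float, loc: Location, edge: Edge):
    """Delay in the location, then take the edge; return (target, clock)."""
    window = firing_window(clock, loc, edge)
    if window is None or not window[0] <= delay <= window[1]:
        raise ValueError("delay violates the guard or the invariant")
    new_clock = clock + delay
    return edge.target, (0.0 if edge.reset else new_clock)
```

With an invariant of clock <= 5 on a location and a guard of clock >= 3 on its outgoing edge, control entering the location with the clock at zero must leave between 3 and 5 time units later, which is the bounded temporal tolerance referred to above.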

3.2. Assumptions

The TTCAN protocol adds a session layer (layer 5) to the existing CAN ISO OSI model. The original CAN protocol itself resides on layers 1 (Physical Layer) and 2 (Data Link Layer). The purpose of this research was not to verify the operation of the CAN protocol itself in detail but rather the operation of the TTCAN protocol; hence the UPPAAL models reflect the functionality of the TTCAN protocol, while only essential services of the underlying CAN layers are modelled. This approach is necessary in order to minimise the complexity of the model for two reasons: exhaustive modelling of the entire OSI stack would have been a mammoth task, and the resulting verification state space would have been enormous and beyond the capability of current formal verification technology.

The TTCAN protocol model presented here is not an exhaustive model and thus imposes a number of restrictions on the behaviour of a TTCAN network, for the aforementioned reasons. Only the functional behaviour and performance of the TTCAN protocol are of interest in this assessment and hence other unnecessary detail is avoided. The restrictions listed below help to remove unnecessary complexity from the model and prevent a state-space explosion during the verification process. Further iterations of the verification process will focus on other specific properties of the protocol, such as variable message length, clock drift between nodes etc. The primary assumptions enforced in the design of the TTCAN model described here are given below, along with the implications of these assumptions.

Assumptions:
a) The medium does not introduce errors. This assumption allows us to dispense with the bit error, stuff error, frame error, acknowledge error and CRC error checking mechanisms of CAN, which are not in question.
b) All messages exchanged are fixed in length. This assumption is made in the context that the worst-case message length (maximum bit stuffing) for any transmission is assumed.
c) For the purposes of this model all clocks are assumed to proceed at the same rate and thus the NTU is a constant within the system. This implies zero oscillator drift and tolerance; future models will relax this assumption.

Implications of assumptions:
a) The data field and CRC field are assumed to be consistent and correct, and are therefore ignored, as are the other native CAN error checking mechanisms. This follows from assumption a).
b) The correct acknowledgement of all messages may be inferred.
c) There are no error frames; once a message successfully arbitrates it is transferred without error. This follows from assumption a).
d) Reference Messages are of fixed length (in bits) regardless of which potential time master produces them. This follows from assumption b).
e) Normal messages of the BC are fixed in length.

Additional points:
a) Data processing of a logical context takes zero time.
b) Certain timeout values have been reduced in order to help minimise the verification state space.
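Assumption b) refers to the worst-case message length under maximum bit stuffing. As a hedged illustration of what such a bound looks like, the sketch below evaluates a bound commonly quoted in the CAN schedulability literature for a data frame with a given number of data bytes. This formula is not taken from this paper, and the function name is hypothetical; the fixed message lengths used in the model are those listed in Table 2 and need not coincide with these values. In a Level 1 implementation one NTU equals one nominal bit time, so the result can be read in NTU.

```python
def worst_case_frame_bits(data_bytes: int, extended: bool = False) -> int:
    """Upper bound on CAN data frame length in bits, including stuff bits.

    Assumed bound (from the CAN schedulability literature, not from this
    paper): g + 8*s + 13 + floor((g + 8*s - 1) / 4), where g is the number
    of control bits subject to stuffing (34 for 11-bit identifiers, 54 for
    29-bit identifiers) and 13 covers the CRC delimiter, ACK slot, ACK
    delimiter, end of frame and inter frame space, which are never stuffed.
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("a CAN data frame carries 0 to 8 data bytes")
    g = 54 if extended else 34
    stuffable = g + 8 * data_bytes
    stuff_bits = (stuffable - 1) // 4
    return stuffable + 13 + stuff_bits
```

Under this bound a frame with an 11-bit identifier and 8 data bytes occupies at most 135 bit times.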

4. The System Model

The TTCAN protocol behaviour was abstracted from the text-based specification of the ISO draft protocol during the development of this protocol. The protocol behaviour was then manually translated into automata which represent the essence of the protocol's operation. The final model defines a system of 10 timed automata, representing two potential time master nodes, a time receiving node and a CAN physical layer, in the context of a Level 1 TTCAN implementation. The individual system automata for each node are: a combined error state machine and error handler automaton, a protocol scheduler automaton and a transceiver automaton. Each of the three network nodes contains these elements and communicates via the physical medium automaton. Figure 3 illustrates a top-level view of the system automata.

Each network node group contains a scheduler. Encoded within the scheduler is the MC information relevant for the correct operation of the respective node. The scheduler also contains a local clock, which records Cycle Time. Thus, the scheduler determines at what point in local time the Rx_Triggers and Tx_Triggers become active and for which messages.

The transceiver automaton interacts with the physical medium automaton by transferring and receiving messages to and from the physical medium and by observing the state of the physical medium, i.e. idle or busy. The physical medium automaton performs the identifier arbitration function of the underlying CAN protocol.

The error handler automaton monitors the progress of the message transaction sequence as defined by the portion of the MC held within the relevant node scheduler. The function of error handling is performed predominantly through the manipulation of MSCs, along with additional information derived from the node’s scheduler and transceiver. The error handler contains an error state machine segment which evaluates the calculations of the pure error handler portion of the automaton and determines the error level status for the node. Figure 4 provides an error state machine automaton.

[Figure 3 (image not reproduced): block diagram of the network model. Node_1 (Time Master 1), Node_2 (Time Master 2) and Node_3 (Time Receiving) each comprise an Error_Containment_Node_X, a Scheduler_Node_X and a Transceiver_Node_X automaton and are connected to the Physical_Medium automaton. The blocks list their clocks (e.g. Cycle_Time_X, Sync_X, Bit_Clock_X, Bus_Clock), variables (e.g. MSC_X[], E_State_X, Bus_Frame_Id, Rx_Bit_Array_X[], Cycle_Count_X), constants (e.g. MSC_Max, Ref_ID, IFS, Bus_Idle) and synchronisation channels (e.g. REF_MARK, TX_TRIGGER_X, TX_TRIGGER_E_X, RX_TRIGGER_X, TXFRAME, ENDFRAME, SOF, EOF, STABLE_ID, FAILED_ARB_X, MSC_TX_OK_X, MSC_TX_NOK_X, CHECK_MSC_X).]

Figure 3 TTCAN protocol network model

Each node, once synchronised to the active time master, transmits messages and registers the reception of messages as per the relevant portion of the MC. For instance, Node 1 may attempt to transmit a message with CAN frame identifier 4 to the physical medium automaton in the time window defined by BC 0, TC 1. Provided the bus is idle at the time of transmission and media arbitration is successful, then Node 2 and Node 3 will receive this CAN frame via their respective transceivers. If an Rx_Trigger has been configured for this message in nodes 2 and 3 then this Rx_Trigger will observe that the message has been received correctly and the MSC corresponding to message identifier 4 will be updated appropriately. Should the updated value of this MSC warrant a change of error level then the error state handler will observe this and act accordingly.

Design of the formal system is such that functionality is distributed in a manner which endeavours to minimise the use of synchronisation channels and the total number of automata, while preserving the real-life structure of a three node TTCAN network ensemble. The design has also considered valuable guidelines provided through the UPPAAL discussion group:

- Minimise the non-determinism in the model.
- Keep the number of clocks as low as possible.
- Re-set variables that are not relevant anymore to a specific value (typically 0).
- Remove redundant states, e.g. those that do not exhibit interesting or possibly error-prone behavior.
- Minimise the interleaving of parallel processes. This can often be achieved by declaring states to be committed (i.e. they have to be left immediately and do not contribute to the state space).
- Do not introduce unwanted extra behavior that is not specifically relevant to the properties you are investigating.
- Declare explicit domains on the integer variables.

5. Formal Verification of the TTCAN protocol In this section we present the results of a portion of the analysis completed on the TTCAN protocol. The TTCAN system automata were created in the editor component of the UPPAAL tool suite and loaded into the verification component of the tool suite where various properties were examined. In order to minimise the computational resources required, the UPPAAL verification engine was run from the command line, this approach removed the unnecessary resource overhead of running the graphical user interface component of the tool. The computing platform was an i86 clone, Xeon 1.7 GHz processor, with 4GB of RAM, running LINUX Red Hat V7.1, kernel version

Proceedings of the International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL’06) 0-7695-2552-0/06 $20.00 © 2006

IEEE

Authorized licensed use limited to: University of Limerick. Downloaded on May 12, 2009 at 04:28 from IEEE Xplore. Restrictions apply.

Figure 4 Error Containment automation for node 1 2.4.15. The UPPAAL verification engine was version 3.2 Beta 4 (3.1.64), June 2001. It was observed that changing the options in the verification engine had a significant effect on the speed and memory footprint of the verification process. Table 1 provides the resource usage to verify a portion of the TTCAN system using various combinations of options, the effect on resource utilisation and convergence is strongly dependent on the configuration of the verification engine. At this point it must be highlighted, so as to avoid confusion, that selection of the under-approximation option may result in what appears at first glance to be an incorrect result being reported, however the answer is precise when the tool indicates that a state X is rechable, while the answer is inconclusive only when the tool reports that

state X is not rechable. The opposite is true for the over-approximation setting. The authors would suggest when verifying a very large system to experiment with the verification engine configuration preferably using a representative subset of automata, as this exercise may save hours of frustration and CPU time later. It was observed that changing the options in the verification engine had a significant effect on the speed and memory footprint of the verification process. For instance Table 1 illustrates the effect of changing the settings while the input model remains the same. The model in this case was a scaled down subset of the entire system. With each new setting configuration (A - Q) the system was re-loaded into the verification engine, the RAM footprint, execution time and query result were

Proceedings of the International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL’06) 0-7695-2552-0/06 $20.00 © 2006

IEEE


noted. In this case the query was A[] !(Er_H_S_2S.S39), which asks the question: “is the state Er_H_S_2S.S39 never reached?”, where this state corresponds to error level 2 in the TTCAN error handler automata. The system itself was correctly configured and not “expected” to reach this error state; in fact, the design of the automata did not allow this error state to be reached.
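An invariance query of the form A[] !p holds exactly when no reachable state satisfies p, so checking it reduces to a reachability search. The minimal Python sketch below illustrates this on a toy, untimed transition system. The state names, the `TOY` and `FAULTY` maps and both function names are hypothetical and are not taken from the TTCAN model; the search also ignores clocks entirely, which UPPAAL does not.

```python
from collections import deque

def reachable(initial, transitions, is_bad):
    """Breadth-first search; return a path to the first bad state found, or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            # Rebuild the witness trace by walking the parent links backwards.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in transitions.get(state, ()):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

def invariant_holds(initial, transitions, is_bad):
    """A[] !bad holds iff no bad state is reachable from the initial state."""
    return reachable(initial, transitions, is_bad) is None

# Hypothetical toy model: the error location "S39" exists but has no incoming edge.
TOY = {
    "Idle": ["Sync"],
    "Sync": ["In_Schedule"],
    "In_Schedule": ["Idle"],
    "S39": ["Idle"],
}

# Hypothetical faulty variant with an extra edge from "Sync" into "S39".
FAULTY = dict(TOY, Sync=["In_Schedule", "S39"])

def is_error(state):
    return state == "S39"
```

For the toy model, `invariant_holds("Idle", TOY, is_error)` succeeds because no edge leads into `S39`, while for `FAULTY` the search returns a trace ending in `S39`, which is the kind of counterexample a model checker reports.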

Table 1 Example effect of changing the verification engine settings

As can be seen from Table 1, changing the parameters in this case resulted in a variation of 135 s between the fastest and slowest verification runs and a variation of 147 MB in the memory footprint. This example illustrates the importance of finding the correct verification options for a particular system.

Table 2 illustrates the MC used in the TTCAN system automata. The MC consisted of 5 TCs and 3 BCs. The table provides the message identifier for each message; when two or more network nodes compete for access during the same time window, the symbol “?” separates the possible message identifiers which may be released during this time window. The maximum message lengths, inclusive of maximum bit stuffing, are given in NTUs. The Rx_Trigger time for each transmission scheduled in an exclusive time window is also provided. The source node for each message and the start time and end time for all TCs are also given.

       Explanatory Comment     TC 0   TC 1   TC 2     TC 3     TC 4
BC 0   Message Identifier      1?2    5      10       3        6
       Message Length (NTU)    80     120    130      120      100
       Message Source          1?2    2      3        1        2
       Rx_Trigger              -      286    446      606      766
BC 1   Message Identifier      1?2    13     4?7?11   4?7?11   8
       Message Length (NTU)    80     120    130      120      100
       Message Source          1?2    3      1?2?3    1?2?3    2
       Rx_Trigger              -      286    -        -        766
BC 2   Message Identifier      1?2    9      7?11     Free     6
       Message Length (NTU)    80     120    120      -        100
       Message Source          1?2    2      2?3      -        2
       Rx_Trigger              -      766    -        -        766
       Start TC (NTU)          0      130    290      450      610
       End TC (NTU)            130    290    450      610      780

Table 2 Example Matrix Cycle scheduling information

Verification of the error state machine and error handler mechanism in the context of a correctly configured MC was achieved using queries 1 to 9 listed below. Conforming to current nomenclature, an implicit proposition at(A.l) will be denoted A.l. This formulation reflects the notion of control within the automaton “A” being in location “l”. In addition, invariance properties are of the form ∀□ φ, where φ is a local property and ∀□ reads: “Always”. The symbol “¬” is the negation operator, while “∨” is the “or” operator and “∧” is the “and” operator.

1. ∀□ ¬ (Error_Containment_Node_1.S39 ∨ Error_Containment_Node_1.S40 ∨ Error_Containment_Node_1.S41)
2. ∀□ ¬ (Error_Containment_Node_2.S39 ∨ Error_Containment_Node_2.S40 ∨ Error_Containment_Node_2.S41)
3. ∀□ ¬ (Error_Containment_Node_3.S39 ∨ Error_Containment_Node_3.S40 ∨ Error_Containment_Node_3.S41)

Location names in property 1 may be found in Figure 4, in the lower section of the combined error handler and state machine for potential time master 1. Location Error_Containment_Node_1.S39 represents a location which is entered only if the S1 error level is active. Location Error_Containment_Node_1.S40 is only entered if an S2 error is active, and so on. Properties 2 and 3 represent similar locations in the error state machine components of the remaining two network nodes. These queries are equivalent to asserting: ‘the system never enters into location Error_Containment_Node_1.S39 or into location…..’.
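In UPPAAL's textual query language the “Always” quantifier is written `A[]`, and negation and disjunction may be written `not` and `or`. The following Python sketch shows how the three per-node properties could be generated as query strings. It assumes the automaton instance names used in the properties above; the helper name `error_free_query` is hypothetical, and the output is only a string, not a call into any UPPAAL interface.

```python
def error_free_query(node, locations=("S39", "S40", "S41")):
    """Build the query 'no error location of this node is ever entered'."""
    process = f"Error_Containment_Node_{node}"
    disjunction = " or ".join(f"{process}.{loc}" for loc in locations)
    return f"A[] not ({disjunction})"

# One query per network node, mirroring properties 1 to 3.
queries = [error_free_query(node) for node in (1, 2, 3)]

for number, query in enumerate(queries, start=1):
    print(f"{number}. {query}")
```

Generating the queries from one template keeps the three properties structurally identical, so a difference in verification outcome between nodes cannot be caused by a transcription slip in one of the formulas.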


As these query locations represent the cumulative effect of errors within the system, a large number of possible error conditions can be examined by observing a node’s reported error level. For a correctly configured system the verification shows that all these properties are satisfied, as represented by the combination of queries 1, 2 and 3. This result formally verifies that a system designed in accordance with the chosen interpretation⁴ of specification ISO 11898-4 will never inadvertently enter an undesired error state. This means that the specification itself defines a network node implementation which has no ‘hidden’ execution traces into the aforementioned undesired states for a correctly configured MC, in the previously stated context of this model.

The ‘deadlock free’ operation of the system was also verified using property 4 as follows⁵:

4. ∀□ ¬ deadlock

The correct operation of the error state machine and error handler mechanism in the context of an incorrectly configured MC was verified using the following queries:

5. ∀□ (Error_Containment_Node_1.S39 ∨ Error_Containment_Node_2.S39), with Node 3 absent
6. ∀□ (Error_Containment_Node_1.S39 ∨ Error_Containment_Node_3.S39), with Node 2 absent
7. ∀□ (Error_Containment_Node_2.S39 ∨ Error_Containment_Node_3.S39), with Node 1 absent
8. ∀□ ¬ (deadlock), with two masters, and with one potential master and one slave
9. ∀□ (State_1 == 0 ∨ State_1 == 1), phased startup of time masters, with node 2 starting early.

Propositions 1 to 3 inclusive were satisfied⁶. Properties 5 to 7 verify the expected functionality of the error state automata; as expected, these properties were not satisfied, owing to the MC messages that were absent when the respective nodes were removed from the virtual network. Properties 8 and 9 were also satisfied.

In addition to the properties listed above, many additional properties have been verified in the course of the model design process. The model itself required verification of behaviour at various levels of completeness, and its development paralleled the evolution of the specification.

⁴ As the specification document is written in natural language, ambiguity, unintentional or otherwise, while allowing scope for implementers’ creativity, also provides scope for correct interpretation or otherwise.
⁵ Note that this semantics for deadlocks uses syntactic transitions; thus invariants on the target state have no influence on whether the source state is deadlocked or not.
⁶ Hence, in a correctly configured system there is no possible way to enter an error state. Note, this excludes the possibility of external system perturbation, as described in the model restrictions.

Based on experience gained during this project and on discussions with the UPPAAL team, the following recommendations are offered for model verification using UPPAAL:
• try active clock reduction (especially if there are many clocks)
• if one suspects a bug, try a depth-first search first
• if confident that the system satisfies a safety property (A[] ...), do a breadth-first search
• over-approximation is often worth trying, as it is a faster method (however, it might fail to prove a valid property, as explained earlier)
• sometimes, the global/local reduction option helps
• upgrade to the current version of UPPAAL
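The caveat about approximation in the recommendations above can be summarised as a small decision rule. The Python sketch below encodes when a reachability verdict can be trusted, following the description given earlier in this paper: under-approximation explores a subset of the state space, over-approximation a superset. The function name and the string labels `"none"`, `"under"` and `"over"` are hypothetical and are not UPPAAL option names.

```python
def is_conclusive(approximation, reported_reachable):
    """Return True if the reported reachability verdict can be trusted.

    approximation: "none", "under" or "over" (hypothetical labels).
    reported_reachable: True if the tool reported the state as reachable.
    """
    if approximation == "none":
        return True  # exact exploration: either answer is precise
    if approximation == "under":
        # Only part of the state space is explored: a state that was found
        # really is reachable, but a state that was not found may simply
        # have been missed.
        return reported_reachable
    if approximation == "over":
        # Extra states are included: "not reachable" is trustworthy,
        # whereas "reachable" may be an artefact of the extra states.
        return not reported_reachable
    raise ValueError(f"unknown approximation setting: {approximation!r}")

def safety_property_proved(approximation, bad_state_reported_reachable):
    """A[] !bad is proved only by a conclusive 'not reachable' verdict."""
    return (not bad_state_reported_reachable
            and is_conclusive(approximation, bad_state_reported_reachable))
```

This makes explicit why over-approximation can prove a safety property but may fail to prove a valid one, while under-approximation can only refute a safety property.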

5.1. Computational

The individual verification calculations consumed varying amounts of computational time and memory. Properties 1 to 3 inclusive each took approximately 2.5 hours to verify while consuming 1 GB of RAM. Properties 5 to 7 inclusive took approximately 1 minute per property to verify. We also noted that an inappropriate configuration of the verification engine options (such as breadth-first search, re-use of state space and under-approximation) resulted in 2 GB of memory usage while completing in a little under 20 hours. Based on this experience, we recommend experimenting with these configuration options.

6. Conclusions

In this paper we have presented an overview of the new time-triggered protocol, TTCAN, and its formal verification. An overview of the formal models of the TTCAN protocol, described as a network of timed automata, has been presented. These automata capture the essence of the protocol behaviour and may help alleviate any misinterpretations regarding the textual specification. A number of key properties of the protocol have been formally examined, including deadlock-free operation.


The work described in this paper is novel in that, generally speaking, ISO protocols have hitherto not been formally verified during the design phase of the specification. The removal of errors and flaws in the early stages of such design processes pays large dividends, both economically and in effort expended. As J. Arthur et al. have pointed out, ambiguities in the specification itself are often an ‘innocent’ source of error [1]. Fortunately, a formal specification has little scope for ambiguity or misinterpretation, even across natural language boundaries. It is therefore not unreasonable to propose that future international specifications be formally verified prior to release and that a formal specification be included as a component of the specification release documentation.

The formal verification of the protocol specification is independent of whether the implementation is realised in software or in hardware. To date the protocol has been realised in software by NEC and in hardware by both Bosch and Hitachi.

The mathematical models presented in this paper will now form the basis of continued investigation into the performance of the TTCAN protocol. Issues such as clock drift between nodes will be examined. A number of the model restrictions will be relaxed and a more detailed examination will be conducted. For instance, an automaton can be introduced into the system which generates a bounded number of transmission failures on the medium, and the subsequent protocol error containment behaviour may be observed. Indeed, as the models created are quite flexible, actual systems may be simulated and verified prior to implementation. Critical parameters such as worst-case message latency may be examined for specific messages in bounded error conditions. Different MC configurations may be compared, and application-specific control loop requirements may be formally verified to be satisfied, or otherwise. A model of an extended TTCAN Level 2 implementation may also be examined in future, although some semantic restrictions of the UPPAAL language may make the specification of this feature a complex challenge⁷.
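As a small illustration of what comparing MC configurations might involve before any model checking is attempted, the Python sketch below encodes the time-window boundaries and the longest scheduled message per TC from Table 2 and performs two basic consistency checks. The data structure, the function names and the choice of checks are assumptions made for illustration; they are not part of ISO 11898-4 or of the automata described in this paper.

```python
# TC index -> (start, end, longest scheduled message), all in NTU, from Table 2.
WINDOWS = {
    0: (0, 130, 80),
    1: (130, 290, 120),
    2: (290, 450, 130),
    3: (450, 610, 120),
    4: (610, 780, 100),
}

def check_schedule(windows):
    """Return a list of problems: gaps or overlaps, and messages that do not fit."""
    problems = []
    previous_end = None
    for tc in sorted(windows):
        start, end, longest = windows[tc]
        if previous_end is not None and start != previous_end:
            problems.append(f"TC {tc}: starts at {start}, previous TC ended at {previous_end}")
        if longest > end - start:
            problems.append(f"TC {tc}: message of {longest} NTU exceeds the window")
        previous_end = end
    return problems

def slack(windows):
    """Spare NTUs left in each window after the longest scheduled message."""
    return {tc: (end - start) - longest
            for tc, (start, end, longest) in windows.items()}
```

For the Table 2 values, `check_schedule(WINDOWS)` reports no problems, and `slack(WINDOWS)` shows that TC 2 has the smallest margin. Two candidate MC configurations could be compared by running the same functions on each.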

6.1. Acknowledgement

The authors wish to gratefully acknowledge the assistance of Prof. Robin Sharp, Prof. Hongyan Sun, Prof. Hans Henrik Løvengreen, Edward Todirica and Per Friis of the Computer Science and Technology Division, Informatics and Mathematical Modeling Department, Technical University of Denmark, for their guidance and support. The authors wish to thank Enterprise Ireland, the Irish Research Council for Science Engineering and Technology, the Irish Software Engineering Research Consortium (ISERC), PEI-CSRC Technologies and the University of Limerick for their continued support. Thanks to Linus Torvalds, Rik van Riel, Istvan Matyasovski and Andrea Arcangeli for their support on Linux. Thanks to the UPPAAL team, especially Oliver Möller, Johan Bengtsson, Dr. Paul Pettersson and Prof. Wang Yi. Finally, special thanks to Florian Hartwich and Dr. Bernd Müller from BOSCH, and the other members of the ISO/TC 22/SC 3/WG 1/TF6 expert group.

⁷ The UPPAAL syntax does not allow the comparison of clock variables and integer variables.

7. References

[1] Arthur J, Gröner M, Hayhurst K, Holloway M (1999) Evaluating effectiveness of independent verification and validation. IEEE Computer, Oct. pp. 79 – 83.
[2] Aceto L, Burgueno A, Larsen K G (1997) Model Checking via Reachability Testing for Timed Automata. Tech report RS-97-29, BRICS, Dept. Computer Science, University of Aarhus, Denmark. ISSN 0909-0878, Nov.
[3] Amnell T, Behrmann G, Bengtsson J, D'Argenio P R, David A, Fehnker A, Hune T, Jeannet B, Larsen K G, Möller M O, Pettersson P, Weise C, Yi W (2000) UPPAAL – Now, Next and Future. Proc. Modeling and Verification of Parallel Processes (MOVEP'2k), Nantes, France, June 19 to 23, Springer-Verlag, LNCS, tutorial 2067, 2001, pp. 100-125.
[4] Alur R, Courcoubetis C, Dill D (1993) Model-checking in dense real-time. Information and Computation, vol. 104, no. 1, May, pp. 2-34.
[5] Behrmann G, David A, Larsen K G, Möller O, Pettersson P, Yi W (2001) UPPAAL – Present and Future. Proc. 40th IEEE Conference on Decision and Control (CDC'2001), Orlando, Florida, Dec. 4 – 7.
[6] Bengtsson J, Larsen K, Larsson F, Pettersson P, Yi W, Weise C (1998) New Generation of Uppaal. Proc. Int’l Workshop on Software Tools for Technology Transfer, Aalborg, Denmark, 12 - 13 July.
[7] BOSCH (1991) CAN Specification 2.0, Robert Bosch GmbH, Postfach 300240, D-7000 Stuttgart 30, Germany, Sept.
[8] Farsi M, Ratcliff K, Barbosa M (1999) An overview of Controller Area Network. IEE Computer & Control Eng. Journal, vol. 10, issue 3, June, pp. 113 – 120.
[9] Führer T, Müller B, Dieterle W, Hartwich F, Hugel R, Walther M (2000) Time Triggered Communication on CAN (Time Triggered CAN - TTCAN). Proc. 7th Int’l CAN Conference (ICC 00), Amsterdam, Netherlands, Oct. 24-25.
[10] Führer T, Müller B, Hartwich F, Hugel R (2001) Time Triggered CAN (TTCAN). Society of Automotive Engineers (SAE), Detroit, Mi., SAE paper no. 2001-01-0073.
[11] Heffernan D, Leen G (2002) Time-triggered control network for industrial automation. Assembly Automation, vol. 22, no. 1, pp. 60 – 68.


[12] ISO 11898:1993(E) (1993) Road Vehicles, Interchange of digital information - Controller Area Network (CAN) for high speed communications. Nov.
[13] Katoen J-P (1998) Concepts, Algorithms and Tools for Model Checking. Lecture notes from the course Mechanised Validation of Parallel Systems (Course number 10359) Semester 1998/1999, Printed with permission by: Department of Information Technology, Technical University of Denmark.
[14] Koopman P (2002) Critical embedded automotive networks. IEEE Micro, July-Aug. pp. 14 – 18.
[15] Larsen K G, Pettersson P, Yi W (1995) Model-Checking for Real-Time Systems. Proc. 10th International Conference on Fundamentals of Computation Theory, Dresden, Germany, 22-25 August, Springer-Verlag, LNCS 965, pp. 62-88.
[16] Lawrenz W (1997) CAN System Engineering. From Theory to Practical Applications. Springer-Verlag, Berlin Heidelberg New York ISBN 0-387-94939-9.
[17] Leen G, Heffernan D, Dunne A (1999) Digital networks in the automotive vehicle. IEE Computer and Control Engineering Journal, Dec. vol. 10, issue 6, pp. 257 – 266.

[18] Leen G, Heffernan D (2001) Time-triggered controller area network. Computer and Control Eng. Journal, vol. 12, no. 6, Dec. pp. 245 – 256.
[19] Leen G, Heffernan D (2002) Expanding Automotive Electronic Systems. IEEE Computer (Outlook Edition) vol. 35, no. 1, Jan. pp. 88 – 93.
[20] Leen G, Heffernan D (2002) TTCAN: a new time-triggered controller area network. Microprocessors and Microsystems, vol. 26, issue 2, March, pp. 77-94.
[21] Leen G, Heffernan D (2002) Formally Verifying Aspects of Time-Triggered Controller Area Network (Phases 1 & 2a). Tech. report, PEI/CSRC report no. 20020603, main library, University of Limerick.
[22] Nossal R, Lang R (2002) Model-based system development. An approach to building X-by-Wire Applications. IEEE Micro, July-Aug. pp. 56 – 63.
[23] www.uppaal.com
