Evaluating the Effects of Transient Faults on Vehicle Dynamic Performance in Automotive Systems

F. Corno∗, F. Esposito+, M. Sonza Reorda∗, S. Tosato∗

∗ Politecnico di Torino - Dipartimento di Automatica e Informatica - Torino, Italy
+ FIAT Auto - Product & Process Engineering - Integrated Chassis Control - Torino, Italy

Abstract: Current automotive systems are integrating more and more electronic components in the handling and performance areas, for supporting advanced comfort and safety features. The effects of component or network failures raise serious concerns about the overall vehicle stability and safety. This paper proposes a methodology for analyzing at the system level (taking into account both mechanical and electronic components) the implications of transient faults in the electronic part on the overall vehicle response. A prototypical fault injection environment is also presented, and experimental results show how safety specifications for components can be derived from performance objectives set at the vehicle level.

1. Introduction

The increasing demand for performance and safety in automotive applications has led to a considerable increase of electronic systems in the vehicle, aimed at aiding the driver in particularly critical driving conditions. Nowadays, vehicles include electromechanical subsystems that aid drivers during braking maneuvers by preventing the wheels from locking (the Anti-lock Braking System, ABS), subsystems that keep the vehicle stable during critical steering maneuvers (the Vehicle Dynamic Control, VDC), and subsystems that automatically change the damper characteristics (semi-active suspensions). These subsystems communicate over a shared network, but are currently largely independent in their strategies. Automotive manufacturers are now facing the challenge of developing integrated active subsystems, where different control functionalities, implemented by different Electronic Control Units (ECUs), interact and exchange data and commands. The next step, with huge consequences in terms of safety, is the design of “by-wire” systems, which should allow better control of steering, braking and accelerating functions but lose the intrinsic passive safety of mechanical components.

In this new automotive conception, vehicle performance and safety both depend on the mechanical part as well as on the electronic systems. Only by taking into account the interactions of electronic and mechanical subsystems, and their joint interaction with the environment (driver and road), can we understand and quantify vehicle behavior (and performance) and dependability. Therefore, design and validation methodologies for automotive applications must be able

Paper 47.2 1332

to work at the system level, i.e., taking into account all the electronic and mechanical components. In particular, the focus of this paper is on evaluating the effects of transient faults in the electronic part on the dynamic response of the entire vehicle. This analysis allows us to estimate the robustness of the vehicle system during critical maneuvers with respect to ECU or network failures, and therefore to design and evaluate backup strategies for guaranteeing the driver’s safety. In fact, any component failure can be tolerated by the intrinsic robustness of the vehicle up to a given time or amount (the intervention threshold), but any larger failure must be detected and managed by other active systems, which may execute recovery strategies.

The main goal of this paper is to define a system-level methodology able to evaluate the vehicle behavior in the presence of faults, and therefore to estimate the optimal values for intervention thresholds and validate the effectiveness of recovery strategies. A comprehensive reliability evaluation methodology should take into account both formal methods (functional analysis, fault tree analysis, failure mode and effects analysis, etc.), which can determine dependencies and estimate probabilities, and simulation experiments, which may give quantitative measures related to a subset of faults. The strategy adopted in this paper is to use a simulation model for the entire system (mechanics, ECUs, network, driver, environment), on which we can execute complete driving maneuvers. The simulation model is enriched with fault injection capabilities [3], which are exploited by a fault injection system to execute simulation campaigns and analyze the effects on vehicle

ITC INTERNATIONAL TEST CONFERENCE 0-7803-8580-2/04 $20.00 Copyright 2004 IEEE

performance. In this way it is possible to inject faults in the electronic subsystems and evaluate their effects directly in terms of vehicle dynamic performance degradation. The methodology proposed in this paper has been developed taking into account the Fiat Auto vehicle architecture, which is based on ECUs communicating through a Controller Area Network (CAN). We developed a case study by analyzing faults affecting the Vehicle Dynamic Controller (VDC) [6] inputs and sensors during critical steering maneuvers.

The remainder of this paper is organized as follows: Section 2 presents related work and approaches. In Section 3 we outline the main characteristics of the vehicle model. Section 4 presents the fault injection system architecture and the system-level methodology for interpreting the results. In Section 5 we present some experimental results on the Vehicle Dynamic Controller (VDC) system. In Section 6 we outline the work in progress and draw some conclusions.

2. Previous Approaches and Related Works

The general approaches to reliability evaluation and fault-tolerant design may resort to analytical methods or to experimental evaluations. In the latter case, fault injection techniques [2] [3] are commonly adopted, and can basically be grouped into simulation-based techniques, software-implemented techniques, and hardware-based techniques. Several approaches target fault injection in electronic distributed systems. In the following we mention those that are most closely related to the kind of system we are considering. FIAT [7] is a software-implemented fault injection system that works at the ECU cluster level. Later, FERRARI [8] was presented, a software-implemented fault injection architecture that coordinates various operational modules for initialization, pre-runtime fault injection, data collection and analysis, and user interaction. HARTS [9] is a fault injection tool that generates faults in a distributed real-time system, while FITS [10] injects faults in a time-triggered system, working at the Control Unit operating system layer. MEFISTO [11] is a simulation-based technique that injects faults into VHDL models. Recently, some researchers addressed the reliability of time-triggered network architectures via software-implemented fault injection [12] [13], although they have a network-level view. Some work with a subsystem view is presented in [14], targeting fault injection for X-by-Wire applications. A recent paper [15] also analyzes the dependability of the CAN protocol and of a hardware implementation of a CAN controller using emulation-based fault injection campaigns.

The approach presented in this paper is significantly different from the cited ones, since it works at the system level, integrating mechanical and electronic models, and allows us to compute the actual driving response of the vehicle when a fault occurs. We abstract some implementation details in the system model to allow acceptable simulation times, according to the industrial tradeoff between model accuracy and simulation speed. The proposed methodology builds upon some initial efforts by the same authors [4] [5], where a simple faultable network model was inserted in a vehicle simulation environment. Thanks to the adoption of new techniques, the method proposed here is much more effective, especially in terms of required CPU time. Moreover, we extended our method to deal with faults in network branches, thus significantly increasing the generality of the approach. Finally, the network model used here is significantly more complete than in the previous papers. The paper presents the complete methodology, able to perform simulation campaigns and quantify performance loss and safety violations through a threshold-based approach.

3. System-level Vehicle Model

To execute complete driving maneuvers and evaluate fault effects on the vehicle response, we adopted a simulation model for the entire system (mechanics, ECUs, network, driver, environment). The choice of the modeling and simulation environment was heavily influenced by the industrial availability of existing models for mechanical and electronic parts, and by how well automotive designers know the tools. For these reasons we used Matlab™/Simulink™, the de facto functional modeling standard in most of the automotive industry. Matlab™/Simulink™ makes it possible to define the equations describing the mechanics of the vehicle and to implement control strategies or automotive-purpose blocks.

The model we refer to is composed of four different kinds of communicating blocks, as shown in Figure 1:
• An Input block (DRIVER) able to emulate a driver executing typical vehicle maneuvers.
• Mechanical blocks (ENGINE, VEHICLE & BRAKE, STEER) that contain the vehicle physical equations and represent the mechanical part of the system.
• Electronic blocks (ABS, VDC) that contain control logic and represent the algorithms embedded in ECUs.
• A Network model block that coordinates the communications between electronic blocks.

The model simulates the characteristics of a modern vehicle with ABS (Anti-lock Braking System) and VDC (Vehicle Dynamic Controller) active systems. The correctness of this model has been experimentally validated against actual vehicle behavior [6]. The following subsections give more details on each of the blocks in the model.

Figure 1: Conceptual Vehicle model

3.1. Input maneuvers

A maneuver is a set of inputs that a driver can apply to the vehicle; it is basically composed of steer angle, brake and gas pedal waveforms. The standard parameters of the maneuvers are specified by the International Organization for Standardization (ISO). In the experiments presented in this paper, we focused on a step-steer maneuver (ISO 7401:2003), which consists of a rapid steering operation (0° to 100° in 0.25 s, starting at 2.0 s) with the vehicle at full speed (100 km/h). Figure 2 shows the trend of the steer angle during the ISO 7401:2003 maneuver.

3.2. Mechanical dynamics

The Vehicle and Brake block contains the mechanical equations of the vehicle. This chassis model uses six degrees of freedom: three equations for translation (longitudinal, lateral, vertical) and three for rotation (roll, pitch, yaw); these equations are derived using Lagrange's method. In the step-steer maneuver two physical quantities may be used to estimate the dynamic response of the vehicle: the yaw rate and the sideslip angle. The yaw rate is the rate of change of the yaw angle, i.e., the rotation angle around the vertical axis of the vehicle. The sideslip angle is the angle between the direction of the speed vector and the longitudinal axis of the vehicle. Figures 3 and 4 show the typical behavior of yaw rate and sideslip angle during an ISO 7401:2003 maneuver.
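The step-steer input and the two response quantities described in this section can be illustrated with a minimal Python sketch. It is not the authors' Simulink model: the linear ramp shape, the function names and the use of body-frame velocity components are assumptions made only for illustration; the 0° to 100° amplitude, the 0.25 s ramp and the 2.0 s start time come from the text.

```python
import math

def step_steer_angle(t, start=2.0, ramp=0.25, amplitude_deg=100.0):
    """Steer angle in degrees at time t (s), assuming a linear ramp."""
    if t <= start:
        return 0.0
    if t >= start + ramp:
        return amplitude_deg
    return amplitude_deg * (t - start) / ramp

def sideslip_angle(v_long, v_lat):
    """Angle (rad) between the speed vector and the longitudinal axis.

    v_long and v_lat are the speed components along the longitudinal and
    lateral axes of the vehicle (hypothetical argument names).
    """
    return math.atan2(v_lat, v_long)

def yaw_rate(yaw_angles, dt):
    """Finite-difference yaw rate (rad/s) from yaw angles sampled every dt."""
    return [(b - a) / dt for a, b in zip(yaw_angles, yaw_angles[1:])]
```

For example, `step_steer_angle(2.125)` is halfway through the ramp and evaluates to 50 degrees, while any time before 2.0 s gives 0.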

Figure 3: Yaw rate trend

Figure 4: Sideslip angle trend

3.3. Electronic strategies

Figure 2: Steer angle trend


In the step-steer maneuver, the ABS system remains idle since no braking action is required. On the other hand, the VDC should compensate for the under- or oversteering of the vehicle, keeping it on the ideal trajectory, by means of selective braking pulses on each of the wheels. The VDC block adopted in this work implements the control strategy developed in [6]. It uses several input signals (steer angle, yaw rate signal, etc.) and generates four braking forces to be applied at the four wheels. This strategy tracks an ideal yaw rate target that guarantees vehicle stability. The analysis described in this paper focuses on vehicle behavior and performance when VDC input signals are corrupted due to network faults, as detailed in the following subsections.

Figure 5: Network model

3.4. Network model

The current practice in vehicle design is to develop functional vehicle performance models that abstract from the presence of a communication network. On the other hand, industrial experience has shown that many reliability concerns stem from the presence of a network architecture. For these reasons, we developed a specific network modeling strategy. Many modeling levels of abstraction can be chosen, with different characteristics and precision. In particular, for network communications we may have:
• Bit-accurate models [1] that reproduce the actual clock cycles in the network interfaces.
• Packet-accurate models that represent network packets as atomic entities and abstract away detailed timings, while still explicitly representing the protocol implementation.
• Functional models that model the effects of the network on the real-time analog representation of the signals transported by the network.

At each level, all the features of the network (delays, collisions, priorities, errors, etc.) should be modeled, although in different ways and with different levels of detail. The characteristics of the developed model are:
• Vehicle oriented: this network model describes communication structures at the vehicle layer, giving more attention to functional aspects of the CAN network.
• Configurable: this model allows configuring the network protocol (e.g., CAN [16], [17], TTP [18]) and speed by setting properties of the blocks.

Modeling a shared communication medium, such as the CAN network, in an intrinsically point-to-point environment such as Simulink requires some carefully chosen tradeoffs. Our network model (Fig. 5) is composed of different communication lines, each modeling the transfer of one variable, and one supervision block (called CAN Manager) that triggers the transfer of the suitable variables at the right time, according to the CAN protocol strategy, and manages network collisions by delaying the signals with lower priority.

The core of the network model is the CAN Manager. This block contains the Matlab™ instructions that emulate the CAN protocol in software, handling a priority queue of the next transmission times for all variables. Figure 6 shows the CAN Manager Simulink™ integration.

Figure 6: CAN Manager (input: function enable; outputs: sample enables and next activation time)

At the appropriate transmission time, the enable signal wakes up the function that computes the transmission and manages the collisions. The outputs of the block are:
• The set of enable signals, which activate the respective lines of communication. Only one line of communication is activated at a time, because only one ECU can use the bus.
• The next activation time value, which provides the next time of function activation, corresponding to the next packet transmission instant.
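The behavior described for the CAN Manager (one line enabled at a time, lower-priority signals delayed on collision, and a next activation time computed after each transmission) can be sketched in Python as follows. This is only an illustrative model, not the authors' Matlab code: the function name, the fixed frame time, the strictly positive periods and the message names are all assumptions, and a lower priority number is assumed to win arbitration, as CAN identifiers do.

```python
def can_manager(messages, frame_time, horizon):
    """Return [(send_time, name)] for periodic messages sharing one bus.

    messages: list of (name, priority, period) with period > 0;
              a lower priority number wins a collision.
    frame_time: time the bus stays busy for one packet (assumed constant).
    horizon: simulation end time.
    """
    next_release = {name: 0.0 for name, _, _ in messages}
    info = {name: (prio, period) for name, prio, period in messages}
    schedule = []
    bus_free_at = 0.0
    while True:
        # Next activation time: when the bus is free and a packet is due.
        start = max(bus_free_at, min(next_release.values()))
        if start >= horizon:
            break
        # Every message already released competes for the bus.
        ready = [n for n, r in next_release.items() if r <= start]
        winner = min(ready, key=lambda n: info[n][0])
        schedule.append((start, winner))   # enable exactly one line
        bus_free_at = start + frame_time   # losers are delayed
        next_release[winner] += info[winner][1]
    return schedule
```

With two hypothetical signals released at the same instant, for instance `can_manager([("yaw_rate", 1, 10.0), ("steer_angle", 2, 10.0)], 1.0, 12.0)`, the lower-priority `steer_angle` packet is sent one frame time after each `yaw_rate` packet, which is the delay jitter the model is meant to capture.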

4. Fault Injection Strategy

The reliability evaluation approach we propose is based on a Fault Injection System able to execute simulation campaigns and analyze the effects of faults on vehicle performance. Specifically, we want to analyze the effects of transient faults in the electronic components, and in particular in the CAN network, on the overall behavior of the vehicle during the execution of a critical maneuver. To evaluate the vehicle response, we define some performance indicators that describe the quality of the resulting vehicle trajectory. Fault injection aims at computing these performance indicators in the presence of faults: the measured deviation of the indicators from the nominal value gives a quantitative indication of vehicle performance loss. In some cases, performance loss is so high that the vehicle loses stability. The outcome of the experiments is therefore a classification of possible faults according to their system-level implications in terms of vehicle performance and driver safety. In particular, by analyzing the characteristics of the faults causing major performance loss, we may define intervention thresholds to be adopted by ECUs, and we may design alternative backup strategies to regain vehicle control.

The fault injection system architecture is shown in Figure 7. The vehicle model described in the previous section is wrapped in a set of Matlab functions that define the faults to be injected and that control the execution of the simulation experiments. Further details about the implementation of the Network Model and the techniques for injecting faults can be found in [4]. The simulation model is also instrumented with a Threshold Manager block, which constantly monitors performance indicators.

Figure 7: Fault injection system architecture (blocks: Maneuver, Fault list, Fault Manager, Vehicle, Network, Threshold Manager, Threshold Values, Waveforms, Performance Indexes, Performance Analysis, Results)

4.1. Performance Indicators

For each possible maneuver, and according to the type of injected faults, some particular waveforms can be used as indicators of the performance of the vehicle. In particular, we define some specific points on these waveforms (usually maximum or minimum values or inter-peak distances), called performance indicators. By evaluating the performance indicators and comparing them with the nominal (fault-free) case, we quantify the degradation of the vehicle's dynamic performance.


Figure 8: Performance indicators

Figure 8 outlines the performance indicators relevant to the step-steer maneuver. The adopted indexes are:
1. The maximum value of vehicle yaw rate.
2. The maximum value of vehicle sideslip angle.
3. The minimum value of vehicle sideslip angle.
4. The maximum value of vehicle lateral acceleration.

Increasing maximum values (1, 2, 4) or decreasing minimum ones (3) means reducing the vehicle stability during the maneuver.
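As a minimal sketch of how the four indicators and their deviation from the nominal case could be computed from sampled waveforms, consider the Python fragment below. The function and key names are hypothetical, and expressing the deviation as a percentage of the absolute nominal value is an assumption (it requires a non-zero nominal value); the paper does not give its exact formula.

```python
def performance_indicators(yaw_rate, sideslip, lat_acc):
    """Extract the four step-steer indicators from sampled waveforms."""
    return {
        "max_yaw_rate": max(yaw_rate),
        "max_sideslip": max(sideslip),
        "min_sideslip": min(sideslip),
        "max_lat_acc": max(lat_acc),
    }

def percent_deviation(faulty, nominal):
    """Percentage deviation of each indicator from its fault-free value.

    Assumes every nominal value is non-zero.
    """
    return {
        key: 100.0 * (faulty[key] - nominal[key]) / abs(nominal[key])
        for key in nominal
    }
```

With this sign convention a positive deviation of a maximum, or a negative deviation of the minimum sideslip angle, corresponds to reduced stability, in line with the remark above.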

4.2. Fault Model

When working at the system level, the most significant faults to analyze are the ones affecting the integration of different subsystems, i.e., the ones visible on the communication network. In fact, while component-level or subsystem-level faults are usually best dealt with at lower abstraction levels, and reliability problems can be solved by component or subsystem suppliers, some vehicle-level safety implications appear only when the entire system is considered. Industrial experience confirms this fact, as unexpected issues are often detected only after the integration phase. In particular, modeling faults on the communication network may account for different physical defects or malfunctioning conditions:
• Disconnection: when an ECU is disconnected from the bus due to unstable physical connections, the signals it computes cannot be received by other ECUs, and all receivers use the previous values as input. This case, when transient, corresponds to a burst loss.
• Packet loss: when a transient fault (such as a bit-flip) invalidates the CRC code, the receivers discard the packet and use the previous correct value as reference.
• Collisions: when a collision occurs, the ECU with lower priority must wait to retransmit the packet; this means that the ECU signal is delayed, thus introducing a certain amount of delay jitter.
• Hardware failure: when a hardware component or sensor fails, its effect can be modeled by the suppression of the relevant information from the network, or by the generation of “error” or “invalid data” packets.

In particular, the network model and fault injection system are able to model a transient selective packet suppression fault, i.e., the suppression of a subset of packet types (the subsets are selected according to the actual vehicle cabling and architecture) for a given time interval, ranging from a single packet to a set of consecutive ones. The parameters of this fault model (subset of signals, start time, duration) can be tuned to model all the cited physical faults.
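The transient selective packet suppression fault can be illustrated with a short Python sketch for a single signal, in which the receiver keeps the last correctly received value while packets are suppressed. The function name and the packet representation as (time, value) pairs are assumptions made for the example, not part of the authors' model.

```python
def apply_suppression(packets, start, duration):
    """Return the (time, value) pairs seen by a receiver of one signal.

    packets: list of (time, value) sent on the network, in time order.
    Packets with start <= time < start + duration are suppressed, and the
    receiver holds the previous correct value (None if none arrived yet).
    """
    received = []
    last_good = None
    for time, value in packets:
        suppressed = start <= time < start + duration
        if not suppressed:
            last_good = value
        received.append((time, last_good))
    return received
```

For instance, with packets sent every 10 ms and a fault starting at 10 ms and lasting 20 ms, the two packets at 10 ms and 20 ms are dropped and the receiver keeps using the value received at 0 ms until the packet at 30 ms arrives. Selecting which signals pass through this function models the subset of suppressed packet types.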

4.3. Threshold Manager

To evaluate the simulation results, the research presented in this paper uses an on-line threshold manager [5]. This is a Simulink™ block that monitors the vehicle dynamic performance indicators during the simulations and activates flags when thresholds are violated. Two kinds of signal thresholds are implemented:
• Warning thresholds: a warning flag is raised when a signal exceeds a warning level. A warning condition means that the vehicle is in a critical state in terms of stability, but the active systems succeed in keeping the vehicle stable.
• Error thresholds: when a signal exceeds the error threshold, the vehicle irremediably loses stability. System safety requires that no fault ever causes an error threshold to be exceeded.

When an error threshold is activated, the simulation is immediately stopped. At the end of each simulation, the statistics about active warning or error indicators are stored.

5. Experimental results

To show the advantages and limitations of the proposed reliability evaluation methodology and the effectiveness of the prototypical fault injection system, we report a simple case study based on the vehicle model described in Section 3. The scenario of the implemented case study is the two-branched automotive network shown in Figure 9. The first branch (B1) contains the VDC ECU, while the second (B2) links the yaw rate sensor and the steer angle sensor. We set up a fault injection campaign to evaluate the effects of faults affecting the connector C1, whose disconnection blocks the yaw rate and the steer angle information, which are necessary inputs for the VDC strategy. We generated a fault list by varying the start time of faults from 1.0s (the beginning of the steering maneuver) to 2.3s (the end of the transient response) in steps of 0.01s, and the duration from 0.01s to 0.1s in steps of 0.01s, generating 1310 faults. The fault injection time for this fault population corresponds to about 2 days on a consumer PC.

Figure 9: Fault injection scenario (branch B1: ECU 1, …, VDC; connector C1; branch B2: yaw sensor, steer sensor)
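The size of the fault list follows directly from the two sweeps: 131 start times (1.00 s to 2.30 s in 0.01 s steps) times 10 durations (0.01 s to 0.10 s) give 1310 faults. A minimal Python sketch of such a list is shown below; the function name and the (start, duration) tuple layout are illustrative assumptions, not the authors' Matlab implementation.

```python
def build_fault_list():
    """Enumerate (start_time, duration) pairs in seconds for the campaign."""
    # Counting in hundredths of a second avoids floating-point drift
    # that repeated additions of 0.01 would accumulate.
    starts = [s / 100 for s in range(100, 231)]   # 1.00 s ... 2.30 s
    durations = [d / 100 for d in range(1, 11)]   # 0.01 s ... 0.10 s
    return [(start, duration) for start in starts for duration in durations]
```

At roughly 2 days for 1310 simulations, each fault costs on the order of two minutes of simulation time, which is why the grid resolution directly drives the cost of a campaign.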


Figure 10: Maximum peak of yaw rate (indicator no. 1)

The charts shown in Figure 10 represent the trend of the performance loss computed for the maximum peak of yaw rate (performance indicator no. 1), where the x-axis is the start time of the fault, the y-axis is the fault duration and the z-axis is the percentage performance deviation from the fault-free simulation. For industrial confidentiality reasons, numerical performance values and threshold values are omitted. The charts show an overall 3D view (first) as well as projections on the start time-performance loss (second) and duration-performance loss (third) planes. Such diagrams give precise and detailed information about vehicle behavior and trajectory subject to the injected faults. However, for a large number of faults and a large number of maneuvers and performance indicators, more synthetic information is recommended. Such synthetic information may be given by statistics about warning and error threshold activations.

Table 1 reports the number of errors and warnings that occurred during the fault injection campaign. Every single simulation can activate one or more Warning Thresholds, but these activations do not stop the simulation because the behavior of the vehicle is still acceptable. The activation of a single Error Threshold stops the simulation, since the vehicle behavior is definitively compromised.

Table 1: Statistics of fault injection campaign
Indicator | Warnings | Errors
MAX Yaw_rate | 90 | 25
MAX Sideslip angle | 57 | 42
MIN Sideslip angle | 4 | 0
MAX Lateral acceleration | 0 | 0

Table 2 shows the number of error flag activations, the number of simultaneous warning flag activations and the number of correct executions. The error activations are a subset of the warning activations: the activation of an error threshold involves (for the same performance indicator) the activation of the warning threshold, but a single simulation can activate one or more warning thresholds from different performance indicators without exceeding error thresholds.

Table 2: Statistics of errors/warnings occurrences
Correct Executions | 1 Warning | 2 Warnings | 3 Warnings | Errors
1163 | 143 | 4 | 0 | 67
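The kind of tally reported for warning and error activations can be sketched in Python as follows. Because the paper omits the actual threshold values, every number used with this code is hypothetical, as are the function names and the convention that a larger performance loss is worse; the early stop of the simulation on an error is not modeled here.

```python
from collections import Counter

def classify(loss, warning_level, error_level):
    """Return the indicators whose loss exceeds each kind of threshold.

    loss, warning_level, error_level: dicts keyed by indicator name, with
    error_level[k] >= warning_level[k] so that errors imply warnings.
    """
    warnings = sorted(k for k in loss if loss[k] > warning_level[k])
    errors = sorted(k for k in loss if loss[k] > error_level[k])
    return warnings, errors

def tally(campaign, warning_level, error_level):
    """Count simulations by number of warnings, and simulations with errors."""
    by_warning_count = Counter()
    error_runs = 0
    for loss in campaign:
        warnings, errors = classify(loss, warning_level, error_level)
        by_warning_count[len(warnings)] += 1
        error_runs += bool(errors)
    return by_warning_count, error_runs
```

A tally of this form is consistent with the figures of Table 2: the counts for zero, one, two and three warnings add up to 1163 + 143 + 4 + 0 = 1310, the size of the fault list, while the 67 error runs are counted among the runs with warnings.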


By analyzing the error occurrences, and in particular by identifying the common characteristics (location, time, duration) of the fault that causes these errors, vehicle designers may infer safety specifications that should be implemented as part of the on-line diagnostic strategies of the ECUs. When such situations are detected, the involved ECUs should switch to alternative control strategies (not using the faulty signals) to maintain control of the vehicle.
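One simple way to turn campaign results into a component-level specification is to look for the longest fault duration that never produced an error, which could serve as a candidate intervention threshold for the on-line diagnostics. The Python sketch below illustrates the idea under stated assumptions: the function name and the result layout (start time, duration, error flag) are hypothetical, durations are treated as the only relevant characteristic, and the data in the example are invented, not the paper's results.

```python
def tolerated_duration(results):
    """Longest tested duration below every duration that caused an error.

    results: list of (start_time, duration, caused_error) tuples.
    Returns None when there are no results or when even the shortest
    tested duration caused an error.
    """
    if not results:
        return None
    durations = sorted({duration for _, duration, _ in results})
    unsafe = [duration for _, duration, error in results if error]
    if not unsafe:
        return durations[-1]
    limit = min(unsafe)
    safe = [duration for duration in durations if duration < limit]
    return safe[-1] if safe else None
```

For example, if faults of 30 ms or more caused an error for at least one start time while all faults of 10 ms and 20 ms were tolerated, the function returns 20 ms; an ECU could then switch to its alternative strategy when a signal has been missing for longer than that.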

6. Conclusions

In this paper we described a methodology to evaluate fault effects in terms of vehicle performance loss. We designed and used a complete functional vehicle model composed of mechanical and electronic blocks and a model of the vehicle network based on the CAN protocol. To implement this functional vehicle model, we adopted the typical automotive modeling tool, Matlab™/Simulink™, which is able to describe both the vehicle physical equations and the electronic components. To evaluate the vehicle dynamic performance loss caused by network faults, we resorted to a simulation-based fault injection approach. To quantify the performance degradation, we defined a set of performance indicators and thresholds that measure the deviation in vehicle behavior. The presented approach allows system engineers to effectively link component-level defects with system-level effects, giving designers new tools to design and optimize future-generation automotive systems. For example, this methodology can be used by vehicle designers to define safety specifications and to design recovery procedures. Currently, we are working towards optimizing simulation times and integrating the fault injection environment with hardware-in-the-loop simulators, which should guarantee faster and more accurate computations.

7. References

[1] K. Abouda et al., “FPGA implementation using Renoir tools: application for bit timing logic (BTL) synthesis of controller area network with 100% free error”, International Conference on Microelectronics, 1998, pp. 29-32
[2] R. K. Iyer and D. Tang, “Experimental Analysis of Computer System Dependability”, in Fault-Tolerant Computer System Design, D. K. Pradhan (ed.), 1996, Prentice Hall
[3] Y. Yu and B. W. Johnson, “Fault Injection Techniques”, in Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation, A. Benso and P. Prinetto (ed.), 2003, Kluwer Academic Publisher
[4] F. Corno, P. Gabrielli, S. Tosato, “System level Analysis of Fault Effect in an Automotive Environment”, 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI System, 2003, pp. 529-536
[5] F. Corno, P. Gabrielli, S. Tosato, “Relating vehicle-level and network-level reliability through high-level fault injection”, IEEE International High Level Design Validation and Test Workshop, 2003, pp. 71-76
[6] M. Velardocchia, A. Sorniotti, “Vehicle Dynamics Control (VDC) and Active Roll Control (ARC) Integration to Improve Handling and Comfort”, Proc. of International Conference on Vehicle and Systems, 2002
[7] Z. Segall, D. Vrsalovic, D. Siewiorek, D. Yaskin, J. Kownacki, J. Barton, R. Dancey, A. Robinson, T. Lin, “FIAT - Fault Injection based Automated Testing environment”, Proc. 18th IEEE Int. Symposium on Fault-Tolerant Computing, 1988, pp. 102-107
[8] G. A. Kanawati, N. A. Kanawati, J. A. Abraham, “FERRARI: A tool for the validation of system dependability properties”, Proc. 22nd IEEE Int. Symposium on Fault-Tolerant Computing, 1992, pp. 336-344
[9] K. G. Shin, “HARTS: a distributed real-time architecture”, IEEE Computer 24(5), 1991, pp. 25-35
[10] R. Hexel, “FITS - A Fault Injection Architecture for Time-Triggered Systems”, Proc. Twenty-Sixth Australian Computer Science Conference, Research and Practice in Information Technology, Vol. 16, 2003, pp. 333-338
[11] E. Jenn, J. Arlat, M. Rimen, J. Ohlsson, J. Karlsson, “Fault Injection into VHDL Models: the MEFISTO Tool”, Proc. IEEE FTCS-24, 1994, pp. 66-75
[12] A. Ademaj, “A Methodology for Dependability Evaluation of the Time-Triggered Architecture Using Software Implemented Fault Injection”, Proc. Fourth European Dependable Computing Conference, EDCC-4, October 2002
[13] A. Ademaj et al., “Evaluation of Fault Handling of the Time-Triggered Architecture with Bus and Star Topology”, Proc. IEEE International Conference on Dependable Systems and Networks, 2003, pp. 123-132
[14] T. Ringler, J. Stainer, “Increasing System Safety for by-wire Application in Vehicle by using a Time Triggered Architecture”, Proc. SAFECOMP, 17th Int. Conf. on Computer Safety, Reliability and Security, 1998
[15] J. Perez, M. Sonza Reorda, M. Violante, “Accurate Dependability Analysis of CAN-based Networked Systems”, 16th Symposium on Integrated Circuits and System Design, 2003, pp. 337-342
[16] CAN in Automation (CiA) at http://www.cancia.de/can/
[17] Bosch’s Controller Area Network Homepage at http://www.can.bosch.com/
[18] The TTP protocol at http://www.ttagroup.org/ttp/easy_to_read.htm
