Fault Diagnosis in Automotive Electric Power Generation and Storage ...


Fault Diagnosis in Automotive Electric Power Generation and Storage System (EPGS)

Anuradha Kodali, Yilu Zhang, Senior Member, IEEE, Chaitanya Sankavaram, Member, IEEE, Krishna Pattipati, Fellow, IEEE, and Mutasim Salman, Senior Member, IEEE

Abstract—In this paper, we present an initial study to develop fault detection and isolation (FDI) techniques for vehicle systems that are controlled by a network of electronic control units (ECUs). The root causes of the faults include hardware components such as actuators, software within the controllers (ECUs), and the interactions between hardware and software, i.e., between controllers and plants. Faults that originate from these interactions, especially between hardware and software, are particularly challenging to address. Our basic strategy is to divide the fault universe of the cyber-physical system in a hierarchical manner and to monitor the critical variables/signals that have an impact at different levels of interaction. A diagnostic matrix is established to represent the relationship between the faults and the test outcomes (also known as fault signatures). A factorial hidden Markov model (FHMM) based inference algorithm, termed dynamic multiple fault diagnosis (DMFD), is used to infer the root causes from the observed test outcomes. The proposed diagnostic strategy is validated on an electrical power generation and storage system (EPGS) controlled by two ECUs in a CANoe/MATLAB co-simulation environment. Eleven faults are injected, with the failures originating in actuator hardware, sensor, controller hardware, and software components (sensor faults are not considered in this paper). The simulation results show that the proposed diagnostic strategy is effective in addressing the interaction-caused faults.

Index Terms—Cyber-physical systems, automotive systems, electrical power generation and storage system, electronic control units, CANoe, fault diagnosis, dynamic multiple fault diagnosis, hierarchical tests, fault modeling, test design, sympathetic faults.

The study reported in this paper was supported by General Motors. Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsor. Anuradha Kodali, Chaitanya Sankavaram, and Krishna Pattipati are with the Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-2157, USA (phone: 860-486-2890; fax: 860-486-5585; email: [email protected], [email protected]). Yilu Zhang and Mutasim Salman are with GM, 30500 Mound Road, Warren, MI, USA.

I. INTRODUCTION

The automobile is one of the most widely distributed cyber-physical systems. Indeed, today's high-end vehicles already contain more than 70 distributed microcontrollers (also termed electronic control units (ECUs)) with 100 megabytes of code and 5 or more distinct communication networks, and they exchange several thousand data and control signals in real time every second. In the future, this level of complexity is expected to migrate to all ranges of vehicles. The ECUs in modern vehicles perform a variety of cyber-physical functions, including stability control, remote monitoring (e.g., via OnStar), energy-efficient propulsion (e.g., battery charge control), navigation with real-time traffic, adaptive cruise control, by-wire steering and braking, keyless entry with push button start, side blind zone detection, lane departure warning, and autonomous driving, to name a few [1]. Approximately 80-90% of these vehicle innovations are based on software-embedded systems, and this has resulted in an increase in the number of interactions (coupling) among heterogeneous subsystems. Thus, in order to ensure vehicle performance, one has to be cognizant of the complex interactions among faults in physical devices (e.g., sensors, batteries, motors) [2]-[5] and software algorithms running on embedded processors [6][7], as well as the sensitivity of electrical systems to power quality problems [8].

In this paper, we develop a hierarchical strategy to diagnose faults in an Electric Power Generation and Storage (EPGS) system [9] controlled by two electronic control units (ECUs). The EPGS system provides electrical power in an automotive vehicle to meet the electrical load requirements [10]. To do this, the alternator generates voltage according to the set-point computed by the ECUs. The set-point is computed based on the load requirement and the estimated state of the battery (the secondary power source). Together, these systems form a networked embedded platform in which the EPGS interacts with the two ECUs, viz., the engine control module (ECM) and the body control module (BCM). The two ECUs communicate through messages (containing one or more signals) via the CANbus. In order to isolate equipment faults, we simulated the EPGS using CANoe/SIMULINK co-simulation, where each system is modeled independently in SIMULINK and the communication is enabled through the CANbus in CANoe [11][12].
The SIMULINK models are compiled using the CANoe target file, and the corresponding “.dll” files are uploaded to the CANoe environment. This environment allowed us to inject faults, evaluate the operational and faulty behavior of the system by monitoring the residuals, and infer the failed components.

The faults in the interacting EPGS-ECU system range from issues that affect a single subsystem (either hardware or software) to issues that occur only as a result of interaction among multiple subsystems, e.g., interactions between the plant (EPGS) and the controllers (ECUs) via a communication channel (CANbus). These interaction-related faults may be HW/HW, SW/SW, SW/HW, or HW/SW cause-and-effect faults (i.e., the hardware error causing the software error, or the software error resulting from a hardware error) or symptomatic faults (i.e., both the hardware error and the software error are symptoms of each other) [13]-[16]. When a fault in one system is symptomatic of a fault in another, it is important to isolate the root cause to avoid having to replace multiple components. For example, when the set-point voltage is mis-calibrated due to a software bug/error in the ECU, the hardware (alternator) may provide insufficient power to the loads. Here, the diagnosis should be directed to the software alone. These anomalous interactions between the hardware and software components make manual identification of the root cause cumbersome. In addition, the communication channel can introduce faults by losing packets due to heavy traffic; these manifest as signal delays. The symptoms of communication faults are similar to those of hardware and software faults. The check routines required to isolate these faults are not implemented in this paper and will be pursued in our future research.

We discuss the following steps of our diagnostic strategy in this paper (shown in Fig. 1). First, we identify the potential failure modes associated with software, hardware, and HW/SW interfaces in the EPGS-ECM-BCM system. Second, the monitoring mechanisms are designed to detect the anomalies. Then, the functional dependencies between the failure modes and tests are generated via CANoe/MATLAB simulations. Finally, the dynamic multiple fault diagnosis (DMFD) algorithm [19] is applied to the simulated data to isolate the faults based on the test outcomes observed over time; these are gleaned from the fault injection experiments.

The paper is organized as follows. In Section II, we present a brief description of our target cyber-physical system, comprising the EPGS, ECM, and BCM. The experimental setup is provided in Section III. The faults are injected via simulations, and their relationship with the tests is defined via the CANoe/SIMULINK environment. Results from computational experiments are presented in Section IV to evaluate the performance of the inference algorithm (the primal-dual optimization framework for DMFD in [19]), and we conclude the paper in Section V with a summary. This paper primarily focuses on setting up the experimental platform to study hardware-software interaction faults.

Fig. 1. Flowchart of the design process: system modeling (functional block diagrams for fault configurations); decompose vehicle failure sources (network → automotive functions → components); list failure causes/modes (failure rates, fault states and dynamics); design tests (trouble codes, symptoms, features, onboard tests/messages, troubleshooting tests, ...); fault dictionary/D-matrix (cause-effect model); inference algorithms (isolation based on observed test outcomes).

Fig. 2. Illustration of the automotive EPGS system (engine, belt, alternator, battery, electric loads, ECM, BCM).

The fault diagnostic system, usually defined by the fault-test relationships via a diagnostic matrix [17], needs to be established for the complex interactions between the EPGS and the ECUs. While tests for independently diagnosing hardware faults (electrical/mechanical) and software faults (design bugs/errors, invalid function calls, and program checks) have been discussed extensively in the literature, extra tests need to be incorporated for the diagnosis of interaction-related faults. These tests need to capture the failing symptoms at each level of interaction by analyzing the corresponding input and output, so that the root cause alone is correctly isolated. For example, the symptoms of an erroneous set-point voltage are similar regardless of whether the corruption originates in the BCM, the ECM, or the EPGS. So, tests should be designed at all these interaction levels for proper isolation. Finally, the fault-test dependency models are used to infer incipient problems in these systems [17][18].
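As a minimal sketch of how a diagnostic matrix relates faults to test outcomes, the Python snippet below stores a small binary D-matrix and returns the faults whose signatures match an observed set of failed tests. The fault names, test names, matrix entries, and the helper `candidate_faults` are all hypothetical and chosen only to illustrate the set-point example above; they are not the D-matrix used in this study.

```python
# Hypothetical D-matrix: rows are faults, columns are tests. A 1 means the
# test is expected to fail when the fault is present. Illustrative values only.
TESTS = ["plant_test", "setpoint_test_upstream", "setpoint_test_downstream"]
D_MATRIX = {
    "plant_fault": [1, 0, 0],
    # Corruption at the upstream controller propagates, so both tests fail.
    "upstream_setpoint_fault": [0, 1, 1],
    # Corruption at the downstream controller is seen only by the later test.
    "downstream_setpoint_fault": [0, 0, 1],
}


def candidate_faults(failed_tests):
    """Return the faults whose expected signature equals the observed failures."""
    observed = [1 if test in failed_tests else 0 for test in TESTS]
    return [fault for fault, signature in D_MATRIX.items() if signature == observed]


if __name__ == "__main__":
    print(candidate_faults({"setpoint_test_downstream"}))
    print(candidate_faults({"setpoint_test_upstream", "setpoint_test_downstream"}))
```

Exact signature matching is the simplest possible inference rule. It breaks down when tests have missed detections or false alarms, which is one reason a probabilistic inference algorithm such as DMFD is used in this paper instead.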

II. SYSTEM DESCRIPTION

The EPGS system illustrated in Fig. 2 includes the EPGS plant, the ECM, and the BCM. The EPGS plant comprises a drive belt, an alternator, and a battery, which together provide the necessary power to the electrical loads of the vehicle, such as lights and fans. The drive belt connects the engine to the alternator, which converts mechanical energy into electrical energy. The alternator assembly contains not only the brush synchronous machine, but also internal circuits, such as the output rectifier and the voltage regulator, and electric and control connection ports, such as the L-terminal. The L-terminal transfers the control signal of the set-point voltage from the ECM to the alternator. In addition, when the alternator self-diagnostic function detects a fault, the L-terminal is pulled to a low voltage in order to signal the fault to the ECM. The alternator supplies power to the electrical loads and charges the battery.

The battery acts as a secondary power source when the primary source, viz., the alternator, is unavailable or when the load power exceeds the power generated by the alternator. In particular, during ignition off, the battery provides power to crank the engine and sustains the quiescent loads, such as the antitheft system. During ignition on, the battery stabilizes the system voltage and assists the alternator when excessive loads, such as electric power steering, come up.

The ECM and the BCM are the control modules that monitor the working condition of the EPGS system and manage its operation. In particular, the BCM estimates the battery status, such as the state of charge (SOC), based on battery temperature, voltage, and current. In turn, it performs the Regulated Voltage Control (RVC) function to determine the alternator set-point voltage, and it requests engine idle speed boosting and electrical load shedding when necessary. The RVC control signals are transferred through the vehicle CANbus to the ECM, which controls the alternator through its L-terminal.

Fig. 3. Experimental set-up with CANoe/SIMULINK co-simulation. Diagram labels: EPGS system (plant model), electronic control module (ECM), body control module (BCM), physical connection, CAN bus; signals: set-point voltage, battery voltage, battery current, engine temperature; Message 1 (Signal: engine temperature); Message 2 (Signal 1: set-point voltage, Signal 2: fuel mode); tests: sensor tests, SOC test, temperature range test, set-point test, broken cable test, belt test, regulator test, voltage test, current test (low and high).

Fig. 4. The hierarchical diagnostic test scheme. Diagram labels: system; hardware (plant) faults, controller faults, communication faults; ECU faults (ECU 1, ..., ECU k, ..., ECU m); circuit faults, software logic faults (bugs, errors); component-level diagnosis; exact root cause.

III. EXPERIMENTAL METHODOLOGY

The experimental platform is based on CANoe/SIMULINK co-simulation of the system and is shown in Fig. 3. The EPGS, ECM, and BCM are modeled independently in SIMULINK, and the compiled models are then uploaded into CANoe. The BCM node takes the sensor inputs, such as battery voltage and current, from the EPGS plant node, implements the RVC control logic, and sends the control signals (set-point voltage and fuel mode) to the ECM node via CANbus messages (Message 2 in Fig. 3) in CANoe. The messages are carriers of packed signal information. The ECM node collects and transmits sensor information, such as engine temperature and engine RPM, to the BCM node, receives the messages containing RVC control signals from the BCM, and sends control signals to the EPGS plant node. The tests should be implemented in the ECUs, but for convenience and clarity, they are shown at the relevant locations in Fig. 3.

Conventionally, the diagnostic schemes for individual hardware or software faults are designed in isolation, i.e., they are locally optimized. For example, in [20], a model-based method using a machine learning approach is applied to detect hardware faults alone in a power inverter system. Another approach, using bond graphs, is implemented to diagnose hardware faults in an electro-mechanical test bench [21]. Similarly, Carrozza et al. proposed software fault diagnosis methods for system recovery [22]. Here, the software faults are caused by crash, hang, and/or workload failures. However, these unconnected strategies for hardware and software may not always be adequate when multiple systems (hardware and software) interact with each other. In [23], Yi et al. discussed hardware and software interaction faults in a temperature control system. Here, the hardware architecture operates incorrectly due to a problem in the software, or vice versa. Because of the closed-loop structure, establishing the primary root cause becomes difficult when the symptoms are physically observed elsewhere. Similarly, for the example in our paper, the ECU faults may lead to a shutdown of the EPGS system, while the symptoms may be very similar to those of the EPGS plant faults.

In order to accurately isolate the root cause, we developed a strategy that monitors both global variables, which have system-level impact, and local variables, which are mainly used to isolate the root causes within a sub-system. Here, the set-point voltage is one of the global variables; it is transmitted from one sub-system to the next (BCM → ECM → EPGS). Monitoring this variable at different layers of its transmission enables bifurcation at the sub-system level, i.e., the EPGS plant, the BCM, or the ECM. Similarly, monitoring the battery voltage (a local variable) at the plant level facilitates diagnosis of a sensor or battery problem. This consideration led us to a 4-level diagnostic test scheme (Fig. 4), wherein the first level infers whether the fault is a plant fault or an ECU/communication fault. This is done by monitoring the global variable, i.e., the set-point voltage, at the ECM and the EPGS system.
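The idea of monitoring a global variable along its transmission path can be sketched as follows. The Python function below range-checks the set-point voltage observed at each stage in transmission order and reports the first stage at which the check fails. The stage list, the voltage limits, and the function name `first_failing_stage` are assumptions made for illustration; the paper does not prescribe this particular check.

```python
from typing import Optional, Sequence, Tuple

# Transmission order of the set-point voltage (BCM -> ECM -> EPGS).
STAGES = ("BCM", "ECM", "EPGS")


def first_failing_stage(
    readings: Sequence[float],
    valid_range: Tuple[float, float] = (12.0, 15.5),  # assumed limits, in volts
) -> Optional[str]:
    """Return the first stage whose set-point reading is out of range.

    readings[i] is the set-point voltage observed at STAGES[i]. A corrupted
    value propagates downstream, so the earliest failing stage is the most
    upstream location consistent with the observations.
    """
    low, high = valid_range
    for stage, value in zip(STAGES, readings):
        if not (low <= value <= high):
            return stage
    return None  # every stage is in range: no set-point fault indicated


if __name__ == "__main__":
    print(first_failing_stage([13.5, 17.0, 17.0]))
```

With a corrupted value first appearing at the ECM, as in the usage line above, the BCM reading passes while the ECM and EPGS readings fail, which points to the ECM or the channel feeding it rather than to the plant.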
Plant hardware tests additionally determine whether the fault originated in the ECM/BCM or in the EPGS. The second-level tests monitor the variables needed to distinguish faults between the two ECUs and the communication channel. In order to distinguish communication faults from ECU faults, the messages transmitted via the CANbus are monitored. The input and output of the channel are tested for any lost or corrupted signals within the message packets. This is done by examining messages 1 and 2, as shown in Fig. 3, before transmission and after reception at the BCM and the ECM sub-systems. Next, to isolate the faults between the ECUs, we monitor the set-point voltage again at the BCM, in addition to the test performed at the ECM earlier. Similarly, tests monitoring the engine temperature at the BCM and the ECM are used to identify faults originating from either of the two systems. Thus, second-level diagnosis enables one to determine whether the fault originated in the EPGS, and/or the ECM, and/or the BCM.

Further down, the third-level tests deal with the inputs and outputs of the individual components within each system to diagnose faults at that level. This also includes isolating the circuit and software faults in the ECUs. Here, we can implement the individual hardware and software diagnostic strategies discussed in the literature. At the lowest level, we isolate the failure modes at the sub-component level. In this paper, we have implemented tests up to level 3, which enables fault isolation between the plant hardware and multiple ECUs, among some plant hardware components, and among some ECU software faults. Thus, the isolation level achieved by this hierarchical test set-up depends on the tests designed. Our process involves the following four steps.

A. Enumerate Failure Modes

In this step, the failure modes for HW, SW, and interface faults are identified based on the existing EPGS and ECU models and the operational information. The list of faults is enumerated in Table I. The faults s1-s3 and s6-s8 are the component-level hardware plant (level-3) faults. The faults s9 and s10 are the global faults, i.e., they impact the entire system. Both of them are related to the set-point voltage. It is important to distinguish whether the fault occurs due to a miscalculation in the BCM, a mis-communication between the ECM and the BCM, or an ECM sensor fault. Further, if the outputs of the EPGS plant, viz., the battery voltage and current sensors, are wrong, then the fault originating in the EPGS plant propagates to the ECUs.

TABLE I
LIST OF FAULTS

Fault number | Fault
s1 | Broken cable
s2 | Belt slip
s3 | Voltage regulator
s4 | Temp. out of range: temp. fault due to ECM/sensor
s5 | Temp. out of range: temp. fault due to BCM software/communication
s6 | Battery current sensor: high
s7 | Battery current sensor: low
s8 | Battery voltage sensor: high or low
s9 | Set-point voltage out of range: BCM software/error in the logic
s10 | Set-point voltage out of range: ECM software/communication fault
s11 | Initial SOC estimation fault

TABLE II
LIST OF TESTS

Test number | Test
T1 | Broken cable: battery current < -20 A for 10 seconds
T2 | Belt slip: estimated pulley ratio < 3
T3 | Voltage regulator: |Battery voltage - Set-point voltage| < 1 V
T4.1 | Temp. range test: Temp. ∉ [-30, 40 C], ECM sensor test
T4.2 | Temp. range test: Temp. ∉ [-30, 40 C], BCM sensor test
T5.1 | Current range test: Current ∉ [-60, 100 A], too low
T5.2 | Current range test: Current ∉ [-60, 100 A], too high
T6.1 | Voltage range test: Voltage ∉ [9, 16 V], too low
T6.2 | Voltage range test: Voltage ∉ [9, 16 V], too high
T7.1 | Set-point voltage test (footnote 1): test in BCM
T7.2 | Set-point voltage test (footnote 1): test in ECM
T8 (footnote 2) | SOC range test: Estimated SOC ∉ [40, 100%]
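The range checks of Table II translate almost directly into code. The sketch below evaluates the temperature, current, voltage, and SOC range tests on one sample and the broken cable test (T1) on a current time series. The thresholds are the simulation values quoted in Table II; the `Sample` container, the function names, the 1 s sampling period, and the reading of "for 10 seconds" as ten consecutive seconds are assumptions. The two temperature tests (T4.1 and T4.2) are collapsed into one check for brevity.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    battery_current_a: float
    battery_voltage_v: float
    temperature_c: float
    soc_percent: float


def out_of_range(value, low, high):
    """True when value lies outside the closed interval [low, high]."""
    return not (low <= value <= high)


def run_range_tests(s: Sample) -> dict:
    """Evaluate the range tests of Table II; True means the test failed."""
    return {
        "T4 temperature": out_of_range(s.temperature_c, -30.0, 40.0),
        "T5.1 current too low": s.battery_current_a < -60.0,
        "T5.2 current too high": s.battery_current_a > 100.0,
        "T6.1 voltage too low": s.battery_voltage_v < 9.0,
        "T6.2 voltage too high": s.battery_voltage_v > 16.0,
        "T8 SOC": out_of_range(s.soc_percent, 40.0, 100.0),
    }


def broken_cable_test(currents_a, sample_period_s=1.0):
    """T1: fail if battery current stays below -20 A for 10 consecutive seconds.

    The sampling period and the 'consecutive' interpretation are assumptions.
    """
    needed = int(round(10.0 / sample_period_s))
    run = 0
    for current in currents_a:
        run = run + 1 if current < -20.0 else 0
        if run >= needed:
            return True
    return False
```

As noted in the test design discussion, these limits are simulation values and would need calibration for a real vehicle.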
The set-point voltage is therefore very important for distinguishing faults at level 1. Adequate tests have to be designed at each system level to diagnose the root cause(s). The faults s4 and s5 are temperature-related faults. The engine intake temperature is read by the ECM via its sensor and communicated to the BCM, where it is subsequently used to estimate the battery temperature and to calculate the set-point. The root causes of these two faults originate in different ECUs. Thus, if we can isolate these faults, we can make a diagnosis at level 2 of our four-level diagnosis hierarchy. The fault s11 is a software logic fault, wherein the initial SOC estimate is incorrect. This fault, in turn, may lead to incorrect charging control. For example, if the initial SOC is estimated to be much higher than the true value, RVC will command the alternator not to charge the battery. Over time, the actual battery SOC will run down to a level at which it cannot start the vehicle. The challenge in detecting this fault is that its symptom exhibits a long delay, possibly over multiple ignition cycles.

B. Test Design

As discussed above, tests are required to detect faults at different levels of our diagnostic schema. The detection logic for the various tests is listed in Table II. Note that the values used in the logic are for simulation purposes; they should be calibrated for real systems. Low-severity degradation faults are not considered here. Tests T1-T3 and T5-T6 are the hardware component-level tests, whereas T8 is the test implemented to detect software faults in the ECUs (level 3). Test T7 detects faults at levels 1 and 2, whereas T4 distinguishes between the root causes originating in the ECM and the BCM (level 3). The level 2 tests used to distinguish between ECU and communication faults are not designed (this is restricted by the off-line performance of the SIMULINK compiled models). Also, we limit ourselves to identifying faulty ECUs, rather than isolating the specific failure modes (circuit faults, software errors, or bugs) within the ECUs.

C. Generate Fault-test Dependency Models using Experimental Set-up

To generate the fault-test dependency matrix (D-matrix) [17], we need to inject the faults via fault simulations and [...]

[...] economy mode & Vset > 13 V) || (Normal mode & Vset [...]

[...]

Thus, the primal problem in terms of the new equality constraint can be written as

\[ \max_{X^K, Y^K} J(X, Y) = \max_{X^K, Y^K} \sum_{k=1}^{K} f_k(\underline{x}(k), \underline{x}(k-1), \underline{y}(k)) \tag{5} \]

where $f_k(\underline{x}(k), \underline{x}(k-1), \underline{y}(k))$ is defined as

\[ f_k(\underline{x}(k), \underline{x}(k-1), \underline{y}(k)) = \sum_{o_j \in O_f(k)} \ln(1 - y_j(k)) + \sum_{o_j \in O_p(k)} \sum_{i=1}^{m} c_{ij} x_i(k) + \sum_{i=1}^{m} \mu_i(k) x_i(k) + \sum_{i=1}^{m} \sigma_i(k) x_i(k-1) + \sum_{i=1}^{m} h_i(k) x_i(k) x_i(k-1) + \gamma(k) + g(k) \tag{6} \]

where

\[ c_{ij} = \ln\left(\frac{1 - Pd_{ij}}{1 - Pf_{ij}}\right), \quad \eta_j = \sum_{i=1}^{m} \ln(1 - Pf_{ij}), \quad \gamma(k) = \sum_{o_j \in O_p(k)} \eta_j, \]
\[ \mu_i(k) = \ln\left(\frac{Pa_i(k)}{1 - Pa_i(k)}\right), \quad \sigma_i(k) = \ln\left(\frac{Pv_i(k)}{1 - Pa_i(k)}\right), \]
\[ h_i(k) = \ln\left(\frac{(1 - Pa_i(k))(1 - Pv_i(k))}{Pa_i(k) Pv_i(k)}\right), \quad g(k) = \sum_{i=1}^{m} \ln(1 - Pa_i(k)) \tag{7} \]

[...] $\Lambda = \{\lambda_j(k) \geq 0, k \in (1, K), j \in O_f(k)\}$ is the set of Lagrange multipliers. In (8), the Lagrange multipliers $\lambda_j(k)$ are non-negative despite the equality constraints in (4) because the variables $y_j(k)$ need to be non-negative. The dual of the primal DMFD problem discussed above can be written as

\[ \min_{\Lambda} Q(\Lambda) \quad \text{subject to } \Lambda = \{\lambda_j(k) \geq 0, k \in (1, K), j \in O_f(k)\} \tag{9} \]

where the dual function $Q(\Lambda)$ is defined by

\[ Q(\Lambda) = \max_{X^K, Y^K} L(X, Y, \Lambda) \tag{10} \]

Simplifying further by rearranging and combining the terms, we obtain the dual function as

\[ Q(\Lambda) = \max_{X^K} \sum_{k=1}^{K} \left\{ \sum_{i=1}^{m} \xi_i(x_i(k), x_i(k-1), \lambda_j(k)) + w_k(\Lambda) \right\} \tag{11} \]

where

\[ \xi_i(x_i(k), x_i(k-1), \lambda_j(k)) = \left[ \sum_{o_j \in O_p(k)} c_{ij} + \mu_i(k) - \sum_{o_j \in O_f(k)} c_{ij} \lambda_j(k) \right] x_i(k) + \sigma_i(k) x_i(k-1) + h_i(k) x_i(k) x_i(k-1) \tag{12} \]

and

\[ w_k(\Lambda) = \gamma(k) + g(k) + \sum_{o_j \in O_f(k)} \left[ \lambda_j(k) \ln \lambda_j(k) - (1 + \lambda_j(k)) \ln(1 + \lambda_j(k)) - \lambda_j(k) \eta_j \right] \tag{13} \]

In all the equations, we can replace $Pd_{ij}$ and $Pf_{ij}$ with $Pd_j$ and $Pf_j$, where

\[ Pd_{ij} = e_{ij} Pd_j + (1 - e_{ij}) Pf_j, \quad Pf_{ij} = e_{ij} Pf_j + (1 - e_{ij}) Pd_j \tag{14} \]
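The coefficient definitions in (7) can be checked numerically. The sketch below computes c_ij and eta_j from detection and false-alarm probabilities, and mu_i(k), sigma_i(k), and h_i(k) from Pa_i(k) and Pv_i(k). It assumes that Pa_i(k) and Pv_i(k) are the fault appearance and disappearance probabilities of a two-state Markov chain, as in [19]; that reading is an assumption here because the definitions are not restated in this section, and the function names are hypothetical.

```python
import math


def outcome_coefficients(pd_ij, pf_ij):
    """Return (c, eta) as defined in (7).

    pd_ij[i][j] and pf_ij[i][j] are the detection and false-alarm
    probabilities for fault i and test j.
    """
    m, n = len(pd_ij), len(pd_ij[0])
    c = [
        [math.log((1 - pd_ij[i][j]) / (1 - pf_ij[i][j])) for j in range(n)]
        for i in range(m)
    ]
    eta = [sum(math.log(1 - pf_ij[i][j]) for i in range(m)) for j in range(n)]
    return c, eta


def dynamics_coefficients(pa, pv):
    """Return (mu, sigma, h) as defined in (7) for one fault at one epoch.

    Assumed meaning: pa = P(fault appears), pv = P(fault vanishes).
    """
    mu = math.log(pa / (1 - pa))
    sigma = math.log(pv / (1 - pa))
    h = math.log((1 - pa) * (1 - pv) / (pa * pv))
    return mu, sigma, h
```

Under that assumption, ln(1 - pa) + mu*x(k) + sigma*x(k-1) + h*x(k)*x(k-1) reproduces the log transition probability for each of the four combinations of x(k-1) and x(k), which is a convenient sanity check on the signs in (7).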

The dual problem posed in equations (9)-(11) decomposes the original DMFD problem into m separable subproblems, one for each fault state sequence. Given a fixed set of Lagrange multipliers Λ = {λj(k), k ∈ (1, K), j ∈ Of(k)}, we optimize the function ξi in (12) to obtain the optimal state sequence xi* for each fault state, using the two-level coordinated solution framework for the DMFD problem. The Viterbi algorithm [26] solves each of the subproblems with computational complexity O(K), and a subgradient method is used to update the Lagrange multipliers. A detailed description of the dynamic multiple fault diagnosis algorithms may be found in our previous work (see formulation 1 in [19]).
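To illustrate why each subproblem costs O(K), the sketch below runs a two-state Viterbi recursion for a single fault. It maximizes the sum over k of a[k]*x(k) + sigma[k]*x(k-1) + h[k]*x(k)*x(k-1), where a[k] stands for the bracketed coefficient of x_i(k) in (12). The function name, the argument layout, and the assumed initial state x0 = 0 are illustrative choices, not the implementation of [19]; the subgradient update of the multipliers is not shown.

```python
def viterbi_binary(a, sigma, h, x0=0):
    """Best binary sequence x(1..K) for one fault, found by dynamic programming.

    a[k], sigma[k], h[k] are the per-epoch coefficients; x0 is the assumed
    state before the first epoch. Requires K >= 1. Runs in O(K) time because
    every epoch has only two states.
    """
    K = len(a)
    score = {x0: 0.0}  # best score of any path ending in each state
    back = []          # back[k][x] = best predecessor of state x at epoch k
    for k in range(K):
        new_score, pointers = {}, {}
        for x in (0, 1):
            best_prev, best_val = None, float("-inf")
            for prev, s in score.items():
                val = s + a[k] * x + sigma[k] * prev + h[k] * x * prev
                if val > best_val:
                    best_prev, best_val = prev, val
            new_score[x], pointers[x] = best_val, best_prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state to recover the whole sequence.
    state = max(score, key=score.get)
    best = score[state]
    path = [state]
    for k in range(K - 1, 0, -1):
        state = back[k][state]
        path.append(state)
    path.reverse()
    return path, best
```

In a full solver, this routine would be called once per fault inside each multiplier update, so one dual iteration costs O(mK).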

IV. SIMULATIONS AND RESULTS

In our simulations, the fault scenarios are generated via the injection method, considering only single-fault occurrences. The nominal operating conditions are shown in Fig. 5; each fault is injected during a driving cycle adapted from the US FTP-75 drive cycle. The experimental CANoe/MATLAB set-up is used to simulate the fault scenarios, as discussed in the previous section. The following fault scenarios are used to generate the data for each fault:

1) Case 1: Permanent fault injected at 1 s
2) Case 2: Permanent fault injected at 201 s
3) Case 3: Permanent fault injected at 501 s
4) Case 4: Permanent fault injected at 801 s
5) Case 5: Intermittent fault injected at 201 s and ending at 500 s
6) Case 6: Intermittent fault injected at 901 s and ending at 1200 s

Thus, each fault is simulated for 6 runs, each of 1315 seconds duration. This resulted in 61 cases (only 1 case was simulated for s10). After injecting the fault, the test outcomes are collected and saved as a MAT-file for subsequent inference using the DMFD algorithm.

Fig. 5. Sample data from EPGS system simulation: (a) engine speed; (b) current of alternator (pink), battery (yellow), and electric load (blue); (c) voltage of alternator (pink), battery (yellow), and alternator set-point (blue).

As an example, case 5 is shown in Fig. 6 and Fig. 7 for faults s9 and s10, where the fault is injected between 201 s and 500 s. These faults originate in the BCM and the ECM, respectively, and affect the set-point voltage. Therefore, the erroneous set-point voltage, irrespective of its origin, impacts the performance (current and voltage) of the alternator and battery similarly (assuming the same severity for both faults), as shown in Figs. 6 and 7. These figures are obtained in MATLAB. The currents and voltages of the alternator and battery depart from their nominal values when the fault is injected (a steep u-shape with a sudden dip at 201 s and a rise at 500 s in Figs. 6 and 7). The load current and the alternator maximum current are given for reference in Fig. 6.

Fig. 6. Alternator and battery currents when faults s9 and s10 are injected (Pink: alternator maximum current, Red: alternator current, Green: load current, and Blue: battery current). X-axis and y-axis range from 0 to 1250 and -20 to 140, respectively.

Fig. 7. Alternator and battery voltages when faults s9 and s10 are injected (Pink: alternator voltage, and Blue: battery voltage). X-axis and y-axis range from 0 to 1250 and 11 to 16, respectively.

The tests are implemented in CANoe, and the results are plotted in Figs. 8 and 9 for both faults (only the affected tests are shown; all the other tests passed). We can see that the fault s9 fails both tests T7.1 (BCM test) and T7.2 (ECM test) in Fig. 8 because of the communication between the ECM and the BCM seen in Fig. 3, whereas s10 fails test T7.2 (ECM test) alone in Fig. 9, as there is no backward correspondence. Other tests, such as T1 (cable test) and T3 (regulator test), also fail occasionally; these failures are false alarms and are dealt with in the inference process.

TABLE V
PERFORMANCE METRICS OF THE SIMULATED MODEL

Fault | Average correct isolation rate (%) | Average false isolation rate (%)
s1 | 99.7 ± 0.3 | 0.02 ± 0.03
s2 | 97.4 ± 6.4 | 0.4 ± 1
s3 | 91.8 ± 6.4 | 0.005 ± 0.004
s4 | 93.3 ± 12.7 | 1.2 ± 1.4
s5 | 91.6 ± 13 | 1.4 ± 1.5
s6 | 100 | 0.006 ± 0.003
s7 | 100 | 0.005 ± 0.003
s8 | 100 | 0.006 ± 0.003
s9 | 100 | 1.1 ± 2.2
s10 | 99.97 ± 0.04 | 0
s11 | 100 (footnote 3) | 0

Footnote 3: The fault is detected only with a drive cycle delay.

The following metrics were employed to evaluate the performance of the inference process over all the simulation cases for all the faults [19]:

8

monitored at each stage to infer the root cause of faults. The diagnostic scheme is hierarchical; the higher levels isolate faults to different subsystems and the lower levels provide finer diagnosis to failure modes. The proposed scheme has been validated on an automotive subsystem through CANoe/MATLAB co-simulation. Presently, diagnostics at the subsystem level is deemed adequate. In our future research, this scheme would be exercised with vehicle level systems, where multiple networks interact. Also, we will include extensive communication faults and component-level hardware/software faults in our analysis by conducting our fault injection experiments in a hardwarein-the-loop setting.

BCM test

1

ECM test

0 1

Cable test

0 1

0

Regulator test

1

0 200

Fig. 8.

300

400

500

800

1000

1200

Test results in CANoe when fault s9 is injected

R EFERENCES

0 1

[1] N. Boules, “Reinventing the automobile: The cyber-physical challenge,” Embedded Systems to Cyber-Physical Systems: A Review of the Stateof-the-Art and Research Needs Workshop, St. Louis, MO, April 2008. [2] J. Luo, and K. R. Pattipati, “An integrated diagnostic development process for automotive engine control systems,” IEEE Transactions on Systems, Man, and Cybernetics - Part C, Vol. 37, No. 6, pp. 1163-1173, November 2007. [3] J. Luo, M. Namburu, K. R. Pattipati, L. Qiao, and S. Chigusa, “Integrated model-based and data-driven diagnosis of automotive anti-lock braking systems,” IEEE Systems, Man, and Cybernetics – Part A, Vol. 40, No. 2, pp. 321-336, November 2009. [4] J. Luo, K. R. Pattipati, L. Qiao, and S. Chigusa, “Model-based prognostic techniques applied to a suspension system,” IEEE Systems, Man, and Cybernetics – Part C, Vol. 38, No. 5, pp. 1156-1168, September 2008. [5] B. Pattipati, K. R. Pattipati, J. P. Christopherson, M. Namburu, D. V. Prokhorov, and L. Qiao, “Automotive battery management systems,” IEEE Conference on AUTOTESTCON, Salt Lake City, Utah, September 2008. [6] V. N. Malepati, H. Li, K. R. Pattipati, S. Deb, and A. Patterson-Hine, “Verification and validation of high integrity software generated by automatic code generators,” IEEE International Conference on Systems, Man, and Cybernetics, Vol. 3, pp. 3004-3009, October 1998. [7] A. Sung, and B. Choi, “An interaction testing technique between hardware and software in embedded systems,” Proceedings of the 9th Asia-Pacific Software Engineering Conference, 2002. [8] E. Nelson, and H. Huang, “A software and system modeling facility for vehicle environment interactions,” Model-Driven Development of Reliable Automotive Services, Lecture notes in Computer Science, SpringerVerlag Berlin, Heidelberg, 2008. [9] W. Lee, D. Choi, and M. Sunwoo, “Modeling and simulation of vehicle electric power system,” Journal of Power Sources, pp. 58-66, 2002. [10] Y. Zhang, S. Rajagopalan, and M. 
Salman, “A practical approach for belt slip detection in an automotive electrical power generation and storage system,” Proceedings of IEEE Aerospace Conference, Big Sky, Montana, March 2010. [11] Development of distributed systems and ECU test - datasheet for CANoe, www.vector.com. [12] K. Chaaban, and P. Leserf, “Simulation of a steer-by-wire system using FlexRay-based ECU network,” Advances in Computational Tools for Engineering Applications, pp. 21-26, July 2009. [13] Guidelines for failure modes and effects analysis for automotive, aerospace and general manufacturing industries, Dyadem Press, Published by CRC Press, 2003. [14] W. T. Becker, and R. J. Shipley, “ASM Hanbook: Failure Analysis and Prevention,” Published by ASM International, 2002. [15] M. Trapp, B. Schurmann, and T. Tetteroo, “Failure behavior analysis for reliable distributed embedded systems,” Proceedings of the International Parallel and Distributed Processing Symposium, 2002. [16] R. K. Iyer, and P. Velardi, “Hardware-related software errors: measurement and analysis,” IEEE Transactions on Software Engineering, Vol. 11, No. 2, pp. 223-231, February 1985. [17] J. Luo, H. Tu, K. R. Pattipati, L. Qiao, and S. Chigusa, “Diagnosis knowledge representation and inference,” IEEE Instrumentation and Measurement Magazine, Vol. 9, No. 4, pp. 45-52, August 2006. [18] C. Zhou, R. Kumar, and S. Jiang, “Hierarchical fault detection in embedded control software,” IEEE International Computer Software and Applications Conference, August 2008.

Fig. 9. Test results in CANoe when fault s10 is injected. Panels: Cable test, ECM test, Regulator test (test outcomes 0 or 1).

Average correct isolation rate (CI):

$$CI = \frac{1}{K} \sum_{k=1}^{K} \frac{|\hat{x}(k) \cap r(k)|}{|r(k)|} \qquad (15)$$

Average false isolation rate (FI):

$$FI = \frac{1}{K} \sum_{k=1}^{K} \frac{|\hat{x}(k) \cap \overline{r(k)}|}{|S| - |r(k)|} \qquad (16)$$
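As a minimal sketch of how the two rates in (15) and (16) might be computed, assume the inferred and true fault sets are available per epoch as Python sets. The function name `isolation_rates` and its arguments are hypothetical and do not come from the paper. Two assumptions are made here: the numerator of (16) is read as the faults declared by the algorithm that are not truly present, and epochs in which a denominator would be zero are skipped when averaging.

```python
from typing import Sequence, Set, Tuple


def isolation_rates(
    inferred: Sequence[Set[str]],
    truth: Sequence[Set[str]],
    all_faults: Set[str],
) -> Tuple[float, float]:
    """Return (CI, FI) averaged over epochs, following (15) and (16).

    inferred[k] is the fault set output by the algorithm at epoch k,
    truth[k] is the true fault set at epoch k, all_faults is the set S.
    Assumption: epochs with a zero denominator are skipped.
    """
    ci_terms = []
    fi_terms = []
    for x_hat, r in zip(inferred, truth):
        if r:
            # Fraction of the true faults that were correctly isolated.
            ci_terms.append(len(x_hat & r) / len(r))
        absent = all_faults - r
        if absent:
            # Fraction of the absent faults that were wrongly declared.
            fi_terms.append(len(x_hat & absent) / len(absent))

    def mean(terms):
        return sum(terms) / len(terms) if terms else float("nan")

    return mean(ci_terms), mean(fi_terms)


# Hypothetical three-epoch example with four candidate faults.
S = {"s1", "s2", "s3", "s4"}
x_hat_seq = [{"s1"}, {"s1", "s2"}, set()]
r_seq = [{"s1"}, {"s1"}, {"s3"}]
ci, fi = isolation_rates(x_hat_seq, r_seq, S)
print(f"CI = {ci:.3f}, FI = {fi:.3f}")
```

In this toy example the third epoch misses the true fault entirely, which lowers CI, while the extra fault declared in the second epoch is the only contribution to FI.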

In (15) and (16), $\hat{x}(k)$ is the fault state set output by the algorithm at epoch k, r(k) is the true fault state set at epoch k, and S is the set of all faults. Using the D-matrix in Table III and the reliabilities in Table IV, the DMFD algorithm inferred the faults with an average isolation accuracy of 97.4%. Fault-specific accuracies are given in Table V along with the false isolation rate. The voltage regulator fault s3 is detected by only one test, T3 (T8 fires with a very long delay). This test also has a low detection probability, so the corresponding accuracy is lower. The origin of the faulty global variable, the setpoint voltage, is isolated accurately even with superfluous symptoms (see the accuracies associated with s9 and s10). The software logic fault s11 can be detected with 100% accuracy only in the second drive cycle.

V. SUMMARY AND FUTURE WORK

Due to the coupling of the plants and the ECUs in the automotive control system, the effects of faults in one subsystem propagate to other subsystems. Diagnosing such a system requires new domain knowledge beyond the established diagnosis architecture, which addresses hardware and software individually. In this research, we have developed a multi-level scheme to diagnose hardware, software, and hardware/software interaction faults, in which the system is


[19] S. Singh, A. Kodali, K. Choi, K. R. Pattipati, S. M. Namburu, S. Chigusa, D. V. Prokhorov, and L. Qiao, “Dynamic multiple fault diagnosis: mathematical formulations and solution techniques,” IEEE Transactions on Systems, Man, and Cybernetics - Part A, Vol. 39, No. 1, pp. 160-176, January 2009.
[20] Y. L. Murphey, M. A. Masrur, Z. Chen, and B. Zhang, “Model-based fault diagnosis in electric drives using machine learning,” IEEE/ASME Transactions on Mechatronics, Vol. 11, No. 3, pp. 290-303, June 2006.
[21] M. A. Djeziri, R. Merzouki, B. O. Bouamama, and G. Dauphin-Tanguy, “Robust fault diagnosis by using bond graph approach,” IEEE/ASME Transactions on Mechatronics, Vol. 12, No. 6, pp. 599-611, December 2007.
[22] G. Carrozza, and R. Natella, “A recovery-oriented approach for software fault diagnosis in complex critical systems,” International Journal of Adaptive, Resilient and Autonomic Systems, Vol. 2, No. 1, pp. 77-104, January-March 2011.
[23] Z. Yi, X. Mu, L. Zhang, and X. Zhang, “Interactive software and hardware faults diagnosis based on negative selection algorithm,” IEEE International Conference on Networking, Sensing, and Control, pp. 433-437, April 2008.
[24] Z. Ghahramani, and M. I. Jordan, “Factorial hidden Markov models,” Machine Learning, Kluwer Academic Publishers, Boston, 1997.
[25] D. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, MA, USA, 2nd Edition, 2003.
[26] D. Forney Jr., “The Viterbi algorithm,” Proceedings of IEEE, Vol. 61, pp. 268-273, March 1973.

Anuradha Kodali is presently a Ph.D. student in Electrical and Computer Engineering at the University of Connecticut. She received her B.E. in Electronics and Communications Engineering from Andhra University, India, in 2006. Her research interests include data mining, pattern recognition, fault detection and diagnosis, and optimization theory.

Yilu Zhang (M’02-SM’08) received his B.S. and M.S. degrees in electrical engineering from Zhejiang University, China, in 1994 and 1997, respectively, and his Ph.D. degree in computer science from Michigan State University, East Lansing, Michigan, in 2002. He joined the General Motors Global R&D Center at Warren, Michigan, in 2002, and currently holds the position of Staff Researcher. His research interests include statistical pattern recognition, machine learning, signal processing, and their applications, including integrated vehicle health management and human-machine interactions. Dr. Zhang served as an Associate Editor of the International Journal of Humanoid Robotics from 2003 to 2007, the Publication Chair for the IEEE 8th International Conference on Development and Learning 2009, and the Chair of the Battery Management System Workshop held in conjunction with the PHM Society Annual Conference 2011. He has published 40+ refereed technical papers. He has seven US patents and 20+ pending patent applications, most of which are in the area of vehicle diagnosis and prognosis. Dr. Zhang is a two-time recipient (2008 and 2010) of the “Boss” Kettering Award, the highest technology award in General Motors, for his contributions to vehicle diagnostics technologies.

Chaitanya Sankavaram received her B.Tech. degree in Electrical and Electronics Engineering from Sri Venkateswara University (SVU), Tirupathi, India, in 2005. She then joined Wipro Technologies as a Project Engineer and worked for two years in Bangalore, India. She is currently working towards a Ph.D. degree in Electrical and Computer Engineering at the University of Connecticut, Storrs. Her research interests include cyber-physical systems, fault diagnosis and prognosis, reliability analysis, data mining, pattern recognition, and optimization theory.

Krishna Pattipati (S’77-M’80-SM’91-F’95) is a Professor of Electrical and Computer Engineering at the University of Connecticut, Storrs, CT, USA. He received the B.Tech. degree in Electrical Engineering with highest honors from the Indian Institute of Technology, Kharagpur, in 1975, and the M.S. and Ph.D. degrees in Systems Engineering from the University of Connecticut in 1977 and 1980, respectively. From 1980 to 1986 he was employed by ALPHATECH, Inc., Burlington, MA. Since 1986, he has been with the University of Connecticut, where he is currently the UTC Professor of Systems Engineering in the Department of Electrical and Computer Engineering. He has served as a consultant to Alphatech, Inc., Aptima, Inc., and IBM Research and Development. He is a cofounder of Qualtech Systems, Inc., a small business specializing in intelligent diagnostic software tools. His current research interests are in the areas of agile planning in dynamic and uncertain environments, diagnosis and prognosis techniques for complex system monitoring, and predictive analytics for threat detection. Dr. Pattipati has published over 400 articles, primarily in the application of systems theory and optimization (continuous and discrete) techniques to large-scale systems. Dr. Pattipati was selected by the IEEE Systems, Man, and Cybernetics Society as the Outstanding Young Engineer of 1984, and received the Centennial Key to the Future award. He was elected a Fellow of the IEEE in 1995 for his contributions to discrete-optimization algorithms for large-scale systems and team decision making. Dr. Pattipati served as the Editor-in-Chief of the IEEE Transactions on SMC: Part B - Cybernetics during 1998-2001, as Vice-President for Technical Activities of the IEEE SMC Society (1998-1999), and as Vice-President for Conferences and Meetings of the IEEE SMC Society (2000-2001). He was a co-recipient of the Andrew P. Sage award for the Best SMC Transactions Paper for 1999, the Barry Carlton award for the Best AES Transactions Paper for 2000, the 2002 and 2008 NASA Space Act Awards for A Comprehensive Toolset for Model-based Health Monitoring and Diagnosis, the 2003 AAUP Research Excellence Award, and the 2005 School of Engineering Teaching Excellence Award at the University of Connecticut. He also won the best technical paper awards at the 1985, 1990, 1994, 2002, 2004, 2005 and 2011 IEEE AUTOTEST Conferences, and at the 1997 and 2004 Command and Control Conferences.


Mutasim Salman (SM’95) is a Lab Group Manager and a Technical Fellow in the Electrical, Controls and Integration Lab of the GM Research and Development Center. He is responsible for the development and validation of algorithms for state of health monitoring, diagnosis, prognosis and fault tolerant control of vehicle critical systems. Mutasim pioneered the work on integrated chassis control in the late eighties that led to the production of GM Industry First Stabilitrak1 and then to Stabilitrak3. He has extensive experience in hybrid vehicle modeling, control and energy management strategies. He has received several GM awards, including 4 prestigious GM Boss Kettering, 3 McCuen, and 2 President and Chairman Awards. Mutasim received his bachelor's degree in Electrical Engineering from the University of Texas at Austin, and his M.S. and Ph.D. in Electrical Engineering with specialization in systems and control from the University of Illinois at Urbana-Champaign. He also has an Executive MBA. He holds 34 patents, 7 of which have already been used in products, and has coauthored more than 55 refereed technical publications and a book. He joined the GM R&D staff in 1984.