
An Overview of Intelligent Fault Detection in Systems and Structures

K. Worden1,* and J. M. Dulieu-Barton2

1 Department of Mechanical Engineering, University of Sheffield, Mappin Street, Sheffield S1 3JD
2 Department of Ship Science, School of Engineering Sciences, University of Southampton, Highfield, Southampton SO17 1BJ

This paper describes a coherent strategy for intelligent fault detection. All of the features of the strategy are discussed in detail. These encompass: (i) a taxonomy for the relevant concepts, i.e. a precise definition of what constitutes a fault etc.; (ii) a specification for operational evaluation which makes use of a hierarchical damage identification scheme; (iii) an approach to sensor prescription and optimisation; and (iv) a data processing methodology based on a data fusion model.

Keywords

intelligent fault detection · operational evaluation · sensor optimisation · sensor fusion

1 Introduction

Engineering structures and systems are designed to operate within limits specified by the environment in which they will be used. At the design stage, the loading of the structure is defined and appropriate material choices are made based on their properties. In many cases prototypes are built and stringent experimental testing is carried out. Increasingly, engineers rely on modelling techniques to obtain knowledge of a structure's behaviour in the operating environment. Both the experimental and numerical approaches contain approximations and assumptions. A discussion of these assumptions could be the topic of an entire paper; here it is sufficient to say that a true picture of the behaviour of a structure is not available until it is in-service. In many cases the solution to the uncertain loading conditions is to overdesign by implementing very conservative factors of safety.

This approach produces structures that are heavy and costly, and because of deterioration over time the structure may still sustain damage. In situations where weight is critical, marginal factors of safety are used and, as a result of unpredicted loads, the structure may fail or, more often, damage will be initiated. For this type of structure, a routine inspection schedule is devised so that any damage can be repaired; however, this introduces a further cost. Therefore a means of monitoring and assessing structures and systems must be devised in order to ensure the safe operation of the structure. If reliable intelligent monitoring systems could be devised, then structures could be designed to operate at the margin of safety without extended periods of inspection and, in the case of components that are overdesigned, material costs would be significantly reduced. The type of health monitoring described above is in a sense 'safety critical', i.e. safe life is assured by monitoring rather than by margins.

*Author to whom correspondence should be addressed.

Copyright © 2004 Sage Publications, Vol 3(1): 85–98 [1475-9217 (200403) 3:1;85–98 10.1177/1475921704041866]


Such systems are currently prohibited from incorporating 'intelligence' by regulatory bodies (such as the UK Civil Aviation Authority (CAA) and the US Federal Aviation Administration (FAA)). It is not a simple matter to overcome such objections, but the potential benefits of doing so justify the effort. Having said this, there are substantial benefits to be obtained from monitoring which addresses non-critical failures, where monitoring is used to convert unplanned to planned maintenance. This application falls outside the scope of the safety-critical regulations, which could greatly accelerate the acceptance of intelligent systems. In devising an intelligent fault detection system the primary consideration is an unambiguous definition of damage as a prerequisite to providing a unified approach to damage evaluation across all the engineering disciplines. This requires a taxonomy for the relevant concepts, i.e. a precise definition of what constitutes a fault, damage and a defect. A specification for operational evaluation is an essential feature; the proposed approach makes use of a hierarchical damage identification scheme. An approach to sensor prescription and optimisation must also be defined; in the present paper issues such as placement and validation are covered. Finally a data processing methodology is required; in the current approach this is based on a data fusion model. Damage evaluation is relevant to all the engineering disciplines and throughout the paper examples will be cited that illustrate this. However, there are four key multidisciplinary areas for which monitoring and assessing damage are principal concerns:

- Structural Health Monitoring (SHM)
- Condition Monitoring (CM)
- Non-Destructive Evaluation (NDE)
- Statistical Process Control (SPC)

Structural Health Monitoring is relevant to structures such as aircraft and buildings and implies a sensor network that monitors the behaviour of the structure on-line. Typical sensors are optical fibres, electrical resistance strain gauges or acoustic devices. Condition Monitoring (CM) is relevant to rotating and reciprocating machinery, such as that used in manufacturing.

CM also uses on-line techniques that are often vibration-based and use accelerometers as sensors. Non-Destructive Evaluation (NDE) is usually carried out off-line after the damage has been located using on-line sensors. (There are exceptions to this rule; NDE is used as a monitoring tool for, for example, pressure vessels and rails.) NDE is therefore primarily used for characterisation and as a severity check when there is a priori knowledge of the location of the damage. Typical techniques include ultrasound, thermography and shearography. Statistical Process Control (SPC) is process-based rather than structure-based and uses a variety of sensors to monitor changes in the process. At first glance it may seem that the four areas of SHM, CM, NDE and SPC are entirely different because of their application ranges. In fact there are many similarities that should be explored together if a unified approach to fault detection is to be created. All of the techniques used in SHM, CM, NDE and SPC use a sensor that provides a signal that is dependent on the behaviour of the structure or system. If damage evaluation is to evolve beyond the simple threshold approach (i.e. the signal has exceeded a certain limit and therefore the structure must be taken off-line, the damage located and repaired), then techniques for processing the signal to derive features that enable quantification must be established. In doing this, the degree to which a signal processing regime is intelligent must be defined. Rytter [21] has done this to a certain extent. In Section 3 of the paper, Rytter's approach is summarised and then modified to cover the four contexts of SHM, CM, NDE and SPC. Sensor issues are discussed in Section 4 and data processing procedures are covered in Section 5. The purpose of this paper is to set down all the issues involved in progressing to intelligent fault detection and to suggest an approach. The authors believe that a holistic approach is required for intelligent fault detection. The structure and the sensors must be treated as one, i.e. at the design stage provision should be made for fault detection. The holistic approach can also be implemented on existing structures, so the issues discussed in the paper should not be regarded as only relevant to new structures.


The basis of the holistic approach is that not only should the structure be monitored but so should the sensors; hence dealing with sensor failure is a primary concern.

2 Taxonomy

All materials and hence all structures contain defects at the nano/microstructural level; e.g. in Fibre Reinforced Polymers (FRPs) defects also occur at the macrostructural level because of voids produced in manufacturing. Broberg [4] provides an excellent account of the inception and growth of microcracks and voids in the process region of a material. As all materials contain defects, the difficulty is to decide when a structure is 'damaged'. In any strength of materials text one can find properties of certain grades of materials, virtually always provided as a range of values. The engineer will design a structure using failure criteria based on material property values from the lowest end of the range. The reason that materials have a range of properties is slight compositional and process variability, resulting in slightly different microstructures and possibly varying numbers or shapes of inclusions, voids and other defects. This type of 'fault' can simply be designed out and therefore, in the context of this paper, should not be considered as damage. For the structure in-service, the most common form of damage evolution will be under dynamic load. Here, damage evolves from microcracks and results in changes in the material properties. Other damage mechanisms include: deformation damage through creep, particularly at elevated temperatures, corrosion damage and overloads. In the past, engineers designed ideal structures, i.e. they made the assumption that at all points through the life cycle the structure was damage free. With the pressure to introduce new lightweight structures this concept is no longer valid and damage tolerant [6,19] structures must be designed in order to provide adequate performance throughout the life of the product. The basis of this paper is to define when a structure is merely damaged and when a structure contains a fault. The following establishes what the authors consider to be coherent definitions of faults, damage and defects:


- A fault is when the structure can no longer operate satisfactorily. If one defines the quality of a structure or system as its fitness for purpose, or its ability to meet customer or user requirements, it suffices to define a fault as a change in the system that produces an unacceptable reduction in quality.
- Damage is when the structure is no longer operating in its ideal condition but can still function satisfactorily, i.e. in a sub-optimal manner.
- A defect is inherent in the material; statistically, all materials will contain some unknown amount of defects. This means that the structure can operate at its design condition even if the constituent materials contain defects.

This taxonomy allows a hierarchical relationship to be developed: defects lead to damage and damage leads to faults. Using this idea it is possible to design a damage tolerant structure. In the proposed approach it is necessary to introduce monitoring systems in order to obtain a damage tolerant structure, so that it can be decided when the structure is no longer operating in a satisfactory manner. This means that a fault has to have a strict definition, e.g. the stiffness of the structure has deteriorated beyond a certain level. In some cases a simple definition based on one parameter may not be sufficient. A good example of this is when a crack is propagating in a stable manner: there will be an increase in strain in the component and hence a reduction in component stiffness. Once the strain has increased above a certain level, a decision may be made to take the component out-of-service. On inspection it may be that the crack is growing in such a direction that the component will fail safe, so service could have been prolonged. Along with a strain monitor it would therefore be useful to monitor the direction of crack growth, perhaps using a thermal technique [35].
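To make the point concrete, the sketch below encodes such a two-parameter fault criterion; the stiffness threshold, the fail-safe angle test and the function itself are hypothetical illustrations, not values or logic taken from the paper.

```python
# A minimal sketch of a multi-parameter fault criterion, as discussed above.
# The thresholds and the fail-safe angle test are hypothetical illustrations.

def fault_decision(stiffness_ratio, crack_angle_deg):
    """Decide whether to flag a fault.

    stiffness_ratio: current stiffness / design stiffness (1.0 = undamaged).
    crack_angle_deg: estimated crack growth direction relative to the load
                     axis, e.g. from a thermal technique [35].
    """
    STIFFNESS_LIMIT = 0.90   # hypothetical: more than 10% loss of stiffness
    FAIL_SAFE_ANGLE = 45.0   # hypothetical: cracks growing away from the
                             # load path are assumed to fail safe

    if stiffness_ratio >= STIFFNESS_LIMIT:
        return "damage below fault threshold - continue monitoring"
    if crack_angle_deg > FAIL_SAFE_ANGLE:
        return "stiffness low, but crack direction fail-safe - prolong service"
    return "fault - take component out of service"

print(fault_decision(0.85, 60.0))
```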


With this in mind, it is clear that the choice of the fault monitoring system needs to take into account the material type and the operating environment. This approach ensures that the quality of the measurement will be optimised, and hence an overarching issue will be the limitations of the sensor. From the above it can be seen that the question of what constitutes a fault depends not only on the structure's operating environment but also on the type of monitoring system that is used. Another consideration is the level of damage tolerance required. Some systems may not require any damage tolerance. An example of this is the crumple zones in a car; these are designed to fail on impact in order that the energy is absorbed by the structure and not the driver and passengers. There would be little point in monitoring these areas of a car, and the crumple zones could be defined as areas with infinite fault tolerance. A good example of a damage tolerant system is a helicopter gearbox, where it has been standard practice to monitor gearboxes for faults. Threshold levels are set based on the vibration response of the gearbox and once these are exceeded, the gearbox is taken out-of-service. Here the fault monitoring system can be seen as an integral part of the gearbox, as without this system safe operation of the aircraft could not be achieved without many more costly maintenance intervals. Finally, a means of identifying procedural faults must be included in any intelligent fault monitoring system. If a structure is being incorrectly used this may cause a fault; however, it is important to distinguish between this type of fault and the more general systemic fault. An example would be a mode of operation that increases the load beyond the design load for a limited period of time. The increase in load does not cause immediate failure but significantly reduces the product life. A type of alarm could be used that triggers when the design loads are exceeded; this would alert the operator to the problem, prevent a repeat of the incident and hence extend the product life. An alternative monitoring philosophy is in current use in the aircraft industry: Health and Usage Monitoring Systems (HUMS) do not just consider overloads, but record or examine all loads and modify the remaining life estimate in the light of usage.

3 Intelligent Fault Detection

Having established a precise terminology characterising the sub-optimal behaviour of structures and systems, i.e. what constitutes a fault, the discussion can proceed to matters of detection and how it can be achieved with intelligence. The first observation one might make is that fault detection is in a sense trivial, as a fault is defined above as a change in the condition of the structure that produces an unacceptable reduction in quality. By implication, such a change will be evident. Thus, intelligent fault detection actually entails detecting the damage that will, if not corrected, lead to a fault. Detection of damage is a facet of the broader problem of damage awareness or damage identification. The objective of a monitoring system must be to accumulate sufficient information about the damage for appropriate remedial action to be taken to restore the structure or system to high-quality operation or at least to ensure safety. Also, efficiency demands that only the necessary information should be returned by the monitor. With this in mind, it is helpful to think of the identification problem as a hierarchical structure, in the same way as one can think of the evolution of the fault as a hierarchical structure. This train of thought began with Rytter in his PhD thesis [21]. The original specification cited four levels:

1. Detection: the method gives a qualitative indication that damage might be present in the structure.
2. Localisation: the method gives information about the probable position of the damage.
3. Assessment: the method gives an estimate of the extent of the damage.
4. Prediction: the method offers information about the safety of the structure, for example, an estimate of residual life.

The vertical structure is clear; each level requires that all lower-level information is available. Note that the damage identification scheme should, if at all possible, be implemented on-line, i.e. during operation of the structure; in this case prediction must also be understood as an estimate of the residual safe-life of the structure obtained during operation.


For an aircraft in flight, for example, this is critical. If the diagnostic system signals serious damage but fails to indicate that there is time to land, the aircraft may be lost needlessly and at great expense when the crew bail out. Note that the primary concern is that the crew do bail out; issues of life safety far outweigh economic considerations. Few would argue that the structure above summarises the main issues in damage identification (at least for SHM), with one major exception; this is remedied by the introduction of a new level. At the risk of repetition, the new structure is:

1. Detection: the method gives a qualitative indication that damage might be present in the structure.
2. Localisation: the method gives information about the probable position of the damage.
3. Classification: the method gives information about the type of damage.
4. Assessment: the method gives an estimate of the extent of the damage.
5. Prediction: the method offers information about the safety of the structure, for example, an estimate of residual life.

Classification is important, if not vital, for effective identification at level 5 and possibly at level 4. Level 5 is distinguished from the others in that it cannot be accomplished without an understanding of the physics of the damage, i.e. characterisation. Level 1 is also distinguished in the following sense: it can be accomplished with no prior knowledge of how the system will behave when damaged. In order to explain this, a slight digression on pattern recognition or machine learning is needed. Many modern approaches to damage identification are based on the idea of pattern recognition (PR). In the broadest sense, a PR algorithm is simply one that assigns to a sample of measured data a class label, usually from a finite set. In the case of damage identification, the measured data could be vibration modeshapes, full-field thermoelastic data, scattered wave profiles etc. The appropriate class labels would encode damage type, location etc.


In order to carry out the higher levels of identification using PR, it will almost certainly be necessary to construct examples of data corresponding to each class. That is, in order to establish that a given set of measurements from a composite panel shows the presence of a delamination, the algorithm must have prior knowledge of what data from a delaminated panel look like as opposed to data from a panel with, say, a resin-rich area. Each possible fault class should usually have a training set of measurement vectors that are associated uniquely with it. Many PR algorithms work by training a diagnostic. For example, a neural network can learn by example: it is shown the measurement data and asked to produce the correct class label; if the result differs from the desired label, the network is corrected. Typically many presentations of the data are required. This type of learning algorithm, in which the diagnostic is trained by showing it the desired label for each data set, is called supervised learning. If supervised learning is required, there will be serious demands associated with it; data from every conceivable damage situation should be available. The two possible sources of such data are computation or modelling and experiment. Modelling presents problems if the structure or system of interest is geometrically or materially complex; for example, Finite Element analysis of structures requiring a fine mesh can be extremely time-consuming even if the material is well understood. Structures with composite or viscoelastic elements may not even have accurate constitutive models. The damage itself may be difficult to model; it may also make the structure dynamically nonlinear, e.g. an opening-closing fatigue crack, and this also presents a formidable problem. Unfortunately, the situation is no better for experiment. In order to accumulate enough training data, it would be necessary to make copies of the system of interest and damage it in all the ways that might occur naturally; for high-value structures like aircraft, this is simply not possible. Fortunately there is an alternative to supervised learning: unsupervised learning. However, this mode of learning only applies to level 1 diagnostics, i.e. it can only be used for detection. The techniques are often referred to as novelty detection or anomaly detection methods [3,26,29].


The idea of novelty detection is that only training data from the normal operating condition of the structure or system are used to establish the diagnostics. A model of normal condition is created; later, during monitoring, newly acquired data are compared with the model. If there are any significant deviations, the algorithm indicates novelty. The implication is that the system has departed from normal condition, i.e. acquired damage. The advantage of such an approach is clear. If the training data are generated from a model, only the unfaulted condition is required and this simplifies matters considerably. From an experimental point of view, there is no need to damage the structure of interest. Although novelty detection is only a level 1 approach, there are many situations where this suffices, e.g. safety-critical systems where any fault would require the system to be taken out of service. It is noteworthy that novelty detection methods have a long history, dating back to the wheel-tapper's hammer and beyond. It is an important qualifier that the novelty detectors should flag only significant deviations from the normal operating condition. All real systems are subject to measurement noise and usually operate in a changing environment; the monitor must be able to distinguish between a statistical fluctuation in the data and a real deviation from normality. This means that of the various flavours of pattern recognition [22], the most appropriate one is Statistical Pattern Recognition (SPR). Another important observation is that there may be variations in the normal condition that are not statistical, i.e. the characteristics of the structure may vary with changing environmental conditions, and this must be addressed. In general, it is important that the algorithms used for damage identification should account properly for sources of uncertainty and variation in the data. The algorithms should also, as far as possible, return a confidence interval with their diagnosis. The term normal operating condition requires some discussion. As stated in the previous section, there will always be defects present in a structure to some extent.

The normal operating condition therefore means a state of the system in which there is some assurance, statistical or otherwise, that the system is fit-for-purpose. In some cases there may be macroscopic damage, e.g. a fatigue crack; however, if it is known that the crack will not grow under the standard loadings on the system, the state still qualifies as a normal operating condition. Novelty detection will then look for new cracks or unexpected growth of the old crack. The discussion above is intended to show that there is often a trade-off between the level of a diagnostic system and the expense of training it adequately. Given this fact, the main requirement of an intelligent fault detection system is that it should return information at the appropriate level for the context. It should measure the appropriate data and process them with the appropriate algorithm. It should take proper account of uncertainty in the data and return a confidence level with its diagnosis. This bold statement raises numerous issues that the following sections will attempt to address.
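To make the novelty detection idea concrete, the following is a minimal sketch of level 1 outlier analysis in the spirit of [30]: a statistical model of the normal condition (the mean and covariance of normal-condition feature vectors) is built, and new data are flagged when their squared Mahalanobis distance exceeds a threshold set from the training data. The feature dimension and the 99th-percentile threshold are illustrative assumptions, not prescriptions from the paper.

```python
# A minimal novelty detection sketch in the spirit of outlier analysis [30].
# Only normal-condition data are needed for training (unsupervised learning).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 500 feature vectors (e.g. condensed spectral
# lines) measured while the structure is known to be in normal condition.
X_normal = rng.normal(size=(500, 4))

mean = X_normal.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_normal, rowvar=False))

def mahalanobis_sq(x):
    """Squared Mahalanobis distance of a feature vector from normal condition."""
    d = x - mean
    return float(d @ cov_inv @ d)

# Threshold from the training data itself; the 99th percentile is an
# illustrative choice balancing false alarms against sensitivity.
threshold = np.percentile([mahalanobis_sq(x) for x in X_normal], 99)

x_new = rng.normal(size=4) + np.array([0.0, 5.0, 0.0, 0.0])  # simulated deviation
print("novelty!" if mahalanobis_sq(x_new) > threshold else "normal")
```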

4 Sensor Issues

4.1 Operational Evaluation

The first demand of an intelligent fault detection system is that it should measure the appropriate data. This simple statement hides a multitude of problems. The holistic approach to fault identification requires that the diagnostic system should be carefully designed with the objectives in mind. This means, at the very least, that decisions must be made about the type of sensors to be used and the placement of those sensors. Before this, a pre-planning or evaluation stage is desirable, like the operational evaluation stage defined by Farrar and Doebling [7]. This requires the architect of the monitoring system to first:


- provide economic and/or life safety justifications for performing the monitoring;
- define system-specific damage, including types of damage and expected locations, e.g. via Failure Modes Effects Analysis (FMEA) and Failure Modes Effects and Criticality Analysis (FMECA) [12];
- define the operational and environmental conditions under which the system functions;


- define the limitations on data acquisition in the operational environment.

Each of these requirements can substantially support the design process. The second of the four raises two important issues. First, as observed in the previous section, a supervised learning scheme for the diagnostics requires training data; in order to build a model or specify an experimental programme to generate training data, one must specify the expected damage classes (type, severity, location etc.). Secondly, given a priori information about likely damage locations, one can make an informed choice of whether to design for local or global monitoring or a hybrid of the two. For example, if certain 'hot spots' on a structure are identified as critical regions, they can be inspected using a high-resolution local method like active ultrasound, leaving the rest of the structure to be monitored using a less sensitive but global vibration-based method. The environmental conditions must also be considered when designing the monitoring system. Consider the novelty detection methods described in the last section: if the data used to characterise the normal operating condition do not span the whole range of operational and environmental conditions observed in practice, the system is likely to signal novelty when a previously unseen condition occurs, i.e. it will erroneously diagnose a fault. Examples of systems where this will occur abound: an offshore structure changing its mass due to oil storage or marine fouling, an aircraft before and after dropping a store, a bridge in a desert area undergoing substantial temperature changes as day changes to night. There are also likely to be obstructions to implementing the optimum monitoring system; these place constraints on the optimisation problem for sensor placement. In situations where the monitoring system is needed for a pre-existing structure, certain useful if not critical locations may be inaccessible. For a truly holistic approach to damage monitoring, the monitoring system should be designed as an integral part of the structure or system.


In the case of control, a minimum requirement for the sensor system is to measure the current state of the plant during normal condition. This may be insufficient for fault detection, as the deviations from normal condition may be orthogonal to the measured dimensions. It is clearly necessary therefore to anticipate this during the design process; FMEA may give sufficient guidance for expected faults. If one is taking a unified approach to damage identification, where the problem may be to monitor a civil engineering structure, a machine or a chemical process, one should add to the operational evaluation stage a further requirement:

- identify context-specific features and decide the appropriate level for monitoring.

This should be the first or second consideration.

4.2 Sensor Placement

The most critical issue in the specification of a monitoring system is the type and location of the sensors; there can be no monitoring without the appropriate sensors. Specification of the type of sensors is arguably a matter of operational evaluation; knowledge of the expected damage types is valuable, if not crucial, there. Also, the question of local versus global monitoring must be settled at this stage. The optimal positioning of the sensors is critical for true intelligence in monitoring. In principle, this problem can be solved using an optimisation scheme; however, in practice the constrained optimisation problems that arise can be computationally intractable. Heuristics like Genetic Algorithms have recently begun to show promise (Worden and Burrows [31] review the situation), but only on vastly simplified problems.
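As an indication of how such a heuristic might look, the sketch below uses a toy genetic algorithm to choose a fixed number of sensor locations from a set of candidates. The fitness function (the determinant of a Fisher-information style matrix built from a random stand-in 'mode shape' matrix) and all GA parameters are illustrative assumptions, not the scheme of [31].

```python
# A toy genetic algorithm for sensor placement. The fitness and all GA
# parameters here are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)

N_CANDIDATES, N_SENSORS, POP, GENS = 40, 6, 30, 50
PHI = rng.normal(size=(N_CANDIDATES, 8))   # stand-in 'mode shape' matrix

def fitness(locs):
    """Fisher-information style score for a candidate sensor set."""
    A = PHI[list(locs)]
    return np.linalg.det(A @ A.T)

def random_individual():
    return tuple(sorted(rng.choice(N_CANDIDATES, N_SENSORS, replace=False)))

def crossover(a, b):
    """Child draws its sensor set from the union of its parents' sets."""
    pool = list(set(a) | set(b))
    return rng.choice(pool, N_SENSORS, replace=False)

def mutate(ind, rate=0.1):
    """Occasionally swap a chosen location for an unused candidate."""
    ind = list(ind)
    for i in range(N_SENSORS):
        if rng.random() < rate:
            free = list(set(range(N_CANDIDATES)) - set(ind))
            ind[i] = int(rng.choice(free))
    return tuple(sorted(ind))

population = [random_individual() for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    elite = population[: POP // 2]          # truncation selection
    children = []
    while len(elite) + len(children) < POP:
        i, j = rng.choice(len(elite), size=2, replace=False)
        children.append(mutate(crossover(elite[i], elite[j])))
    population = elite + children

best = max(population, key=fitness)
print("best sensor locations:", best, "score:", fitness(best))
```

In a real application the fitness would be derived from the structure's model and the expected damage classes, and the constraints (inaccessible regions, weight, power) would be enforced in the encoding or as penalties.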

4.3 Sensor Validation and Failure Safety

The holistic approach to health monitoring demands that the sensor network be an integral part of the structure or system. It immediately follows from this that damage to the system may manifest itself as damage to the sensor network.


A fault in the sensor network, which must itself be regarded as a systemic fault, will have undesirable consequences whether or not the integrity of the overall structure is compromised. A sensor failure that causes an unnecessary alarm may cause the system to be needlessly taken out of service. A sensor failure that causes a fault to go unnoticed may have severe cost or safety implications; the monitoring system might as well not be present. The implication of this argument is that the sensor network itself should be monitored. Redundancy or diversity should be built in if necessary, to avoid the monitoring system itself making a significant contribution to fault reporting. There are essentially two approaches to monitoring the sensors. First, the individual sensors can be self-monitoring. This is the approach taken by Henry and Clarke in the SEVA programme [9]. This is currently only an option with larger sensors; the Coriolis mass flow meter described in the SEVA work can accommodate a communications bus. The sensor is capable of self-diagnostics and also returns the on-line uncertainty of the measurements. The alternative, for sensors like accelerometers which cannot accommodate substantial electronics, is to allow the sensors to monitor each other as described, for example, in Kramer [10,11]. The crucial feature of the latter approach is that the sensor outputs are correlated, i.e. there is redundancy in the sensor network. The Kramer approach not only allows the identification of errant sensors, but also generates an approximate version of what the deviant sensor should read. The development of smart sensors like piezoceramics and piezopolymers also opens up possibilities. The ability of piezoelectrics to act as both sensors and actuators allows the possibility of active validation, i.e. each sensor in turn can act as a signal generator, and the consequent readings on the remaining sensors can be compared with a template to give a health report. The question of redundancy has been raised; this should also be a requirement of an intelligent monitoring system. If a sensor is diagnosed as defective, the information that would have been delivered by the sensor should be available elsewhere. This issue should be considered during the design stage and made part of the sensor optimisation.

In the future, it may be possible to perform on-line reallocation of the sensors; at present the only option is to include back-up sensors. The simplest approach is to collocate sensors at critical points, but to keep only one active at a given time; redundant sensors are switched in when damage occurs to an active sensor. This approach is clearly sub-optimal. Also, certain causes of damage, for example impact, are likely to affect both sensors of a collocated pair. Another approach is to overdesign the sensor network in such a way that damage to a single sensor still leaves a network which is optimised for monitoring in some sense. This approach to fail-safe sensor optimisation is discussed in Side et al. [23] and Staszewski et al. [25]. Finally, it must be stated that sensors alone do not suffice for any real level of damage identification. The one possible exception is level 1, in situations where novelty detection amounts to exceedance of a given threshold by a sensor reading. In all other cases some form of data processing is required; intelligent use of signal processing algorithms is essential and a principled strategy for this is discussed in the next section.
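To illustrate the redundancy-based validation idea described above, the sketch below predicts each sensor from the others by linear least squares learned on healthy data, flags a sensor whose reading departs from its estimate, and reconstructs what the errant sensor should read. This is a deliberately linear stand-in for the autoassociative networks of [10,11]; the data, the latent-factor model and the 5-sigma threshold are all invented for illustration.

```python
# A linear stand-in for redundancy-based sensor validation [10,11]: each
# sensor is predicted from the others; a large residual suggests an errant
# sensor, and the prediction approximates what it should read.
import numpy as np

rng = np.random.default_rng(2)

# Correlated healthy readings: 5 sensors driven by 2 latent load factors,
# so there is redundancy in the network (hypothetical data).
latent = rng.normal(size=(1000, 2))
MIX = rng.normal(size=(2, 5))
healthy = latent @ MIX + 0.05 * rng.normal(size=(1000, 5))

# For each sensor, fit a least-squares model predicting it from the others,
# and record the healthy residual scatter for thresholding.
models, residual_sd = [], []
for j in range(5):
    others = np.delete(healthy, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, healthy[:, j], rcond=None)
    models.append(coef)
    residual_sd.append(np.std(others @ coef - healthy[:, j]))

def estimate(sample, j):
    """What sensor j 'should' read, reconstructed from the other sensors."""
    return float(np.delete(sample, j) @ models[j])

sample = latent[0] @ MIX          # a healthy reading...
true_value = sample[3]
sample[3] += 2.0                  # ...with sensor 3 drifted

resid = abs(sample[3] - estimate(sample, 3))
print("sensor 3 errant?", resid > 5.0 * residual_sd[3])   # hypothetical 5-sigma test
print("reconstructed reading:", estimate(sample, 3), "true value:", true_value)
```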

5 Data Processing for Damage Identification

Once the operational evaluation stage has been passed and the sensor network has been designed, the health monitoring system can begin to deliver data. The choice and implementation of algorithms to process the data and carry out the identification is arguably the most crucial ingredient of an intelligent fault detection strategy. Before even choosing the algorithm, it is necessary to choose between two complementary approaches to the problem:

- damage identification is an inverse problem;
- damage identification is a pattern recognition problem.

The first approach usually adopts a model of the structure and tries to relate changes in measured data from the structure to changes in the model.


Sometimes locally linearised models are used to simplify the analysis. The algorithms used are mainly based on linear algebra or optimisation theory, and an excellent survey of the dominant methods can be found in Doebling et al. [5]. The second approach is based on the idea described in Section 3, whereby measured data from the system of interest are assigned a damage class by a pattern recognition algorithm. This is the approach chosen here for detailed discussion. There is no implied criticism of the inverse problem approach; the authors are simply concentrating on the framework with which they are most familiar. For a critical appraisal of inverse problem approaches to damage identification, the reader can consult Friswell and Penny [8]. The data processing element of a monitoring system comprises all actions on the data downstream of the point of acquisition by the sensors. The ultimate product of the analysis is a decision as to the health of the system. The analysis has been neatly summed up by Lowe [14] as the D2D (Data to Decision) process. The principled approach, the intelligent approach, to the D2D process is based on ideas of data fusion. Sensor and data fusion as a research discipline in its own right emerged largely as a result of various defence organisations attempting to formalise procedures for integrating information from disparate sources. There are many definitions of data fusion, but one of the first, which has endured, came from the North American Joint Directors of Laboratories, who were charged by the US military to establish a standard data fusion terminology [28]. Their definition of data fusion was: 'A multilevel, multifaceted process dealing with the automatic detection, association, correlation, estimation and combination of data from single and multiple sources.' The object of the exercise was to determine the battlefield situation and assess threats on the basis of data from various sources. The philosophy of data fusion was quickly recognised to have broader application, initially in the fields of meteorology and traffic management, but later in the medical field and in NDE. The first models of the D2D process were couched exclusively in military terms but, as time wore on, a more general terminology was adopted.


Figure 1 The waterfall model: sensing → signal processing → feature extraction → pattern processing → situation assessment → decision making.

One of the first general models for data fusion was the waterfall model developed at DERA, the UK Defence Evaluation and Research Agency [1]. It is no longer widely used; however, it will suffice to illustrate the main stages of the D2D process. The model is illustrated in Figure 1. Apart from the situation assessment stage, which is an artefact of the old military terminology, all of the important processing stages are in the model. Beyond the sensor level, which generates the raw data, the first stage is signal processing. This should more properly be called pre-processing; its purpose is to prepare the data for feature extraction, of which more later. The pre-processing stage can encompass two tasks. The first of these is data cleansing. Examples of cleansing processes are: filtering to remove noise, spike removal by median filtering, removal of outliers (care is needed here, as the presence of outliers is one indication that the data are not from normal condition) and treatment of missing data values. The second (optional) pre-processing task is a preliminary attempt to reduce the dimension of the data vectors and further de-noise the signal.


For example, given a random time-series with many points, it is often useful to convert the data to a spectrum by Fourier transformation. If the signal is divided into contiguous blocks before transformation and the resulting spectra are averaged, the number of points in the spectrum can be much lower than in the original time-history and noise is averaged away. Another advantage of treating the time signal this way is that the data vector obtained should be independent of time; if the original time-series is random, it makes little sense to compare measurements at different starting times. The pre-processing is usually carried out on the basis of engineering judgement and experience. At this stage, the aim would be to reduce the dimension of the data set from possibly many thousands to perhaps a hundred. The second stage is feature extraction. The term feature comes from the pattern recognition literature and is short for distinguishing feature. Recall that the fundamental problem of pattern recognition is to assign a class label to a vector of measurements. This task is made simple if the data contain dominant features that distinguish them from data from other classes. In general, the components of the signal that distinguish the various damage classes will be hidden by features that characterise the normal operating condition of the structure, particularly when the damage is not yet severe. The task of feature extraction is to magnify the characteristics of the various damage classes and suppress the normal background. Suppose the raw data from the sensors form a time-series of accelerations from the outside of a gearbox casing, and that the time data have been pre-processed and converted into an averaged spectrum. Feature extraction in this situation could be extracting only the spectral lines at the meshing frequency and its harmonics, as these lines are known to be sensitive to damage. So feature extraction can also be carried out on the basis of engineering judgement. Alternatively, statistical or information-theoretic algorithms such as Principal Component Analysis can be used to reduce the dimension. The resulting low-dimensional data set is the feature vector or pattern vector, to which the pattern recognition algorithm will assign a class. The aim of this stage would be to generate a feature vector of dimension less than ten. A low-dimensional feature vector is a critical element in any pattern recognition problem, as the number of data examples needed for training grows explosively with the dimension of the problem.
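The sketch below illustrates the two steps just described on synthetic data: an averaged spectrum computed from contiguous blocks of a time-series, followed by Principal Component Analysis to reduce the averaged spectra to a low-dimensional feature vector. The signal, block length, record count and two-component choice are all illustrative assumptions.

```python
# Sketch of the pre-processing and feature extraction steps described above:
# block-averaged spectra, then PCA down to a low-dimensional feature vector.
import numpy as np

rng = np.random.default_rng(4)

def averaged_spectrum(x, block=256):
    """Average the magnitude spectra of contiguous blocks of a time-series."""
    n_blocks = len(x) // block
    blocks = x[: n_blocks * block].reshape(n_blocks, block)
    return np.abs(np.fft.rfft(blocks, axis=1)).mean(axis=0)

# Hypothetical records: 50 noisy time-histories containing a tone at spectral
# bin 50, with record-to-record amplitude variation for PCA to pick up.
amps = 1 + 0.3 * rng.normal(size=(50, 1))
records = rng.normal(size=(50, 8192)) + amps * np.sin(
    2 * np.pi * 50 * np.arange(8192) / 256)

spectra = np.array([averaged_spectrum(r) for r in records])   # 50 x 129 points

# PCA: project the centred spectra onto the directions of greatest variance.
centred = spectra - spectra.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
features = centred @ vt[:2].T          # feature vectors of dimension 2

print(spectra.shape, "->", features.shape)   # (50, 129) -> (50, 2)
```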

Care must be taken at this stage that the information discarded in the dimension reduction is not relevant for diagnosing the damage: feature extraction should only discard components of the data that do not distinguish the different system states, and should thus concentrate the information about damage. Some practitioners use the term symptoms for what are defined here as features. This is a little imprecise; in common English usage, a symptom is evidence of the presence of a problem. Most of the time, the monitored quantities will show no evidence of damage because damage is not present; only when damage occurs do the features display symptoms. The next stage is pattern processing: the application of an algorithm which can decide the damage state on the basis of the given feature vector. An example would be a neural network trained to return the damage type and severity when presented with, say, condensed spectral information from a gearbox. Three types of algorithm can be distinguished, depending on the desired diagnosis.

1. Novelty detection. In this case, the algorithm is required simply to indicate whether or not the data come from the normal operating condition. This is a two-class problem which has the advantage that unsupervised learning can be used. Methods for novelty detection include: outlier analysis [30], kernel density methods [26], autoassociative neural networks [18], Kohonen networks [27], growing radial basis function networks [20] and methods based on SPC control charts [24].


2. Classification. In this case, the output of the algorithm is a discrete class label. In order to apply such an algorithm, the damage states must be quantised; i.e. for location, the structure should be divided into labelled substructures. In this case, the algorithm can only locate damage to within a substructure, so the resolution of what is essentially a continuous parameter may not be good unless many labels are used. However, this type of algorithm is useful in the sense that it can be trained to give the probability of class membership; this gives an inbuilt confidence factor in the diagnosis. In the case where the desired diagnosis is from a discrete set, e.g. for diagnosing damage type, this class of algorithm is singled out. Examples include: neural network classifiers trained with the 1-of-M rule, linear and quadratic discriminant analysis, kernel discriminant analysis and nearest-neighbour classifiers. A comparison of some of these approaches on a damage classification problem is given in Worden and Manson [32].

3. Regression. In this case the output of the algorithm is one or more continuous variables. For location purposes, the diagnosis might be the Cartesian coordinates of the fault; for severity assessment it could be the length of a fatigue crack. The regression problem is often nonlinear and is particularly suited to neural networks. As in the classification case, it is often possible to recover a confidence interval for a neural network prediction [13].

In all cases, the pattern processing is subject to an important limitation: there is a trade-off between the resolution of the diagnosis and the noise rejection capabilities of the algorithm. Put simply, if the data are always noise-free, there will be very little fluctuation in the measurements from normal operating condition; in this case, small damage will cause detectable deviations. If there is much noise on the training data, it will be difficult to distinguish fluctuations due to noise from deviations due to damage unless the damage is severe. One of the tasks of feature extraction is to eliminate, as far as possible, fluctuations on the normal condition data. This optimisation for performance is a requisite feature of intelligent fault detection. The concepts of pattern recognition have been applied in the framework of fault detection on many occasions; the reader can consult Doebling et al. [5] for the earlier references. A recent sequence of studies on laboratory structures and a full-scale aircraft [15–17,34] illustrates the particular approach of the authors to damage identification up to and including level 3 in Rytter's hierarchy.
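As an illustration of a classifier that returns class membership probabilities, the sketch below fits a Gaussian model to each damage class and applies Bayes' rule. This is generic quadratic-discriminant-style classification on synthetic features, not the specific classifiers compared in [32]; the class names, feature dimensions and training data are invented.

```python
# A minimal probabilistic classifier (quadratic discriminant analysis style):
# each damage class is modelled as a Gaussian over the feature vector, and
# Bayes' rule yields class membership probabilities, i.e. an inbuilt
# confidence in the diagnosis. Classes and features are hypothetical.
import numpy as np

rng = np.random.default_rng(3)

classes = ["undamaged", "crack", "delamination"]
train = {c: rng.normal(loc=m, scale=0.5, size=(100, 2))
         for c, m in zip(classes, ([0, 0], [2, 0], [0, 2]))}

params = {c: (X.mean(axis=0), np.cov(X, rowvar=False)) for c, X in train.items()}

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d + logdet + len(x) * np.log(2 * np.pi))

def posteriors(x):
    """Class membership probabilities by Bayes' rule (equal priors assumed)."""
    logp = np.array([log_gauss(x, *params[c]) for c in classes])
    p = np.exp(logp - logp.max())
    return dict(zip(classes, p / p.sum()))

print(posteriors(np.array([1.8, 0.2])))   # high probability of 'crack'
```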


The final stage in the D2D chain for the waterfall model is the decision. This is a matter of considering the outputs of the pattern recognition algorithm and deciding whether action needs to be taken, and what that action should be. The waterfall model is a fusion model in the sense that it allows for a sensor network with different types of sensors, all of which generate relevant information that can be fused together at the pre-processing or feature extraction stages. In general, fusion models also allow information to be combined at the pattern level or the decision level. The objective at all times is to reach a decision with higher confidence than can be reached using any of the information sources alone. A better fusion paradigm than the waterfall model is the Omnibus model [2], illustrated in Figure 2. This has several advantages over the waterfall model. First of all, it includes the possibility of action. In the context of SHM, this could be repair; in the context of SPC, this could be control. In the event of a sensor failure, it could entail the reallocation of sensors or the switching in of redundant sensors. The model has a loop structure that makes clear the fact that action need not interrupt the monitoring process, but may enhance it.

Figure 2 The Omnibus model for data fusion: an observe-orientate-decide-act loop, with sensing and signal processing (sensor data fusion) in the observe stage, feature extraction and pattern processing (soft decision fusion) in the orientate stage, context processing and decision making (hard decision fusion) in the decide stage, and control and resource tasking (sensor management) in the act stage.
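To illustrate fusion at the decision level, the sketch below combines the class posteriors reported by several diagnostics using the product rule. The three hypothetical monitor outputs and the conditional-independence assumption are illustrative only, not part of the Omnibus specification.

```python
# Decision-level fusion sketch: combine class posteriors from several
# diagnostics by the product rule to reach a higher-confidence decision.
# The three sensor posteriors below are hypothetical.
import numpy as np

classes = ["normal", "damage"]

# Posterior P(class | data) reported by three separate diagnostics, each
# individually rather uncertain, but in broad agreement.
posteriors = np.array([
    [0.70, 0.30],   # vibration-based monitor
    [0.65, 0.35],   # strain-based monitor
    [0.60, 0.40],   # acoustic emission monitor
])

# Product rule (assumes conditionally independent sensors and equal priors).
fused = posteriors.prod(axis=0)
fused /= fused.sum()

print(dict(zip(classes, np.round(fused, 3))))
# Fused confidence in 'normal' (~0.87) exceeds any single source (0.70).
```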

6 Summary

The definitions proposed here are simple:

1. A fault is when the structure can no longer operate satisfactorily. If one defines the quality of a structure or system as its fitness for purpose, or its ability to meet customer or user requirements, it suffices to define a fault as a change in the system that produces an unacceptable reduction in quality.
2. Damage is when the structure is no longer operating in its ideal condition but can still function satisfactorily, i.e. in a sub-optimal manner.
3. A defect is inherent in the material; statistically, all materials will contain some unknown amount of defects. This means that the structure can operate at its design condition even if the constituent materials contain defects.



Thus there is a natural progression within a structure from a defect to damage to a fault. As a fault is defined as an unacceptable reduction in quality, an intelligent strategy demands that the diagnostic system actually detects damage, i.e. it should raise the alarm before progression to a fault. In summary then, the requirements for an intelligent fault detection system are:


- It should be designed from a holistic viewpoint, ideally at the design stage of the structure of interest. At the very least, an operational evaluation stage is necessary to establish the baseline requirements for, and restrictions on, the monitoring.
- The sensor distribution should be optimal. This means the smallest number giving the required resolution should be active at any given time. Note that the optimisation may well be constrained: it should take account of inaccessible regions, and any upper limits on weight or power consumption etc.
- If failure-safety is desired, and it should always be recommended, the distribution should include inactive units that can be switched on in the event of sensor failure. This implies that the sensors themselves should be monitored, which can be accomplished by using self-validating sensors or sensor networks. Active and inactive units should be at the optimal locations.


- Data processing should be dictated by an optimal fusion model designed to give the most robust and confident decision process given the available sensor resolutions and confidences. Where possible, algorithms that are provably optimal should be employed.
- If control is part of the system requirements, the actuator distribution should be optimal: the smallest number of actuators giving the required control actions should be employed, and these should be located optimally.
- Feedback should be part of the process. If the decision process identifies an information gap and the system is capable of sensor/actuator redeployment, this should be implemented by whatever means are available. This may be as simple as switching in inactive units, or could in the future entail physical movement of units.
- There should be a strategy for planning and implementing repair or replacement of the system or of subsystems. This may or may not go as far as specifying a backup system, depending on cost constraints.
- Monitoring should be as continuous as possible, with all levels of the D2D process updated as frequently as cost/technology constraints allow.


Some of these requirements are not currently within the reach of available technology. However, the purpose of this paper has been, amongst other things, to specify a paradigm for intelligent fault detection for discussion. It does not matter that some of the ingredients are not currently available; if the paradigm is considered desirable the missing ingredients will become the focus of research and will soon be here.

Acknowledgements The authors would like to acknowledge numerous useful discussions with their valued colleagues: Drs Graeme Manson and Wieslaw Staszewski of the University of Sheffield, UK, Drs Chuck Farrar and Hoon Sohn of the Los Alamos National Laboratory, USA and Professor Peter Cowley of Rolls-Royce plc. The authors would also like to thank the EPSRC for granting funds to support the Engineering Network on Structural Integrity and Damage Assessment. This has proved to be an invaluable forum for discussion.

References

1. Bedworth, M. (1994). Probability moderation for multilevel information processing. DRA Technical Report, DRA/CIS(SE1)/651/8/M94.AS03BP032/1.
2. Bedworth, M. and O'Brien, J. (1999). The Omnibus model: a new model of data fusion. DERA Malvern preprint.
3. Bishop, C.M. (1994). Novelty detection and neural network validation. IEE Proceedings – Vision, Image and Signal Processing, 141, 217–222.
4. Broberg, K.B. (1999). Cracks and fracture. Cambridge University Press.
5. Doebling, S.W., Farrar, C.R., Prime, M.B. and Shevitz, D. (1996). Damage identification and health monitoring of structural and mechanical systems from changes in their vibration characteristics. Los Alamos National Laboratory Report, LA-13070.
6. Dorey, G. (1989). Damage tolerance and damage assessment in advanced composites. In: Advanced composites (pp. 369–398). Elsevier.
7. Farrar, C.R. and Doebling, S.W. (2000). The state of the art in vibration-based structural damage identification. CD-ROM, Los Alamos Dynamics.
8. Friswell, M.I. and Penny, J.E.T. (1997). Is damage location using vibration measurements practical? Proceedings of 2nd International Conference on Structural Damage Assessment using Advanced Signal Processing Procedures (DAMAS 97), (pp. 351–362), Sheffield.


9. Henry, M.P. and Clarke, D.W. (1993). The self-validating sensor: rationale, definitions and examples. Control Engineering Practice, 1, 585–610.
10. Kramer, M.A. (1991). Nonlinear principal component analysis using autoassociative neural networks. American Institute of Chemical Engineers Journal, 37, 233–243.
11. Kramer, M.A. (1992). Autoassociative neural networks. Computers & Chemical Engineering, 16, 313–328.
12. Leitch, R.D. (1995). Reliability analysis for engineers – an introduction. Oxford Science Publications.
13. Lowe, D. and Zapart, C. (1999). Point-wise confidence interval estimation by neural networks: a comparative study based on automotive engine calibration. Neural Computing and Applications, 8, 77–85.
14. Lowe, D. (2000). Feature extraction, data visualisation, classification and fusion for damage assessment. Oral presentation at EPSRC SIDAnet Meeting, Derby, UK.
15. Manson, G., Worden, K. and Allman, D.J. (2002). Experimental validation of a damage severity method. Proceedings of 1st European Workshop on Structural Health Monitoring, (pp. 845–852), Paris.
16. Manson, G., Worden, K. and Allman, D.J. (2003a). Experimental validation of structural health monitoring methodology II: novelty detection on an aircraft wing. Journal of Sound and Vibration, 259, 345–363.
17. Manson, G., Worden, K. and Allman, D.J. (2003b). Experimental validation of structural health monitoring methodology III: damage location on an aircraft wing. Journal of Sound and Vibration, 259, 365–385.
18. Pomerleau, D.A. (1993). Input reconstruction reliability information. In: Hanson, S.J., Cowan, J.D. and Giles, C.L. (eds.), Advances in neural information processing systems 5, Morgan Kaufmann Publishers.
19. Reifsnider, K.L. and Case, S.W. (2002). Damage tolerance and durability of material systems. Wiley Inter-Science.
20. Roberts, S. and Tarassenko, L. (1994). A probabilistic resource allocating network for novelty detection. Neural Computation, 6, 270–284.
21. Rytter, A. (1993). Vibration based inspection of civil engineering structures. PhD Thesis, Department of Building Technology and Structural Engineering, University of Aalborg, Denmark.
22. Schalkoff, R. (1992). Pattern recognition: statistical, structural and neural approaches. John Wiley and Sons, Singapore.
23. Side, S., Staszewski, W.J., Wardle, R. and Worden, K. (2000). Fail-safe sensor distributions for damage detection. Proceedings of 2nd International Workshop on Damage Assessment using Advanced Signal Processing Procedures – DAMAS '97, (pp. 41–52), Sheffield, UK.


24. Sohn, H. and Farrar, C.R. (2000). Statistical process control and projection techniques for damage detection. Proceedings of European COST F3 Conference on System Identification and Structural Health Monitoring (pp. 105–114), Madrid, Spain.
25. Staszewski, W.J., Worden, K., Wardle, R. and Tomlinson, G. (2000). Fail-safe sensor distributions for impact detection in composite materials. Smart Materials and Structures, 9, 298–303.
26. Tarassenko, L., Hayton, P., Cerneaz, Z. and Brady, M. (1995). Novelty detection for the identification of masses in mammograms. Proceedings of 4th International Conference on Neural Networks, Cambridge, UK (pp. 442–447). IEE Publication 409.
27. Taylor, O., MacIntyre, J., Isbell, C., Kirkham, C. and Long, A. (1999). Adaptive fusion devices for condition monitoring: local fusion systems of the NEURALMAINE project. Proceedings of 1st International Conference on Damage Assessment of Structures – DAMAS '99, (pp. 205–216), Dublin, Eire.
28. White, F.E. Jr. (1990). Joint Directors of Laboratories Data Fusion Subpanel Report: SIGINT Session. Technical Proceedings of the Joint Service Data Fusion Symposium, DFS-90, 469–484.

29. Worden, K. (1997). Structural fault detection using a novelty measure. Journal of Sound and Vibration, 201, 85–101.
30. Worden, K., Manson, G. and Fieller, N.R.J. (2000). Damage detection using outlier analysis. Journal of Sound and Vibration, 229, 647–667.
31. Worden, K. and Burrows, A.P. (2000). A combined neural and genetic approach to sensor placement. Engineering Structures, 23, 885–901.
32. Worden, K. and Manson, G. (2000). Damage identification using multivariate statistics: kernel discriminant analysis. Inverse Problems in Engineering, 8, 25–46.
33. Worden, K. (2003). COST Action F3 on structural dynamics: benchmarks for working group 2 – structural health monitoring. Mechanical Systems and Signal Processing, 17, 73–75.
34. Worden, K., Manson, G. and Allman, D.J. (2003). Experimental validation of structural health monitoring methodology I: novelty detection on a laboratory structure. Journal of Sound and Vibration, 259, 323–343.
35. Zehnder, A.T. and Rosakis, A.J. (1993). Temperature rise at the tip of dynamically propagating cracks: measurements using high speed IR detectors. In: Experimental techniques in fracture (pp. 125–169). VCH Publishers Inc., New York.
