Article

Evaluation of Shock Detection Algorithm for Road Vehicle Vibration Analysis

Julien Lepine 1,*,† and Vincent Rouillard 2

1 Department of Engineering, University of Cambridge, Trumpington St, Cambridge CB2 1PZ, UK
2 College of Engineering & Science, Victoria University, P.O. Box 14428, Melbourne, VIC 8001, Australia; [email protected]
* Correspondence: [email protected]; Tel.: +44-7780-948185
† Work done at Victoria University.

Received: 4 September 2018; Accepted: 6 October 2018; Published: 11 October 2018


Abstract: The ability to characterize shocks which occur during road transport is a vital prerequisite for the design of optimized protective packaging, which can assist in reducing the cost and waste related to the transport of goods. Many methods have been developed to detect shocks buried in road vehicle vibration signals, but none has yet considered the nonstationary nature of vehicle vibration, and, individually, these methods fail to accurately detect shocks. Using machine learning, several shock detection methods can be combined, improving the reliability and accuracy of shock detection. This paper presents how these methods can be integrated into four different machine learning algorithms (Decision Tree, k-Nearest Neighbors, Bagged Ensemble, and Support Vector Machine). The Pseudo-Energy Ratio/Fall-Out (PERFO) curve, a novel classification assessment tool, is also introduced to calibrate the algorithms and compare their detection performance. In the context of shock detection, the PERFO curve has an advantage over classical assessment tools, such as the Receiver Operating Characteristic (ROC) curve, as it gives more importance to high-amplitude shocks.

Keywords: shock-on-random; environmental shock; vehicle vibration; shock detection; Receiver Operating Characteristic curve; optimal operation point; classifier calibration

Vibration 2018, 1, 220–238; doi:10.3390/vibration1020016; www.mdpi.com/journal/vibration

1. Introduction

In most countries, product distribution predominantly relies on road transportation. This type of transportation can be harmful to freight, as the interaction between the road vehicle and the pavement creates shocks and vibration. To mitigate this problem, a common practice is to use protective packaging, which significantly increases shipping costs. Inefficient packaging constitutes a pressing global problem that costs hundreds of billions of dollars and also impacts negatively upon the environment. An insufficient level of packaging increases the occurrence of product damage during transport, whereas excessive packaging increases the packages' weight and volume, which creates more costs throughout the supply chain.

In order to reduce these costs, packaging protection is currently being optimized by simulating the vibration produced by road vehicles. Different physical simulation methods are prescribed by international and national standards [1–4]. The model used by the standards assumes that Road Vehicle Vibration (RVV) is a Gaussian and stationary random signal. It is broadly acknowledged that this assumption oversimplifies the RVV, which imposes a significant limit on packaging optimization [5–9]. Realistic and accurate RVV simulations must consider the different excitation modes contained in RVV, that is, the nonstationary random vibration induced by the pavement's profile and variations in vehicle speed, as well as the shocks caused by the pavement's aberrations.

More complex RVV models have recently been developed to enhance RVV simulations, essentially using two different approaches: analyzing RVV in the time domain [6,8,10–19], and in the time-frequency domain [20–27]. However, a fundamental problem remains; these alternatives are effective only in specific scenarios, such as: (1) when the underlying random vibrations are stationary (which is rarely the case for RVVs), or (2) when the random vibrations are nonstationary and no shocks are present. No single approach available today can accurately detect shocks that are superimposed on nonstationary, random background vibrations.

Both excitation modes should be included in the RVV simulation to ensure that simulations are realistic and accurate, and that the packaging will protect against vibrations during transport without being excessive. Each mode is represented by a different mathematical model and cannot be analyzed with the same statistical tools. For instance, the nonstationary mode can be modelled as a series of stationary Gaussian random segments of different RMS (root mean square) values [6], and the shocks as a distribution of impulses of different amplitudes. To create these models, the nonstationary random vibration and the shocks have to be characterized separately. This task is challenging because all the excitation modes co-exist in the vibration acceleration signal recorded on a vehicle.

This paper assesses different machine learning classifiers to index the shocks buried in synthetic, nonstationary, random vibration signals representing real RVV. With machine learning, it is possible to integrate many RVV analysis methods, such as moving statistics, the Discrete Wavelet Transform, and the Hilbert-Huang Transform, which greatly enhances shock detection. The processing required to integrate these methods of analysis is presented in the first part of this paper. The second part deals with an evaluation of four classification algorithms (Decision Tree, k-Nearest Neighbors, Bagged Ensemble, and Support Vector Machine) which was performed using synthetically generated RVV signals.
The evaluation used a novel detection assessment tool, namely the PERFO (Pseudo-Energy Ratio/Fall-Out) curve, which was specifically developed for shock detection purposes. The application of this machine learning algorithm to real-life vehicle measurements was described in detail in a previous publication by the authors [28].

2. RVV Analysis Methods

RVV signals can be studied by using time and time-frequency analysis tools. Attempts at detecting shocks have been made using moving statistics, such as the moving RMS [11,12], moving kurtosis [12], and the moving crest factor [8,10–15], calculated on RVV acceleration signals in the time domain. Using different moving window lengths, the moving RMS has also been used to describe RVV nonstationarities as a series of stationary Gaussian random segments [6,8,15–18]. The sensitivity of these moving statistics to shocks and nonstationarities depends strongly on the window length used in the calculation [29,30]. For instance, short windows have good sensitivity to shocks but cannot distinguish shocks from a short, stationary Gaussian random segment.

As nonstationarities and shocks may cause variations in the RVV spectrum [7,31–33], different time-frequency analysis methods have been used to make this distinction. One of these methods is the Discrete Wavelet Transform (DWT), which is often used to identify shocks [20–23] and RVV nonstationary components [22–24]. The DWT has good sensitivity to signal state changes and shocks, but also to nonstationary segment changes, which may create similar outputs in a DWT scalogram [22]. The Hilbert-Huang Transform, which is an adaptive time-frequency analysis method [34], has also been used to analyze both modes of RVV signals [23,25–27]. This method has better sensitivity to nonstationary segments than the DWT [23,25], but cannot be effectively used to detect shocks [25,27].
All these methods have their own benefits and limitations when applied to RVV signals, but none can effectively detect shocks buried in nonstationary, random vibration signals. A useful approach which can get the most out of the time and time-frequency tools is to integrate them with machine learning by using a classification algorithm [30]. As shown in Figure 1, the process starts with a learning dataset, which is a set of data where the classes are known. For RVV, this is a vertical acceleration signal where the locations of the shocks are known. Dataset segments containing shocks are attributed to the class "shock", and the remainder to the class "no shock". The dataset is processed to reveal different data behaviors and characteristics in a format compatible with machine learning algorithms.


This is where machine learning differs from other classification approaches, because it can base its prediction on several different signal processing methods. This means it can combine the more effective shock and nonstationary analysis methods to distinguish the shocks from signal intensity variations (nonstationarity). The outcomes of each method are called predictors. Once the processing is completed, the data is randomly partitioned into two sets, the training dataset and the validation set, where both sets have the same proportion of classes. The training dataset is used to train the algorithm and develop the classifier, and the trained classifier is then validated using the validation dataset. The outcome of the validation phase is applied to optimize the performance of the various algorithms, and the optimized algorithm is then employed to generalize the classification model to a new dataset.
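The random, class-proportioned partitioning described above can be sketched in a few lines. This is an illustrative Python/NumPy sketch rather than the authors' Matlab implementation; the toy dataset, 50/50 split ratio, and function name are assumptions:

```python
import numpy as np

def stratified_split(X, y, train_frac=0.5, seed=0):
    """Randomly partition (X, y) into training and validation sets while
    keeping the same proportion of classes in both, as described above."""
    rng = np.random.default_rng(seed)
    train_idx, valid_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        n_train = int(round(train_frac * idx.size))
        train_idx.extend(idx[:n_train])
        valid_idx.extend(idx[n_train:])
    train_idx, valid_idx = np.array(train_idx), np.array(valid_idx)
    return X[train_idx], y[train_idx], X[valid_idx], y[valid_idx]

# Toy learning dataset: 90 "no shock" (0) and 10 "shock" (1) segments,
# each described by 3 predictors.
X = np.random.default_rng(1).normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
X_tr, y_tr, X_va, y_va = stratified_split(X, y)
```

Both halves end up with the same 9:1 class ratio, mirroring the requirement that the training and validation sets have the same proportion of classes.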

Figure 1. Workflow of the classification algorithm.

2.1. Machine Learning Algorithms

There exists a large variety of machine learning algorithms, which have a myriad of potential applications. Based on preliminary work undertaken by Lepine [35], four different machine learning classifiers have been selected for RVV shock detection purposes: the Decision Tree, k-Nearest Neighbors (kNN), Bagged Ensemble, and Support Vector Machine (SVM). These algorithms were trained using their implementation in Matlab® (for more information, refer to Hastie et al. [36], Rogers and Girolami [37], Cherkassky and Mulier [38], and Shalev-Shwartz and Ben-David [39]).

2.1.1. Decision Tree

The Decision Tree algorithms base their prediction on a cascade of statistical tests, where the outcome of the first test leads to another test, and so on, until the class of the data is predicted (Figure 2). The predictors are the statistics used for the tests. For the shock detection application, a Decision Tree comprised of 20 tests was used, as this was found to be optimal by Lepine [35].
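As a toy illustration of such a cascade, the sketch below chains three binary tests on moving-statistic predictors. The thresholds and the choice of predictors are invented for illustration; the paper's 20-split tree is learned from the training data, not hand-written:

```python
def classify_segment(m_crest, m_kurtosis, m_rms):
    """Tiny illustrative cascade of binary tests (cf. Figure 2); a learned
    Decision Tree chains such tests until a class is reached."""
    if m_crest > 5.0:              # split 1: unusually high crest factor?
        if m_kurtosis > 3.0:       # split 2: leptokurtic (peaky) segment?
            return "shock"
        return "no shock"
    if m_rms > 2.0:                # split 3: sustained high level only?
        return "no shock"          # nonstationarity rather than a shock
    return "no shock"

print(classify_segment(6.1, 4.2, 0.8))  # a high, peaky excursion -> shock
```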


Figure 2. Example of Decision Tree made up of five binary tests (splits) and six final outcomes (a–f).

2.1.2. k-Nearest Neighbors

The k-Nearest Neighbors (kNN) algorithms group the training dataset by class in as many spaces as there are predictors. A kNN algorithm classifies a new data point by grouping it into the most common class of its k nearest neighbors; in other words, the classification is made by association. If the majority of a data point's nearest neighbors belong to one class, it is more likely that that point belongs to this class too. For instance, in Figure 3, 15 data points are squares, five are circles, and the class of the lozenge is unknown. As four of the five nearest neighbors of the lozenge are circles, a 5-Nearest Neighbors (5NN) classifier would classify this data point as a circle. For the shock detection application, a 100NN algorithm was used, as recommended by Lepine [35]. The Euclidean distance was used to define the nearest neighbors.

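A minimal sketch of the voting rule, in Python/NumPy (an assumption; the paper uses a Matlab implementation), with a deterministic toy layout standing in for Figure 3's squares and circles:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by the majority class of its k nearest training
    points, with nearness measured by Euclidean distance."""
    d = np.linalg.norm(X_train - x_new, axis=1)
    votes = y_train[np.argsort(d)[:k]]
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)]

# Toy version of Figure 3: 15 squares (class 0) far from the origin,
# 5 circles (class 1), and a new point (the lozenge) at the origin.
squares = np.array([[4.0 + i, 4.0 + j] for i in range(5) for j in range(3)])
circles = np.array([[0.5, 0.2], [-0.4, 0.6], [0.1, -0.7], [0.9, 0.9], [2.8, 2.8]])
X_train = np.vstack([squares, circles])
y_train = np.array([0] * 15 + [1] * 5)
print(knn_predict(X_train, y_train, np.array([0.0, 0.0]), k=5))  # -> 1 (circle)
```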
Figure 3. The kNN process, where the lozenge is a new data point and the two classes of the training points are the square and circle, numbered from the nearest neighbors to the farthest.

2.1.3. Bagged Ensemble

Bagged ensemble algorithms use a combination of Decision Trees on which they base their prediction. "Bagged" stands for "bootstrap aggregation": the algorithm generates Decision Trees on different random data samples. The classification is made by using the average response of each of these Decision Trees, and an ensemble of 150 decision trees was found to be suitable for shock-detection purposes.
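The bootstrap-aggregation idea can be sketched as follows. For brevity the sketch bags depth-1 "stumps" instead of full Decision Trees, and the toy dataset, 25-tree ensemble size, and function names are invented for illustration:

```python
import numpy as np

def fit_stump(X, y):
    """Depth-1 'tree': pick the feature/threshold pair with the fewest
    misclassifications (a stand-in for a full Decision Tree)."""
    best = (np.inf, 0, 0.0, 1, 0)  # (errors, feature, threshold, right, left)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            above = X[:, f] > thr
            for right, left in ((1, 0), (0, 1)):
                p = np.where(above, right, left)
                err = np.sum(p != y)
                if err < best[0]:
                    best = (err, f, thr, right, left)
    return best[1:]

def stump_predict(stump, X):
    f, thr, right, left = stump
    return np.where(X[:, f] > thr, right, left)

def bagged_predict(X_train, y_train, X_new, n_trees=25, seed=0):
    """Bootstrap aggregation: fit each stump on a random resample of the
    training data and average the ensemble's responses."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_new))
    for _ in range(n_trees):
        idx = rng.integers(0, len(y_train), size=len(y_train))  # bootstrap
        votes += stump_predict(fit_stump(X_train[idx], y_train[idx]), X_new)
    return (votes / n_trees > 0.5).astype(int)

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1])
print(bagged_predict(X_train, y_train, np.array([[0.5], [11.5]])).tolist())  # -> [0, 1]
```

Averaging the responses of many trees, each fitted on a different resample, is what makes the ensemble more stable than any single tree.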

2.1.4. Support Vector Machine

Support Vector Machine (SVM) algorithms are powerful classifiers that only work when the data have only two classes. As such, they are well-suited for shock detection. Using a training dataset, the SVM finds a hyperplane that maximizes the distance between the two classes' data points represented in all their dimensions (predictors). This hyperplane is defined as:

D(\mathbf{x}) = \sum_{i=1}^{n} \hat{\alpha}_i y_i (\mathbf{x} \cdot \mathbf{x}') + \hat{b} \qquad (1)

where (\mathbf{x}_i, y_i), i = 1, \ldots, n are, respectively, the predictor matrix and class vector of the n training data points; (\mathbf{x} \cdot \mathbf{x}') is the predictor-matrix inner product; and \hat{\alpha}_i and \hat{b} are parameters defined during the learning process [36–39].

There are many cases where it is impossible to solve (1) in its direct form because no single hyperplane can separate the data classes. For example, the dataset shown in Figure 4 cannot be divided by a single hyperplane because the cluster of circles is surrounded by squares.
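Evaluating Equation (1) for a new point is straightforward once training (a quadratic-programming problem, not shown) has produced the parameters. In the sketch below the support vectors, \hat{\alpha} and \hat{b} are invented for illustration:

```python
import numpy as np

def svm_decision(x_new, X_sup, y_sup, alpha, b):
    """Equation (1): D(x) = sum_i alpha_i * y_i * (x . x_i) + b.
    X_sup, y_sup, alpha and b would come from the training step;
    here they are invented for illustration."""
    return float(np.sum(alpha * y_sup * (X_sup @ x_new)) + b)

# Two invented support vectors, one per class (y in {-1, +1}).
X_sup = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_sup = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
b = 0.0
print(svm_decision(np.array([2.0, 0.0]), X_sup, y_sup, alpha, b))  # -> 2.0
```

The sign of D(x) gives the predicted class; the magnitude reflects the distance from the separating hyperplane.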

This type of problem can be solved using a nonlinear transformation by means of Kernel functions, which transform the data into a space where the classes are more distinct. This is done by replacing the inner product in the linear hyperplane definition with a nonlinear Kernel function, G(\mathbf{x}, \mathbf{x}'), such as:

D(\mathbf{x}) = \sum_{i=1}^{n} \hat{\alpha}_i y_i \, G(\mathbf{x}, \mathbf{x}') + \hat{b} \qquad (2)

The Kernel function that was found to be well-suited for shock detection purposes is the Gaussian function (also known as the radial basis function):

G(\mathbf{x}, \mathbf{x}') = \exp\left\{-\frac{|\mathbf{x} - \mathbf{x}'|^2}{\sigma^2}\right\} \qquad (3)

where \sigma^2 defines the function's width.
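A sketch of Equations (2) and (3) together; again the support vectors and parameters are invented, with one class clustered near the origin as in Figure 4:

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Equation (3): G(x, x') = exp(-|x - x'|^2 / sigma^2)."""
    return np.exp(-np.sum((x - xp) ** 2) / sigma ** 2)

def kernel_svm_decision(x_new, X_sup, y_sup, alpha, b, sigma=1.0):
    """Equation (2): Equation (1) with the inner product replaced by G."""
    g = np.array([gaussian_kernel(x_new, xs, sigma) for xs in X_sup])
    return float(np.sum(alpha * y_sup * g) + b)

# Invented parameters: a "circle" support vector at the origin (+1)
# and a "square" support vector away from it (-1).
X_sup = np.array([[0.0, 0.0], [3.0, 0.0]])
y_sup = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
b = 0.0
print(kernel_svm_decision(np.array([0.0, 0.0]), X_sup, y_sup, alpha, b))  # > 0: circle side
```

Because the kernel decays with distance, points near the circle cluster get a positive decision value even though no linear hyperplane could isolate them.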

Figure 4. Dataset that cannot be divided by a single hyperplane, nor be classified with the linear support vector machine (SVM).

2.2. Synthetic RVV Signals

The algorithm was trained with a synthetic signal (learning dataset) composed of nonstationary random vibrations and shocks, which represented a realistic RVV signal. The advantages of using a synthetic RVV over vibration signals recorded on vehicles are twofold: (1) the signal's characteristics (such as the location of the shocks) can be precisely known, which is essential to verify the learning process; and (2) the signal can virtually last for any duration of time, which can increase the detection accuracy. To represent the shocks and nonstationarities, the synthesis was made from the sum of two separate signals (Figure 5).

Figure 5. Signal synthesis.

The vehicle vibration spectrum was calculated from vertical acceleration measured on a vehicle. A Gaussian signal was created from an inverse Fast Fourier Transform of the spectrum, modified with a random phase. The nonstationary signal was created from a modulated Gaussian random signal with the spectrum of a typical road vehicle. The modulation function represents typical RMS variations observed in a RVV signal [8,18].

The transient signal is a series of impulse responses of a typical vehicle transfer function. The transfer function corresponds to a two-degree-of-freedom mass-spring-damper model, which represents a quarter-car suspension model of a road transport vehicle [40]. The distribution of the impulse location, duration, and amplitude represented realistic road surface aberrations.

Both the nonstationary and transient signals were added together to create a synthetic RVV signal. Complete details on the RVV signal synthesis are presented by Lepine [35].

Classifiers were trained with a 3000 s dataset (sampling frequency of 1024 Hz), incorporating 300 randomly distributed shocks. The spectrum of RVV signals dropped by several orders of magnitude above 100 Hz, meaning the sampling frequency was more than 10 times greater than the bandwidth of interest. The classifiers were then validated with a 3500 s dataset (at the same sampling frequency) that contained 350 randomly distributed shocks. Preliminary work from Lepine [35] shows that this combination of learning and validation datasets is necessary to obtain accurate classification models.
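The two-part synthesis can be sketched as below. The flat toy spectrum, the sinusoidal RMS modulation, the decaying-sinusoid impulse response, and the 30 s duration are all placeholder assumptions standing in for the measured spectrum, modulation function, and two-degree-of-freedom model used in the paper:

```python
import numpy as np

fs, dur = 1024, 30                      # Hz, s (the paper uses 3000 s)
n = fs * dur
rng = np.random.default_rng(0)

# Gaussian random vibration: inverse FFT of an assumed flat-to-100 Hz
# spectrum with random phase (the paper uses a measured vehicle spectrum).
freqs = np.fft.rfftfreq(n, 1 / fs)
mag = np.where(freqs < 100, 1.0, 0.01)  # toy spectrum, not the measured one
phase = rng.uniform(0, 2 * np.pi, freqs.size)
gauss = np.fft.irfft(mag * np.exp(1j * phase), n)
gauss /= gauss.std()

# Nonstationarity: modulate the RMS with a slowly varying envelope.
t = np.arange(n) / fs
envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 0.05 * t)
nonstationary = envelope * gauss

# Transients: impulse responses (a decaying sinusoid stands in for the
# two-degree-of-freedom quarter-car response) at random instants.
ir = np.exp(-t[:fs] * 8.0) * np.sin(2 * np.pi * 12.0 * t[:fs])
transient = np.zeros(n)
for loc in rng.integers(0, n - fs, size=10):     # 10 random shocks
    transient[loc:loc + fs] += 6.0 * rng.uniform(0.5, 1.5) * ir

synthetic = nonstationary + transient            # the synthetic RVV signal
```

Because the shock locations (`loc`) are generated explicitly, each sample can be labeled "shock" or "no shock" exactly, which is the point of using a synthetic signal.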

2.3. Predictors

Machine learning prediction performance depends on the data processing undertaken before the training phase, that is, the predictors. In order to detect the shocks buried in RVV signals, the predictors were obtained from the relevant analysis methods, which are: the moving RMS, moving crest factor, moving kurtosis, Hilbert-Huang Transform (HHT), and Discrete Wavelet Transform (DWT). Sections 2.3.1–2.3.5 summarize the work published by Lepine et al. [30] on the selection and optimization of predictors for RVV shock detection.

2.3.1. Moving RMS

Measuring the fluctuations in the RVV signals' RMS values gives an overview of their nonstationarities and shocks. The moving RMS with a window duration, T, is defined as:

mRMS(t) = \sqrt{\frac{1}{T}\int_{0}^{T} x(t+\tau)^2 \, d\tau} \qquad (4)

where \tau is a dummy variable representing a time shift.

Note how the moving RMS is calculated "forward" (t + \tau) to consider the vehicle's causal reaction to the road excitation. The major shortcoming of this predictor is its dependency on the window duration. A shorter window is better for detecting short transient events but is ineffective for longer sustained changes in RMS level, and vice versa [16,30], meaning there is no ideal window size. Fortunately, machine learning classification has the capability to use multiple predictors. The moving RMS predictors are not limited to one window length, so two different window lengths (0.5 s and 4 s) are used.

2.3.2. Moving Crest Factor

The moving crest factor is the ratio between the absolute value of the signal and the moving RMS value over a window of duration T:

mCF(t) = \frac{|x(t)|}{\sqrt{\frac{1}{T}\int_{0}^{T} x(t+\tau)^2 \, d\tau}} \qquad (5)
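Equations (4) and (5) can be sketched as below (a NumPy sketch; the forward window is implemented with a cumulative sum, and the tail of the signal is averaged over the remaining samples, a boundary choice not specified in the paper):

```python
import numpy as np

def moving_rms(x, fs, T):
    """Equation (4): forward moving RMS over a window of T seconds.
    Where the window runs past the end of the signal, the average is
    taken over the remaining samples (an implementation assumption)."""
    w = int(T * fs)
    idx = np.arange(len(x))
    ends = np.minimum(idx + w, len(x))
    csum = np.cumsum(np.concatenate(([0.0], x ** 2)))
    return np.sqrt((csum[ends] - csum[idx]) / (ends - idx))

def moving_crest_factor(x, fs, T):
    """Equation (5): |x(t)| divided by the forward moving RMS."""
    return np.abs(x) / moving_rms(x, fs, T)

# A large spike buried in Gaussian noise stands out in the crest factor.
sig = np.random.default_rng(0).normal(size=4096)
sig[2048] += 20.0
print(np.argmax(moving_crest_factor(sig, 1024, 0.5)))  # index of the spike
```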


Note how the moving crest factor is also calculated "forward" (t + \tau) to consider the vehicle's causal reaction to the road excitation. In general, the moving crest factor of a signal increases with the presence of shocks; therefore, shocks can be detected when the crest factor is above a certain threshold. As opposed to the moving RMS predictor, the moving crest factor predictor is more accurate when using a longer moving window [30]. This is because a longer window averages out the effect of the shock in the crest factor's denominator without affecting its numerator, which results in a greater sensitivity to shocks. However, windows of too long a duration have been found to misclassify short RMS variations as shocks [30]. Two crest-factor predictors with window lengths of 8 s and 64 s were used to develop the machine learning classifiers.

2.3.3. Moving Kurtosis

The moving kurtosis, \kappa, is the normalized fourth moment of a signal. It gives a measure of the "peakedness" of the Probability Density Function (PDF) of a signal segment (duration, T) and is defined as:

\kappa(t) = \frac{\frac{1}{T}\int_{0}^{T}[x(t+\tau)]^4 \, d\tau}{\left(\frac{1}{T}\int_{0}^{T}[x(t+\tau)]^2 \, d\tau\right)^2} \qquad (6)
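Equation (6) can be sketched with the same forward-window machinery; the boundary handling at the tail of the signal is again an implementation assumption:

```python
import numpy as np

def moving_kurtosis(x, fs, T):
    """Equation (6): windowed 4th moment divided by the squared windowed
    2nd moment, computed forward from each sample."""
    w = int(T * fs)
    idx = np.arange(len(x))
    ends = np.minimum(idx + w, len(x))
    c2 = np.cumsum(np.concatenate(([0.0], x ** 2)))
    c4 = np.cumsum(np.concatenate(([0.0], x ** 4)))
    m2 = (c2[ends] - c2[idx]) / (ends - idx)
    m4 = (c4[ends] - c4[idx]) / (ends - idx)
    return m4 / m2 ** 2
```

On a stationary Gaussian segment this value hovers around 3; shocks push it well above 3, consistent with the leptokurtic behavior discussed in the text.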

Two signals with the same RMS value can have different kurtosis if one has a few very high peak values. The kurtosis can also be interpreted as how close a signal's distribution is to a Gaussian distribution, which has a kurtosis of 3. Shocks and nonstationarity components in RVV create a leptokurtic distribution, that is, a kurtosis above 3 [6,19]. Depending on the window length, the moving kurtosis does not have the same sensitivity to signal RMS variations and shocks. Preliminary experiments have shown that, by combining two window lengths (4 s and 8 s), the moving kurtosis functions can identify both the shocks and the nonstationarities of RVV [35].

2.3.4. Discrete Wavelet Transform

The DWT is a time-frequency (or, more specifically, time-scale) analysis method that provides predictors which are more sensitive to signal changes. Different publications suggest that the Daubechies wavelets are the best suited for RVV analysis [20,21]; thus, the Daubechies 10 wavelet was used to analyse the learning data, and the coefficients of the first 12 scales were directly used as predictors. As the signal sampling rate is halved at every scale, the largest scale (12) has a frequency range up to 0.25 Hz (for a sampling rate of 1024 Hz), which can be considered refined enough for RVV analysis purposes. This resampling also causes the number of coefficients to decrease at every scale. To create predictors with sampling rates that match the original signal and the other predictors, the coefficients are replicated to match the sampling rate of the signal. Figure 6 presents an example of the DWT predictors of four different scales of the signal.

Figure 6. Example of Discrete Wavelet Transform (DWT) analysis. Shocks in the synthetic signal are in yellow, and only coefficients from DWT scales 1, 5, 9, and 12 are presented for illustration purposes.
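The replication of coefficients back to the signal's sampling rate can be sketched as follows. A Haar wavelet is used here for brevity, and only 4 scales are computed; the paper uses the Daubechies 10 wavelet over 12 scales, so the filter itself is a simplification:

```python
import numpy as np

def haar_dwt_predictors(x, n_scales=4):
    """DWT predictors: detail coefficients per scale, replicated with
    np.repeat so every predictor matches the signal's sampling rate.
    Haar wavelet for brevity; the paper uses Daubechies 10."""
    predictors = []
    approx = x.astype(float)
    for scale in range(1, n_scales + 1):
        approx = approx[: len(approx) // 2 * 2]          # force even length
        detail = (approx[0::2] - approx[1::2]) / np.sqrt(2)
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)
        # Each scale halves the coefficient count, so repeat by 2**scale
        # to restore the original sampling rate.
        predictors.append(np.repeat(detail, 2 ** scale)[: len(x)])
    return predictors
```

An isolated spike in the input produces a localized nonzero detail coefficient at the fine scales, which is the behavior that makes these coefficients useful shock predictors.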

A full description of the specific application of the DWT, including its mathematical derivation, can be found in previous publications [22,30,35].

2.3.5. Hilbert-Huang Transform Predictors

The Hilbert-Huang Transform (HHT) is an adaptive time-frequency analysis method providing different types of predictors from RVV signals. The HHT uses a sifting process to decompose a signal into different narrow-banded components called Intrinsic Mode Functions (IMFs), which have the following characteristics, as defined by Huang et al. [34]:

1. In the whole dataset, the number of extrema and the number of zero-crossings must either be equal to each other, or differ by one at most;
2. At any point, the mean value of the envelopes defined by the local maxima and the local minima is zero.

The sifting process is performed by fitting one spline function over the signal's local maxima and a second over its local minima. The mean of these functions is then subtracted from the signal, which has the effect of removing its low-frequency components. This process is repeated until the number of extrema and zero-crossings remain the same and are either equal to each other, or differ by at most one, for eight consecutive iterations [41]; the remaining signal becomes the first IMF. The next IMFs are then calculated from what remains of the signal, using the same process. This iterative process stops when no more IMFs can be fitted, and the residual is called the signal trend. The predictors used in the machine learning process are the instantaneous amplitude and frequency functions of each IMF, which can be calculated using the Hilbert transform. These functions reveal trend changes in the signal, which makes them effective predictors. For example, Figure 7 shows IMFs 1, 5, and 9 of a signal, along with IMF 5's instantaneous amplitude envelope and instantaneous frequency functions.
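The instantaneous amplitude and frequency of an IMF can be obtained from its analytic signal. A minimal NumPy sketch (an assumption of this edit, not the authors' implementation) that builds the analytic signal with the standard FFT-based Hilbert transform:

```python
import numpy as np

def instantaneous_features(imf, fs):
    """Instantaneous amplitude and frequency (Hz) of a narrow-band IMF,
    via the FFT construction of the analytic signal."""
    n = len(imf)
    spectrum = np.fft.fft(imf)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:                 # keep DC/Nyquist, double positive freqs
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    amplitude = np.abs(analytic)                      # instantaneous envelope
    phase = np.unwrap(np.angle(analytic))
    frequency = np.gradient(phase) * fs / (2.0 * np.pi)  # d(phase)/dt in Hz
    return amplitude, frequency

# A 50 Hz tone of amplitude 2 sampled at 1 kHz for 1 s: the envelope is
# flat at ~2 and the instantaneous frequency sits at ~50 Hz.
fs = 1000
t = np.arange(fs) / fs
amp, freq = instantaneous_features(2.0 * np.cos(2.0 * np.pi * 50.0 * t), fs)
```

For a genuine IMF, the envelope tracks shock-induced amplitude bursts, which is what makes these functions useful predictors.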

Figure 7. Example of Hilbert-Huang Transform (HHT) analysis. Shocks in the synthetic signal are in yellow, and for illustration purposes, only IMF numbers 1, 5, and 9 are presented, along with the instantaneous amplitude and frequency functions of IMF number 5.

2.3.6. Detection Enhancement Algorithm

The detections made by the selected classifiers are discrete and independent of the predictors' order. In other words, a classification at one time is not affected by and does not affect previous and future classifications. Therefore, the causality of the shock responses is only considered in the predictor computations and not in the classification algorithms. This means that the shock detections can be scattered, rather than continuous. For instance, the shock buried in the signal in Figure 8 between 48 s and 48.5 s can be detected in two distinct segments. These incomplete detections decrease the classifier's accuracy.

An algorithm was specially developed to enhance the continuity of the detections for RVV analysis and shock detection. The detection enhancement algorithm extends the detection sequences to ensure they have at least the same duration as the longest impulse response function (in this specific case, 1.4 s). This is more than the period of the first natural frequency of the two-degree-of-freedom vehicle model used to synthesize the signal.

Figure 8. Typical discontinuous shock detection of a sample signal, grouped into three continuous sequences (A–C).

The algorithm is an iterative process, described in Figure 9. It starts at the detection point with the maximum absolute value of the signal, and then creates a 1.4 s window, which corresponds to approximately 2.5 times the period of the first natural frequency of the vehicle transfer function used in the model synthesis. The starting point of the window is positioned 0.28 s before the absolute maximum. This places the absolute maximum at 20% of the window length, which positions the window at approximately the beginning of the vehicle shock response. The algorithm interprets all the data points in this window as a shock if at least 10% of these data points were classified as shock by the classifier, as shown in Figure 9a,b. In this case, the window points and the continuous detections adjacent to it are considered as a shock segment. This 10% overlap criterion is an arbitrary value which only affects the classifier's operating point threshold (see next section). If the 10% detection overlap criterion is not met, the window is classified as no-shock, and the detection points made by the classifier are voided, as shown in Figure 9b,c; this concludes the first iteration. The algorithm continues this iterative process by finding the next maximum absolute detection acceleration, excluding the points from the segments created in the previous iterations. The windowing process is then performed, and the 10% detection overlap criterion is applied using all the data points, including those from the previous iterations.
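The iterative windowing can be sketched as follows. This is a simplified reconstruction of the procedure described above (the merging of adjacent detections into the accepted segment is omitted); the function name and arguments are hypothetical.

```python
import numpy as np

def enhance_detections(accel, detected, fs, window_s=1.4, pre_frac=0.2, overlap=0.1):
    """Grow scattered shock detections into continuous segments.

    Iteratively window 1.4 s around the strongest remaining detection
    point; keep the whole window as a shock if at least 10% of its
    points were classified as shock, otherwise void those detections."""
    detected = np.asarray(detected, dtype=bool).copy()
    enhanced = np.zeros_like(detected)
    visited = np.zeros_like(detected)
    n, wlen = len(accel), int(window_s * fs)
    while True:
        candidates = np.flatnonzero(detected & ~visited)
        if candidates.size == 0:
            break
        # Detection point with the maximum absolute acceleration.
        peak = candidates[np.argmax(np.abs(np.asarray(accel)[candidates]))]
        start = max(0, peak - int(pre_frac * wlen))  # peak sits at 20% of window
        end = min(n, start + wlen)
        visited[start:end] = True
        if (detected[start:end] | enhanced[start:end]).mean() >= overlap:
            enhanced[start:end] = True               # accept window as shock segment
        else:
            detected[start:end] = False              # void isolated detections
    return enhanced
```

Scattered detections around a strong shock are merged into one continuous 1.4 s segment, while an isolated single-point detection fails the 10% criterion and is voided.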

Figure 9. Detection enhancement of a sample signal: first iteration (a) and second iteration (b) of the extension algorithm, and (c) final result.

3. Classifier Evaluation

Classifier validation is an essential step in machine learning; it assesses the detection accuracy and calibrates the detection sensitivity of the algorithm. Depending on the application, different criteria can be used to evaluate detection performances. The following sections show that classical evaluation methods are not ideally suited to the validation and calibration of the shock detection algorithm, creating the need to develop more specific evaluation methods.

3.1. Classical Detection Assessment

Classifiers base their prediction on a sufficient statistic which is calculated from the predictors. The sufficient statistic is compared to a threshold value to attribute a class to a data point. This can be seen as the probability that a data point belongs to one class over another. For various reasons, the classification is not necessarily made when there is more than a 50% chance that a data point is from a specific class; other probabilities or threshold values are often used. Changing this threshold varies the number of true detections, misdetections, and false detections, which can be assessed in terms of sensitivity and specificity.

In this context, the usual definition of sensitivity is the proportion of shock data points in the signal that are correctly classified as such. However, for the purposes of RVV signal analysis, sensitivity can be assessed using a modified definition. The shocks can be considered as sequences of data points instead of individual data points. Considering this, the definition of sensitivity becomes the number of true detection sequences over the total number of shocks:

Sensitivity = Number of True Detections / Number of Shocks    (7)

where a detection has to overlap at least 75% of the shock duration to be considered true. This 75% overlap is an arbitrary value, ensuring that a significant proportion of the shock is detected.

In contrast to sensitivity, the specificity is the proportion of the signal without shocks that is correctly classified as such. This can be considered as the accuracy of the prediction of the duration of the signal without shocks. The definition of the specificity is not modified, and is considered to be the number of true nondetection data points over the number of data points without transients:

Specificity = Number of Nondetection Data Points / Number of Data Points without Shocks    (8)
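Equations (7) and (8) can be computed directly from boolean shock/detection masks. The following sketch (hypothetical helper names, not the authors' code) implements the sequence-based sensitivity with the 75% overlap rule and the point-based specificity:

```python
import numpy as np

def _runs(mask):
    """Start/end (half-open) indices of each True run in a boolean mask."""
    padded = np.r_[False, np.asarray(mask, dtype=bool), False].astype(int)
    edges = np.diff(padded)
    return list(zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)))

def sensitivity(shock_mask, det_mask, min_overlap=0.75):
    """Eq. (7): fraction of shock sequences covered >= 75% by detections."""
    runs = _runs(shock_mask)
    det = np.asarray(det_mask, dtype=bool)
    hits = sum(det[s:e].mean() >= min_overlap for s, e in runs)
    return hits / len(runs)

def specificity(shock_mask, det_mask):
    """Eq. (8): fraction of non-shock data points left undetected."""
    quiet = ~np.asarray(shock_mask, dtype=bool)
    return (~np.asarray(det_mask, dtype=bool)[quiet]).mean()
```

A shock covered 80% by a detection counts as one true detection, while one covered 50% does not; false detections outside the shocks only degrade the specificity.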


The sensitivity and specificity are specific to the operating point, that is, the threshold value used for the detection. Using a low detection threshold increases the sensitivity to the detriment of the specificity, and vice versa. This relationship is represented by the ROC (Receiver Operating Characteristic) curve, which displays the sensitivity as a function of the fall-out (i.e., the probability of a false detection, which is equal to 1 − specificity). A simple method which can be used to assess and compare classifiers' performance is to calculate the area under the ROC curve (AUC) [42,43]. The ideal classifier has an AUC of 1, and a classifier based on chance has an area of 0.5. This means that a classifier with an AUC above 0.5 has better prediction performance than chance.

3.1.1. Optimal Operation Point

The AUC value is a convenient accuracy measurand because it integrates both the classifier sensitivity and specificity dependence in a single value. However, this simplicity is also a significant shortcoming, because it can lead to false comparisons [44]. The AUC gives the ROC's global performance but does not account for its shape or any local features. For instance, two classifiers can have the same AUC but significantly different ROC shapes, as shown in Figure 10. Classifier A has better sensitivity at low fall-out values, but only reaches a 100% detection rate at almost 100% fall-out. On the other hand, classifier B has poor sensitivity at low fall-out, but surpasses classifier A's sensitivity for any fall-out value over 0.23. Thus, classifier A is better for an application where a low false-detection rate is required, and classifier B is better for an application where a higher detection rate is more important, regardless of false detections.

Figure 10. Receiver Operating Characteristic (ROC) curve of two classifiers with equal area-under-the-ROC-curve (AUC) values (0.80).

This leads to an important concept in detection theory: the optimal operation point (OOP). Classifiers give the probability that a class exists. However, even if the probability is more than 50%, it does not necessarily mean that this class should be detected. One could assume that a 1% fall-out is very good, but not if it is used to detect very unlikely events, as 1% of the samples will be false positives. It is therefore imperative to find the optimal trade-off between the fall-out and the sensitivity for a specific classifier and application. This trade-off is known as the OOP, and can be found on the ROC curve using the Neyman-Pearson criterion.

The Neyman-Pearson criterion finds the OOP without knowing the a priori probability and without attributing any decision cost [45], which is the case for RVV shock detection. This criterion is fairly simple; it determines the OOP by fixing the fall-out to a certain level, and by using the point where the ROC curve crosses this level. Limiting the maximum level of false detection (fall-out) can be seen as fixing the significance level of the detection. For example, for a 10% significance level or fall-out, the operation point on the ROC is where the fall-out equals 0.1, as presented in Figure 11.
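Fixing the fall-out and reading the operating point off the ROC can be sketched as a threshold sweep over the classifier's sufficient statistic (scores). This is an illustrative reconstruction under that assumption, not the authors' code:

```python
import numpy as np

def neyman_pearson_oop(scores, labels, max_fallout=0.10):
    """Lowest threshold whose fall-out stays within the significance
    level; returns (threshold, sensitivity, fallout)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = None
    for t in np.unique(scores)[::-1]:            # sweep thresholds downwards
        pred = scores >= t
        fallout = (pred & ~labels).sum() / (~labels).sum()
        if fallout > max_fallout:
            break                                # fall-out only grows from here
        sens = (pred & labels).sum() / labels.sum()
        best = (float(t), float(sens), float(fallout))
    return best
```

Lowering the threshold monotonically increases the fall-out, so the sweep can stop at the first threshold that violates the significance level; the last admissible point is the OOP.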

Figure 11. ROC curves and optimal operation point (OOP) of the selected classifiers.

3.1.2. Neyman-Pearson Maximum Amplitude Distribution

The ROC curve and OOP coordinates give a good indication of the true and false detection rates of the classifiers; however, they do not provide any insight into the quality of their detections. RVV simulation accuracy depends on the characteristics of each of its modes. As previously explained, once the shocks are detected in a RVV signal, they can be extracted independently. The other modes can then be characterized separately in the remaining signal without their parameters being affected by the shocks.

The main parameter used to characterize the shocks was the maximum absolute amplitude, which defines the severity of the shocks. The classifiers' detection quality can be assessed by comparing the maximum absolute acceleration distributions of the real shocks and the detections (true and false detections) using the OOP defined by the Neyman-Pearson criterion (Figure 12). These distributions represent the maximum absolute acceleration values of each shock superimposed on the signal (real distribution) and of each shock detection (detection distribution). Comparing the real and the detected shock-amplitude distributions gives more insight into the classifiers' detection quality than the sensitivity and fall-out values, because it indicates which shock intensities are more likely to be misclassified. The RMS value of the signal is shown in Figure 12 to give an indication of relative shock intensities.

Figure 12. acceleration distributions for for all the classifiers usingusing the OOP, Figure 12. Maximum Maximumabsolute absolute acceleration distributions all selected the selected classifiers the defined by the criteria; σ50%σ, and σ95% are,are, respectively, the median and 95th OOP, defined by Neyman-Pearson the Neyman-Pearson criteria; , and σ respectively, the median and 95th 50% 95% percentile root root mean mean square square (RMS) (RMS) value value of of the the stationary stationary Gaussian Gaussian random random segments segments comprised comprised in in percentile the signal. the signal.

The significance of the shock amplitude is relative to the intensity of the signal. As presented in Figure 13, shocks with a maximum absolute amplitude inferior to four times the RMS value (standard deviation, σ, for zero-mean signals) of the underlying random Gaussian signal are barely noticeable and have little effect on RVV. As RVV are nonstationary signals, the intensity cannot be assessed with a single RMS value, but can be assessed with the distribution of the RMS values of the stationary Gaussian random segments which compose the signal. In order to visualize the relative importance of the shocks buried in the RVV signal, the median, σ50%, and the 95th percentile, σ95%, RMS values and four times their respective values, 4σ50% and 4σ95%, are presented on the shock amplitude distribution and misdetection/over-detection plots.

Figure 13. Maximum absolute significance for shock, relative to the underlying random Gaussian signal RMS value (σ).

3.2. Pseudo-Energy Ratio/Fall-Out (PERFO) Curve

A comparison of the distribution of the maximum absolute accelerations of the real shocks and detections reveals much detail on the classifiers' accuracy and detection quality. However, this is not a convenient approach, as the distribution comparison is qualitative rather than quantitative. To improve this, the detections' maximum absolute acceleration correctness can be reduced to a single number. Since the importance of detecting shocks increases with their amplitude, the classifiers' accuracy and quality can be assessed as the ratio between the detections' and the shocks' maximum absolute acceleration pseudo-energy, calculated as:

R = Σ [Detections' Max Absolute Acceleration]² / Σ [Shocks' Max Absolute Acceleration]²    (9)
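Equation (9), together with the corresponding fall-out for one operating point, can be computed from the acceleration signal and boolean shock/detection masks. The segment-peak extraction mirrors the description above; the function names are hypothetical:

```python
import numpy as np

def perfo_point(accel, shock_mask, det_mask):
    """Pseudo-energy ratio (Eq. 9) and fall-out for one detection threshold."""
    accel = np.asarray(accel, dtype=float)

    def segment_peaks(mask):
        # Maximum absolute acceleration of each continuous True segment.
        padded = np.r_[False, np.asarray(mask, dtype=bool), False].astype(int)
        starts = np.flatnonzero(np.diff(padded) == 1)
        ends = np.flatnonzero(np.diff(padded) == -1)
        return np.array([np.abs(accel[s:e]).max() for s, e in zip(starts, ends)])

    shock_mask = np.asarray(shock_mask, dtype=bool)
    det_mask = np.asarray(det_mask, dtype=bool)
    ratio = (segment_peaks(det_mask) ** 2).sum() / (segment_peaks(shock_mask) ** 2).sum()
    fallout = (det_mask & ~shock_mask).sum() / (~shock_mask).sum()
    return float(ratio), float(fallout)
```

Sweeping the detection threshold and evaluating this pair at each step traces out the PERFO curve.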

As for sensitivity, the pseudo-energy ratio depends on the classifier's detection threshold, which directly affects the fall-out level. The relationship between the pseudo-energy ratio and the fall-out is presented in Figure 14. Contrary to the ROC curve, the Pseudo-Energy Ratio/Fall-Out (PERFO) curve does not end at the coordinate (1, 1) and is not monotonic. This is because reducing the detection threshold increases the length of the signal segments considered as shocks. As the segments become longer, they may include more than one shock. However, only the maximum absolute acceleration of each segment is considered in the PERFO calculation, such that only the shock with the highest amplitude is assessed. Therefore, the larger the detection segments become, the more shocks are potentially left out of this analysis, which explains why the pseudo-energy ratio drops after a certain point. It can be seen in Figure 14 that this drop appears at fall-out values between 0.3 and 0.5, depending on the classifier. As the classifiers operate under this fall-out level, it does not affect the current analysis.


Figure 14. Pseudo-Energy Ratio/Fall-Out (PERFO) curve for the selected classifiers.

The detection quality can be assessed by observing where the PERFO curve has a pseudo-energy ratio of one, which represents the operation point where the pseudo-energy of the detections and of the actual shocks in the signal are equal. Since the pseudo-energy is calculated from the maximum squared absolute amplitude, the lower the fall-out value at this point, the fewer high-amplitude false detections are present in the classification. Hence, the PERFO curve measures how accurately the correct amount of pseudo-shock energy is detected in a signal.

PERFO Maximum Amplitude Distribution

The PERFO criterion is designed to give a more accurate maximum amplitude detection distribution than the "classical" OOP criteria. This improvement can be observed in Figure 15, where the detection distribution of each classifier follows the real shock distribution more closely when compared to the results shown in Figure 12.

Figure 15. Maximum absolute acceleration distributions for all the selected classifiers, using the PERFO criterion; σ50% and σ95% are, respectively, the median and 95th percentile RMS values of the stationary Gaussian random segments comprised in the signal.
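Locating the PERFO operating point, i.e., the fall-out at which the pseudo-energy ratio first reaches one, can be done by linear interpolation along the sampled curve. A minimal sketch with a hypothetical helper name, assuming the sampled ratio is still rising through the crossing:

```python
import numpy as np

def perfo_oop_fallout(fallouts, ratios):
    """Fall-out where the PERFO curve first crosses a pseudo-energy
    ratio of one (points ordered by decreasing detection threshold)."""
    f = np.asarray(fallouts, dtype=float)
    r = np.asarray(ratios, dtype=float)
    i = int(np.argmax(r >= 1.0))               # first index at or past the crossing
    if i == 0:
        return float(f[0])
    w = (1.0 - r[i - 1]) / (r[i] - r[i - 1])   # linear interpolation weight
    return float(f[i - 1] + w * (f[i] - f[i - 1]))
```

A low value returned here indicates that the classifier reaches the correct pseudo-shock energy with few false detections.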

3.3. Comparison of Evaluation Measurands

Three evaluation measurands were presented to validate and calibrate the RVV shock classifiers. The AUC gives a global appreciation of the classifiers' performances; the greater the AUC, the more predictive power the classifier has, and a value of 0.5 is equivalent to guessing the signal class. The limitation of this measurand is its independence from the detection threshold. To overcome this, the position of the OOP on the ROC curve can be used as an evaluation measurand. The classifiers' sensitivity at 10% fall-out (Neyman-Pearson criterion) gives a more specific evaluation for the application. The quality of the detection can also be used to refine the position of the OOP, using the PERFO curve. The advantage of the PERFO criterion is that it ensures the detection has the same peak-value pseudo-energy as the shocks present in the signal. A low fall-out value at a pseudo-energy ratio of one on the PERFO curve indicates that there was only a small number of false detections. The results of each evaluation measurand are given in Table 1.

Table 1. Comparison of classifier evaluation measurands: the AUC; the resulting Neyman-Pearson OOP sensitivity for a 10% fall-out; and the resulting PERFO criterion OOP fall-out.

Classifier         AUC    Neyman-Pearson    PERFO
Decision Tree      0.81   0.45              0.08
100NN              0.85   0.61              0.03
Bagged Ensemble    0.86   0.65              0.04
SVM                0.82   0.54              0.07

According to the AUC and the Neyman-Pearson criterion, the Bagged Ensemble is a more accurate classifier, followed by the 100NN. The PERFO criterion, however, ranked the 100NN first, and the Bagged Ensemble second. The ranking of the SVM and Decision Tree remained third and fourth, respectively, with all the measurands. For shock-detection applications, the PERFO ranking is more accurate as it defines a more relevant OOP. This can be observed by comparing the maximum amplitude distributions for detection made at both OOPs (Figure 12 for Neyman-Pearson, and Figure 15 for PERFO). The shape of the distributions makes the analysis of the high-acceleration region difficult, such as where it matters the most. This is because the number of high-amplitude shocks is too small in comparison to the total amount of shocks, and the discrepancies within the classifiers’ distribution cannot be easily seen in that region. To facilitate the comparison, the error between the detection and the real distributions were presented on the misdetection/over-detection graphs (Figure 16). These graphs show the average difference between the number of detections and the real shocks distributed into five bins. When a bin has more detections than real shocks, the over-detection rate is calculated by dividing the difference by the number of detections. On the other hand, when a bin has more shocks than detections, the misdetection rate is calculated by dividing the difference by the number of real shocks. Lower values for both the over-detection and misdetection rate indicate better classification performance. As seen in Figure 16, the Neyman-Pearson OOP over-detected below 4σ95% (≈19 m/ss ) for all the classifiers. Above 4σ95% , this OOP made for perfect detection (0 over-detection and misdetection rates) for the 100NN and the Bagged Ensemble algorithms. The Decision Tree and SVM algorithms misdetected 40% and 30% of the shocks, respectively. 
For most of the classifiers, the PERFO criterion defined a better OOP. For low-amplitude shocks (below ≈2σ95%, or 10 m/s²), the classifiers misdetected shocks rather than over-detecting them as they did with the Neyman-Pearson criterion. Above this amplitude, most of the PERFO distribution errors improved, and the Decision Tree, 100NN, and Bagged Ensemble had lower misdetection and over-detection rates. Only the SVM detected fewer shocks (at both low and high amplitude) with the PERFO criterion than with the Neyman-Pearson criterion.
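The per-bin over-detection and misdetection rates described above can be sketched directly from their definitions (the function name and the example counts are illustrative; the five-bin layout follows Figure 16):

```python
import numpy as np

def detection_error_rates(real_counts, detected_counts):
    """Per-bin rates as defined in the text: surplus detections are
    divided by the number of detections (over-detection); missing
    detections are divided by the number of real shocks (misdetection)."""
    real = np.asarray(real_counts, dtype=float)
    det = np.asarray(detected_counts, dtype=float)
    over = np.zeros_like(real)
    mis = np.zeros_like(real)
    surplus = det > real                 # more detections than real shocks
    over[surplus] = (det[surplus] - real[surplus]) / det[surplus]
    deficit = det < real                 # more real shocks than detections
    mis[deficit] = (real[deficit] - det[deficit]) / real[deficit]
    return over, mis
```

A bin with matching counts contributes zero to both rates, so a perfectly calibrated classifier yields all-zero graphs.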


Figure 16. Misdetection/over-detection graph based on the classifiers’ distribution for both OOPs. σ50% and σ95% are, respectively, the median and 95th percentile RMS value of the stationary Gaussian random segments composing the signal.

4. Summary of Discussion

The Neyman-Pearson criterion effectively defines classifiers’ OOP when all the detections have the same importance. However, for applications such as the detection of shocks buried in RVV signals, especially for protective packaging optimization, not all shocks have the same importance, as high-amplitude shocks create more damage to freight. For this application, the PERFO criterion gave a more appropriate OOP, ensuring that the detection amplitude distribution followed the shock amplitude distribution more closely. For the four selected classifiers, the PERFO criterion reduced the over-detection rate, especially for low-amplitude shocks, without compromising the high-amplitude detection (except for the SVM, which showed a high misdetection rate with the PERFO criterion). For protective packaging optimization purposes, this means that a simulation performed with the PERFO detection would have a more accurate relative proportion of low- and high-amplitude shocks.
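The amplitude weighting behind the PERFO criterion can be sketched as follows. This is an assumption based on the name pseudo-energy (each shock weighted by its squared peak amplitude), not the paper's exact definition, with the exponent exposed as a parameter so the square can be replaced to modulate the emphasis on high-amplitude detections:

```python
import numpy as np

def pseudo_energy_ratio(amplitudes, detected, p=2):
    """Fraction of total shock pseudo-energy captured by the detections.
    Each shock contributes |amplitude|**p; p = 2 gives the squared,
    energy-like weighting, while larger p favors high-amplitude shocks."""
    w = np.abs(np.asarray(amplitudes, dtype=float)) ** p
    mask = np.asarray(detected, dtype=bool)
    return w[mask].sum() / w.sum()
```

Under this weighting, missing one large shock costs far more than missing several small ones, which is the behavior the PERFO curve is designed to reward.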
Considering that significant shocks have maximum amplitude above 4σ95%, the Bagged Ensemble classification algorithm seems best suited for the application, as it has the best performance above this value (Figure 16). This conclusion should, however, be validated with real vehicle data.

The classification algorithms are based on many parameters, such as the number of tests for the Decision Tree or the type of distance and number of nearest neighbors for the kNN. Classifiers’ parameters can be optimized using the PERFO OOP fall-out. An optimization based on this single value ensures that classifiers provide their best shock-detection performance.

The PERFO curve and criterion can also be adapted to other classification applications where not all detections/misdetections have the same importance, e.g., structural damage detection on a structure, or economic fraud detection. Depending on the application, the square exponent in the pseudo-energy calculation can be replaced by another exponent, which will modulate the importance of high-value detection.

5. Conclusions

This paper has shown that machine learning classification algorithms can be used to detect shocks present in RVV signals. The performance of different algorithms (Decision Tree, 100NN, Bagged Ensemble, and SVM) was assessed using various methods. Considering that high-amplitude shocks are more important, the PERFO curve and the resulting OOP were shown to be better performance indicators than classic assessment tools, such as the area under the ROC curve (AUC) and the Neyman-Pearson OOP sensitivity, because they give a better appreciation of the detection quality.

Vibration 2018, 1

236

Author Contributions: Conceptualization, J.L. and V.R.; methodology, J.L.; software, J.L.; validation, J.L.; formal analysis, J.L. and V.R.; writing—original draft preparation, J.L.; writing—review and editing, V.R.; visualization, J.L.; supervision, V.R.; project administration, V.R.

Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflicts of Interest: The authors declare no conflict of interest.
