H. Sarin, M. Kokkolaras, G. Hulbert, P. Papalambros
Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109-1316

S. Barbat, R.-J. Yang
Passive Safety, Research and Advanced Engineering, Ford Motor Company, Highland Park, MI 48203-3177
Comparing Time Histories for Validation of Simulation Models: Error Measures and Metrics

Computer modeling and simulation are the cornerstones of product design and development in the automotive industry. Computer-aided engineering tools have improved to the extent that virtual testing may lead to significant reduction in prototype building and testing of vehicle designs. In order to make this a reality, we need to assess our confidence in the predictive capabilities of simulation models. As a first step in this direction, this paper deals with developing measures and a metric to compare time histories obtained from simulation model outputs and experimental tests. The focus of the work is on vehicle safety applications. We restrict attention to quantifying discrepancy between time histories as the latter constitute the predominant form of responses of interest in vehicle safety considerations. First, we evaluate popular measures used to quantify discrepancy between time histories in fields such as statistics, computational mechanics, signal processing, and data mining. Three independent error measures are proposed for vehicle safety applications, associated with three physically meaningful characteristics (phase, magnitude, and slope), which utilize norms, cross-correlation measures, and algorithms such as dynamic time warping to quantify discrepancies. A combined use of these three measures can serve as a metric that encapsulates the important aspects of time history comparison. It is also shown how these measures can be used in conjunction with ratings from subject matter experts to build regression-based validation metrics. [DOI: 10.1115/1.4002478]
1 Introduction
Automotive manufacturers have to meet several vehicle safety regulations and mandatory Federal Motor Vehicle Safety Standards (FMVSS). Additionally, consumer information programs such as the new car assessment program (NCAP) and the Insurance Institute for Highway Safety (IIHS) impose further requirements on vehicle safety. Currently, assessment of whether these requirements are satisfied is conducted through numerous costly and time-consuming physical experiments. Computer modeling and simulation-based methods for virtual vehicle safety analysis and design verification could make this process more time- and cost-efficient. Moreover, virtual testing (VT) can improve real-world vehicle safety beyond regulatory requirements, since computer predictions can be used to extend the range of protection to real-world crash conditions at speeds and configurations not addressed by current regulations.

To achieve the promises of VT, computer predictions need verification and validation (V&V), so that the designs obtained using simulation models can be cleared for production with minimal or reduced physical prototype testing. The American Institute of Aeronautics and Astronautics guide for verification and validation of computational fluid dynamics simulations defines verification and validation as follows [1]: "Verification is the process of determining that a model implementation accurately represents the developer's conceptual description of the model and the solution to the model." "Validation is the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model."
The American Society of Mechanical Engineers Standards Committee on verification and validation in computational solid mechanics describes model validation as a two-step process [2]:

1. quantitatively comparing the computational and experimental results for the response of interest
2. determining whether there is acceptable agreement between the model and the experiment for the intended use of the model

Oberkampf and Barone proposed in Ref. [3] six properties that a validation metric should satisfy. These six properties form a generic guideline and act as a set of requirements for the development of a new validation metric. Their third property dictates that an effective metric for measuring the discrepancy between simulation model responses represented by time histories is necessary to accomplish the first step of the validation process.

In this paper, we review existing error measures and metrics and discuss their advantages and limitations. We then propose a combination of measures associated with three physically meaningful error characteristics: phase, magnitude, and slope. The proposed approach utilizes measures such as cross-correlation and the L1 norm and algorithms such as dynamic time warping (DTW) to quantify the discrepancy between time histories. We then show how these measures can be used to build regression-based validation metrics in cases where subject matter expert data are available.

It is important to note that four of the remaining five properties advocated by Oberkampf and Barone [3] for useful validation metrics involve the uncertainties related to numerical error, experimental error, experiment postprocessing, and the number of experiments conducted. While these are critical issues, they are not considered in this paper, as the goal of the present work is to establish an appropriate set of error measures for vehicle safety applications and to assess combinations of these measures into an error metric. With an established set of error measures, the next step toward a fully developed validation metric is to use the error measures to provide the quantitative values for assessment under uncertainty. For example, the error measures proposed in this paper could be used in the Bayesian framework proposed by Rebba and Mahadevan [4].
2 Review of Error Measures, Metrics, and Algorithms

In this section, we review popular measures, metrics, and algorithms currently used to quantify discrepancies between time histories in various fields such as voice, signature, or pattern recognition, computational mechanics, data mining, and operations research. Of particular emphasis are their advantages and disadvantages, which help identify a set of measures, metrics, and algorithms best suited for vehicle safety applications. We provide references only for the less commonly used metrics.

In this paper, a distinction is made between error measures and error metrics. An error measure provides a quantitative value associated with differences in a particular feature of time series. An error metric provides an overall quantitative value of the discrepancy between time series; it can be a single error measure or a combination of error measures. Typically, a single error measure does not provide a complete enough perspective of time series differences to be used reliably as an error metric.

We consider a simple example comprising time histories of the same physical measure obtained from three different tests. Time histories "test 2" and "test 3" are compared with time history "test 1" to determine which one has the smallest discrepancy and is thus the "better prediction" of test 1 (Fig. 1). The reader should not be biased as to which of the time histories (test 2 or test 3) is closer to time history test 1. These time histories are used to demonstrate that there is a need for objective metric(s) and that existing measures must be used appropriately.

Fig. 1 Time history examples

2.1 Vector Norms. When time histories are discretized (i.e., finite-dimensional), the most popular measure for quantifying their difference is a vector norm. Assuming two time history vectors A and B of equal size N, the L_p norm of their difference is

\| A - B \|_p = \left( \sum_{i=1}^{N} |a_i - b_i|^p \right)^{1/p} \quad (1)

The three most popular norms are L1, L2 (Euclidean), and L∞. The results obtained when using these three norms to measure the discrepancy between test 1 and test 2 and between test 1 and test 3 are presented in Table 1 and confirm the known fact that the choice of norm may lead to different conclusions: one would conclude that test 2 is "closer" to test 1 when using the L1 and L∞ norms, while the use of the L2 norm would lead to the conclusion that test 3 is, in fact, closer to test 1. The major limitation of norms (and the reason for the illustrated differences) is that they cannot distinguish error due to phase from error due to magnitude. Even with this limitation, norms form the foundation for quantifying discrepancy between time histories.

Table 1 Results for the L1, L2, and L∞ norms

Norm   Test 1 and test 2   Test 1 and test 3
L1     0.3                 0.45
L2     0.6                 0.58
L∞     0.82                0.85

2.2 Average Residual and Its Standard Deviation. The average residual measures the mean difference between two time histories:

\bar{R} = \frac{1}{N} \sum_{i=1}^{N} (a_i - b_i) \quad (2)

A distinct disadvantage is that positive and negative differences at various points may cancel each other out. The standard deviation of the residuals is defined as the square root of their sample variance:

S_{N-1} = \sqrt{ \frac{ \sum_{i=1}^{N} (R_i - \bar{R})^2 }{ N - 1 } } \quad (3)

where R_i = (a_i - b_i). The results for the time history examples shown in Fig. 1 are presented in Table 2. They cannot lead to conclusive statements regarding which test (2 or 3) is closer to test 1, as the average residual and its standard deviation are conflicting.

Table 2 Results for average residual and its standard deviation

Measure   Test 1 and test 2   Test 1 and test 3
R̄         0.8                 3.8
S_{N−1}   7.7                 6.4
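To make Eqs. (1)-(3) concrete, the following minimal sketch computes the three norms and the residual statistics for two equally sampled time histories. The NumPy-based implementation and the synthetic signals are illustrative assumptions, not the paper's code or data.

```python
import numpy as np

def lp_norm_error(a, b, p=2):
    """Eq. (1): L_p norm of the difference of two sampled time histories."""
    d = np.abs(a - b)
    if np.isinf(p):
        return float(d.max())                # L-infinity: largest pointwise gap
    return float((d ** p).sum() ** (1.0 / p))

def residual_stats(a, b):
    """Eqs. (2)-(3): average residual and its sample standard deviation."""
    r = a - b                                # residuals R_i = a_i - b_i
    return float(r.mean()), float(r.std(ddof=1))

# Illustrative usage with synthetic signals (not the paper's test data).
t = np.linspace(0.0, 0.1, 500)
test1 = 100.0 * np.sin(2 * np.pi * 50 * t) * np.exp(-20 * t)
test2 = 95.0 * np.sin(2 * np.pi * 50 * (t - 0.002)) * np.exp(-20 * t)
for p in (1, 2, np.inf):
    print(f"L{p} error: {lp_norm_error(test1, test2, p):.3f}")
print("average residual and std:", residual_stats(test1, test2))
```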
2.3 Coefficient of Correlation and Cross-Correlation. The coefficient of correlation is a measure that indicates the extent of linear relationship between two time histories, i.e., to what extent A can be represented as mB + c. The coefficient of correlation can range from −1 to +1. The value of +1 represents a perfect positive linear relationship between the time histories, which implies that they are identical in shape. A value of −1 would indicate a perfect negative linear relationship, i.e., that the two time histories are mirror images of each other. The coefficient of correlation is computed as

\rho = \frac{ \sum_{i=1}^{N} (a_i - \bar{a})(b_i - \bar{b}) }{ \sqrt{ \sum_{i=1}^{N} (a_i - \bar{a})^2 \, \sum_{i=1}^{N} (b_i - \bar{b})^2 } } \quad (4)

The square of the coefficient of correlation is called the coefficient of determination and is commonly known as R-square. The results of applying this measure to the previous time history examples are presented in Table 3 and indicate that test 3 is better correlated with test 1 than is test 2. However, the R-square values for tests 2 and 3 are very low, and hence neither seems to be close to test 1. This is mainly because these measures are sensitive to phase difference and cannot distinguish between error due to phase and error due to magnitude.

Table 3 Results for coefficient of correlation and R-square

Measure    Test 1 and test 2   Test 1 and test 3
ρ          0.5                 0.6
R-square   0.25                0.36

A modification to the concept of coefficient of correlation used in signal processing is called cross-correlation. It is sometimes called the sliding dot product and has applications in the fields of pattern recognition and cryptanalysis. It can be used to measure the phase lag between two time histories. Cross-correlation is a series defined as

\rho(n) = \frac{ (N-n) \sum_{i=1}^{N-n} a_i b_{i+n} - \sum_{i=1}^{N-n} a_i \sum_{i=1}^{N-n} b_{i+n} }{ \sqrt{ (N-n) \sum_{i=1}^{N-n} a_i^2 - \left( \sum_{i=1}^{N-n} a_i \right)^2 } \, \sqrt{ (N-n) \sum_{i=1}^{N-n} b_{i+n}^2 - \left( \sum_{i=1}^{N-n} b_{i+n} \right)^2 } } \quad (5)

where n = 0, 1, ..., N − 1. To compute the phase difference between two time histories, we determine the shift n* at which ρ(n) attains its maximum; n* is then a measure for phase lag. This concept has been used by Liu et al. [5] and Gu and Yang [6] and is also included as a metric in ADVISER, a commercial software package that contains a simulation model quality rating module [7,8] for vehicle safety applications.
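A minimal sketch of Eq. (5) and the search for n*, assuming zero-based NumPy arrays of equal length. Searching only one shift direction and capping the shift at half the record length are illustrative simplifications, made here to keep the overlapping segments long enough for a stable estimate.

```python
import numpy as np

def cross_correlation(a, b, n):
    """Eq. (5): correlation coefficient between a and b after shifting b by n steps."""
    m = len(a) - n                           # length of the overlapping segments
    x, y = a[:m], b[n:n + m]
    num = m * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt(m * (x * x).sum() - x.sum() ** 2) * \
          np.sqrt(m * (y * y).sum() - y.sum() ** 2)
    return num / den

def phase_lag(a, b, max_shift=None):
    """Shift n* at which the cross-correlation series attains its maximum."""
    if max_shift is None:
        max_shift = len(a) // 2              # keep the overlap reasonably long
    return max(range(max_shift), key=lambda n: cross_correlation(a, b, n))
```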
2.4 Sprague and Geers (S&G) Metric. Geers [9] proposed an error measure for comparing time histories that combines the errors due to magnitude and phase differences. Recently, Sprague and Geers updated the phase error portion of the metric [10,11]. The errors in magnitude and phase are computed from the time histories using Eqs. (6) and (7), respectively. The combined error C_{S&G}, Eq. (8), is then used to provide an overall error measure between the two time histories:

M_{S\&G} = \sqrt{ \frac{AA}{BB} } - 1 \quad (6)

P_{S\&G} = \frac{1}{\pi} \cos^{-1} \left( \frac{AB}{\sqrt{AA \cdot BB}} \right) \quad (7)

C_{S\&G} = \sqrt{ M_{S\&G}^2 + P_{S\&G}^2 } \quad (8)

where

AA = \frac{1}{N} \sum_{i=1}^{N} a_i^2, \quad BB = \frac{1}{N} \sum_{i=1}^{N} b_i^2, \quad AB = \frac{1}{N} \sum_{i=1}^{N} a_i b_i

The results of applying the S&G metric to the time history examples are presented in Table 4. The S&G metric quantifies a lower magnitude error for test 2 and a lower phase error for test 3. The combined error is lower for test 2, indicating that test 2 is closer to test 1 than test 3. A limitation of the S&G metric is that it is not symmetric: the results depend on the time history that is used as a reference in Eq. (6). The separation of the error into magnitude and phase components is an advantage when a more detailed investigation of the error sources is necessary. However, the metric lumps the entire information content of the time histories into AA, BB, and AB. Consequently, it cannot account for the shape of the time histories. This limitation is illustrated by the example in Fig. 2: the two simple time histories have the same values of AA and BB but differ from each other in magnitude, phase, and shape. Even though there exists an error in magnitude, the S&G metric quantifies it as zero.

Table 4 Results for S&G metric

Measure   Test 1 and test 2   Test 2 and test 1   Test 1 and test 3   Test 3 and test 1
M_S&G     0.08                −0.08               0.67                −0.40
P_S&G     0.20                0.20                0.17                0.17
C_S&G     0.22                0.22                0.70                0.44

Fig. 2 Failure of S&G metric to quantify error due to magnitude
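A sketch of Eqs. (6)-(8), with b taken as the reference history in the magnitude term; the function name is ours.

```python
import numpy as np

def sprague_geers(a, b):
    """Eqs. (6)-(8): S&G magnitude, phase, and combined errors.
    Here b serves as the reference, so swapping the arguments flips the
    sign of M but leaves P unchanged (the asymmetry visible in Table 4)."""
    N = len(a)
    AA = (a * a).sum() / N
    BB = (b * b).sum() / N
    AB = (a * b).sum() / N
    M = np.sqrt(AA / BB) - 1.0                     # Eq. (6): magnitude error
    P = np.arccos(AB / np.sqrt(AA * BB)) / np.pi   # Eq. (7): phase error
    return M, P, np.hypot(M, P)                    # Eq. (8): combined error
```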
2.5 Russell's Error Measure. Russell [12,13] proposed a set of magnitude, phase, and comprehensive error measures to provide a robust means of quantifying the difference between time histories. The metric is similar to the S&G metric, with a modification in the magnitude error factor. The magnitude error factor is defined such that it has approximately the same scale as the phase error when there exists an order-of-magnitude difference in the amplitudes of the responses. The measures are then combined to form a comprehensive error factor, similar to the S&G metric. The magnitude error factor is given by

M_R = \operatorname{sign}(AA - BB) \, \log_{10} \left( 1 + \left| \frac{AA - BB}{\sqrt{AA \cdot BB}} \right| \right) \quad (9)

The results of applying the Russell metric to the time history examples are presented in Table 5. Even though Russell's error measure overcomes the asymmetry observed in the S&G metric, it still fails to identify and quantify the magnitude error of the example shown in Fig. 2 (i.e., the magnitude error between these two time histories is still computed as equal to zero).

Table 5 Results for Russell metric

Measure   Test 1 and test 2   Test 1 and test 3
M_R       0.064               0.32
P_R       0.20                0.17
C_R       0.21                0.36

2.6 Normalized Integral Square Error (NISE). The NISE is used to quantify the difference between time histories from repeated tests, e.g., see Ref. [14]. It measures the difference between two time histories and is related in principle to the concept of cross-correlation. It considers three aspects: phase shift, amplitude (magnitude) difference, and shape difference. It uses the cross-correlation principle from Sec. 2.3 to compute n*. It then shifts one of the time histories (A or B) relative to the other by n* "steps" to compensate for the error in phase. The quantity AB(n*) is computed after this adjustment. The phase, magnitude, and shape errors are given by Eqs. (10)-(12), respectively:

P_{NISE} = \frac{2 AB(n^*) - 2 AB}{AA + BB} \quad (10)

M_{NISE} = \rho(n^*) - \frac{2 AB(n^*)}{AA + BB} \quad (11)

S_{NISE} = 1 - \rho(n^*) \quad (12)

The overall NISE for two time histories is given by

C_{NISE} = P_{NISE} + M_{NISE} + S_{NISE} = 1 - \frac{2 AB}{AA + BB} \quad (13)

The results of applying the NISE metric to the time history examples are presented in Table 6. Even though NISE attempts to consider shape error, the overall measure C_NISE is independent of ρ(n*), as this term cancels out; hence, it does not account for shape error. An interesting observation is that the magnitude error contribution to the NISE error can be negative, i.e., the magnitude error can decrease the overall combined error.

Table 6 Results for NISE metric

Measure   Test 1 and test 2   Test 1 and test 3
P_NISE    0.18                0.09
M_NISE    −0.045              0.014
S_NISE    0.06                0.15
C_NISE    0.20                0.25
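For comparison, a sketch of Russell's factors: the magnitude term follows Eq. (9), while the phase term and the root-sum-square combination are assumptions read off the identical phase columns of Tables 4 and 5 and the S&G-style combination of Eq. (8).

```python
import numpy as np

def russell(a, b):
    """Eq. (9) plus an assumed S&G-style phase term and combination."""
    N = len(a)
    AA = (a * a).sum() / N
    BB = (b * b).sum() / N
    AB = (a * b).sum() / N
    m = (AA - BB) / np.sqrt(AA * BB)         # relative magnitude discrepancy
    M = np.sign(m) * np.log10(1.0 + abs(m))  # Eq. (9): compressed magnitude error
    P = np.arccos(AB / np.sqrt(AA * BB)) / np.pi
    return M, P, np.hypot(M, P)
```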
2.7 Dynamic Time Warping (DTW). DTW is an algorithm for measuring discrepancy between time histories and was first used in the context of speech recognition in the 1960s [15]. Since then, it has been used in a variety of applications: computer vision (e.g., Ref. [16]), data mining (e.g., Ref. [17]), signature matching (e.g., Ref. [18]), and polygonal shape matching (e.g., Ref. [19]). The ability of DTW to identify two time histories with time shifts as a "match" makes it an important similarity identification technique in speech recognition [20], since human speech consists of varying durations and paces. The time warping technique aligns peaks and valleys as much as possible by expanding and compressing the time axis according to a given cost (distance) function [21].

As an example, consider the cost function d[i, j] = (a_i − b_j)², in which a_i is the ith element of time history A (a_i = A(t_i)), b_j is the jth element of time history B (b_j = B(t_j)), and i, j = 1, 2, ..., N, where N is the total number of time samples (the lengths of A and B are assumed to be equal in this case). Let w_k = (i_k, j_k) denote the indices of an ordered pair of time samples from A and B. The DTW algorithm then finds a monotonically increasing sequence of ordered adjacent pairs such that the cumulative cost function (the sum of the cost functions over k = 1, 2, ..., N) is minimized. That is, a sequence ⟨w_1, w_2, ..., w_N⟩ is found that minimizes the cumulative cost subject to the constraints that (i) the sequence progresses one step at a time (0 ≤ i_k − i_{k−1} ≤ 1 and 0 ≤ j_k − j_{k−1} ≤ 1, k = 2, 3, ..., N) and (ii) the sequence is monotonically increasing (w_{k−1} · w_k > 0, k = 2, 3, ..., N).

The results of the DTW algorithm for the time history examples are shown in Figs. 3 and 4. The DTW distance (the square root of the cumulative cost function) for test 2 is 768, while the DTW distance for test 3 is 5636. Consequently, test 2 is deemed a closer representation of test 1 than test 3 with respect to the DTW distance.

Fig. 3 DTW results for time histories 1 and 2

Fig. 4 DTW results for time histories 1 and 3
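The following is a minimal dynamic-programming sketch of DTW with the squared-difference cost d[i, j] = (a_i − b_j)², returning the distance quoted in the text (the square root of the minimal cumulative cost). It is the textbook O(N²) recursion, not necessarily the paper's implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance under the cost d[i, j] = (a_i - b_j)^2 (Sec. 2.7)."""
    N, M = len(a), len(b)
    D = np.full((N + 1, M + 1), np.inf)      # cumulative cost table
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # monotone unit steps: match, or advance in only one history
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return np.sqrt(D[N, M])
```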
3 Proposed Error Measures
Several measures used to quantify discrepancy (or error) between time histories were discussed in the previous section. Each has its own advantages and limitations. The concepts of magnitude and phase measures were introduced, and different approaches to measuring and combining these measures into a single metric were articulated. In the signal processing literature, a third signal measure is given by frequency. That is, for a simple harmonic signal, the time history can be described by

y(t_i) = Y \cos(\omega t_i + \phi) \quad (14)

in which Y is the amplitude, ω is the frequency, φ is the phase, and t_i is the value of time at time index i. The difficulty in quantifying the error associated with the features of phase, magnitude, and frequency separately is that they can be strongly coupled. For example, when quantifying the error associated with magnitude, the presence of a phase difference between the time histories may result in a misleading measurement. Thus, it is important to minimize the influence of the other two features when quantifying the error due to the third one. While this can be accomplished using standard signal processing techniques such as fast Fourier transforms (FFTs), transformation to the frequency domain is less useful for signals with richer content than pure harmonics, such as vehicle safety-related time histories.
In this section, we propose measures to quantify magnitude and phase error. To minimize the influence of phase on the magnitude error, we employ the DTW algorithm and a suitable error function. To capture the complex behavior of frequency content, we introduce a slope measure, which captures local frequency discrepancies.

3.1 Phase Error. To quantify the error due to phase, we considered the phase measure used by Sprague and Geers and by Russell in their metrics (Eq. (7)) and the cross-correlation technique presented in Sec. 2.3. The cross-correlation-based method for quantifying error in phase was used in Ref. [22], shifting one of the time histories to maximize the correlation coefficient. This shift is taken as the measure of error in phase. We compared the performance of the cross-correlation method with that of the S&G phase error and concluded that the cross-correlation method has greater sensitivity to phase differences. An example illustrating this is presented in Fig. 5. There is clearly a much larger phase difference between the computer-aided engineering "(CAE)-1" and "test" time histories than between the "CAE-2" and test time histories (note that the time history examples in this section are not related to the ones in the previous section). The S&G phase error quantification was identical for both cases, while the cross-correlation quantification provided different values. Thus, we use the cross-correlation technique to quantify phase error in our metric.

Fig. 5 Example to compare S&G phase measure to cross-correlation

The number of shifted time steps, n*, is a linear measure of phase error. In practical applications, small time step differences should be viewed as local, rather than global, phase error. Consequently, small time step differences should not be weighted as heavily as large time step differences in the total phase error measure. We chose a penalty function that can be tuned to suit a particular application:

\mathrm{Error}_{\mathrm{phase}} = e^{(n^* - c)/r} \quad (15)

where c and r are parameters that define the rise start point and the rate of increase of the function. That is, c provides a measure of the time shift below which the phase error can be considered negligible, while r affects the rate of phase error increase above the critical value given by c. For our safety applications, c = 15 and r = 20, based on the subject matter experts' assessment that phase shifts of less than 1.2 ms are negligible.
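The phase error measure of Eq. (15) is a one-liner; the values c = 15 and r = 20 below are the paper's safety-application settings, and the relation between time steps and milliseconds is whatever the sampling rate implies.

```python
import numpy as np

def phase_error(n_star, c=15.0, r=20.0):
    """Eq. (15): exponential penalty on the cross-correlation shift n*.
    Shifts well below c contribute almost nothing; r sets the growth rate."""
    return float(np.exp((n_star - c) / r))

# Illustrative: a 5-step shift is nearly free, a 40-step shift is penalized.
print(phase_error(5), phase_error(40))
```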
3.2 Magnitude Error. To quantify the error associated only with magnitude, we first need to minimize the discrepancy between the time histories caused by errors in phase and frequency. We can compensate for global phase error by shifting one time history by the number of steps n* computed for the phase error. However, the time-shifted history may still exhibit local phase errors. To address these local effects, we apply DTW to the time-shifted history and to the reference time history. The cost function selected for DTW considers not only the distance but also the slope between two points:
d[i, j] = \left( (a_i^{ts} - b_j^{ts})^2 + (t_i - t_j)^2 \right) \left| \left. \frac{dA^{ts}}{dt} \right|_{t=t_i} - \left. \frac{dB^{ts}}{dt} \right|_{t=t_j} \right| \quad (16)

in which the superscript ts denotes time-shifted histories (although only one time history is shifted in practice). This cost ensures the mapping of a point to the closest point having a similar slope on the other time history and thus minimizes both local phase and local frequency differences between the two time histories. Figure 6 depicts two time history examples before and after DTW using Eq. (16). It is apparent that DTW minimizes the local phase and frequency effects. We then use the L1 norm on the warped, time-shifted histories to isolate the relative magnitude error between the two time histories:

\mathrm{Error}_{\mathrm{magnitude}} = \frac{ \| A^{[ts+w]} - B^{[ts+w]} \|_1 }{ \| B^{[ts+w]} \|_1 } \quad (17)

in which the superscript [ts + w] denotes the phase-shifted, DTW-modified time histories.

Fig. 6 Illustration of DTW effect on time histories: (top) time histories before DTW and (bottom) time histories after DTW

3.3 Slope Error. As frequency is a global measure, we employ the slope of the time history at each time point, following the rationale that the time derivative of a harmonic time history, Eq. (14), provides a direct value for the frequency ω. Therefore, the slope error is computed from the derivatives of the time histories. Considering the derivative information ensures that the effect of magnitude is compensated for, as the derivative depends on the slope and not on the amplitude. To minimize the effect of global phase error, the slope is calculated for the time-shifted histories. Taking the derivative at each point, we obtain "derivative time-shifted histories," represented by A^{[ts+d]} and B^{[ts+d]}. Because the effect of localized time shifts still exists, the DTW algorithm is applied. The L1 norm of the resulting histories is then used to quantify the isolated contribution of slope error:

\mathrm{Error}_{\mathrm{slope}} = \frac{ \| A^{[ts+d+w]} - B^{[ts+d+w]} \|_1 }{ \| B^{[ts+d+w]} \|_1 } \quad (18)

in which the superscript [ts + d + w] denotes that the time histories were processed by the sequence of time shifting for global phase effect, derivative computation, and finally, DTW.
4 Example
In this section, we present results from the application of the proposed error measures using data from a case study provided by an International Standards Organization (ISO) working group on virtual testing (ISO technical committee (TC) 22, subcommittees (SC) 10 and 12, and working group (WG) 4). An experimental test setup used available crash pulses to record acceleration time histories at different locations of a dummy during impact: head, thorax, and tibia. For the head impact case, three experiments were conducted and eleven time history responses were recorded. Three CAE simulations were conducted, employing a different computer simulation code for each model. We present here the error measures for three responses of the head impact case: head impactor displacement, head acceleration in the x-direction, and neck force in the x-direction. Figure 7 provides plots of the time histories for these three physical responses from the experiments and the simulations. The complete set of results can be found in Ref. [23].

We quantify the error between the different tests and the computational models for each response individually. For each response, we compare the tests among themselves to obtain error measures between test repetitions. We then compare the computational model predictions to each of these tests to obtain a measure of the discrepancy between test and computational data. If the error between tests is greater than or equal to the error between the computational model and the tests, we may infer that the computational model is adequate. To illustrate this idea, we consider the error measures relative to one test, test 1. (In practice, the error measure quantification is performed using each available test data set as the baseline case.) We compare the remaining two tests and the three computational models to test 1. We then have the following three cases:

1. Looking at one response at a time, if the values associated with all three error measures for a computational model are less than or equal to the respective error values for the tests, we may conclude that the computational model is a good representation of reality.
2. Looking at one response at a time, if all three error measure values for one computational model are less than all three error measure values for another computational model, we may conclude that the first model is better than the second model.
3. Looking across all responses, if we find that one computational model performs well for all of the responses, we can conclude that it is better than the other models collectively.

Figure 8 depicts the results for the three considered responses. For the head impactor response, all three error measure values for all three computational models are less than or equal to the error measure values for test 2 and test 3. Thus, we may conclude that all of the computational models are adequate for the head impactor response. As there are negligible differences in the error measures for the three computational models, they can be considered equally good representations of the head impactor response.
Fig. 7 Computational results and test data for head impactor displacement (top), head acceleration in the x-direction (middle), and neck force in the x-direction (bottom)

Fig. 8 Sample of results for head impact case

For the head acceleration in the x-direction, the computational models have acceptable error only in the phase component; the models have larger magnitude and slope errors compared with the tests. In addition, the three computational models do not exhibit consistently better or worse errors, so no conclusive ranking of the models can be made. For the neck force response in the x-direction, only the phase error is acceptable for all three models. However, the computational models exhibit consistent magnitude and slope errors, with model 1 being the best and model 3 being the worst. In this case, we can rank the models. The values of the error measures shown in Fig. 8 are consistent with the qualitative visual differences that can be observed in Fig. 7 for phase, magnitude, and slope among the test and computational time histories.

5 Building Regression-Based Validation Metrics Using Ratings of Subject Matter Experts
In Sec. 4, we presented the results of the three proposed error measures individually. It is apparent that no single error measure can provide a quantitative metric of the match between time history responses. Instead, as was done to develop the S&G error metric, a combination of error measures is needed, and a rational procedure is required to develop such a combination. In this work, we rely on the opinions of subject matter experts (SMEs) to build and train a regression model for model validation. Subject matter experts are individuals with long experience in a particular discipline. They are thus trusted to evaluate and rank the predictive capability of computational models by (mostly visual) inspection of comparison plots. We use SME ratings of computational models and the three proposed error measures to build a regression-based validation metric that can validate and/or rank other computational models. Comparisons with other metrics in commercial use are made to assess the robustness of the developed regression model.

We consider a case (previously reported in Ref. [24]) where a deceleration time history from a crash is known by means of a physical experiment. Fifteen computational models had been developed to predict the deceleration time history for this crash (these models are not necessarily different computational models but can include different substantiations of the same computational model due to different parameter values chosen for the models). Six SMEs were presented with fifteen comparison plots (one for each model), and the average SME rating of each model was recorded. Ratings range from 1 (worst match) to 10 (excellent match). Figure 9 depicts a typical comparison plot that was shown to the SMEs.

Fig. 9 A typical plot presented to the SMEs

We used ten of the available fifteen data sets and SME ratings to build a regression-based validation metric. We then used the remaining five data sets to test our model. Many combinations are possible for choosing which ten data sets to use to build the regression model; a full discussion of the combinations is found in Ref. [23]. Table 7 presents the individual and average SME ratings for the time histories associated with the training and test data sets for one particular training set selection. Each computational model (CAE) is identified with an ID number, and the data sets have been sorted in ascending order of average SME rating. The values computed using the three proposed error measures are given in Table 8 for the 15 data sets. It is worth noting that the relatively large phase error for CAE 1189 is reflected in the low SME rating for this model.

Table 8 Error measure values for the CAE models

CAE ID   Phase   Magnitude   Slope
1188     0.52    0.42        0.43
1189     51.94   0.20        0.33
1130     1.73    0.31        0.51
1047     0.67    0.22        0.41
1020     0.61    0.19        0.48
1041     0.50    0.22        0.31
1028     0.52    0.19        0.27
1005     0.50    0.16        0.23
1083     0.52    0.12        0.33
1052     0.52    0.09        0.25
1042     0.64    0.33        0.45
1100     1.16    0.28        0.69
1009     0.70    0.16        0.28
1016     0.61    0.17        0.44
1022     0.52    0.08        0.22

The three error measures were combined using a regression model to predict the average SME ratings. We built a linear regression model using the following first-degree polynomial to fit the error measure values to the SME average ratings:
R_p = 10 - \left( c_1 \mathrm{Error}_{\mathrm{phase}} + c_2 \mathrm{Error}_{\mathrm{magnitude}} + c_3 \mathrm{Error}_{\mathrm{slope}} \right) \quad (19)

where R_p denotes the predicted rating; recall that a rating of 10 is an excellent match, implying no error. Figure 10 depicts the regression model rating predictions for the ten time histories used to build the model and for the five remaining time histories, relative to the average SME ratings (the bars represent the range of the individual SME ratings). The validation metric assessments agree well with the SME ratings, and the predictions always fall within the range of the individual SME ratings. While the results presented are for only one training set, the same performance was observed for all regression models we built using different combinations of training and test time histories [23].
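A minimal least-squares sketch of the fit behind Eq. (19). In the paper, the coefficients c1-c3 are estimated from the ten training rows of Tables 7 and 8; the helper below simply fits whatever arrays it is given, and NumPy's lstsq is our choice of solver, not necessarily the authors'.

```python
import numpy as np

def fit_rating_model(errors, avg_ratings):
    """Fit c1..c3 in Eq. (19) by least squares.
    errors: (n_models, 3) array of [phase, magnitude, slope] measures.
    avg_ratings: average SME rating per model (10 = perfect, i.e., zero error)."""
    X = np.asarray(errors, dtype=float)
    y = 10.0 - np.asarray(avg_ratings, dtype=float)  # rating deficit to explain
    c, *_ = np.linalg.lstsq(X, y, rcond=None)
    return c

def predict_rating(errors, c):
    """Eq. (19): predicted rating R_p for new error-measure rows."""
    return 10.0 - np.asarray(errors, dtype=float) @ c
```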
Table 7 SME ratings for the fifteen CAE models (first ten models used to build the regression-based validation metric; last five used to test it)

CAE ID   SME 1   SME 2   SME 3   SME 4   SME 5   SME 6   Average rating
1188     5       3       1       2       3       4       3.00
1189     3       4       4       3       3       3       3.33
1130     4       4       4       3       4       5       4.00
1047     5       4       5       4       5       5       4.67
1020     6       4       5       6       7       4       5.33
1041     6       5       5       6       6       5       5.50
1028     7       6       5       5       7       6       6.00
1005     8       7       7       6       8       6       7.00
1083     7       7       7       9       9       7       7.67
1052     8       7       8       10      10      8       8.50
1042     4       3       4       3       4       4       3.67
1100     5       4       3       3       4       6       4.17
1009     7       6       6       7       7       6       6.50
1016     7       7       7       8       9       5       7.17
1022     8       9       8       10      10      8       8.83
Fig. 10 Regression-based validation metric: data fit and test
It is instructive to compare the rating predictions of our regression-based validation metric with the rating predictions of four existing metrics currently used for this particular application: the wavelet decomposition method, the step function, the ADVISER model evaluation criteria, and corridor violation plus area. A complete description of these metrics may be found in Ref. [24]. It should be noted that a linear regression approach was used to combine the individual error measures in the existing metrics used for comparison.

Figure 11 presents a comparison of the existing metrics and the metric proposed in this work, which is labeled the error assessment of response time histories (EARTH) metric. The EARTH and wavelet decomposition metrics predict the SME ratings quite well. While different training sets yield slightly different absolute results, in aggregate, the EARTH metric provides a more robust measure of the SME average rating across all training sets than the four commonly used metrics studied. In all the regression models we built, EARTH consistently predicted the SME ratings well. This indicates that EARTH is capable of recognizing the key features associated with the time histories for this application and of providing an overall error measure by combining them.

Fig. 11 Comparison of EARTH to other metrics
6 Conclusions
The objective of the research presented in this paper was to evaluate existing measures for assessing the error between time histories and to propose a set of measures that can quantify the error of complex time histories associated with vehicle safety applications. We adopted the idea of classifying error into phase and magnitude components based on existing metrics. We enhanced this concept by using DTW to separate the effects of phase and magnitude. In addition, to provide a measure of error due to differences in shape, we introduced the concept of slope error, computed using the slope time history; the DTW algorithm was also employed when assessing slope error. The applicability of the proposed error measures was demonstrated through two case studies pertaining to vehicle safety. The first case study illustrates how the proposed measures can be used
to assess the predictive capability of computational models. The second case study showed how the measures can be used in conjunction with SME data to develop regression-based models for validating simulation models. A comparison with four existing metrics for model validation in vehicle safety applications demonstrated that the proposed metric agrees well with SME ratings.

The methods presented are a first step toward developing a fully realized validation metric. Following the guidelines of Oberkampf and Barone [3], with effective error measures in place, the need exists to incorporate uncertainty related to experimental and numerical error and to incorporate information regarding the number of experiments available. This work is being conducted following some of the methodologies proposed by Rebba and Mahadevan [4].
Acknowledgment

The authors would like to thank Dr. Guosong Li of Ford Motor Co. and Dr. Matt Reed of the University of Michigan Transportation Research Institute (UMTRI) for providing data and helpful feedback and suggestions. This work has been supported partially by Ford Motor Co. (University Research Project No. 20069038) and by the Automotive Research Center (ARC), a U.S. Army Center of Excellence in Modeling and Simulation of Ground Vehicles led by the University of Michigan. Such support does not constitute an endorsement by the sponsors of the opinions expressed in this paper.
References

[1] American Institute of Aeronautics and Astronautics, 1998, Guide for the Verification and Validation of Computational Fluid Dynamics Simulations.
[2] American Society of Mechanical Engineers, 2003, Council on Codes and Standards, Board of Performance Test Codes: Committee on Verification and Validation in Computational Solid Mechanics.
[3] Oberkampf, W. L., and Barone, M. F., 2006, "Measures of Agreement Between Computation and Experiment: Validation Metrics," J. Comput. Phys., 217(1), pp. 5-36.
[4] Rebba, R., and Mahadevan, S., 2006, "Model Predictive Capability Assessment Under Uncertainty," AIAA J., 44, pp. 2376-2384.
[5] Liu, X., Yan, F., Chen, W., and Paas, M., 2005, "Automated Occupant Model Evaluation and Correlation," Proceedings of the 2005 ASME International Mechanical Engineering Congress and Exposition, Orlando, FL.
[6] Gu, L., and Yang, R. J., 2004, "CAE Model Validation in Vehicle Safety Design," SAE Technical Paper Series, Paper No. 20054-01-0455.
[7] 2007, ADVISER Reference Guide, 2.5 ed.
[8] Jacob, C., Charras, F., Trosseille, X., Hamon, J., Pajon, M., and Lecoz, J. Y., 2000, "Mathematical Models Integral Rating," Int. J. Crashworthiness, 5(4), pp. 417-432.
[9] Geers, T. L., 1984, "Objective Error Measure for the Comparison of Calculated and Measured Transient Response Histories," Shock and Vibration Bulletin, 54, pp. 99-107.
[10] Sprague, M. A., and Geers, T. L., 2004, "A Spectral-Element Method for Modelling Cavitation in Transient Fluid-Structure Interaction," Int. J. Numer. Methods Eng., 60(15), pp. 2467-2499.
[11] Schwer, L. E., 2005, "Validation Metrics for Response Histories: A Review With Case Studies," Schwer Engineering & Consulting Service Technical Report.
[12] Russell, D. M., 1997, "Error Measures for Comparing Transient Data: Part I, Development of a Comprehensive Error Measure," Proceedings of the 68th Shock and Vibration Symposium, Hunt Valley, MD.
[13] Russell, D. M., 1997, "Error Measures for Comparing Transient Data: Part II, Error Measures Case Study," Proceedings of the 68th Shock and Vibration Symposium, Hunt Valley, MD.
[14] Donnelly, B. R., Morgan, R. M., and Eppinger, R. H., 1983, "Durability, Repeatability, and Reproducibility of the NHTSA Side Impact Dummy," 27th Stapp Car Crash Conference.
[15] Rabiner, L. R., and Juang, B. H., 1993, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ.
[16] Munich, M., and Perona, P., 2003, "Visual Identification by Signature Tracking," IEEE Trans. Pattern Anal. Mach. Intell., 25(2), pp. 200-217.
[17] Oates, T., Firoiu, L., and Cohen, P., 2000, Using Dynamic Time Warping to Bootstrap HMM-Based Clustering of Time Series, Springer-Verlag, Berlin, pp. 35-52.
[18] Faundez-Zanuy, M., 2007, "On-Line Signature Recognition Based on VQ-DTW," Pattern Recogn., 40, pp. 981-992.
[19] Arkin, E. M., Chew, L. P., Huttenlocher, D. P., Kedem, K., and Mitchell, J. S. B., 1991, "An Efficiently Computable Metric for Comparing Polygonal Shapes," IEEE Trans. Pattern Anal. Mach. Intell., 13(3), pp. 209-216.
[20] Efrat, A., Fan, Q., and Venkatasubramanian, S., 2007, "Curve Matching, Time Warping, and Light Fields: New Algorithms for Computing Similarity Between Curves," J. Math. Imaging Vision, 27(3), pp. 203-216.
[21] Chan, F., Fu, A., and Yu, C., 2003, "Haar Wavelets for Efficient Similarity Search of Time-Series: With and Without Time Warping," IEEE Trans. Knowl. Data Eng., 15(3), pp. 686-705.
[22] Chang, Y., and Seong, P., 2002, "A Signal Pattern Matching and Verification Method Using Interval Means Cross Correlation and Eigenvalues in the Nuclear Power Plant Monitoring Systems," Ann. Nucl. Energy, 29, pp. 1795-1807.
[23] Sarin, H., 2008, "Error Assessment of Response Time Histories (EARTH): A Metric to Validate Simulation Models," MS thesis, University of Michigan, Ann Arbor, MI.
[24] Yang, R. J., Li, G., and Fu, Y., 2007, "Development of Validation Metrics for Vehicle Frontal Impact Simulation," Proceedings of the ASME 2007 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Las Vegas, NV.