Materials Science and Engineering A 433 (2006) 261–268

Predicting materials properties and behavior using classification and regression trees

Yong Li ∗

Romer Inc., Hexagon Metrology, Carlsbad, CA 92008, USA

Received 6 December 2005; received in revised form 22 June 2006; accepted 22 June 2006

Abstract

An investigation was conducted to evaluate the effectiveness of a non-parametric statistical methodology, the classification and regression tree (CART) [L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth Inc., California, 1984], as an alternative to traditional parametric regression techniques for predicting materials properties and behavior. It is demonstrated, through its application to a database of creep rupture data for austenitic stainless steels, that the CART technique consistently outperforms the conventional curvilinear regression method in terms of prediction accuracy. Moreover, the results of the CART analysis provide insight into the relationships and interactions between the materials variables, which is beneficial to understanding materials behavior and useful in materials design.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Prediction of materials properties and behavior; Classification and regression trees (CART); Data mining; Materials informatics

∗ Present address: Formerly Research Assistant Professor in the Department of Materials Science and Engineering, University of Southern California (USC); currently Senior Software Engineer at Romer Inc., Carlsbad, CA 92008. Tel.: +1 949 551 2345. E-mail address: [email protected].
0921-5093/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.msea.2006.06.100

1. Introduction

Accurate prediction of materials properties and behavior from existing data has been a persistent pursuit of many materials researchers. More often than not, this quest has been pursued with one of the regression techniques, among which linear or curvilinear regression has been the de facto choice owing to its simplicity of implementation and interpretation. Nevertheless, the inherent complexity of materials behavior inevitably limits its applicability in many circumstances. Consequently, non-linear regression techniques such as neural networks and their variations have received considerable attention in the materials community [2–8]. While the parametric (both linear and non-linear) regression methods have achieved remarkable success, their limitations in handling materials data are becoming evident in two ways. First, it has been noted [9,10] that materials data exist in a variety of formats, including the categorical, whose datum space is characterized by a set of limited, discrete values. For instance, we may need a variable to represent the classification of engineering materials. Such a variable may take on one of the following values: "metals", "polymers", "elastomers", "ceramics", "glasses" and "hybrids" [11]. While it is acceptable to assign a numerical code to a categorical variable, such as "metals" = 0, "polymers" = 1, etc., to facilitate mathematical handling, a prediction of 1.25 by a parametric model for the materials classification would be difficult to comprehend. Second, as the size of materials databases increases rapidly, missing data are frequently observed. Since parameterized models require that all the data be available for an entry to be used, there are basically two ways to handle the missing data problem in parametric modeling: (a) simply drop entries where one or more data are missing, or (b) make a guess for the missing data. For example, where composition data were missing, if the element is known to be a deliberate addition to the material, it is set to zero; if it is regarded as an impurity, it is set to the average of the available data [6]. It is clear that the first approach leads to an inefficient and possibly biased use of the original data. The second may also be problematic: it has been well documented in the literature on database management [12] that missing data indicate that the values are unknown, and an "unknown" is not equivalent to a "zero" or to an average of other values.

In order to address the issues of categorical variables and missing data in materials research, a non-parametric prediction technique, the classification and regression tree (or decision tree), is useful. The classification and regression tree (CART) [1] is a non-parametric classification and prediction model that has been extensively applied in many areas, from medical science to enology [13–16]. However, its applications to materials science are very limited. Sturrock and Bogaerts [17], using five techniques including the decision tree, studied a two-class classification problem on stress corrosion cracking of austenitic stainless steels. Their results indicate that the decision tree method provides the best performance in terms of both classification accuracy and intelligibility of the output. In this paper, we first briefly describe the fundamentals of the CART methodology, and then present its application to a database with more than 1600 entries on creep rupture data of austenitic stainless steels, in an attempt to demonstrate that CART is capable of predicting reasonably well the creep rupture life and rupture stress based on the testing condition, processing and heat treatment history, and chemical composition of the materials.

2. Method

2.1. A brief review of CART

CART modeling is a non-parametric statistical methodology that can incorporate both numerical and categorical variables into an analysis. Moreover, it uses an effective algorithm to cope with missing data [1]; it has been demonstrated that CART can still perform reasonably well when the missing data do not exceed 5% [1]. In a classification or regression problem, we are given a data set of training records (or training data). Each record (or entry) has a number of variables. One distinguished variable is called the dependent or response variable, and the remaining variables are referred to as the predictor variables. The overall goal of the CART analysis is to devise models that use the predictor variables to predict the values of the response variable in a non-parametric way.
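The analysis in this paper was performed with the commercial CART® 5.0 program. As a rough open-source stand-in (an assumption for illustration, not the tool used here), scikit-learn's `DecisionTreeRegressor` implements the same binary recursive partitioning idea; a minimal sketch on synthetic creep-like data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the creep database: rupture life (log h)
# driven mainly by stress and temperature, with ranges loosely
# matching Table 1. The functional form is invented for illustration.
n = 400
stress = rng.uniform(18, 471, n)         # MPa
temperature = rng.uniform(450, 850, n)   # deg C
log_life = (8.0 - 0.004 * stress - 0.005 * temperature
            + rng.normal(0.0, 0.1, n))

X = np.column_stack([stress, temperature])
tree = DecisionTreeRegressor(max_depth=5, random_state=0)
tree.fit(X, log_life)

# A prediction is the mean response of the terminal node the record
# falls into, exactly as described for CART above.
pred = tree.predict([[157.0, 700.0]])[0]
print(round(pred, 2))
```

With real data one would also hold out a testing set and prune the tree to its "right size" (see below); scikit-learn exposes cost-complexity pruning through the `ccp_alpha` parameter.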
In essence, the CART analysis is a process of binary recursive partitioning of the data set based on a specific splitting criterion [18]. The CART procedure utilizes a computationally intensive algorithm that searches for the best split among all possible split points for each predictor variable in order to conduct classification or regression. To pinpoint the best split, CART applies its goodness-of-split criteria to evaluate the reduction in "impurity" achieved by each potential partitioning [1], and all the possible splits are ranked by their impurity reduction. The partitioning is carried out using the best split by default. However, in the case where the field for the best split is missing (i.e., the missing data situation), the next best split, termed the surrogate split, is used. The graphical output of the CART analysis resembles an inverted tree with internal and terminal nodes.

An important consideration in devising CART models pertains to the construction of the "right-sized" tree [1]. In the extreme, one may grow a tree so large that each terminal node contains only one entry. Such a tree may perfectly classify the training data, but will most likely incur significant errors in the

testing data and the real-world predictions. This phenomenon is referred to as "overfitting". To avoid overfitting, CART conducts a tree-pruning procedure, by means of either cross-validation or a testing data set, after a large tree is grown [1]. With the pruning, the analysis leads to an optimal tree that serves the purpose of prediction. In the optimal tree, each terminal node is associated with a set of "rules" that record the sequence of splitting criteria leading to the formation of that specific node. The rules are important in two ways. First, they are used to predict the values of the response variable. Second, they contain a wealth of information about the relationships between the response and the predictor variables and the interactions among the predictors.

2.2. The creep rupture database of austenitic stainless steels

The creep rupture data were extracted from the database maintained by the National Institute for Materials Science (NIMS) in Japan [19]. The selection of the NIMS database for this investigation was based on the following considerations. First, it is probably the most comprehensive materials database recording creep and creep rupture data for austenitic stainless steels, and it is accessible via the Internet. Second, the creep rupture data cover a wide range of testing conditions for 48 heats of products; each heat represents a specific austenitic stainless steel with a different chemical composition and well-documented processing and heat treatment information. Third, the data volume, more than 1600 entries in total, is large enough to permit a reliable statistical analysis like CART. Fourth, the original data were obtained through tests conducted under identical creep testing standards [19], and therefore the problem of data scattering, which is often severe in creep rupture testing, is greatly alleviated.
Finally, the NIMS database contains detailed information about the curvilinear regression equations for the archived creep rupture data, which provides an excellent opportunity to compare the performance of the CART method with a well-established regression technique.

The austenitic stainless steels under investigation include AISI 304 (basic 18Cr–12Ni), AISI 316 (304 + Mo), AISI 321 (304 + Ti) and AISI 347 (304 + Nb). The variables used in the present CART analysis are listed in Table 1. It should be noted that the creep rupture life is given as a logarithmic value. As can be seen from the table, there are missing data for several elements; the missing data account for less than 5% of the entries in the database. In this investigation, we used "Rupture life" and "Stress" as the dependent variables for the CART analysis.

Table 1
Variables used in the CART analysis

Variable                       Minimum   Maximum   Mean     S.D.     Missing data
Stress (MPa)                   18        471       126      78       No
Temperature (°C)               450       850       670      68       No
Rupture life (log h)           1.14      5.38      3.58     0.95     No
Austenite grain size number    2.3       7.0       5.1      1.0      No
Processing history             –         –         –        –        No

Composition (wt.%)
C                              0.011     0.090     0.061    0.015    No
Si                             0.40      0.82      0.60     0.092    No
Mn                             0.81      1.82      1.53     0.25     No
P                              0.019     0.038     0.025    0.004    No
S                              0.003     0.027     0.010    0.006    No
Ni                             9.00      13.65     11.81    1.33     No
Cr                             16.39     18.95     17.66    0.72     No
Mo                             0.02      2.56      1.02     1.07     No
Cu                             0.02      0.35      0.14     0.09     No
Ti                             0.0006    0.55      0.12     0.18     No
Al                             0.002     0.161     0.031    0.038    Yes
B                              0.0001    0.003     0.001    0.0007   Yes
N                              0.0074    0.081     0.025    0.014    No
Nb + Ta                        0.001     0.88      0.21     0.34     Yes
Co                             0.06      0.37      0.27     0.08     Yes
V                              0.031     0.057     0.039    0.009    Yes
O                              0.0032    0.0054    0.0044   0.001    Yes

Among the variables listed in Table 1, "Processing history" is the only categorical variable; it has 16 values, as recorded in Table 2. It contains information about the thermo-mechanical processing, the solution treatment temperature, how long the material was heat treated, and the quench method. In Table 2, the contents of the processing and heat treatment history are quoted verbatim from the original database [19], and WQ stands for "water quench". There are two reasons why the processing and heat treatment history was not further divided into "processing method", "heat treatment temperature" and "heat treatment time". First, the information on the temperature and time of heat treatment is not always available in the original database. Second, CART has the ability to process categorical data directly. An examination of the assigned values in the second column of Table 2 indicates that each value represents a unique combination of thermo-mechanical processing and heat treatment. The values of Type ID in Table 2 are used for the Processing history variable in the CART analysis.

Table 2
The values of processing history

Type ID   Processing history
1         Hot extruded and cold drawn 1180 °C WQ
2         Rotary pierced and cold drawn 1120 °C/10 min WQ
3         Hot extruded and cold drawn solution treated
4         Hot extruded and cold drawn 1130 °C WQ
5         Rotary pierced and cold drawn 1100 °C/10 min WQ
6         Hot rolled 1050 °C/40 min WQ
7         Hot rolled 1050 °C/80 min WQ
8         Hot extruded and cold drawn 1200 °C/20 min WQ
9         Hot rolled 1050 °C/30 min WQ
10        Hot rolled 1040 °C/5 min WQ
11        Hot rolled 1050 °C/60 min WQ
12        Rotary pierced and cold drawn 1070 °C/10 min WQ
13        Hot rolled 1080 °C/120 min WQ
14        Hot rolled 1100 °C/70 min WQ
15        Hot rolled 1060 °C/110 min WQ
16        Hot rolled 1100 °C/30 min WQ

Prior to the CART analysis, the extracted creep rupture data were randomized to avoid any possible bias and then divided into two groups: training data and testing data. The training data were

used to build the CART models, and the testing data were used for examining the performance. The software used in this study is the CART® 5.0 program, a commercial product of Salford Systems, Inc., San Diego, California; CART® is a registered trademark of California Statistical Software, Inc.

3. Results

After applying the CART program to the training data set, choosing the Rupture life as the response variable and the rest as the predictor variables, an optimal regression tree was constructed. Fig. 1 shows a snapshot of a portion of the regression tree. Note in Fig. 1 that there are two types of nodes in the tree, the internal nodes and the terminal nodes discussed in Section 2; each internal node has two child nodes, while the terminal nodes have none. Each internal node box displays four pieces of information: (1) the node number; (2) the split criterion for the node; (3) the average of the rupture life over all the cases in the node; and (4) the number of cases in the node. For instance, at Internal node 209 in Fig. 1, the split criterion is "Is stress less than or equal to 176.5 MPa?" This means that the entries with a "Yes" answer are assigned to the left child node (Terminal node 206) and the entries with a "No" go to the right child node (Terminal node 207). There are 12 cases stored in this internal node, and the mean rupture life of these 12 cases is 1.686 (log h). Each terminal node box displays the terminal node number and the number of cases stored in the node. Each terminal node has a set of rules that contains the information about how the splitting decisions were made during the construction of the tree. An example of the rules for Terminal node 206 is shown in Table 3. The rules are formatted as C language-compatible code: the symbol "&&" means the logical "AND", and "<=" means "less than or equal to".

Table 3
The rules for Terminal node 206

/*Terminal Node 206*/
if ( Testing Temperature > 687.5 &&
     Testing Temperature <= 725 &&
     Ni > 10.205 &&
     Cr <= 17.66 &&
     Stress > 152 &&
     Stress <= 176.5 ) then
The mean of Rupture life = 1.7239


Fig. 1. A portion of the regression tree for the creep rupture data of austenitic steels.

Fig. 2 shows a ranking of the relative importance of the predictor variables in identifying the creep rupture life, calculated from the CART analysis. As shown in Fig. 2, the stress, testing temperature and processing history are the three most important factors in determining the rupture life of the austenitic stainless steels.

Fig. 2. Relative importance of the predictor variables in determining the rupture life.

After the regression tree is built using the training data, it is imperative to evaluate the performance of the tree on the testing data. The process of applying new data (i.e., data not used during the tree construction) to the CART model is referred to as "scoring". The CART® 5.0 program provides a built-in scoring facility to support such an operation [20]. Briefly, the process of scoring is to "drop" the values of the predictor variables in the testing data set onto the tree; the tree uses the rules to identify the values of the dependent variable. Since the measured values of the dependent variable are known in the testing data, CART also outputs the residuals, the differences between the measured and predicted values. Through the residuals, the accuracy of the prediction can be evaluated. To illustrate the scoring process manually, we choose a record in the testing data set: {stress = 157 MPa; testing temperature = 700 °C; rupture life = 1.69 (log h); Ni = 11.9%; Cr = 16.6%; Nb + Ta = 0.03%; etc.}. The "dropping" procedure is essentially a matching of the values of the predictor variables against the rules for the terminal nodes. An exhaustive comparison with the rules for each terminal node reveals that this record satisfies the conditions defined by the rules for Terminal node 206, listed in Table 3. Therefore, a predicted value of 1.72 (log h) is obtained for this record. As the measured value for this record is 1.69 (log h), the residual is calculated as [measured value − predicted value] = −0.03 (log h). In this particular case, the prediction error is about 1.8%. The predicted values of the rupture life for all the entries in the testing data set are plotted against the measured data in Fig. 3.

In addition to the construction of a single regression tree, the CART® 5.0 program also allows one to build a series of trees by slightly "perturbing" and re-sampling the original training data; the average of the predictions from these trees is reported as the final result. This approach is referred to as the "committee of experts" [20]. The purpose of such an operation is two-fold: (a) to examine the stability of the original training data; and (b) to improve the prediction accuracy. If the original training data are not stable, the committee approach can enhance the performance by up to 40% [20].
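The residual arithmetic in the walk-through above is easy to check; a minimal sketch, using the values of the example record:

```python
measured = 1.69    # log h, measured rupture life of the testing record
predicted = 1.72   # log h, mean rupture life of Terminal node 206 (rounded)

residual = measured - predicted
percent_error = abs(residual) / measured * 100

print(round(residual, 2))       # -0.03
print(round(percent_error, 1))  # 1.8
```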


Fig. 3. Predicted rupture life vs. measured rupture life by the optimal tree.

Fig. 5. Predicted rupture life vs. measured rupture life from the ARCing.

On the other hand, if there is no significant improvement in performance with the approach, it indicates that the training data are stable. The committee approach therefore does not deteriorate the accuracy of prediction, and it may provide some insight into the quality of the original data set. However, it should be pointed out that, since the result is a synergetic effect of a series of trees, a graphical presentation of a tree structure as illustrated in Fig. 1, and a set of clearly defined rules, are not available in the committee approach. Two commonly adopted techniques serve the committee approach: bootstrap aggregation (Bagging) and adaptive re-sampling and combining (ARCing) [22–24]. The major difference between Bagging and ARCing is the way the original training data are re-sampled. In Bagging, a new training data set is created by excluding some entries while duplicating or multiplying others so as to make up a full-sized training set [20]. In ARCing, the probability with which an entry is selected for the next training set is not constant; instead, the probability of selection increases with the frequency with which the entry has been misclassified in previous trees [20]. Figs. 4 and 5 show the plots of the predicted vs. measured creep rupture data for Bagging and ARCing, respectively.

We can also build a regression tree using the rupture stress as the response variable and the rupture life as a predictor variable, along with the other predictors. A plot of the predicted rupture stress is displayed in Fig. 6.
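The difference between the two re-sampling schemes can be made concrete. A schematic sketch (not the CART® 5.0 internals; the misprediction counts are made up, and the ARCing weights follow the form of Breiman's arc-x4 scheme, 1 + m^4):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
indices = np.arange(n)

# Bagging: uniform sampling with replacement; some entries appear
# several times, others are left out of the new training set.
bagging_sample = rng.choice(indices, size=n, replace=True)

# ARCing: selection probability grows with how often an entry has
# been mispredicted by the trees built so far.
mispredictions = np.array([0, 3, 0, 1, 0, 0, 5, 0, 2, 0])
weights = 1 + mispredictions.astype(float) ** 4
probs = weights / weights.sum()
arcing_sample = rng.choice(indices, size=n, replace=True, p=probs)

print(sorted(bagging_sample.tolist()))
print(sorted(arcing_sample.tolist()))
```

Entry 6, with five mispredictions, tends to dominate the ARCing draw, which is exactly how later trees are forced to concentrate on the hard cases.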

Fig. 4. Predicted rupture life vs. measured rupture life from the Bagging.

Fig. 6. Predicted rupture stress vs. measured rupture stress for austenitic steels.

4. Discussion

4.1. The relative importance of the predictor variables

It is of interest to note, as shown in Fig. 2, that a ranking of relative importance is naturally established for the predictor variables, from the stress and testing temperature to the chemical composition, in identifying the values of the rupture life through the CART analysis. The ranking for each variable is computed from the contributions stemming from both its role as a primary splitter and its role as a surrogate to any of the primary splitters [20]. Sourmail et al. [6] reported a similar plot obtained through neural network models, and it appeared that the reported rankings varied remarkably between the neural network models built during the analysis. In CART, on the other hand, we can always build an optimal tree, and therefore the ranking of predictors associated with the optimal tree is more stable. At present, the significance of this ranking plot is not fully understood; however, it may be an interesting topic that deserves further consideration. It can be speculated that a property or behavior of a material must be controlled by a number of materials variables, such as chemical composition, processing history and testing condition. One may further speculate that the degree of influence of those variables on a specific property is not the same, which implies that there must be a ranking among them. This invites two questions. First, how can the degree of influence (or the importance) be defined mathematically? Second, how can it be measured experimentally by the methods of materials science? In this study, it may be speculated that, if the ranking truly reflects the role of each variable in determining the creep rupture property, a plot like Fig. 2 would be of significance in materials design.

4.2. The performance of the CART analysis

In order to evaluate quantitatively the accuracy of the CART regression methods in the present investigation, the coefficient of determination (COD) [21], expressed in Eq. (1), can be computed for each regression method:

$$\mathrm{COD} = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2} \qquad (1)$$

where $Y_i$ is the measured value, $i$ is the index number ranging from 1 to $n$, $n$ is the total number of entries, $\hat{Y}_i$ is the predicted value and $\bar{Y}$ is the average of the $Y_i$. The COD takes on a value in the range [0, 1], in which a value of 1 indicates a perfect prediction. The COD values for the predictions obtained through the optimal tree, Bagging, ARCing and the stress as the dependent variable are recorded in Table 4.

Table 4
Values of the COD for different regression methods

Regression method                   COD
Optimal regression tree             0.91
Bagging                             0.92
ARCing                              0.91
Stress as the dependent variable    0.96
Curvilinear regression              0.89

In order to compare the performance of the CART analysis with the curvilinear regression method, the predicted values of the rupture life were computed for the same testing data using the curvilinear regression equation documented for the austenitic stainless steels [19], and a plot of the predicted values versus the measured data is shown in Fig. 7. In Fig. 7, the predicted data were obtained using the curvilinear regression equation of logarithmic stress by the Manson–Haferd method [19]. The equation is

$$\log t_h = \log t_a + (T + 273.15 - T_a)\sum_{i=0}^{k} b_i (\log S)^i \qquad (2)$$

where $t_h$ is the creep rupture life in hours, $t_a$ and $T_a$ are optimized constants, $T$ is the testing temperature in °C, $S$ is the stress in MPa, $k$ is the degree of the regression equation, and the $b_i$ are regression coefficients estimated by the method of least squares [19]. The values of $\log t_a$, $T_a$ and $b_i$ for a variety of austenitic steels were extracted from the NIMS website [19] and are listed in Table 5. The COD value of 0.89 for the curvilinear regression in Fig. 7 is recorded in the last row of Table 4.

Fig. 7. Predicted rupture life vs. measured rupture life by the curvilinear regression.

An inspection of the data in Table 4 reveals three observations. First, the committee approach offers a modest improvement in prediction accuracy over the single optimal tree, which indicates that the original training data are fairly stable. This result is in agreement with the considerations discussed in Section 2 on the selection of the NIMS database [19]. Second, when the stress is used as the response variable, the accuracy increases significantly. While the exact reason for the improvement is not clear, it may be related to the fact that the stress is a simple variable that can be precisely controlled and measured, whereas the rupture life is affected by many factors and is more likely to incur data scattering during the creep rupture tests. Finally, the CART models consistently outperform the curvilinear regression technique in the present investigation. It should be pointed out that the testing data set includes data from four different types of austenitic steels, as indicated in the first column of Table 5. The predictions by the CART methods shown in Figs. 4–6 were conducted using single prediction engines, i.e., one prediction model applied to all the testing data for the four types of steels. In contrast, the prediction by curvilinear regression shown in Fig. 7 was achieved with four regression equations, each applied to the portion of the testing data for one type of steel. This implies that, had a single regression equation been used for all the testing data, the performance of the curvilinear regression would have been even worse.

4.3. The rules from the CART analysis

When an optimal tree is built, each terminal node is associated with a set of rules, which record the split criteria that lead

Table 5
Constants in the curvilinear regression equations for austenitic steels [19]

Steel name     Ta    log ta     b0         b1         b2         b3         b4          b5
SUS 304H TB    650   10.37846   0.05075    0.051484   0.01884    –          –           –
SUS 321H TB    360   14.12465   0.706352   −2.0332    2.261936   1.24022    0.335164    0.03582
SUS 316-HP     720   9.222631   1.572912   −4.76518   5.686968   3.36375    0.984457    0.11426
SUS 316-B      720   8.446851   −0.30077   0.703243   −0.62687   0.244349   −0.03592    –
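Eqs. (1) and (2) are straightforward to evaluate from the Table 5 constants. A sketch using the SUS 316-B row, where the trailing "–" entries are treated as absent higher-order terms; the test condition of 700 °C and 157 MPa is simply an illustrative record, not a tabulated result:

```python
import math

def manson_haferd_log_life(T, S, Ta, log_ta, b):
    """Eq. (2): log t_h = log t_a + (T + 273.15 - Ta) * sum_i b_i (log S)^i."""
    poly = sum(bi * math.log10(S) ** i for i, bi in enumerate(b))
    return log_ta + (T + 273.15 - Ta) * poly

def cod(measured, predicted):
    """Eq. (1): coefficient of determination; 1 means a perfect prediction."""
    mean = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1 - ss_res / ss_tot

# SUS 316-B row of Table 5; b0..b4 only, b5 is absent for this steel.
b_316B = [-0.30077, 0.703243, -0.62687, 0.244349, -0.03592]
log_th = manson_haferd_log_life(T=700, S=157, Ta=720,
                                log_ta=8.446851, b=b_316B)
print(round(log_th, 2))  # predicted rupture life in log h
```

Feeding the predictions for the whole testing set into `cod` reproduces the kind of comparison recorded in Table 4.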


Table 6
Examples of the rule set pairs for the AGSN comparison

Pair 1 (when AGSN increases, CRL decreases):

/*Terminal Node 27*/
if ( ( Processing History == 3 || ... ) &&
     Testing Temperature > 612.5 && Testing Temperature <= 662.5 &&
     Nb + Ta <= 0.38 &&
     Stress > 103 && Stress <= 132 &&
     Cr <= 18.245 &&
     Austenite Grain Size Number <= 4.45 ) then
The mean of Rupture life = 4.38995

/*Terminal Node 28*/
if ( ( Processing History == 3 || ... ) &&
     Testing Temperature > 612.5 && Testing Temperature <= 662.5 &&
     Nb + Ta <= 0.38 &&
     Stress > 103 && Stress <= 132 &&
     Cr <= 18.245 &&
     Austenite Grain Size Number > 4.45 ) then
The mean of Rupture life = 4.11862

Pair 2 (when AGSN increases, CRL decreases):

/*Terminal Node 37*/
if ( ( Processing History == 1 || ... ) &&
     Testing Temperature > 662.5 && Testing Temperature <= 712.5 &&
     Nb + Ta <= 0.455 &&
     P > 0.0245 &&
     Stress <= 50 &&
     Austenite Grain Size Number <= 5.35 ) then
The mean of Rupture life = 4.57123

/*Terminal Node 39*/
if ( ( Processing History == 1 || ... ) &&
     Testing Temperature > 662.5 && Testing Temperature <= 712.5 &&
     Nb + Ta <= 0.455 &&
     P > 0.0245 &&
     Stress <= 50 &&
     Austenite Grain Size Number > 5.35 ) then
The mean of Rupture life = 4.41096

to the formation of that specific node. An investigation of the rules can reveal useful insight into the relationship between the dependent variable and a specific predictor variable, and into the interactions between the predictors. In this study, we are able to extract useful information such as how an element in the austenitic steel affects its creep rupture life, how two elements interact with each other during creep, and the relationship between creep rupture and grain size. For illustration, we present an example of the relationship between the creep rupture life (CRL) and the austenite grain size number (AGSN); a more detailed analysis will be reported in a separate paper. In order to examine the relationship between CRL and AGSN, rule set pairs were sought such that the only difference between the two rule sets lies in the rule item on the AGSN. Of all the terminal nodes in the optimal tree, there are seven pairs of rule sets that satisfy this requirement, and two example pairs are recorded in Table 6. It should be mentioned that the grain size number increases as the grain size decreases [25]. An inspection of the seven pairs of rule sets reveals three observations. First, six pairs exhibit the same relationship: when the AGSN increases (the grain size decreases), the CRL decreases. This result is in agreement with the results obtained for AISI 316 steels using traditional regression methods [26]. Second, such an increase–decrease

relationship may break down when a comparison is conducted between different pairs. For instance, a comparison of Terminal node 27 in the first pair of Table 6 with Terminal node 39 in the second pair shows the opposite trend. Finally, one pair of rule sets shows the opposite trend, i.e., when the AGSN increases the CRL increases, even though the rule items on the other variables are the same. These results may imply that the relationship between CRL and AGSN in the austenitic stainless steels is local, i.e., only valid within a short range under certain conditions, and may be strongly influenced by the interactions of other materials variables.

5. Conclusions

The present analysis has demonstrated that CART, as a non-parametric classification and regression technique, is an effective alternative to the traditional parametric regression methods in predicting materials properties and behavior. Owing to its non-parametric nature, CART has advantages over parametric techniques in processing categorical data and in addressing the missing data issue. The application of CART to the creep rupture data of austenitic stainless steels has demonstrated that the CART analysis performs better than the conventional curvilinear regression method in terms of prediction accuracy. Moreover, the rule sets output by the CART analysis can provide useful insight into the relationships between the response and the predictor variables, the relative importance of the predictor variables, and the interactions between the predictors. This will be very useful in understanding materials behavior and of interest in materials design.

Acknowledgements

The author would like to thank Professor Y.T. Chou at the University of California, Irvine and Dr. Xianping Ge at Google for useful discussions.

References

[1] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth, Inc., Monterey, California, 1984.
[2] B.G. Sumpter, A.A. Gakh, D.W. Noid, Proceedings of the Intelligent Engineering Systems Through Artificial Neural Networks, St. Louis, MO, USA, 1994, pp. 863–868.
[3] R.B. Yao, C.X. Tang, G.X. Sun, AFS Trans. 104 (1996) 635.
[4] H.K.D.H. Bhadeshia, ISIJ Int. 10 (1999) 966.
[5] M.-Y. Chen, D.A. Linkens, Proceedings of the Second International Conference on Intelligent Processing and Manufacturing of Materials, vol. 1, IEEE, Piscataway, NJ, 1999, pp. 395–400.
[6] T. Sourmail, H.K.D.H. Bhadeshia, D.J.C. MacKay, Mater. Sci. Technol. 18 (2002) 655.

[7] Q. Hancheng, X. Bocai, L. Shangzheng, W. Fagen, J. Mater. Process. Technol. 122 (2002) 196.
[8] Y. Huang, P.L. Blackwell, Mater. Sci. Technol. 19 (2003) 461.
[9] D. Cebon, M.F. Ashby, Eng. Des. 26 (2000) 8.
[10] Y. Li, Adv. Eng. Mater. 6 (2004) 92.
[11] M.F. Ashby, Materials Selection in Mechanical Design, 3rd ed., Butterworth-Heinemann, Oxford, 2005, Chapter 4.
[12] J.D. Ullman, J. Widom, A First Course in Database Systems, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 2002.
[13] S.C. Lemon, J. Roy, M.A. Clark, P.D. Friedmann, W. Rakowski, Ann. Behav. Med. 26 (2003) 172.
[14] I. Zlobec, R. Steele, N. Nigam, C.C. Compton, Clin. Cancer Res. 11 (2005) 5440.
[15] T.B. Spruill, W.J. Showers, S.S. Howe, J. Environ. Qual. 31 (2002) 1538.
[16] V. Subramanian, K.K.S. Buck, D.E. Block, Am. J. Enol. Vitic. 52 (2001) 175.
[17] C.P. Sturrock, W.F. Bogaerts, Corrosion 53 (1997) 333.
[18] D. Steinberg, P.L. Colla, CART: Tree-Structured Nonparametric Data Analysis, Salford Systems, San Diego, CA, 1995.
[19] http://mits.nims.go.jp/db_top_eng.htm.
[20] http://www.salford-systems.com.
[21] T.P. Ryan, Modern Regression Methods, John Wiley & Sons, Inc., New York, 1997.
[22] Y. Freund, R.E. Schapire, Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, CA, 1996, pp. 148–156.
[23] L. Breiman, Mach. Learn. 24 (1996) 123.
[24] L. Breiman, Neural Comput. 11 (1999) 1493.
[25] http://www.metallography.com/grain.htm.
[26] P. Marshall, Austenitic Stainless Steels: Microstructure and Mechanical Properties, Elsevier, London, 1984, Chapter 5.
