Procedia Manufacturing 10 (2017) 1009–1019
45th SME North American Manufacturing Research Conference, NAMRC 45, LA, USA
Data-driven Weld Nugget Width Prediction with Decision Tree Algorithm

Fahim Ahmed (a), Kyoung-Yun Kim (b,*)

(a) [email protected], (b) [email protected]
Department of Industrial and Systems Engineering, Wayne State University, Detroit, MI 48202, USA
Abstract

This paper presents the capability of a decision tree algorithm to realize data-driven resistance spot welding (RSW) weldability prediction. Although RSW provides commendable advantages, such as low cost and high-speed/high-volume operation, RSW processes are often inconsistent, and these significant inconsistencies are a well-known reliability issue. These process and data challenges often hinder the utilization of data-driven weldability prediction. In this paper, we apply a decision tree algorithm to an RSW dataset collected from an automotive OEM to plot regression trees and to extract decision rules for weld nugget width prediction. With three RSW test datasets, we conclude that the decision trees help in predicting the nugget width and in determining the impact of the design and process parameters on the nugget width response variable.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of the 45th SME North American Manufacturing Research Conference.

Keywords: Data-driven weldability prediction; resistance spot welding; decision tree algorithm; weld nugget
* Corresponding author. E-mail address: [email protected]

doi:10.1016/j.promfg.2017.07.092
1. Introduction

Resistance spot welding (RSW) is a process in which contacting metal surfaces are joined by the heat generated by their resistance to an electric current. Because RSW is inexpensive to implement and well suited to automation, it is nowadays the primary sheet-metal welding method of the automotive industry [1]. In the RSW process, the important quality parameters are the weld nugget geometry (e.g., nugget size) and the mechanical behavior of the weld, which indicate quality performance and process robustness. To create a better-quality weld, a proper understanding of the input parameters, and of the correlation between the input parameters and the response parameters (i.e., nugget width), is essential.

System integrators/OEMs working with products with metallic structures often rely on material suppliers and testing service companies to conduct the actual physical testing of new (or new combinations of) materials and new weldment designs. However, if the suppliers or testing service providers do not deliver the test results at the right time, the selection of new materials and new weldment designs is often delayed or obstructed. Alternatively, industry often utilizes numerical simulations (e.g., finite element analysis); however, the multi-physical nature of RSW often obstructs realistic numerical simulation. Utilizing accumulated or legacy RSW test data for new process and weldment designs is therefore a promising solution.

However, RSW processes are often inconsistent, and their significant data inconsistency is a well-known reliability issue. Data inconsistency results in significant noise in the data and must be resolved before the data is used to extract reliable information through analysis or model building [27, 28]. Specifically, the same input parameters (e.g., the weld schedule, which includes weld current, time, force, etc.) in different test cases often yield different weld responses (e.g., nugget width). The noise resulting from this inconsistency creates issues for engineers who use the data to make welding design decisions. While various prediction algorithms [2, 18, 9] have been developed, the inconsistency issue creates significant variability in the weld response, which prevents building more accurate prediction models. The data inconsistency in the RSW quality dataset was reported by the authors in Kim et al. [6].

To tackle the RSW data issues and to realize robust weldability prediction, we aim to provide a data-driven weldability prediction framework that can extract regression rules from inconsistent RSW datasets. In the overall framework, the extracted rules are intended to be converted to formal RSW rules for better knowledge sharing. This paper focuses only on the regression rule mining aspect of the envisioned framework. A decision tree algorithm (i.e., CART) is selected for rule extraction and is tested with a real RSW quality dataset with high data inconsistency. The test results are reported as a validation of the algorithm.
2. Literature review

2.1. Data-driven approach in design and manufacturing

Decision making in the engineering design and manufacturing domain is complex in nature, and developing a system to support it is an arduous task, prone to errors and with a high probability of failure [19]. In every industry, vast amounts of data are now available, and companies utilize them to gain competitive advantage in design, manufacturing, analysis, and decision making. This phenomenon has given rise to the application of data science, the joining link between data-processing technologies (including "big data") and data-driven decision making [17]. Data mining discovers underlying patterns in large datasets and applies algorithms to extract rules and trends and to make sense of the data. The growing interest in data mining has led to the development of many algorithms that extract knowledge (e.g., rules) and features from large datasets [8]. Data mining is now applied in various industries, including semiconductor manufacturing [25], electronic assembly [7], and health care [22]. According to Yang & Trewn [24], data mining and knowledge discovery in databases have been successfully used to solve quality inspection and control problems in various stages of the product life cycle.

Data-driven weldability prediction can be considered a solution to this predicament; however, it is underutilized for several reasons. First, legacy data obtained from physical experiments often does not conform to
design requirements (especially if the material combination is new). Second, the data is often documented in a format that cannot be easily utilized and widely shared. Third, due to the properties of welding materials and the complexity of welding processes, welding test results tend to be inconsistent and form highly variable datasets [12, 15]. To overcome these issues, a project titled VRWP (Virtually Guided RSW Weldability Prediction) has been formed. The project aims to advance a virtually guided RSW weldability qualification environment that allows an OEM and its suppliers to rapidly converge on the feasibility of weldment designs during the early development stages. Specific tasks include constructing a shareable weldability knowledge database, developing a web-based weldability prediction platform, and using that platform to effectively predict the response parameters of the RSW process for future cases (in existing or new environments) while reducing the effects of data inconsistency. VRWP is part of a DMDII (Digital Manufacturing and Design Innovation Institute) funded project in coalition with the automotive industry. For this project, researchers apply different machine learning algorithms to learn about the data, the parameters involved, and the correlations between the RSW parameters. Physical testing and numerical simulation are also performed to increase the data quality for advanced high-strength steel materials; these results will be reported in a separate article. In this paper, we report the impact of each RSW parameter on the response parameter (i.e., nugget width) in terms of regression rule generation, together with an initial analysis of the capability of the decision tree algorithm to predict the nugget width.
2.2. Machine learning approaches for welding applications

Ouafi et al. [10] develop an on-line quality assessment system for resistance spot welding (RSW) that can predict the significant parameters determining welding quality, such as nugget width and penetration. The system is based on a multi-layer perceptron (MLP). MLP has been used by other researchers as well: Kim et al. [5] apply MLP to arc welding parameter selection problems to avoid improper welding designs, and Sohmshetty et al. [20] use an MLP-based approach for weldability prediction. Pal et al. [11] develop a neural-network-based system employing arc signals to model the weld joint strength in pulsed metal inert gas welding. Sumesh et al. [21] categorize different types of welding quality using decision tree and random forest algorithms. Martín et al. [9] also apply decision tree and random forest algorithms to monitor the quality control of RSW.

One of the major concerns when dealing with welding data is the inconsistency of the datasets. Park et al. [13] and Park & Kim [14] address this issue in their respective works: they first present the inconsistent welding quality data and then propose a prediction framework that improves prediction accuracy and computational efficiency on inconsistent datasets, employing bootstrap aggregating models with support vector regression (SVR). It is thus evident that significant strides have been made in predictive modeling of welding cases even when the dataset is inconsistent. Kim et al. [6] develop this further by proposing a new prediction performance measure called the mean acceptable error (MACE). MACE measures the performance of prediction models constructed in the presence of proper-inconsistency, which can reduce the effects of inconsistent data on the other, normal data in the dataset. Park & Kim [16] propose a new approach called 'instance variant nearest neighbor', which approximates the regression surface of a function using the concept of k nearest neighbors.
3. Weldability dataset

This paper utilizes three resistance spot welding (RSW) quality datasets. The datasets were collected from a welding data repository accumulated over many years of physical testing conducted by welding engineers. The numbers of test cases are shown in Table 1; in total, the three welding datasets include 3,573 welding cases. The response parameter included in these datasets is the nugget width.

Each dataset includes two feature types (i.e., design and process) of RSW parameters. The design parameters represent the weldment design conditions, and the process parameters represent the weld schedule used for these material sets. The dataset includes 11 input parameters (for both materials) and one response
parameter as a welding quality measure (i.e., nugget width). The nugget width is expressed in millimeters. Table 2 shows the RSW design and process parameters along with the response parameter. The data includes various advanced high-strength steel materials, including DP600 and DP780, high-strength low-alloy (HSLA) steel, and dual-phase (DP) steel. The specific material names are not exposed for confidentiality.

Table 1. RSW quality datasets.

Dataset    Test cases
1          1370
2          1077
3          1126
Table 2. Parameters in the welding quality dataset.

Feature type               Feature names with units
Design parameters (DP)     Material thickness (mm); Coating - EG (electrogalvanized);
                           Coating - HDG; Coating weight (g/m2); Surface class
Process parameters (PP)    Weld force (lbs); Minimum button diameter of stack-up (mm);
                           Weld current (kA); Weld time (ms); Weld time cycle
Response parameter (RP)    Nugget width (mm)
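For the sketches in Section 4, we assume these parameters have been loaded into an R data frame; the file name and column names below are hypothetical placeholders mirroring Table 2, since the actual field names are confidential.

```r
# Hypothetical loading of one RSW quality dataset (placeholder names;
# the OEM data itself is confidential).
weld <- read.csv("rsw_dataset1.csv", stringsAsFactors = TRUE)

# Expected columns, mirroring Table 2:
#   design (DP):  material thickness (mm), coating (EG / HDG),
#                 coating weight (g/m2), surface class
#   process (PP): weld force (lbs), minimum button diameter (mm),
#                 weld current (kA), weld time (ms), weld time cycle
#   response (RP): nugget_width (mm)
```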
4. Decision tree algorithm and experiments
In this paper, the authors investigate the applicability of the decision tree algorithm for building a decision-tree-based prediction model for RSW with real manufacturing datasets collected from industry. We employ a decision tree (DT) algorithm to classify the data and to extract rules from the welding dataset.

A decision tree has a tree structure: a hierarchical organization of nodes and links in which every node, except the root node, has exactly one incoming link. Each internal node tests a predictive feature, and each link represents a value of that conditional variable (i.e., feature). The leaf (terminal) nodes carry the predicted values of the response variable, based on the training-set patterns. The variable tested at each node of a branch is the one most important, or informative, on the way to the leaf node. When a new sample or instance from the test set is obtained, a decision or prediction about the state of the response variable can be generated by following the path in the tree from the root to a leaf. The internal nodes contain splits, which test the value of an expression of the attributes, and the paths from an internal node to its children correspond to the distinct outcomes of the test. Many decision tree learning algorithms (e.g., C4.5, ID3, SLIQ, or SPRINT) are in use, each with its own distinct features [2, 18]. In our case, we use the Classification and Regression Trees (CART) algorithm. CART is a nonparametric decision tree algorithm that generates either a classification or a regression tree, depending on whether the response variable is categorical or continuous [3]. An advantage of the CART algorithm is that it
performs automatic variable selection, whereas other methods such as SVM and ANN require separate feature selection. CART also handles missing values in the dataset using surrogate splits. The logic behind the conditional structure of a DT is easy to understand, and patterns and decision rules can easily be extracted from it. The rules are the useful information extracted from the structure of the DT, expressed as IF-THEN statements. Each rule starts at the root node, and each variable that intervenes in a tree division contributes an IF clause to the rule; the rule ends in a leaf node, whose value forms the THEN part. The resulting leaf node is the most frequently occurring class or the predicted value of the variable of interest. This method includes classical algorithms such as CART [2] and C4.5 [18].

Classification and regression trees are machine-learning methods for constructing prediction models from a dataset. To obtain the prediction model, the system recursively partitions the data space and fits a simple prediction model within each partition; the partitioning can thus be represented graphically as a decision tree. A mathematical representation of the DT model is given below:

$Y = \bar{r}(X_t) = E[r(X_t, \lambda, D)]$   (1)

where $Y$ = final predicted value; $\bar{r}$ = average regression function; $X_t$ = observation from the test set; $\lambda$ = random parameter of the partition; and $D$ = total data.

A CART tree follows a top-down approach, built by connecting nodes through variable-value pairs until a leaf node is reached. In this paper, we build a model for accurate prediction of the nugget width for RSW. $X_t$ is the observation from the test set taken from dataset $D$. Here $\lambda$ is a random parameter of the partition, taken as any $m$ values from the $n$ variables of the test set, where $m < n$; it is the number of predictors sampled for splitting at each node, with a default value of $m = n/3$ for regression. Here, $m$ can also be chosen by cross-validation. Bootstrap aggregating is a machine-learning ensemble meta-algorithm designed to improve the stability and accuracy of machine-learning algorithms, and it is usually applied to DT methods. When building these decision trees, each time a split in a tree is considered, a random sample of $m$ predictors is chosen as split candidates from the full set of $n$ predictors, and the split can use only one of those $m$ predictors. The default value $m = n/3$ for regression is recommended by [31, 32]. The ensemble output of these expected values is given as the final predicted value; $Y$ in our case is the predicted value, which is a leaf node of each branch of the tree.

4.1. Decision tree formalism and decision tree generation

The decision tree algorithm can be mathematically formulated to aid understanding, to automate the process, and to interoperate with other systems. Our data consist of $p$ inputs and a response for each of $N$ observations: that is, $(x_i, y_i)$ for $i = 1, 2, \ldots, N$, with $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$. The algorithm needs to decide on the splitting variables and split points, and on the topology (shape) of the tree [26]. Suppose that we have a partition into $M$ regions $P_1, P_2, \ldots, P_M$, and we model the response as a constant $c_m$ in each region:

$f(x) = \sum_{m=1}^{M} c_m I(x \in P_m)$   (2)

If we adopt as our criterion minimization of the sum of squares $\sum (y_i - f(x_i))^2$, it is easy to see that the best $\hat{c}_m$ is simply the average of $y_i$ in region $P_m$:

$\hat{c}_m = \operatorname{ave}(y_i \mid x_i \in P_m)$   (3)
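As a concrete reading of Eq. (1), the following is a minimal R sketch of bootstrap aggregating regression trees. The data frame `weld` and response column `nugget_width` are the hypothetical names from Section 3, and the `tree` package is the one used in Section 4.2; sampling m of the n predictors per tree is a simplification of the per-split sampling described above.

```r
# Minimal sketch of Eq. (1): the ensemble prediction is the average of
# regression trees fitted on bootstrap resamples (the random parameter
# lambda). `weld` / `nugget_width` are hypothetical placeholder names.
library(tree)

bagged_prediction <- function(train, test, B = 50) {
  predictors <- setdiff(names(train), "nugget_width")
  m <- max(1, floor(length(predictors) / 3))     # default m = n/3 for regression
  preds <- matrix(NA_real_, nrow = nrow(test), ncol = B)
  for (b in seq_len(B)) {
    rows <- sample(nrow(train), replace = TRUE)  # bootstrap resample of D
    cols <- sample(predictors, m)                # random subset of m predictors
    fit <- tree(nugget_width ~ ., data = train[rows, c(cols, "nugget_width")])
    preds[, b] <- predict(fit, newdata = test)
  }
  rowMeans(preds)                                # Y = average over the ensemble
}
```

The randomForest package automates this recipe (its mtry argument defaults to n/3 for regression [31, 32]); the sketch above only approximates it, since tree() cannot resample predictors at every split.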
CART and decision-tree-like algorithms work through recursive partitioning of the training set to obtain subsets that are as pure as possible with respect to a given target. Each node of the tree is associated with a set of records B that is divided by a splitting criterion. In our case, for example, the nugget width is the response parameter (RP), and it is a
continuous attribute. The predictor parameters range from the process parameters (PP) (e.g., weld current, weld force) to the design parameters (DP) (e.g., coating, material). A split on a continuous attribute A can be induced by the test $A \le x$. The set of records B is then partitioned into two subsets that lead to the left and the right branches of the tree. Starting with all the data, we consider a splitting variable j and split point x, and define the pair of half-planes [26]

$B_{l(j,x)} = \{ b \in B : b(A_j) \le x \}$ and $B_{r(j,x)} = \{ b \in B : b(A_j) > x \}$, where $PP \in B$, $DP \in B$   (4)
Then we seek the splitting variable j and split point x that solve

$\min_{j,x} \left[ \min_{c_1} \sum_{b_i \in B_{l(j,x)}} (y_i - c_1)^2 + \min_{c_2} \sum_{b_i \in B_{r(j,x)}} (y_i - c_2)^2 \right]$   (5)
For any choice of j and x, the inner minimization is solved by

$\hat{c}_1 = \operatorname{ave}(y_i \mid b_i \in B_{l(j,x)})$ and $\hat{c}_2 = \operatorname{ave}(y_i \mid b_i \in B_{r(j,x)})$   (6)
The dividing step of the recursive algorithm induces a decision tree by considering all possible splits for each feature and finding the best one according to a chosen quality measure: the splitting criterion. For each splitting variable, the determination of the split point x can be done very quickly; hence, by scanning through all inputs, determination of the best pair (j, x) is feasible. Having found the best split, we partition the data into the two resulting regions and repeat the splitting process on each of them, and this process is then repeated on all resulting regions. The selection of the best split is usually carried out with impurity measures; for a regression tree, the impurity measure is the sum of squared deviations [29], and a split should decrease the impurity of the parent node.

4.2. Experiments

To extract rules from the welding dataset, we use the R software, which is widely used for statistical computing and graphics by statisticians and data miners. We set up the environment in RStudio. First, we divide the welding dataset into two parts: 50% of the whole dataset is used as the training dataset and the rest as the test dataset. We also use 70-30 and 80-20 percentage splits; the splits are chosen randomly. We use the "tree" package to build the decision trees, and its built-in "tree" function to forecast the response variable, the nugget width. For the percentage split we use the "sample" function, and for drawing the decision tree we use the "plot" and "text" functions. The program builds a regression model on the training dataset, and the output of the regression model is plotted as a decision tree. The regression trees for the 50-50 percentage split of each dataset are shown in Figs. 1, 2, and 3.

To develop the decision tree, the program first selects several random variables out of all parameters of a dataset in order to find the significance of those variables for the nugget width. The significance is an importance measure of each predictor parameter with respect to the response parameter. The variable with the most significant relation to the nugget width is taken as the root node; in our case, for dataset 1, the weld current is the root node, as it has the strongest relation with the nugget width. The program then selects the same number of random variables again from the dataset to find their relation with the nugget width and to determine the intermediary nodes, and it repeats this procedure until it reaches a leaf node (i.e., a predicted nugget width value). The number of random variables chosen at each step is based on $\sqrt{N}$, where N is the total number of variables of the dataset [30]. For dataset 1, N = 17, so the number of random variables selected at each step is $\sqrt{17} \approx 4.12 \approx 4$. To reach the leaf node, the program maintains the simple rule that the mean of the nugget width can vary, but the standard deviation should be as low as possible.

Using the tree package of R, we grow the regression tree to full length without any stopping criterion; a stopping criterion here refers to the rule for when to stop trying to split a node. The aim of the regression model is to minimize the mean squared error (MSE) at the output. The model sets criteria such as whether
a node contains fewer values than an input threshold, or whether the highest achievable reduction in MSE is less than some predetermined threshold value.
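The set-up just described can be summarized in a short, hypothetical R sketch; `weld` and `nugget_width` are the placeholder names introduced in Section 3, while tree, sample, plot, and text are the functions named above.

```r
# Sketch of the experiment set-up described in the text. `weld` and
# `nugget_width` are hypothetical placeholder names for the confidential
# OEM dataset.
library(tree)

set.seed(1)                                                # reproducible split
train_idx <- sample(nrow(weld), size = 0.5 * nrow(weld))   # 50-50 split
train <- weld[train_idx, ]
test  <- weld[-train_idx, ]

fit <- tree(nugget_width ~ ., data = train)   # regression tree on training data
plot(fit)                                     # draw the tree, as in Figs. 1-3
text(fit, pretty = 0)                         # label splits and leaf values
```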
Fig. 1. Decision Tree for Dataset 1 (Training = 50%).
Fig. 2. Decision Tree for Dataset 2 (Training = 50%).
Fig. 3. Decision Tree for Dataset 3 (Training = 50%).
Fig. 4. Cross Validation for Tree Size.
If it conforms, the model stops splitting and considers the current node a leaf node. However, setting such a criterion may result in under-fitting due to early stopping of the splitting; hence, allowing the tree to grow to full length and then performing cross-validation may result in a more informative tree. The size of a tree is the number of leaf nodes in its structure. For example, as shown in Fig. 1, we build a tree from dataset 1, perform cross-validation, and find that the minimum deviance occurs at tree size 12, with a prediction MSE of 1.12; Fig. 4 illustrates this. The cross-validation has also been conducted over multiple iterations; for example, for dataset 3 with 10 iterations of cross-validation, the mean error value was 0.9875.

To evaluate the performance of the generated trees, we divide each dataset into training and testing parts using the percentage splits mentioned earlier. We then plot the regression tree based on the training dataset and use the testing set to fit the generated model. If the plotted model has good predictive power, the MSE values for the training and testing sets should be very similar; if the MSE for the test set is much higher than that of the training set, there is a high probability that we have overfit the data. In that case, the model tests well in training but has little predictive value when applied to unseen records. We calculate the residual mean deviance of the training dataset (after the percentage split) using an R function, and then calculate the MSE values of the testing dataset in R. The formula of the MSE is shown below:
$MSE = \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(y_i) \big)^2$   (7)
Here, $y_i$ = actual value; $f(y_i)$ = predicted value; and n = number of observations. The MSE values are shown in Table 3; it is evident that the values are similar.

Table 3. Performance measure of the datasets.

            Residual mean deviance                             MSE
Test set    Training = 80%  Training = 70%  Training = 50%     Testing = 20%  Testing = 30%  Testing = 50%
1           1.188           1.230           1.333              0.876          0.905          1.118
2           1.275           1.263           1.134              1.307          1.326          1.419
3           2.445           2.599           2.475              2.631          2.626          2.658
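Continuing the earlier sketch, the following hypothetical R fragment shows how the quantities in Table 3 and the cross-validation curve of Fig. 4 can be obtained with the tree package; `fit`, `test`, and `nugget_width` are the placeholder names defined earlier.

```r
# Residual mean deviance on the training data (left half of Table 3)
summary(fit)

# Test-set MSE per Eq. (7) (right half of Table 3)
pred <- predict(fit, newdata = test)
mse  <- mean((test$nugget_width - pred)^2)

# Cross-validation over tree sizes, as plotted in Fig. 4: deviance is
# minimized at some size (e.g., 12 for dataset 1), to which the full
# tree can then be pruned.
cv_fit <- cv.tree(fit)
plot(cv_fit$size, cv_fit$dev, type = "b", xlab = "Size", ylab = "Deviance")
pruned <- prune.tree(fit, best = cv_fit$size[which.min(cv_fit$dev)])
```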
5. Conclusion

We applied the decision tree algorithm to an RSW dataset collected from an automotive OEM to plot regression trees and to extract decision rules. The decision trees help us determine the impact of each parameter on the response parameter, so that we can better understand the relative importance of each design and process parameter. When new (or new combinations of) materials are considered for an assembly, industry often requires new physical tests or numerical simulations (e.g., finite element analysis). A data-driven predictive modeling tool can instead be used to effectively analyze and predict the response parameter (i.e., nugget width). The presented approach generates a prediction model that can recommend potentially desirable weldment designs by predicting their welding quality. One of the most significant advantages of this approach is that engineers can avoid conducting physical welding tests on many candidate designs, focusing instead on the recommendations provided by the prediction model. Thus, if we can create an RSW prediction model that accurately predicts a response parameter such as the nugget width for a given weldment design, it will support designers in making better weldment design decisions.

The authors plan to test other machine learning techniques on the same dataset and to develop an RSW prediction platform based on different data mining techniques. We will also consider other response parameters
(e.g., expulsion and the mechanical properties of the weld). The decision tree algorithm is described in this paper, whereas the application of other machine learning algorithms will be described in subsequent papers. To enable machine-interoperable weldability knowledge sharing, the authors also intend to develop a semantic framework that works with a welding ontology. The decision rules generated from this research will be converted into semantic rules to help collaborative designers make better decisions regarding RSW weldability.
Acknowledgement This research is funded by DMDII Project: VRWP: Virtually Guided RSW Weldability Prediction (DMDII-1507-04).
References

[1] O. Andersson, A. Melander, Prediction and verification of resistance spot welding results of ultra-high strength steels through FE simulations, Modeling and Numerical Simulation of Material Science, 5 (1) (2015) 26-37.
[2] L. Breiman, J. Friedman, C.J. Stone, R.A. Olshen, Classification and Regression Trees, Wadsworth International Group, 1984.
[3] F. Chen, P. Deng, J. Wan, D. Zhang, A.V. Vasilakos, et al., Data mining for the internet of things: literature review and challenges, International Journal of Distributed Sensor Networks, 11 (8) (2015).
[4] S. Ghosal, S. Chaki, Estimation and optimization of depth of penetration in hybrid CO2 LASER-MIG welding using ANN-optimization hybrid model, Int. J. Adv. Manuf. Technol., 47 (9-12) (2010) 1149-1157.
[5] I.S. Kim, Y. Jeong, C. Lee, P. Yarlagadda, Prediction of welding parameters for pipeline welding using an intelligent system, Int. J. Adv. Manuf. Technol., 22 (9-10) (2003) 713-719.
[6] K.Y. Kim, J. Park, R. Sohmshetty, Prediction measurement with mean acceptable error for proper inconsistency in noisy weldability prediction data, Robotics and Computer-Integrated Manufacturing, 43 (2017) 18-29.
[7] A. Kusiak, C. Kurasek, Data mining analysis of printed-circuit board defects, IEEE Trans. Robot. Autom., 17 (2) (2001) 191-196.
[8] A. Kusiak, Data mining: manufacturing and service applications, International Journal of Production Research, 44 (18-19) (2006) 4175-4191.
[9] Ó. Martín, M. Pereda, J.I. Santos, J.M. Galán, Assessment of resistance spot welding quality based on ultrasonic testing and tree-based techniques, J. Mater. Process. Technol., 214 (11) (2014) 2478-2487.
[10] A. El Ouafi, R. Bélanger, J. Méthot, Artificial neural network-based resistance spot welding quality assessment system, Revue de Métallurgie, 108 (6) (2011) 343-355.
[11] S. Pal, S.K. Pal, A.K. Samantaray, Artificial neural network modeling of weld joint strength prediction of a pulsed metal inert gas welding process using arc signals, Journal of Materials Processing Technology, 202 (1-3) (2008) 464-474.
[12] J. Park, K.Y. Kim, Fitness and constraint function approximation using meta-modeling for design optimization, in: 2015 Industrial and Systems Engineering Research Conference (ISERC 2015), Nashville, TN, 2015.
[13] J. Park, K.Y. Kim, R. Sohmshetty, A prediction modeling framework: toward integration of noisy manufacturing data and product design, in: ASME 2015 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Boston, MA, USA, 2015.
[14] J. Park, K.Y. Kim, Prediction modeling framework with bootstrap aggregating for noisy resistance spot welding data, ASME Journal of Manufacturing Science and Engineering, 2015 (in preparation).
[15] J. Park, K.Y. Kim, R. Sohmshetty, Towards proper-inconsistency in weldability prediction using k-nearest neighbor regression and generalized regression neural network with mean acceptable error, in: 24th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM 2014), San Antonio, TX, 2014.
[16] J. Park, K.Y. Kim, Instance variant nearest neighbor using particle swarm optimization for function approximation, Applied Soft Computing, 40 (2016) 331-341.
[17] F. Provost, T. Fawcett, Data science and its relationship to big data and data-driven decision making, Big Data, 1 (1) (2013) 51-59.
[18] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[19] Y. Reich, A. Kapeliuk, A framework for organizing the space of decision problems with application to solving subjective, context-dependent problems, Decision Support Systems, 41 (1) (2005) 1-19.
[20] R. Sohmshetty, R. Ramachandra, T. Coon, K. Choi, K.Y. Kim, Weldability prediction of AHSS stackups using artificial neural network models, SAE Technical Paper, SAE World Congress, 2012.
[21] A. Sumesh, K. Rameshkumar, K. Mohandas, R. Shyam Babu, Use of machine learning algorithms for weld quality monitoring using acoustic signature, Procedia Computer Science, 50 (2015) 316-322.
[22] J. Sun, C.K. Reddy, Big data analytics for healthcare, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2013) 1525-1525.
[23] G. Xu, J. Wen, C. Wang, X. Zhang, Quality monitoring for resistance spot welding using dynamic signals, in: IEEE International Conference on Mechatronics and Automation, (2009) 2495-2499.
[24] K. Yang, J. Trewn, Multivariate Statistical Methods in Quality Management, McGraw-Hill, New York, 2004.
[25] Y. Zhu, J. He, Co-clustering structural temporal data with applications to semiconductor manufacturing, ACM Trans. Knowl. Discov. Data, 10 (4) (2016).
[26] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Science & Business Media, 2009.
[27] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey, ACM Computing Surveys (CSUR), 41 (1) (2009) 15.
[28] J.A. Olvera-López, J.A. Carrasco-Ochoa, J.F. Martínez-Trinidad, J. Kittler, A review of instance selection methods, Artificial Intelligence Review, 34 (2) (2010) 133-143.
[29] W.Y. Loh, Classification and regression trees, WIREs Data Mining and Knowledge Discovery, 1 (2011) 14-23.
[30] H. Liu, H. Motoda (Eds.), Computational Methods of Feature Selection, Chapman and Hall/CRC, Boca Raton, 2007.
[31] G. Biau, E. Scornet, A random forest guided tour, TEST, 25 (2016) 197.
[32] A. Liaw, M. Wiener, Classification and regression by randomForest, R News, 2 (3) (2002) 18-22.