Hindawi Publishing Corporation Advances in Mechanical Engineering Volume 2014, Article ID 791242, 16 pages http://dx.doi.org/10.1155/2014/791242
Research Article
Modelling an EDM Process Using Multilayer Perceptron Network, RSM, and High-Order Polynomial
Khalid A. Al-Ghamdi (1) and Elaine Aspinwall (2)
(1) Department of Industrial Engineering, King Abdulaziz University, P.O. Box 80204, Jeddah 21589, Saudi Arabia
(2) School of Mechanical Engineering, University of Birmingham, Birmingham B15 2TT, UK
Correspondence should be addressed to Khalid A. Al-Ghamdi; [email protected]
Received 11 May 2014; Revised 27 July 2014; Accepted 25 August 2014; Published 27 October 2014
Academic Editor: Min Zhang
Copyright © 2014 K. A. Al-Ghamdi and E. Aspinwall. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Owing to the inadequacy of modelling electrical discharge machining (EDM) processes based on physical laws, three empirical modelling methods have been adopted in this study, namely, multilayer perceptron (MLP), response surface models (RSM), and high-order polynomials (HOP). To date, no publications regarding the use of the latter approach were found in the literature for modelling EDM processes; there were however some related to the approximation quality of RSM versus that of MLP networks but no investigations assessing the performance of the latter method against HOP. This study attempts to fill this gap by comparing the performance of the three methods mentioned above when modelling an EDM process with a WC-Co workpiece material. Three models were developed to correlate the material removal rate (MRR) with current, on-time, off-time, and capacitance. The half-normal plot and analysis of variance were used to test the significance of the investigated parameters. Due to the complex interdependence pattern that current and pulse on-time exhibited, the approximation of RSM was poor while that of the HOP and MLP models was adequate. A confirmation run based on new factor levels was performed to test the models' generalization. The performance of the HOP model was marginally inferior to that of the MLP, but based on the paired t-test, both models performed equally well.
1. Introduction

EDM is one of the most widely known and used nonconventional machining processes [1]. Its theory was proposed by two Soviet scientists in the middle of the 1940s [2], and the process is very effective in machining materials of any hardness provided they are electrical conductors [3, 4]. A major advantage of the process is its electrothermal nature, which eliminates the need for any direct contact between the electrode and workpiece during machining, thereby eradicating mechanical stress, chatter, tool deformation, and vibration problems [5]. It involves a series of rapid, repetitive, and randomly distributed discrete electric sparks initiated using a pulse generator and applied within a constant spark gap between the tool and workpiece. The spark causes ionisation of the dielectric fluid at a certain voltage and creates an ionised plasma channel which acts as the heat source. Associated with this channel
is a large thermal flux which causes the surface temperature to rise above the liquidus temperature. Consequently, metal is removed by melting and vaporisation. The spark is employed for a certain duration followed by a similar period during which deionisation of the dielectric occurs and the gap is flushed of debris [6]. The stability of the process is maintained by a servo-controlled mechanism whereby a constant gap between the tool and the workpiece is sustained whilst impressing a preselected shape on the latter [7, 8]. The EDM process can be very effective for machining cemented carbide composites like cobalt-bonded tungsten carbide (WC-Co) which possesses excellent mechanical properties such as high strength, toughness, and hardness [9]. While conventionally used as a cutting tool material, the scope of WC-Co applications has been extended to incorporate molding material in metal forming, forging, squeeze casting, high-pressure die casting, and other applications
where hardness and wear resistance are critical requirements. The difficulties and high cost associated with machining WC-Co using conventional processes are major barriers to its wider use [10], but being conductive, it can be machined using EDM [11]. The material removal mechanism associated with this process is however quite complex. Certain studies [12–14] have reported a remarkable stochastic behaviour of the MRR associated with EDM. This can be attributed to the random physical conditions observed at the machining zone and the irregularity of the electrical parameters of erosive impulses [14]. Consequently, the ability to establish a scientifically admissible theory correlating the process variables with the MRR is impaired and the task is predominantly accomplished by means of empirical or semiempirical modelling methods. Broadly, two approaches are adopted, namely, the use of conventional statistical modelling approaches such as the RSM and the application of soft computing algorithms such as artificial neural networks (ANN) and adaptive neurofuzzy inference systems (ANFIS). RSM, for example, was employed to relate the EDM parameters mathematically with the process key performance measures when machining Ti-6Al-4V [15], γ-TiAl [16], siliconised silicon carbide [17], Al-4Cu-6Si alloy-10 wt.% SiCP composites [18], DIN 1.2714 steel [19], WC-Co [20–22], and carbon steel (EN8) [23]. ANN techniques, on the other hand, have been used to model and predict the main EDM performance characteristics in the case of machining aluminium and iron [24], WC-Co [25], and various steel grades [26–31], while ANFIS has been used to model the wire electrodischarge machining (WEDM) of both AISI D5 [32] and AISI D2 steel [31].

The study in this paper centred on modelling the MRR associated with the EDM of WC-Co as a function of pulse current, pulse on-time, pulse off-time, and capacitance. Three approaches were adopted, namely, feed-forward MLP neural network, RSM, and HOP.
To date, the latter has not been used to model the EDM process even though, together with ANN, this class of models is very powerful for approximating nonlinear functional relationships. Paliwal and Kumar [33] conducted a comprehensive review of articles that involved a comparative study of ANN and statistical modelling techniques used for prediction and classification problems in a range of applications. While some of the cited studies compared the performance of ANN with that of RSM and classical regression models, none compared ANN with HOP. To address this aspect in this study, the predictive capabilities of the MLP networks, RSM, and HOP were compared using the mean square error (MSE) and the residuals' ranges when modelling the EDM process described earlier. The methodology adopted in this study is discussed first. This is followed by a description of the experimental procedures. The results are then presented together with their analysis and discussion. The paper culminates with the main conclusions.
2. Methodology

Design of experiments (DoE) is a powerful approach for exploring, understanding, and empirically modelling the cause-and-effect relationship between process parameters and the main performance characteristics [34]. Generally a DoE study is performed in three stages: planning, conducting, and analysis and interpretation [35]. The planning stage involves (i) recognizing the problem or the improvement opportunity, (ii) stating the objectives, (iii) selecting the performance measure(s) and the measurement system(s), (iv) determining the factors that might influence the chosen performance measure(s), (v) choosing the levels for the factors, (vi) finding an appropriate design, and (vii) assigning the factors to the columns of the selected design. Having accomplished these tasks, the next stage is to carry out the experiment as planned. Finally the results obtained are analysed and interpreted. In this study, the empirical modelling task of the analysis and interpretation stage was performed using MLP networks, RSM, and HOP. A brief description of each of these methods follows.

2.1. Multilayer Perceptron Networks. A neural network is a structure composed of a large number of simple processors (neurons) that are extensively interconnected, operate in parallel, and have the ability to learn complex nonlinear and multivariable relationships between a process's input and output data. MLP (one of the most useful neural networks in function approximation) consists of neurons in an input layer, one or more hidden layers, and an output layer [36]. The numbers of neurons in the input and output layers are determined from the number of studied input parameters and output response variables, respectively. However, the decision regarding the number of hidden layers and the neurons in each layer should be made on the basis of the complexity of the studied system.

Figure 1: Schematic representation of a neuron. The inputs $x_1, x_2, \ldots, x_s$ are multiplied by the weights $w_{1j}, w_{2j}, \ldots, w_{sj}$ and summed with the bias $\theta_j$ to give $O_j = \sum_{i=1}^{s} w_{ij} x_i + \theta_j$; the output is $y_j = f(O_j)$.
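The layered structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network used in the study: the function names (`layer_forward`, `mlp_forward`), the single hidden layer of three neurons, the sigmoid hidden activation with an identity output, and all weight, bias, and input values are hypothetical choices made only to show how four inputs (current, on-time, off-time, and capacitance) map to one output (MRR).

```python
import math


def sigmoid(z):
    """Sigmoidal activation (assumed here for the hidden layer)."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    """Linear (identity) activation (assumed here for the output layer)."""
    return z


def layer_forward(inputs, weights, biases, activation):
    """Compute the outputs of one layer.

    weights[j][i] is the weight linking input i to neuron j;
    biases[j] is the bias of neuron j.
    """
    outputs = []
    for w_j, b_j in zip(weights, biases):
        # Weighted sum of the inputs plus the bias of neuron j.
        o_j = sum(w * x for w, x in zip(w_j, inputs)) + b_j
        outputs.append(activation(o_j))
    return outputs


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input layer -> one hidden layer -> output layer."""
    hidden = layer_forward(x, hidden_w, hidden_b, sigmoid)
    return layer_forward(hidden, out_w, out_b, identity)


# Hypothetical, already-scaled inputs: current, on-time, off-time, capacitance.
x = [0.5, -1.0, 0.0, 1.0]

# Hypothetical weights for 3 hidden neurons (4 inputs each) and 1 output neuron.
hidden_w = [
    [0.2, -0.1, 0.0, 0.3],
    [-0.4, 0.1, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
hidden_b = [0.0, 0.1, 0.0]
out_w = [[1.0, -1.0, 2.0]]
out_b = [0.5]

# One predicted (untrained, purely illustrative) MRR value in a list.
mrr_pred = mlp_forward(x, hidden_w, hidden_b, out_w, out_b)
print(mrr_pred)
```

The input and output layer sizes follow directly from the four studied parameters and the single response, whereas the hidden layer size is the free design decision; changing it only requires changing the shapes of `hidden_w`, `hidden_b`, and `out_w`.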
A schematic representation of a neuron is given in Figure 1, which shows how the input parameters ($x_i$) are multiplied by the synaptic (link) weights $w_{ij}$ and summed together with the constant bias term $\theta_j$ to yield the variable $O_j$, which is then input to the activation function $f()$. The output variable $y_j$ is the outcome of substituting the variable $O_j$ in the activation function $f()$. It can be fed into other neurons, in which case it is regarded as an input variable and treated in the same way as $x_i$ was dealt with. It can also represent the predicted value of the response under study if the modelled neuron is one of those of the output layer [37]. The activation function is used to calculate the output of a neuron based on the sum of the weighted input signals. A common linear activation function is the identity while
the sigmoidal is the most widely used nonlinear function in the MLP networks [38]. To quantify the difference between the desired and the MLP network outputs, it is common to compute the error ($E$) function as follows:

$$E = \sum_{i=1}^{N} \sum_{k=1}^{K} \left( y_{ik}^{\mathrm{pred}} - y_{ik}^{\mathrm{act}} \right)^{2}, \tag{1}$$

where $y_{ik}^{\mathrm{pred}}$ is the $i$th predicted response value of the $k$th output variable obtained using the constructed MLP network and $y_{ik}^{\mathrm{act}}$ is the $i$th actual experimental value of the $k$th output response.

The training of a network can be thought of as an optimisation problem in which the optimal values of the weights ($w_{ij}$) and bias ($\theta_j$) in Figure 1 that minimise the error function $E$ are sought. A frequently used training algorithm is the gradient descent back-propagation, in which the weights and bias values are updated using a first-order optimisation procedure that employs a steepest gradient descent technique [38]. Despite its popularity, it suffers from several limitations. In fact, the absence of reliable rules to determine the rate at which the step size (the amount by which variable values are changed) should be reduced during network training can lead to searching in a weight space region located far from the optimal one, rendering a very poor solution. Moreover, being reliant on first-order optimisation, the gradient descent training process is quite slow and can be trapped in local minima despite the use of a learning rate. To deal with these limitations, second-order nonlinear optimisation techniques should be used to improve the training speed and reliability [39, 40]. In this regard, Levenberg-Marquardt is a powerful alternative back-propagation training algorithm that employs an approximate Hessian matrix and updates the weights utilising a second-order derivative of the error vector as follows:

$$\Delta \mathbf{W} = \left( \mathbf{H} + \mu \mathbf{I} \right)^{-1} \mathbf{J}^{T} \mathbf{e}, \tag{2}$$
where $\mathbf{H} = \mathbf{J}^{T} \mathbf{J}$ is the (approximate) Hessian matrix of the error vector, $\mathbf{J} = \nabla E$ is the Jacobian matrix of derivatives of each error to each weight, $\mathbf{J}^{T}$ is its transpose, $\mathbf{I}$ is the identity matrix, and $\mathbf{e}$ is an error vector. The term $\mu$ is called the damping factor, which is used to control the learning process. Its value is adjusted at each training iteration based on the error value (see (1)). In fact, when the reduction in error is rapid, a small $\mu$ value is used, leading the Levenberg-Marquardt algorithm to be an approximation of the Gauss-Newton method, which is fast and reliable in finding optimal solutions. However, if an iteration renders an insufficient reduction in the error function, the value of $\mu$ increases, bringing the algorithm closer to the gradient descent algorithm. Considering the latter's inferior performance compared with that of the Gauss-Newton method, it is better to decrease the scalar $\mu$ after each search algorithm step. Indeed, increasing $\mu$ should be avoided unless an increase in the error associated with the reached step is noticed [37].

In summary, for a given number of input parameters and output variables, the use of the MLP network entails making
a decision about the number of hidden layers and that of neurons in each layer. It also involves selecting an appropriate activation function and a training algorithm.

2.2. Response Surface Models. RSM is a body of mathematical and statistical techniques useful for the modelling and analysis of problems in which one or more performance measures (response variables) are suspected of being affected by several process variables [41]. The form of the relationship between the response and the process variables is unknown but can be thought of as a functional mathematical relationship:

$$Y_i = f(X_1, X_2, \ldots, X_k) + \varepsilon, \tag{3}$$

where $Y_i$ is the response and $X_1, X_2, \ldots, X_k$ are the process variables. The term $\varepsilon$ is a surrogate for all those variables that are omitted from the model but collectively affect $Y_i$. Therefore, the primary task in response modelling is to find a reasonable approximation of the "true" functional relationship that underpins the process under study.

Polynomial models are very effective for approximating continuous curves and surfaces. They are developed using a Taylor series expansion of terms involving the studied factors and their interactions. Generally, a polynomial model is said to be of order $n$ if this is the power of one or more of its factor terms or if $n$ is the sum of the powers of the factors involved in one of its interaction terms. Typically, an RSM study involves fitting an initial first-order polynomial model. If this is found to be insufficient for modelling the response over the investigated region, a second-order polynomial model is fitted. In fact, polynomials of second order are the most widely used functions in RSM [42]. Indeed, most of the designs such as the central composite design (CCD) and Box-Behnken are suitable for fitting second or lower order models. For $k$ factors, a second-order model comprises $(k + 1)(k + 2)/2$ terms and takes the following form:
π
π
π=1
π=1
π