Int. J. Appl. Ceram. Technol., 10 [S1] E240–E246 (2013) DOI:10.1111/j.1744-7402.2012.02810.x
Support Vector Machine and Relevance Vector Machine for Prediction of Alumina and Pore Volume Fraction in Bioceramics

Kangeyanallore Govindaswamy Shanmugam Gopinath
Mechanical Engineering Department, Jaya Suriya Engineering College, Anna University, Chennai 600055, India

Soumen Pal,* Pijush Samui, and Bimal Kumar Sarkar
Manufacturing Division, School of Mechanical and Building Sciences, VIT University, Vellore 632014, Tamil Nadu, India
Centre for Disaster Mitigation and Management, VIT University, Vellore 632014, Tamil Nadu, India
Nuclear and Medical Physics Division, School of Advanced Sciences, VIT University, Vellore 632014, Tamil Nadu, India
The determination of wt% alumina (wa) and pore volume fraction (pv) in alumina-based bioceramics is important in ceramic engineering. This article adopts the support vector machine (SVM) and the relevance vector machine (RVM) for prediction of wa and pv based on the SiC content. SVM is firmly grounded in statistical learning theory, whereas RVM is based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. The developed SVM and RVM yield explicit equations for prediction of wa and pv, and both are shown to be robust models for this task.
*[email protected]
© 2012 The American Ceramic Society
Introduction

Biocompatible ceramics have found extensive use in the human body over the past few decades. They are primarily used to replace or reconstruct damaged bones and tissues because of their biocompatibility and bioactivity with respect to the host tissue.1 In recent years, research on biocompatible ceramics has focused on the fabrication of bioceramics with a porous configuration, because the porous network allows tissue to infiltrate, which further enhances implant-tissue attachment.2 Hydroxyapatite ceramics, in a porous form, can be infiltrated by bone tissue with the same characteristics as peri-implant tissue.3 An important requirement for pore colonization is that the pores must be larger than 50–100 μm,4 or even 250–300 μm according to Klawiter et al.5

A number of techniques have been used to generate pores in a ceramic body. The main principle of these methods is that a combustible organic material added to the body burns away during firing, leaving free spaces and voids in the final body. However, these methods are not very satisfactory because it is difficult to obtain a homogeneous distribution of pores, particularly when the organic powders reside in discrete pockets after mixing with the ceramic powders.6

Alumina-based biocompatible ceramic bodies have been used extensively in medicine. Key parameters of these bodies are the wt% alumina present (wa) and the pore volume fraction (pv), which play a vital role in their performance in the human body. The determination of these parameters is therefore an important task in ceramic engineering. Usually, experimental methods have been used to determine them; however, such methods are often time consuming and expensive. Altinkok and Koker7 successfully applied an artificial neural network (ANN) for prediction of wa and pv based on experimental data. However, ANN has several limitations, such as its black-box nature, convergence to local minima, overfitting, and low generalization capability.

This article adopts the support vector machine (SVM) and the relevance vector machine (RVM) for prediction of wa and pv based on the SiC content (ws) in the composition. SVM is constructed on the basis of statistical learning theory.8,9 It has been used to solve a variety of engineering problems.10–12 RVM is a probabilistic version of SVM.13 Researchers have successfully adopted RVM for solving different problems in engineering.14–16 This study uses the database collected by Altinkok and Koker7 and has the following aims:

• To examine the capability of SVM and RVM for prediction of wa and pv.
• To develop equations based on the developed SVM and RVM.
• To carry out a comparative study between the developed SVM and RVM.
Explanation of Models Used

Details of SVM

This section describes the methodology of the SVM. Consider the following dataset:

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}, \quad x \in R^N, \; y \in R \tag{1}$$
where $x$ is the input, $y$ is the output, $R^N$ is an $N$-dimensional vector space, and $R$ is a one-dimensional vector space. In this study, the input variable is ws and the output variables are wa and pv, so $x = [w_s]$ and $y = [w_a, p_v]$. The $\epsilon$-insensitive loss function can be described in the following way:

$$L_\epsilon(y) = \begin{cases} 0 & \text{for } |f(x) - y| < \epsilon \\ |f(x) - y| - \epsilon & \text{otherwise} \end{cases} \tag{2}$$
where $\epsilon$ is the error-insensitive zone. Equation (2) defines an $\epsilon$-tube: if the predicted value is within the tube, the loss is zero; if it is outside the tube, the loss equals the absolute value of the deviation minus $\epsilon$. The main aim in SVM is to find a function $f(x)$ that deviates from the actual output by at most $\epsilon$ and, at the same time, is as flat as possible. Let us assume a linear function

$$f(x) = (w \cdot x) + b, \quad w \in R^N, \; b \in R \tag{3}$$
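The $\epsilon$-insensitive loss of Eq. (2) is easy to state in code; the following is a minimal Python sketch (the paper itself used MATLAB throughout):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    """Eq. (2): zero inside the eps-tube, absolute deviation minus eps outside."""
    residual = np.abs(y_pred - y_true)
    return np.where(residual < eps, 0.0, residual - eps)

# With eps = 0.1, a deviation of 0.05 costs nothing; 0.25 costs 0.15.
print(eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.25]), 0.1))
```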
where $w$ is an adjustable weight vector and $b$ is the scalar threshold. Flatness in the case of Eq. (3) means that one seeks a small $w$. One way of obtaining this is by minimizing the Euclidean norm $\|w\|^2$. This is equivalent to the following convex optimization problem:
$$\begin{aligned}
\text{Minimize: } & \frac{1}{2}\|w\|^2 \\
\text{Subject to: } & y_i - (\langle w \cdot x_i \rangle + b) \le \epsilon, \quad i = 1, 2, \ldots, l \\
& (\langle w \cdot x_i \rangle + b) - y_i \le \epsilon, \quad i = 1, 2, \ldots, l
\end{aligned} \tag{4}$$
The above convex optimization problem is feasible only when a function $f$ actually exists that approximates all pairs $(x_i, y_i)$ with precision $\epsilon$; this may not always be the case, and we may also wish to allow some errors. Analogously to the "soft margin" loss function17 used in SVM by Cortes and Vapnik,18 slack variables $\xi_i$, $\xi_i^*$ are introduced; they determine the degree to which samples with error larger than $\epsilon$ are penalized. Any error smaller than $\epsilon$ does not require a nonzero $\xi_i$, $\xi_i^*$ and hence does not enter the objective function, because such data points have zero loss. The slack variables ($\xi_i$, $\xi_i^*$) have been introduced to avoid infeasible constraints in the optimization problem [Eq. (4)]:
$$\begin{aligned}
\text{Minimize: } & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \left(\xi_i + \xi_i^*\right) \\
\text{Subject to: } & y_i - (\langle w \cdot x_i \rangle + b) \le \epsilon + \xi_i, \quad i = 1, 2, \ldots, l \\
& (\langle w \cdot x_i \rangle + b) - y_i \le \epsilon + \xi_i^*, \quad i = 1, 2, \ldots, l \\
& \xi_i \ge 0 \text{ and } \xi_i^* \ge 0, \quad i = 1, 2, \ldots, l
\end{aligned} \tag{5}$$
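Equation (5) is the objective optimized by standard SVR solvers. As a minimal sketch, scikit-learn's LinearSVR minimizes this soft-margin, $\epsilon$-insensitive objective; the data below are made up for illustration and are not the paper's dataset:

```python
import numpy as np
from sklearn.svm import LinearSVR

# Made-up 1-D regression data standing in for normalized (ws, wa) pairs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(12, 1))
y = 0.9 - 0.4 * X.ravel() + 0.01 * rng.standard_normal(12)

# LinearSVR minimizes 0.5*||w||^2 + C * sum(xi + xi*) subject to the
# epsilon-tube constraints of Eq. (5).
model = LinearSVR(epsilon=0.01, C=200.0, loss="epsilon_insensitive", max_iter=10000)
model.fit(X, y)
print(model.coef_, model.intercept_)
```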
The constant $C$, $0 < C < \infty$, determines the trade-off between the flatness of $f(x)$ and the amount up to which deviations larger than $\epsilon$ are tolerated.19 This optimization problem [Eq. (5)] is solved by the method of Lagrange multipliers,8 and its solution is given by

$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^*)(x_i \cdot x) + b \tag{6}$$
where $b = -\frac{1}{2}\, w \cdot [x_r + x_s]$, $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers, and $n_{sv}$ is the number of support vectors. An important aspect is that some Lagrange multipliers ($\alpha_i$, $\alpha_i^*$) will be zero, implying that the corresponding training objects are irrelevant to the final solution (sparseness). The training objects with nonzero Lagrange multipliers are called support vectors. When linear regression is not appropriate, the input data have to be mapped into a high-dimensional feature space through some nonlinear mapping.20 Two steps are involved: first, a fixed nonlinear mapping of the data onto the feature space, and second, a linear regression in that high-dimensional space. The input data are mapped onto the feature space by a map $\Phi$. The dot product
$\Phi(x_i) \cdot \Phi(x_j)$ is computed as a linear combination of the training points. The concept of a kernel function, $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$, has been introduced to reduce the computational demand.18,21 So, Eq. (6) can be written as

$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b \tag{7}$$
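To make Eq. (7) concrete, the following is a minimal Python sketch (illustrative only; the paper's models were built in MATLAB) of an RBF-kernel prediction of the form of Eq. (7), with hypothetical support vectors and coefficients:

```python
import numpy as np

def rbf_kernel(xi, x, sigma):
    """RBF kernel K(xi, x) = exp(-||xi - x||^2 / (2 sigma^2))."""
    d = np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, coeffs, b, sigma):
    """Eq. (7): f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.
    `coeffs` holds the differences (alpha_i - alpha_i*)."""
    return sum(c * rbf_kernel(xi, x, sigma)
               for xi, c in zip(support_vectors, coeffs)) + b

# Hypothetical example: two support vectors in a 1-D input space (ws).
sv = [[0.2], [0.8]]          # normalized SiC contents (made up)
coeffs = [0.5, -0.3]         # (alpha_i - alpha_i*) values (made up)
print(svr_predict([0.5], sv, coeffs, b=0.1, sigma=0.25))
```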
Among the common kernels (homogeneous polynomial, nonhomogeneous polynomial, radial basis function, Gaussian, sigmoid, etc.), the radial basis function is used for nonlinear cases. In this work, we use the above SVM model for prediction of wa and pv. The data have been divided into two subsets: a training dataset used to construct the model, and a testing dataset used to estimate the model performance. For this study, a set of 12 data points is taken as the training dataset and the remaining six data points as the testing dataset. The data are scaled between 0 and 1 by normalization. This study uses the radial basis function

$$K(x_i, x) = \exp\left\{-\frac{(x_i - x)(x_i - x)^T}{2\sigma^2}\right\}$$

where $\sigma$ is the width of the radial basis function, as the kernel function. The design values of $C$, $\epsilon$, and $\sigma$ have been determined during the training of the SVM. The SVM model has been developed in MATLAB.

Details of RVM

The RVM, introduced by Tipping,13 is a sparse linear model. Let $D = \{(x_i, t_i);\ i = 1, \ldots, N\}$ be a dataset of observed values, where $x_i$ is the input, $t_i$ is the output, $x_i \in R^N$, and $t_i \in R$. In this study, the input parameter is ws, so $x = [w_s]$; the outputs of the RVM model are wa and pv, so $t = [w_a, p_v]$. One can express the output as the sum of an approximation vector $y = (y(x_1), \ldots, y(x_N))^T$ and a zero-mean random error (noise) vector $e = (e_1, \ldots, e_N)^T$ with $e_n \sim N(0, \sigma^2)$, where $N(0, \sigma^2)$ is the normal distribution with zero mean and variance $\sigma^2$. The output can thus be written as

$$t_n = y(x_n; \omega) + e_n \tag{8}$$

where $\omega$ is the parameter vector. Let us assume

$$p(t_n \mid x) \sim N(y(x_n), \sigma^2) \tag{9}$$

where $N(y(x_n), \sigma^2)$ is the normal distribution with mean $y(x_n)$ and variance $\sigma^2$.
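Equations (8) and (9) encode the generative assumption that each observed target is a noiseless model output plus zero-mean Gaussian noise; a tiny NumPy illustration with a hypothetical y(x) (synthetic values, for intuition only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 12)
y_clean = 0.9 - 0.4 * x                 # hypothetical y(x_n; omega)
sigma = 0.02                            # noise standard deviation
t = y_clean + rng.normal(0.0, sigma, size=x.shape)  # Eq. (8): t_n = y(x_n) + e_n
print(t.round(3))
```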
This $y(x)$ can be expressed as a linearly weighted sum of $M$ nonlinear fixed basis functions $\Phi_j(x)$, $j = 1, \ldots, M$:

$$y(x; \omega) = \sum_{i=1}^{M} \omega_i \Phi_i(x) = \Phi\omega \tag{10}$$

The likelihood of the complete dataset can be written as

$$p(t \mid \omega, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left(-\frac{1}{2\sigma^2}\|t - \Phi\omega\|^2\right) \tag{11}$$

where $t = (t_1, \ldots, t_N)^T$, $\omega = (\omega_0, \ldots, \omega_N)^T$, and the design matrix is

$$\Phi = \begin{bmatrix} 1 & K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_N) \\ 1 & K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_N) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & K(x_N, x_1) & K(x_N, x_2) & \cdots & K(x_N, x_N) \end{bmatrix}$$

where $K(x_i, x_j)$ is a kernel function. To prevent overfitting, an automatic relevance determination (ARD) prior is set over the weights $\omega$:

$$p(\omega \mid \alpha) = \prod_{i=0}^{N} N\left(\omega_i \mid 0, \alpha_i^{-1}\right) \tag{12}$$

where $\alpha$ is a hyperparameter vector that controls how far from zero each weight is allowed to deviate.13 Consequently, using Bayes' rule, the posterior over all unknowns can be computed given the defined noninformative prior distributions:

$$p(\omega, \alpha, \sigma^2 \mid t) = \frac{p(t \mid \omega, \alpha, \sigma^2)\; p(\omega, \alpha, \sigma^2)}{\int p(t \mid \omega, \alpha, \sigma^2)\; p(\omega, \alpha, \sigma^2)\; \mathrm{d}\omega\, \mathrm{d}\alpha\, \mathrm{d}\sigma^2} \tag{13}$$

A full analytical solution of the integral in Eq. (13) is intractable. The decomposition $p(\omega, \alpha, \sigma^2 \mid t) = p(\omega \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t)$ is therefore used to facilitate the solution.13 The posterior distribution over the weights is given by

$$p(\omega \mid t, \alpha, \sigma^2) = \frac{p(t \mid \omega, \sigma^2)\; p(\omega \mid \alpha)}{p(t \mid \alpha, \sigma^2)} \tag{14}$$

The resulting posterior distribution over the weights is the multivariate Gaussian distribution

$$p(\omega \mid t, \alpha, \sigma^2) = N(\mu, \Sigma) \tag{15}$$

where the mean ($\mu$) and the covariance ($\Sigma$) are respectively given by

$$\mu = \sigma^{-2}\, \Sigma\, \Phi^T t \tag{16}$$

$$\Sigma = \left(\sigma^{-2}\Phi^T\Phi + A\right)^{-1} \tag{17}$$

with the diagonal matrix $A = \mathrm{diag}(\alpha_0, \ldots, \alpha_N)$. For uniform hyperpriors over $\alpha$ and $\sigma^2$, one needs only to maximize the term $p(t \mid \alpha, \sigma^2)$:

$$p(t \mid \alpha, \sigma^2) = \int p(t \mid \omega, \sigma^2)\; p(\omega \mid \alpha)\; \mathrm{d}\omega = (2\pi)^{-N/2} \left|\sigma^2 I + \Phi A^{-1}\Phi^T\right|^{-1/2} \exp\left(-\frac{1}{2}\, t^T \left(\sigma^2 I + \Phi A^{-1}\Phi^T\right)^{-1} t\right) \tag{18}$$

Maximization of this quantity is known as the type II maximum likelihood method22,23 or the "evidence for hyperparameters".24 Hyperparameter estimation is carried out with iterative formulae, for example, gradient descent on the objective function.13 The outcome of this optimization is that many elements of $\alpha$ go to infinity, so that $\omega$ retains only a few nonzero weights; the training points associated with these remaining weights are called the relevance vectors.

This article adopts the above RVM model for prediction of wa and pv. The same training dataset, testing dataset, normalization technique, and kernel function are used as in the SVM model. MATLAB has been used for developing the RVM model.
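The authors developed their RVM in MATLAB and do not publish code. Purely as a sketch of the evidence-maximization loop behind Eqs. (16)-(18), using Tipping's standard re-estimation updates on synthetic stand-in data, one might write:

```python
import numpy as np

def rbf(a, b, sigma):
    """RBF kernel matrix between 1-D sample vectors a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def rvm_fit(x, t, sigma=0.1, n_iter=500, alpha_cap=1e12):
    """Minimal RVM regression via type-II maximum likelihood.
    Iterates Eqs. (16)-(17) and re-estimates alpha and the noise variance,
    pruning weights whose alpha diverges. A sketch, not the paper's code."""
    N = len(x)
    Phi = np.hstack([np.ones((N, 1)), rbf(x, x, sigma)])   # design matrix
    alpha = np.ones(N + 1)
    noise_var = 0.1
    for _ in range(n_iter):
        A = np.diag(alpha)
        Sigma = np.linalg.inv(Phi.T @ Phi / noise_var + A)  # Eq. (17)
        mu = Sigma @ Phi.T @ t / noise_var                  # Eq. (16)
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, None)
        alpha = np.minimum(gamma / np.maximum(mu ** 2, 1e-30), alpha_cap)
        noise_var = np.sum((t - Phi @ mu) ** 2) / max(N - gamma.sum(), 1e-12)
    relevant = alpha < alpha_cap   # weights not pruned away
    return mu, relevant

# Hypothetical usage on synthetic 1-D data standing in for (ws, wa):
x = np.linspace(0.0, 1.0, 12)
t = 0.8 - 0.5 * x + 0.01 * np.random.default_rng(0).standard_normal(12)
mu, relevant = rvm_fit(x, t)
print("number of relevance vectors:", relevant.sum())
```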
Results and Discussion

The coefficient of correlation (CC) has been used to assess the performance of the SVM and RVM models. The value of CC is determined using the following equation:

$$CC = \frac{\sum_{i=1}^{n} (A_i - \bar{A})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (A_i - \bar{A})^2}\; \sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}} \tag{19}$$
where $A_i$ and $P_i$ are the actual and predicted values, respectively, and $\bar{A}$ and $\bar{P}$ are the means of the actual and predicted values over the $n$ patterns. For a robust model, the value of CC should be close to one.
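Equation (19) is the familiar Pearson correlation coefficient; a short Python sketch (not the authors' MATLAB code) makes the computation explicit:

```python
import numpy as np

def coefficient_of_correlation(actual, predicted):
    """Eq. (19): Pearson correlation between actual (A_i) and predicted (P_i)."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    da, dp = a - a.mean(), p - p.mean()
    return np.sum(da * dp) / (np.sqrt(np.sum(da ** 2)) * np.sqrt(np.sum(dp ** 2)))

# A perfect linear relationship gives CC = 1.
print(coefficient_of_correlation([1, 2, 3], [2, 4, 6]))  # -> 1.0
```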
SVM

In the SVM, the design values of $C$, $\epsilon$, and $\sigma$ have been determined by a trial-and-error approach. For prediction of wa, the design values of $C$, $\epsilon$, and $\sigma$ are 200, 0.009, and 0.25, respectively; the number of support vectors is 9. Figure 1 shows the performance on the training and testing datasets for prediction of wa, and it is observed from Fig. 1 that the value of CC is close to one. The developed SVM gives the following equation for prediction of wa:

$$w_a = \sum_{i=1}^{12} (\alpha_i - \alpha_i^*) \exp\left\{-\frac{(x_i - x)^T (x_i - x)}{0.125}\right\} \tag{20}$$

where the values of $(\alpha_i - \alpha_i^*)$ are given in Fig. 2.

For prediction of pv, the design values of $C$, $\epsilon$, and $\sigma$ are 200, 0.009, and 0.7, respectively; the number of support vectors is 12. Figure 3 illustrates the performance on the training and testing datasets for prediction of pv, and it can be seen from Fig. 3 that the value of CC is close to one. The following equation for prediction of pv has been obtained from the developed SVM:

$$p_v = \sum_{i=1}^{12} (\alpha_i - \alpha_i^*) \exp\left\{-\frac{(x_i - x)^T (x_i - x)}{0.98}\right\} \tag{21}$$

where the values of $(\alpha_i - \alpha_i^*)$ are given in Fig. 2.

Fig. 1. Performance of SVM model for prediction of wa.
Fig. 2. Values of $(\alpha_i - \alpha_i^*)$.
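As a usage sketch of the developed equations: once the coefficients $(\alpha_i - \alpha_i^*)$ are read from Fig. 2 and the normalized support vectors are known, Eq. (20) or (21) reduces to a short function. The coefficient and support-vector values below are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def predict_wa(x, support_vectors, coeffs):
    """Evaluate Eq. (20): wa = sum_i (alpha_i - alpha_i*) *
    exp(-(x_i - x)^T (x_i - x) / 0.125), an RBF expansion with
    2*sigma^2 = 0.125 (sigma = 0.25). For Eq. (21) replace 0.125 by 0.98."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for xi, c in zip(support_vectors, coeffs):
        d = np.asarray(xi, dtype=float) - x
        total += c * np.exp(-np.dot(d, d) / 0.125)
    return total

# Hypothetical placeholder inputs (the true coefficients are in Fig. 2):
sv = [[0.1], [0.4], [0.9]]
coeffs = [0.6, -0.2, 0.4]
print(predict_wa([0.5], sv, coeffs))
```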
RVM

For prediction of wa, the design value of $\sigma$ is 0.1 in the case of the RVM model, and the number of relevance vectors is 3. The performance on the training and testing datasets for prediction of wa is shown in Fig. 4, which confirms that the value of CC is close to one. To predict wa, the following equation has been developed based on the RVM:

$$w_a = \sum_{i=1}^{12} \omega_i \exp\left\{-\frac{(x_i - x)(x_i - x)^T}{0.02}\right\} \tag{22}$$

where Fig. 5 shows the values of $\omega_i$.

The design value of $\sigma$ is 0.2 for prediction of pv. Figure 6 illustrates the performance on the training and testing datasets for prediction of pv, and it can be seen from Fig. 6 that the value of CC is close to one. The developed RVM gives the following equation for prediction of pv:

$$p_v = \sum_{i=1}^{12} \omega_i \exp\left\{-\frac{(x_i - x)(x_i - x)^T}{0.08}\right\} \tag{23}$$

where Fig. 5 depicts the values of $\omega_i$.

Fig. 3. Performance of SVM model for prediction of pv.
Fig. 4. Performance of RVM model for prediction of wa.
Fig. 5. Values of $\omega_i$.
Fig. 6. Performance of RVM model for prediction of pv.
Fig. 7. Variance of training dataset for prediction of wa and pv.
Fig. 8. Variance of testing dataset for prediction of wa and pv.

Further, the developed RVM also gives the variance of the predicted output. Figures 7 and 8 show the variances of the predicted outputs for the training and testing datasets, respectively; the obtained variance can be used to assess prediction risk. For the testing dataset, the value of CC is higher for the RVM model than for the SVM model; therefore, the performance of RVM is better than that of SVM. Further, SVM uses three tuning parameters ($C$, $\epsilon$, and $\sigma$), whereas RVM uses only one ($\sigma$). Hence, RVM may be regarded as the better model.

Conclusions

This study describes SVM and RVM models for prediction of wa and pv. The developed SVM and RVM give encouraging results, and users can apply the developed equations to determine wa and pv in related SiC-containing ceramic systems. The developed RVM additionally provides probabilistic output. In summary, SVM and RVM can serve as robust tools for solving different problems in ceramic engineering, with RVM showing the better performance of the two in this study.
References

1. K. de Groot, "Bioceramics Consisting of Calcium Phosphate Salts," Biomaterials, 1 47–50 (1980).
2. E. W. White and E. C. Shors, "Biomaterial Aspects of Interpore 200 Porous Hydroxyapatite," Dental Clin. North Am., 30 49 (1986).
3. N. Passuti, G. Daculsi, J. M. Rogez, S. Martin, and J. V. Bainvel, "Macroporous Calcium Phosphate Ceramics Performance in Human Spine Fusion," Clin. Orthoped., 148 169–176 (1989).
4. J. C. Le Huec, T. Schaeverbeke, D. Clement, J. Faber, and A. Le Rebeller, "Influence of Porosity on the Mechanical Resistance of Hydroxyapatite Ceramics Under Compressive Stress," Biomaterials, 16 113–118 (1995).
5. J. J. Klawiter, J. G. Bagwell, A. M. Weinstein, B. W. Sauer, and J. R. Pruitt, "An Evaluation of Bone Growth into Porous High Density Polyethylene," J. Biomed. Mater. Res., 10 311–321 (1976).
6. N. Özgür, A. Engin, and T. Cüneyt, "Manufacture of Macroporous Calcium Hydroxyapatite Bioceramics," J. Eur. Ceram. Soc., 19 2569–2572 (1999).
7. N. Altinkok and R. Koker, "Mixture and Pore Volume Fraction Estimation in Al2O3/SiC Ceramic Cake Using Artificial Neural Network," Mater. Design, 26 305–311 (2005).
8. V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
9. V. N. Vapnik, "An Overview of Statistical Learning Theory," IEEE Trans. Neural Networks, 10 [5] 988–999 (1999).
10. G. Shi, Y. Zou, W. J. Li, Y. Jin, and P. Guan, "Towards Multi-Classification of Human Motions Using Micro IMU and SVM Training Process," Adv. Mater. Res., 60–61 189–193 (2009).
11. S. Sonavane and P. Chakrabarti, "Prediction of Active Site Cleft Using Support Vector Machines," J. Chem. Inform. Model., 50 [12] 2266–2273 (2010).
12. G. Jun, F.-L. Chung, and S. Wang, "Matrix Pattern Based Minimum Within-Class Scatter Support Vector Machines," Appl. Soft Comput. J., 11 [8] 5602–5610 (2011).
13. M. E. Tipping, "The Relevance Vector Machine," Adv. Neural Inform. Proc. Syst., 12 625–658 (2000).
14. J. Yuan, C.-L. Liu, and X. F. Zha, "Relevance Vector Machines Based Modelling and Optimisation for Collaborative Control Parameter Design: A Case Study," Int. J. Comput. Appl. Technol., 36 [3–4] 191–199 (2009).
15. W. Liying and W. Zhao, "Forecasting Groundwater Level Based on Relevance Vector Machine," Adv. Mater. Res., 121–122 43–47 (2010).
16. C. A. M. Lima and A. L. V. Coelho, "Kernel Machines for Epilepsy Diagnosis via EEG Signal Classification: A Comparative Study," Artif. Intell. Med., 53 [2] 83–95 (2011).
17. K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets," Optim. Methods Softw., 1 23–34 (1992).
18. C. Cortes and V. N. Vapnik, "Support-Vector Networks," Mach. Learn., 20 [3] 273–297 (1995).
19. A. J. Smola and B. Schölkopf, "A Tutorial on Support Vector Regression," Stat. Comput., 14 199–222 (2004).
20. B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A Training Algorithm for Optimal Margin Classifiers," Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA, July 27–29, 1992.
21. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, U.K., 2000.
22. J. O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd ed., Springer, New York, 1985.
23. G. Wahba, "A Comparison of GCV and GML for Choosing the Smoothing Parameters in the Generalized Spline-Smoothing Problem," Ann. Stat., 4 1378–1402 (1985).
24. D. J. MacKay, "Bayesian Methods for Adaptive Models," Ph.D. Thesis, Department of Computation and Neural Systems, California Institute of Technology, Pasadena, CA, 1992.