MNLR and ANN structural group contribution methods for predicting ...

Process Safety and Environmental Protection 93 (2015) 182–191


Process Safety and Environmental Protection journal homepage: www.elsevier.com/locate/psep

MNLR and ANN structural group contribution methods for predicting the flash point temperature of pure compounds in the transportation fuels range

Tareq A. Albahri ∗

Chemical Engineering Department, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait

Abstract

A QSPR method is presented for predicting the flash point temperature (FPT) of pure compounds in the transportation fuels range. A structural group contribution method is used to determine the flash point temperature using two techniques: multivariable nonlinear regression and artificial neural networks. The method was used to probe the structural groups that contribute significantly to the overall FPT of pure compounds and to arrive at the set of 37 atom-type structural groups that best represents the flash point of about 375 substances. The input parameters to the model are the numbers of occurrences of each of the 37 structural groups in each molecule. The neural network method was the better of the two techniques and can predict the flash point of pure compounds merely from knowledge of the molecular structure, with an overall correlation coefficient of 0.996 and overall average and maximum errors of 1.12% and 6.62%, respectively. The results are compared to the more traditional approach of the SGC method along with other methods in the literature. © 2014 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Keywords: Flash point; Group contribution; Molecular modeling; Neural networks; Quantitative structure property relation; QSPR

1. Introduction

In recent years, the term environmental impact has extended its traditional meaning to include other extensive concepts in view of the possibility of industrial accidents which, because of their magnitude, are capable of causing significant damage to people and the environment. This concern, which in the past was principally associated with the nuclear industry, now includes the chemical industry and its safety. Among these concerns are incidents of fire disasters in chemical and petrochemical plants caused by the leak of materials at or above their auto ignition temperature or flash point or within their flammability limits. The flammability characteristics of chemical substances are very important for safety considerations in storage, processing, and handling. These characteristics, which include the flash point, the auto ignition temperature, and the upper and lower flammability limits, are some of the most important safety specifications that must be considered in assessing the overall flammability hazard potential of a chemical substance, defined as the degree of susceptibility to ignition or release of energy under varying environmental conditions. Experimental values of these properties are always desirable; however, they are scarce and expensive to obtain. When experimental values are not available and determining them experimentally is not practical, a convenient and fast prediction method must be used to estimate them.

The flash point temperature (FPT) is one of the most important safety specifications used to characterize the hazard potential of a chemical substance. The flash point temperature of a combustible substance is the minimum temperature at which sufficient vapor is produced to form an ignitable mixture with air (one within the flammability limits) near the surface of the liquid or within the vessel used. This is not to be confused with the fire point, where the vapor will continue to burn even after the removal of the ignition source; at the flash point, once the ignition source is removed, the fire will stop. FPT can indicate the possible presence of highly volatile or flammable materials in a relatively nonvolatile (nonflammable) environment and is usually used in shipping and safety regulations to define and classify flammable and combustible materials. The U.S. Department of Transportation (DOT) and the U.S. Department of Labor (OSHA) have established that liquids with a flash point under 37.8 °C (100 °F) are considered flammable.

There exist several recognized tests for evaluating the FPT of a substance or fuel, which differ according to the characteristics of the liquid under study. Standard ASTM (2002) closed-cup test methods include Tag (D56-01), small scale (D3828-98), Setaflash (D3828), Pensky-Martens (D93-00), and the equilibrium method (D3941-90). Standard ASTM (2002) open-cup test methods include Cleveland (D92-01) and Tag (D1310). Generally, methods using closed vessels are used for low-FPT substances and give lower values than those obtained with open vessels, although the differences are small. In all cases the procedure involves slowly heating the liquid in contact with air, applying a source of ignition at predetermined intervals, and recording the temperature at which burning occurs.

∗ Tel.: +965 2481 7662; fax: +965 2483 9498. E-mail address: [email protected]
Received 1 December 2013; received in revised form 27 January 2014; accepted 15 March 2014; available online 24 March 2014.
http://dx.doi.org/10.1016/j.psep.2014.03.005
0957-5820/© 2014 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

1.1. Background

When the flammability characteristics cannot be determined experimentally, empirical equations for their determination are available. Detailed reviews of FPT prediction methods are available in the literature. Gharagheizi et al. (2011), for example, developed a simple empirical correlation to predict FPT using the normal boiling point and the number of carbon atoms. Although this correlation was able to predict FPT for 1471 pure organic compounds with an absolute average deviation of 2.4% and a correlation coefficient of 0.979, it requires the normal boiling point, which is not always available or convenient. Albahri (2003a) introduced the concept of using structural group contributions (SGC) to predict FPT from the molecular structure of the compound alone, using multivariable nonlinear regression (MNLR). The input parameters to the final polynomial equation are the numbers of occurrences of each of the structural groups in the molecule in addition to the group contribution values. Although the model was able to accurately predict FPT for 287 compounds with a correlation coefficient of 0.98 and an average error of 1.2%, it is limited to hydrocarbons only. Keshavarz and Ghanbarzadeh (2011) developed a simple method for predicting the flash point of unsaturated hydrocarbons including alkenes, alkynes and aromatics with an absolute average deviation of 11 K. The number of carbon and hydrogen atoms is used as a core function that can be revised for some compounds using a structural parameter correcting function. The method was developed using a set of 173 compounds and tested using another set of 76 compounds; it is restricted to unsaturated hydrocarbons, which limits its applicability.


Rowley et al. (2010) developed a correlative method for estimating the flash point of 1062 organic compounds based entirely on structural contributions. The proposed correlation, based on the Clausius–Clapeyron equation and the Leslie–Geniesse relation, results in an average absolute error and deviation of 2.84% and 9.8 K, respectively. The sum of the 62 atomic structural group contributions is used to predict the boiling point input parameter of the developed correlation. Although the method is accurate enough, the structural group definitions and restrictions make the method hard to apply in practice. Rowley et al. (2011) developed yet another correlation for predicting the flash point of the same set of pure organic compounds based on the normal boiling point and the enthalpy of vaporization at the normal boiling point, with an absolute average error and deviation of 1.32% and 4.65 K, respectively. The method requires knowledge of other thermo-physical properties, which is inconvenient. Furthermore, significant errors are reported for carboxylic acids. Li and Liu (2010) used Le Chatelier's rule and the Antoine equation to correlate the vapor pressure and flash point, and further provided a comprehensive review of the flash point prediction methods based on vapor pressure, molecular structure, composition range, and boiling point. They ultimately recommended using QSPR with artificial neural networks as a correlation technique because of its nonlinear property, high accuracy, and potential for wide application. In terms of using artificial intelligence, Saldana et al. (2011) developed a method for predicting the flash point of 437 hydrocarbons, alcohols and esters using QSPR methods. Various approaches were investigated, from linear modeling such as genetic function approximation (GFA) and partial least squares (PLS) to nonlinear methods such as feed-forward artificial neural networks (FF-ANN), general regression neural networks (GRNN), support vector machines (SVM) and graph machines (GM).
None of the models was significantly more accurate than the others; thus, consensus modeling was used to improve generalization and predictive power compared to individual predictive models. The correlation coefficient was 0.922 and the absolute average error and deviation were 3.2% and 10.4 K, respectively. The consensus method is the average of all the above predictive methods, which makes the method cumbersome in practice. Khajeh and Modarress (2011) developed a QSPR model to predict the flash point of 151 alcohols using genetic function approximation (GFA) and using four molecular descriptors as input to an adaptive neuro-fuzzy inference system (ANFIS) model, with correlation coefficients of 0.931 and 0.957, respectively. However, the method is limited to alcohols. Gharagheizi and Alamdari (2008) used a QSPR model and a Genetic Algorithm-based Multivariate Linear Regression (GA-MLR) technique to select four chemical structure-based molecular descriptors to predict the FPT of 1030 organic compounds with a correlation coefficient of 0.9669 and an average deviation of 12.691 K. Gharagheizi et al. (2008) further used 79 structural groups in a 79-9-1 feed-forward neural network to correlate the FPT of 1378 organic compounds with a correlation coefficient of 0.9757, and an average absolute deviation and maximum error of 8.1 K and 26%, respectively. The four molecular descriptors and the 79 structural groups are both intricate and hard to determine for each molecule, which limits the methods' applicability in practice. Pan et al. (2007) developed a back-propagation 9-5-1 neural network model using a group-bond contribution method to model FPT for 92 alkanes. Although the model was accurate enough, with a correlation coefficient of 0.99 and an absolute average deviation of 4.8 K, it was developed based on a small number of molecules, which limits its applicability to alkanes. Mathieu (2010) obtained better results for the same data set of 92 alkanes using a simple algebraic expression based on the contributions from four structural groups, namely CH3, CH2, >CH, and >C<.

Structural group              (FPT)i
−CH3                          0.2823
>CH2                          0.6199
>CH−                          0.8512
>C<                           1.0179
=CH2                          0.1431
=CH−                          0.5856
=C<                           0.9265
=C=                           0.6446
≡CH                           0.3692
≡C−                           0.7478
>CH2 (ring)                   0.6125
>CH− (ring)                   0.6517
>C< (ring)                    0.4666
=CH− (ring)                   0.5859
=C< (ring)                    1.0747
−Cl                           0.9295
−OH                           3.2230
−O− (non-ring)                0.1692
>C=O (non-ring) ketone        2.5228
HC=O (aldehyde)               2.0626
−COOH (acid)                  5.1405
O                             1.8826
−NH2                          1.7216
>NH (non-ring)                4.7711
>N− (non-ring)                0.4800
−O− (ring)                    0.9161
>C=O (ring)                   4.0227
>NH (ring)                    2.2231
=N− (ring)                    3.4118
>N− (ring)                    1.8971
−H                            0.0000
>S                            1.8503
>SO                           7.1659
−SH                           1.2870
S                             2.0413
N                             2.8685
S                             0.1067

error. The actual number of hidden-layer neurons is arrived at by stepping down one number at a time until the best results, as indicated by the correlation coefficient and average error, are obtained for both the training and testing data-sets. After arriving at the best network architecture, we demonstrated the predictive ability of the ANN by training it with only 335 of the experimental data points. The trained networks were then applied to predict the physical properties of the 40 remaining substances, which were not included in the learning database. The compounds in the testing set were chosen based on the abundance of their counterparts (the class of compounds they represent) in the training data set on which the neural networks were trained. This is necessary since ANN cannot predict the FPT property of a class of compounds on which it was not trained. The accuracy of these predictions was compared with the available experimental data.
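The step-down search for the hidden-layer size described above can be sketched as follows. This is only an illustration under stated assumptions: the starting size, the function name `step_down_hidden_size`, and the `evaluate` callback (which is assumed to train a network with the given number of hidden neurons and return a correlation coefficient and an average % error summarising both the training and testing sets) are hypothetical and not taken from the paper.

```python
from typing import Callable, Optional, Tuple


def step_down_hidden_size(
    start: int,
    evaluate: Callable[[int], Tuple[float, float]],
) -> Optional[int]:
    """Try hidden-layer sizes start, start-1, ..., 1 and return the best one.

    `evaluate(n)` is a hypothetical callback: it is assumed to train a
    network with n hidden neurons and return (R, average % error).
    A higher R wins; ties are broken by the lower average error.
    """
    best_n, best_score = None, None
    for n in range(start, 0, -1):          # step down one neuron at a time
        r, avg_err = evaluate(n)
        score = (r, -avg_err)              # compare R first, then error
        if best_score is None or score > best_score:
            best_n, best_score = n, score
    return best_n


# Made-up scores for illustration only; they are not results from the paper.
mock_scores = {5: (0.990, 1.5), 4: (0.996, 1.1), 3: (0.996, 1.3),
               2: (0.950, 3.0), 1: (0.900, 5.0)}
best = step_down_hidden_size(5, lambda n: mock_scores[n])
print(best)
```

In a real study the callback would wrap the actual training routine; keeping it as a parameter leaves the search loop independent of the ANN library used.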

3. Results

3.1. Method 1: SGC-based MNLR model

Using the experimental data on the FPT of 375 pure compounds from the API-TDB (API, 1987), the constants for Eq. (1) and the values of the various structural group contributions shown in Table 1 were calculated. An optimization algorithm based on the least squares method was used for that purpose. The algorithm minimizes the sum of the squared differences between the calculated and experimental FPTs using the generalized reduced gradient (GRG) nonlinear optimization method in the Solver function of Microsoft Excel. Convergence of the above regression algorithm was achieved in less than 1 min on a desktop personal computer. The final flash point temperature equation is

$$\mathrm{FPT} = 180.594 + 23.3514 \sum_i n_i\,(\mathrm{FPT})_i \qquad (3)$$

where FPT is the flash point temperature in kelvin, $(\mathrm{FPT})_i$ is the contribution of atom-type structural group $i$, $n_i$ is the number of occurrences of that structural group in the molecule, and $\sum_i n_i (\mathrm{FPT})_i$ is the sum of the atom-type structural group contributions to the total FPT. The optimal values of the constants c, d, and e in Eq. (1) were all determined to be zero, so the final FPT equation becomes linear, as shown in Eq. (3). The values of the molecular contributions $(\mathrm{FPT})_i$ for each atom-type structural group are shown in Table 1. The calculation procedure for FPT using Eq. (3) and the SGC values in Table 1 is illustrated in Appendix A for p-diethyl benzene and methyl diethanolamine.

We have had considerable success in the past using the least squares method and MS Excel to estimate the parameters of the traditional SGC approach for predicting such properties as auto ignition temperature, flash point, and upper and lower flammability limits (Albahri, 2003a), octane number (Albahri, 2003b), and aniline point (Albahri, 2012), with correlation coefficients as high as 0.99. In some of these cases we have had more components and more structural groups. Therefore, MS Excel seems to be a judicious choice for its simplicity, surprisingly fast convergence, and effectiveness equal to that of other optimization tools for the task at hand.

Table 1 – Atom-type structural groups corresponding to the input nodes of the neural network in Fig. 1 and the structural group contribution values used with Eq. (1) for estimating the flash point temperature of pure compounds.

Fig. 2 – Parity plot showing the accuracy of the model's correlation for the flash point temperature of a set of 375 pure compounds using the traditional SGC-MNLR model. Axes: calculated versus experimental flash point temperature (K); R = 0.9.

The results for the traditional SGC model predictions for FPT using Eq. (3) and the structural group contributions in Table 1 are summarized in row 4 of Table 2. The model predictions did not correlate very well with the experimental data, as shown in Fig. 2, with an average absolute percentage error of 4.3% and a correlation coefficient of 0.90. From the success we have had in the past in predicting other properties (Albahri, 2003a, 2003b, 2012), it is our conclusion that the impediment is related not to the optimization tool used but to the FPT property itself, which is too complex to model for various families of

Table 2 – Comparison of the present models with others from the literature for estimating the flash point temperature of pure compounds.

Method           Source                              Data set  AAD (K)  Ave %error  Max %error  R
ANN              This work (overall)                 375c      3.55     1.1         6.62        0.9961
ANN              This work (training)                335c      3.59     1.12        6.62        0.9960
ANN              This work (testing)                 40c       3.25     0.97        5.83        0.998
SGC              This work (overall)                 375c      14       4.3         58.37       0.9
Correlation      Gharagheizi et al. (2011)           1471c     –        2.4         –           0.979
QSPR             Gharagheizi and Alamdari (2008)     1030c     10.2     –           –           0.9669
ANN              Gharagheizi et al. (2008)           1378c     8.1      –           26          0.9757
ANN              Pan et al. (2007)                   92b       4.8      –           –           0.99
Correlation      Mathieu (2010)                      92b       4.12     –           –           0.99
QSPR             Patel et al. (2009)                 236f      20.44    6.16        –           0.948
ANN              Tetteh et al. (1999)                400c      11       –           –           0.9326
Correlation      Valenzuela et al. (2011)            611c      4.2      1.5         –           –
ANN              Lazzûs (2010)                       505c      6.2      1.8         –           0.9881
QSPR             Albahri (2003a)                     287a      5.39     1.2         13.4        0.98
Correlation      Keshavarz and Ghanbarzadeh (2011)   249d      11       –           –           –
Various methods  Saldana et al. (2011)               437e      10.4     3.2         3.2         0.922
SGC + MNLR       Jia et al. (2012)                   287c      3.77     1.16        –           0.9931
Correlation      Rowley et al. (2010)                1062c     9.8      2.84        –           –
Correlation      Rowley et al. (2011)                1062c     4.65     1.32        –           –
SGC              Mathieu (2012)                      92g       12       –           –           0.89
ANFIS            Khajeh and Modarress (2011)         151h      –        –           –           0.957
GFA              Khajeh and Modarress (2011)         151h      –        –           –           0.931

a Data set is pure hydrocarbon compounds.
b Data set is pure alkanes.
c Data set is pure organic compounds.
d Data set is unsaturated hydrocarbons.
e Data set is pure hydrocarbon compounds plus alcohols and esters.
f Data set is organic solvents.
g Data set is organosilicon compounds.
h Data set is pure alcohols.

chemical compounds using the traditional SGC approach based on MNLR and the least squares technique, which suffers from several shortcomings. These limitations are mainly associated with using a simple correlation (Eq. (1)) that is unable to capture the complex nature of the FPT property. In addition, as in many other iterative techniques, the method's success depends on providing appropriate initial values of the group contributions. It is our experience that little improvement, if any, can be obtained using other optimization tools such as MATLAB or GAMS. Our previous efforts to use the traditional SGC approach based on MNLR to model the properties of classes of compounds from various chemical families have also been unsuccessful (Albahri and George, 2003), and so were the attempts made by many others (Pan et al., 2007; Patel et al., 2009). ANN has consistently provided a better alternative with high accuracy in all cases (Pan et al., 2007; Patel et al., 2009; Albahri and George, 2003). The ANN method avoids these inconveniences, inaccuracies and limitations and offers a promising alternative for modeling for a number of reasons. ANNs are able to capture the non-linearity in the system behavior very effectively. Once properly trained, ANNs offer predictions quickly and accurately on a personal computer. Furthermore, the connection weights and network architecture make predictions possible using a spreadsheet.

3.2. Method 2: SGC-based ANN model

Using the probing set of data on the flash point temperature of 335 pure compounds (API, 1987), which included various chemical families, both organic and inorganic, including halogenated compounds, acids, ethers, ketones, aldehydes, alcohols, phenols, esters, amines, anhydrides, and sulfur compounds, several atom-type structural groups were tested and modified. During this probing stage, the correlation coefficient was used as an indicator to discriminate between the SGC methods and to identify the structural groups that contribute significantly to the FPT property. We finally arrived at the set of groups that best represents the experimental data, with a correlation coefficient of 0.995, consisting of the 37 structural groups shown in Table 1. In addition to the above proposed structural groups, several others were also investigated. Although better results were obtained with a larger number of structural groups, the improvement was not significant.

During model execution it was found necessary to modify the structural groups to account only for the ones that have a significant influence on the overall FPT property. For example, no significant distinction in the FPT existed for the cis- and trans- structural orientations in olefins or cyclic compounds. Hence, such a distinction was avoided in the choice of the structural groups in Table 1. It was also unnecessary to account for the location of the alkyl substitutions on the benzene ring in the ortho-, meta-, and para- positions in aromatics, the location of the alkyl branches along the chain for iso-paraffins and iso-olefins, the location of the double bond along the chain in olefins, and the alkyl substitutions and ring size for naphthenes. Our attempts to enhance the model predictions by using two sets of structural groups, one for the benzene ring in aromatics and another for the cyclic ring in naphthenes, did not result in a significant improvement in the model predictions and correlation of the experimental data. Therefore such distinctions were avoided.

Fig. 3 – Parity plot showing the accuracy of the model's correlations for the flash point temperature of a training set of 335 pure compounds and predictions for a testing set of 40 compounds using the SGC-ANN model. Axes: calculated versus experimental flash point temperature (K); R (train) = 0.996, R (overall) = 0.996, R (test) = 0.996.

To assess the accuracy of the model's predictions, the data were then separated into training and testing data sets consisting of 335 and 40 pure compounds. The % error between the predicted FPT and the experimental data used in training the network was calculated. The results from the trained network are summarized in Table 2, indicating that the average error for the FPT calculations for this model was about 1.12% with a correlation coefficient of 0.996. As can be seen, the correlation of the neural network model for the training data set is good. The predictions of the trained neural networks have been cross-validated against a testing set of 40 compounds that were not used in the training process. The percentage error between the predicted FPT and the experimental data used in testing the network was calculated. The network's predictions compared well against this new set of data, with average and maximum percentage errors of 0.97% and 5.83%, respectively, and a correlation coefficient of 0.998. The detailed results are listed in Table 3 for the testing data set and in Table 4, which is available as supplementary material, for the training data set, showing the deviations in the predicted FPTs for all types of pure compounds ranging in FPT from 135 to 375 K. The computed results are in good agreement with the experimental data, and the average absolute deviations are within the experimental uncertainties. A parity plot showing the accuracy of the model's correlation for both training and testing is presented in Fig. 3. As shown in Table 2, the predictions for the testing set are comparable to those for the training set in terms of AAD and correlation coefficient.
The percentage errors for the overall training and testing data sets are shown in Fig. 4, where 60% of the data is below the 1% error range, 98% of the data is below the 5% error range, and only 2% of the data is in the 5 to 10% error range. The error distribution is shown in Fig. 5. As can be seen, the predictions of the neural network model are excellent and the maximum percent errors are also satisfactory. The testing data set showed a slightly lower maximum error (5.83%) than the training data set (6.62%). This is explained by the fact that the compounds in the testing set were preselected as representative examples of the different classes of compounds in the training data set on which the neural networks were trained. This is not to say that ANN can predict the FPT of pure compounds with an average error of 0.97% and a maximum error of 5.83%; rather, an average error of 1.1% and a maximum error of 6.62% are to be expected, as can be seen in rows 1 and 2 of Table 2. This is still good compared to other methods in the literature, as shown in Table 2.

Fig. 4 – Percentage error range for the flash point temperature for the whole data set of 375 pure compounds using the SGC-ANN model.

These results show that while the SGC-MNLR model gave less accurate results, with a correlation coefficient of 0.90, the SGC-ANN model was able to predict FPT with an accuracy higher than any other method in the literature, with a correlation coefficient of 0.996, merely using the molecular structure of the molecule. Although it may not be appropriate to compare the various models that predict FPT because of the different data sets used in developing them, our model shows some advantages over others in the literature. The empirical correlations of Gharagheizi et al. (2011), Rowley et al. (2011) and Valenzuela et al. (2011) require experimental values for the boiling point, the enthalpy change of vaporization, or the enthalpy change of combustion; the limited availability of these values restricts their applicability and makes these methods inconvenient in practice. The Gharagheizi and Alamdari (2008) model uses four intricate chemical structure-based molecular descriptors, and the Tetteh et al. (1999) model requires the first-order molecular connectivity index (¹χ); these are not easy or convenient to determine. The Gharagheizi et al. (2008) and Rowley et al. (2010) models use a large number of structural groups that are intricate and more difficult to determine in practice. The Pan et al. (2007) and Mathieu (2010) methods were developed based on a small number of molecules, which limits their applicability to alkanes, or to organosilicon compounds in the case of Mathieu (2012). Albahri's (2003a) method, although it merely requires knowledge of the molecular structure of the compound, is limited to hydrocarbons alone. Similarly, the Keshavarz and Ghanbarzadeh (2011) model is limited to unsaturated hydrocarbons, while the Khajeh and Modarress (2011) model is limited to alcohols. The Saldana et al. (2011) consensus model is not only limited to hydrocarbons, alcohols and esters but is also cumbersome to use in practice. The Patel et al. (2009) model

Fig. 5 – The % error ranges of the results obtained using the SGC-ANN model versus the corresponding number of pure compounds in each range. Axes: % error versus number of compounds in each range.


Table 3 – A testing set of compounds not used during SGC-ANN model development for predicting the flash point temperature along with the deviations and percentage errors (comparison between experimental data and calculated results). Exp. and Calc. are flash point temperatures in K.

Serial No.  Compound                                    Exp.    Calc.   Dev.     % Error a
1           n-Hexane                                    251.7   242.3   −9.42    −3.74
2           n-Heptane                                   269.2   264.3   −4.87    −1.81
3           2,4-Dimethylpentane                         261.2   261.3   0.07     0.03
4           2,2,3-Trimethylpentane                      270.2   267.0   −3.23    −1.19
5           Ethylcyclopentane                           269.2   267.5   −1.71    −0.63
6           trans-1,2-Dimethylcyclohexane               290.2   288.2   −1.95    −0.67
7           n-Butylcyclohexane                          321.2   328.2   7.03     2.19
8           Cyclooctane                                 303.3   303.2   −0.09    −0.03
9           trans-Decahydronaphthalene (trans-decalin)  331.1   335.5   4.32     1.30
10          trans-2-Hexene                              246.2   248.3   2.16     0.88
11          2,3-Dimethyl-2-butene                       250.2   250.1   −0.12    −0.05
12          1-Eicosene                                  440.2   436.7   −3.48    −0.79
13          trans-1,3-Pentadiene                        230.2   231.4   1.23     0.53
14          2,3-Dimethyl-1,3-butadiene                  251.3   249.1   −2.26    −0.90
15          Cyclopentadiene                             225.2   224.6   −0.58    −0.26
16          3-Methyl-1-butyne                           221.2   221.2   0.05     0.02
17          p-Xylene                                    298.3   295.7   −2.66    −0.89
18          n-Nonylbenzene                              372.2   385.2   13.00    3.49
19          cis-1-Propenyl benzene                      311.2   318.2   6.99     2.25
20          1,1-Diphenylethane                          402.2   411.3   9.12     2.27
21          1,3-Diphenylbenzene (m-terphenyl)           464.2   460.6   −3.62    −0.78
22          Phenanthrene                                444.2   418.3   −25.90   −5.83
23          Propionic acid                              328.2   315.6   −12.59   −3.84
24          2-Methylbutyric acid                        348.2   348.6   0.47     0.13
25          2-Methyl-2-butanol                          313.9   313.9   −0.01    0.00
26          m-Cresol                                    368.2   363.7   −4.44    −1.21
27          Acetaldehyde                                235.4   235.4   0.03     0.01
28          Methacrolein                                275.3   274.3   −0.99    −0.36
29          Isobutylamine                               255.5   252.0   −3.50    −1.37
30          Acridine                                    442.2   442.2   −0.01    0.00
31          Ethyl acetate                               269.2   268.7   −0.50    −0.19
32          Diethyl ether                               228.3   228.9   0.56     0.25
33          Tetrahydrofuran                             259.2   259.2   0.00     0.00
34          1,2-Dichloroethane                          286.2   286.3   0.14     0.05
35          Methyl-n-propyl ketone                      280.6   283.4   2.82     1.01
36          Ethyl mercaptan                             225.2   225.2   0.01     0.01
37          Diethylene glycol                           397.2   397.1   −0.13    −0.03
38          Methyl diethanolamine                       400.0   400.0   0.00     0.00
39          N-Methyl-2-pyrrolidone                      368.3   368.3   0.01     0.00
40          Dimethyl sulfoxide                          361.1   361.1   −0.01    0.00

a Average error = 0.97%, Maximum error = 5.83%, AAD = 3.25 K, Correlation coefficient = 0.9963.
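The summary statistics quoted for the testing set can be recomputed directly from the Dev. and % Error columns of Table 3. The short sketch below does this in plain Python; the variable and function names are illustrative only, and the average error is assumed to mean the mean of the absolute percentage errors.

```python
# Dev. (K) and % Error columns of Table 3, in serial-number order (rows 1-40).
dev_k = [
    -9.42, -4.87, 0.07, -3.23, -1.71, -1.95, 7.03, -0.09, 4.32, 2.16,
    -0.12, -3.48, 1.23, -2.26, -0.58, 0.05, -2.66, 13.00, 6.99, 9.12,
    -3.62, -25.90, -12.59, 0.47, -0.01, -4.44, 0.03, -0.99, -3.50, -0.01,
    -0.50, 0.56, 0.00, 0.14, 2.82, 0.01, -0.13, 0.00, 0.01, -0.01,
]
pct_err = [
    -3.74, -1.81, 0.03, -1.19, -0.63, -0.67, 2.19, -0.03, 1.30, 0.88,
    -0.05, -0.79, 0.53, -0.90, -0.26, 0.02, -0.89, 3.49, 2.25, 2.27,
    -0.78, -5.83, -3.84, 0.13, 0.00, -1.21, 0.01, -0.36, -1.37, 0.00,
    -0.19, 0.25, 0.00, 0.05, 1.01, 0.01, -0.03, 0.00, 0.00, 0.00,
]


def mean_abs(values):
    """Mean of the absolute values (assumed definition of AAD and average error)."""
    return sum(abs(v) for v in values) / len(values)


aad_k = mean_abs(dev_k)                    # average absolute deviation, K
avg_pct = mean_abs(pct_err)                # average absolute % error
max_pct = max(abs(v) for v in pct_err)     # largest absolute % error

print(f"AAD = {aad_k:.2f} K, average error = {avg_pct:.2f}%, "
      f"maximum error = {max_pct:.2f}%")
```

The same three lines of arithmetic apply to the training set in Table 4 once its columns are available.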

requires specialty software (CAMD), which is not convenient. The Jia et al. (2012) model uses 90 functional groups as input parameters, while the Lazzûs (2010) model uses 44 groups; both use more input variables and are less accurate than our SGC-ANN model. Of all the models, that of Mathieu (2012) was the least accurate, while those of Lazzûs (2010) and Jia et al. (2012) came in second and third place, respectively, as most accurate after our model. Our proposed SGC-ANN model is therefore better than the other models in terms of combined accuracy and simplicity, since it had the lowest average absolute error (1.1%) and only requires a simple breakdown of the compound's molecular structure, which is always known, without the need for intricate structural descriptors or complex structural groups that are hard to determine.

4. Conclusion

The neural network based structural group contribution model presented here proves to be a powerful tool for predicting the flash point temperature of pure compounds merely from their molecular structure. Having obtained only limited success with the traditional SGC-MNLR approach using the least squares method (R = 0.90), an SGC-ANN model was developed for the same purpose. The neural network offered a significant improvement (R = 0.9961) and an advantage over the traditional SGC-MNLR method. Another major advantage is the ability of ANNs to probe the structural groups that contribute significantly to the overall FPT property of pure compounds, which is very difficult and time consuming to do with the traditional SGC approach using non-linear regression techniques. The model offers advantages over other methods in the literature in terms of simplicity and accuracy. Our atom-type structural groups are simpler and fewer in number, and provided a better correlation of the experimental data than other methods. Finally, this work demonstrates that the complex FPT property can be modeled by back-propagation neural network models.

Considering the difficulty and complexity of developing a first-principles mechanistic model of FPT involving the kinetics and dynamics of combustion, neural networks can be an effective alternative. FPT has a number of intrinsic physical parameters associated with the molecular structure, such as group interactions, structural orientations, skew, hindrance, steric, resonance, inductive, and chiral effects, that are usually unknown and have to be determined through a parameter estimation methodology. Neural networks can learn these inherent relationships among the various structural groups of the molecule and their contribution to the overall FPT property. The authors are currently developing a model for estimating the FPT of transportation fuels from those of the pure components, utilizing the current procedure for the automatic generation and reliable estimation of the FPT of pure components for which no data exist.

Fig. 7 – Molecular structure of methyl diethanolamine.

5.

Supplementary Material

The training data sets of compounds used during model development for FPT along with percentage errors are shown in Table 4, which is available as supplementary material. The parameters (weights and biases) of the hidden layer for the ANN architecture described in Fig. 1 are shown in Table 5, which is available as supplementary material.
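To illustrate how hidden-layer weights and biases of this kind are applied, the following is a minimal sketch of a single forward pass through a one-hidden-layer network whose inputs are the 37 group counts. Everything except the number of inputs is an assumption made for illustration: the hidden-layer size (`N_HIDDEN`), the `tanh` activation, the linear output, and the randomly generated placeholder parameters are hypothetical and are not the fitted values or the architecture reported in the paper.

```python
import math
import random

N_GROUPS = 37  # number of structural groups used as network inputs
N_HIDDEN = 5   # hypothetical hidden-layer size, chosen only for illustration

random.seed(0)
# Placeholder parameters (NOT the fitted values from the supplementary table).
W_hidden = [[random.uniform(-1.0, 1.0) for _ in range(N_GROUPS)]
            for _ in range(N_HIDDEN)]
b_hidden = [random.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)]
w_out = [random.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)]
b_out = 0.0


def forward(group_counts):
    """Map a vector of 37 group counts to a single (unscaled) network output.

    Assumes a tanh hidden layer followed by a linear output neuron.
    """
    if len(group_counts) != N_GROUPS:
        raise ValueError(f"expected {N_GROUPS} group counts, got {len(group_counts)}")
    hidden = [
        math.tanh(sum(w * n for w, n in zip(row, group_counts)) + b)
        for row, b in zip(W_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Example input: a molecule with two occurrences of the first group
# and four of the second (group ordering here is arbitrary).
counts = [0] * N_GROUPS
counts[0], counts[1] = 2, 4
print(forward(counts))
```

With the fitted parameters substituted for the placeholders, and with whatever input and output scaling the trained model uses, the same pass would yield the network's FPT estimate.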

Appendix A.

A.1. Example 1: Prediction of the flash point temperature for p-diethyl benzene

The molecular structure of p-diethyl benzene, shown in Fig. 6, consists of the following structural groups obtained from Table 1 for estimating FPT: two (−CH3), two (>CH2), four (=CH− (ring)), and two (>C= (ring)). Calculating the overall structural group contributions:

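$$
\begin{aligned}
\sum_i n_i\,(\mathrm{FPT})_i &= 2\,(-\text{CH}_3) + 2\,(>\text{CH}_2) + 4\,(=\text{CH}-\ (\text{ring})) + 2\,(>\text{C}=\ (\text{ring})) \\
&= 2(0.2823) + 2(0.6199) + 4(0.5859) + 2(1.0747) \\
&= 0.5647 + 1.2397 + 2.3436 + 2.1494 \\
&= 6.2974
\end{aligned}
$$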

Substituting into Eq. (3):

$$
\mathrm{FPT} = 180.594 + 23.3514 \sum_i n_i\,(\mathrm{FPT})_i = 180.594 + 23.3514\,(6.2974) = 327.7\ \mathrm{K}
$$

This deviates by only −0.49% from the experimental value of 329.3 K.

Fig. 6 – Molecular structure of p-diethyl benzene.

A.2. Example 2: Prediction of the flash point temperature for methyl diethanolamine

The molecular structure of methyl diethanolamine, shown in Fig. 7, consists of the following structural groups obtained from Table 1 for estimating FPT: one (−CH3), four (>CH2), two (−OH), and one (>N− (non-ring)). Calculating the overall structural group contributions:

$$
\begin{aligned}
\sum_i n_i\,(\mathrm{FPT})_i &= 1\,(-\text{CH}_3) + 4\,(>\text{CH}_2) + 2\,(-\text{OH}) + 1\,(>\text{N}-\ (\text{non-ring})) \\
&= 1(0.2823) + 4(0.6199) + 2(3.2230) + 1(0.4800) \\
&= 0.2823 + 2.4796 + 6.4460 + 0.4800 \\
&= 9.6879
\end{aligned}
$$

Substituting into Eq. (3):

$$
\mathrm{FPT} = 180.594 + 23.3514 \sum_i n_i\,(\mathrm{FPT})_i = 180.594 + 23.3514\,(9.6879) = 406.8\ \mathrm{K}
$$

This deviates by only 1.705% from the experimental value of 400 K.
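The two worked examples can be reproduced with a short script. The sketch below is illustrative only: it uses the six group contributions and the two constants quoted in this appendix (the remaining groups of Table 1 are not included), and the function and variable names are hypothetical. Because the appendix rounds intermediate values, the computed results may differ from the printed ones in the last digit.

```python
# Group contributions quoted in Appendix A; Table 1 holds the full set.
GROUP_CONTRIBUTIONS = {
    "-CH3": 0.2823,
    ">CH2": 0.6199,
    "=CH- (ring)": 0.5859,
    ">C= (ring)": 1.0747,
    "-OH": 3.2230,
    ">N- (non-ring)": 0.4800,
}

# Constants of Eq. (3) in the form used in the appendix examples.
INTERCEPT = 180.594
SLOPE = 23.3514


def flash_point(group_counts):
    """Return the estimated FPT in kelvin from a {group: count} mapping."""
    total = sum(n * GROUP_CONTRIBUTIONS[g] for g, n in group_counts.items())
    return INTERCEPT + SLOPE * total


def percent_error(predicted, experimental):
    """Signed percentage deviation of the prediction from experiment."""
    return 100.0 * (predicted - experimental) / experimental


p_diethyl_benzene = {"-CH3": 2, ">CH2": 2, "=CH- (ring)": 4, ">C= (ring)": 2}
methyl_diethanolamine = {"-CH3": 1, ">CH2": 4, "-OH": 2, ">N- (non-ring)": 1}

fpt_pdeb = flash_point(p_diethyl_benzene)
fpt_mdea = flash_point(methyl_diethanolamine)

print(f"p-diethyl benzene: {fpt_pdeb:.2f} K, "
      f"error {percent_error(fpt_pdeb, 329.3):+.2f}%")
print(f"methyl diethanolamine: {fpt_mdea:.2f} K, "
      f"error {percent_error(fpt_mdea, 400.0):+.2f}%")
```

Keeping the contributions in a dictionary keyed by group label means a new compound only requires a new count mapping; an unknown group label raises a `KeyError` rather than being silently ignored.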

Appendix B. Supplementary data

Supplementary material related to this article can be found, in the online version, at http://dx.doi.org/10.1016/j.psep.2014.03.005.

