The calculation of thermodynamic properties of molecules

3 downloads 0 Views 2MB Size Report
In other cases, e.g. cokes formation, the complexity is so large that ... all molecules that do not burn in a way yielding a limited .... every compound to be calculated. .... alkanes, alkenes, alkynes and also radicals, the popular ..... the vibrational modes that are treated harmonically and large- ..... Molecular structure 2-D drawing.
CRITICAL REVIEW

www.rsc.org/csr | Chemical Society Reviews

The calculation of thermodynamic properties of molecules Veronique van Speybroeck,a Rafiqul Ganib and Robert Johan Meier*cd Received 16th October 2009 First published as an Advance Article on the web 11th February 2010 DOI: 10.1039/b809850f Thermodynamic data are key in the understanding and design of chemical processes. Next to the experimental evaluation of such data, computational methods are valuable and sometimes indispensable tools in obtaining heats of formation and Gibbs free energies. The major toolboxes to obtain such quantities by computation are quantum mechanical methods and group contribution methods. Although a lot of progress was made over the last decade, for the majority of chemical species we are still quite a bit away from what is often referred to as chemical accuracy, i.e. ‘1 kcal mol1’. Currently, for larger molecules the combination of group contribution methods with group additive values that are determined with the best available computational ab initio methods seems to be a viable alternative to obtain thermodynamic properties near chemical accuracy. New developments and full use of existing tools may lead to further improvements (critical review, 83 references).

1. Introduction Knowledge on thermodynamic properties is a key element in understanding the rate and selectivity of chemical processes, a

Center for Molecular Modelling, Ghent University, Technologiepark 903, 9052 Zwijnaarde, Belgium. E-mail: [email protected]; Fax: +32 9 2646697; Tel: +31 9 2646558; Web: http://molmod.Ugent.be b Director-CAPEC, Department of Chemical & Biochemical Engineering, Soltofts Plads, Building 229, Technical University of Denmark, DK-2800 Lyngby, Denmark. E-mail: [email protected]; Fax: +45 45932906; Tel: +45 45252882; Web: http://www.capec.kt.dtu.dk c FRSC Institute of Chemistry and Dynamics of the Geosphere IV: Agrosphere, Forschungszentrum Ju¨lich GmbH, D-52425 Ju¨lich, Germany. E-mail: [email protected]; Fax: +31 84 7458853; Tel: +31 6 51224998 d Chemistry Department, and the University of York, York, UK Y010 5DD. E-mail: [email protected]

Veronique van Speybroeck has a permanent position at the Ghent University as Research Professor. She leads the ‘‘Computational Molecular Modeling’’ division of the Center for Molecular Modeling (CMM— http://molmod.ugent.be).She is author of more than 100 publications in peer-reviewed journals and her current research interests primarily comprise study of the kinetics of chemical reactions with state of the art molecular modeling techniques. Veronique van Speybroeck The applications of interest are situated in petrochemistry, catalysis and polymers. Very recently she received an ERC starting grant on a subject on first principle chemical kinetics in nanoporous materials.

1764 | Chem. Soc. Rev., 2010, 39, 1764–1779

including the design of viable industrial chemical processes. In this review we primarily focus, but also limit ourselves, to the evaluation of enthalpy and entropy of, isolated, molecules. The experimental determination of such properties is, in many cases, not a priori trivial and often demanding, in some cases not possible. This applies to, e.g., unstable species such as radicals. In other cases, e.g. cokes formation, the complexity is so large that experimental verification is virtually impossible. Theoretical evaluation may help to investigate distinct, individual, reaction pathways, and determine their likelihood in the process under investigation. Again another example comprises all molecules that do not burn in a way yielding a limited number of simple products, e.g. aromatic nitrogen compounds, making it virtually impossible to study their heat of formation with respect to the elements with great accuracy. As an alternative there are theoretical techniques to evaluate enthalpies and entropies, and therefore heats of formation and

Rafiqul Gani is professor of systems design at the Department of Chemical & Biochemical Engineering, The Technical University of Denmark and the director of CAPEC. His current research interests include modelling, property estimation, processproduct synthesis & design, and computer-aided processtools integration. He has more than 150 reviewed publications. Professor Gani is editorin-chief of Computers and Rafiqul Gani Chemical Engineering, the series editor for the Elsevier CACE book series and serves in the editorial advisory board of several other journals. Professor Gani is a member of the executive board of the EFCE, and is Fellow-AIChE and also Fellow-IChemE. This journal is

 c

The Royal Society of Chemistry 2010

also providing important ingredients to compute reaction kinetics of elementary chemical reactions. These techniques are constantly improved with respect to accuracy and applicability to new classes of chemical species. In this review we discuss the two important classes of approaching the evaluation of thermodynamic values by modeling: ab initio quantum mechanics (qm) modeling and the group contribution (GC) method. The first one is based on solving the quantum mechanical Schro¨dinger equation in a rigorous way. The second one is based on the empirical observation that in many cases a molecule may be split up into various distinct chemical groups, each group being associated with a uniquely set value for its energy. The sum of the energies of these groups equals the correct energy of the total molecule. We know this very well from, e.g., the heat of formation of a homologous series such as the n-alkanes. One generic issue is of relevance to mention at forehand. One should distinguish between relative and absolute values. For some applications it is very satisfactory to have good relative values. As long as there is a systematic deviation between theory and experiment, we can at least mutually compare theoretical values for different molecules. When suitable experimental reference values are available, we may even get a good estimate for the absolute value. When we are interested in relative reaction pathways, however, and there are several competing reaction paths, difference of several kJ mol1 in free energies may have a large impact on the conclusions. In addition, when no experimental values are available which may be used as suitable reference, the demand on the predictive power of the theoretical method may be very high. Importantly, before starting the discussion on current methodologies and achievements, is obviously the accuracy required. In other words, when are we satisfied with the quality of results? Talking to process technologists, for instance those that use ASPEN-like modeling tools to model chemical processes and chemical plants, they suggest some 5% accuracy is required for individual thermodynamic values such as an

Robert Meier is a Corporate Scientist in the DSM R&D organization, a Visiting Scientist of the Research Center Ju¨lich, GmbH, and an honorary Visiting Professor in the Chemistry Department of the University of York, England. He studied Theoretical Chemistry at the University of Amsterdam, and graduated from the same University on a subject in experimental condensed matter physics. He has some 130 reviewed publicaRobert Meier tions and recently edited and co-authored a book on Polymer Characterisation. His interests include chemical analysis (spectroscopy, molecular identification by mass spectrometry, electron spin resonance), computational chemistry, polymer science and catalysis. This journal is

 c

The Royal Society of Chemistry 2010

enthalpy of formation or a free energy of reaction. This 5% does not look a very strict requirement. However, keeping in mind that most energy differences are in the kJ up to dozens of kJ range, this implies that a few kJ accuracy will not be a total exception. The traditional ‘chemical accuracy’ has always been taken as ‘1 kcal mol1’. So in summary, whether it is for the chemist or for the technologist, a better than 4 kJ mol1 accuracy is required when absolute numbers are required.

2. Calculation of enthalpy by quantum chemical (q.m.) techniques 2.1 Basics of evaluating enthalpy and heat of formation using quantum mechanical calculations Let us first briefly explain how a quantum mechanical calculation may be employed to obtain values for enthalpy, entropy, and heat of formation. Normally speaking, we do not have a very accurate molecular geometry at hand, and therefore the geometry needs to optimized first. This process is known as geometry optimization but in fact this is the process of energy minimization the geometry and electronic structure of the molecular species under consideration. This is accomplished by adopting an initial geometry. The latter can be obtained from an ‘intelligent’ molecular builder program that ’knows’ standard characteristic bond length, bond angles, or, whenever available, a known experimental geometry (e.g. crystalline X-ray structure). The electronic wavefunction is optimized (the so-called SelfConsistentField calculation), after which the forces on the atoms are calculated (partial derivatives of the energy versus displacement of the atoms). Next, the atoms are displaced in proportion to the calculated individual forces on the individual atoms. Then a new SCF calculation is started, forces are recalculated, atoms are again moved, etcetera. So there are two loops, the inner SCF loop, and the outer geometry optimising loop, as illustrated in Scheme 1:

Scheme 1

Chem. Soc. Rev., 2010, 39, 1764–1779 | 1765

The calculated components just described lead to the optimized geometry of the molecule. For larger macromolecules for which various minima exist on the potential energy surface (PES), more advanced procedures have been developed which enable to locate the global minimum of a molecule in a computationally efficient way. The total energy of the molecule is the key output of the computational process, next to the molecular geometry. The enthalpy of formation can be calculated using a variety of schemes. In the atomization energy method the following formula is used (here illustrated for hydrocarbons):

been developed that involve even more balanced reactions, and therefore lead to increased accuracy of the enthalpy of formation. These reactions include hyperhomodesmotic, isoplesiotic, etc.; an overview is presented by Wheeler et al.2 However, whichever one of these schemes is invoked, if one needs to construct the enthalpies of formation for a large set of molecules, as is the case in large reaction networks, it is not practically possible to construct such a reaction network for every compound to be calculated. Therefore, the aforementioned method using atomization enthalpies is a more universal method and is preferred, although accuracy will in general be less.

DfH(CmHn; 298 K) = mDfHexp(C; 298 K) + nDfHexp(H; 298 K)  [mHcalcd(C; 298 K) + nHcalcd(H; 298 K)  Hcalcd(CmHn; 298 K)]

(1)

CmHn - mC + nH Eqn (1) is visualized in Fig. 1. The last three terms of eqn (1) yield the calculated atomization enthalpy for CmHn. It can be seen that DfH0(CmHn; 298 K) is obtained by subtracting the calculated atomization energy for CmHn from the experimental DfH0(298 K) for the constituting atoms. As the calculation of the heat of formation is based on atomization enthalpies, the procedure which was described in the above generally leads to certain non-negligible errors, another procedure may be invoked in which at least a partial cancellation of errors occur. This method involves the use of so-called isodesmic reaction schemes, first introduced by People et al.,1 in which hypothetical reactions are constructed that conserve the number of bonds of each order between each pair of atom types, involving species with a well-known enthalpy of formation. An example is

This way it can partially correct for the systematic errors in the ab initio calculations. Alternatively, other schemes have

Fig. 1 Illustration of the calculation of standard enthalpies of formation from ab initio standard enthalpies of atomization.

1766 | Chem. Soc. Rev., 2010, 39, 1764–1779

2.2 State of the art of quantum calculation (q.m.) of the heat of formation There is quite a large number of papers dedicated to the calculation of thermodynamic properties by quantum calculations, in particular heats of formation of molecules. The methods employed in these papers are almost exclusively postHartree Fock or Density Functional Theory methods (see next section for a possible future alternative). Before getting more explicit, in general one could say that the post-Hartree Fock methods should be considered the better ones, but their computational expense is so high that only comparatively small systems (typically less than 50 atoms) can be studied. DFT methods on the other hand are computationally cheap, but on the other hand a problem is that there is now a plethora of density functionals, each having its own merits, and often developed starting from a certain, limited, set of comparatively small molecules. In addition, sometimes corrections are introduced to further close the gap with experiment which makes the method even more empirical and less generally applicable. Going back in history, before ab initio quantum mechanics became a computationally feasible approach, semi-empirical q.m. methods were being used and tables with calculated and experimental heats of formation became available.3 The accuracy, however, although good for its time, was very often appreciably beyond the 4 kJ mol1 level, with exceptions up to 40–80 kJ mol1 for organic molecules. This level of accuracy, albeit a landmark in those days, is clearly insufficient. Moreover, the semi-empirical methods were parameterised using a simplified Hamiltonian, and therefore they were unable to describe organometallic species properly (although attempts have been made) or organic compounds that are either unstable or have an uncommon internal bonding structure. Continuing efforts in developing more advanced semi-empirical methods has led to more advanced methods (such as AM1, PM3, SAMI and PM5) and they have found widespread use in a variety of applications but they still show limited accuracy in reproducing experimental data for a diverse set of quantum mechanical observables. Recent efforts has led to new semi-empirical methods denoted as the PDDG/PM3 and PDDG/MNDO which seem very promising in terms of accuracy. They employ a so-called Pairwise Distance Directed Gaussian modification (PDDG) which introduces functional group information via pairwise atomic interactions using only atom-based parameters.4 A recent benchmark study on a set of 622 organic molecules containing C, N, O and H showed that they have a This journal is

 c

The Royal Society of Chemistry 2010

mean absolute deviation (MAD) of about 8–13 kJ mol1 compared with 28, 18 and 24 kJ mol1 for AM1, PM3 and SCC-DFTB for the same test set. The Hartree–Fock method that was very popular in the early days of computational chemistry gave, on the same test set, MADs in the order of 13–17 kJ mol1 depending on the basis set used. Better agreement (4.5 kJ mol1) could also be obtained for hydrocarbons using empirical fitting procedures but the general applicability of such procedures is of course prohibited.5 To obtain a better agreement with experiment in a more systematic way, the most common approach is to use post-Hartree Fock methods modified to deal with electron correlation. This, however, often leads to highly demanding computational methods (as for instance the so-called CCSD(T) level scales as N7 with N the number of basis functions used to describe the electron cloud) that will not be effective when attempting to tackle the more real-size molecules which are handled by the experimental chemist. Moreover, not having the possibility to perform calculations with full recovery of electron correlation and with an (almost) infinite basis set, it is far from a priori that a higher level method will bring the better results. A recent in depth study on organo-sulfur compounds showed that computationally very expensive methods such as CCSD(T) in combination with a large basis set still overestimate the enthalpy of formation by more than 30 kJ mol1 for some compounds.6 This is illustrated in Fig. 2 for the enthalpies and Fig. 3 for the entropies for 5 S-containing compounds: H2S, SMe2, HSSH, Me–S–S–Me and SQCH2.

In more recent times, Density Functional Theory (DFT) based methods have taken over as they are more efficient than high level post-Hartree Fock calculations. Most of the remainder of this section deals with the performance of these methods to evaluate heats of formation. Independent of the computational method applied, the alkanes are the simplest category of molecules to obtain good results. Other organic molecules are generally more difficult to achieve accurate thermodynamic parameters, in particular the heat of formation. Particular organic classes pose difficult cases, such as heterocycles including nitrogen. Generally the most difficult class is formed by metal containing species, e.g. organometallics. Hydrocarbons are undoubtedly the class of chemical species for which most benchmark computational studies have been performed. For a test set of 64 hydrocarbon molecules including alkanes, alkenes, alkynes and also radicals, the popular B3LYP functional gave a serious overestimation of the enthalpies of formation with respect to the experimental values: Mean Absolute Deviations (MADs) of 31 kJ mol1 and 50 kJ mol1 were found for the B3LYP/6-31G(d) and B3LYP/ 6-311G(d,p) level of theory, respectively.7 It is noticed that the smaller basis set performs better than their larger counterpart. In general B3LYP DFT underestimates the bond energies, yielding atomization energies which are too low and enthalpies of formation which are too high. Using a smaller basis set increases the basis set superposition error (BSSE).8 Another problematic artefact of the B3LYP method and all other ab initio and DFT based methods is the systematic increase of the deviation in terms of the

Fig. 2 Influence of the level of theory and basis set on the mean absolute deviation (MAD) between the ab initio and experimental standard enthalpies of formation at 298 K for 5 S-containing compounds. (With kind permission from Springer Science + Business Media:Theor. Chem. Accounts, 2009, 123, 391–412, A. G. Vandeputte, M.-F.Reyniers and G. B. Marin, Fig. 3).

This journal is

 c

The Royal Society of Chemistry 2010

Chem. Soc. Rev., 2010, 39, 1764–1779 | 1767

Fig. 3 Influence of the level of theory and basis set on the mean absolute deviation (MAD) between the ab initio and experimental standard molar entropies at 298 K for 5 S-containing compounds. (With kind permission from Springer Science + Business Media:, Theor. Chem. Accounts, 2009, 123, 391–412, A. G. Vandeputte, M.-F. Reyniers and G. B. Marin, Fig. 4).

of following equation against the experimental enthalpies of formation: DfH(CmHn; 298 K) = mX + nY  DatomH(CmHn; 298 K) (2)

Fig. 4 Errors in computed heats of formation for linear alkanes as a function of the number of carbon atoms, n (Reprinted with permission from ref. 15. Copyright 2008 American Chemical Society).

number of atoms, so for hydrocarbons the number of carbon atoms and hydrogen atoms, in the molecule.9,10 This is illustrated in Fig. 4. To remove these systematic deviations Saeys et al.7 proposed to use the Atom Additive Correction scheme (AAC).10 Instead of using the experimental standard enthalpies of formation of the carbon and hydrogen atom in eqn (1), they are replaced by new values X and Y which are determined by linear regression 1768 | Chem. Soc. Rev., 2010, 39, 1764–1779

The value of X was determined as 710.05 kJ mol1 and 707.35 kJ mol1 at the B3LYP/6-31G(d) and B3LYP/ 6-311G(d,p) level, respectively, which should be compared to the experimental standard enthalpy of formation of the carbon atom of 716.7 kJ mol1. This deviation of X with respect to the experimental value indicates that an important systematic error in B3LYP calculations takes place. A similar effect is observed for Y, i.e. the enthalpies of formation of the hydrogen atom. The calculated enthalpies of formation at the B3LYP level are significantly improved by using this AAC method, giving MADs of 9.2 kJ mol1 and 13 kJ mol1 at the B3LYP/ 6-31G(d) and B3LYP/6-311G(d,p) levels of calculation, respectively, which should be compared with the original MADs on the same test set of 31 and 50 kJ mol1. However, even within this adapted scheme the MADs are still not within chemical accuracy, moreover maximum deviation for some molecules were noticed in the order of 30–40 kJ mol1. These conclusions were later also generalized for a larger test set of 622 neutral, closed-shell organic compounds containing the elements C, H, N and O: B3LYP gives on average MADs of the order of 22–29 kJ mol1. These errors can be significantly reduced down to about 13 kJ mol1 by using optimized atomic energies similar as the AAC scheme sketched above. In addition, the steady increase in deviation compared to experimental values as the systems become larger is no longer striking with this procedure. This journal is

 c

The Royal Society of Chemistry 2010

Allinger and co-workers showed already a decade ago that accuracies better than ‘1kcal mol1’ (4kJ mol1) could be achieved for hydrocarbons, although there are still exceptions (29 kJ mol1 for tri-t-butylmethane).11 Their procedure is somewhat different than the atomization energy method sketched above. It involves the calculation of total energies by DFT methods, a Boltzmann correction for low energy conformations that are populated, and an adjustable parameter to allow for the extra enthalpy in a molecule as a result of low torsional barriers (in essence the enthalpy equivalent of what we discuss regarding the entropy in the section 3.1 ‘Anharmonicity/bond–bond coupling’). Instead of using the atomization energies of the constituting atoms, they used a set of adjustable bond/group parameters which are estimated for small trivial molecules such as methane.12 In essence the bond/ group parameters are fitted for a small set of test molecules and similarly remove systematic errors present in the ab initio methods and may be compared to the AAC correction scheme sketched above. An emerging field within the DFT community is the development of functionals that include dispersion interactions. Noteworthy is the development of the B2PLYP and mPW2PLYP double hybrid functionals which include a second-order Møller–Plesset type correction. Their training set, expanded from the G3/05 set, included 271 heats of formation, showed MADs of 22 kJ mol1 for the B3LYP/TZV2P, 11 kJ mol1 for the B2PLYP/CQZV3P and 8 kJ mol1 for the mPW2PLYP/ CQZV3P level of theory.13 To account for the dispersion interactions, the DFT-D approach, where an empirical –C6R6 is added to the standard Density Functionals, is very promising.14 Especially from a practical point of view, where the focus is on robustness and computational speed this approach provides high accuracy in many different situations. This approach is most beneficial for complexes where an average improvement on the heats of formation of about 4 kJ mol1 can be expected. For the a general test set of 622 molecules containing C, O, H and N the MAD was reduced by about 1.3 kJ mol1.15 The greatest benefit of the empirical correction term is in reducing the underestimate of the stability of branched over linear alkyl groups, however some systems remain problematic such as bridged polycyclic molecules or species with multiple N–N bonds (errors of 21–63 kJ mol1 were found for polycyclic systems with multiple N–N bonds). Thus also with inclusion of the empirical correction term it appears to be very difficult to go beyond the limit of 8.4 kJ mol1. If more accurate heats of formation are required (MADs less than 8 kJ mol1) , one needs to resort to composite methods (e.g. G2, G3, G3B3, CBS-QB3, W1, W2, or other variants) which are especially designed to reach chemical accuracy by combining the results of several calculations. They combine methods with a high level of theory and a small basis set with methods that employ lower levels of theory with larger basis sets. Afterwards the individual calculation results are combined to reach accuracy limit in which no basis set errors are made and in which electron correlation is fully taken into account. The major drawback of these methods is their applicability to only small molecules (o30 atoms depending on the chosen method). A useful roadmap to use these This journal is

 c

The Royal Society of Chemistry 2010

‘‘accurate’’ methods also for large molecules is the combination of with group additive schemes that are outlined later in this paper. In that case the composite methods are used only for small molecules in order to calculate group additive values which afterwards can be used in a group additive scheme to obtain the thermodynamic data for larger molecules.16 In almost all of the reported MADs of this section only the lowest energy conformer is considered for the calculation of the heat of formation, except in the approach followed by Allinger discussed above. At 298 K the most stable conformer is the dominant one in the equilibrium mixture. Only if the energy difference between the various conformers is of the order of RT (2.5 kJ mol1 at 298 K) the equilibrium mixture will contain a non-negligible contribution of higher energy conformers. This approximation yields errors below 4 kJ mol1 on the enthalpies of formation for flexible molecules with less than 30 atoms.17–19 An overview of the different MADs is presented in Table 1. All of the reported MADs above rely on the atomization enthalpy method. More accurate heats of formation can be obtained by using isodesmic or homodesmotic reactions. As an example, MADs were reported on a test set of 40 molecules composed of H, C, O and N. MADs of 6.3 kJ mol1 and largest deviation of 23 kJ mol1 could be achieved using a set of isodesmic reactions, whereas using the atomization energy method numbers of 11 kJ mol1 and 39 kJ mol1 were reported (using the B3LYP functional).20 In order to obtain similar accuracies with both methods the atomization enthalpy method needs to be combined with the bond additive corrections as explained above.21–23 Whereas most published work deals with organic molecules, organometallics obviously form a very important class of molecules in chemistry. In a study on Zn-complexes, Weaver et al.24 reported that for the CCSD(T) level an average error of over 42 kJ mol1, with the largest deviation being over 145 kJ mol1. The high-level G2 method was shown to lead to errors up to 30 kJ mol1 for polar molecules such as alkali and alkaline oxides and hydroxides.25 In a later study26 much better results were reported, though deviations were still up to 22 kJ mol1. Yang et al.27 reported that, while studying a series of firstrow transition metal complexes using a large variety of density functionals, even the best functional they found exhibited an average unsigned error in the heats of formation of over 40 kJ mol1, see Fig. 5. What seems a larger problem is that a large variety of computational methods are in essence semi-empirical procedures as they rely on parameters which are fitted to obtain the best agreement for a selected number of compounds. Thus, no generally applicable method yielding heats of formation with an acceptable error seems available at present. 2.3 Quantum Monte Carlo (QMC) method An interesting more recent development is the Quantum Monte Carlo method.28,29 Because not that much has been published as yet in the field of chemistry, we will only briefly touch this development. It is worthwhile mentioning that this approach, fully ab initio, is extremely accurate and in that sense goes beyond most traditional ab initio as well as DFT methods. By solving the Schro¨dinger equation through a Chem. Soc. Rev., 2010, 39, 1764–1779 | 1769

Table 1 Overview table of mean absolute deviations (MADs) for various sets of molecules and various calculation methods. All values in kJ mol1 Trainingset Semi-empirical methods 622 organic molecules containing C, N, O and H Hartree–Fock—Post Hartree–Fock and Composite methods 622 organic molecules containing C, N, O and H 64 hydrocarbons 64 hydrocarbons G2/97 data set (148 compounds with 1-6 non-hydrogen atoms) G3/99 data set G2/97 data set DFT Methods 64 hydrocarbons

622 neutral organic compounds containing only C, H, N and O G3/05 data set expanded (271 molecules) G2/97 test set G2/97 data set

G2 data set G3 data set 786 organic compounds formed by the atoms C, H, O, N, S and Cl

Computational Method

MAD

Reference

AM1 PM3 SCC-DFTB PDDG/PM3

28 18 24 13

Sattelmeyer et al., J. Phys. Chem. A, 2006, 110, 13551–13559

HF

13–17

CBS-QB3 CBS-QB3+AAC scheme G2

9 2 7

Tirado-Rives et al., J. Chem. Theory. Comput., 2008, 4, 297–306 Saeys et al., J. Phys. Chem. A, 2003, 107, 9147–9159

G3 SCS/MP2/QZV3P MP2/QZV3P

4 5 7

Curtiss et al., J. Chem. Phys. 2000, 112, 7374–7383 Grimme, S. J. Phys. Chem. A, 2005, 109, 3067

B3LYP/6-31G(d) B3LYP/6-311G(d,p) B3LYP/6-31G(d)+AAC scheme B3LYP/6-311G(d,p)+AAC scheme B3LYP/6-31+G(d,p) B3LYP/6-31+G(d,p)+OC6 B3LYP/TZV2P B2PLYP/CQZV3P mPW2PLYP/CQZV3P B3LYP/6-311+g(df,2p) X3LYP/6-311+g(df,2p) BP86/QZV3P+AAC scheme PBE/QZV3P+AAC TPSS/QZV3P+AAC TPSSH/QZV3P+AAC B3LYP/QZV3P+AAC PBE0/QZV3P+AAC M05-2X/UGBS2P M05-2X/UGBS2P Marrero-Gani method (GC method)

31 50 9 13 11 10 22 11 8 13 12 12 12 13 11 9 10 13 18 7

Saeys et al., J. Phys. Chem. A, 2003, 107, 9147–9159

Fig. 5 Average unsigned heat of formation errors (kcal mol1) for a set of transition-metal systems complexes. The set comprises very small organometallic species containing either of the elements Ti, V, Cr, Mn or Fe. Left-hand bars: 6-31+G**/LanL2dz, middle bars: 6-31G**, right-hand bars: TZVP basis set. (Reprinted with permission from ref. 27. Copyright 2009 American Chemical Society).

stochastic projection, the greatest and best possible accuracy and reliability of all present quantum methods can be 1770 | Chem. Soc. Rev., 2010, 39, 1764–1779

Curtiss et al., J. Chem. Phys. 1997, 106, 1063–1079

Tirado-Rives et al., J. Chem. Theory Comput., 2008, 4, 297–306 Schwabe and Grimme, Phys. Chem. Chem. Phys. 2006, 8, 4398–4401 Xu et al., J. Chem. Phys. 2005, 122, 014105 Grimme, J. Phys. Chem. A, 2005, 109, 3067–3077

Vydrov et al., J. Chem. Phys., 2006, 125, 234109-1–234109-9 J. Marrero and R. Gani, Fluid Phase Equilibria, 2001, 184, 183–208

achieved. A main source of error in all of these calculations is that the electron–electron interaction energies, also known as the correlation energy, is not accounted for to a sufficient extent. In this respect QMC does better, and this is the most likely reason why it yields more accurate heats of formation. To give an indication: Hartree–Fock captures 0% of the correlation energy, CCSD(T) roughly 75% (this is a computationally expensive post-HF method), whereas Variational QMC captures about 85% and Diffusion QMC some 95% of the correlation energy. Capturing that much of the correlation energy is the reason why the method provides such accurate total energies, much better than any of the other quantum methods available today. The result is that cohesive energies or binding energies can be obtained with an accuracy of 1–4%. Although the QMC method is computationally not cheap in first instance, it scales much more favourably (second to third power in system size) compared to traditional post-HarteeFock methods. An explicit sample set is shown in Table 2. The G2MP2 method is supposedly (e.g., according to literature) one of the better post-HF methods, and this seems confirmed by the result for HCN. However, for aniline the G2MP2 result is 21 kJ mol1 off(!). Apart from the fact that an error of 13–21 kJ mol1 is This journal is

 c

The Royal Society of Chemistry 2010

Table 2 Ab initio calculated heats of formation, using two DFT methods (ACM and B3LYP), the post-Hartree–Fock CCSD(T) level, and the G2MP2 level of calculation. Data compared to QMC and experimental results. The B3LYP results were obtained using a 6-311+G(3df,2p) basis set for the energy applied to a geometry obtained at the 6-3111G(d,p) level. All values in kJ mol1 Method species

ACM (TZP)

B3LYP (see caption)

CCSD(T)

G2MP2

QMC

Experiment.

HCN Hydroxyl Vinyl alcohol Aniline Melamine

150.1 — — 86.25 91.03

127.6 42.3 43.5 107.5 12.9

— 54.8 44.4 — —

131.3 — — 107.1 66.82

134 63 41.5 87.9 56.5

131.9 60.7 40.6 86.9 Uncertain

hardly acceptable for application in real design routes, equally important is the observation that the method does not have a typical offset: some species are calculated on the spot, other are may kJ mol1 off the experimental result. This implies that for a species like melamine, for which reliable experimental data are not available and unlikely to become available, it is unclear how to arrive at a reliable value for the heat of formation. Thus, though few data are available, very good agreement in absolute values between QMC results and experimental data is observed, without the empirical character of the G2MP2 method. For these QMC results, the values for aniline and melamine can be further improved by more extended QMC calculations (values in Table 2 are first attempts). The kind of accuracy shown here for small molecules is essential because normally the error blows up proportionally when considering larger molecules. There are many areas in the field of chemistry where accurate energies are currently difficult to obtain. Examples include relative conformational stabilities (see, e.g., ref. 30), heats of formation (see the above), and surface adsorbed species including notorious cases such as CO on Pt, and bulk structures, e.g. oxides.31 It is likely that the potential of QMC regarding achieving chemical accuracy for computationally ‘‘hard’’ problems will become ever more relevant. Obviously, the QMC method will apply irrespective of whether the species of interest in a stable organic or inorganic species, a radical, or an ion, or an adsorbed species.

3. Calculation of the entropy by quantum mechanical methods To obtain the entropy, one needs to evaluate the vibrational frequencies as these appear dominantly in the key expressions (see further below). Vibrational frequencies are proportional to the second derivatives of the energy versus the atomic displacement involved, i.e. the forces on the atoms (see ‘forces’ in the scheme above). These can therefore be directly obtained from a subsequent calculation step starting from the optimized molecular geometry and the previously computed accompanying electronic wave function (see the Scheme in the above). Special attention will be given to methods that go beyond the harmonic oscillator approximation as they have been found to give substantial corrections for molecules with a large degree of flexibility. This will include the simplest models that are used nowadays but also outline more advanced procedures such as coupled internal rotations. A special subsection will be devoted to the symmetry number. We believe all This journal is

 c

The Royal Society of Chemistry 2010

these effects have been studied recently in more detail, but a comprehensive overview for the community is still lacking. To obtain the molecular entropy, the following well-known equations need to be evaluated (i) Vibration X ðhn i =kTÞ expðhn i =kTÞ Svibration ¼ R 1  expðhn i =kTÞ i ð3Þ   ln½1  expðhn i =kTÞ (ii) Rotation for a linear molecule  2  8p IkT Srotation ¼ R ln þR sh2

ð4Þ

whereas for a non-linear molecule ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi "pffiffiffis   ffi# R 2IA kT 2IB kT 2IC kT 3 p Srotation ¼ ln þ R 2 2 s h2  h2  h2  ð5Þ (iii) Translation Stranslation

   5 3 2pk ¼ R þ ln 2 2 h2

ð6Þ

with IA, IB and Ic the moments of inertia of the global molecule, R the gas constant, T absolute temperature, h Planck’s constant, and ni the vibrational frequency associated with the ith normal mode and s the symmetry number.32,33 In practice, comparatively simple methods, including now hardly used methods such as the semi-empirical AM1 and PM3 methods, produce molecular entropies close to experimental values.34 In particular for very small molecules and rigid molecules the agreement is very good: the deviation at T = 298 K is less than 4 J mol1 K. At 298 K this corresponds to 1.2 kJ mol1, which is great because we are looking for chemical accuracy in total enthalpy or total free energy of 4 kJ mol1 (‘1 kcal mol1’). Deviations are observed when, for instance, the number of C–C bonds increases, or the number of rotatable groups increases. The vibrational contribution (eqn (3)) to the entropy is the most significant part as the temperature dependence of the entropy is largely determined by the vibrational contribution. It seems therefore that the low frequency vibrations which dominate the vibrational contribution to the entropy are the most likely candidates to cause disagreement between theoretical and experimental entropies. Mayer et al.,35 however, Chem. Soc. Rev., 2010, 39, 1764–1779 | 1771

long ago put forward that an error of even 10% in the frequencies only leads to a 1% error in the total entropy for one vibrational degree of freedom for a mode at 200 cm1 and T = 300 K. So in essence there must be a different reason for the remaining differences between calculated and experimental molecular entropies. Meanwhile research has learned that the causes for these discrepancies can be attributed to (one or more of) the following factors: (i) depending on the rotational barrier, not all bond rotations should be handled as harmonic oscillators, and also rotational coupling between neighbouring bonds may be important, (ii) the symmetry number is in fact a good number only for bonds with a high barrier to rotation. These two issues are discussed in the following two sub-sections. 3.1

Anharmonicity and bond–bond coupling

All thermodynamic molecular properties depend on the molecular partition function, with a vibrational, rotational, translational and electronic part, resulting in the expressions for example of the entropy given in the previous section. These expressions and more in particular the vibrational part is only valid if the molecule undergoes only small deviations from the equilibrium geometry. This is true at low temperatures or for internal motions characterized by steep potentials. The low vibrational spectrum in most molecules is, however, characterized by some large-amplitude vibrations that give rise to large deviations from the equilibrium configuration. Among them internal rotations (IR) about single bonds are certainly the most important and have also received the most interest in literature. As the shape of the rotational potential gives rise to various stable conformers, the consideration of a quadratic form of the potential (as harmonic oscillator) is no longer meaningful and can lead to partition functions which largely deviate from the real ones. The concept is illustrated in Fig. 6 for n-butane. To cure this problem for the relevant modes, various models have been suggested. The simplest approximation was suggested by Pitzer36 and has become a method of general use for onedimensional treatment of internal rotors. However, in that approach only the coupling with the overall rotation is taken into account, but neither relaxation effects nor coupling with other internal modes are accounted for. The calculation of the moments of inertia is done for the most stable geometry and is

assumed fixed along the rotational path. This method has been used frequently latest years and is generally noted as the 1D-HR approximation.37–39 Various special software tools have been developed to calculate the corrections for the large amplitude motions which exhibit anharmonic features. Despite its crude approximation the method was shown to yield quantitatively good predictions for a series of n-alkanes and other molecules such as alcohols. Some literature studies showed that the ‘‘good performance’’ of these one-dimensional models, in which no coupling with other large amplitude motions and no coupling with other vibrational modes is taken into account, is due to a fortuitous cancellation of errors.40 But the 1D-HR model showed that the entropies of a series of n-alkanes up to n-decane could be predicted with a standard deviation of less than 1%—i.e. approximately 5 J/(mol K) for the decane-whereas the standard Harmonic Oscillator (HO) model underestimates the experimental values of the entropy by about 10%.41 In more recent work this conclusion was generalized for molecules containing heteroelements such as alcohols/thiols and ethers/sulfides ethers.42 An exact theoretical framework was derived later on, in which the vibrational modes that are treated harmonically and largeamplitude motions are separated correctly. It was shown that the 1D-HR model nearly coincides with the exact model unless at higher temperatures or for small potential energy variations along the large-amplitude path.43 In summary, to obtain a good reproduction of thermodynamic properties it is essential to treat the anharmonic motions appropriately. On the other hand, though bond–bond coupling or coupling with other vibrational modes is present, its effect is hardly seen in the derived thermodynamic properties. 3.2 The symmetry number The symmetry number, present in the rotational part of the partition function, involves the number of rotations that superimpose the molecule onto itself. There is an external symmetry number involving rotations of the entire molecule only, and there is an internal symmetry number which involves rotation of a subgroup of the molecule, e.g. a methyl group. The appearance of the symmetry number is a consequence of the fact that double counting of (quantized) rotational levels should be prevented.32 Whether the effective symmetry number equals

Fig. 6 The potential of the ethyl rotation in butane relative to the energetically most favoured conformation.

1772 | Chem. Soc. Rev., 2010, 39, 1764–1779

This journal is

 c

The Royal Society of Chemistry 2010

the external symmetry number only, or that it is to be taken the sum of the internal and the external symmetry numbers depends on whether the internal subgroup is not free (external symmetry number only) or free (add external and internal symmetry number) to rotate. In practice every internal rotation will involve a finite energy barrier, and therefore neither of the extremes formally applies. A more exact numerical solution for intermediate cases may be pursued44 but is rarely applied. In practice it is commonly assumed that the rotational barriers are relatively high and only the external symmetry number is taken account of.45,46 Most importantly, regarding the application of computed entropies, this procedure has led to good agreement with experimental data (for force field calculated entropies see ref. 46). We wish to emphasize, however, that formally one might argue about the appropriateness of this route which uses external symmetry numbers only. 3.2.1 Continuous symmetry numbers. Although not really practised regularly, arguments and procedures have been put forward in the past for the introduction of continuous symmetry numbers. To cite Pitzer: ‘‘Ordinarily one would assign a symmetry number of three to pyramidal NH3 or six to a hypothetical planar NH3. If the pyramid is rapidly reversing itself the effective symmetry should also be six’’.47 This touches the point that the symmetry number depends on the observation time of the spectator or the spectrometer. To circumvent such problems, the concept of continuous symmetry numbers was introduced in this field by Avnir and co-workers.48 Despite the absence of many practical examples in literature, let us briefly recall some presented by Estrada and Avnir. Take the isotopically labelled 34S12C32S, which should formally have s = 1, whereas the experimental entropy agrees very well with s = 2. The application of the continuous symmetry concept leads led s = 1.9988, thereby resolving the problem observed between theory and experiment.49 Similar apparent discrepancies occur in the case of small distortions, e.g. Jahn–Teller distortions, that would have serious effects when the traditional integer symmetry numbers would be used only. In conclusion, although the effects are not huge, a requirement of an accuracy better than 4 kJ mol1 for the Gibbs free energy does impose a very strict requirement on entropy calculations as well.

4. Group contribution methods: scope and limitations Group contribution (GC) methods (see ref. 50–53 to name a few) belong to the class of empirical property models, also known as additive methods, where the molecular structure of a compound is broken down into building blocks and the property of a compound is estimated by the summation of the contributions of each building block (as schematically illustrated in Fig. 7). In GC-methods, the building blocks are functional groups and these methods are based on the assumption that a property value of any group has the same contribution in all the compounds where it appears and that the property value of the compound is a function of the contributions of all the groups needed for a unique representation of the molecular structure of the compound. In this way, the GC-methods are This journal is

 c

The Royal Society of Chemistry 2010

Fig. 7 Schematic representation of the Group Contribution concept where the molecule is broken into building blocks.

similar to methods based on topological indices,54 where the building blocks are topological indices,55,56 bond contributions,57,58 conjugates59,60 or a combination of them. The GC-methods could also be considered a special class of QSAR methods61,62 where the molecular descriptors are the building blocks, that is, the functional groups. The advantage of the GC-methods is that they are simple and easy to use (see, later, eqn (7)) because a property, such as the standard heat of vaporization of a compound, is usually calculated as a simple linear summation of the contributions of the functional groups representing the molecular structure of the compound. These GC-methods are predictive in nature because the same groups may be used to represent the molecular structure of more than one compound, just like the same atoms can be found in many molecules. They have therefore, a potentially wide application range and appear to be best suited for many process engineering design/simulation/analysis related calculations because they are purely predictive and computationally simple and inexpensive to apply. The connection between the GC-methods and the quantum mechanical calculations is that the latter may provide the building blocks for the former in the form of molecular descriptors, e.g., Gani et al. have used the MOPAC suite of programs45 to generate and evaluate molecular descriptors. Generally, GC-methods are useful when qualitatively correct estimates of the properties are needed and/or the demand for quantitative accuracy is not very high. This also implies an important limitation of the GC-methods, that is, it may not always be possible to obtain reliable (especially, quantitative) predictions of the needed properties for some compounds. Another limitation is that the set of functional groups may not be able to represent all compounds, such as isomers. The application of the GC-methods are limited to the compounds that can be represented by the existing set of functional groups and to the properties for which the group contributions (regressed parameters) are available. 4.1 Order of GC-methods GC-models are usually classified in terms of the order of the functional groups they employ. Based on this classification, Chem. Soc. Rev., 2010, 39, 1764–1779 | 1773

the method of Joback and Reid51 is classified as first-order because they employ first-order functional groups while the method of Constantinou-Gani52 is classified as second-order because of the use of first- and second-order functional groups, and the methods of Benson52 and Marrero-Gani53 are classified as higher-order because of the use of first-, second- and third-order functional groups. In the case of the methods of Constantinou-Gani and Marrero-Gani, the second- and third-order group contributions are added as correction terms to the first-order group contributions. The general form of these methods52,53 is given by eqn (7). f ðYÞ ¼

X i

!a Mi Ci

þ

X j

!b Nj Dj

þ

X

!c Ok Ek

ð7Þ

k

In the above equation, f(Y) is a function of property Y; Ci, Dj and Ek are contributions of functional groups of order 1, 2 and 3, respectively; Mi, Nj and Ok are number of times groups i, j and/or k appear in the molecular representation of the compound, and, a, b, c define the type of model equation. When a = b = c = 1, the model has only linear contributions; when any of these parameters is zero, the corresponding contribution is not used in the estimation of the property; while when they are not zero or 1, the contributions are non-linear. The contributions Ci, Dj and Ek are usually determined by regression using a collected set of experimental data for various properties: the standard enthalpy of fusion, the standard enthalpy of vaporization, critical temperature, normal boiling point and many more. For example, Marrero and Gani used measured standard heats of formation data for 803 compounds to estimate the contributions Ci, Dj and Ek. See also section 4.3 for prediction of the group parameter values (that is, predicting the group contributions as opposed to regressing them by fitting experimental data). Note that GC-methods where the right hand side of eqn (7) employ non-linear terms of the first-order group contributions and/or functions of first-order group contributions and other properties can also be found,50 as highlighted

by eqn (8) for estimation of critical temperature (Tc) of organic molecules, which needs values of normal boiling points (Tb) as well as first-order group contributions (Ci). Tc = Tb/[0.584 + 0.965 * SMiCi  (SMiCi)2]

(8)

4.1.1 Functional groups of GC-methods. The basic level has a large set of simple groups that allows describing the molecular structures of a wide variety of organic compounds. However, these groups capture only partially the proximity effects and are unable to distinguish among isomers. For this reason, the first level of estimation is intended to deal with simple and mono-functional compounds. To accommodate this deficiency, Marrero and Gani53 and before them, Constantinou and Gani52 and Benson51 (who also used non-linear group contributions) introduced higher-order groups, whose purpose was to add molecular structural information (that the first-level groups could not include) of the compound as a higher-level contribution or correction to the value from the first level. The second level involves groups that permit a better description of proximity effects and differentiation among isomers. The second level of estimation is consequently intended to deal with poly-functional, polar or non-polar compounds of medium size, i.e. number of carbon atoms = 3–6, and aromatic or cyclo-aliphatic compounds with only one ring and several substituents. The third level has groups that provide more structural information about molecular fragments of compounds whose description is insufficient through the first and second level groups. The third level of estimation allows estimation of complex heterocyclic and large (number of carbon atoms = 7–60) poly-functional acyclic compounds. The ultimate objective of the proposed multilevel scheme is to enhance the accuracy, reliability and the range of application for a number of important pure component properties. This multi-level molecular representation and property calculation scheme has been shown to enhance the accuracy,

Table 3 Three examples of representation of molecular structures through functional groups. In column 4, [2] indicates group type of the specific group Functional groups Molecule

First-order

Second-order

Third-order

7 aCH, 2 aC, 1 aN

None

1 arom.fused2 1 pyridene.fused2

1 CH3, 2 CH2, 1 CH3CO

1 CH3COCH2

None

9 aCH, 1 aC, 1 aC-NH2, 1 aC-NH

1 aromrings1s4

1 aC-NH-aC (different rings)

1774 | Chem. Soc. Rev., 2010, 39, 1764–1779

This journal is

 c

The Royal Society of Chemistry 2010

reliability and the range of application of the Marrero and Gani method for the following properties: normal boiling point, normal melting point, critical properties, the standard heat of formation, the standard heat of fusion and the standard enthalpy of vaporization. A sample of first-, secondand third-order groups are given as examples in Table 3. Use of the GC-method requires the correct representation of the compound molecular structure with the functional groups. Marrero-Gani53 has proposed rules to correctly assign the groups that occur in a given compound and avoid ambiguous descriptions. According to these rules, first-order groups should describe the entire molecule. In other words, there should be no fragment of a given molecule that cannot be represented by first-order groups. It should be also required that no atom of the molecule can be included in more than one group, which implies that no group is allowed to overlap any other first-order group. These requirements are equivalent to consider the set of groups describing a given compound as a full and disjoint splitting of the molecular structure. Also, the ‘‘weight’’ of a group is defined as the sum of atomic weights of the atoms involved in that group. It is assumed that heavier groups hold more information about the molecular structure than lighter groups. Consequently, if the same fragment of a given compound is related to more than one group, the heavier group must be chosen53 to represent it instead of the lighter groups. For example, 3,3-dimethyl-2-butanone, (CH3)3C–CO–CH3, is described correctly in the following way:53 (1) CH3CO, (3) CH3, (1) C. Unlike first-order groups, second-order and third-order groups are allowed to overlap when they have atoms in common. It is necessary, however, to prevent a situation in which one group overlaps completely another group since it would lead to a redundant description of a same molecular fragment. 4.2

Development and use of GC-methods

Development of the GC-methods requires data, that is, a set of measured values of the property of interest versus the molecular structures of the corresponding compounds. A suitable correlation, such as eqn (7), where the property function f(Y), which may be a non-linear function, needs to be established as a trial model. A groups set, for example, a set of first-order groups, is selected and the GC-model parameters (Ci, i = 1, NC; where NC is the total number of first order groups), that is, the contributions Ci, are regressed. If the correlation errors are acceptable, a GC-model may be considered developed for the property Y. The same procedure may be repeated using functional groups of 1st-, 2nd- and 3rd-order. Once the model has been developed, a table of the functional groups and their regressed parameters (group contributions) are made available.50,52,53 The following steps may be used for estimation of a property of interest: 1. Identify the group types available (1st-, 2nd- and 3rdorder groups) 2. Determine how many groups of each type (from the available list) are needed to represent the molecular structure of the compound 3. Retrieve the parameters from the regressed model parameter (group contributions) tables for the property of interest This journal is

 c

The Royal Society of Chemistry 2010

Table 4 Illustration of the application of a GC-method (MarreroGani) for calculation of the heat of formation at 298 K for aniline. Note that for this molecule only needs first order groups Compound details Name: Aniline Cas-Number: 000062-53-3 SMILES: Nc(cccc1)c1

Molecular structure 2-D drawing

Formula: C6H7N

Group representation 5 of aCH (aromatic CH) 1 of aCH–NH2

Heat of formation calculation DfH  5.549 = Si MiCi + Sj NjDj + Sk OkEk DfH = 88 kJ mol1

4. Apply the property model function (eqn (7)) to calculate the property Y. For aniline (also listed in Table 1), the calculation of the heat of formation DfH (in kJ mol1) at 298 K is highlighted in Table 4. Note that for this property model, f(Y) = DfH. In Fig. 8–11, the correlation of DfH (kJ mol1), DfG (kJ mol1), DfusionH (kJ mol1) and DvapH (kJ mol1) at 298 K with the Marrero-Gani method53 is illustrated, where organic compounds formed by the atoms C, H, O, N, S and Cl were included in the database. It can be noted that the regression of DfH, DfG and DvapH have been quite good, but it is evident that the standard enthalpy of fusion (and also normal melting point temperature) is difficult to model accurately only through the contributions of the functional groups. 4.2.1 Other examples of GC-methods. Successful applications of the GC-method to estimate secondary or functional properties, such as the enthalpy of vaporization as a function of temperature and vapour pressures as a function of temperature for the pure organic compound have also been reported. Also the GC-method has been very successfully applied for the estimation of liquid phase activities as a function of temperature and composition of the compounds present in the liquid mixture (such as the well-known UNIFAC-models63,64). In the present review paper, however, only the pure compound properties are considered. First-order GC-models for standard enthalpies of vaporization, standard enthalpies of fusion, and standard heats of formation have been proposed by Joback and Reid,50 Ma and Zhao.65 Higher-order models for these properties have been proposed by Benson,51 Constantinou and Gani,52 Marrero and Gani.53 Kolska et al.,66 developed higher-order GC-models for enthalpies of vaporization and entropies of vaporization at 298 K as well as at the normal boiling point of the compound and compared their models with other GC-models67–71 for the same properties. All these models can be represented by eqn (7), although the groups sets for each model are different. 4.2.2 Use of corresponding states principle with GC-methods. Some authors have combined the corresponding states principles with GC-methods to propose estimation methods for temperature dependent properties. For example, Ping et al.72 proposed a method for estimation of enthalpy of vaporization as a function Chem. Soc. Rev., 2010, 39, 1764–1779 | 1775

Fig. 8 Correlation of DfH (kJ mol1) at 298 with 787 measured data points representing organic compounds formed by atoms C, H, O, N, S and Cl. The results of the linear fitting gives following relation DH fcalc = 1.0008 DHfexp 0.04622 with an R2 value of 0.9959. The MAD for this dataset is 7.25 kJ mol1.

Fig. 9 Correlation of DfG (kJ mol1) at 298 with 773 measured data points representing organic compounds formed by atoms C, H, O, N, S and Cl. The results of the linear fitting gives following relation DGfcalc = 0.997 DGfexp 0.3534 with an R2 value of 0.9951.

of temperature where the group contribution approach is combined with the corresponding states principle. Similarly, estimation of properties such as the heat capacities of liquids as a function of temperature is given in Poling et al.73 cpli  cpio 0:49 ¼ 1:586 þ R 1  TTc 2 6 þo6 44:2775 þ

 1 3 6:3  1  TTc T Tc

3

ð9Þ

0:43557 7 þ 1  TTc 5

In the above model, the primary properties Tc (critical temperature) and o (acentric factor) are estimated with a suitable GC-method.50,52,53 Similarly, corresponding states principles may be used to generate the temperature dependent enthalpies 1776 | Chem. Soc. Rev., 2010, 39, 1764–1779

Fig. 10 Correlation of DfusionH (kJ mol1) at 298 with 765 measured data points representing organic compounds formed by atoms C, H, O, N, S and Cl. The results of the linear fitting gives following relation DHfusion,calc = 0.7928 DHfusion,exp + 3.389 with an R2 value of 0.8004.

Fig. 11 Correlation of DvapH (kJ mol1) at 298 with 424 measured data points representing organic compounds formed by atoms C, H, O, N, S and Cl. The results of the linear fitting gives following relation DHvap,calc = 0.9862 DHvap,exp +0.5416 with an R2 value of 0.9846.

of vaporization, where the vapour pressures as a function of temperature may be generated through an equation of state, such as the well-known cubic SRK equation of state74 or the more recent PC-SAFT (perturbed chain—statistical associating fluid theory) equation of state.75 For the SRK equation of state, the needed pure compound primary properties (critical temperature, critical pressure and acentric factor) may be estimated through a GC-method. Also, a GC-method may be directly employed to estimate the three pure compound parameters (the average number of segments (hard-chains); the temperature independent segment diameter; and the depth of the pair potential characterizing the dispersion forces) for the PC-SAFT equation of state. A more direct GC-approach has been proposed recently by Kolska et al.76 and Ceriani et al.,77 where the latter proposed a This journal is

 c

The Royal Society of Chemistry 2010

Table 5 Comparison of correlation statistics for a class of compounds for liquid heat capacities. The numbers in parenthesis indicate the number of data-points Average relative deviation (ARDa) Class of compound

eqn (9)

eqn (10) (Ceriani et al.77)

Kolska et al.76

Fatty acids (150) Fatty esters (125) Fatty alcohols (557) Hydrocarbons (1395)

2.79 0.46 3.15 2.32

5.78 8.86 11.58 7.94

2.94 0.58 7.52 2.18

a ARD = Si (1/ND) 100 (CplE  CplC)/CplE)I,where ND is the number of data-points; CplE and CplC are the experimental and calculated liquid heat capacities (kJ/(kmol K), respectively; and the summation is for i = 1, ND.

GC-model for the estimation of the heat capacities of organic liquids (mainly fatty compounds found in edible oil and biofuel industries) as a function of temperature and compared the performance of this model with that of Kolska et al.76 as well as the corresponding states model (eqn (9)). The model proposed by Ceriani et al.77 is given by X f ðYÞ ¼ Mi ðCi þ Bi TÞ ð10Þ i

Where, f(Y) represents the liquid heat capacity of an organic compound at temperature T and the Bi and Ci are regressed group parameters and Mi is the number of times group i is present in the molecular structure. Ceriani et al.77 have compared the use of eqn (9) and (10) with that of the model proposed by Kolska et al.76 for liquid heat capacities (kJ kmol1 K1) for hydrocarbons and complex lipid compounds (fatty esters, acids, glycerides, etc.). A summary of the correlation statistics77 is presented in Table 5. 4.3

Predictions of group contributions

A common frustration of using property estimation methods in general and GC-methods in particular is that the selected model/method may not have all the needed parameters, such as the functional groups needed to represent the molecular structure of the compound whose properties are to be estimated. Also, even if the groups are available, for some compound, the regressed group parameters may not give a prediction of acceptable accuracy. One way to address these limitations within the GC-method is to add new groups and/or regress the group parameters again. Both these options, however, normally require experimental data so that the new groups can be defined and their contributions estimated or the existing contributions fine-tuned for improved accuracy. This requires time and resources and therefore may not be convenient for the model user. Instead, Gani et al.78 proposed a new GC+ method where missing groups are created and their contributions predicted through a set of zero-order and firstorder connectivity indices.55 First, an atom-connectivity (CI) method is created for the same property function, f(Y) from eqn (7), as shown below (see eqn (11)). X f ðYÞ ¼ ðai Ai Þ þ bðn w0 Þ þ 2cðn w1 Þ þ d ð11Þ i

This journal is

 c

The Royal Society of Chemistry 2010

where, Y is the pure component property to estimate; Ai is the number of ith-atom occurring in the molecular structure; vw0 is the zeroth-order (atom) connectivity index; vw1 is the firstorder (bond) connectivity index; ai is the contribution of atom i; b and c are adjustable parameters; and, d is a constant (could also be set to zero). Now using the same data-set employed to regress the GCmethod parameters, the parameters for the atom-connectivity based model (eqn (11)) are also regressed. High accuracy in the prediction of f(Y) therefore cannot be expected from this method since with a smaller set of parameters, the same set of compounds are being represented, although, greater accuracy can be obtained by adding higher order connectivity indices79,80 or other molecular descriptors. The GC+ method (automatic creation of groups with CI-method and property prediction with GC-method) is used only when the molecular structure of a compound cannot fully be represented by the available functional groups and/or when one or more of the functional groups representing the compound do not have the corresponding property contribution in the group parameter tables. In this case, the CI-method is used to predict the missing (or new) group contribution. In order to maintain consistency when applied to groups the CI model is rewritten in the following form:78 X f ðYm Þ ¼ ðam;i Am;i Þ þ bðn w0 Þm þ 2cðn w1 Þm ð12Þ i

f ðYÞ ¼

X

! nm f ðYm Þ

þd

ð13Þ

m

f ðYÞ ¼

X

! Ni Ci

þ f ðYÞ þ higher-order terms

ð14Þ

i

In eqn (12)–(14), m indicates the number of different missing groups/fragments and nm indicates the number of times missing group/fragment appears in the molecule. This ensures additivity regardless of the number of groups generated from connectivity indices. The final form of the GC+ method is given by eqn (14). Therefore, the applicability of the GCmethods can be significantly increased through the GC+ method, without the need for new experimentally measured data. However, when such data becomes available, the predicted GC-parameters may be fine tuned. According to Gani et al.,78 fitting the same number of data points as Marrero and Gani,53 the average absolute deviation was found to be 7.2 kJ mol1 for DfH at 298 K. For example, the aC functional group contribution representing 1,4-benezenediamine, N-phenyl (see Table 3) is not available for DfH. This contribution is predicted by the CI-method (eqn (12)–(13)) and inserted into eqn (14) to predict a value of 225 kJ mol1 (experimental value is 206 kJ mol1). Finally, more recently ab initio based predictions have also been reported81 with accuracy down to the o8 kJ mol1 range for the heats of formation. Similarly group contribution values for various groups of short lives species such as radicals, but which are abundant species in various reaction networks, were estimated on basis of ‘‘high level’’ ab initio methods in various papers by Marin and co-workers and Green and Chem. Soc. Rev., 2010, 39, 1764–1779 | 1777

co-workers.16,82 The method was also extended to predict heat capacities and standard entropies for hydrocarbon radicals.83

5. Final discussion, summary and outlook We have discussed the calculation of the gas-phase molecular enthalpy of formation and corresponding molecular entropy using quantum mechanical methods as well as group contribution methods. In general, it has turned out very difficult, if at all possible, to produce a fair comparison between published results. The reason for this includes the fact that different sets of molecules are used in different studies. Moreover most data sets comprise primarily organic molecules, whereas the number of available benchmarks for inorganic species is rather limited. In order to be able to compare with experiment, one has to include all the correct physics, e.g. other low lying states. Interestingly, there are papers claiming a very good agreement with experiment without taking into account such factors, which might be an indicate that the ‘‘true’’ agreement when taking into account all physical factors might be not that good. Furthermore, there is an ever-growing set of density functionals and new composite post-Hartree–Fock methods that are, at the moment of publication, tested for calculation of heats of formation. Some of the more recent methods include empirical corrections such as additional dispersion term in a DFT calculation. Other refinements are based on taking into account additional physical phenomena such as the inclusion of more, low-lying, conformations. A relevant question for all of the tested methods is their universality in the sense that a large variety of methods are benchmarked and tested for a selected number of compounds and that the usage of them beyond the original set is not guaranteed. Despite all these efforts, and a world of different approaches tested thus far, in general, the quantum mechanical methods do not yet seem able to reach what has been defined long ago as chemical accuracy: ‘1 kcal mol1’. For those methods that show this best current performance, always either of the following factors applies: (i) it is for a limited set of species, in a lot of cases a (restricted) set of hydrocarbons (e.g. Hartree– Fock or semi-empirical methods referred to in the above), or (ii) it uses a composite post-Hartree–Fock method that has been designed to do a good job for a certain set of species. For those cases a MAD (mean average deviation) of 5–10 kJ mol1 can be reached, though it should be born in mind that the largest deviation is often about 5 times higher. In addition, the molecules studied are still comparatively small, and errors in heat of formation obtained by quantum mechanical ab initio calculations tend to increase with the size of the molecule unless specific corrections are accounted which are again based on fitting to get the best agreement for a given set of molecules. Next, Quantum Monte Carlo, though its application in the chemical field is thus far very limited, seems a promising method. First of all, the limited amount of available data shows very good performance. Secondly, it scales favourably with the size of the molecule. Thirdly, it does not involve any form fitting like the composite QM methods, and is therefore truly ab initio. Further developments are to be expected within 1778 | Chem. Soc. Rev., 2010, 39, 1764–1779

the framework of QMC that can be expected to increase the quality of results. The GC-methods, which are predictive by their nature, thus far primarily developed on organic molecules, are being continuously refined in terms of accuracy and application range. Consequently, they are limited to predicting variations within specific classes of molecules. Attempts to combine the GC-methods with smaller scale models (atom connectivity, ab initio, etc.) have resulted in interesting options to further expand the application range of GC-methods without suffering from loss of accuracy or flexibility. Also, these options have provided opportunities to predict any missing group contribution without the need for new experimental data or regression of data. Therefore, much work is necessary to expand the scope of the combined GC-methods so that a larger percentage of all possible millions of chemical structures can be analyzed by these models in a pre-screening step before employing the more expensive experimental resources. Finally, for large, complex multifunctional compounds, QSAR like methods combining GC and molecular descriptors may be more appropriate. To be able to treat real-life (not only small) molecules, an approach is needed that can handle large molecular systems. A method is needed with favourable scaling behaviour. This essentially rules out most post-Hartree–Fock methods and a lot of composite quantum mechanics methods for the direct calculation of the heat of formation of larger molecular systems, as the computational cost increases tremendously in terms of the number of atoms in the molecular system. A viable alternative for larger molecules is the combination of group contribution methods with groups additive values that are determined with the best available computational ab initio methods seems to be a viable alternative to obtain thermodynamic properties near chemical accuracy. New developments and full use of existing tools may lead to further improvements. Finally, it is perhaps interesting to notice that chemists traditionally think from the point of view of an absolute error of ‘1 kcal mol1’ as ‘chemical accuracy’, which is perhaps understandable because they look at thermal equilibrium constants and rate constants, and a 1 kcal mol1 error has a significant effect on those quantities. Process technologists want to build plant models with say an accuracy of 5%, this is relative. This is reflected in the use of MAD’s by chemists (ab initio part) and the ARD in the GC part.

Acknowledgements Dr Betty Coussens, DSM Research, Geleen, The Netherlands, is gratefully acknowledged for the ab initio data in Table 2, while we thank Dr Lubos Mitas, University of Illinois at Urbana-Champaign, for the QMC results in the same Table. VVS acknowledges the Research Board of the Ghent University for Funding.

References 1 W. J. Hehre, R. Ditchfield, L. Radom and J. A. Pople, J. Am. Chem. Soc., 1970, 92, 4796–4801. 2 S. E. Wheeler, K. N. Houk, P. v. R. Schleyer and W. D. Allen, J. Am. Chem. Soc., 2009, 131, 2547–2560. 3 J. J. P. Stewart, J. Comput.-Aided Mol. Des., 1990, 4, 1–105.

This journal is

 c

The Royal Society of Chemistry 2010

4 M. P. Repasky, J. Chandrasekhar and W. L. Jorgensen, J. Comput. Chem., 2002, 23, 498–510. 5 W. C. Herndon, Chem. Phys. Lett., 1995, 234, 82–86. 6 A. G. Vandeputte, M.-F. Reyniers and G. B. Marin, Theor. Chem. Acc., 2009, 123, 391–412. 7 M. Saeys, M.-F. Reyniers, G. B. Marin, V. Van Speybroeck and M. Waroquier, J. Phys. Chem., 2003, 107, 9147–9159. 8 F. B. Vanduijneveldt, J. G. C. M. Vanduijneveldt-Vanderijdt and J. H. Van Lenthe, Chem. Rev., 1994, 94, 1873–1885. 9 L. A. Curtiss, K. Raghavachari, P. C. Redfern and J. A. Pople, J. Chem. Phys., 2000, 112, 7374–7383. 10 P. C. Redfern, P. Zapol, L. A. Curtiss and K. Raghavachari, J. Phys. Chem. A, 2000, 104, 5850–5854. 11 J. Labanowski, L. Schmitz, K.-H. Chen and N. L. Allinger, J. Comput. Chem., 1998, 19, 1421–1430. 12 N. L. Allinger, K. Sakakibara and J. Labanowski, J. Phys. Chem., 1995, 99, 9603–9610. 13 T. Schwabe and S. Grimme, Phys. Chem. Chem. Phys., 2006, 8, 4398–4401. 14 S. Grimme, J. Antony, T. Schwabe and C. Mu¨ck-Lichtenfeld, Org. Biomol. Chem., 2007, 5, 741–758. 15 J. Tirado-Rives and W. L. Jorgensen, J. Chem. Theory Comput., 2008, 4, 297–306. 16 M. K. Sabbe, M. Saeys, M.-F. Reyniers, G. B. Marin, V. Van Speybroeck and M. Waroquier, J. Phys. Chem. A, 2005, 109, 7466–7480. 17 K. W. Sattelmeyer, J. Tirado-Rives and W. L. Jorgensen, J. Phys. Chem. A, 2006, 110, 13551–13559. 18 J. J. P. Stewart, J. Mol. Model., 2004, 10, 6–12. 19 D. F. J. DeTar, J. Phys. Chem. A, 1998, 102, 5128–5141. 20 K. Raghavachari, B. B. Stefanov and L. A. Curtiss, Mol. Phys., 1997, 91, 555–559. 21 G. A. Petersson, D. K. Malick, W. G. Wilson, J. W. Ochterski, J. A. Montgomery, Jr. and M. J. Frisch, J. Chem. Phys., 1998, 109, 10570–10579. 22 I. Fishtik, R. Datta and J. F. Liebman, J. Phys. Chem. A, 2003, 107, 695–705. 23 J. Yu, R. Sumathi and W. H. Green, Jr., J. Am. Chem. Soc., 2004, 126, 12685–12700. 24 M. N. Weaver, Y. Yang and K. M. Merz, Jr., J. Phys. Chem. A, 2009, 113, 10081–10088. 25 A. Schulz, B. J. Smith and L. Radom, J. Phys. Chem. A, 1999, 103, 7522–7527. 26 M. B. Sullivan, M. A. Iron, P. C. Redfern, J. M. L. Martin, L. A. Curtiss and L. Radom, J. Phys. Chem. A, 2003, 107, 5617–5630. 27 Y. Yang, M. N. Weaver and K. M. Merz, Jr., J. Phys. Chem. A, 2009, 113, 9843–9851. 28 L. Mitas, Comput. Phys. Commun., 1996, 96, 107–117. 29 K. P. Esler, J. Kim, D. M. Ceperley, W. Purwanto and E. J. Walter, J. Phys. Conf. Ser., 2008, 125, 012057. 30 B. Galabov, J. P. Kenny, H. F. Schaefer III and J. R. Durig, J. Phys. Chem. A, 2002, 106, 3625–3628. 31 A. Bande and A. Lu¨chow, Phys. Chem. Chem. Phys., 2008, 10, 3371–3376. 32 P. W. Atkins, Physical Chemistry, Oxford University Press, Oxford, 2nd edn, 1982. 33 W. J. Moore, Physical Chemistry, Longman, London, 1972. 34 R. A. Barrett and R. J. Meier, THEOCHEM, 1996, 363, 203–209. 35 J. E. Mayer, S. Brunauer and M. Goeppert-Mayer, J. Am. Chem. Soc., 1933, 55, 37–53. 36 K. S. Pitzer and W. D. Gwinn, J. Chem. Phys., 1942, 10, 428–440. 37 J. P. A. Heuts, R. G. Gilbert and L. Radom, Macromolecules, 1995, 28, 8771–8781. 38 J. Pfaendtner, X. Yu and L. J. Broadbelt, Theor. Chem. Acc., 2007, 118, 881–898. 39 V. Van Speybroeck, D. Van Neck, M. Waroquier, S. Wauters, M. Saeys and G. B. Marin, J. Phys. Chem. A, 2000, 104, 10939–10950. 40 V. Van Speybroeck, P. Vansteenkiste, D. Van Neck and M. Waroquier, Chem. Phys. Lett., 2005, 402, 479–484. 41 P. Vansteenkiste, V. Van Speybroeck, G. B. Marin and M. Waroquier, J. Phys. Chem. A, 2003, 107, 3139–3145.

This journal is

 c

The Royal Society of Chemistry 2010

42 P. Vansteenkiste, T. Verstraelen, V. Van Speybroeck and and M. Waroquier, Chem. Phys., 2006, 328, 251–258. 43 P. Vansteenkiste, D. Van Neck, V. Van Speybroeck and M. Waroquier, J. Chem. Phys., 2006, 124, 044314. 44 K. S. Pitzer, Quantum Chemistry, Prentice-Hall, New York, 1953. 45 J. J. P. Stewart, MOPAC 93. 00 Manual, Fujitsu Limited, Tokyo, Japan, 1993. 46 J. Li and N. L. Allinger, J. Am. Chem. Soc., 1989, 111, 8566–8575. 47 K. S. Pitzer, J. Chem. Phys., 1939, 7, 251–255. 48 E. Estrada and D. Avnir, J. Am. Chem. Soc., 2003, 125, 4368–4375. 49 A. L. East and L. Radom, J. Chem. Phys., 1997, 106, 6655–6674. 50 K. G. Joback and R. C. Reid, Chem. Eng. Commun., 1987, 57, 233–243. 51 S. W. Benson, F. R. Cruickshank, D. M. Golden, G. R. Haugen, H. E. O’Neal, A. S. Rodgers, R. Shaw and R. Walsh, Chem. Rev., 1969, 69, 279–324. 52 L. Constantinou and R. Gani, AIChE J., 1994, 40, 1697–1710. 53 J. Marrero and R. Gani, Fluid Phase Equilib., 2001, 183–184, 183–2004. 54 J. Bicerano, Prediction of Polymer Properties, Marcel Dekker Inc., New York, 1993. 55 L. B. Kier and H. L. Hall, Molecular Connectivity in Structure Activity Analysis, John Wiley & Sons, New York, USA, 1986. 56 M. Randic, J. Mol. Graphics Modell., 2001, 20, 19–35. 57 R. D. J. Brown, J. Chem. Soc., 1953, 2615–2621. 58 D. W. Smith, J. Chem. Soc., Faraday Trans., 1998, 94, 3087–3091. 59 L. Constantinou, S. E. Prickett and M. L. Mavrovouniotis, Ind. Eng. Chem. Res., 1993, 32, 1734–1746. 60 L. Constantinou, S. E. Prickett and M. L. Mavrovouniotis, Ind. Eng. Chem. Res., 1994, 33, 395–402. 61 A. R. Katrizky, S. H. Slavov, D. A. Dobchev and M. Karelson, Comput. Chem. Eng., 2007, 31, 1123–1130. 62 O. Kahrs, N. Brauner, G. St. Cholakov, R. P. Stateva, W. Marquardt and M. Shacham, Comput. Chem. Eng., 2008, 32, 1397–1410. 63 R. Wittig, J. Lohmann and J. Gmehling, Ind. Eng. Chem. Res., 2003, 42, 183–188. 64 H. E. Gonza´lez, J. Abildskov, R. Gani, P. Rousseauz and B. Le Bert, AIChE J., 2007, 53(6), 1620–1632. 65 P. Ma and X. Zhao, Ind. Eng. Chem. Res., 1993, 32, 3180–3183. 66 Z. Kolska´, V. Ruzicka´ and R. Gani, Ind. Eng. Chem. Res., 2005, 44, 8436–8454. 67 J. S. Chickos, W. E. Acree, Jr. and J. F. Liebman, Estimation Phase-Change Enthalpies and Entropies, Computational Thermochemistry. Chapter 4. Prediction and Estimation of Molecular Thermodynamics, Am. Chem. Soc, Washington, DC, 1996. 68 M. Ducros, J. F. Gruson and H. Sannier, Thermochim. Acta, 1980, 36, 39–65. 69 M. Ducros, J. F. Gruson, H. Sannier and I. Velasco, Thermochim. Acta, 1981, 44, 131–140. 70 M. Ducros and H. Sannier, Thermochim. Acta, 1982, 54, 153–157. 71 M. Ducros and H. Sannier, Thermochim. Acta, 1984, 75, 329–340. 72 L. Ping, L.-H. Liang, P.-S. Ma and Z. Chen, Fluid Phase Equilib., 1997, 137, 63–74. 73 B. E. Poling, J. M. Prausnitz and J. P. O’Connell, The Properties of Gases and Liquids, McGraw-Hill, New York, 5th edn, 2004. 74 G. Soave, Chem. Eng. Sci., 1972, 27, 1197–1203. 75 J. Gross and G. Sadowski, Ind. Eng. Chem. Res., 2002, 41, 1084–1093. 76 Z. Kolska´, J. Kukal, M. Za´bransky´ and V. Ru˚icˇka, Ind. Eng. Chem. Res., 2008, 47, 2075–2085. 77 R. Ceriani, R. Gani and A. J. A. Meirelles, Fluid Phase Equilib., 2009, 283, 49–55. 78 R. Gani, P. M. Harper and M. Hostrup, Ind. Eng. Chem. Res., 2005, 44, 7262–7269. 79 J. He and C. Zhong, Fluid Phase Equilib., 2003, 205, 303–316. 80 B. Lin, S. Chavali, K. V. Camarda and D. C. Miller, Comput. Chem. Eng., 2005, 29, 337–347. 81 J. P. Guthrie, J. Phys. Chem. A, 2001, 105, 9196–9202. 82 R. Sumathi, H.-H. Carstensen and W. H. Green, Jr., J. Phys. Chem., 2001, 105, 6910–6925. 83 M. K. Sabbe, F. De Vleeschouwer, M. F. Reyniers, M. Waroquier and G. B. Marin, J. Phys. Chem. A, 2008, 112, 12235–12251.

Chem. Soc. Rev., 2010, 39, 1764–1779 | 1779