Supplementary Material: Uncertainty Quantification and Propagation of Errors of the Lennard-Jones 12-6 Parameters for n-Alkanes

Richard A. Messerly,∗ Thomas A. Knotts IV, and W. Vincent Wilding
Department of Chemical Engineering, Brigham Young University, Provo, UT 84602

Keywords: Transferability, Force Field, Critical Constants, Molecular Simulation




CONTENTS

S.I. Numerical Uncertainties
S.II. UQ+PoE
S.III. UQ+PoE for CH4
    A. UQ for CH4
    B. MCS for CH4
    C. PoE for CH4
S.IV. Type A CH3 and CH2
    A. Monte Carlo Sampling Parameter Sets
    B. Propagation of Errors for Ethane
S.V. Propagation of Errors Histograms
S.VI. Surrogate Model
    A. Development
    B. Validation
S.VII. Data Evaluation
S.VIII. Correlation Between CH3 and CH2 Parameters
S.IX. Prediction of Pv for Ethane
S.X. Correlation Between Lennard-Jones Parameters and Bond Length for Ethane
References

S.I. NUMERICAL UNCERTAINTIES

Figure S.1 presents an example of the numerical uncertainties for Tc, ρc, Pc, and Zc using the algorithms presented in our previous work [1, 2]. The TraPPE 2014 ethane validation data [3] were used to produce Figure S.1. The 95% confidence interval can be approximated by integrating the histogram such that 5% of the area is evenly distributed between the left and right tails (2.5% in each tail). Also, since the histograms appear to follow a normal distribution, we can estimate the standard deviations by fitting the histograms to a normal distribution model. This is useful for generating a probability density function to quantify the numerical uncertainty in the critical constants. This is how the numerical uncertainties are obtained for the Type B results found in Section S.V.

FIG. S.1. Examples of the numerical uncertainties obtained using the algorithms presented in Refs [1, 2]. Panels (a)-(d) correspond to Tc, ρc, Pc, and Zc, respectively. The probability densities are defined as the number of counts in a single bin divided by both the total number of counts and the bin width. The data used are those from the TraPPE 2014 validation of ethane.
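The two estimates described in Section S.I (an interval that leaves 2.5% of the samples in each tail, and a normal model for the standard deviation) can be sketched as follows. This is a minimal illustration, not the algorithm of Refs [1, 2]: the samples are synthetic, the function names are hypothetical, and the normal model is fitted by moments rather than by regressing the histogram.

```python
import random
import statistics


def central_interval(samples, level=0.95):
    """Bounds that leave (1 - level)/2 of the samples in each tail."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = int(round((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lo_idx], ordered[hi_idx]


def normal_fit(samples):
    """Mean and standard deviation of a normal model (moment fit, an assumption)."""
    return statistics.fmean(samples), statistics.stdev(samples)


# Hypothetical replicate Tc estimates in K; these are not data from the paper.
rng = random.Random(0)
tc_samples = [rng.gauss(305.0, 0.2) for _ in range(20000)]

lo, hi = central_interval(tc_samples)
mean, s = normal_fit(tc_samples)
print(f"95% interval from the tails: [{lo:.2f}, {hi:.2f}] K")
print(f"normal model: mean = {mean:.2f} K, 1.96 s = {1.96 * s:.3f} K")
```

When the samples are close to normally distributed, the half-width of the tail-based interval and 1.96 s should agree closely, which is the consistency the section relies on.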


S.II. UQ+POE

The general outline of the Type A UQ+MCS+PoE methodology is:

1. Create a grid of the p-dimensional parameter space
2. Calculate the PDF for each parameter set, θ
3. Generate hundreds, thousands, or millions of random numbers
4. Assign each random number to a parameter set (via the PDF values)
5. Obtain the desired property value for the parameter sets sampled in Step 4
6. Create a histogram of the property values from Step 5
7. Integrate the histogram at a given confidence level

where Steps 1-2, 3-4, and 5-7 are typically categorized as the UQ, MCS, and PoE steps, respectively. Step 2 is the prohibitively expensive step since calculating the PDF for a single parameter set can require tens of molecular simulations. For a refined 2-dimensional grid of, say, 500x500 parameter sets, this necessitates millions of simulations. For this reason, a surrogate model is essential for the UQ portion. A surrogate model reduces the number of molecular simulations required by predicting the PDF and/or physical property values for each parameter set [4, 5]. Fortunately, it is not necessary that the surrogate model accurately predict every physical property, only the properties included in the objective function used to calculate the PDF. An additional advantage of a surrogate model is that it reduces the numerical uncertainty by smoothing the simulation output.

The general outline of the Type B UQ+PoE methodology is:

1. Create a grid of the p-dimensional parameter space
2. Accept all θ that satisfy Equation 5 in Section IV B1 of the text
3. Find extrema of θ accepted in Step 2
4. Obtain property values and numerical uncertainties for each extrema parameter set
5. Determine the minimum and maximum property estimates (with numerical uncertainties included)

where Steps 1-2 and 3-5 are categorized as the UQ and PoE steps, respectively.

S.III. UQ+POE FOR CH4

This section provides an example of the UQ+PoE methods outlined previously. The simplest example is that of the Lennard-Jones fluid, that is, a single-site molecule with a LJ 12-6 potential and no polar interactions. Since we are focusing on the n-alkanes in this study, the logical choice is the united-atom methane molecule. Again, we use the RMS as the objective function to obtain εCH4 and σCH4. In the case of the single-site LJ fluid, some thermophysical properties have analytic expressions (such as the second virial coefficient, B2) while others require molecular simulation (such as the critical point constants). Fortunately, molecular simulations may not be necessary even for properties that cannot be predicted by rigorous theoretical expressions because the LJ fluid has been extensively studied in the literature. For example, correlations (surrogate models) exist for the LJ fluid to relate ε and σ to many properties, such as saturated liquid density (ρl). This is advantageous because an analytic expression to predict ŷ(T, θ) greatly simplifies the UQ+PoE process. Likewise, molecular simulation results for the LJ coexistence curve have provided approximate estimates for the reduced critical temperature (Tc⋆), reduced critical density (ρc⋆), and reduced critical pressure (Pc⋆). As different studies of the LJ fluid have predicted slightly different values for Tc⋆, ρc⋆, and Pc⋆, we use those most recently reported by Dinpajooh et al. [6].

Since the main purpose of this section is to demonstrate the UQ+PoE methodology, we did not attempt to follow the TraPPE parameterization method for methane that utilizes a perturbation theory approach to relate pressure (P) to ε and σ. Instead, we have chosen B2 and ρl to optimize the LJ parameters because these two properties have simple analytic expressions (i.e. theoretical relationships or correlations). The second virial coefficient of the single-site LJ fluid is [7]

$$B_2(T; \epsilon, \sigma) = -\frac{2\pi\sigma^3}{3} \sum_{n=0}^{\infty} \frac{2^{(2n+1)/2}}{4\,n!}\, \Gamma\!\left(\frac{2n-1}{4}\right) \left(\frac{\epsilon}{k_B T}\right)^{(2n+1)/4} \tag{S.1}$$

where kB is the Boltzmann constant and Γ is the gamma function. Equation S.1 is a completely rigorous expression to calculate B2 from ε and σ (note that Equation S.1 is mathematically equivalent to the equation found in Ref [8]). We developed a new correlation for predicting ρl from ε and σ for the LJ fluid by fitting GEMC simulation results in reduced units to the expression

$$\rho_l^\star(T^\star) = b_0 + b_1 (b_2 - T^\star) + b_3 (b_2 - T^\star)^{b_4} \tag{S.2}$$

where ρl⋆ = ρl σ³, T⋆ = kB T/ε, and bi are fitting coefficients. The optimal fitting coefficients were found to be b0 = 0.3144, b1 = 0.1741, b2 = 1.2996, b3 = 0.5002, and b4 = 0.3333. Note that this model is very similar, in both the functional form and the predicted values of ρl⋆, to that presented by Lotfi et al. [9]. We use a similar approach when developing our surrogate model in Section S.VI.

Again, since the experimental data used in parameterization have random error, the regressed values for εCH4 and σCH4 inherit this uncertainty from the data. Furthermore, it has been shown that the single-site LJ 12-6 model is not flexible enough to predict B2 over a large temperature range [10]. Therefore, the model inadequacies will also lead to an enlargement in the parameter uncertainties. For this reason, we have only used B2 data in a limited temperature range.

In Section S.III A we demonstrate how the uncertainty in εCH4 and σCH4 is quantified. In Section S.III B we use Monte Carlo sampling (MCS) to generate millions of statistically acceptable εCH4 and σCH4 parameter sets. In Section S.III C we perform a propagation of errors analysis to show how the uncertainty in εCH4 and σCH4 leads to uncertainties in ρl, B2, Tc, ρc, and Pc. As a proof of concept, we only present a Type A and Type B analysis. One purpose of this section is to compare the uncertainty regions for the Type A and Type B approaches. Another purpose is to compare the effect of using two different types of data in the expression for RMS.

A. UQ for CH4

The first step in determining the uncertainty in εCH4 and σCH4 is to create a 2-dimensional (p = 2) grid. For every combination of εCH4 and σCH4 the RMS(εCH4, σCH4) is calculated. The Type A 95% joint confidence region consists of all the sets of εCH4 and σCH4 that satisfy Equation 2 in Section IV A1 of the text with α = 0.95, p = 2, ν = n − p and s² from Equation 4 in Section IV A3 of the text. The Type B acceptable parameter sets are determined using the criterion from Equation 5 in Section IV B1 of the text.

It has long been known that different types of data yield different optimal values for the LJ parameters [11–13]. Historically, this discrepancy is seen when comparing thermodynamic and transport properties, liquid and vapor phase properties, as well as different temperature ranges for a property such as B2. However, rarely are confidence regions reported for these parameters to demonstrate that they are indeed statistically different. Recently, Cailliez et al. performed such a study for argon [8]. They observed that the confidence regions did not overlap when the parameters were obtained from B2 or low-pressure gas viscosity data. Here we present a similar analysis for methane.

Figure S.2 provides a comparison between the Type A and Type B 95% confidence regions for the two types of data (B2 and ρl), two common values reported in the literature [14], the TraPPE parameters [15], and the optimal set for predicting Tc, ρc, and Pc (obtained using the DIPPR values and simulation results reported by Ref [6]). Figure S.2 demonstrates that B2 and ρl yield parameter sets that are statistically different at the 95% confidence level for both uncertainty analysis methods. This supports the notion that the LJ 12-6 model is simply an approximation and, therefore, one should proceed with caution when attempting to predict properties far removed from those used in the parameterization. For example, the ρl uncertainty region does not intersect with either the Tc or Pc optimal lines.

Another significant observation from Figure S.2 is that the confidence region for B2 is considerably larger than that for ρl. This suggests that utilizing ρl data can yield lower uncertainties in the optimized LJ parameters. However, it should be noted that the discrepancy between the ρl and B2 optimizations is attributed to the limitations in the model (i.e. UA, LJ 12-6, pair-wise additivity). In other words, if the model uncertainty were negligible the ρl and B2 uncertainty regions would overlap considerably. This demonstrates that this UQ+PoE analysis does not rigorously account for model uncertainties.

Finally, it is worth noting the difference in shape between the Type A and Type B approaches. The Type A approach has an elliptical shape, similar to the results from Cailliez et al. [8], while the Type B approach has more linear borders that arise from accounting for bias with a hard constraint (Equation 5 in Section IV B1 of the text).

FIG. S.2. Comparison of the UA, LJ 12-6, parameters for methane when regressed to different types of experimental data. Literature (“Lit.”) values are from Ref [14]. Optimal parameters for predicting Tc, ρc, and Pc were obtained using the DIPPR values and simulation results reported by Ref [6]. The confidence regions for the Type A and Type B analysis are obtained at the 95% confidence level.
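The grid evaluation in this subsection rests on the two model expressions of Section S.III (Equations S.1 and S.2). The sketch below evaluates both in reduced units and scans a 2-dimensional grid for the minimum RMS. Several pieces are assumptions made only for illustration: the series is truncated at 60 terms, the RMS is taken as the square root of the mean squared residual, the "data" are synthetic and noise-free, and the grid limits, units (ε/kB in K, σ in nm), and function names are hypothetical. The acceptance criteria (Equations 2 and 5 of the text) are not reproduced here.

```python
import math

N_TERMS = 60  # truncation of the infinite sum in Equation S.1 (assumed sufficient)
B_COEFFS = (0.3144, 0.1741, 1.2996, 0.5002, 0.3333)  # b0..b4 quoted in the text


def b2_reduced(t_star):
    """Equation S.1 divided by (2 pi sigma^3 / 3), with T* = kB T / eps."""
    total = 0.0
    for n in range(N_TERMS):
        coeff = 2.0 ** ((2 * n + 1) / 2) / (4.0 * math.factorial(n))
        total += coeff * math.gamma((2 * n - 1) / 4) * t_star ** (-(2 * n + 1) / 4)
    return -total


def rho_l_reduced(t_star):
    """Equation S.2; only defined for T* below b2."""
    b0, b1, b2, b3, b4 = B_COEFFS
    return b0 + b1 * (b2 - t_star) + b3 * (b2 - t_star) ** b4


def rho_l(temp_k, eps_k, sigma_nm):
    """Saturated liquid number density in molecules per nm^3."""
    return rho_l_reduced(temp_k / eps_k) / sigma_nm ** 3


def rms(eps_k, sigma_nm, data):
    """Root-mean-square deviation between model and data (assumed RMS form)."""
    squared = [(rho_l(t, eps_k, sigma_nm) - y) ** 2 for t, y in data]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic, noise-free "data" generated from hypothetical parameters.
TRUE_EPS, TRUE_SIGMA = 148.0, 0.373
data = [(t, rho_l(t, TRUE_EPS, TRUE_SIGMA)) for t in range(100, 170, 10)]

# A 2-dimensional (p = 2) grid in (eps, sigma); limits are illustrative.
eps_grid = [140.0 + 0.5 * i for i in range(33)]        # 140 ... 156 K
sigma_grid = [0.365 + 0.0005 * j for j in range(33)]   # 0.365 ... 0.381 nm

rms_grid = {(e, s): rms(e, s, data) for e in eps_grid for s in sigma_grid}
best_eps, best_sigma = min(rms_grid, key=rms_grid.get)
print(best_eps, round(best_sigma, 4))
print(b2_reduced(1.0), b2_reduced(3.418))  # negative at low T*, near zero at the Boyle point
```

Because the synthetic data contain no noise, the scan simply recovers the parameters used to generate them; with real data the RMS surface would instead feed the acceptance criteria of the text.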

B. MCS for CH4

We will demonstrate the Monte Carlo sampling (MCS) approach for the Type A analysis. The first step for MCS is to generate millions of random numbers. Each random number corresponds to a different set of εCH4 and σCH4. The assignment of a random number to a parameter set is determined by the PDF values for each parameter set. In Section S.III A we obtained the PDF for every set of εCH4 and σCH4 in our 2-dimensional grid. We are able to assign a large number of random numbers because we have an analytic expression to calculate ρl and B2 and, thereby, the PDF. We generate millions of random numbers since increasing the number of random numbers results in more accurate estimates of uncertainty. From this set of random numbers we obtain a properly weighted sample of millions of different εCH4 and σCH4 sets.

Figure S.3 contains histograms of the εCH4 and σCH4 values obtained from the MCS Type A analysis. Panels (a,c) and (b,d) correspond to εCH4 and σCH4, respectively. Panels (a)-(b) were obtained using ρl data, i.e. the MCS parameter sets are sampled from a PDF that depends on ρl, while Panels (c)-(d) utilized B2 data. Probability density is defined as the number of counts in a single bin divided by both the total number of counts and the bin width. Note that the MCS parameter sets appear to follow a normal distribution.

FIG. S.3. Histograms of εCH4 and σCH4 from MCS Type A analysis. Panels (a)-(b) used ρl data while Panels (c)-(d) utilized B2 data. Panels (a) and (c) correspond to εCH4 while Panels (b) and (d) correspond to σCH4. The normal distribution fits are also included. Probability density is defined as the number of counts in a single bin divided by both the total number of counts and the bin width.
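The assignment of random numbers to parameter sets via the PDF values can be sketched with an inverse-CDF lookup over the grid. This is one common way to implement such a step and is not necessarily the authors' implementation; the grid, the Gaussian-shaped weights, and the function name are hypothetical placeholders for the PDF obtained in Section S.III A.

```python
import bisect
import itertools
import math
import random


def sample_parameter_sets(grid, weights, n_samples, seed=0):
    """Assign uniform random numbers to grid points in proportion to their weights."""
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        u = rng.random()
        # First grid point whose cumulative probability exceeds u.
        idx = min(bisect.bisect_right(cdf, u), len(grid) - 1)
        draws.append(grid[idx])
    return draws


# Hypothetical 2-dimensional grid and unnormalized PDF values (not from the paper).
eps_axis = [147.0 + 0.05 * i for i in range(41)]          # K
sigma_axis = [0.372 + 0.00005 * j for j in range(41)]     # nm
grid = [(e, s) for e in eps_axis for s in sigma_axis]
weights = [
    math.exp(-0.5 * ((e - 148.0) / 0.3) ** 2 - 0.5 * ((s - 0.373) / 0.0003) ** 2)
    for e, s in grid
]

samples = sample_parameter_sets(grid, weights, n_samples=100000)
mean_eps = sum(e for e, _ in samples) / len(samples)
mean_sigma = sum(s for _, s in samples) / len(samples)
print(f"mean eps = {mean_eps:.3f} K, mean sigma = {mean_sigma:.5f} nm")
```

The standard-library call `random.choices(grid, weights=weights, k=n_samples)` performs an equivalent weighted draw; the explicit version is shown only to make the assignment step visible.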


C. PoE for CH4

This section demonstrates how the uncertainty in ε and σ propagates when predicting physical properties. Figure S.4 compares the Type A and Type B uncertainties with the experimental data included in the analysis and the DIPPR uncertainties. Panels (a) and (b) correspond to ρl and B2, respectively. Since it is known that the optimal B2 parameters are not capable of predicting ρl accurately and vice versa, we did not consider predicting ρl or B2 with the parameter sets obtained from the other data type.

FIG. S.4. Comparison of ρl and B2 experimental data, the DIPPR 801 correlation uncertainty, the Type A uncertainty, and the Type B uncertainty. Panels (a) and (b) correspond to ρl and B2, respectively. The Type A uncertainties represent the 95% confidence interval.

There are two key conclusions from Figure S.4. First, the Type A approach results in considerably smaller uncertainties than the Type B approach. When compared to the DIPPR uncertainties the size of the Type A uncertainties does not seem justified. Remember that this is primarily because the Type A analysis does not account for systematic errors (bias) in the data. Second, the Type B uncertainties agree very well with the DIPPR uncertainties. This is significant since only the extrema of the acceptable parameter sets (those that satisfy Equation 5 in Section IV B1 of the text) were used in obtaining the Type B uncertainties. Therefore, the results presented in Figure S.4 validate the assumption that the Type B uncertainties (for the properties included in the parameterization) can be adequately represented by sampling only at the extrema.

We will now focus our attention on Tc, ρc, and Pc since quantifying the uncertainty in the critical constants is our primary goal for the larger n-alkanes. For each set of εCH4 and σCH4 we evaluate the critical constant expressions. We are able to evaluate the critical constants at millions of parameter sets because no further simulations are required, as the critical constants are reported in Ref [6] in reduced units for the UA, LJ, single-site molecule. Specifically, Tc = ε Tc⋆/kB, ρc = ρc⋆/σ³, and Pc = ε Pc⋆/σ³, where Tc⋆ = 1.3128, ρc⋆ = 0.316, and Pc⋆ = 0.1274. In general, there is not a simple mathematical expression for propagating the uncertainty from the LJ parameters to the desired property. Typically, determining the desired property for a given set of LJ parameters requires between 1 and 20 molecular simulations. Such is the case for larger n-alkanes, as we demonstrated in the main text. Again, when molecular simulations are required the number of ε and σ parameter sets that can be evaluated is greatly reduced.

For the Type A analysis, we create a histogram of the different values of Tc, ρc, and Pc. These histograms can be integrated such that 5% of the area is equally divided between the left and right tails. This yields an estimate of the 95% confidence interval for Tc, ρc, and Pc due to the parameter uncertainty. The Type B analysis is much simpler because only the minimum and maximum acceptable values of ε, σ, and ε/σ³ are required to estimate the 95% confidence regions for Tc, ρc, and Pc, respectively. In Table S.I we compare the size of the 95% confidence intervals due to numerical, parameter, and overall uncertainties. We compare the parameter and overall uncertainties obtained from B2 and ρl data for both the Type A and Type B analysis.

TABLE S.I. Comparison of numerical, parameter, and overall uncertainties for the critical constants for UA, LJ 12-6, methane molecule. Uncertainties are presented as a relative combined expanded uncertainty (at the 95% confidence level) multiplied by 100%.

Uncertainty   Data   Analysis   Tc      ρc      Pc
Numerical     VLCC   N/A        0.122   1.269   1.020
Parameter     B2     Type A     0.264   0.860   0.229
Parameter     B2     Type B     2.40    9.71    12.10
Parameter     ρl     Type A     0.036   0.091   0.010
Parameter     ρl     Type B     0.231   0.947   0.799
Overall       B2     Type A     0.291   1.530   1.046
Overall       B2     Type B     2.52    10.98   13.10
Overall       ρl     Type A     0.127   1.269   1.020
Overall       ρl     Type B     0.353   2.213   1.819
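The conversion from reduced critical constants to Tc, ρc, and Pc, and the Type B shortcut of using only the extrema of ε, σ, and ε/σ³, can be sketched as follows. The reduced constants are the values quoted from Ref [6]; everything else is an assumption for illustration: the units (ε/kB in K, σ in nm), the methane molar mass, the function names, and the short list of "accepted" parameter sets, which is invented rather than taken from the analysis. Numerical uncertainties, which the Type B estimate also includes, are left out.

```python
KB = 1.380649e-23                 # Boltzmann constant, J/K
G_PER_MOL_NM3_TO_KG_M3 = 1.66053907  # (g/mol) per nm^3 expressed in kg/m^3

TC_STAR, RHOC_STAR, PC_STAR = 1.3128, 0.316, 0.1274  # reduced values from Ref [6]


def critical_constants(eps_k, sigma_nm, molar_mass=16.043):
    """Return Tc (K), rho_c (kg/m^3), and Pc (MPa) for eps/kB in K and sigma in nm."""
    tc = TC_STAR * eps_k
    rhoc = RHOC_STAR * molar_mass * G_PER_MOL_NM3_TO_KG_M3 / sigma_nm ** 3
    pc = PC_STAR * eps_k * KB / (sigma_nm * 1e-9) ** 3 / 1e6
    return tc, rhoc, pc


def type_b_bounds(accepted, molar_mass=16.043):
    """Min/max Tc, rho_c, Pc from the extrema of eps, sigma, and eps/sigma^3."""
    eps_vals = [e for e, _ in accepted]
    sig_vals = [s for _, s in accepted]
    ratios = [e / s ** 3 for e, s in accepted]
    tc = (TC_STAR * min(eps_vals), TC_STAR * max(eps_vals))
    rho_scale = RHOC_STAR * molar_mass * G_PER_MOL_NM3_TO_KG_M3
    rhoc = (rho_scale / max(sig_vals) ** 3, rho_scale / min(sig_vals) ** 3)
    p_scale = PC_STAR * KB / 1e-27 / 1e6  # K/nm^3 -> MPa
    pc = (p_scale * min(ratios), p_scale * max(ratios))
    return tc, rhoc, pc


# Hypothetical accepted parameter sets (eps/kB in K, sigma in nm).
accepted = [(147.0, 0.3735), (148.0, 0.3730), (149.0, 0.3725), (148.5, 0.3732)]

tc, rhoc, pc = critical_constants(148.0, 0.373)
print(f"Tc = {tc:.2f} K, rho_c = {rhoc:.1f} kg/m^3, Pc = {pc:.3f} MPa")
print(type_b_bounds(accepted))
```

Because Tc, ρc, and Pc are monotonic in ε, σ, and ε/σ³ respectively, the bounds from the extrema coincide with the minimum and maximum over every accepted set, which is why no histogram is needed for this estimate.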

The standard deviation from numerical uncertainty (sNU) was calculated as sNU = CI/1.96, where CI is the 95% confidence interval reported by Dinpajooh et al. [6]. In other words, we assumed that the numerical uncertainty follows a normal distribution (as we observed in Figure S.1). We also discovered that the histograms obtained from MCS PoE for the critical constants follow a normal distribution, similar to the results presented in Figure S.3. We obtained an estimate for the standard deviation caused by the parameter uncertainty (sPU) for each critical constant by fitting a normal distribution model to the histograms produced from the MCS PoE results. One advantage of obtaining an estimate of sNU and sPU is that an estimate of the overall uncertainty (sOU) can be obtained from the traditional propagation of error expression

$$s_{OU}^2 = s_{NU}^2 + s_{PU}^2 \tag{S.3}$$
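As a worked check of Equation S.3, the sketch below combines the numerical and Type A parameter uncertainties for Tc (B2 data) from Table S.I. The function names are hypothetical; the inputs are the tabulated relative expanded uncertainties, which are converted to standard deviations with the factor 1.96 and back again.

```python
import math


def s_from_ci(ci, z=1.96):
    """Standard deviation implied by a 95% confidence interval, s = CI / 1.96."""
    return ci / z


def overall_s(s_nu, s_pu):
    """Equation S.3: independent contributions add in quadrature."""
    return math.sqrt(s_nu ** 2 + s_pu ** 2)


# Table S.I, Tc column: numerical 0.122 %, Type A parameter (B2 data) 0.264 %.
s_nu = s_from_ci(0.122)
s_pu = s_from_ci(0.264)
u_overall = 1.96 * overall_s(s_nu, s_pu)
print(round(u_overall, 3))  # 0.291, matching the Overall / B2 / Type A entry for Tc
```

The quadrature rule applies to the Type A rows; the Type B overall entries in Table S.I are instead consistent with adding the numerical uncertainty directly to the parameter bounds (for example 2.40 + 0.122 ≈ 2.52 for Tc).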

Equation S.3 is a rigorous approach for estimating the overall uncertainty in the critical constants because the numerical simulations and the parameterization were performed independently. In Figure S.5 we provide a graphical comparison between the numerical, parameter, and overall uncertainties for Tc, ρc, and Pc for the UA, LJ 12-6, CH4 molecule. We have included the results for both the B2 and ρl parameterization methods explained previously. Panels (a,d), (b,e), and (c,f) contain the uncertainties for Tc, ρc, and Pc, respectively. For clarity, we have only included the Type A analysis results in Figure S.5 Panels (a)-(c). The Type A and Type B overall uncertainties are compared in Panels (d)-(f).

There are several important conclusions from the results presented in Table S.I and Figure S.5. In Table S.I and Figure S.5 Panels (a)-(c) we see that the parameter uncertainty is much greater when B2 data are used in the objective function rather than ρl data (in agreement with Figure S.2). For the ρl optimization, the numerical uncertainty is the dominant term for Tc, ρc, and Pc, as the overall uncertainty is almost identical to the numerical uncertainty. By contrast, for the B2 parameterization, both the parameter and numerical uncertainty contribute significantly to the overall uncertainties in Tc, ρc, and Pc.

Also, notice that the ρl parameterization approach yields values for Tc and ρc that agree more closely with the experimental value. By contrast, the B2 optimization results for Pc are in better agreement with the experimental value. This is primarily due to the fact that Tc and ρc are predicted from the law of rectilinear diameters and the density scaling law, both of which depend strongly on ρl. On the other hand, Pc is obtained from vapor pressure data (Ref [6] utilized the traditional approach with the Antoine equation), which is more closely related to a vapor phase property, such as B2.

However, Panels (d)-(f) demonstrate that only the B2 Type B uncertainties are large enough to overlap with the experimental uncertainties for all three critical constants. By contrast, the Type A and Type B uncertainties for ρl are approximately the same size and do not overlap with the Tc and Pc experimental uncertainties. As we saw in Section VI of the text, the ρl Type B uncertainties for longer n-alkanes are large enough to resolve the discrepancy with the experimental values for Tc and ρc (and Pc if the Vetere approach is utilized).

FIG. S.5. Comparison of the uncertainties for the critical constants for the UA, LJ 12-6, CH4 molecule. Panels (a,d), (b,e), and (c,f) contain the uncertainties for Tc, ρc, and Pc, respectively. The numerical uncertainties were obtained from Dinpajooh et al. while the parameter uncertainties were obtained using the Type A analysis discussed in Section S.III C for the two different types of experimental data. Experimental uncertainties are those found in the DIPPR 801 database (assumed to be reported at the 95% confidence level for a normal distribution).

S.IV. TYPE A CH3 AND CH2

A. Monte Carlo Sampling Parameter Sets

Figure S.6 shows the UQ and MCS results for the εCH3-σCH3 and εCH2-σCH2 parameter sets. Panels (a)-(b) and (c)-(d) contain the CH2 and CH3 parameters, respectively. Panels (a) and (c) were obtained with a Type A analysis while Panels (b) and (d) represent a Type AB analysis. Note the difference in the range, as the uncertainty region is much larger for the Type AB analysis. Included in Figure S.6 are the 100 MCS parameter sets for each compound included in the PoE procedure.

FIG. S.6. Contours of the εCH2-σCH2 (Panels (a) and (b)) and εCH3-σCH3 (Panels (c) and (d)) parameters. Panels (a) and (c) represent a Type A uncertainty analysis while Panels (b) and (d) represent a Type AB uncertainty analysis. The different points represent parameter sets that were simulated for the corresponding compound. The CH3 parameters for n-octane (C8) were the optimal parameter set (εCH3 = 98.4966 and σCH3 = 3.7491).

The main conclusion from Figure S.6 is that the CH3 and CH2 MCS parameter sets are an accurate representation of the PDF values for both the Type A and Type AB analysis. Specifically, the number of parameter sets agrees reasonably well with the confidence level for a given contour. Furthermore, the spatial distribution does not suggest an improper bias towards a specific region of parameter space. To facilitate future use, Table S.II provides 100 MCS parameter sets for εCH3, σCH3, εCH2, and σCH2.

TABLE S.II. MCS parameter sets for εCH3, σCH3, εCH2, and σCH2 using Type A and AB analysis methods.

Type A                                    Type AB
εCH3 (K)  σCH3 (Å)  εCH2 (K)  σCH2 (Å)    εCH3 (K)  σCH3 (Å)  εCH2 (K)  σCH2 (Å)
98.54     3.749     45.35     3.973       98.50     3.748     45.69     3.966
98.52     3.749     45.36     3.971       98.21     3.747     45.44     3.976
98.55     3.749     45.36     3.973       98.07     3.740     45.46     3.956
98.53     3.750     45.32     3.974       98.28     3.750     45.33     3.973
98.51     3.749     45.35     3.974       98.55     3.748     45.49     3.961
98.43     3.748     45.38     3.972       99.31     3.758     45.53     3.977
98.47     3.749     45.37     3.972       98.03     3.746     45.40     3.968
98.45     3.748     45.36     3.974       98.85     3.754     45.46     3.969
98.47     3.749     45.40     3.972       98.07     3.747     45.39     3.965
98.53     3.750     45.37     3.975       98.45     3.749     45.40     3.976
98.51     3.750     45.36     3.978       97.84     3.740     45.37     3.980
98.52     3.750     45.36     3.975       98.25     3.748     45.41     3.963
98.54     3.750     45.38     3.972       98.08     3.747     45.34     3.975
98.51     3.749     45.37     3.973       98.58     3.748     45.34     3.990
98.47     3.748     45.36     3.975       98.53     3.746     45.36     3.978
98.49     3.749     45.38     3.977       98.87     3.753     45.48     3.974
98.46     3.749     45.34     3.974       98.33     3.746     45.17     3.976
98.42     3.748     45.40     3.971       98.02     3.742     45.54     3.961
98.46     3.749     45.37     3.971       98.44     3.749     45.44     3.970
98.45     3.749     45.39     3.970       98.27     3.743     45.44     3.973
98.50     3.750     45.42     3.969       98.97     3.757     45.38     3.976
98.49     3.749     45.38     3.974       98.47     3.747     45.37     3.972
98.53     3.750     45.34     3.972       98.60     3.754     45.56     3.958
98.39     3.748     45.38     3.977       98.19     3.744     45.40     3.968
98.46     3.748     45.37     3.975       98.69     3.753     45.47     3.973
98.43     3.748     45.36     3.973       98.25     3.745     45.43     3.972
98.51     3.750     45.35     3.973       98.54     3.752     45.35     3.966
98.49     3.749     45.35     3.975       98.54     3.744     45.03     3.988
98.45     3.748     45.42     3.972       98.59     3.746     45.42     3.976
98.53     3.750     45.38     3.972       99.04     3.760     45.48     3.965
98.56     3.750     45.37     3.972       98.50     3.747     45.39     3.970
98.47     3.749     45.39     3.973       98.55     3.744     45.48     3.964
98.51     3.749     45.40     3.973       98.77     3.758     45.25     3.986
98.51     3.749     45.37     3.969       98.07     3.744     45.58     3.971
98.50     3.749     45.39     3.968       98.76     3.751     45.30     3.982
98.54     3.749     45.38     3.970       98.67     3.750     45.37     3.972
98.53     3.749     45.41     3.972       98.31     3.746     45.40     3.973
98.49     3.749     45.40     3.973       98.00     3.742     45.56     3.973
98.51     3.749     45.38     3.975       98.36     3.745     45.55     3.965
98.59     3.751     45.34     3.970       98.46     3.749     45.40     3.976
98.50     3.749     45.36     3.975       98.36     3.749     45.50     3.946
98.48     3.748     45.35     3.974       97.62     3.737     45.55     3.956
98.38     3.748     45.35     3.972       99.08     3.751     45.15     3.987
98.48     3.749     45.35     3.975       98.72     3.747     45.33     3.979
98.38     3.747     45.39     3.968       98.08     3.743     45.15     3.981
98.50     3.749     45.36     3.974       97.88     3.739     45.48     3.956
98.49     3.749     45.35     3.978       98.81     3.751     45.38     3.961
98.55     3.750     45.39     3.973       98.91     3.749     45.33     3.989
98.52     3.750     45.38     3.971       97.86     3.740     45.31     3.973
98.49     3.749     45.44     3.970       98.49     3.749     45.43     3.965
98.50     3.750     45.33     3.975       98.00     3.744     45.38     3.971
98.58     3.751     45.36     3.977       98.54     3.749     45.43     3.975
98.54     3.749     45.36     3.974       98.62     3.752     45.19     3.978
98.55     3.750     45.40     3.971       98.46     3.747     45.36     3.975
98.46     3.749     45.40     3.972       98.52     3.748     45.50     3.948
98.51     3.749     45.36     3.970       98.67     3.753     45.33     3.964
98.42     3.748     45.40     3.972       98.03     3.741     45.26     3.984
98.41     3.748     45.37     3.972       99.04     3.752     45.46     3.979
98.41     3.748     45.39     3.973       97.68     3.739     45.52     3.955
98.54     3.750     45.34     3.977       98.04     3.740     45.35     3.986
98.53     3.750     45.36     3.972       99.04     3.758     45.40     3.971
98.57     3.751     45.37     3.974       98.49     3.752     45.39     3.978
98.53     3.750     45.39     3.975       98.53     3.752     45.29     3.980
98.50     3.749     45.37     3.971       98.44     3.745     45.20     3.970
98.43     3.748     45.39     3.969       98.60     3.752     45.14     3.979
98.46     3.749     45.43     3.971       98.43     3.751     45.25     3.983
98.47     3.749     45.35     3.973       98.33     3.752     45.54     3.960
98.51     3.749     45.35     3.975       98.81     3.754     45.31     3.973
98.55     3.751     45.36     3.974       98.22     3.745     45.35     3.982
98.40     3.748     45.34     3.975       98.57     3.748     45.35     3.967
98.45     3.749     45.40     3.969       98.21     3.746     45.49     3.959
98.55     3.750     45.39     3.975       98.84     3.751     45.31     3.990
98.50     3.749     45.48     3.977       98.41     3.750     45.51     3.961
98.52     3.750     45.38     3.970       98.24     3.743     45.42     3.962
98.56     3.750     45.35     3.976       98.74     3.752     45.24     3.983
98.42     3.748     45.42     3.969       98.04     3.739     45.33     3.971
98.53     3.749     45.36     3.972       97.87     3.740     45.45     3.969
98.45     3.748     45.37     3.968       98.36     3.751     45.54     3.953
98.50     3.749     45.44     3.971       98.47     3.750     45.25     3.974
98.45     3.748     45.37     3.974       98.61     3.752     45.42     3.955
98.54     3.749     45.38     3.972       98.46     3.750     45.40     3.958
98.54     3.750     45.38     3.975       98.11     3.745     45.25     3.975
98.45     3.749     45.33     3.975       97.83     3.735     45.33     3.975
98.54     3.750     45.37     3.971       98.54     3.752     45.68     3.957
98.43     3.749     45.43     3.972       98.76     3.753     45.31     3.980
98.46     3.749     45.38     3.972       98.41     3.750     45.42     3.966
98.49     3.749     45.36     3.972       98.60     3.750     45.40     3.971
98.47     3.749     45.38     3.972       98.70     3.754     45.43     3.966
98.54     3.750     45.35     3.974       98.77     3.755     45.51     3.973
98.45     3.749     45.42     3.968       98.56     3.742     45.47     3.967
98.52     3.749     45.36     3.975       98.78     3.753     45.35     3.968
98.43     3.748     45.36     3.975       98.85     3.756     45.40     3.964
98.53     3.749     45.41     3.969       98.35     3.746     45.44     3.982
98.46     3.748     45.35     3.975       98.49     3.746     45.47     3.975
98.49     3.749     45.37     3.974       99.05     3.759     45.41     3.979
98.54     3.750     45.39     3.975       98.31     3.737     45.21     3.979
98.52     3.749     45.38     3.972       98.64     3.753     45.29     3.959
98.45     3.748     45.39     3.972       98.85     3.758     45.33     3.972
98.53     3.749     45.39     3.972       98.13     3.740     45.34     3.971
98.50     3.749     45.38     3.969       98.90     3.752     45.25     3.966

B. Propagation of Errors for Ethane

Section S.III demonstrated the PoE approach for methane. The key difference between the PoE steps for ethane and methane is that analytical expressions exist to relate Tc, ρc, and Pc to ε and σ for methane. By contrast, molecular simulations were performed to predict the critical constants for ethane. Specifically, we performed 10 replicate simulations at 12 temperatures between 185 and 275 K using the 2000 randomly sampled parameter sets for ε and σ.

Figure S.7 presents the Type A uncertainties in Tc for ethane. Panel (a) contains a histogram of the Tc estimates obtained from the MCS parameter sets. It is important to mention that these estimates have numerical uncertainty since they were obtained by regressing simulation results to the law of rectilinear diameters and the density scaling law. Panel (b) compares the numerical, parameter, and overall uncertainties. The numerical uncertainty represents the average numerical uncertainty from regressing the simulation results. The Type A parameter uncertainty is obtained by predicting Tc with the surrogate model for millions of MCS parameter sets. We present two different approximations for the overall uncertainty. The MCS overall uncertainty is obtained by fitting a normal distribution to the histogram presented in Panel (a) for 2000 MCS parameter sets. The other overall uncertainty is obtained using Equation S.3 and the numerical and parameter uncertainties.

FIG. S.7. Comparison of the uncertainties in predicting Tc for ethane. Panel (a) compares the histograms obtained using 100 and 2000 MCS parameter sets. The normal distribution fits are also included. Probability density is defined as the number of counts in a single bin divided by both the total number of counts and the bin width. Panel (b) depicts the Tc probability density functions for the numerical, parameter, and overall uncertainties. The numerical uncertainty represents the average numerical uncertainty obtained from the simulations. The surrogate model is used to estimate the parameter uncertainties. The overall uncertainties are obtained in two different ways, as explained in the text.

In Figure S.7 we see that the surrogate model has a slight bias in the estimate for Tc . However, this is of lesser importance for our purposes since we are primarily concerned with the uncertainty size. The surrogate model results provide an estimate for the parameter uncertainty without any numerical uncertainty. Notice that the overall uncertainty obtained from the MCS PoE approach is very similar to that obtained from combining the numerical and parameter uncertainties with Equation S.3. This demonstrates that the MCS PoE approach properly accounts for both numerical and parameter uncertainties. Another important observation from Figure S.7 Panel (a) is that our results with 2000 samples were not S.14

significantly different from our results with only 100 samples. In other words, the histogram and the normal distribution fit were nearly identical. In Figure S.8 Panels (a)-(c) we see that this is also true for ρc , Pc , and Zc , respectively. For this reason, we only sampled 100 parameter sets for the other systems studied.

FIG. S.8. Comparison of the histograms obtained using 100 and 2000 MCS parameter sets for ethane. Panels (a)-(c) correspond to ρc , Pc , and Zc , respectively. The normal distribution fits are also included. The probabilities are defined as the number of counts in a single bin divided by the total number of counts.

As ρc , Pc , and Zc are not used in the force field parameterization, our surrogate model was not devised to predict these properties accurately. Therefore, we do not attempt to elucidate the contributions from numerical and parameter uncertainties. However, since the numerical uncertainties in ρc , Pc , and Zc are much larger than Tc , it seems reasonable that the parameter uncertainty is more significant for Tc than for ρc , Pc and Zc .

S.V.

PROPAGATION OF ERRORS HISTOGRAMS

In this section we present the histograms that were used for determining the uncertainties in ρl, Tc, ρc, Pc, and Zc in Section VI of the text. Recall that these histograms were obtained by simulating 100 MCS parameter sets. The bin count is divided by the total number of counts (100, i.e. the number of MCS parameter sets) to provide a bin probability. The normal distribution fits are only included for visual purposes. To clarify, the vertical axis is not applicable to the normal distribution fits since these should be probability densities (which have a different scale and inverse units).

Figures S.9-S.14 contain the Type A and AB ρl histograms for ethane, n-octane, C16, C24, C36, and C48, respectively. Figures S.15-S.20 provide the Type A and AB histograms for Tc, ρc, Pc, and Zc of ethane, n-octane, C16, C24, C36, and C48, respectively. Figures S.21-S.26 depict the Type B results as probability densities representing the numerical uncertainties in Tc, ρc, Pc, and Zc of ethane, n-octane, C16, C24, C36, and C48, respectively, for each of the extrema parameter sets listed in Table III in Section VI B of the text. In Figure S.27 we validate that the uncertainty in the CH3 parameters has a negligible impact on the results for C16. Finally, in Figure S.28 we present the extrema parameter sets as a reference. We have used the same color scheme in this figure as that found in Figures S.21-S.27 to facilitate comparing the results for the different extrema parameter sets.

In Figure S.9 we see that the Type AB uncertainties in ρl are always larger than those from the Type A analysis. However, the difference appears to be largest at lower temperatures. This is because numerical uncertainty is more significant at higher temperatures. Therefore, since these histograms include both parameter and numerical uncertainties, the Type A and AB results become similar at high temperatures. By contrast, at low temperatures the difference between the Type A and AB histograms is almost entirely due to the parameter uncertainties since numerical uncertainty is negligible. This phenomenon is more readily observed in Figure S.10. Notice that in Panel (d) the Type A and AB histograms are nearly identical at 515 K while in Panel (a) they are much different at 390 K. This is because we have used fewer particles for n-octane than for ethane, so the numerical uncertainty is even more significant. We do not observe this behavior in Figures S.11-S.14 because the highest temperature simulated was at a reduced temperature of 0.85, which is not close enough to the critical point to result in large numerical uncertainties.

In Figures S.15-S.20, notice that for Tc and ρc the Type A and Type AB regions have a much different range. For this reason, we have used different bin sizes to cover the whole region with a feasible number of bins (between 20 and 25). Therefore, when comparing probabilities you must consider that a single bin with the Type AB analysis encompasses multiple bins for the Type A analysis. The situation is different for Pc and Zc, where the ranges are similar. For these properties we have used the same bin sizes. It is interesting that the results for these properties are nearly independent of the uncertainty approach. In some cases, due to the relatively small sample size, the Pc or Zc Type AB uncertainty is actually smaller than the Type A uncertainty. Most likely this is because the numerical uncertainty is the primary contributor and, therefore, the overall uncertainties will be practically identical. In Figures S.22-S.26 we see that indeed the numerical uncertainty can be quite large for ρc, Pc, and Zc of larger n-alkanes.

Figures S.21-S.26 are provided to demonstrate how the Type B uncertainties were determined for ethane, n-octane, C16, C24, C36, and C48, respectively. The distributions in these figures represent the numerical uncertainty for each of the eight extrema parameter sets simulated (see Section S.I). Panels (a)-(d) contain the uncertainties in Tc, ρc, Pc, and Zc, respectively. As seen in Panel (c), the numerical uncertainty in Pc is large enough that there is considerable overlap between the different extrema. In addition, in Panel (d) the numerical uncertainty in Zc is larger than the Type B parameter uncertainty. Some useful insight is obtained by comparing Figures S.21-S.26 with Figure S.28. For example, notice that P1, P2, and P5 (the left side of the acceptance region in Figure S.28 Panel (a)) correspond to the lower Tc values in Figure S.21 Panel (a). By contrast, P3, P4, and P6 (the right side of the acceptance region in Figure S.28 Panel (a)) correspond to the higher Tc values in Figure S.21 Panel (a). Similar observations are found for the other critical constants for both the CH3 and CH2 parameter extrema.
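The caution about comparing bin probabilities when the Type A and Type AB histograms use different bin widths can be illustrated with a short sketch. It assumes, purely for illustration, that each wide bin is subdivided into exactly four narrow bins; the sample values are synthetic and the function names are hypothetical.

```python
import numpy as np

def bin_probabilities(samples, edges):
    """Bin count divided by the total number of samples."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / len(samples)

def coarsen(prob_fine, factor):
    """Sum each group of `factor` adjacent fine-bin probabilities."""
    return prob_fine.reshape(-1, factor).sum(axis=1)

rng = np.random.default_rng(1)
# Synthetic stand-ins: a narrow (Type A like) and a wide (Type AB like) sample.
type_a = rng.normal(0.0, 0.5, size=100)
type_ab = rng.normal(0.0, 2.0, size=100)

fine_edges = np.linspace(-10.0, 10.0, 81)  # 80 narrow bins
coarse_edges = fine_edges[::4]             # 20 wide bins, 4 narrow bins each

p_a_fine = bin_probabilities(type_a, fine_edges)
p_a_coarse = coarsen(p_a_fine, 4)          # Type A on the wide-bin footing
p_ab_coarse = bin_probabilities(type_ab, coarse_edges)

print("largest Type A probability, narrow bins:", p_a_fine.max())
print("largest Type A probability, wide bins:  ", p_a_coarse.max())
print("largest Type AB probability, wide bins: ", p_ab_coarse.max())
```

Only after the narrow-bin probabilities are summed onto the wide bins are the two sets of bin probabilities directly comparable.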

FIG. S.9. Comparison of the ρl histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for ethane. Panels (a)-(f) correspond to different temperatures. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.10. Comparison of the ρl histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for n-octane. Panels (a)-(e) correspond to 390, 440, 490, 515, and 540 K, respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.11. Comparison of the ρl histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C16 . Panels (a)-(b) correspond to 510 and 625 K, respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.12. Comparison of the ρl histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C24 . Panels (a)-(b) correspond to 600 and 700 K, respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.13. Comparison of the ρl histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C36 . Panels (a)-(b) correspond to 650 and 775 K, respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.14. Comparison of the ρl histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C48 . Panels (a)-(b) correspond to 730 and 830 K, respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.15. Comparison of the histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for ethane. Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.16. Comparison of the histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for n-octane. Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.17. Comparison of the histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C16 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.18. Comparison of the histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C24 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.19. Comparison of the histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C36 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.20. Comparison of the histograms obtained using 100 MCS parameter sets from a Type A and AB analysis for C48 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distribution fits are only included for visual purposes. The bin count is divided by the total number of counts (100) to provide a bin probability.

FIG. S.21. Comparison of the eight extrema parameter sets from a Type B analysis of ethane. Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.22. Comparison of the eight extrema parameter sets from a Type B analysis of n-octane. Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.23. Comparison of the eight extrema parameter sets from a Type B analysis of C16 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.24. Comparison of the eight extrema parameter sets from a Type B analysis of C24 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.25. Comparison of the eight extrema parameter sets from a Type B analysis of C36 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.26. Comparison of the eight extrema parameter sets from a Type B analysis of C48 . Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.27. Comparison of the eight CH3 extrema parameter sets from a Type B analysis of C16 . The CH2 parameters are the optimal set reported in Table II in Section VI of the text. Panels (a)-(d) correspond to Tc , ρc , Pc , and Zc , respectively. The normal distributions represent the numerical uncertainties for a given parameter set.

FIG. S.28. Extrema parameter sets used in Type B analysis when both ρl and Tc uncertainties are considered. Panels (a) and (b) correspond to the CH3 and CH2 LJ parameters, respectively.

S.VI.

SURROGATE MODEL

Although some analytic models have been developed for the two-site LJ 12-6 model [13, 16], no exact method exists to predict ρ̂l(Ti; ε, σ) for ethane and n-octane. Instead, we utilized GEMC simulations to estimate ρ̂l(Ti; ε, σ). This presents a challenge that is unique to molecular simulation parameterization. Specifically, in most optimization problems the evaluation of ŷ (and RMS, PDF, etc.) is both cheap and exact. By contrast, molecular simulations are expensive and subject to random error. In order to overcome both of these obstacles, we implement a surrogate model in this study to predict ρ̂l(Ti; ε, σ) and, thereby, evaluate RMS(θ) and PDF(θ). There are two key advantages of using a surrogate model. First, a surrogate model effectively eliminates the numerical uncertainty, although it may introduce some bias. Second, a surrogate model can be used in a propagation of errors analysis to sample the parameter space. In this section, we discuss the development and validation of the surrogate model used in this study.

A.

Development

There are at least three different types of surrogate models used in the simulation literature. The first type is a model that can predict a physical property for a given set of force field parameters [16]. This type of surrogate model is useful because it can reduce the need for performing molecular simulations. For example, the PC-SAFT model utilizes molecular simulations in conjunction with an equation of state to optimize the intermolecular potential parameters [17]. The second type of surrogate model simply provides a means of interpolating the RMS for the parameter space (in this case, ε and σ) [5]. With this type of model it is no longer necessary to evaluate RMS at every possible ε and σ. This is beneficial because evaluating RMS requires performing several simulations for a given set of ε and σ. In this work we utilize a third type of surrogate model referred to by Hulsmann et al. as “Lipra” (linear property approximation) [5]. Rather than interpolate the RMS itself, the Lipra approach interpolates/smooths the ρl values at a given temperature with respect to ε and σ. By smoothing the ρl results as a function of ε and σ we eliminate any outliers and obtain a more continuous contour for ρl. This produces a more internally consistent set of ρl values and thereby reduces the effect of numerical uncertainty in the GEMC results.

The second aspect of our surrogate model is designed to further reduce the numerical uncertainty and to allow for estimating ρl at any temperature. This is done by regressing the smoothed ρl values for every set of ε and σ to the following equation

$$\rho_l = \rho_0 + A(T_0 - T) + B(T_0 - T)^{\beta} \tag{S.4}$$

where ρ0, T0, and β are simply considered fitting parameters. In this case, Equation S.4 is merely a way of interpolating the simulation results for ρl and, thus, ρ0 and T0 are not given any physical interpretation. By using Equation S.4 we are able to evaluate ρl at any temperature (and, thereby, RMS and PDF for any set of ε and σ). In summary, the key steps in the development of our surrogate model are:

1. Create a grid of ε and σ values.
2. Obtain ρl at the same temperatures reported by Martin et al. using GEMC simulations for every set of ε and σ values chosen in Step 1.
3. Perform a double interpolation to obtain ρl for any possible set of ε and σ but strictly at the temperatures simulated in Step 2.
4. Regress the smoothed ρl values from Step 3 to Equation S.4.

In Figure S.29 we see the improvement in the RMS contours obtained by using the surrogate model for ethane and n-octane. Panels (a) and (c) are used with the Type AB and B analysis methods while Panels (b) and (d) are utilized with the Type A approach. Notice the difference in the axis ranges in Panels (a)-(b) and (c)-(d) due to the order of magnitude difference in the parameter space required for a Type A analysis compared to the Type AB and B analysis. In each case the ε by σ grid used in simulation was 10 by 14, where the values were evenly spaced between their respective minimum and maximum values as depicted. It is significant that Panels (a) and (c) were obtained by performing a single simulation at each temperature for a given parameter set. By contrast, we performed 12 and 9 replicate simulations at each temperature to generate the results in Panels (b) and (d), respectively. A large number of replicates were used in an attempt to mitigate the numerical uncertainty that becomes significant when generating contours in such a narrow region of parameter space close to the minimum.
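The smoothing idea behind Steps 1-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the interpolation scheme used in this work: the grid ranges, the synthetic "simulated" ρl values, the noise level, and the choice of a least-squares plane as the smoother are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: a 10 by 14 grid of epsilon (K) and sigma (Angstrom).
# The ranges are illustrative assumptions.
eps = np.linspace(90.0, 110.0, 10)
sig = np.linspace(3.60, 3.90, 14)
E, S = np.meshgrid(eps, sig, indexing="ij")

# Step 2 stand-in: synthetic rho_l (kg/L) at one temperature with random
# noise mimicking numerical uncertainty (NOT GEMC results).
true_rho = 0.50 + 0.004 * (E - 100.0) - 0.40 * (S - 3.75)
noisy_rho = true_rho + rng.normal(0.0, 0.002, size=E.shape)

# Step 3 stand-in: smooth with a least-squares plane
# rho_l = c0 + c1 * epsilon + c2 * sigma.
X = np.column_stack([np.ones(E.size), E.ravel(), S.ravel()])
coef, *_ = np.linalg.lstsq(X, noisy_rho.ravel(), rcond=None)

def rho_surrogate(epsilon, sigma):
    """Smoothed rho_l at any (epsilon, sigma) for this one temperature."""
    return coef[0] + coef[1] * epsilon + coef[2] * sigma

def rms(a, b):
    """Root-mean-square deviation between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

rms_noisy = rms(noisy_rho, true_rho)
rms_smooth = rms(rho_surrogate(E, S), true_rho)
print("RMS error before smoothing:", rms_noisy)
print("RMS error after smoothing: ", rms_smooth)
```

Because 140 noisy grid points constrain only three coefficients, the smoothed surface is far less affected by the random error of any single grid point, which is the sense in which smoothing reduces the effect of numerical uncertainty.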

FIG. S.29. Contours of RMS with respect to ε and σ. Panels (a)-(b) correspond to the CH3 contours for ethane while Panels (c)-(d) correspond to the CH2 contours for n-octane. The green and black lines represent the contours for the simulation results and the surrogate model, respectively. Panels (a) and (c) span the much wider parameter space required for a Type AB and B analysis. Panels (b) and (d) span the parameter space required for a Type A analysis.

B.

Validation

The aforementioned methodology is admittedly risky due to multiple layers of smoothing and correlating simulation results. For this reason, we believe a validation of our surrogate model is essential. We have implemented several different methods to accomplish this task. First, we compare the contours obtained from the raw simulation data with those from the surrogate model produced by the algorithm described in Section S.VI A. In Figures S.30-S.33 we include each of the temperatures simulated for developing the surrogate model for ethane and n-octane (i.e. the same temperatures as those used by Martin et al. [15]). Figures S.30-S.31 represent ethane ρl while Figures S.32-S.33 are for n-octane ρl. Figures S.31 and S.33 use a very refined grid near the optimum for the Type A analysis while Figures S.30 and S.32 scan the much wider range of ε and σ required for the Type AB and B analysis methods. The main purpose of these figures is to show that all of the anomalies appear to have been eliminated. We reiterate that Figures S.30 and S.32 were obtained by performing a single simulation at each temperature while Figures S.31 and S.33 used 12 and 9 replicate simulations, respectively. The similarity between the surrogate models obtained with the coarse and refined grids suggests that the significant computational cost of performing numerous replicate simulations was not merited for developing our surrogate model.

FIG. S.30. Contours of ρl for ethane with respect to εCH3 and σCH3 (coarse grid). The green and black lines represent the contours for the simulation results and the surrogate model, respectively. Panels (a)-(f) correspond to 178, 197, 217, 236, 256, and 275 K, respectively.

We believe that Figures S.30-S.33 are sufficient justification that our surrogate model is a reliable representation of the simulation results. However, we will briefly discuss some of the other methods we employed for validating our model. The second approach, as discussed in Section VI of the text, is to verify that the RMS from our surrogate model for the optimal TraPPE parameters is very close to the RMS reported by Martin et al. [15]. Third, we verified that the 95% joint confidence region predicted by the surrogate model accurately represents that obtained by the simulation data. In other words, we used Equation 2 in Section IV A1 of the text to analyze the parameter sets that were simulated and determine if they would be accepted at the 95% confidence level. Then, we compared these acceptable parameter sets with those that the surrogate model predicted would be acceptable. We observed that the same parameter sets were acceptable for both methods. Finally, we compared the propagation of errors results for both the surrogate model and the simulation results. Specifically, we evaluated the uncertainties in ρl and Tc. We observed that the uncertainty in ρl obtained from the simulations agreed with the predicted uncertainty. For example, notice in Figures 4-5 in Section VI C1 of the text that the Type B results follow the DIPPR uncertainties. Therefore, the surrogate model accurately determines the extrema parameter sets. Furthermore, recall that in Figure S.7 the overall Tc uncertainty obtained from propagating the numerical and parameter (obtained from the surrogate model) uncertainties was very similar to that from the actual simulations (using MCS).

FIG. S.31. Contours of ρl for ethane with respect to εCH3 and σCH3 (refined grid). The green and black lines represent the contours for the simulation results and the surrogate model, respectively. Panels (a)-(f) correspond to 178, 197, 217, 236, 256, and 275 K, respectively. Notice the difference in axis ranges compared with Figure S.30.

Apart from validating our surrogate model, a great deal of molecular insight can be teased out from Figures S.30-S.33. Specifically, notice the contour directionality, i.e. the dependence of σ upon ε for a constant ρl. The trend for ethane is uniform for all temperatures. That is, an increase in ε necessitates an increase in σ to maintain constant ρl. This makes sense at a molecular level because increasing the size of the UA sites (σ) is required to maintain a constant density if the attraction is increased (caused by increasing ε). However, notice that for n-octane (used to obtain the CH2 parameters), the correlation between ε and σ is temperature dependent. To be specific, at low temperatures we observe the same trend as for ethane while at high temperatures the trend is exactly opposite. At high temperatures an increase in attraction (ε) requires a decrease in size (σ) to maintain a constant density. In addition, due to this change in correlation, at an intermediate temperature (440 K) there is no correlation between ε and σ. Surprisingly, at this temperature the liquid density is insensitive to σ! In other words, ρl is solely dependent upon ε. The lack of dependence upon σ at intermediate temperatures and the inverse relationship between ε and σ at higher temperatures are completely counter-intuitive results. We believe that in order to understand these conclusions it is important to remember that these are saturated ρl contours. In other words, the dependence of vapor chemical potential upon ε and σ needs to be considered. The only explanation that we can provide is that the importance of the vapor chemical potential dependence upon ε and σ increases with temperature for larger molecules.
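The contour directionality discussed above can be stated compactly. As a sketch (the notation is introduced here for illustration and is not taken from the original), the total differential of the saturated liquid density vanishes along a contour of constant ρl at a fixed temperature, which fixes the contour slope in terms of the two sensitivities:

```latex
d\rho_l = \left(\frac{\partial \rho_l}{\partial \epsilon}\right)_{\sigma} d\epsilon
        + \left(\frac{\partial \rho_l}{\partial \sigma}\right)_{\epsilon} d\sigma = 0
\quad\Longrightarrow\quad
\left(\frac{d\sigma}{d\epsilon}\right)_{\rho_l}
  = -\,\frac{\left(\partial \rho_l / \partial \epsilon\right)_{\sigma}}
            {\left(\partial \rho_l / \partial \sigma\right)_{\epsilon}}
```

A positive contour slope (ethane at all temperatures and n-octane at low temperatures) therefore means that the two partial derivatives have opposite signs, while a negative slope (n-octane at high temperatures) means that they share the same sign. The 440 K n-octane case, where ρl is insensitive to σ, corresponds to the denominator passing through zero, so that the contours become lines of constant ε.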

FIG. S.32. Contours of ρl for n-octane with respect to εCH2 and σCH2 (coarse grid). The CH3 parameters are set to the optimal values reported in the text. The green and black lines represent the contours for the simulation results and the surrogate model, respectively. Panels (a)-(e) correspond to 390, 440, 490, 515, and 540 K, respectively.

FIG. S.33. Contours of ρl for n-octane with respect to εCH2 and σCH2 (refined grid). The CH3 parameters are set to the optimal values reported in the text. The green and black lines represent the contours for the simulation results and the surrogate model, respectively. Panels (a)-(e) correspond to 390, 440, 490, 515, and 540 K, respectively. Notice the difference in axis ranges compared with Figure S.32.

S.VII.

DATA EVALUATION

Since obtaining accurate and precise ε and σ values necessitates reliable and evaluated experimental data, in this section we detail the data that are used in calculating RMS and PDF. Our initial objective was to exactly replicate the work originally done by Martin et al. and, thus, use the same set of data. However, for the Type A and AB analysis, we found that it was necessary to utilize a slightly different set of data. There are two key differences between the data used by Martin et al. and those used in the Type A and AB analysis. First, we used solely experimental data. Second, we utilized a larger set of data. In the subsequent paragraphs we explain why these modifications were necessary and justified.

For Equation 3 in Section IV A2 of the text (used in the Type A and AB analysis) to be valid it is essential that the data used in the objective function (RMS) have normally distributed error. This poses a problem for the methodology used by Martin et al. because the TraPPE parameters were optimized to correlations for ρl [18] rather than true experimental data. Fortunately, in 2002 highly accurate ρl data were measured for ethane that agree closely with the correlations used by TraPPE [19]. Therefore, we utilized these experimental data in our parameter optimization and subsequent uncertainty quantification. Similarly, we have utilized evaluated experimental ρl data for n-octane [20] in lieu of the correlations reported by Smith et al. (the ethane correlation was actually developed by Ref [21] and merely cited by Ref [18]). To quantify the difference in the data sets used in this study and by Martin et al., the RMS between the experimental data of Refs [19, 20] and the correlations from Ref [18] are 0.00022 kg/L for ethane and 0.00072 kg/L for n-octane.

Furthermore, in order to perform any statistical analysis a key factor is the amount of data used in the objective function. The original TraPPE methodology used a relatively small data set, namely six ρl values for ethane and five for n-octane. That being said, a considerable amount of experimental ρl data exist for both ethane and n-octane. Therefore, the limitation lies not in the availability of experimental data but rather in the computational cost of performing molecular simulations at the temperature corresponding to each experimental data point. With advances in computational power the cost of performing simulations at additional temperatures is less significant. However, we found that a surrogate model was a cheap and reliable way of predicting ρl at temperatures that were not simulated. That being said, we have only utilized experimental ρl data in the temperature range originally used by Martin et al. We have done this for two reasons. First, to be consistent with the work done by Martin et al. Second, because we suspect that our surrogate model may be less accurate for extrapolation than it is for interpolation.

The specific ρl data used to calculate RMS for ethane [19] and n-octane [20] in this study are found in Table S.IV. For comparison, Table S.V contains the ρl values used by Martin et al. that were calculated with the correlations of Smith et al. Figure S.34 is provided to help visualize the similarity between the experimental data used in this study and the correlations used by Martin et al.

TABLE S.IV. Experimental ρl data values for ethane and n-octane that were used to calculate RMS when developing the Mess-UP force field.

Ethane                          n-Octane
Temperature (K)   ρl (kg/L)     Temperature (K)   ρl (kg/L)
185.00            0.5433        393.15            0.6170
195.00            0.5305        398.15            0.6120
200.00            0.5240        403.15            0.6080
210.00            0.5105        413.15            0.5990
220.00            0.4963        423.15            0.5875
230.00            0.4813        433.15            0.5770
240.00            0.4653        448.15            0.5609
250.00            0.4481        473.15            0.5310
260.00            0.4291        498.15            0.4978
265.00            0.4188        513.15            0.4730
270.00            0.4077        523.15            0.4554
275.00            0.3958

TABLE S.V. Computed ρl values for ethane and n-octane that were used by Martin et al. to calculate RMS when developing the TraPPE force field.

Ethane                          n-Octane
Temperature (K)   ρl (kg/L)     Temperature (K)   ρl (kg/L)
178               0.55215       390               0.61941
197               0.52819       440               0.57000
217               0.50083       490               0.50968
236               0.47213       515               0.47073
256               0.43713       540               0.41896
275               0.39581

[Figure S.34: Panel (a) Ethane, Panel (b) n-Octane. Axes: Liquid Density (kg/L) and Temperature (K). Legend: Experimental Data (Funke et al.) and Correlation (Smith et al.) in Panel (a); Experimental Data (DIPPR) and Correlation (Smith et al.) in Panel (b).]

FIG. S.34. Comparison of ρl experimental data used in this study and the correlations used by Martin et al.
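As an illustration of how Equation S.4 can serve as an interpolant, the sketch below regresses the six ethane values of Table S.V to Equation S.4 and evaluates the fit at the ethane temperatures of Table S.IV. Several choices here are assumptions rather than the procedure of this work: the fit uses a grid search over T0 and β with linear least squares for ρ0, A, and B, the grid ranges are arbitrary, and RMS is taken to be the root-mean-square deviation.

```python
import numpy as np

# Ethane values copied from Table S.V (correlation) and Table S.IV (experiment).
T_corr = np.array([178.0, 197.0, 217.0, 236.0, 256.0, 275.0])
rho_corr = np.array([0.55215, 0.52819, 0.50083, 0.47213, 0.43713, 0.39581])
T_exp = np.array([185.0, 195.0, 200.0, 210.0, 220.0, 230.0,
                  240.0, 250.0, 260.0, 265.0, 270.0, 275.0])
rho_exp = np.array([0.5433, 0.5305, 0.5240, 0.5105, 0.4963, 0.4813,
                    0.4653, 0.4481, 0.4291, 0.4188, 0.4077, 0.3958])

def design(T, T0, beta):
    """Columns multiplying rho0, A, and B in Equation S.4."""
    dT = T0 - T
    return np.column_stack([np.ones_like(T), dT, dT ** beta])

def fit_eq_s4(T, rho, T0_grid, beta_grid):
    """Grid search over (T0, beta); linear least squares for (rho0, A, B)."""
    best = None
    for T0 in T0_grid:
        for beta in beta_grid:
            X = design(T, T0, beta)
            lin, *_ = np.linalg.lstsq(X, rho, rcond=None)
            sse = float(np.sum((X @ lin - rho) ** 2))
            if best is None or sse < best[0]:
                best = (sse, T0, beta, lin)
    return best[1:]

def eq_s4(T, T0, beta, lin):
    """Evaluate the fitted Equation S.4 at temperatures T."""
    return design(np.asarray(T, dtype=float), T0, beta) @ lin

def rms(a, b):
    """Root-mean-square deviation (assumed definition of RMS)."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Hypothetical search ranges; T0 must exceed the highest temperature used.
T0, beta, lin = fit_eq_s4(T_corr, rho_corr,
                          np.linspace(285.0, 340.0, 56),
                          np.linspace(0.2, 0.8, 25))
rms_fit = rms(eq_s4(T_corr, T0, beta, lin), rho_corr)
rms_exp = rms(eq_s4(T_exp, T0, beta, lin), rho_exp)
print("T0 =", T0, "beta =", beta, "rho0, A, B =", lin)
print("RMS to fitted points (kg/L):", rms_fit)
print("RMS to experimental data (kg/L):", rms_exp)
```

The second RMS is only loosely comparable to the 0.00022 kg/L quoted above for ethane, since this sketch interpolates six tabulated values rather than evaluating the original correlation.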

By contrast with the Type A and AB approaches, the Type B analysis does not utilize Equation 3 in Section IV A2 of the text. Instead, the Type B approach assumes that the DIPPR correlation uncertainties account for the scatter and inaccuracies in the data. As this approach utilizes the DIPPR correlation uncertainty rather than the experimental data, the Type B approach is ideal for force field developers that are less familiar with analyzing experimental data. Furthermore, by utilizing a correlation uncertainty it is not necessary to predict ρl at a large number of temperature values corresponding to experimental data points. For these reasons, the DIPPR uncertainties are quite useful for quantifying force field parameter uncertainties. It is worth mentioning that for the Type B analysis the DIPPR uncertainties were assumed to be constant relative error for ethane. By contrast, for n-octane we felt it was more appropriate to use a constant absolute error (1% of the average ρl value) from 380-530 K, and from 530-545 K we used the DIPPR ρc uncertainty of 3% (notice the correlation uncertainties in Figure 4 in Section VI C1 of the text). It is important to note that DIPPR uncertainties are assigned using preset cutoff values of < 0.2%, < 1%, < 3%, < 5%, < 10%, < 25%, < 50%, and < 100%. Therefore, the 1% uncertainties should be interpreted as ranging from 0.2-1% and the 3% uncertainties are from 1-3%. As the Type B analysis requires a single uncertainty value, we used the upper value from a given range in order to provide the most conservative estimate of uncertainty.
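The interpretation of the DIPPR uncertainty classes described above can be written as a small lookup. This is a sketch only: the cutoff values are those listed in the text, while the function names are hypothetical and the lower bound of zero for the smallest class is an assumption.

```python
# Preset DIPPR cutoff values in percent, as listed in the text.
DIPPR_CUTOFFS = [0.2, 1.0, 3.0, 5.0, 10.0, 25.0, 50.0, 100.0]

def uncertainty_range(upper_percent):
    """Return the (lower, upper) percent range implied by a DIPPR class."""
    idx = DIPPR_CUTOFFS.index(upper_percent)
    # Assumption: the smallest class extends down to zero.
    lower = 0.0 if idx == 0 else DIPPR_CUTOFFS[idx - 1]
    return lower, upper_percent

def conservative_uncertainty(upper_percent):
    """Single value used for a class: the upper end of its range."""
    return uncertainty_range(upper_percent)[1]

for cutoff in (1.0, 3.0):
    print(cutoff, "% class spans", uncertainty_range(cutoff),
          "-> value used:", conservative_uncertainty(cutoff))
```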

S.VIII.

CORRELATION BETWEEN CH3 AND CH2 PARAMETERS

Several studies have demonstrated that the CH3 and CH2 parameters are highly correlated, particularly when they are regressed simultaneously [22]. In the case of a sequential optimization, any uncertainty in the CH3 parameters is inherited by the CH2 parameters. This correlation can be accounted for by developing expressions that relate the deviations in CH3 parameters to the CH2 parameters. For example, we can assume that the Lorentz-Berthelot (LB) cross interactions are constant. Therefore, any deviation in the CH3 parameters will shift the CH2 contours so that the new optimal CH2 parameters fulfill the following criteria

$$\hat{\sigma}_{\mathrm{CH_3}} + \hat{\sigma}_{\mathrm{CH_2}} = \sigma^{\mathrm{MCS}}_{\mathrm{CH_3}} + \sigma^{0}_{\mathrm{CH_2}} \tag{S.5}$$

$$\hat{\epsilon}_{\mathrm{CH_3}} \, \hat{\epsilon}_{\mathrm{CH_2}} = \epsilon^{\mathrm{MCS}}_{\mathrm{CH_3}} \, \epsilon^{0}_{\mathrm{CH_2}} \tag{S.6}$$

where $\hat{\sigma}_{\mathrm{CH_3}}$, $\hat{\sigma}_{\mathrm{CH_2}}$, $\hat{\epsilon}_{\mathrm{CH_3}}$, and $\hat{\epsilon}_{\mathrm{CH_2}}$ are the optimal LJ parameters, $\sigma^{\mathrm{MCS}}_{\mathrm{CH_3}}$ and $\epsilon^{\mathrm{MCS}}_{\mathrm{CH_3}}$ are the CH3 parameters obtained from the Monte Carlo sampling algorithm, and $\sigma^{0}_{\mathrm{CH_2}}$ and $\epsilon^{0}_{\mathrm{CH_2}}$ are the optimal CH2 parameters for the MCS CH3 parameters. Finally, the correlated MCS CH2 parameters are obtained from

$$\sigma^{\star}_{\mathrm{CH_2}} = \sigma^{\mathrm{MCS}}_{\mathrm{CH_2}} \, \frac{\sigma^{0}_{\mathrm{CH_2}}}{\hat{\sigma}_{\mathrm{CH_2}}} \tag{S.7}$$

$$\epsilon^{\star}_{\mathrm{CH_2}} = \epsilon^{\mathrm{MCS}}_{\mathrm{CH_2}} \, \frac{\epsilon^{0}_{\mathrm{CH_2}}}{\hat{\epsilon}_{\mathrm{CH_2}}} \tag{S.8}$$

where $\sigma^{\mathrm{MCS}}_{\mathrm{CH_2}}$ and $\epsilon^{\mathrm{MCS}}_{\mathrm{CH_2}}$ are the CH2 parameters obtained from the Monte Carlo sampling algorithm and $\sigma^{\star}_{\mathrm{CH_2}}$ and $\epsilon^{\star}_{\mathrm{CH_2}}$ are the MCS values after correcting for the correlation between CH3 and CH2 parameters. These equations assume that the ratio between the optimal and MCS parameter sets should be equal regardless of correlation.

In Figures S.35-S.36 we demonstrate how MCS CH2 parameter sets are affected by this correlation. Specifically, we sampled 50000 CH3 parameter sets and implemented Equations S.5-S.8 to modify the 50000 sampled CH2 parameter sets. In Figure S.35 we present histograms of the 50000 MCS parameter sets with and without CH3 correlation. Panels (a)-(b) represent the Type A analysis while Panels (c)-(d) are for the Type AB analysis. Clearly using the LB combining rules causes the ε uncertainty to widen significantly while having a small effect on the σ uncertainty (compare Panels (a) and (c) with Panels (b) and (d) in Figure S.35). Figure S.36 plots the uncorrelated and correlated CH2 parameter sets for the ε and σ parameter space. Panels (a) and (b) utilize a Type A and AB approach, respectively. Only 1000 parameter sets are included in Figure S.36 for clarity. Notice that accounting for correlation has a much larger effect on the Type AB analysis. This is expected as the Type AB analysis results in much larger uncertainties in the CH3 parameters.
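The correction in Equations S.5-S.8 can be sketched in a few lines. The numbers below are placeholders chosen for illustration: the "optimal" values are the TraPPE-UA parameters (ε/kB in K, σ in Å) rather than the optimal parameters of this work, and the normal sampling with the stated standard deviations is an assumption standing in for the Monte Carlo sampling algorithm. The function name is hypothetical.

```python
import numpy as np

def correlate_ch2(opt_ch3, opt_ch2, mcs_ch3, mcs_ch2):
    """Apply Equations S.5-S.8. Each argument is an (epsilon, sigma) pair;
    the MCS pairs may hold arrays of sampled values."""
    eps3_hat, sig3_hat = opt_ch3
    eps2_hat, sig2_hat = opt_ch2
    eps3_mcs, sig3_mcs = mcs_ch3
    eps2_mcs, sig2_mcs = mcs_ch2
    sig2_0 = sig3_hat + sig2_hat - sig3_mcs   # Eq. S.5 solved for sigma0_CH2
    eps2_0 = eps3_hat * eps2_hat / eps3_mcs   # Eq. S.6 solved for eps0_CH2
    sig2_star = sig2_mcs * sig2_0 / sig2_hat  # Eq. S.7
    eps2_star = eps2_mcs * eps2_0 / eps2_hat  # Eq. S.8
    return eps2_star, sig2_star

# Placeholder "optimal" values (TraPPE-UA), not the parameters of this work.
OPT_CH3 = (98.0, 3.75)
OPT_CH2 = (46.0, 3.95)

rng = np.random.default_rng(3)
n = 50000  # number of sampled parameter sets, as in the text
# Assumed normal sampling; the standard deviations are illustrative only.
eps3_mcs = rng.normal(OPT_CH3[0], 1.0, size=n)
sig3_mcs = rng.normal(OPT_CH3[1], 0.002, size=n)
eps2_mcs = rng.normal(OPT_CH2[0], 0.5, size=n)
sig2_mcs = rng.normal(OPT_CH2[1], 0.002, size=n)

eps2_star, sig2_star = correlate_ch2(OPT_CH3, OPT_CH2,
                                     (eps3_mcs, sig3_mcs),
                                     (eps2_mcs, sig2_mcs))
print("std of epsilon_CH2, uncorrelated:", np.std(eps2_mcs))
print("std of epsilon_CH2, correlated:  ", np.std(eps2_star))
```

When the sampled CH3 parameters equal the optimal CH3 parameters, the correction leaves the sampled CH2 parameters unchanged, which is a convenient consistency check on the reconstruction of the equations.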

FIG. S.35. Comparison between the uncorrelated and correlated histograms for εCH2 and σCH2. Panels (a)-(b) correspond to the Type A analysis for both the CH3 and CH2 parameters while Panels (c)-(d) use the Type AB analysis. The normal distribution fits to the histograms are also included. Probability density is defined as the number of counts in a single bin divided by both the total number of counts and the bin width.

FIG. S.36. Comparison between the uncorrelated and correlated MCS parameter sets for εCH2 and σCH2. Panels (a)-(b) correspond to the Type A and Type AB analysis, respectively. Only 1000 parameter sets are presented for clarity.


S.IX. PREDICTION OF Pv FOR ETHANE

Figure S.37 compares the DIPPR correlation for vapor pressure (Pv) with the Mess-UP, TraPPE-UA, TraPPE-EH, anisotropic united-atom (AUA4), Mie, and transferable anisotropic Mie (TAMie) force fields. The TraPPE-UA values were obtained from the 2014 validation data [3] while the TraPPE-EH values are from the original parameterization paper [23]. The AUA4 results were calculated using the parameters reported by Ungerer et al. [24] with an analytic correlation developed by Stoll et al. [16]. The Mie results are those reported by Potoff et al. [25]. The TAMie values were obtained from Ref. [26]. The uncertainties for the TraPPE-UA, AUA4, and Mie models are smaller than one symbol size. The uncertainties for the TraPPE-UA, TraPPE-EH, and Mie models were obtained from the respective literature articles. These are numerical uncertainties that only account for simulation fluctuations. The AUA4 uncertainties were approximated from the correlation uncertainties reported in Ref. [16]. The Mess-UP uncertainties represent the minimum and maximum values obtained from the Type B extrema (i.e. parameter and numerical uncertainties). Uncertainties were not available for the TAMie force field but are expected to be smaller than one symbol size for the method implemented in Ref. [26]. The DIPPR correlation uncertainties are approximately the width of the line.

[Figure S.37. Panel (a): vapor pressure (MPa) versus temperature (K). Panel (b): log(Pv / MPa) versus 1000 / T (1/K). Legend: DIPPR Correlation; Mess-UP, Type B; TraPPE-UA (Validation 2014); TraPPE-EH (Chen et al.); AUA4 (Ungerer et al.); Mie 16-6 (Potoff et al.); TAMie (Hemmen et al.).]

FIG. S.37. Comparison of the DIPPR correlation for Pv with different force fields. Panel (a) is a T-Pv plot while Panel (b) is a Clausius-Clapeyron plot of log(Pv) against 1000/T.
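The Clausius-Clapeyron representation used in Panel (b) amounts to a change of coordinates, which the following Python sketch illustrates. The function names and the synthetic coefficients are hypothetical, and the base-10 logarithm is an assumption inferred from the axis ranges of the figure rather than something stated in the text.

```python
import numpy as np

def clausius_clapeyron_coords(T, Pv):
    """Map (T in K, Pv in MPa) to (1000/T in 1/K, log10(Pv / MPa))."""
    T = np.asarray(T, dtype=float)
    Pv = np.asarray(Pv, dtype=float)
    return 1000.0 / T, np.log10(Pv)

def fit_line(x, y):
    """Least-squares straight line y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Synthetic vapor pressures generated from a made-up straight line,
# log10(Pv) = A + B * (1000 / T); A and B are not fitted to any real data.
A, B = 3.0, -0.75
T = np.linspace(180.0, 280.0, 11)
Pv = 10.0 ** (A + B * 1000.0 / T)

x, y = clausius_clapeyron_coords(T, Pv)
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

In these coordinates the vapor pressure curve is close to linear, so small systematic deviations between force fields at low temperature are easier to see than in the T-Pv plot.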

Notice that even the Type B uncertainties do not encompass the Pv correlation for ethane. (Although not depicted here, similar results are observed for n-octane.) This demonstrates that the Mess-UP parameter uncertainties do not account for model error and, therefore, not all properties are predicted accurately with this model. Thus, the Mess-UP force field should not be used to predict Pv (or heat of vaporization, ∆Hv, not shown). A three-parameter model, such as the anisotropic-united-atom Lennard-Jones model (e.g. AUA4) or the united-atom Mie model, is required to predict ρl, Pv, and ∆Hv accurately. Alternatively, similar accuracy can be obtained with an explicit hydrogen LJ model (which uses four LJ parameters for ethane). The four-parameter anisotropic Mie potential can provide extremely accurate estimates of these physical properties for ethane.

As mentioned in the main text, the two-site Lennard-Jones model is not flexible enough to predict both ρl and Pv to within the DIPPR uncertainties for ethane. This can be visualized by plotting the Type B feasible region for Pv, which was obtained using the Pv correlations from Stoll et al. [16]. Figure S.38 compares the Type B feasible regions for Pv and ρl. Clearly, the two regions do not overlap and, therefore, it is impossible to predict ρl and Pv to within the DIPPR uncertainties using the UA LJ model. Note that the DIPPR uncertainty for Pv of ethane is 1%. This is interpreted as a constant relative error over the temperature range of 178-275 K. For temperatures below this range, we recommend using a constant absolute error instead.
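The comparison of feasible regions described above can be illustrated with a short Python sketch. Several assumptions are made here: a parameter set is counted as feasible when its predictions fall within the reference uncertainty at every temperature (the Type B definition in the main text may differ in detail), the absolute error below 178 K is taken as the relative error evaluated at 178 K, and all function names and numerical values are made up for illustration.

```python
import numpy as np

def pv_uncertainty(T, pv_ref, rel=0.01, t_min=178.0, pv_at_t_min=None):
    """Constant relative error for T >= t_min; below t_min, optionally a
    constant absolute error equal to rel * pv_at_t_min (an assumption)."""
    T = np.asarray(T, dtype=float)
    unc = rel * np.asarray(pv_ref, dtype=float)
    if pv_at_t_min is not None:
        unc = np.where(T < t_min, rel * pv_at_t_min, unc)
    return unc

def feasible_mask(pred, ref, unc):
    """pred has shape (n_parameter_sets, n_temperatures). A set is feasible
    if every prediction lies within ref +/- unc."""
    return np.all(np.abs(np.asarray(pred) - ref) <= unc, axis=1)

def regions_overlap(mask_a, mask_b):
    """True if at least one parameter set is feasible for both properties."""
    return bool(np.any(mask_a & mask_b))

# Made-up reference data at two temperatures (not DIPPR values).
T = np.array([200.0, 250.0])
pv_ref = np.array([0.5, 1.5])        # MPa
rho_ref = np.array([500.0, 450.0])   # kg/m3
rho_unc = 0.005 * rho_ref            # hypothetical 0.5% uncertainty

# Made-up predictions for three candidate (epsilon, sigma) sets.
pv_pred = np.array([[0.530, 1.60], [0.502, 1.51], [0.600, 1.80]])
rho_pred = np.array([[501.0, 449.0], [490.0, 440.0], [520.0, 470.0]])

mask_pv = feasible_mask(pv_pred, pv_ref, pv_uncertainty(T, pv_ref))
mask_rho = feasible_mask(rho_pred, rho_ref, rho_unc)
print(mask_rho, mask_pv, regions_overlap(mask_rho, mask_pv))
```

In this toy case the first set is feasible only for ρl and the second only for Pv, so the two regions do not overlap, which mirrors the qualitative situation reported for ethane.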

[Figure S.38. σCH3 (Å) versus εCH3 (K). Legend: Optimal (Mess-UP); TraPPE; NERD; Type B, ρl; Type B, Pv.]

FIG. S.38. Comparison of Type B feasible regions for ρl and Pv. The TraPPE, NERD, and Mess-UP optimals are depicted for reference.


S.X. CORRELATION BETWEEN LENNARD-JONES PARAMETERS AND BOND LENGTH FOR ETHANE

In this study, we have assumed that the intramolecular parameters (i.e. bond lengths, bond angles, etc.) are known accurately enough to contribute a negligible error. There are two key reasons for this assumption. The first is that molecular mechanics based models have been shown to reproduce ab initio calculations for simple hydrocarbons [27–31]. The second is that a systematic approach to force field parameterization greatly simplifies the uncertainty analysis by reducing the dimensions of the parameter space. That being said, the optimal Lennard-Jones parameters are certainly dependent on the intramolecular model to some extent. In this section we demonstrate the amount of intercorrelation between the bond length and the LJ parameters for ethane. The computational cost of this three-parameter uncertainty analysis is greatly reduced by using the analytic expressions for ρl and Tc developed by Stoll et al. for the two-center Lennard-Jones plus point-quadrupole fluid (2CLJQ) [16].

[Figure S.39. Panel (a): εCH3 (K) versus L (Å). Panel (b): σCH3 (Å) versus L (Å). Panel (c): L⋆ versus L (Å). Panel (d): Tc (K) versus L (Å).]

FIG. S.39. Dependence of optimal LJ parameters and Tc on the CH3-CH3 bond length (L) for ethane. Panels (a)-(d) correspond to εCH3, σCH3, L⋆ = L/σ, and Tc, respectively.

Figure S.39 Panels (a)-(d) demonstrate the correlation between the bond length (L) and the optimal εCH3, σCH3, L⋆ = L/σ, and Tc, respectively. These curves were obtained by optimizing εCH3 and σCH3 for a given value of L. We utilized the 2CLJQ correlations proposed by Stoll et al. and the ρl data found in Section S.VII to find the optimal εCH3 and σCH3. Panel (c) depicts the ratio of the bond length to the optimal σ (i.e. L⋆ = L/σ) with respect to L. Specifically, the optimal curve for σ with respect to L (Panel (b)) is used to obtain the optimal curve of L⋆. Panel (d) was obtained by using the 2CLJQ correlation proposed by Stoll et al. to predict Tc⋆ for the optimal values of L⋆ found in Panel (c). Subsequently, Tc is calculated by multiplying each value of Tc⋆ by the optimal value of ε for the corresponding value of L (Panel (a)).

Figure S.39 Panels (a) and (b) demonstrate that the optimal values of εCH3 and σCH3 increase and decrease, respectively, with increasing bond length. The fact that σ decreases with increasing L causes the optimal value of L⋆ to increase with respect to L (see Panel (c)). Panel (d) shows that the predicted value of Tc with the optimal values of εCH3 and σCH3 increases with increasing L. To quantify the degree of correlation, a +1% deviation in L corresponds to a +0.66%, -0.35%, +1.15%, and +0.014% deviation in the optimal values of εCH3, σCH3, L⋆, and Tc, respectively. Therefore, although the optimal value of Tc increases with respect to L, the propagation of errors from L to Tc is quite small. It appears that Tc benefits from a fortuitous cancellation of errors since ε increases linearly with increasing L while Tc⋆ (not shown) decreases linearly with increasing L⋆.

Note that the Stoll 2CLJQ correlation predicts a Tc value of approximately 306.6 K for the Mess-UP optimal parameters compared to the value of 304.4 K that we obtained from molecular simulation. This deviation of approximately 2.2 K is only slightly larger than the 0.5% (or 1.5 K) uncertainty reported by Stoll et al. for their Tc correlation. Therefore, we conclude that the deviations between the Mess-UP optimal and the Stoll correlation are not significant.

Some of the values for L depicted in Figure S.39 deviate strongly from the known equilibrium bond length for ethane. Ungerer et al. reported a value of 1.535 Å for ethane and a range of bond lengths for n-alkanes of 1.527 Å to 1.544 Å [24].
This range represents an uncertainty of less than 1% and the uncertainty in the bond length for ethane is likely even smaller. For bond lengths shorter than 1.527 Å and longer than 1.544 Å, the model would be more appropriately classified as an anisotropic-united-atom (AUA) model. As Ungerer et al. demonstrated, the optimal LJ parameters for an AUA model will deviate strongly from the UA model.

Figure S.40 depicts the optimal ε and σ parameter sets for feasible values of L ranging from 1.527 Å to 1.544 Å (referred to as ‘L-dependent Optimal’). The single point along the ‘L-dependent Optimal’ corresponds to the common literature value of L = 1.535 Å [24]. Figure S.40 also includes the Mess-UP optimal, TraPPE optimal, and the Type B extrema as a reference. Note that the small deviation between the Mess-UP optimal and the L-dependent optimal line is likely due to the uncertainties in the Stoll correlation. From this figure we see that, even with the rather large uncertainty range of 1.527-1.544 Å for ethane, the L-dependent optimal curve is almost completely encompassed by the Type B extrema. Furthermore, the ε-σ optimal for the more precise value for ethane of L = 1.535 Å is clearly within the Type B extrema.

From the results presented in this section, we conclude that the uncertainties in the CH3 LJ parameters due to the bond length uncertainty are smaller than the Type B uncertainty. Repeating this process for n-octane would be extremely computationally expensive, especially if we investigated uncertainties in all bond lengths, bond angles, and torsions. However, we would like to emphasize that the analysis found in this study assumes a given intramolecular model. In other words, it is assumed that any shortcomings in the intramolecular model are small enough that they can be accounted for by the LJ parameters when predicting ρl and Tc.
Essentially, we have performed a constrained optimization where the constraints are that the intramolecular parameters do not change. Therefore, the ε and σ uncertainties reported in the main text do not represent every feasible set of LJ parameters. Instead, they represent the feasible set of LJ parameters to be used with the TraPPE-UA intramolecular model when parameterized to predict ρl and Tc for ethane and n-octane. It is crucial that the same intramolecular model be used when simulating with the Type A, AB, and B Lennard-Jones parameters.
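The cancellation of errors between ε and Tc⋆ discussed in this section can be checked with simple arithmetic. The sketch below uses only the percentages and temperatures quoted in the text together with the stated relation that Tc is the product of Tc⋆ and ε; the variable names are illustrative, and the implied change in Tc⋆ is a derived estimate rather than a value reported in this work.

```python
# Relative changes in the optimal values for a +1% change in L (from the text).
d_eps = 0.0066    # +0.66% in epsilon_CH3
d_Tc = 0.00014    # +0.014% in Tc

# Tc = Tc_star * epsilon, so the relative change in Tc_star implied by the
# two numbers above is:
d_Tc_star = (1.0 + d_Tc) / (1.0 + d_eps) - 1.0
print(f"implied change in Tc_star: {100.0 * d_Tc_star:.3f}%")

# Comparison of the Stoll correlation with the molecular simulation result.
Tc_correlation = 306.6   # K, Stoll 2CLJQ correlation at the Mess-UP optimal
Tc_simulation = 304.4    # K, molecular simulation
deviation = Tc_correlation - Tc_simulation
correlation_uncertainty = 0.005 * Tc_correlation   # 0.5% uncertainty

print(f"deviation: {deviation:.1f} K")
print(f"correlation uncertainty: {correlation_uncertainty:.1f} K")
```

The increase in ε is offset almost exactly by a decrease of similar magnitude in Tc⋆, which is why the net sensitivity of Tc to L is roughly two orders of magnitude smaller than the sensitivity of ε.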


[Figure S.40. σCH3 (Å) versus εCH3 (K). Legend: L-dependent Optimal; L = 1.535 Å; Optimal (Mess-UP); Type B Extrema; TraPPE.]

FIG. S.40. Comparison of the optimal εCH3-σCH3 parameter sets for various bond lengths of ethane (L). Included are the Mess-UP and TraPPE optimal parameter sets and the Type B extrema. The line labeled ‘L-dependent Optimal’ corresponds to the optimal values of ε and σ for the range of L values from 1.527-1.544 Å. The single point along this line labeled ‘L = 1.535 Å’ corresponds to a common L value found in the literature.


[1] R. A. Messerly, R. L. Rowley, T. A. Knotts, and W. V. Wilding. An improved statistical analysis for predicting the critical temperature and critical density with Gibbs ensemble Monte Carlo simulation. Journal of Chemical Physics, 143(10):104101, 2015.
[2] R. A. Messerly, T. A. Knotts, R. L. Rowley, and W. V. Wilding. An improved approach for predicting the critical constants of large molecules with Gibbs ensemble Monte Carlo simulation. Fluid Phase Equilibria, 425:432–442, 2016.
[3] B.L. Eggimann, P. Bai, A.P. Bliss, Q.P. Chen, T.F. Chen, A.D. Cortes-Morales, E. Fetisov, E. Haldoupis, D.B. Harwood, R.K. Lindsey, T.L. Arachchi, M.S. Shah, H.D. Stern, K.N. Struk, J. Sung, A.J. Sunnarborg, B. Xue, and J. I. Siepmann. T-UA No. 2 ethane. TraPPE Validation Database, University of Minnesota: Minneapolis, MN. http://www.chem.umn.edu/groups/siepmann/trappe/ (accessed 2015 June 11).
[4] Panagiotis Angelikopoulos, Costas Papadimitriou, and Petros Koumoutsakos. Bayesian uncertainty quantification and propagation in molecular dynamics simulations: A high performance computing framework. The Journal of Chemical Physics, 137(14):144103, 2012.
[5] Marco Hulsmann and Dirk Reith. SpaGrOW - a derivative-free optimization scheme for intermolecular force field parameters based on sparse grid methods. Entropy, 15(9):3640, 2013.
[6] M. Dinpajooh, P. Bai, D. A. Allan, and J. I. Siepmann. Accurate and precise determination of critical properties from Gibbs ensemble Monte Carlo simulations. Journal of Chemical Physics, 143(11):114113, 2015.
[7] R. L. Rowley. Molecular Modeling. Unpublished.
[8] Fabien Cailliez and Pascal Pernot. Statistical approaches to forcefield calibration and prediction uncertainty in molecular simulation. The Journal of Chemical Physics, 134(5):054124, 2011.
[9] Amal Lotfi, Jadran Vrabec, and Johann Fischer. Vapour liquid equilibria of the Lennard-Jones fluid from the NpT plus test particle method. Molecular Physics, 76(6):1319–1333, 1992.
[10] Ali Kh. Al-Matar, Ahmed H. Tobgy, and Ibrahim A. Suleiman. The phase diagram of the Lennard-Jones fluid using temperature dependent interaction parameters. Molecular Simulation, 34(3):289–294, 2008.
[11] K. Stobener, P. Klein, S. Reiser, M. Horsch, K.-H. Kufer, and H. Hasse. Multicriteria optimization of molecular force fields by Pareto approach. Fluid Phase Equilibria, 373:100–108, 2014.
[12] Stephan Werth, Katrin Stobener, Peter Klein, Karl-Heinz Kufer, Martin Horsch, and Hans Hasse. Molecular modelling and simulation of the surface tension of real quadrupolar fluids. Chemical Engineering Science, 121:110–117, 2015. 2013 Danckwerts Special Issue on Molecular Modelling in Chemical Engineering.
[13] K. Stobener, P. Klein, M. Horsch, K. Kufer, and H. Hasse. Parametrization of two-center Lennard-Jones plus point-quadrupole force field models by multicriteria optimization. Fluid Phase Equilibria, 411:33–42, 2016.
[14] Loup Verlet and Jean-Jacques Weis. Perturbation theory for the thermodynamic properties of simple liquids. Molecular Physics, 24(5):1013–1024, 1972.
[15] M. G. Martin and J. I. Siepmann. Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes. Journal of Physical Chemistry B, 102(14):2569–2577, 1998.
[16] Jurgen Stoll, Jadran Vrabec, Hans Hasse, and Johann Fischer. Comprehensive study of the vapour-liquid equilibria of the pure two-centre Lennard-Jones plus point-quadrupole fluid. Fluid Phase Equilibria, 179(1-2):339–362, 2001.
[17] Thijs van Westen, Thijs J. H. Vlugt, and Joachim Gross. Determining force field parameters using a physically based equation of state. The Journal of Physical Chemistry B, 115(24):7872–7880, 2011. PMID: 21568280.
[18] Buford D. Smith and Rakesh Srivastava. Thermodynamic Data for Pure Compounds. Part A. Hydrocarbons and ketones. Elsevier; Distributors for the U.S. and Canada E, Amsterdam; New York, 1986.
[19] M. Funke, R. Kleinrahm, and W. Wagner. Measurement and correlation of the (P, ρ, T) relation of ethane II. Saturated-liquid and saturated-vapour densities and vapour pressures along the entire coexistence curve. The Journal of Chemical Thermodynamics, 34(12):2017–2039, 2002.
[20] R. L. Rowley, W. V. Wilding, J. L. Oscarson, T. A. Knotts, and N. F. Giles. DIPPR Data Compilation of Pure Chemical Properties. Design Institute for Physical Properties. AIChE, New York, NY, 2013.
[21] Robert D. Goodwin, H.M. Roder, and G.C. Straty. Thermophysical properties of ethane, from 90 to 600 K at pressures to 700 bar, volume NBS Technical Note 684. National Bureau of Standards (U.S.), U.S. Department of Commerce, 1976.
[22] Sinan Ucyigitler, Mehmet C. Camurdan, and J. Richard Elliott. Optimization of transferable site-site potentials using a combination of stochastic and gradient search algorithms. Industrial & Engineering Chemistry Research, 51(17):6219–6231, 2012.
[23] B. Chen and J. I. Siepmann. Transferable potentials for phase equilibria. 3. Explicit-hydrogen description of normal alkanes. Journal of Physical Chemistry B, 103(25):5370–5379, 1999.
[24] Philippe Ungerer, Christele Beauvais, Jerome Delhommelle, Anne Boutin, Bernard Rousseau, and Alain H. Fuchs. Optimization of the anisotropic united atoms intermolecular potential for n-alkanes. The Journal of Chemical Physics, 112(12):5499–5510, 2000.
[25] J. J. Potoff and D. A. Bernard-Brunel. Mie potentials for phase equilibria calculations: Applications to alkanes and perfluoroalkanes. Journal of Physical Chemistry B, 113(44):14725–14731, 2009.
[26] Andrea Hemmen and Joachim Gross. Transferable anisotropic united-atom force field based on the Mie potential for phase equilibrium calculations: n-alkanes and n-olefins. The Journal of Physical Chemistry B, 119(35):11695–11707, 2015.
[27] Marc C. Nicklaus. Conformational energies calculated by the molecular mechanics program CHARMm. Journal of Computational Chemistry, 18(8):1056–1060, 1997.
[28] Carl S. Ewig, Rajiv Berry, Uri Dinur, Jorg-Rudiger Hill, Ming-Jing Hwang, Haiying Li, Chris Liang, Jon Maple, Zhengwei Peng, Thomas P. Stockfisch, Thomas S. Thacher, Lisa Yan, Xiangshan Ni, and Arnold T. Hagler. Derivation of class II force fields. VIII. Derivation of a general quantum mechanical force field for organic compounds. Journal of Computational Chemistry, 22(15):1782–1800, 2001.
[29] J.R. Maple, M.-J. Hwang, T.P. Stockfisch, and A.T. Hagler. Derivation of class II force fields. III. Characterization of a quantum force field for alkanes. Israel Journal of Chemistry, 34(2):195–231, 1994.
[30] Dirk Reith and Karl N. Kirschner. A modern workflow for force-field development - Bridging quantum mechanics and atomistic computational models. Computer Physics Communications, 182(10):2184–2191, 2011.
[31] William L. Jorgensen, David S. Maxwell, and Julian Tirado-Rives. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society, 118(45):11225–11236, 1996.
