Eur. Phys. J. E 13, 133–140 (2004) DOI: 10.1140/epje/e2004-00050-3
THE EUROPEAN PHYSICAL JOURNAL E
A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method
J.L. López Fontán, J. Costa, J.M. Ruso, G. Prieto, and F. Sarmiento
Group of Biophysics and Interfaces, Department of Applied Physics, Faculty of Physics, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain, and Department of Mathematics, Faculty of Informatics, University of A Coruña, 15071 A Coruña, Spain
Received 2 October 2003. Published online 25 March 2004. © EDP Sciences, Società Italiana di Fisica, Springer-Verlag 2004
Abstract. The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to the determination of the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
PACS. 02.50.-r Probability theory, stochastic processes, and statistics – 82.70.-y Disperse systems; complex fluids
1 Introduction
The structure of amphiphilic molecules consists of two well-defined regions, one oil-soluble (lipophilic or hydrophobic) and the other water-soluble (hydrophilic). The hydrophobic part is non-polar and generally contains aliphatic or aromatic hydrocarbon residues. The hydrophilic part consists of water-interacting polar groups. A significant characteristic of these substances is their capacity to form structures called micelles, in which the hydrophobic portions of the molecules associate to form regions from which water (or solvent) is excluded. The hydrophilic head groups remain on the external surface to maximize their interaction with water (and with oppositely charged ions, or counterions). The concentration (or narrow concentration range) at which micelles first form in solution is called the critical micelle concentration (cmc) or, more generally, the critical concentration (cc). It can be identified by observing the behavior of the physical properties of the solution (self-diffusion, conductivity, surface tension, magnetic resonance, solubilization, turbidity, osmotic pressure), each of which undergoes a rather abrupt change in its concentration dependence at that point [1]. A study of the results obtained by different measurement methods reveals that the cmc values obtained by various methods can vary by almost 50% and the same
method in different hands can produce a similar spread. Some variations can be traced to uncertainties in the extrapolation procedures used, because the transition from the monomeric to the aggregated state extends over a narrow concentration range. For this reason, several theoretical definitions have been proposed for the cmc. Corrin [2] defines it as the total amphiphile concentration at which a small and constant number of molecules are in aggregated form. For Williams et al. [3] the cmc is the amphiphile concentration at which the number of micelles would become zero if the micellar concentration continued to change at the same rate as it does at a slightly higher concentration. A more general definition is that given by Phillips [4], as the concentration corresponding to the maximum change in the gradient of the solution property versus concentration curve,
$$\left(\frac{d^3\phi}{dC_T^3}\right)_{C_T=\mathrm{CMC}} = 0 \tag{1}$$
where $\phi$ is any of the physical properties and $C_T$ is the total concentration of the amphiphile. Hall et al. [5], developing Phillips' idea, gave another definition that places the cmc at the point where
$$\left(\frac{\partial (x_2 + x_m)}{\partial x_d}\right)_{T,p} = 0.5 \tag{2}$$
and
$$x_d = x_2 + \bar{n}\,x_m \tag{3}$$
where $x_2$ and $x_m$ represent the mole fractions of nonmicellar and micellar amphiphile, respectively, and $\bar{n}$ is the mean aggregation number of the micelles. Israelachvili et al. [6] define the cmc as the concentration at which the analytical surfactant concentration in micelles equals the monomer concentration in bulk. The definition of Phillips has been the most commonly used because it is centered on the region of greatest variation of the physical properties and is thus more rigorous than that of Williams et al. We have proposed [7] an application of the Phillips method consisting of a combination of the Runge-Kutta [8] numerical integration method and the Levenberg-Marquardt [9] least-squares fitting algorithm. The method has been applied to predict the cmc's of classical surfactants [7] and of systems formed by molecules with amphiphilic structure that show a slow dependence of a physical property on concentration or have two or more cmc's [7,10–12]. All these mathematical methods are essentially parametric; they yield partially good results but are not considered universally adequate. Our objective in the present work is to introduce the application of a statistical method, based on a nonparametric estimation of the regression function, which has the advantage of being extremely flexible, given that it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Hence, this methodology is applicable in practically all circumstances with good results.
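As a brief illustration of the Phillips criterion in equation (1), consider a hypothetical smooth two-slope model for the measured property. The softplus form and the symbols $a$, $b$, $c_0$ and $s$ are assumptions introduced only for this sketch and do not come from the cited works; the derivation merely suggests why the cmc can be sought as an extremum of the second derivative.

```latex
% Assumed illustrative model: slope a well below c_0, slope a + b well above it,
% transition width s > 0.
\begin{align*}
\phi(C) &= a\,C + b\,s\,\ln\!\left(1 + e^{(C - c_0)/s}\right),
 \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = \frac{C - c_0}{s} \\
\frac{d\phi}{dC} &= a + b\,\sigma(z) \\
\frac{d^2\phi}{dC^2} &= \frac{b}{s}\,\sigma(z)\,\bigl[1 - \sigma(z)\bigr] \\
\frac{d^3\phi}{dC^3} &= \frac{b}{s^2}\,\sigma(z)\,\bigl[1 - \sigma(z)\bigr]\,\bigl[1 - 2\sigma(z)\bigr] \\
\frac{d^3\phi}{dC^3} = 0 &\iff \sigma(z) = \tfrac{1}{2} \iff C = c_0
\end{align*}
```

In this assumed model the third derivative vanishes only at $C = c_0$, where the second derivative reaches its extreme value $b/(4s)$. When the slope decreases above the transition ($b < 0$), the second derivative is negative there, so it is its absolute value that peaks at $c_0$.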
2 Theoretical background
What follows is an explanation of the statistical method, set within general regression theory and focused on solving the problem posed here. A general overview of classical regression is given in Draper and Smith [13]. Nonparametric regression is treated in Müller [14], Francisco Fernández [15] and in the excellent monograph of Fan and Gijbels [16] on local polynomial regression. At our disposal is a series of observations of two variables: an independent variable $X$ (covariate or predictor), which in our case represents the concentration, and a dependent variable $Y$ (response), which represents the measured physical property (specific conductivity, apparent molar volume, surface tension, diffusion coefficient, etc.). Our objective is to establish the relationship between the two variables, thus explaining a physical property as a function of concentration, in order to study characteristics of interest, in particular the cmc. The usual way to establish the relationship between both variables is by means of a regression analysis. For given pairs of data $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, one attempts to fit a mathematical function through the data, the so-called regression function. The part that cannot be explained is the error (often treated as noise),
$$Y = m(X) + \text{error}. \tag{4}$$
The statistical problem consists in estimating $m$ and its $j$th derivative $m^{(j)}$, $j > 0$. Let $\hat{m}^{(j)}$ be the estimate of the $j$th derivative. The problem of calculating the cmc is related to the statistical regression problem in which interest lies in the location $x_{cmc}$ of a peak of the second derivative $m^{(2)}$,
$$x_{cmc} = \arg\max_x\, m^{(2)}(x) \tag{5}$$
and in the physical property at the cmc,
$$m(x_{cmc}) \tag{6}$$
which will be estimated from
$$\hat{x}_{cmc} = \arg\max_x\, \hat{m}^{(2)}(x), \qquad \hat{m}(\hat{x}_{cmc}). \tag{7}$$
Depending on our control over the variable $X$, two regression models can be distinguished: fixed design and random design. In the random design regression model, we observe pairs of random variables $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, drawn independently from a bivariate distribution, and
$$m(x) = E(Y \mid X = x). \tag{8}$$
In the fixed design regression model, we assume that measurements of $X$ are made at the fixed points $x_1, x_2, \ldots, x_n$ (points chosen by the experimenter),
$$Y_i = m(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{9}$$
where $Y_1, Y_2, \ldots, Y_n$ are the measurements made at points $x_1, x_2, \ldots, x_n$, and $\varepsilon_i$ is the measurement error of the $i$th measurement. In the problem that concerns us, the experimenter chooses the design points; hence, the model before us is a fixed design model. Moreover, we can assume the following structure in the errors: the $\varepsilon_i$ are independent; $E(\varepsilon_i) = 0$, i.e., the errors are unbiased (with null mean); and $\mathrm{Var}(\varepsilon_i) = \sigma^2 < \infty$, i.e., the errors are homoscedastic (with constant variance). Depending on the hypotheses assumed for the regression function $m$, we can differentiate between parametric regression and nonparametric regression. The classical regression methods are parametric models, where the type of regression function is specified up to finitely many parameters. For example, the simplest case, and therefore the most commonly used, is simple linear regression (univariate linear first-order model), where we assume that the regression function corresponds to a straight line, hence called the regression line, and the estimation of this straight line, for which different methods exist, is reduced to the estimation of its two parameters, slope and intercept. The advantage of these parametric methods is that when the functional model assumed for $m$ is adequate, the estimation is reduced to a few parameters and is therefore extremely efficient. However, when the model chosen is inadequate, the estimation will be invalid and in many cases of no use. This is why, after the pioneering works of Nadaraya [17] and Watson [18], the development of
nonparametric regression has been growing steadily. The class of regression functions one works with depends on infinitely many parameters; that is, no rigid structure is imposed, only very general conditions of smoothness and differentiability. As a consequence, these methods do not reach the efficiency levels of parametric methods, but they are extremely flexible and versatile, since they are not restricted by a rigid parametric model, and they can be applied in practically any situation. The spread of nonparametric methods was delayed because their use is closely linked to the advances in computer technology of the last decades. In general, the estimation is repeated for each point of interest of the independent variable, making nonparametric techniques much more expensive computationally than parametric techniques. Moreover, we would like to point out that from the beginning nonparametric techniques have been considered not antagonistic but rather complementary to parametric techniques.
Although interesting nonparametric regression methods exist that model the regression function globally, such as spline methods and wavelet expansions, kernel methods, which apply local fitting, are the most studied and used. Among the kernel methods are the Nadaraya-Watson estimator, the Gasser-Müller estimator and the local polynomial estimator. In general, the basic idea of these estimators consists in obtaining local mean values: assuming that the function $m$ to be estimated is smooth, observations near the point of estimation contain important information about it, and the weight given to an observation is greater the closer it lies to that point. Therefore, the estimation is achieved by obtaining an adequately weighted mean.
Local polynomial regression
In our study we have opted for using local polynomial regression.
The fundamental idea of this methodology appears in the pioneering work of Macauley [19], in the context of time series analysis, and was first applied to nonparametric regression at the end of the 1970s. However, it did not become popular until the beginning of the 1990s, thanks especially to the works of Fan and Gijbels. Local polynomial regression presents important advantages, both theoretical and applied. Among the most important are: good asymptotic properties of the bias and variance; good behavior in terms of minimax efficiency and orders of convergence; absence of harmful boundary effects; versatility and ease of application in practice; and direct and easy estimation of the derivatives, which is very important to us. Above all, no remarkable disadvantage is observed, except for a higher computational cost than the other methods cited, a cost that is easily covered by currently available computer resources. Undoubtedly, its wide applicability justifies the enormous attention that has been focused on this regression method in the last few years. The basic idea of the local polynomial estimator consists in performing weighted local fittings of polynomial
functions by least squares. If the polynomials used are of zero order (constant functions), the estimator obtained coincides with the Nadaraya-Watson estimator. If the polynomials used are of first order (straight lines), the estimator is called the local linear regression estimator, by far the most commonly used. For the estimation of the regression function itself, it is preferable to use polynomials of odd order, since they perform better than fittings of even order; on the other hand, among the odd orders none is considered universally better, although there is always a tendency to use low orders. When the objective is to estimate the $j$th derivative of the regression function, an odd difference between the order of the polynomial and the order of the derivative is preferable. Since in our case the second derivative of the regression function must be estimated with polynomials of odd order, at least a third-order fitting must be used. The following is the mathematical formulation of local polynomial modeling. Suppose that the $(p+1)$th derivative of $m(x)$ at the point $x_0$ exists. We approximate the unknown regression function $m(x)$ locally at $x_0$ by a polynomial of order $p$. The theoretical justification is that, in a neighborhood of $x_0$, $m(x)$ can be approximated by a Taylor expansion
$$m(x) \approx \sum_{k=0}^{p} \beta_k (x - x_0)^k \tag{10}$$
where
$$\beta_k = \frac{m^{(k)}(x_0)}{k!}.$$
This polynomial, used to approximate the unknown function locally at $x_0$, is obtained by solving a locally weighted least squares regression problem, i.e. by minimizing
$$\sum_{i=1}^{n} \left[ Y_i - \sum_{k=0}^{p} \beta_k (x_i - x_0)^k \right]^2 K_h(x_i - x_0) \tag{11}$$
where $h$ is a parameter called the bandwidth (or smoothing parameter), a positive number controlling the size of the local neighborhood, and $K_h(\cdot) = K(\cdot/h)/h$, where $K$ is a weighting function called the kernel function. Usually, although this condition is not necessary, $K$ is a symmetric probability density function. These two quantities, $h$ and $K$, still have to be specified. Let $\hat{\beta}_k$, $k = 0, 1, \ldots, p$, be the solution of the minimization problem. From the previous explanation, it is clear that $j!\,\hat{\beta}_j$ is an estimator of the derivative $m^{(j)}(x_0)$, $j = 0, 1, \ldots, p$. Thus, the estimation obtained, of both the regression function and its derivatives, is local, and the process must therefore be repeated at every point where an estimate is of interest. Before discussing in detail the aspects concerning the bandwidth parameter and the kernel function, let us see the analytical expression of the solution $\hat{\beta}_k$, $k = 0, 1, \ldots, p$, of the locally weighted least squares
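regression problem. Let $\mathbf{X}$ be the $n \times (p + 1)$ matrix
$$\mathbf{X} = \begin{pmatrix} 1 & (x_1 - x_0) & \cdots & (x_1 - x_0)^p \\ \vdots & \vdots & & \vdots \\ 1 & (x_n - x_0) & \cdots & (x_n - x_0)^p \end{pmatrix} \tag{12}$$
and the vectors $\mathbf{y} = (Y_1, Y_2, \ldots, Y_n)^T$ and $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)^T$. Finally, denote by $\mathbf{W}$ the $n \times n$ diagonal matrix of weights $\mathbf{W} = \mathrm{diag}\{K_h(x_i - x_0)\}$. Then, the solution is
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{y}. \tag{13}$$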
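The closed-form solution in equation (13) translates almost directly into code. The following is a minimal sketch in Python with NumPy, not the Matlab implementation used in this work; the function names `quartic_kernel` and `local_poly_fit` are hypothetical, and the kernel is the quartic kernel adopted in this study.

```python
from math import factorial

import numpy as np


def quartic_kernel(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def local_poly_fit(x, y, x0, h, p=3):
    """Return estimates of m(x0), m'(x0), ..., m^(p)(x0) from a local fit of order p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x - x0
    w = quartic_kernel(d / h) / h                 # K_h(x_i - x0)
    X = np.vander(d, N=p + 1, increasing=True)    # columns 1, d, ..., d^p as in eq. (12)
    XtW = X.T * w                                 # X^T W without building the n x n matrix
    beta = np.linalg.solve(XtW @ X, XtW @ y)      # eq. (13)
    # j! * beta_j estimates the j-th derivative at x0
    return np.array([factorial(j) * beta[j] for j in range(p + 1)])


# Sanity check on an exact cubic, which a local cubic fit should reproduce
x = np.linspace(0.0, 1.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 4.0 * x**3
est = local_poly_fit(x, y, x0=0.5, h=0.3, p=3)
print(est)  # should be close to m = 3.25, m' = 8, m'' = 18, m''' = 24
```

The normal equations are solved directly to stay close to equation (13). If fewer than p + 1 design points fall inside the bandwidth the system is singular, so in practice the bandwidth has to be large enough for the local design.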
The selection of $K$ does not influence the results much. We opted for using the quartic kernel
$$K(u) = \begin{cases} \dfrac{15}{16}\left(1 - u^2\right)^2 & \text{if } |u| \le 1 \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$
Other kernel functions that are widely used are:
$$K_1(u) = (2\pi)^{-1/2} \exp\left(-u^2/2\right) \quad \text{(Gaussian kernel)}$$
$$K_2(u) = \frac{3}{4}\left(1 - u^2\right) I(|u| \le 1) \quad \text{(Epanechnikov kernel).}$$
Bandwidth selection
An adequate selection of the bandwidth parameter $h$ is crucial for a good estimation of the curve of interest. If the value chosen for $h$ is too small, the bias of the estimate will also be small, but at the cost of increasing the variance; that is, we would be estimating a strongly oscillating curve. If, on the other hand, the value chosen for $h$ is too large, the variance will be small but the bias will be too big; in other words, we would be oversmoothing the curve. Therefore, the aim is to find a trade-off between bias and variance. Different criteria exist for adequate bandwidth selection. The most popular are based on known techniques such as cross-validation, bootstrap and plug-in. Criteria based on the idea of cross-validation were very popular some years ago and are still used today. However, the methods currently recommended are bootstrap and plug-in methods; among them, we opted for a plug-in type selector to solve our problem, because of its lower computational cost. In addition, another decision to make for bandwidth selection in this context is whether to use a global bandwidth or a local (variable) bandwidth. In the latter case a different value of the bandwidth parameter is used in the estimation of the curve at each point of interest $x_0$, that is, the bandwidth is fitted to the estimation point, whereas with a global criterion the same bandwidth value is used for all points. As a general criterion, if the curve to be estimated oscillates strongly, or the design data $\{x_i\}_{1 \le i \le n}$ are far from being equally spaced, a local bandwidth is preferred.
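The trade-off described above can be made concrete because a local polynomial estimate is a linear combination of the observations, $\hat{m}^{(j)}(x_0) = \sum_i l_i Y_i$. Under the fixed-design assumptions stated earlier (independent, zero-mean errors with common variance $\sigma^2$), its bias is $\sum_i l_i m(x_i) - m^{(j)}(x_0)$ and its variance is $\sigma^2 \sum_i l_i^2$. The Python/NumPy sketch below evaluates both for a hypothetical smooth two-slope test function; the test function, its parameters, the noise level and the helper names are assumptions made purely for illustration.

```python
from math import factorial

import numpy as np


def quartic_kernel(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def weights(x, x0, h, j=2, p=3):
    """Vector ell such that the local estimate of m^(j)(x0) equals ell @ y."""
    d = np.asarray(x, dtype=float) - x0
    w = quartic_kernel(d / h) / h
    X = np.vander(d, N=p + 1, increasing=True)
    XtW = X.T * w
    # Row j of (X^T W X)^{-1} X^T W, scaled by j!
    return factorial(j) * np.linalg.solve(XtW @ X, XtW)[j]


# Assumed test function: slope A well below C0, slope A + B well above it
A, B, C0, S = 1.0, -1.0, 1.0, 0.1
SIGMA = 0.01                                   # assumed noise standard deviation
x = np.linspace(0.0, 2.0, 201)                 # equally spaced fixed design
m = A * x + B * S * np.logaddexp(0.0, (x - C0) / S)
M2_TRUE = B / (4.0 * S)                        # exact second derivative at C0


def bias_and_variance(h):
    """Exact finite-sample bias and variance of the estimate of m''(C0)."""
    ell = weights(x, C0, h)
    return ell @ m - M2_TRUE, SIGMA**2 * (ell @ ell)


for h in (0.1, 0.2, 0.4):
    bias, var = bias_and_variance(h)
    print(f"h = {h:.1f}   bias^2 = {bias**2:.3e}   variance = {var:.3e}")
```

For this assumed setup the variance falls and the squared bias grows as h increases, which is the behavior a bandwidth selector has to balance.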
We opted for using a local bandwidth because, although it has a higher computational cost, the results of the estimation at each of the points are much better. Nonetheless, a local bandwidth has the drawback that the estimate of the function of interest will be rougher than with a global bandwidth, which can make the overall view of the regression function, and hence any comparisons, slightly more difficult. The basic idea of the bandwidth selectors consists in choosing the bandwidth that minimizes a measure of the estimation error. Thus, when a local bandwidth is wanted for the estimation at $x_0$ of the $j$th derivative $m^{(j)}$, the idea is to minimize the mean squared error (MSE)
$$\mathrm{MSE}(h) = E\left[\left(\hat{m}^{(j)}(x_0) - m^{(j)}(x_0)\right)^2\right] = \mathrm{Bias}^2\left(\hat{m}^{(j)}(x_0)\right) + \mathrm{Var}\left(\hat{m}^{(j)}(x_0)\right). \tag{15}$$
If we wish to work with a global bandwidth, we can opt for minimizing the integrated squared error (ISE), if a random bandwidth is desired,
$$\mathrm{ISE}(h) = \int \left(\hat{m}^{(j)}(x) - m^{(j)}(x)\right)^2 dx \tag{16}$$
or the mean integrated squared error (MISE), if a deterministic bandwidth is preferred,
$$\mathrm{MISE}(h) = E\left[\int \left(\hat{m}^{(j)}(x) - m^{(j)}(x)\right)^2 dx\right]. \tag{17}$$
Unfortunately, a completely satisfactory finite-sample solution of the bandwidth choice problem is not possible. This is why asymptotic considerations are introduced, even at the cost of limited validity when working with small sample sizes. The local bandwidth that minimizes the asymptotic mean squared error (AMSE) [20] is given by
$$h_{opt}^{j}(x_0) = C_{j,p} \left[ \frac{\sigma_\varepsilon^2}{n \left(m^{(p+1)}(x_0)\right)^2 f(x_0)} \right]^{1/(2p+3)} \tag{18}$$
where $\sigma_\varepsilon^2$ is the variance of the errors $\varepsilon_i$, $f$ is the design density function and $C_{j,p}$ is a constant that depends on the kernel function $K$ used. For the quartic kernel used here, these constants are listed in the following table for the values of $(j, p)$ of most interest to us, where $j$ is the order of the derivative to be estimated and $p$ is the order of the local polynomial fitting:

j    p    C_{j,p}
0    1    2.036
0    3    3.633
1    2    2.586
2    3    3.208
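Equation (18) and the tabulated constants can be combined in a few lines. The sketch below, in Python, is only an illustration: the function name `amse_bandwidth` and the input values are hypothetical placeholders, and in a real application the error variance, the derivative $m^{(p+1)}(x_0)$ and the design density are unknown and have to be supplied by preliminary estimates.

```python
# Constants C_{j,p} for the quartic kernel, copied from the table in the text
C_QUARTIC = {(0, 1): 2.036, (0, 3): 3.633, (1, 2): 2.586, (2, 3): 3.208}


def amse_bandwidth(j, p, sigma2, n, deriv_p1, f_x0):
    """Asymptotically optimal local bandwidth of equation (18).

    sigma2   : error variance
    n        : number of design points
    deriv_p1 : value of the (p+1)-th derivative of m at x0 (must be nonzero)
    f_x0     : design density at x0 (must be positive)
    """
    ratio = sigma2 / (n * deriv_p1**2 * f_x0)
    return C_QUARTIC[(j, p)] * ratio ** (1.0 / (2 * p + 3))


# Placeholder inputs chosen so the arithmetic is easy to follow (not physical values)
h_1 = amse_bandwidth(2, 3, sigma2=1.0, n=1, deriv_p1=1.0, f_x0=1.0)
h_512 = amse_bandwidth(2, 3, sigma2=1.0, n=512, deriv_p1=1.0, f_x0=1.0)
print(h_1, h_512)
```

With $p = 3$ the exponent is $1/9$, so multiplying the sample size by $512 = 2^9$ halves the bandwidth. This slow decay illustrates why second-derivative estimates call for comparatively wide local neighborhoods.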
The asymptotically optimal local bandwidth previously defined presents the obvious problem that it is not
calculable, since it depends on unknown quantities, among them derivatives of the actual regression curve. The solution that the plug-in method proposes for this difficulty is to replace the unknown quantities with preliminary estimates of them, obtained by either parametric or nonparametric methods; in the latter case, we are obliged to use a different bandwidth, the pilot bandwidth, for the preliminary estimates. For the estimation of the design density function $f$, the Parzen-Rosenblatt [21] kernel estimator is used,
$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x_i - x). \tag{19}$$
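A minimal Python/NumPy sketch of the estimator in equation (19) follows, using the quartic kernel. The design points, the pilot bandwidth value and the function names are hypothetical and serve only to show the computation.

```python
import numpy as np


def quartic_kernel(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def design_density(x_eval, x_design, h):
    """Parzen-Rosenblatt estimate f_hat(x) = (1/n) sum_i K_h(x_i - x), eq. (19)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    x_design = np.asarray(x_design, dtype=float)
    u = (x_design[None, :] - x_eval[:, None]) / h    # shape (n_eval, n_design)
    return quartic_kernel(u).mean(axis=1) / h


# Hypothetical, unequally spaced design (e.g. concentrations chosen by the experimenter)
x_design = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0, 16.0])
grid = np.linspace(-5.0, 25.0, 3001)
f_hat = design_density(grid, x_design, h=3.0)

# A density estimate should integrate to approximately one
area = f_hat.sum() * (grid[1] - grid[0])
print(area)
```

The estimate is larger where the design points are dense, which through equation (18) leads to smaller local bandwidths in those regions.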
3 Results and discussion
The problem of estimating the derivatives of the regression curve has been resolved by implementing the local polynomial regression estimator in Matlab, using the quartic kernel and the plug-in optimal AMSE bandwidth. The calculation of $h_{opt}^{j}(x_0)$ was achieved by estimating $\sigma_\varepsilon^2$ using a local polynomial regression with a pilot bandwidth; the Parzen-Rosenblatt kernel estimator for $f(x_0)$ was also obtained by using a pilot bandwidth, and a parametric regression and a local polynomial regression were used for $m^{(p+1)}(x_0)$. To avoid the problem of choosing an initial value $h_0$, an iterative approach was taken, starting with a large $h_0$. This initial pilot bandwidth produces another bandwidth, which in turn produces another, and so on. The iteration is continued to convergence. The reason for using a local bandwidth, and therefore applying the AMSE criterion to obtain the bandwidth parameter, is that our objective is not so much to have an overall view of the regression function as to locate the point that satisfies a particular condition (the cmc point). Nevertheless, in the Figures presented (1 to 4) we opted for including not only the graphs obtained using local bandwidths (Figs. (a)), but also those corresponding to global bandwidths (Figs. (b)). The reason for this choice is to provide a better overview of the function we want to maximize, while avoiding the small variations that are inherent to local bandwidths. Lastly, regarding the order of the local polynomial fitting, we opted for order $p = 3$ fittings for the estimation of the second derivative of the regression function, although for the estimation of $m(x_{cmc})$ we have used linear fittings. Table 1 shows some results obtained by application of the LPRM and by other methods (OM).
In general, good concordance exists between the cmc values obtained with both methods, with only minor discrepancies observed when the variation of the physical property with concentration is slight. Figures 1–4 show the application of the method to determine the cmc. The solid line represents c times (c = const.) the absolute value of the second derivative, the arrows mark the cmc values and the open squares correspond to experimental values of the measured physical property. Figures (a) correspond to the plots obtained by applying the local bandwidth and Figures (b) to those obtained with the global bandwidth.

Fig. 1. Specific conductivity of sodium n-dodecylsulphate in water as a function of molar concentration at 298.15 K and 200 kHz: (□) experimental points from reference [7]. The solid line represents the absolute value of the second derivative obtained with the LPRM: (a) local bandwidth, (b) global bandwidth. The arrows indicate the critical micelle concentrations obtained.

In Figures 1a and 1b the system sodium n-dodecylsulphate in water at 298.15 K and 200 kHz is shown. The relative maximum at approximately 4.18 (Fig. 1a), which seems to be a spurious effect due to a small irregularity (or nonlinearity) in the data, no longer exists when the global bandwidth is used (Fig. 1b), confirming that this point should not be taken into consideration. This system shows a clear maximum at 8.20 mmol L⁻¹, which is comparable with the value in the literature (Tab. 1). The maximum for n-dodecyltrimethylammonium bromide obtained by LPRM, applied to specific conductivity data in water at 298.15 K and 200 kHz, is 15.2 mmol L⁻¹ (Figs. 2a and 2b), in very good concordance with
Table 1. Critical micelle concentrations (cmc) obtained by the local polynomial regression method (LPRM) and other methods (OM) for the systems: (ClPM) Chlorpromazine hydrochloride in 0.05 NaCl aqueous solution at 303.15 K; (DDBACl) Decyldimethylbenzylammonium chloride in water at 308.15 K; (DePCl) Decylpyridinium chloride in water at 298.15 K; (DeTAB) Decyltrimethylammonium bromide in water at 283.15 K; (DPCl) Dodecylpyridinium chloride in water at 298.15 K; (DTAB) Dodecyltrimethylammonium bromide in water at 298.15 K and 200 kHz; (DTAB(a)) Dodecyltrimethylammonium bromide in aqueous urea solutions (5 mol kg⁻¹) at 298.15 K; (DTAB(b)) Dodecyltrimethylammonium bromide in aqueous urea solutions (7 mol kg⁻¹) at 298.15 K; (HTAB) Hexadecyltrimethylammonium bromide in water at 328.15 K; (IM) Imipramine hydrochloride in water at 313.15 K; (PMTZ) Promethazine hydrochloride in water at 303.15 K; (OPCl) Octylpyridinium chloride in water at 298.15 K; (SDS) Sodium dodecylsulphate in water at 298.15 K; (SDS(a)) Sodium dodecylsulphate in water at 298.15 K and 200 kHz; (SHS) Sodium 1-hexylsulphate in water at 308.15 K; (TTAB) Tetradecyltrimethylammonium bromide in water at 298.15 K.

System    CMC (by LPRM)     CMC (by OM)       Method                                                                   References
ClPM      0.018 mol kg⁻¹    0.017 mol kg⁻¹    Osmotic coefficients                                                     [22]
DDBACl    0.036 mol kg⁻¹    0.038 mol kg⁻¹    Specific conductivity                                                    [23]
DePCl     0.057 mol kg⁻¹    0.073 mol kg⁻¹    Averaged value from conductivity, heat capacity and volume measurements  [24]
DePCl     0.057 mol kg⁻¹    0.057 mol kg⁻¹    Enthalpies of dilution                                                   [24]
DeTAB     0.054 mol kg⁻¹    0.052 mol kg⁻¹    Enthalpies of dilution                                                   [25]
DPCl      0.017 mol kg⁻¹    0.018 mol kg⁻¹    Averaged value from conductivity, heat capacity and volume measurements  [24]
DPCl      0.017 mol kg⁻¹    0.016 mol kg⁻¹    Enthalpies of dilution                                                   [25]
DTAB      15.2 mmol L⁻¹     15.5 mmol L⁻¹     Specific conductivity                                                    [7]
DTAB(a)   0.0266 mol kg⁻¹   0.0276 mol kg⁻¹   Specific conductivity                                                    [26]
DTAB(b)   0.0369 mol kg⁻¹   0.0329 mol kg⁻¹   Enthalpies of dilution                                                   [27]
HTAB      0.0011 mol kg⁻¹   0.0011 mol kg⁻¹   Enthalpies of dilution                                                   [25]
IM        0.054 mol kg⁻¹    0.053 mol kg⁻¹    Osmotic coefficients                                                     [28]
PMTZ      0.077 mol kg⁻¹    0.055 mol kg⁻¹    Osmotic coefficients                                                     [22]
OPCl      0.21 mol kg⁻¹     0.30 mol kg⁻¹     Averaged value from conductivity, heat capacity and volume measurements  [24]
OPCl      0.21 mol kg⁻¹     0.23 mol kg⁻¹     Enthalpies of dilution                                                   [24]
SDS       7.81 mmol L⁻¹     7.50 mmol L⁻¹     Diffusion coefficients                                                   [29]
SDS(a)    8.20 mmol L⁻¹     8.22 mmol L⁻¹     Specific conductivity                                                    [7]
SHS       0.47 mol kg⁻¹     0.55 mol kg⁻¹     Apparent molar volumes                                                   [30]
TTAB      0.0024 mol kg⁻¹   0.0029 mol kg⁻¹   Enthalpies of dilution                                                   [25]
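To put the agreement in Table 1 on a common scale, the relative difference between the two values can be computed row by row. The short Python sketch below does so for a few rows copied from the table; the selection of rows and the definition of the relative difference (LPRM minus OM, divided by OM) are choices made here for illustration only.

```python
# (system, cmc by LPRM, cmc by OM); each row keeps the units used in Table 1
rows = [
    ("SDS(a)", 8.20, 8.22),
    ("DTAB", 15.2, 15.5),
    ("HTAB", 0.0011, 0.0011),
    ("DePCl (OM = 0.073)", 0.057, 0.073),
    ("PMTZ", 0.077, 0.055),
]


def relative_difference(lprm, om):
    """Relative difference in percent, taking the OM value as reference."""
    return 100.0 * (lprm - om) / om


rel = {name: relative_difference(lprm, om) for name, lprm, om in rows}
for name, value in rel.items():
    print(f"{name:20s} {value:+6.1f} %")
```

For these rows the differences range from about 0.2% in magnitude for SDS(a) to about 40% for PMTZ.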
Fig. 2. Specific conductivity of n-dodecyltrimethylammonium bromide in water as a function of molar concentration at 298.15 K and 200 kHz: (□) experimental points from reference [7]. The solid line represents the absolute value of the second derivative obtained with the LPRM: (a) local bandwidth, (b) global bandwidth. The arrows indicate the critical micelle concentrations obtained.
Fig. 3. Osmotic coefficients of promethazine hydrochloride in water as a function of molal concentration at 303.15 K: (□) experimental points from reference [22]. The solid line represents the absolute value of the second derivative obtained with the LPRM: (a) local bandwidth, (b) global bandwidth. The arrows indicate the critical micelle concentrations obtained.
the value reported in the literature (Tab. 1). Figure 3 corresponds to the system promethazine hydrochloride in water at 303.15 K, and the experimental points represent osmotic coefficient values. Contrary to what occurred in Figures 1a and 1b, here the relative maximum appearing at around 0.03 mol kg⁻¹ does seem to have significance. The osmotic coefficient versus concentration curve is sigmoidal, and this relative maximum marks the point at which the preceding points depart from linearity. Figure 4 corresponds to the specific conductivity of decylpyridinium chloride in water at 298.15 K. In this case, the global bandwidth was not used, since the graph obtained would be of no use given that the data points are far from equidistant above a concentration of ≈0.2 mol l⁻¹.
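The overall procedure, estimating the second derivative on a grid of concentrations and taking the location of its largest absolute value as the cmc, can be sketched on synthetic data. The Python/NumPy code below is a simplified illustration and not the Matlab program used for the results above: it assumes a fixed global bandwidth chosen by hand instead of the plug-in selector, noise-free data generated from a hypothetical two-slope curve with a known transition at 8.2, and hypothetical function names.

```python
import numpy as np


def quartic_kernel(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def second_derivative(x, y, x0, h, p=3):
    """Local polynomial estimate of m''(x0) with a fit of order p (p >= 2)."""
    d = x - x0
    w = quartic_kernel(d / h) / h
    X = np.vander(d, N=p + 1, increasing=True)
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return 2.0 * beta[2]                      # j! * beta_j with j = 2


def locate_cmc(x, y, grid, h):
    """Grid point where |m''| is largest, together with the estimated curve."""
    d2 = np.array([second_derivative(x, y, x0, h) for x0 in grid])
    return grid[np.argmax(np.abs(d2))], d2


# Synthetic "conductivity" curve (assumed): slope 60 below c0 = 8.2, slope 25 above
c0, s = 8.2, 0.5
x = np.linspace(0.0, 20.0, 81)
y = 60.0 * x - 35.0 * s * np.logaddexp(0.0, (x - c0) / s)

# Evaluate away from the ends so every window lies inside the data range
grid = np.linspace(3.0, 17.0, 281)
cmc_hat, d2 = locate_cmc(x, y, grid, h=2.0)
print(cmc_hat)   # expected to fall close to the assumed transition at 8.2
```

With noisy measurements the hand-picked bandwidth would matter much more, which is the role of the local plug-in bandwidth described in the theoretical section.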
As a final comment, we stress that the large differences between the existing methods for the determination of the cmc can be due, on the one hand, to the use of methods that are too rigid and, on the other, to the fact that experimental or random errors were not considered. We present a method that is far from rigid and extremely flexible, which is exactly what is needed when the cmc is not clearly defined as a transition point; moreover, because it is a statistical method, the treatment of random errors is incorporated. The proposed methodology provides an adequate tool for exploratory data analysis that serves as a preliminary study of the data. This approach can provide much information, for example in cases when the cmc does not appear as a single point but rather as a whole area
Fig. 4. Specific conductivity of decylpyridinium chloride in water as a function of molal concentration at 298.15 K: (□) experimental points from reference [24]. The solid line represents the absolute value of the second derivative obtained with the LPRM and local bandwidth. The arrow indicates the critical micelle concentration obtained.
identifiable as an interval between two extremes of the function we are working with. This does not exclude the later application of another, more rigid methodology based on a parametric model, whose adequacy can be judged in the light of the preceding exploratory overview.

This research was funded in part by the Spanish Ministry of Science and Technology (Projects BFM2002-00265 and MAT2002-00608, European FEDER support included) and by Xunta de Galicia (Projects PGIDIT03PXIC10505PN and PGIDIT03PXIC20615PN). The authors thank Ricardo Cao Abad, Mario Francisco Fernández and Duarte Santamarina Rios for their helpful suggestions.
References
1. R.J. Hunter, Foundations of Colloidal Science (Oxford University Press, Oxford 2001)
2. M.L. Corrin, J. Colloid Sci. 3, 333 (1948)
3. R.J. Williams, J.N. Phillips, K.J. Mysels, Trans. Faraday Soc. 51, 728 (1955)
4. J.N. Phillips, Trans. Faraday Soc. 51, 561 (1955)
5. D.G. Hall, B.A. Pethica, Nonionic Surfactants (Schick, M.J. Ed, Dekker, New York, 1967)
6. J.N. Israelachvili, D.J. Mitchell, B.W. Ninham, J. Chem. Soc., Faraday Trans. II 72, 1525 (1967)
7. M. Pérez-Rodríguez, G. Prieto, C. Rega, L.M. Varela, F. Sarmiento, V. Mosquera, Langmuir 14, 4422 (1998)
8. G.F. Simmons, J.S. Robertson, Differential Equations with Applications and Historical Notes (McGraw-Hill, New York, 1991)
9. P.R. Bevington, Data reduction and error analysis for the physical sciences (McGraw-Hill, New York, 1969)
10. J.M. Ruso, P. Taboada, D. Attwood, V. Mosquera, F. Sarmiento, Phys. Chem. Chem. Phys. 2, 1261 (2000)
11. J.M. Ruso, F. Sarmiento, Colloid Polym. Sci. 278, 800 (2000)
12. L. Besada, P. Martínez-Landeira, L. Seoane, G. Prieto, F. Sarmiento, J.M. Ruso, Mol. Phys. 24, 2003 (2001)
13. N.R. Draper, H. Smith, Applied regression analysis (Wiley-Interscience, New York, 1998)
14. H.-G. Müller, Nonparametric regression analysis of longitudinal data (Springer-Verlag, Berlin, 1988)
15. M. Francisco Fernández, La regresión polinómica local en diseño fijo con observaciones dependientes, Doctoral Thesis (Universidad de Santiago de Compostela, 2001)
16. J. Fan, I. Gijbels, Local polynomial modelling and its applications (Chapman and Hall, New York, 1996)
17. E.A. Nadaraya, Theory Probab. Appl. 15, 134 (1964)
18. G.S. Watson, Sankhya Ser. A 26, 359 (1964)
19. R.R. Macauley, The smoothing of time series (National Bureau of Economic Research, New York, 1931)
20. J. Fan, I. Gijbels, T.-C. Hu, L.-S. Huang, Statistica Sinica 6, 113 (1996)
21. B.W. Silverman, Density estimation for statistics and data analysis (Chapman and Hall, New York, 1986)
22. J.M. Ruso, D. Attwood, P. Taboada, M.J. Suarez, F. Sarmiento, V. Mosquera, J. Chem. Eng. Data 44, 941 (1999)
23. A. González-Pérez, J.L. del Castillo, J. Czapkiewicz, J.R. Rodríguez, J. Chem. Eng. Data 46, 709 (2001)
24. S. Causi, R. de Lisi, S. Milioto, J. Sol. Chem. 20, 1031 (1991)
25. E.M. Woolley, M.T. Bashford, D.G. Leaist, J. Sol. Chem. 31, 607 (2002)
26. S. Causi, R. de Lisi, S. Milioto, N. Tirone, J. Phys. Chem. 95, 5664 (1991)
27. E. Caponetti, S. Causi, R. de Lisi, M.A. Floriano, S. Milioto, R. Triolo, J. Phys. Chem. 96, 4950 (1992)
28. P. Taboada, D. Attwood, J.M. Ruso, M.J. Suarez, F. Sarmiento, V. Mosquera, J. Chem. Eng. Data 44, 820 (1999)
29. A. Siderius, S.K. Kehl, D.G. Leaist, J. Sol. Chem. 31, 607 (2002)
30. M.J. Suárez, J.L. López-Fontán, V. Mosquera, F. Sarmiento, J. Chem. Eng. Data 44, 1192 (1999)