Computational Cost of the Fekete Problem II: On Smale's 7th Problem

E. Bendito, A. Carmona, A.M. Encinas, J.M. Gesto
Departament de Matemàtica Aplicada III, Universitat Politècnica de Catalunya, Barcelona, Spain.

A. Gómez, C. Mouriño, M.T. Sánchez
Fundación Centro Tecnológico de Supercomputación de Galicia (CESGA), Santiago de Compostela, Spain.

e-mail:
[email protected]
Abstract

We present here the plausibility of a probabilistic positive solution to Smale's 7th problem. In particular, we show numerical and statistical evidence that a local minimum of the logarithmic potential energy on the 2-sphere satisfying Smale's condition can be identified with a computational cost of approximately O(N^15).
1 Introduction
Towards the end of the 20th century, V.I. Arnold, on behalf of the International Mathematical Union, and inspired in part by Hilbert's famous list (1900), invited a number of mathematicians to describe some great problems for the next century. In 1997, on the occasion of Arnold's 60th birthday, S. Smale presented at the Fields Institute the list Mathematical problems for the next century, see [21], as his response to this invitation. According to Smale, the 18 problems on his list were chosen with the following criteria: simplicity of statement, personal acquaintance, and the belief that the question, its solution, partial results or even attempts at its solution are likely to have great importance for mathematics and its development in the 21st century.

Recall that the N-tuples ω_N = {x_1, ..., x_N}, x_i ∈ ℝ^d, that minimize under general constraints a potential energy functional I_N involving the relative Euclidean distances between N points are called the Nth order Fekete points. We call the Fekete problem that of determining these N-tuples. This framework includes potential energy functionals of the form

    I_N(x) = Σ_{1≤i<j≤N} K(x_i, x_j),

where K is, for instance, the logarithmic kernel K(x_i, x_j) = −log |x_i − x_j| or a Riesz kernel K(x_i, x_j) = |x_i − x_j|^{−s}, s > 0. If ω_N denotes the set of Fekete points associated to the logarithmic potential energy I_N and to the 2-sphere, S, the statement of Smale's 7th problem is: Can one find x ∈ S^N such that

    I_N(x) − I_N(ω_N) ≤ c log N,   c a universal constant?        (1)
More precisely, Smale asks for a real number algorithm which on input N produces as output distinct x_1, ..., x_N on the 2-sphere satisfying Condition (1), with a halting time polynomial in N. This problem emerged from Complexity Theory. In the "Bézout series" [16, 17, 18, 19, 20], Shub and Smale analyze the computational complexity of the problem of finding the roots of a polynomial system. In this framework, they study the case in which f(η) = 0 is solved by following the roots of f_t, 0 ≤ t ≤ 1, where f_t = tf + (1 − t)g and g is a certain universal polynomial system whose zeros are assumed to be known. A partition of [0, 1] into m parts is made, with t_0 = 0, t_j = t_{j−1} + Δt, Δt = 1/m, and from each t_j Newton's projective method is used to obtain approximate zeros of f_{t_{j+1}} in the homotopy algorithm. These authors show that the speed of this algorithm depends on the conditioning of the homotopy f_t. With regard to the starting polynomials g, Shub and Smale consider in [18] the case of a polynomial equation in a single variable. Under these conditions, they prove that the polynomials whose roots are obtained via stereographic projection from N-point sets x ∈ S^N, N = deg f, which satisfy Condition (1) are well-conditioned, and hence they are good choices for the starting polynomials g.
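The path-following idea described above can be sketched for a single univariate polynomial. This is only a naive illustration of the homotopy step, not the Shub–Smale algorithm itself; the function names and the example polynomials are ours.

```python
# Naive sketch of homotopy continuation for one univariate polynomial:
# follow a known root of g along f_t = t*f + (1-t)*g, refining with
# Newton's method at each step t_j = j/m.
import numpy as np

def newton_poly(coeffs, z, iters=20):
    """Newton's method for a polynomial with coefficients `coeffs` (high to low)."""
    dcoeffs = np.polyder(coeffs)
    for _ in range(iters):
        fz, dfz = np.polyval(coeffs, z), np.polyval(dcoeffs, z)
        if dfz == 0:
            break
        z = z - fz / dfz
    return z

def follow_root(f, g, z0, m=100):
    """Track the root z0 of g along f_t = t*f + (1-t)*g, with Δt = 1/m."""
    z, dt = z0, 1.0 / m
    for j in range(1, m + 1):
        t = j * dt
        ft = t * np.asarray(f, float) + (1 - t) * np.asarray(g, float)
        z = newton_poly(ft, z)  # warm start from the previous approximate zero
    return z

# Illustrative example: deform g(z) = z^2 - 1 (known root z = 1)
# into f(z) = z^2 - 4, whose nearby root is z = 2.
root = follow_root([1, 0, -4], [1, 0, -1], z0=1.0)
```

In the Shub–Smale setting the polynomials are systems, Newton's method is projective, and the efficiency depends on the conditioning of f_t along the path, as noted above.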
1.1 Brief state of the art
It is widely assumed that the number of different local minima of the logarithmic energy on the 2-sphere, and for other variants of the Fekete problem, grows exponentially with N. Many authors have claimed this and have suggested, explicitly or implicitly, the impossibility of finding the global minimum in polynomial time by inspection of local minima, see for instance [1, 6, 8, 9, 13, 18], among others. This is one of the main reasons why this problem deserves a place on Smale's list of 21st century problems. A variety of algorithms have been proposed for the explicit construction of configurations close to optimal on the 2-sphere for large N. A popular and simple technique is that of constructing triangular lattices by icosahedral dissection. These configurations do not have
a very good performance with respect to energy criteria, due to the fact that they are not asymptotically uniformly distributed on the sphere; the projection from the icosahedron to the sphere increases the areas of some triangles more than others. Thus, some further optimization process is usually necessary to obtain suitable configurations. In any case, the symmetries of these lattices considerably reduce the minimization computational effort. These strategies do not seem to have provided significant advances regarding the specific requirements of Smale's 7th problem. In [21], after establishing the 7th problem, Smale cites [14, 15], where the so-called generalized spiral points, developed by Rakhmanov, Saff and Zhou, were presented. The spiral points are a simple construction motivated by hexagonal tilings and numerical experimentation. These points can be constructed explicitly with a computational cost O(N), but they do not correspond to the minimization of an energy functional. Rakhmanov, Saff and Zhou provide in [14] numerical evidence that the spiral points support Condition (1) for N ≤ 12000 with c = 114. This result was based on theoretical estimates of I_N(ω_N). The numerical results obtained by Zhou in his Ph.D. thesis, see [23], suggest that the spiral points for N ≤ 12000 support Condition (1) with c = 5/2. However, as is shown by Rakhmanov, Saff and Zhou in [14], Fig. 2, the difference I_N(x) − I_N(ω_N) for the spiral points tends to grow linearly with N, and hence these points cannot support Smale's condition with N growing indefinitely for any value of c. As far as we know, there have been no other attempts to propose either a complete or a partial solution to Smale's 7th problem.
1.2 Background
In [4] we analyzed the computational cost of the Forces Method in the case of the logarithmic energy on the 2-sphere S. After generating the initial position x^0 of the N particles on S and fixing the magnitude of the coefficient a^* = 0.545√N, the Forces Method can be applied recursively according to the scheme

    x̂^{k+1} = x^k + a^* min_{1≤i<j≤N} {|x_i^k − x_j^k|} w^k,   x_i^{k+1} = x̂_i^{k+1} / |x̂_i^{k+1}|,

where w^k is the advance direction built from the tangential components of the forces acting on the particles; see [4] for the precise definitions. The descent process is stopped when the maximum disequilibrium degree falls below a prescribed threshold ε > 0. We call this the ε-convergence. We introduce the concept of approximate local minimum, which is analogous to the concept of approximate zero of a system of polynomial equations proposed by Shub and
Smale, see for instance [22]. A configuration x is an approximate local minimum if it is close to a local minimum in such a way that Newton's algorithm converges quadratically to this local minimum from x. We call the last maximum in a convergence curve the non-return point, and we say that the Forces Method has nr-converged when the non-return point has been attained. We use the term approximate local minimum both in the above sense and to denote the configurations x that the Forces Method provides after the non-return point is attained.
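The descent scheme above can be sketched as follows. This is a minimal illustration under our own assumptions: we take the advance vector of particle i to be the tangential component of the logarithmic force normalized by the force magnitude, and its norm as the disequilibrium degree; the authors' exact definitions are those of [4].

```python
# Sketch of a Forces-Method-style descent on S^2 for the logarithmic energy.
# The definitions of the advance vectors w_i and of the disequilibrium
# degree are assumptions of this sketch (see [4] for the exact ones).
import numpy as np

def log_energy(x):
    """I_N(x) = -sum_{i<j} log |x_i - x_j| for an (N,3) array x."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)
    return -np.sum(np.log(d[iu]))

def forces_method(x, a, eps=1e-6, max_steps=2000):
    """Relax an (N,3) array of unit vectors on the 2-sphere."""
    for _ in range(max_steps):
        diff = x[:, None, :] - x[None, :, :]                # x_i - x_j
        d2 = np.sum(diff ** 2, axis=-1)
        np.fill_diagonal(d2, np.inf)
        F = np.sum(diff / d2[:, :, None], axis=1)           # logarithmic forces
        T = F - np.sum(F * x, axis=1, keepdims=True) * x    # tangential part
        w = T / np.linalg.norm(F, axis=1, keepdims=True)    # advance vectors
        if np.linalg.norm(w, axis=1).max() < eps:           # ε-convergence
            break
        xh = x + a * np.sqrt(d2.min()) * w                  # advance step
        x = xh / np.linalg.norm(xh, axis=1, keepdims=True)  # project back to S^2
    return x
```

Starting from a random configuration, each step moves every particle along its tangential advance vector by a fraction of the current minimum interparticle distance, and the step length shrinks automatically as the configuration approaches equilibrium.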
1.3 A numerical-statistical approach
We say that x ∈ S^N is a c-configuration if it satisfies Condition (1) for a given c. If a c-configuration x is also a local minimum or an approximate local minimum of I_N on S, we say that x is a c-minimum. In this work we deal with the following probabilistic version of Smale's 7th problem: Can one find a c-configuration with an arbitrarily high probability of success in polynomial average calculation time? The notions of average cost and probability are fundamental to the work developed by Shub and Smale in the series [16, 17, 18, 19, 20] and, in general, to the framework established by these authors for the complexity analysis of general algorithms. Also, in [2] Beltrán and Pardo use these concepts to provide a probabilistic positive solution to Smale's 17th problem, whose statement includes the premise "on the average". This is entirely natural if we take into account the high non-linearity of the problems considered, and that all these authors analyze the case in which some numerical strategy is used to tackle them. A possible strategy to obtain a c-configuration for a given N consists of generating different random starting positions on S and applying the Forces Method to each one of them to obtain different approximate local minima of I_N. If we increase the number of random starting configurations, n_sp, we also increase the probability of finding a c-minimum. Following this strategy, a descent algorithm satisfies the requirements of Smale's 7th problem on average and with an arbitrarily high probability of success if the following facts hold: first, the average calculation time to obtain an approximate local minimum from a random starting configuration is polynomial in N; second, the probability of finding a c-minimum from a random starting configuration decreases only polynomially with N. In fact, the second condition depends more on the intrinsic character of the Fekete problem itself than on the properties of the descent algorithm.
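The role of the second condition can be illustrated with a short calculation. The values of B, p and delta below are purely hypothetical, not estimates from this paper: the point is only that a polynomially decreasing single-run success probability still yields a polynomial number of restarts.

```python
# If a single run finds a c-minimum with probability q = B * N**(-p),
# then n_sp independent restarts fail with probability (1 - q)**n_sp,
# so n_sp = ceil(log(delta) / log(1 - q)) restarts give success
# probability 1 - delta. B, p, delta are illustrative values (q < 1 assumed).
import math

def restarts_needed(N, B=1.0, p=3.0, delta=1e-3):
    q = B * N ** (-p)  # hypothetical single-restart success probability
    return math.ceil(math.log(delta) / math.log(1.0 - q))
```

For fixed delta, the number of restarts grows like N^p log(1/delta), which is polynomial in N; combined with a polynomial cost per restart, the whole strategy stays polynomial.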
We have shown in [4] that the Forces Method allows us to obtain an approximate local minimum with an average computational cost bounded by N^3, independently of the procedure used to generate the random starting positions. Thus, a probabilistic positive solution to Smale's 7th problem can be proposed if, when N grows,

    P[I − I_N(ω_N) ≤ c log N] ≃ B N^{−p}        (2)

for some B, p > 0 that can depend on c but not on N, where the random variable I is the energy of the local minimum identified in a run of the Forces Method from a random starting configuration. The exact value of I remains unknown after the run, because in practice the Forces Method stops when the ε-convergence for a fixed ε is achieved, and the energy of the final configuration has an error depending essentially on ε. Taking this into account, we use the same symbol I both for the exact energy value and for the value associated to any ε small enough. In this context, the energy of a local minimum can be seen as a random variable whose support is an interval with the unknown I_N(ω_N) as lower limit. Taking into account that

    I_N(ω_N) = −(1/4) log(4/e) N² − (1/4) N log N + O(N),        (3)

see for instance [21], we define the random variable U = I + (1/4) log(4/e) N² + (1/4) N log N. For simplicity of notation, we do not use indices in the definitions of U, I.

For a generic random variable Z with probability density function f_Z and probability distribution function F_Z, we call M_Z^k = E[Z^k], k ∈ ℕ, the kth order moment of Z, µ_Z = M_Z^1 the mean of Z, (M_Z^k)' = E[(Z − µ_Z)^k] the kth order centered moment of Z, and σ_Z = √((M_Z^2)') the standard deviation of Z. Moreover, we call z_i, i = 1, ..., n_sp, the sample data obtained in an experiment for Z, m_z^k = (1/n_sp) Σ_{i=1}^{n_sp} z_i^k the sample moments of {z_i}, z̄ = m_z^1 the sample mean of {z_i}, (m_z^k)' = (1/n_sp) Σ_{i=1}^{n_sp} (z_i − z̄)^k the sample centered moments of {z_i}, and S_Z = √((m_z^2)') the sample standard deviation of {z_i}.

We consider that the enormous growth with N of the number of local minima of the Fekete problem essentially corresponds to the fact that the probability distribution of their energy values tends rapidly to become continuous, and that this does not necessarily imply that the probability of finding a c-minimum (or even the global minimum) decreases exponentially with N.
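The definitions above translate directly into code. This is a minimal sketch; u_from_energy implements the change of variable defining U from (3), and sample_moments returns the signed kth roots reported later in Tables 2 and 3.

```python
# Sample statistics as defined in the text: sample mean, signed k-th roots
# of the centered moments (m_z^k)', and their standardized counterparts.
import numpy as np

def u_from_energy(I, N):
    """U = I + (1/4) log(4/e) N^2 + (1/4) N log N, cf. (3)."""
    return I + 0.25 * np.log(4.0 / np.e) * N ** 2 + 0.25 * N * np.log(N)

def sample_moments(z, kmax=10):
    z = np.asarray(z, float)
    zbar = z.mean()                                   # sample mean
    centered = [np.mean((z - zbar) ** k) for k in range(2, kmax + 1)]
    S = np.sqrt(centered[0])                          # sample standard deviation
    # signed k-th roots, the quantities tabulated for U and V
    roots = [np.sign(c) * abs(c) ** (1.0 / k)
             for k, c in zip(range(2, kmax + 1), centered)]
    standardized = [r / S for r in roots]
    return zbar, roots, standardized
```

The signed root convention keeps the sign of the odd centered moments visible while putting all orders on a comparable scale.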
The probabilistic version of Smale's 7th problem asks for the evolution of the probability that the difference I − I_N(ω_N) belongs to the interval [0, c log N] as a function of N. If all the minima had the same energy value, then this probability would be one, independently of the number of minima. Although this is not the case in the Fekete problem, there still exist many ways in which the energies of the different minima can be distributed such that (2) holds, even when the number of minima grows exponentially. Thus, the key question is not how many minima there are but how their energy values are distributed. We have found no references to the evolution with N of the probability distribution of the energy values of the Fekete problem minima. The Forces Method has been used to perform a massive computation program with the objective of producing significant statistical information about this distribution, in order to extract conclusions about Smale's 7th problem in a numerical-statistical framework. We have obtained about 6·10^7 sample data, which have been used to study the evolution with N of the moments and the shape of the probability distribution functions of the energy values, as well as their histograms. Our approach consists of designing analytical models according to different hypotheses about the asymptotic decrease of P[I − I_N(ω_N) ≤ c log N], and using them to fit the sample information.
The estimation of I_N(ω_N) is crucial for these purposes. There are few theoretical results on this topic. Rakhmanov, Saff and Zhou showed in [14, 23] that if the O(N) term in (3) is written as BN + O(log N), then −0.1127688 ≤ B ≤ −0.0234972. Unfortunately, as we show below, the standard deviation of I is much smaller than the range given by these bounds, so they are not sharp enough in this context. In [23] Zhou proposed the expression O(N) = −0.026422N + 0.13822, obtained from numerical results with N up to 200. We also show in the following sections that N = 200 is not big enough to consider that the asymptotic behavior of I_N(ω_N) has been reached. On the other hand, if the answer to Smale's 7th problem were negative, it would be virtually impossible to estimate I_N(ω_N) numerically for moderately large N; but even if the answer were positive with, say, p = 20, it would still be too expensive to obtain an accurate estimate of I_N(ω_N) for N in the hundreds by numerical procedures with the tools available today. Thus, any procedure for the approximation of the probability distribution of the energy of the different minima should include the indirect approximation of I_N(ω_N). The estimation of the support of a random variable by means of statistical methods is a subtle question. In general, the application of the maximum likelihood method implies that the best and worst sample energy values can be used as estimators of the sample support of I, but this is not satisfactory in this case. We base the fit of the models on the sample moments and on the shape of the probability distribution. For this reason, it is of paramount importance to have an estimate of the error in the sample moments. In particular, the large size of the obtained sample helps us at this point.
An approximation strategy based exclusively on the moments is too ambiguous regarding Smale's 7th problem, because infinitely many theoretical probability distributions can be defined in such a way that they produce good fits of the sample moments while having different supports. We explore the behavior of the support of different models designed not only with the objective of fitting the moments but also with the aim of allowing us to impose different specific hypotheses about the asymptotic form of the decrease of P[I − I_N(ω_N) ≤ c log N]. There are still infinitely many models that allow us to fit the sample information under the same hypothesis about this asymptotic form. However, they define a relatively short range of variation for I_N(ω_N), which in particular indicates a strong relation between the asymptotic cost of the Fekete problem and the lower limit of the support of the random variable I. The rest of this paper is organized as follows: we start by describing the obtained sample and the statistical postprocessing of the data, including the analysis of the sample moments, the sample supports and the sample probability distributions. Then we introduce the procedure used to design probability density functions according to different hypotheses about the computational cost of the Fekete point problem. After that, we use different models to fit the sample information. Finally, we summarize the main results obtained and propose the plausibility of a probabilistic positive solution to Smale's 7th problem.
2 Analysis of the sample information
In this section we present and analyze the sample information obtained for the study of Smale's 7th problem. We start by describing the sample; then we study the sample moments, the sample support, and the shape of the sample probability distribution and histogram of U.
2.1 The sample: Clonetroop and FinisTerrae challenge
The amount of data required to obtain significant information about the random variable U in the context of Smale's 7th problem is much higher than that necessary to develop the analysis of the cost of a local minimum carried out in [4], and it cannot be obtained in reasonable time with conventional resources. We have used two singular computing infrastructures: the cluster Clonetroop of the research group Lacan, from the Department of Applied Mathematics III of UPC, Barcelona, Spain, and the supercomputer FinisTerrae of CESGA, located in Santiago de Compostela, Spain. In total, we consumed about 100000 hours of computation in Clonetroop and approximately 350000 hours of computation in FinisTerrae. For the computations in Clonetroop, and according to its availability, we used from 20 to 35 CPUs working simultaneously, which represents more than four months of real computation time during the last year. This provided almost 9·10^6 data, which were essential to study the computational cost of a local minimum and the convergence of the Forces Method, and to characterize the main features of the variable U and formulate our numerical-statistical approach to Smale's 7th problem. The 350000 hours of computation in FinisTerrae were consumed in about two weeks of real time at the end of February, 2008. The number of CPUs working simultaneously went from 256 to a maximum of around 1600. This time was spent carrying out studies on the cost and the robustness of the Forces Method for large N (190000 hours), and obtaining abundant sample information about the random variable U (160000 hours). Table 1 shows all the obtained sample data. For each N, we indicate the machine on which the calculations were performed, how many runs of the Forces Method were carried out, and the value of ε for which the runs were stopped according to the ε-convergence criterion. The criteria under which the different tests were run depended on the objectives we wished to attain in each case.
The 120000 runs for N = 87 and the 60000 runs for N = 200 are marked with an asterisk because they were performed for the first experiments on the coefficient a and correspond to different values of this parameter. The rest of the data correspond to computations with a = a^*. The run for N = 10^6 was stopped at the step n_step = 3000, and not according to the ε-convergence criterion. Moreover, except for 60000 of the runs for N = 87 and the run corresponding to N = 10^6, all the runs started from uniform starting positions as defined in [4]. The runs in the column ε = 10^{−8} performed in Clonetroop were used to characterize the computational cost of finding a local minimum.
[Table 1, flattened during extraction: for each N, from 87 up to 10^6, it lists the machine (Clonetroop or FinisTerrae), the number of runs of the Forces Method in each stopping-threshold column (ε between 5·10^{−7} and 10^{−10}), and the subtotals per machine. Totals: 8,640,082 runs in Clonetroop, 50,726,244 in FinisTerrae, and 59,366,326 overall.]

Table 1: The data.

The 100000 runs in the column ε = 10^{−10} for N = 500, 800 had the objective of evaluating the effect of the parameter ε used to define the ε-convergence on the moments of U. The runs for N = 10000, 20000, 50000 were performed in FinisTerrae to check the validity of our estimations of the cost of a local minimum for large N. The rest of the data were obtained with the main objective of approximating the moments, supports and probability distributions of U. In addition to the data summarized in Table 1, other minor tests were carried out on conventional PCs.
2.2 Sample moments
Table 2 contains the values of the sample mean ū and the signed kth roots of the kth order centered sample moments, ±|(m_u^k)'|^{1/k}, k = 2, ..., 10, where the sign is that of (m_u^k)', of the variable U for all the considered N. The standardized variable V = (U − µ_U)/σ_U plays an important role in the sequel. Table 3 shows the sample values ±|(m_u^k)'|^{1/k}/S_U, k = 3, ..., 10, which will be used to fit the probability distribution of V.
N     | mean      | centered moments (signed kth roots), k = 2, ..., 10
      |           | 2     | 3      | 4     | 5      | 6     | 7      | 8     | 9      | 10
87    | -2.150    | 0.004 | 0.006  | 0.008 | 0.011  | 0.013 | 0.016  | 0.018 | 0.021  | 0.023
200   | -5.117    | 0.014 | 0.014  | 0.020 | 0.023  | 0.027 | 0.030  | 0.033 | 0.036  | 0.039
300   | -7.747    | 0.018 | 0.017  | 0.025 | 0.028  | 0.033 | 0.036  | 0.040 | 0.043  | 0.047
400   | -10.395   | 0.023 | 0.020  | 0.032 | 0.034  | 0.040 | 0.044  | 0.049 | 0.053  | 0.056
500   | -13.051   | 0.029 | 0.023  | 0.039 | 0.040  | 0.049 | 0.052  | 0.058 | 0.062  | 0.067
600   | -15.712   | 0.034 | 0.026  | 0.045 | 0.046  | 0.056 | 0.060  | 0.067 | 0.072  | 0.077
700   | -18.379   | 0.038 | 0.029  | 0.052 | 0.052  | 0.064 | 0.068  | 0.076 | 0.081  | 0.087
800   | -21.050   | 0.043 | 0.033  | 0.058 | 0.058  | 0.071 | 0.076  | 0.084 | 0.090  | 0.097
900   | -23.725   | 0.048 | 0.036  | 0.064 | 0.064  | 0.079 | 0.084  | 0.093 | 0.099  | 0.106
1000  | -26.404   | 0.053 | 0.039  | 0.070 | 0.070  | 0.087 | 0.092  | 0.102 | 0.109  | 0.117
1500  | -39.833   | 0.076 | 0.052  | 0.100 | 0.096  | 0.122 | 0.125  | 0.141 | 0.147  | 0.158
2000  | -53.298   | 0.101 | 0.073  | 0.134 | 0.130  | 0.161 | 0.166  | 0.185 | 0.192  | 0.205
2500  | -66.784   | 0.128 | 0.088  | 0.170 | 0.160  | 0.204 | 0.207  | 0.233 | 0.240  | 0.258
3000  | -80.280   | 0.155 | 0.105  | 0.202 | 0.186  | 0.240 | 0.239  | 0.272 | 0.277  | 0.300
4000  | -107.278  | 0.212 | 0.140  | 0.277 | 0.245  | 0.327 | 0.311  | 0.368 | 0.357  | 0.402
5000  | -134.303  | 0.268 | 0.172  | 0.352 | 0.314  | 0.416 | 0.404  | 0.469 | 0.469  | 0.514
10000 | -269.458  | 0.501 | 0.212  | 0.651 | 0.455  | 0.760 | 0.632  | 0.844 | 0.759  | 0.911
20000 | -539.678  | 0.853 | 0.538  | 1.123 | 0.959  | 1.337 | 1.235  | 1.505 | 1.438  | 1.636
50000 | -1351.086 | 1.896 | -1.420 | 2.264 | -2.100 | 2.485 | -2.422 | 2.632 | -2.610 | 2.737
Table 2: Sample mean and signed kth roots of the kth order sample centered moments of U for k = 2, ..., 10, computed from all the available data.

These values have been computed from the total amount of available data for each N. The moments for N ≥ 1500 have lower reliability, because they come from a small number of data. The moments for N = 50000 have been obtained from only 10 data, which explains why even the sign of the odd moments is wrong. The moments for N = 87, 200 come from runs in which each starting position is used more than once with different values of the coefficient a, so they have less significance than the rest.

We can identify essentially two sources of error in the sample moments displayed in Tables 2 and 3. There is a first error due to the finiteness of the sample, but it is also necessary to take into account that all the runs were stopped when the maximum disequilibrium degree reached a certain prescribed threshold value ε; that is, when the ε-convergence was attained. For each N, the value of ε was chosen by considering the information provided in [4], Fig. 12, in such a way that the nr-convergence average cost curve was just below the corresponding ε-convergence average cost curve. This can be checked by observing the values of ε that were used for each N in Table 1. Figure 12 in [4] shows only average information and, in particular, the procedure used to fix ε in each case does not prevent the possibility that some runs were stopped before attaining the final linear tendency. Thus, before studying the evolution with N of the centered and standardized moments of U, it is necessary to estimate the magnitude of both errors.

2.2.1 Error due to the ε-convergence versus nr-convergence effect
We describe here the results of a numerical experiment in which n_sp = 100000 runs of the Forces Method for each N = 500, 800 were performed up to ε-convergence with ε = 10^{−10}. In each run, we saved the energy value and the step number at which the convergence curve
N     | standardized moments (signed kth roots), k = 3, ..., 10
      | 3      | 4     | 5      | 6     | 7      | 8     | 9      | 10
87    | 1.369  | 1.962 | 2.559  | 3.173 | 3.778  | 4.360 | 4.911  | 5.423
200   | 0.979  | 1.438 | 1.630  | 1.900 | 2.122  | 2.345 | 2.550  | 2.743
300   | 0.928  | 1.409 | 1.555  | 1.820 | 2.009  | 2.213 | 2.397  | 2.579
400   | 0.883  | 1.380 | 1.493  | 1.756 | 1.922  | 2.116 | 2.282  | 2.448
500   | 0.799  | 1.354 | 1.398  | 1.688 | 1.814  | 2.010 | 2.156  | 2.316
600   | 0.766  | 1.353 | 1.368  | 1.680 | 1.790  | 1.997 | 2.135  | 2.300
700   | 0.773  | 1.350 | 1.370  | 1.675 | 1.784  | 1.985 | 2.119  | 2.276
800   | 0.772  | 1.346 | 1.365  | 1.665 | 1.774  | 1.970 | 2.103  | 2.259
900   | 0.754  | 1.342 | 1.345  | 1.652 | 1.750  | 1.946 | 2.068  | 2.217
1000  | 0.748  | 1.340 | 1.337  | 1.648 | 1.742  | 1.941 | 2.064  | 2.218
1500  | 0.688  | 1.322 | 1.264  | 1.603 | 1.651  | 1.858 | 1.941  | 2.083
2000  | 0.725  | 1.331 | 1.293  | 1.606 | 1.653  | 1.839 | 1.911  | 2.036
2500  | 0.689  | 1.325 | 1.250  | 1.594 | 1.612  | 1.821 | 1.871  | 2.013
3000  | 0.676  | 1.304 | 1.203  | 1.551 | 1.542  | 1.760 | 1.787  | 1.939
4000  | 0.657  | 1.302 | 1.154  | 1.538 | 1.465  | 1.732 | 1.682  | 1.892
5000  | 0.641  | 1.311 | 1.171  | 1.551 | 1.507  | 1.748 | 1.748  | 1.915
10000 | 0.423  | 1.297 | 0.908  | 1.516 | 1.260  | 1.684 | 1.514  | 1.816
20000 | 0.631  | 1.317 | 1.125  | 1.568 | 1.448  | 1.765 | 1.686  | 1.919
50000 | -0.749 | 1.194 | -1.108 | 1.311 | -1.278 | 1.388 | -1.377 | 1.444
Table 3: Signed kth roots of the kth order sample standardized moments of U for k = 3, ..., 10, computed from all the available data.

reached the thresholds ε_1 = 5·10^{−2}, ε_2 = 2·10^{−2}, ε_3 = 1·10^{−2}, ..., ε_27 = 1·10^{−10}. Moreover, we used an auxiliary vector v for each convergence curve. At the beginning, v_i = 0 for each i = 1, ..., 27. If the curve reaches ε_i and v_i = 0, then we take v_i = 1. If the curve reaches ε_i and v_i = 1, then we take v_i = 2. After the experiment, we count for each ε_i the number of convergence curves for which v_i = 2. The quotient between this number and the total number of convergence curves is an estimate of the fraction of runs that are not in linear tendency when the ε_i-convergence is reached. Figure 1 shows the evolution with ε of this estimate for N = 500 (left) and for N = 800 (right). The vertical lines mark the values of ε for which N⊥ = 500 and N⊥ = 800, respectively (the meaning of N⊥ was introduced in [4]). As can be seen, for both values of N, the fraction not in linear tendency given by the vertical line is approximately the same (about 0.1). Figure 2 (left) shows the evolution with ε of the signed roots of the sample centered moments (top) up to order 10, and of the signed roots of the standardized centered moments (bottom) up to order 10, of U for N = 500. For each ε, the sample moments are computed from 100000 different data. The same figure (right) displays the evolution with ε of the relative error (with sign) of the signed roots of the sample centered moments, and the relative error (in absolute value) of the signed roots of the sample standardized moments. In both cases, the relative errors have been computed by taking as a reference the values corresponding to ε = 10^{−10}.
The sign of the relative error of the signed roots of the sample centered moments for a given ε is positive when the moments corresponding to ε are greater than the reference ones. Figure 3 shows the same information for the case N = 800. The vertical lines have the same meaning as in Figure 1.
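The counting procedure with the auxiliary vector v can be sketched as follows, identifying v_i = 2 with a curve that crosses the threshold ε_i downward more than once (i.e., a curve that was not yet in its final linear tendency there). The function name is ours.

```python
# Estimate, for each threshold eps_i, the fraction of convergence curves
# (sequences of maximum disequilibrium degrees per step) that re-cross
# eps_i after first falling below it, i.e. curves with v_i = 2.
import numpy as np

def fraction_not_linear(curves, thresholds):
    counts = np.zeros(len(thresholds))
    for curve in curves:
        curve = np.asarray(curve, float)
        for i, eps in enumerate(thresholds):
            below = curve < eps
            # downward crossings of the threshold eps
            crossings = np.sum(~below[:-1] & below[1:])
            if crossings > 1:   # v_i = 2: the curve fell below eps more than once
                counts[i] += 1
    return counts / len(curves)
```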
Figure 1: Fraction of runs that are not in linear tendency when the ε-convergence is reached.

Let us consider for a given N the value ε_N for which the corresponding ε_N-convergence average cost curve and the nr-convergence average cost curve intersect at N⊥ = N. Figures 1, 2 and 3 show an interesting fact: while for a given N there is a probability of around 10% that a convergence curve has not attained the linear tendency when the ε_N-convergence is reached, the roots of the sample standardized moments have an error smaller than 1% for ε ≤ ε_N. On the other hand, the roots of the sample centered moments have an error of around 2−3% for ε = ε_N. Nevertheless, the first ten moments have approximately the same relative error, which explains the small error in the standardized moments. These results suggest that the concepts of non-return point and nr-convergence go beyond the characterization of approximate local minima, and also characterize the random variable V. Figure 4, which corresponds to the case N = 500, confirms this fact. In this figure, we display the sample probability distribution of the standardized energy descent of the configurations that are not in linear tendency when the convergence curve reaches the ε-convergence for different ε. The energy descent is computed as the difference between the standardized energy when the maximum disequilibrium degree reaches the threshold ε and the standardized energy value for w_max = 10^{−10}. The probability distribution of this descent stabilizes precisely when ε ≃ 10^{−6} ≃ ε_500. This study justifies the values of ε that were chosen to stop the descent process in the other tests, shown in Table 1. These tests were designed to minimize the computational effort while keeping the significance of the sample information with regard to the variable U.

2.2.2 Error due to the finiteness of the sample
The most important source of error in the sample moments is the finiteness of the data. The size of the obtained sample allows us to estimate empirically the magnitude of these errors in each moment. In particular, we have obtained about 3 · 107 data for N = 500 and around 107 data for N = 600.
Figure 2: Evolution with ε of the signed roots of the sample centered and standardized moments and their relative errors for N = 500.

Figures 5 and 6 show the evolution with n_sp of the signed roots of the sample centered moments up to order 30 (the moments up to order 10 correspond to the thick lines). As can be observed, the reliability of the sample moments decreases as the order of the moment grows. In particular, the high order moments suffer strong variations even when the number of runs goes from 10^6 to 10^7. This is due to the appearance of bad minima with low probabilities. The effect of these bad minima diminishes as the order of the moment decreases, but it induces an oscillation in all the moments. Table 4 shows the absolute value of these oscillations as a fraction of the final value of the signed roots of the sample moments. More specifically, we have considered the final sections of the curves shown in Figures 5 and 6 defined by n_sp ≥ 10^5, n_sp ≥ 5·10^5 and n_sp ≥ 10^6. Table 4 shows the maximum difference between the values taken by the signed roots of the moments along each of these sections, divided by the value corresponding to the last point of each section. These relative oscillations have been computed both for centered and standardized moments up to order 100. Table 4 clearly indicates that even 10^6 data are not enough to provide reliable estimates of the high order moments, for instance those of order greater than 10. On the other hand, after analyzing this information we work under the assumption that the sample values for N = 300, 400, ..., 1000 displayed in Tables 2 and 3 have a typical error of about 1% when they refer to moments of order smaller than 6 or 7, while the values corresponding to moments of order from 6 or 7 up to 10 have a typical error of about 1% to 2%. Moreover, the main cause of the error in the higher order moments is the low probability of the worst minima. As n_sp grows, more of these minima appear, so asymptotically the higher order sample moments should tend to increase.
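These quantities are straightforward to compute from a list of local-minimum energies. A minimal sketch in Python (the function names are ours, for illustration only):

```python
import math

def signed_root(m, k):
    """Signed kth root: sign(m) * |m|**(1/k), so that odd-order
    moments keep their sign when plotted on a linear scale."""
    return math.copysign(abs(m) ** (1.0 / k), m)

def sample_moments(energies, k_max=10):
    """Sample mean, centered moments m_k and standardized moments
    m_k / s**k for k = 2, ..., k_max."""
    n = len(energies)
    mean = sum(energies) / n
    centered = {k: sum((e - mean) ** k for e in energies) / n
                for k in range(2, k_max + 1)}
    s = math.sqrt(centered[2])
    standardized = {k: centered[k] / s ** k for k in range(2, k_max + 1)}
    return mean, centered, standardized

# toy data: a symmetric sample has vanishing odd centered moments
mean, cent, std = sample_moments([-2.0, -1.0, 0.0, 1.0, 2.0])
```

On real data one would feed in the n_sp energies obtained from the runs and then take `signed_root` of each moment, as in Figures 5 and 6.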
Figure 3: Evolution with ε of the signed roots of the sample centered and standardized moments and their relative errors for N = 800.

2.2.3 Evolution of the sample moments with N

Figure 7 (left) shows the evolution with N of the obtained values for the sample mean ū and for ±|(m_u^k)'|^{1/k}, k = 2, ..., 10, N = 300, 400, ..., 1000, as they appear in Table 2. Note the different scaling in the vertical axis for positive and negative values. The figure also includes the straight lines obtained by linear regression from these data. The table on the right shows the linear regression parameters, A_k, B_k, R², for each straight line. Index 1 is for the mean, and the remaining values are for centered moments.
It seems clear that the expressions µ_U ≃ A_1 N + B_1, (M_U^k)' ≃ (A_k N + B_k)^k, k = 2, ..., 10,

Figure 4: Sample probability distribution of the standardized energy descent of the configurations that are not in linear tendency when the ε-convergence is reached.
Figure 5: Evolution of the signed roots of the sample centered moments with the sample size for N = 500.

can be used as good approximations for the moments. Thus we can conclude in particular that

M_V^k ≃ ((A_k N + B_k)/(A_2 N + B_2))^k, k = 2, ..., 10.

Figure 8 (left) shows the same information as Figure 7 (left), but we have included the sample values obtained for all the studied N up to N = 10000. The same figure (right) displays the evolution with N of the sample standardized moments as well as the corresponding interpolation curves obtained from the regression straight lines (note the logarithmic scale in the horizontal axis). The big points have been used for the regression. The positions of the small points are still inaccurate, but they confirm the tendency given by the rest of the data. The values corresponding to N = 87 do not agree with the interpolation curves, because the corresponding probability distribution is still strongly discrete, see [4]. The sample information related to higher order moments, which is not shown here, clearly indicates that the behavior suggested by Figures 7 and 8 is general. Table 5 shows the relative error of the values given by the interpolation curves in Fig. 8 for N = 300, 400, ..., 1000 with respect to the corresponding sample values.

Figures 7 and 8 reveal some of the main features of the random variables U and V. In particular, they suggest the existence of an asymptotic standardized distribution whose kth moment is given by (A_k/A_2)^k, and such that the distribution of V tends to it when N → ∞. This implies the existence of some α < 0 such that for N → +∞

I_N(ω_N) ≃ −(1/4) log(4/e) N² − (1/4) N log N + µ_U + α σ_U.

Note that this expression agrees with Equation (3) if we take O(N) = µ_U + α σ_U = (−0.026656 + 0.000049108 α)N + (0.26882 + 0.0037473 α). Potential Theory guarantees the validity of Equation (3) when N → +∞, and Figures 7
Figure 6: Evolution of the signed roots of the sample centered moments with the sample size for N = 600.

and 8 indicate that the formula

µ_I ≃ −(1/4) log(4/e) N² − (1/4) N log N − 0.026656 N + 0.26882    (4)

is valid already for small N. Nevertheless, this formula must be handled with care. The coefficient R² of the linear regression of the sample mean is around 0.99999, but the slope of the straight line that interpolates the sample standard deviation of U is only about 5·10^−5. For instance, the sample mean of U obtained for N = 600 from around 10^7 data is −15.712, whereas the linear part of Equation (4) gives −15.725. The difference between both values represents approximately 40% of the standard deviation of U for N = 600. In particular, Equation (4) should not be used to estimate the standardized energy of a given local minimum. Rakhmanov, Saff and Zhou proved in [14, 23] that if O(N) = BN + O(log N) then B ∈ [−0.1127688, −0.0234972]. If we consider again, for example, the case N = 600, the length of the corresponding interval represents approximately 2150 times the standard deviation of U for N = 600. Thus, unfortunately, these theoretical bounds are not sharp enough in this context.
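For illustration, the O(N) part of this expression can be evaluated directly from the regression coefficients A_1, B_1 quoted above (a sketch; the function names are ours):

```python
import math

# regression coefficients for the mean of U (from the table in Figure 7)
A1, B1 = -0.026656, 0.26882

def mu_U(N):
    """Linear approximation of the mean of U, the O(N) term in Eq. (4)."""
    return A1 * N + B1

def mu_I(N):
    """Full Equation (4): estimated mean energy of a local minimum."""
    return -0.25 * math.log(4.0 / math.e) * N**2 - 0.25 * N * math.log(N) + mu_U(N)

# For N = 600 the linear part gives about -15.725, while the sample mean
# quoted in the text is -15.712: a gap of roughly 40% of the standard
# deviation of U for that N.
```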
2.3 Sample support
The logarithmic energy must attain a global minimum ω_N for each N, so the lower support of V must be bounded and α must take a finite real value. In fact, we can define α_N < 0 as the parameter that determines the lower support of V for a given N, and α = lim_{N→+∞} α_N. These arguments cannot be directly reproduced for the worst minimum ω̂_N of the problem. We have found no theoretical results ensuring that the standardized energy of the worst minimum of the Fekete problem is upper bounded. However, it seems natural to assume that such an upper bound exists. We work under the hypothesis that the upper support of V is finite, and
N = 500:

order    centered                        standardized
         10^5     5·10^5   10^6         10^5     5·10^5   10^6
2        0.002    0.002    0.002        -        -        -
3        0.011    0.007    0.005        0.009    0.006    0.004
4        0.004    0.003    0.003        0.003    0.002    0.002
5        0.011    0.009    0.006        0.010    0.008    0.005
6        0.009    0.008    0.005        0.008    0.007    0.004
7        0.016    0.014    0.009        0.015    0.013    0.008
8        0.019    0.017    0.011        0.018    0.016    0.010
9        0.027    0.024    0.017        0.026    0.023    0.015
10       0.034    0.030    0.022        0.033    0.029    0.021
15       0.085    0.078    0.066        0.084    0.077    0.065
20       0.130    0.122    0.110        0.129    0.121    0.109
25       0.160    0.153    0.142        0.160    0.152    0.141
50       0.211    0.209    0.199        0.211    0.207    0.198
100      0.232    0.228    0.217        0.231    0.227    0.216

N = 600:

order    centered                        standardized
         10^5     5·10^5   10^6         10^5     5·10^5   10^6
2        0.004    0.002    0.001        -        -        -
3        0.011    0.007    0.002        0.007    0.005    0.001
4        0.005    0.004    0.001        0.002    0.002    0.001
5        0.010    0.010    0.002        0.009    0.008    0.002
6        0.011    0.009    0.002        0.010    0.008    0.002
7        0.021    0.016    0.005        0.020    0.014    0.005
8        0.029    0.018    0.006        0.027    0.016    0.006
9        0.045    0.025    0.010        0.043    0.024    0.010
10       0.058    0.030    0.013        0.056    0.029    0.014
15       0.134    0.062    0.038        0.132    0.061    0.038
20       0.176    0.076    0.051        0.174    0.075    0.051
25       0.195    0.077    0.054        0.193    0.072    0.054
50       0.214    0.055    0.041        0.212    0.054    0.042
100      0.217    0.031    0.024        0.216    0.029    0.024

Table 4: Absolute value of the maximum oscillation of the signed roots of the sample centered and standardized moments along the sections n_sp ≥ 10^5, n_sp ≥ 5·10^5 and n_sp ≥ 10^6, divided by the corresponding final values.

         relative error (absolute value) of the kth root, order k
N        3        4        5        6        7        8        9        10
300      0.026    0.011    0.019    0.017    0.019    0.019    0.019    0.019
400      0.035    0.004    0.020    0.009    0.015    0.011    0.011    0.009
500      0.024    0.006    0.017    0.013    0.016    0.017    0.017    0.018
600      0.036    0.001    0.018    0.005    0.011    0.005    0.006    0.003
700      0.004    0.002    0.001    0.002    0.001    0.002    0.002    0.002
800      0.013    0.002    0.008    0.003    0.006    0.005    0.006    0.007
900      0.004    0.002    0.003    0.002    0.002    0.001    0.001    0.002
1000     0.008    0.003    0.005    0.004    0.005    0.005    0.005    0.006

Table 5: Relative error of the roots of the standardized moments given by the interpolation curves.

we denote it by Ω_N > 0 for each N (in Section 5 we make some additional comments about this hypothesis). Under this assumption, the stabilization of the standardized moments shown in the previous subsection implies that when N grows

I_N(ω̂_N) ≃ −(1/4) log(4/e) N² − (1/4) N log N + µ_U + Ω σ_U,

where Ω = lim_{N→+∞} Ω_N.
Table 6 contains the best and worst energy values found among the around 6·10^7 available data and their standardized values, which determine the sample support of V. Note that the quotient between the magnitudes of the sample upper and lower support for N = 87 is about 18.5. The starting configurations corresponding to the best and worst minima have been recovered and the Forces Method has been run again from each one of them up to ε-convergence with ε = 10^−10 to ensure that they are in linear tendency, so the digits displayed in Table 6 are correct.

k      A_k             B_k          R²
1      −0.026656       0.26882      0.99999
2      4.9108·10^−5    0.0037473    0.99919
3      3.2173·10^−5    0.0070638    0.99844
4      6.4080·10^−5    0.0065284    0.99959
5      6.0160·10^−5    0.010139     0.99987
6      7.6660·10^−5    0.010067     0.99986
7      7.8828·10^−5    0.012782     0.99996
8      8.8477·10^−5    0.013589     0.99989
9      9.2906·10^−5    0.015607     0.999854
10     9.9627·10^−5    0.016875     0.999734

Figure 7: Evolution with N of the sample mean ū and the kth roots of the sample centered moments, ±|(m_u^k)'|^{1/k}, with k = 2, ..., 10. The table lists the linear regression parameters A_k, B_k, R².

Figures 9, 10 and 11 show the geometry of the best minima found for N = 700, N = 800 and N = 1000, respectively. We have plotted the triangulation associated to the Dirichlet cells of these minima to appreciate the number of neighbors of each point. For each case we indicate the number of points with 5, 6 and 7 neighbors, which correspond to pentagons, hexagons and heptagons in the triangulation. Several authors have tried to exploit symmetries and other geometric properties to construct estimations of ω_N, see for instance [5, 9]. The strategies developed can produce good results in some cases, but it seems that the proposed procedures cannot work well for a general N. In a small experiment we have performed n_sp = 2000 runs of the Forces Method for N = 200 up to ε-convergence with ε = 10^−10. For each one of the achieved local minima, we have plotted the number of pentagons, hexagons and heptagons of their corresponding triangulations as a function of the standardized energy. The results of this experiment are shown in Figure 12. It can be observed that there exists a certain correlation between the quality of a local minimum and its geometrical properties (in particular, the number of hexagons). However, there are configurations apparently more "irregular" than others with better standardized energies. Different local minima can have the same number of geometrical "defects" (pentagons and heptagons) while the distribution of these defects on the sphere can be different. One might expect that the local minima whose defects are distributed according to the icosahedral symmetry are better than others. Nevertheless, Pérez-Garrido et al. have shown in [13] that for N in the thousands the lowest energy states are not necessarily those with the highest symmetry. In general, when N grows, the number of different local minima is so large and their geometric and energetic differences are so small that it seems really unlikely that a constructive algorithm for ω_N can exist.
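The pentagon and heptagon counts reported in Figures 9, 10 and 11 are internally consistent: by Euler's formula, any triangulation of the sphere whose vertices all have 5, 6 or 7 neighbors must contain exactly 12 more pentagons than heptagons. A quick check in Python:

```python
def defect_balance(pentagons, hexagons, heptagons):
    """Sum of (6 - degree) over all vertices; Euler's formula forces this
    to equal 12 for any triangulation of the sphere."""
    return pentagons - heptagons  # hexagons contribute 6 - 6 = 0

# counts (pentagons, hexagons, heptagons) for the best minima shown in
# Figures 9, 10 and 11 (N = 700, 800, 1000)
for p, h6, h7 in [(26, 660, 14), (26, 760, 14), (33, 946, 21)]:
    assert defect_balance(p, h6, h7) == 12
    assert p + h6 + h7 in (700, 800, 1000)  # every point is a vertex
```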
Figure 8: Evolution with N of the sample mean and the sample centered moments (left) and of the sample standardized moments (right).
Figure 9: Best minimum found for N = 700. It contains 26 pentagons, 660 hexagons and 14 heptagons.
2.4 Sample probability distribution
Figure 13 shows the sample probability distribution of V for N = 500 (top) and the corresponding histogram (bottom). For the histogram, we have taken 200 equal subdivisions of the sample support. The shape of the probability distribution is that of a typical "S" with a unimodal density function. Figure 14 shows the same information for N = 1000. In this case we have fewer data, and we have taken 100 equal subdivisions of the sample support for the histogram. In Section 4, we show histograms corresponding to other values of N. Figure 15 displays the sample probability distributions obtained from 5474 data for N = 3000 (left) and from 1000 data for N = 10000 (right). In spite of the reduced amount of data, the "S" shape can already be clearly distinguished.
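The empirical distributions and histograms of this subsection follow the standard construction: sort the standardized energies for the distribution, and split the sample support into equal subdivisions (200 for N = 500, 100 for N = 1000). A minimal sketch (the helper names are ours):

```python
def empirical_cdf(values):
    """Return sorted values and the empirical CDF evaluated at each of them."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def histogram(values, bins):
    """Counts over `bins` equal subdivisions of the sample support."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return counts

# toy standardized energies, for illustration only
xs, F = empirical_cdf([0.3, -1.2, 2.5, 0.0])
counts = histogram([0.0, 0.1, 0.9, 1.0], bins=2)
```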
                energy                                      standardized energy
N       best                  worst                 best      worst
87      −830.25191515325      −830.18858140172      −0.773    14.336
200     −4133.0030795277      −4132.8884518871      −1.844    6.212
300     −9127.2018541955      −9127.0080162775      −2.645    8.087
400     −16061.397188219      −16061.141180491      −3.523    7.581
500     −24933.375251999      −24933.056472492      −3.500    7.563
600     −35741.876446933      −35741.500017196      −3.942    7.262
700     −48486.008078991      −48485.638001950      −3.440    6.264
800     −63165.218386790      −63164.805050460      −3.463    6.184
900     −79779.028801265      −79778.604182025      −3.278    5.616
1000    −98327.118536945      −98326.612777478      −3.519    6.099
1500    −220073.09472107      −220072.54934944      −2.968    4.211
2000    −390148.38062467      −390147.71388079      −2.686    3.946
2500    −608542.13094267      −608541.25357932      −2.952    3.890
3000    −875247.86549775      −875246.81489872      −3.211    3.578
4000    −1553579.4563225      −1553578.0936013      −3.220    3.194
5000    −2425121.3219221      −2425119.6174501      −2.871    3.483
10000   −9680655.6796039      −9680652.8508501      −2.678    2.963
20000   −38679495.467850      −38679490.739999      −2.626    2.918
50000   −241570577.38557      −241570571.29003      −1.778    1.437

Table 6: Best and worst energies found for each studied N and their corresponding standardized values.

Figure 16 shows details of the left queues of the sample probability distributions for N = 300, 400, ..., 800. Note that the aspect of the sample probability distributions for N = 300 and N = 400 is still discrete for probabilities of order 10^−3 and 10^−4, respectively. For N = 600, however, the sample probability distribution is "smooth" even for probabilities of order 10^−5. If we recall the sample probability distributions for N = 87 and N = 200 (see [4]), we can conclude that the Fekete problem evolves from a single minimum for N = 2 to a practically continuous probability distribution of the energy of the different minima for N ≃ 600. In particular, the aspect of the sample probability distributions suggests that the sample support obtained for N = 87, 200, 300, 400 can be considered a good estimation of I_N(ω_N) in each case. Note, for instance, that some of the best minima obtained for N = 300, 400 appear several times. For bigger N, nevertheless, it seems clear that local minima should exist that are significantly better than the best ones found.
3 The theoretical model

The next natural step is to adjust the probability density function f_V of the standardized variable V by means of theoretical models that allow us to impose different hypotheses on the computational cost of the Fekete problem. The shape of the sample probability distribution of V suggests the following model, which is based on the composition of two queues and can

Figure 10: Best minimum found for N = 800. It contains 26 pentagons, 760 hexagons and 14 heptagons.

Figure 11: Best minimum found for N = 1000. It contains 33 pentagons, 946 hexagons and 21 heptagons.

be understood as a generalization of the Beta distribution. Let us consider a random variable Z supported in (0, 1) whose probability density function has the generic form

f_Z(x) = A^{−1} q_1(x) q_2(1 − x),    (5)

where A = ∫_0^1 q_1(x) q_2(1 − x) dx and the queues q_1, q_2 are positive functions verifying q_1, q_2 ∈ C((0, 1]). Therefore, if we denote by F_Z the probability distribution of Z, then P[Z ≤ x] = F_Z(x) = 1 for any x ≥ 1, whereas for x > 0 small enough

F_Z(x) = ∫_0^x f_Z(s) ds ≃ A^{−1} q_2(1) ∫_0^x q_1(s) ds = A^{−1} q_2(1) Q_1(x),

where Q_1(x) = ∫_0^x q_1(s) ds.

The standardized random variable W = (Z − µ_Z)/σ_Z is supported in [−µ_Z/σ_Z, (1 − µ_Z)/σ_Z], which
Figure 12: Number of pentagons, hexagons and heptagons as a function of the standardized energy of 2000 local minima for N = 200.
Figure 13: Sample probability distribution (top) and histogram (bottom) for N = 500.

implies that if F_W is the probability distribution of W, then for any x ∈ [−µ_Z/σ_Z, (1 − µ_Z)/σ_Z] we get that

F_W(x) = P[W ≤ x] = P[Z ≤ xσ_Z + µ_Z] = F_Z(xσ_Z + µ_Z) = ∫_0^{xσ_Z + µ_Z} f_Z(s) ds.

Our aim is to approximate the standardized energy V by the standardized random variable W, which implies that α_N = −µ_Z/σ_Z and Ω_N = (1 − µ_Z)/σ_Z, and hence σ_Z = (Ω_N − α_N)^{−1}. In this scenario, the queues q_1, q_2 must have more than one argument, because it is necessary to introduce some parameters into the model in order to adjust the sample moments. Therefore, if Θ ⊂ IR^k is the set of parameters, then q_1, q_2 : (0, 1] × Θ → IR^+ and q_1, q_2 ∈ C((0, 1] × Θ). As usual, if Λ ∈ C((0, 1] × Θ), for any x ∈ (0, 1] and any
Figure 14: Sample probability distribution (top) and histogram (bottom) for N = 1000.
Figure 15: Sample probability distribution for N = 3000 (left) and N = 10000 (right).

θ ∈ Θ, we denote by Λ_x ∈ C(Θ) and by Λ^θ ∈ C((0, 1]) the functions given by the equalities Λ_x(θ) = Λ^θ(x) = Λ(x, θ). In addition, we define the following set of functions

F = { Λ ∈ C((0, 1] × Θ) : Θ ⊂ IR^k for some k ∈ IN* and Λ^θ ∈ C((0, 1]) for each θ ∈ Θ }.    (6)
Given Λ_1, Λ_2 ∈ F, there exist k, m ∈ IN* and subsets Θ_1 ⊂ IR^k, Θ_2 ⊂ IR^m such that Λ_1 ∈ C((0, 1] × Θ_1) and Λ_2 ∈ C((0, 1] × Θ_2). We say that Λ_1 and Λ_2 are equivalent iff {Λ_1^θ : θ ∈ Θ_1} = {Λ_2^θ : θ ∈ Θ_2}. Clearly, this defines an equivalence relation on F. Given a value c > 0 and a function g : IR^+ → IR, and taking into account the identity I − I_N(ω_N) = U − µ_U − α_N σ_U, the probability of the event I − I_N(ω_N) ≤ cg(N) is given by

P[I − I_N(ω_N) ≤ cg(N)] = P[U − µ_U − α_N σ_U ≤ cg(N)] = F_V( cg(N)/σ_U + α_N ).    (7)
Figure 16: Detail of the left queues of the sample probability distribution for N = 300, 400, ..., 800.

We approximate this probability by

F_W( cg(N)/σ_U + α_N ) = ∫_0^{δ(N,c)} f_Z(x) dx,    (8)

where

δ(N, c) = ( cg(N)/σ_U + α_N ) σ_Z + µ_Z = cg(N) σ_Z/σ_U = cg(N)/((Ω_N − α_N) σ_U) = cg(N)/((Ω_N − α_N)(A_2 N + B_2)).
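For illustration, δ(N, c) can be evaluated numerically from the regression values A_2, B_2 of Section 2.2 by plugging in a value for Ω_N − α_N; below we borrow the N = 500 sample support from Table 6, a choice that is ours and only indicative:

```python
import math

A2, B2 = 4.9108e-5, 0.0037473          # regression for the standard deviation of U
omega_minus_alpha = 7.563 - (-3.500)   # sample support width for N = 500 (Table 6)

def delta(N, c, support=omega_minus_alpha):
    """delta(N, c) = c*g(N) / ((Omega_N - alpha_N)*(A2*N + B2)), with g = log."""
    return c * math.log(N) / (support * (A2 * N + B2))

# Since delta(N, c) behaves like log(N)/N for large N, it eventually
# decreases to 0 as N grows.
```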
Therefore, the value of P[I − I_N(ω_N) ≤ cg(N)] for big N depends on the behavior of δ(N, c) when N goes to infinity; that is, on the behavior of the function g(x)/x at infinity, since lim_{N→∞}(Ω_N − α_N) = Ω − α. Thus, if lim_{x→∞} g(x)/x = +∞, then lim_{N→∞} δ(N, c) = +∞, which in particular implies that δ(N, c) ≥ 1 for N big enough and hence

lim_{N→∞} P[I − I_N(ω_N) ≤ cg(N)] = 1,

whereas if lim_{x→∞} g(x)/x = a > 0, then lim_{N→∞} δ(N, c) = ca/(A_2(Ω − α)) and hence

lim_{N→∞} P[I − I_N(ω_N) ≤ cg(N)] = ∫_0^{ca/(A_2(Ω−α))} f_Z(x) dx,    (9)

which implies that for any c > 0 the probability of the event I − I_N(ω_N) ≤ cg(N) tends to be independent of N when N grows. For c small enough we have

lim_{N→∞} P[I − I_N(ω_N) ≤ cg(N)] ≃ A^{−1} q_2(1) Q_1( ca/(A_2(Ω − α)) ).    (10)

On the other hand, if lim_{x→+∞} g(x)/x = 0, then lim_{N→+∞} δ(N, c) = 0, and hence when N grows we obtain that for any c > 0

P[I − I_N(ω_N) ≤ cg(N)] ≃ A^{−1} q_2(1) Q_1(δ(N, c)).    (11)
We can see each run of the Forces Method from a random starting position as a Bernoulli experiment in which P = P[I − I_N(ω_N) ≤ cg(N)] represents the probability of success. Let n_sp be the number of different random starting positions necessary to obtain a configuration satisfying I − I_N(ω_N) ≤ cg(N) with a given probability of success R. It can be assumed that the runs are independent, so R = 1 − (1 − P)^{n_sp} and n_sp = log(1 − R)/log(1 − P). When P is small enough we can write n_sp ≃ (1/P) log(1/(1 − R)). Thus, the asymptotic behavior of the inverse of the probability of success in a run determines that of the computational cost of obtaining a configuration satisfying I − I_N(ω_N) ≤ cg(N) with arbitrarily high probability of success.

In Smale's 7th problem framework, we need to design theoretical models that allow us to adjust the sample information under different hypotheses about the way the probability of the event I − I_N(ω_N) ≤ c log N asymptotically decreases with N, so g(x) = log x. We are especially interested in designing models that are consistent with the hypothesis of polynomial asymptotic cost. Let us start by considering the queues

q_1(x, p) = x^{p−1},    (12)

q_1(x, p, q) = 1/(x^{q+1} e^{p x^{−q}}),    (13)

where p, q > 0. Their corresponding primitives are

Q_1(x, p) = x^p/p,    Q_1(x, p, q) = e^{−p x^{−q}}/(pq),

which give the asymptotic computational costs

1/P[I − I_N(ω_N) ≤ c log N] ≃ (Ap/q_2(1)) (δ(N, c))^{−p},    (14)

1/P[I − I_N(ω_N) ≤ c log N] ≃ (Apq/q_2(1)) e^{p(δ(N,c))^{−q}},    (15)

respectively.
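The run-count computation above is easy to make concrete; in the sketch below the success probability P is an arbitrary illustrative value, not a measured one:

```python
import math

def runs_needed(P, R):
    """Exact number of independent runs so that at least one success occurs
    with probability R, when each run succeeds with probability P."""
    return math.ceil(math.log(1.0 - R) / math.log(1.0 - P))

def runs_needed_approx(P, R):
    """Small-P approximation n_sp ~ (1/P) * log(1/(1 - R))."""
    return math.log(1.0 / (1.0 - R)) / P

P, R = 1e-4, 0.99          # illustrative values only
exact = runs_needed(P, R)
approx = runs_needed_approx(P, R)
```

For small P the two expressions agree to a fraction of a percent, which is why the asymptotic cost is governed by 1/P.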
Proposition 3.1 The asymptotic computational cost given by the queue (12) is polynomially bounded.

Proof.

lim_{N→+∞} (δ(N, c))^{−p}/N^p = lim_{N→+∞} ( (Ω_N − α_N)(A_2 N + B_2)/(cN log N) )^p = lim_{N→+∞} ( (Ω − α)A_2/(c log N) )^p = 0.
Proposition 3.2 The asymptotic computational cost given by the queue (13) is bounded by a function of the form e^{N^q̂}, q̂ > 0.

Proof. If q̂ > q, then

lim_{N→+∞} e^{p(δ(N,c))^{−q}}/e^{N^q̂} = lim_{N→+∞} e^{p[(δ(N,c))^{−q} − N^q̂/p]} = lim_{N→+∞} e^{pN^q [ ((Ω_N − α_N)(A_2 N + B_2)/(cN log N))^q − N^{q̂−q}/p ]} = lim_{N→+∞} e^{pN^q [ ((Ω − α)A_2/(c log N))^q − N^{q̂−q}/p ]} = 0.
We can construct another family of models by using the auxiliary function h(x) = (log x)/x. These models have the same asymptotic bounds for the computational cost as those corresponding to the queues (12), (13), but they allow us to introduce an additional parameter in a natural way. Figure 17 shows the graph of the function h. The points P_1 = (e, e^{−1}) and P_2 = (e^{3/2}, (3/2)e^{−3/2}) mark the maximum value of h and its inflexion point, respectively. The function h is positive and decreasing in (e, +∞) and we can define the inverse h^{−1} in the interval (0, 1/e], so the functions

Q_1(x, p, K_1) = (h^{−1}(K_1 x))^{−p}/(pK_1)    (16)

and

Q_1(x, p, q, K_1) = e^{−p(h^{−1}(K_1 x))^q}/(pqK_1),    (17)

where p, q > 0 and K_1 ∈ (0, 1/e], are positive and increasing in x ∈ (0, 1]. Their derivatives are the queues

q_1(x, p, K_1) = 1/( (K_1 x h^{−1}(K_1 x) − 1)(h^{−1}(K_1 x))^{p−1} )    (18)

Figure 17: The function h(x) = (log x)/x. The coordinates of the points P_1, P_2 are P_1 = (e, e^{−1}) and P_2 = (e^{3/2}, (3/2)e^{−3/2}), respectively.

and

q_1(x, p, q, K_1) = (h^{−1}(K_1 x))^{q+1}/( (K_1 x h^{−1}(K_1 x) − 1) e^{p(h^{−1}(K_1 x))^q} ),    (19)

respectively, where we have used that h'(x) = (1 − x h(x))/x².
Lemma 3.3 For each γ > 0, if r > 1 then lim_{N→+∞} h^{−1}(γδ(N, c))/N^r = 0.

Proof. If 0 < ρ < 1, then for x large enough 1/x ≤ h(x) ≤ 1/x^ρ, and hence for x > 0 small enough 1/x ≤ h^{−1}(x) ≤ 1/x^{1/ρ}. Thus, if γ > 0, for N large enough

(γδ(N, c))^{−1} ≤ h^{−1}(γδ(N, c)) ≤ (γδ(N, c))^{−1/ρ} = ( (Ω_N − α_N)(A_2 N + B_2)/(γc log N) )^{1/ρ}.

Let us now consider r > 1 and 1/r < ρ < 1. Then

lim_{N→+∞} h^{−1}(γδ(N, c))/N^r ≤ lim_{N→+∞} ( (Ω_N − α_N)(A_2 N + B_2)/(γc N^{rρ} log N) )^{1/ρ} = 0,

since rρ > 1.
Proposition 3.4 The asymptotic computational cost given by the queue (18) is polynomially bounded.

Proof. If p̂ > p and r = p̂/p, then r > 1 and

lim_{N→+∞} (h^{−1}(K_1 δ(N, c)))^p/N^p̂ = lim_{N→+∞} ( h^{−1}(K_1 δ(N, c))/N^r )^p = 0,

where we have applied Lemma 3.3.
Proposition 3.5 The asymptotic computational cost given by the queue (19) is bounded by a function of the form e^{N^q̂}, q̂ > 0.

Proof. If q̂ > q and r = q̂/q, then r > 1 and

lim_{N→+∞} e^{p(h^{−1}(K_1 δ(N,c)))^q}/e^{N^q̂} = lim_{N→+∞} e^{N^{rq} [ p(h^{−1}(K_1 δ(N,c))/N^r)^q − 1 ]} = 0,

where we have applied Lemma 3.3 to conclude that h^{−1}(K_1 δ(N, c))/N^r → 0.

In general, it is possible to modify a queue Q to generate infinitely many other queues Q̂ with the same asymptotic cost in such a way that Q and Q̂ are not equivalent. For instance, if we take Λ̂ ∈ C¹((0, 1] × Θ̂) such that for each θ̂ ∈ Θ̂ the function Λ̂^{θ̂} is positive and increasing and lim_{x→0+} Λ̂^{θ̂}(x) ∈ IR, then it is easy to check that the queue Q̂ ∈ C¹((0, 1] × Θ × Θ̂) defined by Q̂(x, θ, θ̂) = Q(x, θ)Λ̂(x, θ̂) is positive and increasing, lim_{x→0} Q̂ = 0, and it gives a cost asymptotically bounded by a function χ if the cost given by Q is asymptotically bounded by χ.

Regarding the queue q_2, there are no requirements other than that q_2 be a positive function such that q_2 ∈ C((0, 1]), so the variety of possible queues q_2 is even wider than for the queues q_1. On the other hand, we can design queues q_2 by considering events of the form I_N(ω̂_N) − I ≤ d g(N) and imposing that the model leads to a specific asymptotic behavior of the probability P̂ of these events. In this case, we have

P̂[I_N(ω̂_N) − I ≤ d g(N)] = P[U − µ_U − Ω_N σ_U ≥ −d g(N)] ≃ A^{−1} ∫_{1−δ(N,d)}^1 q_1(s) q_2(1 − s) ds = A^{−1} ∫_0^{δ(N,d)} q_1(1 − s) q_2(s) ds,

and we can reproduce the reasoning made throughout this section. Now we do not have the restriction g(x) = log x. In particular, if we choose again g(x) = log x, the previous queues can be used as queues q_2. For instance, if q_1(x, p_1) = x^{p_1−1} and q_2(x, p_2) = x^{p_2−1}, then f_Z(x, p_1, p_2) = x^{p_1−1}(1 − x)^{p_2−1}/B(p_1, p_2) is the probability density of a Beta distribution, where B denotes the Beta function.
4 Adjusting the sample information
Now we can use the reasonings presented in the previous section to design different models to adjust the available sample information under the assumption of different hypotheses about the asymptotic decrease of P [I − IN (ωN ) ≤ c log N ]; that is, the computational cost of the Fekete problem. There are infinitely many possible choices for the models. Here we describe 7 models that we consider to be the most meaningful among all those studied, and present the results of the corresponding adjusting processes. In the next section we discuss these results in the context of Smale’s 7th problem. The adjusting procedure is based on the standardized moments up to order 10. We start by choosing an expression for fZ , which includes the definition of a certain number of parameters and a parametric space. Then we vary the parameters of the model and select the combinations of parameters such that the roots of the corresponding standardized moments agree with the roots of the sample standardized moments displayed in Table 3. The study made in Section 2.2 about the error in the sample moments must be taken into account in this context. Thus, we have defined two sets of lower and upper bounds for the admissible discrepancy in the kth root of each kth order standardized moment, k = 3, . . . , 10. From now on we refer to these sets as Case 1 and Case 2, respectively. Table 7 contains the values of the upper and lower bounds for Cases 1 and 2, expressed as a fraction of the value of the sample standardized moments. For instance, in Case 1 we admit combinations of parameters such that the 3rd root of the 3rd standardized moment is up to 1% lower or 1% higher than the corresponding sample value, and in Case 2 we admit combinations of parameters such that the 10th root of the 10th standardized moment is up to 1% lower or 4% higher than the corresponding sample value. 
In general, there are several combinations of the parameters of each model such that their associated moments satisfy the conditions established in Cases 1 and 2. This defines for each case a range of variation of the parameters of the model and their corresponding upper and lower supports.
order    Case 1              Case 2
         lower    upper      lower    upper
3        -0.01    0.01       -0.01    0.02
4        -0.01    0.01       -0.01    0.02
5        -0.01    0.01       -0.01    0.02
6        -0.01    0.01       -0.01    0.02
7        -0.01    0.015      -0.01    0.03
8        -0.01    0.015      -0.01    0.03
9        -0.01    0.02       -0.01    0.04
10       -0.01    0.02       -0.01    0.04

Table 7: Admissible relative discrepancy with the sample standardized moments for Cases 1 and 2.

As for the practical implementation of the models, the computation of the moments by means of the trapezoid rule is fast and accurate. We have used 10000 equal integration subintervals to have at least about 7 or 8 correct digits in the moments and the support, even when the upper support given by the models is large. Moreover, the computation of h^{−1}(x), x ∈ (0, 1/e), can be made efficiently and with high accuracy by applying Newton's algorithm from the inflexion point P_2 (see Figure 17), which guarantees the convergence. Specifically,
if for a given x ∈ (0, 1/e) we want to compute y such that h(y) = x, Newton's algorithm gives

(∆y)^k = y^k (x y^k − log y^k)/(1 − log y^k);

we stop the iterative process when |(∆y)^k| < 10^{−12} y^k. In any case, the computational effort needed to perform these adjusting tests was only a minimal fraction of what was required to obtain the sample.
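A direct implementation of this iteration, under the update rule reconstructed above (the iteration cap is our own safety guard):

```python
import math

def h(y):
    return math.log(y) / y

def h_inv(x, tol=1e-12, max_iter=100):
    """Solve h(y) = x on (e, +inf) for 0 < x < 1/e by Newton's method,
    starting from the inflexion point P2 = e**1.5, as in the text."""
    y = math.e ** 1.5
    for _ in range(max_iter):
        dy = y * (x * y - math.log(y)) / (1.0 - math.log(y))
        y += dy
        if abs(dy) < tol * y:
            break
    return y

y = h_inv(0.1)
```

Convergence is quadratic once the iterate is near the root, so far fewer than 100 iterations are needed in practice.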
4.1 Model 1
For this model we take q_1(x, p_1) = x^{p_1−1}, q_2(x, p_2) = x^{p_2−1}, p_1, p_2 > 1. Under these conditions f_Z is the density function of a Beta distribution, and when N grows the inverse of P[I − I_N(ω_N) ≤ c log N] is polynomially bounded. Figure 18 shows the range of variation of the lower support (top left), the upper support (top right), the parameter p_1 (bottom left) and the parameter p_2 (bottom right) for Cases 1 (dotted line) and 2 (continuous line). For N = 300, 400, ..., 1000 the sample standardized moments have been used as a reference for the adjusting, whereas for N = 200, 1500, 2000, 3000 the standardized moments given by the interpolation curves presented in Section 2.2 have been taken as reference values. The dotted and continuous lines in Figure 18 have been obtained just by joining the points corresponding to all these N. The figure also displays the sample lower and upper supports. These criteria have also been used to produce the figures corresponding to the rest of the models.
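As a consistency check of Model 1, trapezoid integration of f_Z with q_1(x) = x^{p_1−1} and q_2(x) = x^{p_2−1} recovers the known Beta(p_1, p_2) moments; a sketch with 10000 subintervals, as in the text (the endpoint handling and function name are ours):

```python
def beta_mean_via_trapezoid(p1, p2, n=10000):
    """Mean of Z with density A^{-1} x^{p1-1} (1-x)^{p2-1} on (0, 1),
    computed with the composite trapezoid rule."""
    def q(x):
        return x ** (p1 - 1) * (1.0 - x) ** (p2 - 1)
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    w = [0.5 if i in (0, n) else 1.0 for i in range(n + 1)]
    A = h * sum(wi * q(x) for wi, x in zip(w, xs))    # normalization constant
    m1 = h * sum(wi * x * q(x) for wi, x in zip(w, xs))
    return m1 / A

# For a Beta(p1, p2) distribution the mean is p1/(p1 + p2)
m = beta_mean_via_trapezoid(2.0, 3.0)
```

The same quadrature applied to x^k instead of x yields the higher moments used in the adjusting process.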
Figure 18: Results of the adjusting process corresponding to Model 1.
4.2 Model 2

In this model q_1(x, p_1, q_1) = 1/(x^{q_1+1} e^{p_1 x^{−q_1}}) and q_2(x, p_2, q_2) = 1/(x^{q_2+1} e^{p_2 x^{−q_2}}). To obtain a biparametric model we have taken q_1 = q_2 = 1. In particular, according to this model, when N grows the inverse of P[I − I_N(ω_N) ≤ c log N] is bounded by a function of the form e^{N^q̂}. Figure 19 shows the range of variation of the standardized support and the parameters p_1, p_2 according to this model for Cases 1 and 2.
Figure 19: Results of the adjusting process corresponding to Model 2.
4.3 Model 3

The queues of this model have the same form as for Model 2, but now we have taken q_1 = q_2 = 2. Figure 20 displays the results obtained in this case, for which the asymptotic computational cost of the problem is upper bounded by e^{N^{2+ε}} for each ε > 0.
4.4 Model 4

In this experiment we have taken the same queues as for Models 2 and 3, and we have considered the cases q_1 = q_2 = q = 0.1, 0.2, ..., 1. The main objective was to obtain the range of variation of the lower supports corresponding to N = 200, 300, 400 for Case 1 as a function of q, which is displayed in Table 8.
Figure 20: Results of the adjusting process corresponding to Model 3.

q          0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
N=200 min -2.0  -2.0  -2.1  -2.2  -2.3  -2.4  -2.4  -2.5  -2.5  -2.7
      max -1.8  -1.9  -2.0  -2.1  -2.2  -2.2  -2.3  -2.4  -2.5  -2.6
N=300 min -2.5  -2.7  -2.8  -2.9  -3.0  -3.1  -3.2  -3.3  -3.4  -3.5
      max -2.2  -2.3  -2.4  -2.5  -2.6  -2.7  -2.8  -2.8  -3.0  -3.0
N=400 min -2.8  -3.1  -3.2  -3.3  -3.4  -3.6  -3.7  -3.8  -3.9  -4.0
      max -2.5  -2.5  -2.7  -2.8  -2.9  -3.0  -3.1  -3.2  -3.3  -3.4
Table 8: Lower supports for N = 200, 300, 400 after the adjusting process corresponding to Model 4.
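The nearly linear trend of Table 8 can be summarized with a least-squares line; for instance, for the N = 400 minimum lower supports (values transcribed from the table), a short sketch:

```python
# Least-squares line through the N = 400 minimum lower supports of Table 8.
qs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
supports = [-2.8, -3.1, -3.2, -3.3, -3.4, -3.6, -3.7, -3.8, -3.9, -4.0]

n = len(qs)
mean_q = sum(qs) / n
mean_s = sum(supports) / n
slope = (sum((q - mean_q) * (s - mean_s) for q, s in zip(qs, supports))
         / sum((q - mean_q) ** 2 for q in qs))
intercept = mean_s - slope * mean_q
# slope ≈ -1.26: each unit increase of q deepens the lower support
# by roughly 1.26 for N = 400.
print(slope, intercept)
```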
4.5
Model 5
For this model q1(x, K1, p1) = 1/((K1 x h^(-1)(K1 x) − 1)(h^(-1)(K1 x))^(p1−1)) and q2(x, K2, p2) = 1/((K2 x h^(-1)(K2 x) − 1)(h^(-1)(K2 x))^(p2−1)). A biparametric model is obtained by taking K1 = K2 = 1/(1.001e). According to this model, when N grows the inverse of P[I − IN(ωN) ≤ c log N] is polynomially bounded. Figure 21 shows the results of the adjusting process for this case.
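These queues involve h^(-1), the inverse of a function h introduced earlier in the paper and not restated in this section; their form suggests h(x) = x e^x, whose inverse (for nonnegative arguments) is the principal branch of the Lambert W function — an assumption made here purely for illustration. Under that assumption the queues can be evaluated with a small Newton solver:

```python
import math

def lambert_w(y, tol=1e-12):
    """Solve x * exp(x) = y for x >= 0 (principal branch) by Newton."""
    x = math.log1p(y)  # reasonable starting guess for y >= 0
    for _ in range(100):
        ex = math.exp(x)
        x_new = x - (x * ex - y) / (ex * (x + 1.0))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def q1(x, K1, p1):
    """Left queue of Model 5 under the assumed h(x) = x e^x; valid where
    the factor K1 * x * W(K1 x) - 1 is positive."""
    w = lambert_w(K1 * x)
    return 1.0 / ((K1 * x * w - 1.0) * w ** (p1 - 1.0))

print(lambert_w(math.e))  # W(e) = 1, since 1 * e^1 = e
```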
Figure 21: Results of the adjusting process corresponding to Model 5.
4.6 Model 6

Here the left queue is q1(x, K1, p1) = 1/((K1 x h^(-1)(K1 x) − 1)(h^(-1)(K1 x))^(p1−1)), whereas the right queue has the expression q2(x, K2, p2, q) = (h^(-1)(K2 x))^(q+1) / ((K2 x h^(-1)(K2 x) − 1) e^(p2 (h^(-1)(K2 x))^q)), and we have taken K1 = K2 = 1/(1.001e) and q = 2. Figure 22 displays the corresponding results.
4.7
Model 7
For this model we have used the same queues as for Model 6, but we have taken K1 = K2 = 1/(1.001e) and q = 0.3. Figure 23 shows the range of variation of the lower and upper supports and the parameters p1, p2 for Cases 1 and 2. Table 9 contains a pair of selected combinations of p1, p2 for different N, their corresponding supports, and the relative discrepancies of the roots of the standardized moments obtained from the model when they are compared with the sample standardized moments. For N = 200 and the limit case N → +∞, the discrepancy is computed by taking as reference values the ones given by the interpolation formula for the standardized moments presented in Section 2.2. Figures 24-31 show the density functions corresponding to the first parametric choice given in Table 9 (continuous line), and the histogram obtained from the sample data (points) for N = 300, 400, . . . , 1000. The histograms for N = 300, 400, 500, 600 contain 200 points, whereas the ones corresponding to N = 700, 800, 900, 1000 contain 100 points. In Fig. 24, which corresponds to N = 300, we have also plotted two density functions given by Model 4
with q = 0.3 and q = 1, respectively. For the case q = 0.3 we display the density function corresponding to the parametric combination p1 = 7.444, p2 = 234.934, which gives the support α300 = −2.648, Ω300 = 38.449 and the relative discrepancies −0.000767, −0.00224, 0.0035, 0.00266, 0.007077, 0.00852, 0.0116, 0.0139. For the case q = 1, the parametric combination is p1 = 0.473, p2 = 52.194, the associated support is α300 = −2.648, Ω300 = 38.449, and the corresponding relative discrepancies are −0.0094, −0.00293, −0.000865, 0.00043, 0.00365, 0.00531, 0.00782, 0.00977.

Figure 22: Results of the adjusting process corresponding to Model 6.
5
A plausible probabilistic positive solution
After the analysis of the sample information and the observation of the results of the adjusting process, we can discuss the consequences of this study regarding Smale's 7th problem. We start by making the following general comments:

1. All the considered biparametric models exhibit the same qualitative behavior of the lower and upper supports of V. In particular, after a fast increase of the magnitude of the lower support up to N ≃ 600, this growth suddenly slows and the lower support stabilizes. On the other hand, the upper support rapidly grows up to N ≃ 600, then decreases more slowly and finally stabilizes as N grows. Under these conditions, we can say that it is the way the moments M_V^k change with N that leads to the specific shape of the lower and upper support evolution, and that different approximating biparametric continuous models give essentially the same tendencies, affected only by a scaling factor.
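The probabilistic reading behind all the bounds discussed above can be made concrete: if one run of a local minimization lands within c log N of the minimum with probability p, the expected number of independent restarts is 1/p, so the inverse of P[I − IN(ωN) ≤ c log N] directly scales the computational cost. A minimal sketch of this estimate, on synthetic energies rather than the paper's sample:

```python
import math
import random

def expected_restarts(energies, e_min, c, N):
    """Estimate P[I - e_min <= c log N] from a sample of local-minimum
    energies, and return the expected number of random restarts 1/P."""
    tol = c * math.log(N)
    hits = sum(1 for e in energies if e - e_min <= tol)
    p = hits / len(energies)
    return (1.0 / p) if p > 0 else float("inf")

# Synthetic illustration (NOT the paper's data): excess energies clustered
# just above the minimum, plus a few rare "bad" isolated minima.
random.seed(0)
sample = [abs(random.gauss(0.0, 0.1)) for _ in range(10000)] + [5.0] * 3
print(expected_restarts(sample, 0.0, 0.0513, 500))
```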
Figure 23: Results of the adjusting process corresponding to Model 7.
Figure 24: Histogram and a density function given by Model 7 for N = 300. The figure also includes two different density functions given by Model 4.

2. The range of variation of the upper support for Cases 1 and 2 is much higher than that of the lower support for all the models studied. This reveals a strong sensitivity of the magnitude of the upper support to the values of the standardized moments, especially those of higher order. Analogous comments can be made regarding the range of variation of the parameters p1, p2. This sensitivity has also been experimentally confirmed (recall Figures 5, 6).

3. All the models give upper supports considerably higher than the corresponding sample supports.

Models 2, 3, 4 correspond to the hypothesis that the asymptotic computational cost of the Fekete problem is not polynomially bounded. Regarding these models we can remark:

4. It is not possible to adjust the sample lower support for N = 200, 300, 400 while keeping the parameter q constant. The values of q that enable us not to underestimate
Figure 25: Histogram and a density function given by Model 7 for N = 400.

Figure 26: Histogram and a density function given by Model 7 for N = 500.

Figure 27: Histogram and a density function given by Model 7 for N = 600.

Figure 28: Histogram and a density function given by Model 7 for N = 700.

Figure 29: Histogram and a density function given by Model 7 for N = 800.

Figure 30: Histogram and a density function given by Model 7 for N = 900.

Figure 31: Histogram and a density function given by Model 7 for N = 1000.
the sample lower supports for N ≥ 500 overestimate the sample lower supports for N = 200, 300, 400.

N      p1      p2       αN      ΩN      relative difference with the standardized sample moments (orders 3-10)
200    3.977   35.321  -1.856   19.642   0.002  0.008  0.000  0.009 -0.001  0.002 -0.005 -0.006
200    4.426   41.563  -1.941   22.217  -0.005  0.007 -0.004  0.008 -0.002  0.003 -0.004 -0.004
300    8.927  113.798  -2.647   47.402  -0.009 -0.004  0.000  0.001  0.005  0.008  0.011  0.014
300    6.064   54.940  -2.307   24.622   0.005 -0.005  0.001 -0.003 -0.001 -0.003 -0.002 -0.003
400   16.068  337.635  -3.368  134.785  -0.010  0.005  0.007  0.013  0.020  0.025  0.033  0.040
400   11.239  130.741  -3.025   46.598  -0.009  0.001  0.002  0.005  0.008  0.011  0.016  0.019
500   26.247  450.000  -4.413  125.113  -0.009  0.001  0.000  0.004  0.006  0.008  0.011  0.013
500   20.746  247.565  -4.117   65.320  -0.009  0.000 -0.001  0.001  0.002  0.003  0.005  0.006
600   30.570  450.000  -4.857  107.249  -0.008 -0.004 -0.006 -0.007 -0.006 -0.008 -0.007 -0.007
600   28.998  450.000  -4.697  113.030   0.006 -0.002  0.004 -0.002  0.003  0.000  0.003  0.002
700   29.784  450.000  -4.777  110.051  -0.009 -0.002 -0.002 -0.001  0.001  0.002  0.006  0.008
700   26.640  397.385  -4.523  102.158   0.008  0.001  0.009  0.005  0.012  0.011  0.016  0.018
800   29.784  450.000  -4.777  110.051  -0.008  0.002  0.001  0.005  0.007  0.010  0.013  0.015
800   24.283  268.968  -4.515   63.641  -0.010  0.000 -0.002  0.001  0.002  0.004  0.006  0.008
900   32.534  450.000  -5.055  100.926  -0.009  0.002  0.000  0.004  0.006  0.010  0.014  0.019
900   21.925  187.816  -4.500   42.937  -0.009 -0.001 -0.004 -0.001 -0.001  0.002  0.004  0.008
1000  33.320  450.000  -5.134   98.638  -0.008  0.002  0.001  0.005  0.007  0.009  0.012  0.015
1000  21.532  173.547  -4.517   39.377  -0.009 -0.001 -0.004 -0.001 -0.002 -0.001  0.001  0.002
∞     11.901   44.469  -4.120   12.476  -0.002  0.007 -0.009  0.009 -0.008  0.006 -0.007  0.002
∞     10.149   36.653  -3.848   11.164   0.007  0.005 -0.006  0.007 -0.009  0.002 -0.010 -0.003

Table 9: Selection of parameters p1, p2 (Model 7) for different N.

5. It is possible to obtain very high lower supports by increasing q. Model 3, for instance, corresponds to the case q = 2, and it gives lower supports of about −9 for N = 600. In particular, it seems very unlikely that the asymptotic cost of the Fekete problem grows factorially or faster.

6. If we admit that the parameter q can depend on N, then it is possible to design nonpolynomial models that adjust the lower supports well for N ≥ 200. Table 8 indicates the values of q that must be taken for each N = 200, 300, 400. We have marked in bold the lower supports that agree with the sample ones for these N. If we assume that the supports and the parameters of the models start to stabilize from N ≃ 600, then the parameter q should tend to a value around 1.

Models 1, 5, 6, 7 correspond to the hypothesis that the asymptotic cost of the Fekete problem is polynomially bounded. For these models we observe:

7. Model 1 underestimates the magnitude of the sample lower supports for N = 300, 400. Moreover, it gives lower supports with a magnitude of about 4 for N = 500 and 4.5 for N = 600 − 1000, which does not seem to be enough.

8. Models 5, 6, 7 adjust the sample lower supports for N = 200, 300 under the conditions of Case 1, and also adjust the sample lower support for N = 400 under the conditions
of Case 2. Moreover, they give lower supports with a magnitude up to about 4.5 for N = 500, and 5 for N = 600 − 1000.

9. The upper queues of Models 1, 5 correspond to the hypothesis that the inverse of P[IN(ω̂N) − I ≤ d log N] is polynomially bounded. These models give extremely high upper supports and p2 values for N ≃ 500 − 1500. Model 6 corresponds to the hypothesis that when N grows, the inverse of P[IN(ω̂N) − I ≤ d log N] is bounded by e^(N^(2+ε)) for each ε > 0, and it also gives very high upper supports.

10. According to Model 7, the inverse of P[IN(ω̂N) − I ≤ d log N] is asymptotically bounded by e^(N^(0.3+ε)) for each ε > 0. This model gives upper supports between ≃ 20 and ≃ 100 for N ≃ 500 − 1000. These values seem more realistic than the ones given by Models 1, 5, 6. Indeed, for the case N = 87 we have observed a lower support of magnitude 0.773 and an upper support of size 4.336. The corresponding ratio is about 18.5. The magnitude of the lower support should grow at least up to ≃ 5; if the above ratio were maintained, then we could expect an upper support of magnitude about 90 for N ≃ 600. On the other hand, in the case N = 87 there is only a small number of minima, one of them with a probability of around 42%, and some others with probabilities of around 10%, see [4]. Nevertheless, even in this case there exist some bad minima far from the average energy value and with extremely low (of order 10^(−5)) probabilities. Taking into account these results, and that the enormous complexity of the problem is only starting to develop at N = 87, it is reasonable to assume that there exists a relatively small number of bad isolated minima with extremely low appearance probabilities for moderately high N. It is difficult for a continuous model to reproduce this phenomenon accurately.
However, as N increases and the problem tends to be more and more continuous, these isolated bad minima should tend to disappear, which all models reflect with a strong decrease of ΩN .
11. Models 5, 6, 7 have different upper queues and supports, but they have very similar ranges of variation of the lower support.

Comment 11 indicates a general fact. The tests performed confirm that the lower support depends essentially only on the lower queue, and the upper support depends essentially only on the upper queue, which allows us to “interchange” queues and supports of different models. For instance, the model obtained by taking c1(x; p1) = x^(p1−1) and c2(x; p2) = 1/(x^2 e^(p2/x)) gives lower supports very similar to the ones given by Model 1, and upper supports very similar to the ones given by Model 2. In particular, this implies that the behavior of the lower support is practically independent of the hypotheses made on the behavior of the upper queue. Taking into account that the key question for Smale's 7th problem is the lower support, we can conclude that the observations made in Subsection 2.3 about the worst minima, as well as comments 2, 3, 9, 10 in this section, have minor importance. On the other hand, we have tested some models constructed as variations of Models 5, 6, 7 by trying different values for K1. In general, the largest lower supports were obtained when K1 → 1/e.

Regarding Model 7, we include the following additional observations:
12. As can be seen in Table 9, there exist combinations of the parameters p1, p2 that give lower supports with a magnitude in the range 4.5 − 5 for N = 600 − 1000, with an error smaller than 1% in the roots of all the sample standardized moments up to order 10. If we just admit that the 9th root of the 9th sample standardized moment can increase by 1.5%, and that the 10th root of the 10th moment can increase by 2%, we can obtain lower supports close to −5 for all N = 600 − 1000.

13. The moments of the density function plotted in Figure 24 are in good agreement with the corresponding sample standardized moments, but the differences with the histogram are appreciable. On the other hand, the density functions given by Model 4 practically coincide (except for the magnitude of the support) with the one given by Model 7. This indicates that the discrepancy between the density function given by Model 7 and the corresponding histogram is not a particular problem of this model, but of any continuous model. Indeed, Figure 16 and the histogram in Fig. 24 show that when N = 300, the problem is still strongly discrete close to the boundary of the lower support.

14. It is necessary to admit relative discrepancies of up to 4% in the higher order moments for Model 7 to give a lower support in agreement with the corresponding sample value. Although these discrepancies may exist between the sample standardized moments and the real moments of V, this “malfunction” could be caused by the fact that the problem is still too discrete close to the boundary of the lower support (see Figure 16). In any case, the agreement between the density function and the histogram displayed in Figure 25 is better than in Figure 24.

15. For N ≥ 500, it could be said that the agreement between the density functions given by Model 7 and the corresponding histograms is excellent (see Figures 26-31).

16.
According to Model 7, when N → +∞ the magnitudes of the lower and upper supports, the parameters p1, p2 and their ranges of variation contract with respect to their maximum values, which are achieved approximately at N ≃ 600. This is not a particular feature of Model 7: all the models indicate this contraction, which corresponds to the asymptotic decrease of all the standardized moments that can be observed in Figure 8 (right).

Let us summarize the main features of the random variable V that the developed study suggests. The problem starts with a single minimum (up to rotations) when N = 2. For N ≃ 100 there is a small number of local minima and the probability distribution of V has the typical aspect of a discrete distribution. Moreover, there exists a handful of bad isolated minima with extremely low appearance probabilities. When N grows, the number of minima grows enormously and their energy values are very close, so the aspect of the probability distribution of V tends to be that of a continuous function. When N ≃ 600, the discrete character of the problem almost disappears (as far as the energy values are concerned) and the standardized supports reach their maximum values. After that, the probability of obtaining bad isolated minima vanishes and the upper support decreases. When N goes to +∞, the
standardized moments and the supports stabilize, and the probability distribution of V tends to a continuous probability distribution. We have shown that there exist continuous biparametric models that reproduce this behavior, that adjust all the available sample information well and that are consistent with the hypothesis of a polynomially bounded asymptotic computational cost for the Fekete problem. There also exist continuous models that adjust the data well under the assumption of an asymptotic computational cost bounded by the exponential of a power of N. These models require the inclusion of at least three parameters that depend on N. In any case, the existence of this last class of models does not contradict the hypothesis of polynomial cost, because if the cost is polynomially bounded, it is of course also exponentially bounded. Moreover, both kinds of models have some difficulties adjusting the histograms corresponding to small N. This is essentially due to the fact that for these N the problem is still too discrete to be reproduced by a continuous model, and it does not rule out any of the models. In any case, Model 7 gives good estimations of the lower supports for N = 200, 300, 400 when it is used to adjust the sample standardized moments.

Taking into account all the data and reasoning presented here, no arguments have been found for rejecting Model 7, and it gives the best adjustment of the sample information among all the considered models. Consequently, we consider that the plausibility of a probabilistic positive solution to Smale's 7th problem has been established. In particular, if Model 7 is taken as a valid approximating model, then α ≃ −4, so the formula

IN(ωN) ≃ −(1/4) log(4/e) N^2 − (1/4) N log N − 0.02685 N + 0.2538
gives an estimation of the asymptotic minimum logarithmic energy on the 2-sphere and, moreover, taking into account Table 9 and the results shown in [4], the asymptotic average computational complexity of finding a c-minimum with arbitrarily high probability of success is polynomially bounded by about N^15. Regarding the constant c, we can observe for instance that taking c = 0.0513, the approximately 3 · 10^7 values of I obtained for N = 500 satisfy the condition I − Imin ≤ c log 500, where Imin is the best energy found. For N large enough, arbitrarily small values of c > 0 can be considered. For more details about the results presented here, see the Ph.D. thesis of J.M. Gesto [7].
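The energy estimate and the acceptance window c log N can be evaluated directly; a small sketch using the coefficients quoted in the text:

```python
import math

def fekete_energy_estimate(N):
    """Asymptotic estimate of the minimal logarithmic energy on the
    2-sphere quoted in the text (coefficients from the paper)."""
    return (-0.25 * math.log(4.0 / math.e) * N * N
            - 0.25 * N * math.log(N)
            - 0.02685 * N + 0.2538)

def c_tolerance(c, N):
    """Energy window c log N within which a minimum is accepted."""
    return c * math.log(N)

print(fekete_energy_estimate(500))   # about -24933.4
print(c_tolerance(0.0513, 500))      # about 0.319
```

Note how small the window is relative to the energy scale: for N = 500 the tolerance 0.0513 log 500 ≈ 0.32 sits on top of an energy of order −2.5 · 10^4.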
Acknowledgements The authors express their sincere gratitude to Professor Antonio Gens for his unconditional support. We are also pleased to thank Professors Agust´ın Medina and Juan S´anchez for their kind comments and the research group Lacan from UPC, in whose cluster “Clonetroop” some of the intensive calculations performed for this work have been carried out. This work has been partially supported by the CICYT under project MTM2007-62551, by the i-MATH project and by CESGA.
References

[1] M. Atiyah and P. Sutcliffe, Polyhedra in Physics, Chemistry and Geometry, Milan J. Math., 71 (2003), 33-58.

[2] C. Beltrán and L.M. Pardo, On Smale's 17th problem: a probabilistic positive solution, Found. Comput. Math., (2007), DOI: 10.1007/s10208-005-0211-0.

[3] E. Bendito, A. Carmona, A.M. Encinas and J.M. Gesto, Estimation of Fekete points, J. Comput. Phys., 225 (2007), 2354-2376.

[4] E. Bendito, A. Carmona, A.M. Encinas and J.M. Gesto, Computational cost of the Fekete problem I: The Forces Method on the 2-Sphere, preprint, accessible at http://www-ma3.upc.edu/users/bencar/papers.html.

[5] M.J. Bowick, http://phy.syr.edu/condensedmatter/thomson/thomsonapplet.htm.

[6] T. Erber and G.M. Hockney, Comment on “Method of Constrained Global Optimization”, Phys. Rev. Lett., 74 (8) (1995), 1482.

[7] J.M. Gesto, Estimation of Fekete points, Ph.D. thesis, Departament de Matemàtica Aplicada III, Universitat Politècnica de Catalunya, (2008), accessible at http://www-ma3.upc.edu/users/bencar/papers.html.

[8] L. Giomi and M. Bowick, Crystalline order on Riemannian manifolds with variable Gaussian curvature and boundary, Phys. Rev. B, 76 (2007), 054106.

[9] D.P. Hardin and E.B. Saff, Discretizing manifolds via minimum energy points, Notices Amer. Math. Soc., 51 (2004), 1186-1194.

[10] J.B. Hiriart-Urruty, A new series of conjectures and open questions in optimization and matrix analysis, to appear in ESAIM Control Optim. Calc. Var., accessible at http://www.mip.ups-tlse.fr/publis/files/07.21.pdf.

[11] A.B.J. Kuijlaars and E.B. Saff, Asymptotics for minimal discrete energy on the sphere, Trans. Amer. Math. Soc., 350 (1998), 523-538.

[12] P.M. Pardalos, An open global optimization problem on the unit sphere, Journal of Global Optimization, 6 (1995), 213.

[13] A. Pérez-Garrido, J.W. Dodgson and M.A. Moore, Influence of dislocations in Thomson's problem, Phys. Rev. B, 56 (1997), 3640.

[14] E.A. Rakhmanov, E.B. Saff and Y. Zhou, Minimal discrete energy on the sphere, Math. Res. Lett., 1 (1994), 647-662.

[15] E.B. Saff and A.B.J. Kuijlaars, Distributing many points on a sphere, Math. Intelligencer, 19 (1997), 5-11.

[16] M. Shub and S. Smale, Complexity of Bezout's theorem I: geometric aspects, Journal of the Amer. Math. Soc., 6 (1993), 459-501.

[17] M. Shub and S. Smale, Complexity of Bezout's theorem II: volumes and probabilities, in F. Eysette and A. Galligo (Eds.), Computational Algebraic Geometry, Volume 109 of Progress in Mathematics (1993), 267-285.

[18] M. Shub and S. Smale, Complexity of Bezout's theorem III: condition number and packing, Journal of Complexity, 9 (1993), 4-14.

[19] M. Shub and S. Smale, Complexity of Bezout's theorem V: polynomial time, Theoretical Computer Science, 133 (1994), 141-164.

[20] M. Shub and S. Smale, Complexity of Bezout's theorem IV: probability of success; extensions, SIAM J. Numer. Anal., 33 (1996), 128-148.

[21] S. Smale, Mathematical problems for the next century, Math. Intelligencer, 20 (1998), 7-15.

[22] S. Smale, Complexity Theory and Numerical Analysis, Acta Numerica, 6 (1997), 523-551.

[23] Y. Zhou, Arrangements of points on the sphere, Ph.D. thesis, Department of Mathematics, University of South Florida, (1995).