Chapter 3

Monte Carlo Methods

Jerome Spanier

Beckman Laser Institute, University of California, Irvine, California, USA

3.1 Introduction

Monte Carlo methods comprise a large and still growing collection of methods of repetitive simulation designed to obtain approximate solutions of various problems by playing games of chance. Often these methods are motivated by randomness inherent in the problem being studied (as, e.g., when simulating the random walks of "particles" undergoing diffusive transport), but this is not an essential feature of Monte Carlo methods. As long ago as the eighteenth century, the distinguished French naturalist Comte de Buffon [1] described an experiment that is by now well known: a thin needle of length l is dropped repeatedly on a plane surface that has been ruled with parallel lines at a fixed distance d apart. Then, as Laplace suggested many years later [2], an empirical estimate of the probability P of an intersection, obtained by dropping a needle at random a large number, N, of times and observing the number, n, of intersections, provides a practical means for estimating $\pi$. The relationship is

$$ P = \frac{2l}{\pi d} \qquad \text{or} \qquad \pi \approx \frac{2l}{\hat{P}\, d}, $$

where $\hat{P} = n/N$ and we assume that l < d.

We introduce another example that can be used to illustrate several important features of Monte Carlo simulation. This "model" transport problem, one of the simplest random walk problems one might imagine, can be solved completely without resorting to sampling at all and yet exhibits characteristics of problems that are typical of more complex particle transport. The study of such model problems is crucial in obtaining a deeper understanding of the basic principles that underlie Monte Carlo methods.
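As a quick numerical illustration of the needle-tossing estimate just described (this sketch is mine, not part of the original chapter), the following Python fragment drops N needles at random and recovers $\pi$ from the observed intersection fraction via $\pi \approx 2l/(\hat{P} d)$; the needle length l and line spacing d are arbitrary choices subject to l < d.

    import math
    import random

    def estimate_pi_buffon(n_drops, l=1.0, d=2.0, seed=1):
        """Estimate pi from Buffon's needle experiment (assumes l < d).

        Note: sampling the needle's angle already uses math.pi, so this only
        illustrates the estimator pi ~ 2l / (P_hat * d); it does not "discover" pi.
        """
        rng = random.Random(seed)
        hits = 0
        for _ in range(n_drops):
            x = rng.uniform(0.0, d / 2.0)            # center-to-nearest-line distance
            theta = rng.uniform(0.0, math.pi / 2.0)  # acute angle with the lines
            if x <= (l / 2.0) * math.sin(theta):     # needle crosses a line
                hits += 1
        p_hat = hits / n_drops                       # empirical intersection probability
        return 2.0 * l / (p_hat * d)

    print(estimate_pi_buffon(1_000_000))             # typically prints a value near 3.14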


We imagine particles (random walkers) that are assumed to impinge on the left face of a vertical slab of unit thickness. Each particle moves only to the right in steps selected at random from a uniform distribution on [0,1] until it escapes from the slab. Let X be the number of steps required to escape. The problem is to estimate $E[X]$, the average number of steps to escape, where $E[X]$ is the expectation of the random variable X. For this problem, one expresses the expectation as an infinite series, the nth term of which is the product of n and the probability $p_n$ that the particle escapes after exactly n steps. This infinite series representation of $E[X]$ is precisely analogous to the Neumann series representation of the solution of the transport equation that describes this problem, as well as so many problems that are solved using Monte Carlo methods. Making use of the fact that, for this simple problem,

$$ p_n = \frac{1}{n\,(n-2)!} \qquad \text{for } n = 2, 3, \ldots, $$

the infinite series can be summed exactly, which yields $E[X] = e \approx 2.71828$; the variance of X is $\sigma^2[X] = 3e - e^2 \approx 0.76579$. This very simple test problem provides a very useful vehicle for analyzing more complex Monte Carlo random walk problems for which exact values of the moments of key random variables will be unknown in general.

Of course, much more accurate and efficient (deterministic) methods may be used to estimate both $\pi$ and e. However, even these very simple simulation problems serve to illustrate a number of key ingredients of the Monte Carlo method: (1) the need to generate random samples drawn from a variety of probability distributions; (2) the need to express the outcomes of a Monte Carlo experiment as estimates of a theoretical expected value of some random variable; (3) the need to perform an error analysis based on statistical fluctuations of a random variable from sample to sample; and (4) where possible, the desirability of reducing the sample to sample fluctuations by thinking deeply about the inherent cause of these fluctuations and taking appropriate measures to reduce them.

For example, in the needle-tossing experiment, why not toss a cruciform-shaped needle, or even one with many needles of the same length equally distributed around the circumference of a circle of radius l? This would seem to provide a more efficient experiment since each toss produces many possible intersections, yet each "spoke" retains the same distributional characteristics as each single needle toss in the original experiment. But clearly these more sophisticated "needles" produce correlated sampling results. How does this affect the statistical analysis of the outcomes? Some of these questions are addressed in the interesting references [3–5].

For the one-dimensional random walk problem, one might imagine that a more systematic sampling (instead of random sampling) of the unit interval to obtain step sizes for each step to the right would lead to reduced statistical fluctuations in the Monte Carlo estimate. Suppose, then, that one were to subdivide the unit interval into a large, but fixed, number, S, of equal subdivisions and choose as individual step sizes the midpoints, say, of these subintervals ordered deterministically. For example, one could move through these in order, going from the least to the greatest, to generate steps for the random walkers. Pretty clearly, this choice is not a very good one since, if S is very large compared to the number of samples generated, there would be a bias in the direction of short steps. Perhaps one should average a small step with a large one or run through the midpoints randomly. Again, what about the correlation introduced? And what is the impact of any such scheme on the sample to sample variability?

In addition to containing many of the same critical features of most Monte Carlo problems, these simple model problems illustrate some of the advantages of the Monte Carlo method: (a) its appeal to intuition; (b) its simplicity of implementation; and (c) its accessibility to nonexperts. However, as more and more sophistication is considered in an attempt to speed up the computation or to reduce the sampling variability per unit computing cost, the model for the experiment can depart more and more from intuitive plausibility, and the need for mathematical rigor is accentuated. A firm theoretical foundation becomes not only desirable, but also essential.

There are many problems for which analytic or good deterministic methods are simply unavailable that benefit from being formulated stochastically. Particle transport problems provide a fertile field of examples of such problems, and this field will be our main emphasis here. Since the earliest applications of Monte Carlo methods to neutron transport problems, however, the number and range of applications have grown well beyond the bounds of a single book chapter. The field of operations research is rife with such examples, which we do not discuss here (see, e.g., [7]). We have deliberately ignored the discussion of developments in computer architecture (e.g., use of parallel or vectorized computation) and many of the rapidly growing list of important application areas, such as financial modeling, design of radiation therapy plans, Markov chain Monte Carlo, and others. Because of the literal explosion in the number of applications of Monte Carlo methods and the avalanche of publications dealing with both the theory and applications, there is no possibility of dealing with the subject comprehensively here.

Finally, we presume that the reader is familiar with at least the rudiments of Monte Carlo. Several books and articles can provide an introduction to the subject matter (e.g., [8–13]). For readers already familiar with the rudiments, a review of [14] would provide an appropriate introduction since our ultimate goal here is to update that article.
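The model problem above is also easy to simulate directly. The following sketch (my illustration; the function names are arbitrary) draws uniform step sizes until each walker leaves the unit slab and averages the step counts, which should approach $E[X] = e$.

    import random

    def steps_to_escape(rng):
        """Number of uniform(0,1) steps until the cumulative displacement exceeds 1."""
        position, steps = 0.0, 0
        while position <= 1.0:
            position += rng.random()
            steps += 1
        return steps

    def estimate_mean_steps(n_walkers=100_000, seed=1):
        rng = random.Random(seed)
        return sum(steps_to_escape(rng) for _ in range(n_walkers)) / n_walkers

    print(estimate_mean_steps())   # approaches E[X] = e = 2.71828...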

3.2 Organizing Principles

Initially, a historical context was suggested by the organizers for each of the Gelbard lectures. Each lecture was intended to survey one of the topics traditionally important to the Mathematics and Computation Division of the American Nuclear Society in such a way as to update the 1968 publication of [14]. While this context seemed to serve well for the lectures at the Gatlinburg conference – at least the chronology featured prominently in the oral accounts – a division according to subject content seemed more appropriate for the written version. Our hope is that this switch in perspective might make the chapter more useful as a reference. Accordingly, we have abandoned the idea of dividing the content into two historical periods, one prior to 1968 and the other afterward. Instead, the material here will be organized into what we perceive to be the key elements of Monte Carlo methods development: generating sequences, error analysis, error reduction, and theoretical foundations. Here is a brief overview of what the reader might expect to find.

3.2.1 Generating Sequences

Section 3.4, dealing with generating sequences, shows that the relatively simple algorithms used to create a sequence of unpredictable, "random" numbers for early use in simulation have evolved into several quite different streams of research. These involve both modern-day successors to the earliest pseudorandom number generators and algorithms that are completely deterministic, focusing only on uniformity in a manner that forsakes the idea of stochastic independence altogether.

3.2.2 Error Analysis

For the analysis of errors in Monte Carlo output, we will discuss in Section 3.5 the evolution, in the case of pseudorandomly generated samples, from reporting the sample mean and standard deviation to the now fairly common use of higher moments and additional statistical tests (see [15]). When completely deterministic sequences are used in place of pseudorandom ones, and no probabilistic model is invoked, however, we will see that a markedly different error analysis must be applied.

3.2.3 Error Reduction

Perhaps the greatest concentration of effort to improve Monte Carlo methods has been devoted to the topic of error reduction in the last 50 years or so. Whether the simulation makes use of pseudorandom number sequences or their deterministic cousins – quasi-random sequences – highly sophisticated techniques have been developed that are capable of producing not only increased rates of convergence, but also much lower error levels for a fixed sample size. Some of these developments are described in Section 3.6.


3.2.4 Foundations/Theoretical Developments

In discussing this last of our "big four" topics in Section 3.7, we shall attempt to list the major advances in understanding the foundations of the subject. Our (admittedly biased) perspective is that these have had a great deal to do with the practical advances made since the earliest uses of simulation methods to solve complex problems. Following our discussion of the development of each of these four major themes, we try to formulate, at the close of each section, a succinct summary of the present state of the art for each such theme. We end this chapter with a short list, presented in Section 3.8, of major challenges that might serve to stimulate further thinking.

3.3 Historical Perspectives

Before we pursue our discussion of each theme, we provide an overview of the early history of the subject. Much has already been written about the development of a nuclear weapons program whose history began with the famous experiments conducted during World War II at the University of Chicago. These culminated on December 2, 1942 with the first controlled nuclear chain reaction on a squash court situated beneath Chicago's football stadium. Working later in Los Alamos, New York, Oak Ridge, and other locations, Enrico Fermi and others (for instance, Stanislaw Ulam, John von Neumann, Robert Richtmyer, and Nicholas Metropolis) played an important part in solving the problems connected with the development of the atomic bomb. This work depended crucially on rudimentary numerical simulations of multiplying neutron populations. It is widely held that the first paper on the Monte Carlo method was [16]. The marriage of relatively unsophisticated simulation methods with the development of automatic digital computers provided the key ingredients for success in this rapidly expanding wartime undertaking.

Shortly after the end of World War II another major effort was initiated, spearheaded by Admiral Hyman Rickover, to design and implement nuclear propulsion systems for the Navy. This highly successful program began in the 1940s, with the first test reactor started up in the United States in 1953. The first nuclear-powered submarine – the USS Nautilus – put to sea in 1955. This development marked the transition of submarines from slow underwater vessels to warships capable of sustaining 20–25 knots while submerged for weeks at a time. The success of the Nautilus effort led to the development of additional submarines, each powered by a single pressurized water reactor, and an aircraft carrier, the USS Enterprise, powered by eight reactor units, in 1960. A cruiser, the USS Long Beach, was placed into service in 1961 and by 1962 the United States boasted a nuclear fleet of 26 operational nuclear submarines with 30 more under construction. Today, the US Navy operates more than 80 nuclear-powered ships including 11 aircraft carriers and a number of cruisers.


The postwar activities to produce nuclear propulsion units for a nuclear navy were concentrated primarily at the Westinghouse-managed Bettis Atomic Power Laboratory in suburban Pittsburgh, PA and the General Electric-managed Knolls Atomic Power Laboratory in Schenectady, New York during the 1950s and 1960s. In that same time frame there was a very rapid expansion in the capability of digital computers at these and at the nation’s National Laboratories at Los Alamos, Oak Ridge, Brookhaven, Argonne, and Livermore. The development of the giant (using nearly 18,000 vacuum tubes) ENIAC machine by John W. Mauchly and J. Presper Eckert at the University of Pennsylvania during the period 1943–1946 and the EDVAC computer, for which conceptual design was completed in 1946 but which was not fully operational until 1952, led the way. It is perhaps no coincidence that von Neumann and Metropolis were heavily involved in both the development of modern computing machines and the Monte Carlo method as a practical numerical method. As a result of this dual evolution of science and technology, the same period was marked by a dramatic increase in the levels of sophistication of computations in support of the nuclear energy program, including the design and development of reactors for peacetime uses. The commercial development of nuclear power-generating plants provided added incentives for acceleration of this effort. As a result, Monte Carlo methods began to find greater use as a design tool and as a partial replacement for the much more expensive (and risky!) criticality experiments that were needed to validate the various nuclear designs. An unfortunate consequence of the fact that much of the work done was classified during this early period is that publication occurred mainly in classified government reports rather than in the open literature. This undoubtedly prevented many important ideas from becoming known to a much wider audience sooner. Indeed, many of these reports, such as the seminal work of Herman Kahn [17, 18], were never republished.

3.4 Generating Sequences

It seems appropriate to begin our review of this topic with an often cited quotation due to R.R. Coveyou [19] – "Random number generation is too important to be left to chance." Indeed, this statement, which was certainly valid in 1969 when a good deal of effort was being devoted to trying to understand how to generate high quality pseudorandom numbers, is even more accurate today. This is so because there are very sophisticated new methods for generating pseudorandom sequences as well as methods for generating completely deterministic uniform sequences that serve in place of pseudorandom ones. We will deal with such deterministic sequences¹ in Section 3.4.2.

¹ Of course, even pseudorandom sequences are deterministic, a fact that has stirred some debate about whether a probabilistic analysis made any sense at all for pseudorandomly implemented Monte Carlo. This, in turn, was one of the motives for developing a mathematically rigorous analysis divorced from probability theory and based only on a notion of uniformity arising out of number-theoretic considerations.


3.4.1 Pseudorandom Sequences

Pseudorandom numbers are commonly understood to be computer substitutes for "truly random" numbers. Their function is to simulate realizations of independent, identically distributed (iid) random variables on the unit interval. Other distributions that might be needed in a stochastic simulation are then obtained by transformation methods, of which there are many (see [20] and the software package C-Rand [21, 22]). The most common early source of pseudorandom numbers was the multiplicative congruential generator

$$ \xi_{i+1} \equiv a\,\xi_i \pmod{m}, $$

originally suggested by Lehmer [23], or one incorporating an additive component,

$$ \xi_{i+1} \equiv [a\,\xi_i + b] \pmod{m}. $$

The pseudorandom numbers themselves are then defined through division by the modulus:

$$ r_i = \xi_i / m \in [0,1]. $$

When employing numbers generated by such linear congruential generators, appropriately chosen "seeds" $\xi_0$, moduli m, and multipliers a provide the means to generate reasonably high quality pseudorandom sequences, with sufficiently long periods for many problems (see [24]). Following Lehmer's original suggestion to use such sequences for simulation, substantial effort and analysis was invested in assessing their imperfections, such as the persistence of serial correlation [25, 26]. Knuth [24] devised a number of statistical tests to bolster confidence in using pseudorandom sequences for generating samples from the many distributions required during the course of a simulation experiment.

For much of the early history of Monte Carlo computations, the emphasis was on obtaining results, not on looking carefully at the theoretical underpinnings of the computations. When using linear congruential generators as a source of nearly independent and nearly uniform realizations of independent and identically distributed uniform distributions on the unit interval, it seemed sufficient to assure sufficiently long periods of pseudorandom sequences with serial correlation properties that seemed "safe." The ideal desiderata for pseudorandom sequences were maximal periods and minimal "structure." Here, the word structure is being used nearly synonymously with predictability. And indeed, recursive multiplications of large integers produce, as remainders, numbers that work surprisingly well in this regard for many Monte Carlo problem applications.

In 1967, however, George Marsaglia published his seminal paper [27] with the surprising conclusion that all multiplicative congruential generators possess a defect that makes them unsuitable for use in many applications. Furthermore, this defect cannot be eliminated by adjusting the starting values, multipliers, or moduli of the congruence. The fatal flaw revealed by Marsaglia's paper is that if successive n-tuples $(u_1, u_2, \ldots, u_n), (u_2, u_3, \ldots, u_{n+1}), \ldots$ produced by the generator are treated as points in a unit cube of dimension n, then all of these points will lie on a relatively small number of parallel hyperplanes. If one plots in two or three dimensions a set of successive pairs or triples of numbers obtained from a linear congruential generator, this hyperplane structure is readily apparent. In applications requiring as much uniformity as possible in two (or more) dimensional space, not just on each axis separately, this would be a crucial factor. For example, it is easy to imagine specific piecewise linear functions of a single variable whose integrals – estimated crudely by selecting a pair of points randomly in the unit square and counting the fraction that fall below the graph of the function – would not be estimated very accurately even after many samples were generated.

After nearly 20 years of essentially unquestioning use of such generators, the stage had been set for an explosion of effort and analysis to create improved pseudorandom sequences as well as methods that do not rely on randomness at all. After all, as John von Neumann stated, "Anyone who considers arithmetical methods of producing random numbers is, of course, in a state of sin" (quoted in [24]).

There are many excellent sources of additional material about pseudorandom numbers. Beginning with Knuth's book [24], which may be the best reference for a rigorously mathematical discussion, another more recent reference, which belongs in any basic library on the subject, is Harald Niederreiter's book [10], based on lectures he gave at a CBMS-NSF Regional Conference held at the University of Alaska, Fairbanks in 1990. In addition to these standard reference works, there are now several Internet web sites that provide not only lists of references but also links to various other sites, algorithms, and a variety of other relevant information. A good choice among these is the one maintained at the University of Salzburg at http://random.mat.sbg.ac.at.

Two important developments characterize the past 30 years of progress with respect to generating sequences. The first of these is the exhaustive, rigorous analysis of uniform pseudorandom number algorithms, including linear and nonlinear congruential generators, and the emergence of a host of other ideas for generating sequences that behave nearly randomly. The second development, which occurred more or less in parallel with the first, was much more radical. This involved the abandonment of randomness as a requirement for decision making in favor of reliance on uniformity in an appropriately defined high-dimensional space. In this latter area of research, number-theoretic methods are used for generating and analyzing optimally regular sequences. The use of such "quasi-random," as opposed to pseudorandom, numbers can, in turn, produce more rapid convergence of sample means to theoretical means for many practical problems (see Section 3.5.2).

Following the discovery by Marsaglia of the parallel hyperplanes phenomenon characteristic of linear congruential generators, it became clear that many simulations could be adversely affected by this behavior. As a result, much effort was devoted to careful theoretical analyses of the behavior and fundamental properties of uniform pseudorandom number generators and algorithms. Excellent material describing these developments can be found in [28–34] and in [10] and [35, 36]. A detailed exposition of this topic alone is beyond the scope of this chapter. The reference [28] lists 147 relevant papers, most of them published during the 10-year interval, 1985–1995.
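To make the hyperplane discussion concrete, here is a minimal sketch (mine, with RANDU chosen only as a well-known cautionary example) of a linear congruential generator and the overlapping tuples one would plot to see the structure.

    class LinearCongruential:
        """Linear congruential generator: x_{i+1} = (a*x_i + b) mod m, r_i = x_i / m."""

        def __init__(self, a, b, m, seed):
            self.a, self.b, self.m, self.x = a, b, m, seed

        def next(self):
            self.x = (self.a * self.x + self.b) % self.m
            return self.x / self.m

    def overlapping_tuples(gen, n, k):
        """Successive overlapping k-tuples (r_1,...,r_k), (r_2,...,r_{k+1}), ..."""
        r = [gen.next() for _ in range(n + k - 1)]
        return [tuple(r[i:i + k]) for i in range(n)]

    # RANDU (a = 65539, b = 0, m = 2**31) is a classical bad example: plotted in the
    # unit cube, its successive triples fall on a famously small number of planes.
    randu = LinearCongruential(a=65539, b=0, m=2**31, seed=1)
    triples = overlapping_tuples(randu, n=5000, k=3)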

However, we try to summarize the salient facts here to provide a flavor of the material to be found in the literature. Taking into account the important, indeed pivotal, role played by pseudorandom numbers in any Monte Carlo simulation, the significance of these investigations can hardly be overemphasized.

Computer algorithms for generating uniform pseudorandom numbers all yield periodic sequences. Naturally, it is desirable that these sequences have long periods, good equidistribution properties, minimal serial correlation, and as little intrinsic structure as possible. Obviously, it is also important that the algorithm be amenable to a fast computer implementation. The purpose of a pseudorandom number generator is to simulate independent, identically distributed (iid) random variables that are uniform in the unit interval. Stringing s such numbers together in sequence then simulates iid uniform random variables in $[0,1]^s$ for every integer s. Clearly, this cannot be the case for every s since there are, in any event, only finitely many numbers available from any pseudorandom generator. It remains, then, to find sensible criteria for assessing the quality of a given pseudorandom generator and to construct specific generators that satisfy these criteria. Presumably, this will provide a measure of the evenness of the distribution of successive s-tuples in all dimensions s up to some sufficiently large integer, or some carefully selected subset of dimensions, if these can be identified.

A common quantification of quality is the discrepancy between the empirical distribution (i.e., one based on sample, rather than theoretical, averages) of a point set (such as the set of all s-tuples of pseudorandom numbers) and the uniform distribution over $[0,1]^s$. This same notion of discrepancy will play a fundamental role in analyzing so-called quasi-random sequences – sequences chosen solely on the basis of their uniformity properties rather than on any probabilistically inspired ones.

Definition 1. For any N points $u_0, u_1, \ldots, u_{N-1}$ in the s-dimensional unit cube $I^s = [0,1]^s$, $s \ge 1$, their discrepancy is defined by

$$ D_N(u_0, u_1, \ldots, u_{N-1}) = \sup_J |E_N(J) - V(J)| $$

where the supremum is extended over all subintervals J of $I^s$ with one vertex at the origin, $E_N(J)$ is $N^{-1}$ times the number of points among $u_0, u_1, \ldots, u_{N-1}$ that lie in J, and $V(J)$ is the s-dimensional volume of J.

Evidently, the first study of the concept of discrepancy is in a paper of Bergstrom [37]. The term "discrepancy" was most likely coined by van der Corput, and the first intensive study of it appeared in a paper of van der Corput and Pisot [38]. The discrepancy provides the statistical test quantity for the s-dimensional Kolmogoroff test, which is a goodness-of-fit test for the empirical distribution of initial segments of a pseudorandom sequence. For fixed N and theoretically random $(u_0, u_1, \ldots, u_{N-1}) \in [0,1]^s$, the distribution of $D_N(u_0, u_1, \ldots, u_{N-1})$ is known, which gives rise to various formulas for

$$ \mathrm{Prob}\{ D_N(u_0, u_1, \ldots, u_{N-1}) \le t \}, \qquad 0 \le t \le 1. $$


For s = 1 one obtains a test of one-dimensional uniformity (equidistribution in [0,1]), while for s > 1 one obtains a test of higher dimensional uniformity (which encompasses the independence, or lack thereof, of successive pseudorandom numbers).

As we saw earlier, the classical method for generating pseudorandom numbers is the linear congruential method

$$ \xi_{i+1} \equiv [a\,\xi_i + b] \pmod{m}, $$

where m is always chosen to be a large integer and $\xi_0$, a, and b are suitably chosen integers. Algorithms based on this simple idea have been widely studied [10, 24, 34, 40–43], especially in the past 40 years. In spite of well-founded criticism of linear congruential generators stemming from the hyperplane structure they exhibit, they remain the most commonly used in practice, especially in standard computer software libraries. There are several reasons for this state of affairs:

- Linear congruential algorithms are fast. Alternative generators are essentially always considerably slower.
- Many large Monte Carlo programs that undergo periodic revision rely on reproducing output from test problems run with previous versions of the program. Introducing change in the basic pseudorandom sequence employed would destroy the reproducibility of results obtained with the earlier versions of the program.
- Those responsible for maintaining Monte Carlo programs may be unaware of the potential danger in using pseudorandom number generators that were devised many years earlier, generators that may well have outlived their usefulness.

Unfortunately, many of the "default" generators currently available in popular computer software are old and could be dangerous to use in modern applications that require much larger sample sizes than might have been needed earlier. An increasing demand for parallel or vector pseudorandom streams of numbers has aggravated the problem as well, since these often make use of subsequences of sequences that may not have sufficiently long periods to justify this subdivision. For example, consider the simple generator

$$ \xi_{i+1} \equiv [a\,\xi_i + b] \pmod{m}, \qquad r_i = \xi_i / m, $$

with the popular choices $m = 2^{31} - 1$, $a = 16{,}807$. The period length is $2^{31} - 2$, which is judged to be too small for serious applications [44, 45]. Furthermore, this generator has a lattice structure with rather large distances between the small number of hyperplanes that contain the higher dimensional s-tuples, which could easily prove to be a problem in simulation results [46, 47].


One method for controlling the lattice structure is to combine linear recursive generators [48]. For example, beginning with two such linear recurrences

$$ x_{1,n} = (a_{1,1} x_{1,n-1} + \cdots + a_{1,k} x_{1,n-k}) \pmod{m_1}, $$
$$ x_{2,n} = (a_{2,1} x_{2,n-1} + \cdots + a_{2,k} x_{2,n-k}) \pmod{m_2}, $$

where k, the $m_j$'s, and the $a_{i,j}$'s are fixed integers, one can define the pseudorandom numbers by

$$ r_n = (x_{1,n}/m_1 - x_{2,n}/m_2) \pmod{1}. $$

This is an example of a combined linear multiple recursive generator. At step n this generator produces the 2k-dimensional vector $(x_{1,n}, \ldots, x_{1,n-k+1}, x_{2,n}, \ldots, x_{2,n-k+1})$, whose first k components lie in $\{0, 1, \ldots, m_1 - 1\}$ and whose last k components lie in $\{0, 1, \ldots, m_2 - 1\}$. It can be shown [24, 44] that this generator gives good coverage of the unit hypercube $[0,1]^s$ for all dimensions $s \le k$. For dimensions higher than that there is the usual lattice structure, but parameters can be chosen that make the distance between the parallel hyperplanes of this lattice quite small. L'Ecuyer [47] recommends parameter choices that produce a generator with two main cycles of length $2^{192}$ each and whose lattice structure in dimensions up to 48 has been found to be excellent. This would seem to make it a good choice for many applications.

The lattice structure that Marsaglia [27] discovered plagues every linear congruential generator. Inversive congruential generators (introduced by Eichenauer and Lehn [48] in 1986) were designed to overcome this difficulty. By analogy with linear congruential methods, inversive congruential sequences are defined by the nonlinear congruence

$$ \xi_{i+1} \equiv [a\,\bar{\xi}_i + b] \pmod{p} \quad \text{with} \quad \xi_i \bar{\xi}_i \equiv 1 \pmod{p}, $$

and

$$ r_i = \xi_i / p \in [0,1]. $$

In these equations, p is a prime modulus, a is a multiplier, b is an additive term, and $\xi_0$ is a starting value. From these definitions it follows that the $\xi_i$ take values in the set $\{0, 1, \ldots, p-1\}$. We denote this generator by ICG$(p, a, b, \xi_0)$. A key feature of the ICG with prime modulus is the absence of any lattice structure, in sharp contrast to linear congruential generators. Figure 3.1 is a plot of pairs of consecutive pseudorandom numbers $(r_n, r_{n+1})$ generated by ICG$(2^{31} - 1, 1288490188, 1, 0)$, concentrated in a region of the unit square near the point (0.5, 0.5). The extra inversion step dramatically reduces the effect of the parallel hyperplanes phenomenon. For example, an inversive congruential algorithm modulo $p = 2^{31} - 1$ passes a rather stringent s-dimensional lattice test for all dimensions $s \le 2^{30}$, whereas, for the linear congruential algorithm with this same modulus, it is difficult to guarantee a nearly optimal lattice structure for $s \ge 10$ [10]. Inversive congruential algorithms also display better behavior for a rather severe test of serial correlation. They also enjoy robustness with respect to the choice of parameters. These algorithms are promising candidates for parallelization because, unlike linear congruential generators, they do not have long-range correlation problems. The only downside to their use is that they are substantially costlier to produce (by approximately a factor of 8) than their linear congruential counterparts because of the relative costliness of computing multiplicative inverses in modular arithmetic.

Fig. 3.1 Points generated by an inversive congruential generator

In addition to these linear and inversive congruential generators for uniform pseudorandom numbers, several other classes have been studied – some quite extensively – in an attempt to circumvent the problems perceived with the classical ones. Of course, no single generator can prove to be ideal for all applications. Any single generator can satisfy only a finite number of randomness tests. This, along with the finiteness of the set of computer numbers, means that a test can always be devised that cannot be passed by a specific generator. It has been said that pseudorandom number generators are like antibiotics in that respect: no one is appropriate for all tasks. What is needed is an arsenal of possible choices with distinct properties. If two very different generators yield the same outcome in a simulation, then additional confidence is gained in the result.

In addition to the generator families mentioned above, there are lagged-Fibonacci, generalized feedback shift register, matrix, Tausworthe, and other classes of generators, and combinations of these. There are also various nonlinear generators. Literally thousands of publications dealing with these topics have appeared in print in the last 30 years. In [49], L'Ecuyer suggests the following generator families as having the essential requirements of good theoretical support, extensive testing, and ease of use: the Mersenne twister [50], the combined multiple recursive generators of L'Ecuyer [47], the combined linear congruential generators of L'Ecuyer and Andres [51], and the combined Tausworthe generators of L'Ecuyer [52]. More information can be found on the web pages:

http://www.iro.umontreal.ca/~lecuyer
http://random.mat.sbg.ac.at
http://cg.scs.carleton.ca/~luc/rng.html
http://www.robertnz.net/
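A minimal sketch of the inversive congruential recursion described above, using Python's built-in modular inverse; the parameters are the ones quoted for Fig. 3.1, the helper names are mine, and the usual convention that 0 is treated as its own inverse is assumed.

    def icg(p, a, b, x0, n):
        """First n values r_i of the inversive congruential generator ICG(p, a, b, x0):
        x_{i+1} = (a * inverse(x_i) + b) mod p, with inverse(0) taken to be 0."""
        x, out = x0, []
        for _ in range(n):
            inv = pow(x, -1, p) if x != 0 else 0   # modular inverse (Python 3.8+)
            x = (a * inv + b) % p
            out.append(x / p)
        return out

    # The parameters quoted in the text for Fig. 3.1.
    values = icg(p=2**31 - 1, a=1288490188, b=1, x0=0, n=1000)
    pairs = list(zip(values[:-1], values[1:]))     # consecutive pairs (r_n, r_{n+1})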


3.4.2 Quasirandom Sequences

In the past 35 years or so there has been a surge in the number of publications (see, e.g., [10, 53–58]) that recommend the use of quasi-random sequences (i.e., sequences more regular than pseudorandom ones) in place of pseudorandom sequences for difficult problems. There is no universally accepted, rigorous definition of the term "quasi-random sequence," but it has come to mean sequences with low discrepancy (see the definition of discrepancy, Definition 1 of Section 3.4.1). Indeed, the terms quasi-random and low-discrepancy sequences are frequently used synonymously. These quasi-Monte Carlo methods, as they are sometimes called, have recently become the methods of choice for many problems involving financial modeling [59–62], radiosity and global illumination [54, 55, 63, 64], and other applications. Quasi-random methods offer the potential for improved asymptotic (i.e., for sufficiently large sample size) convergence rates when compared with pseudorandom methods and have performed even better than can easily be explained by existing theory in many applications. Additionally, deterministic rather than statistical error analysis can be applied to their use, even though sharp error bounds are not easily obtained. Consequently, a sizeable research effort is presently devoted to obtaining a deeper understanding of the potential of quasi-Monte Carlo methods.

In spite of this potential, most traditional nuclear applications continue to depend upon Monte Carlo programs, such as MCNP [15], that rely on pseudorandom sequences. An important reason for this is that a completely different (and more complex) error analysis must be applied for simulations based on quasi-random sequences. Also, the generation of quasi-random sequences is costlier, in general, than that for pseudorandom sequences. In return for the extra computation, quasi-random sequences offer the prospect of accelerated convergence rates when compared with pseudorandom Monte Carlo. There is sufficient complexity involved in deciding which method might be best to use for a given problem, however, that no hard and fast rule can be applied. In any case, the development of simulation methods based on the use of quasi-random sequences represents, in our view, one of the most important lines of Monte Carlo research since 1968.

As stated above, the term low-discrepancy sequences is normally used in place of quasi-random sequences to characterize the highly regular sequences used in quasi-Monte Carlo implementations. The definition of discrepancy (Section 3.4.1) provides a quantitative measure of the regularity of a sequence of s-dimensional points. For the case s = 1 we are concerned with either a finite set $u_0, u_1, \ldots, u_{N-1}$ or with the first N members of an infinite sequence $u_0, u_1, \ldots$ of points drawn from the unit interval [0,1]. In this case the discrepancy reduces to

$$ D_N(u_0, u_1, \ldots, u_{N-1}) = \sup_J |E_N(J) - V(J)|, $$

where the supremum is extended over all subintervals J of [0,1] with one vertex at the origin, $E_N(J)$ is $N^{-1}$ times the number of points among $u_0, u_1, \ldots, u_{N-1}$ that lie in J, and $V(J)$ is simply the length of the interval J.
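For this one-dimensional case the supremum can be evaluated exactly from the sorted points. The following sketch (my own, using the standard closed-form expression for this quantity) computes it for a finite point set.

    def star_discrepancy_1d(points):
        """Exact D_N for intervals [0, t) anchored at the origin, via the standard
        closed form D_N = max_i max( i/N - x_(i), x_(i) - (i-1)/N ) on sorted points."""
        xs = sorted(points)
        n = len(xs)
        d = 0.0
        for i, x in enumerate(xs, start=1):
            d = max(d, i / n - x, x - (i - 1) / n)
        return d

    # Evenly spread points have much smaller discrepancy than clustered ones.
    print(star_discrepancy_1d([0.1, 0.3, 0.5, 0.7, 0.9]))    # 0.1
    print(star_discrepancy_1d([0.1, 0.15, 0.2, 0.25, 0.3]))  # 0.7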


A useful example of a low-discrepancy infinite sequence in [0,1) is the Van der Corput sequence [65], which is defined by

$$ \phi_2(n) = \sum_{j=0}^{N} a_j(n)\, 2^{-j-1}, \qquad (3.1) $$

where

$$ n = \sum_{j=0}^{N} a_j(n)\, 2^{j}. \qquad (3.2) $$

These formulas produce $\phi_2(1) = 1/2$, $\phi_2(2) = 1/4$, $\phi_2(3) = 3/4$, $\phi_2(4) = 1/8$, $\phi_2(5) = 5/8, \ldots$, and the numbers $\{\phi_2(n)\}_{n=1}^{\infty}$ systematically run through the multiples of $2^{-k}$ without duplicating any that arose earlier. Such numbers are much more uniformly distributed in the unit interval than are pseudorandom numbers. In similar fashion, one can define the radical inverse function for any number base b by

$$ \phi_b(n) = \sum_{j=0}^{N} a_j(n)\, b^{-j-1}; \qquad (3.3) $$

it enjoys properties very similar to the b = 2 case when b is a prime larger than 2. That is, $\{\phi_b(n)\}_{n=1}^{\infty}$ systematically runs through the multiples of $b^{-k}$ without duplication for any prime b.

The Halton sequence [66] is an infinite, s-dimensional, low-discrepancy sequence defined by $\{\phi_{b_1}(n), \phi_{b_2}(n), \ldots, \phi_{b_s}(n)\}$, where $b_1, b_2, \ldots, b_s$ are relatively prime in pairs (e.g., the first s primes). It is useful for generating very uniform s-dimensional vectors, as when random walks in an s-dimensional phase space are required. In Fig. 3.2, a visual comparison is made of 2,000 pseudorandom pairs (left) with 2,000 Halton pairs (right).

Another family of low-discrepancy sequences, called lattices, has been especially useful for integrating periodic functions. Their ancestor is the number-theoretic method of good lattice points, developed by Korobov [67] and Hlawka [68] for the approximate evaluation of integrals over $I^s = [0,1]^s$ under the assumption that the integrand is 1-periodic in each variable. Lattice methods, or lattice rules, generalize and extend this early work, making use of algebraic, rather than number-theoretic, principles and techniques. Excellent references for lattice rules are the books [10, 69, 70]. Many other low-discrepancy sequences have been used in various quasi-Monte Carlo simulations. The reader is referred to [10] for a more thorough treatment of this general topic.


Fig. 3.2 Visual comparison of pseudorandom (left) and quasi-random (right) sequences
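A compact sketch (mine, written directly from Eqs. 3.1–3.3 above) of the radical inverse function and the Halton sequence built from it:

    def radical_inverse(n, base):
        """phi_b(n): reflect the base-b digits of n about the radix point (Eq. 3.3)."""
        value, inv_base = 0.0, 1.0 / base
        while n > 0:
            n, digit = divmod(n, base)
            value += digit * inv_base
            inv_base /= base
        return value

    def halton(n, bases=(2, 3, 5)):
        """n-th point of the s-dimensional Halton sequence, s = len(bases)."""
        return tuple(radical_inverse(n, b) for b in bases)

    print([radical_inverse(i, 2) for i in range(1, 6)])   # [0.5, 0.25, 0.75, 0.125, 0.625]
    print([halton(i) for i in range(1, 4)])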

3.4.3 Hybrid Sequences

Hybrid sequences are meant to combine the best features of both pseudorandom sequences (convergence rate is independent of the problem dimension) and quasi-random sequences (asymptotic rate of convergence is faster than $N^{-1/2}$ but weakly dependent on dimension). Ideas for generating hybrid sequences rely, in general, on combining both random and quasi-random elements in a single sequence. For example, randomly scrambling the elements of a low-discrepancy sequence, or restricting the use of the low-discrepancy component to a lower dimensional portion of the problem and filling out the remaining dimensions ("padding") with pseudorandom sequence elements, can be effective strategies. Thus, Spanier [58] introduces both a "scrambled" and a "mixed" sequence based on these ideas, Owen [71] describes a method for scrambling certain low-discrepancy sequences called nets [10], Faure [72] describes a method for scrambling the Halton sequence to achieve lowered discrepancy, Wang and Hickernell [73] randomize Halton sequences, and Moskowitz [74], Coulibaly and Lecot [75], and Morokoff and Caflisch [56] present various methods for renumbering the components of a low-discrepancy sequence – in effect, introducing randomness somewhat differently into the sequence. Okten [76, 77] has introduced a generalization of Spanier's mixed sequence and Moskowitz's renumbering method and has also indicated how error estimation can be performed when using such sequences. Because of its generality, we describe Okten's ideas briefly here.


In [77], Okten provides the following:

Definition 2. Let $\pi = \{i_1, \ldots, i_d\}$ $(i_1 < \cdots < i_d)$ be a subset of the index set $\{1, \ldots, s\}$. For a given d-dimensional sequence $\{q_n\}_{n=1}^{\infty}$, a mixed (s, d) sequence is an s-dimensional sequence $\{m_n\}_{n=1}^{\infty}$ $(s \ge d)$ such that $m_n^{i_k} = q_n^{k}$, $k = 1, \ldots, d$, and all other components of $m_n$ (i.e., $m_n^{i}$ for $i \in \{1, \ldots, s\} - \pi$) come from independent realizations of a random variable uniformly distributed on $[0,1]^{s-d}$.

This definition is useful inasmuch as it specializes to a number of interesting sequences introduced by other authors earlier. For example, Spanier's mixed sequence corresponds to the choices $\pi = \{1, \ldots, d\}$ with $s = \infty$; i.e., it is a mixed $(\infty, d)$ sequence that can be used for either high-dimensional integration or random walk problems. Also, the continuation method introduced in [78] (see also [71]) amounts to using a mixed (s, d) sequence with $\pi = \{s-d+1, s-d+2, \ldots, s\}$. Furthermore, the Spanier mixed sequence obviously specializes to an ordinary pseudorandom sequence when d = 0 and $\pi$ is empty, while if $\pi = \{1, \ldots, d\}$ and s = d, the resulting mixed (d, d) sequence is clearly completely deterministic with no random components at all.

Okten [79] also introduces a new family of hybrid sequences obtained by random sampling from a universe consisting of low-discrepancy sequences of the appropriate dimension for the problem. This idea permits conventional statistical analyses to be performed on the resulting estimates, and it therefore attempts to overcome one of the major drawbacks of using low-discrepancy sequences: the unavailability of an effective and convenient error analysis.

Hybrid sequences are designed to produce good results for general problems with dimensions s that are too large for pure low-discrepancy sequences to be effective. The dimension that defines this threshold depends upon the details of the problem. For example, a number of problems arising in financial modeling involve integrations over 360 dimensions and have been successfully accomplished with purely low-discrepancy sequences, whereas it is not difficult to compose integrands of only 20 variables for which the use of pseudorandom sequences provides better results than when a 20-dimensional low-discrepancy sequence is used. This disparity exists because, in the 360-dimensional case, the integrand function does not depend strongly on all of its 360 variables: it is much more affected by fluctuations in only a handful of the variables while behaving very smoothly with respect to the others. In other words, the partial derivatives of the integrand are all quite large in the case of the 20-dimensional integrand function, while in the 360-dimensional problem cited, only a few of the partial derivatives are large and the remaining ones are much smaller.

Hybrid sequences should, therefore, be useful for s-dimensional integration of arbitrary functions or for random walk problems whose Neumann series converge rather slowly. But one should not overlook the possibility, to which we alluded in the previous paragraph, that for certain integrands or random walk problems, special features of that problem might suggest the use of special sequences designed to take advantage of additional information about the problem. For example, if the s-dimensional integrand is, in fact, independent of several of the s variables, the properties of the hybrid sequence with respect to these variables become much less important. More generally, if an s-dimensional integrand exhibits diminished dependence on some subset of the variables, it makes sense to design a hybrid sequence that takes advantage of that information. Similar special considerations would apply in the case of certain random walk problems also.

Based on this sort of reasoning, quite recently some authors have focused attention on restricting the class of integrands treated by each method in an attempt to explain why some sequences perform surprisingly well for certain problems. Interest in pursuing this point of view might have been piqued by provocative results reported when purely quasi-random sequences were used to estimate some very high-dimensional integrals arising in financial applications [80, 81]. This has led to a rash of publications in which the sensitivity of an integrand function with respect to its independent variables and/or parameters is studied [82–87].
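A minimal sketch of a mixed (s, d) sequence in the spirit of Definition 2, with a Halton sequence standing in for the d-dimensional quasi-random sequence $\{q_n\}$; the index handling (0-based) and function names are mine, not Okten's.

    import random

    def radical_inverse(n, base):
        value, inv_base = 0.0, 1.0 / base
        while n > 0:
            n, digit = divmod(n, base)
            value += digit * inv_base
            inv_base /= base
        return value

    def mixed_point(n, s, quasi_indices, bases, rng):
        """n-th point of a mixed (s, d) sequence: low-discrepancy (Halton) coordinates
        at the positions in quasi_indices, independent pseudorandom padding elsewhere."""
        point = [rng.random() for _ in range(s)]
        for k, i in enumerate(sorted(quasi_indices)):
            point[i] = radical_inverse(n, bases[k])
        return tuple(point)

    rng = random.Random(7)
    # A mixed (5, 2) sequence whose quasi-random components sit in coordinates 0 and 3.
    points = [mixed_point(n, s=5, quasi_indices=(0, 3), bases=(2, 3), rng=rng)
              for n in range(1, 6)]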

3.4.4 State of the Art

The idea of replacing pseudorandom sequences in Monte Carlo programs by sequences that are more regular, although correlated, has had a profound effect on the field in the past 40 years. Although use of pseudorandom sequences still dominates the more traditional applications involving neutron and charged particle transport, quasi-random sequences are used increasingly in the newer application areas. Thus, low-discrepancy or hybrid sequences are used almost routinely for financial modeling, global illumination, and other problems for which Monte Carlo methods have recently been shown to be useful. This trend has also dramatically influenced the amount and kind of research needed to analyze errors and to develop error reduction strategies, which is described in Section 3.5.

3.5 Error Analysis

3.5.1 The Pseudorandom Case

Throughout the first 20 or more years (1942–1962) of the exciting period during which digital computers and Monte Carlo methods developed rapidly, only minimal information was available from most of the computer programs employing simulation. At that time, the primary output consisted of one or more sample means

$$ m_N = \frac{1}{N} \sum_{i=1}^{N} \xi_i $$

of an estimating random variable, $\xi$, together with estimates of the sample standard deviation,

$$ s = \left\{ \frac{1}{N-1} \left[ \sum_{i=1}^{N} \xi_i^2 - N \left( \frac{1}{N} \sum_{i=1}^{N} \xi_i \right)^{2} \right] \right\}^{1/2}. $$

The estimated variance, $s^2/N$, of the sample mean, $m_N$, then establishes the very slow $O(N^{-1/2})$ convergence of the Monte Carlo error (based on the standard deviation $\sqrt{s^2/N}$) to zero for pseudorandom Monte Carlo implementations. The central limit theorem states that the sample mean $m_N$ is approximately normally distributed for large sample sizes N and that this normal distribution has as its true mean the expected value of the random variable and as its variance $\sigma^2/N$, where $\sigma^2$ is the population variance. It is the assumption of asymptotic normality that permits the derivation of various confidence intervals that, in turn, provide the foundation for assigning precision to each Monte Carlo result. Quite often it is the estimate of the relative error

$$ R = \frac{s/N^{1/2}}{m_N} \qquad (3.4) $$

that is supplied with each estimated mean. It is common to use the size of R as a mechanism for interpreting the quality of the Monte Carlo estimates. Small values of R indicate high precision, whereas values near 1 suggest sample means that are suspect.

The use of higher-order statistical quantities and other statistical tests has been recommended in conjunction with some Monte Carlo programs. For example, the program MCNP [15] estimates, in addition to tally means and variances, the variance of the variance (VOV), and the users' manual contains guidelines for interpreting these measures of precision, along with a number of other test quantities provided routinely with MCNP output.

Partly driven by the frustration caused by the slowness of the $O(N^{-1/2})$ convergence rate of pseudorandom Monte Carlo, the development of Monte Carlo methods that abandoned the probability theory model and the central limit theorem for error analysis was undertaken. Of course, this necessitated the construction of a deterministic model and error analysis. Radically different error analyses must be applied when simulations are based on quasi-random or hybrid generating sequences.
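These summary quantities are computed directly from the tallies; here is a small sketch (mine) of the sample mean $m_N$, the sample standard deviation s in the equivalent centered form, and the relative error R of Eq. 3.4.

    import math

    def mc_summary(samples):
        """Sample mean m_N, sample standard deviation s (the N-1 form used above),
        and relative error R = (s / sqrt(N)) / m_N from Eq. 3.4."""
        n = len(samples)
        mean = sum(samples) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        rel_err = (s / math.sqrt(n)) / mean if mean != 0 else float("inf")
        return mean, s, rel_err

    print(mc_summary([2.7, 3.1, 2.5, 2.9, 2.8, 2.6, 3.0, 2.7]))   # illustrative data only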

3.5.2 The Quasi-random Case

When quasi-random sequences are used as generators, deterministic error bounds are available. The key ingredients of the quasi-Monte Carlo theory can be illustrated using simple one-dimensional integration.


Definition 3. For a given sequence $Q = \{x_1, x_2, \ldots\} \subset [0,1)$, and any t, $0 \le t \le 1$, first define a counting function

$$ A([0,t); N; Q) = \text{number of } x_i,\ 1 \le i \le N,\ \text{with } x_i \in [0,t). \qquad (3.5) $$

Using this notion, the discrepancy of the sequence Q becomes

$$ D_N(Q) = \sup_{0 \le t \le 1} |L_N([0,t))| \qquad (3.6) $$

where

$$ L_N([0,t)) = \frac{A([0,t); N; Q)}{N} - t. \qquad (3.7) $$

The discrepancy $D_N(Q)$ plays a key role in bounding the error that results when estimating the theoretical mean of a random variable by its sample average. For finite-dimensional integrals, quasi-Monte Carlo methods replace probability with asymptotic frequency:

$$ \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(x_i) = \int_{I^s} f(x)\, dx \qquad (3.8) $$

for a reasonable class of f. Equation 3.8 means, then, that the sequence $x_1, x_2, \ldots$ produces convergent sums for the estimation of integrals of functions in the given class. Such sequences are said to be uniformly distributed in $I^s$, a condition that clearly has nothing to do with randomness. For integral equations, the analogous condition is

$$ \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \xi(\omega_i) = \int_{\Omega} \xi\, d\mu \qquad (3.9) $$

and $\xi$ (which here ordinarily represents an estimator of a weighted integral $\int g(x) \Psi(x)\, dx$ of the solution $\Psi(x)$ of the integral equation) must satisfy mild smoothness restrictions (and Eq. 3.9 then defines the $\xi$-uniformity of $\omega_1, \omega_2, \ldots, \omega_N$ in $\Omega$). The idea of $\xi$-uniformity was introduced by Chelson [88], who showed that replacing pseudorandom sequences by appropriately chosen uniformly distributed sequences produces $\xi$-uniformity in $\Omega$. This is the critical result needed to ensure that quasi-random sequences can be used to provide asymptotically valid (as $N \to \infty$) estimating sums for solutions of integral equations. Chelson's construction, modified slightly in [89], is simply to sample the usual one-dimensional conditional probability density functions derived from the source and kernel of the transport equation by using low-discrepancy sequences, rather than pseudorandom ones. However, if one were simply to use a one-dimensional low-discrepancy sequence, such as the van der Corput sequence, for all of the decisions needed to generate the random walks $\omega_1, \omega_2, \ldots$, a little thought shows that the random walks would not necessarily satisfy the Markov property. In other words, in switching from pseudorandom sequences that are approximately uniformly and independently distributed in the unit interval to low-discrepancy sequences that are very uniformly distributed but obviously serially "correlated," the required condition Eq. 3.9 may be lost.


The way around this predicament is to use sequences that are uniform (in this new, deterministic way) in a unit cube $I^s$ of sufficiently high dimension s to suffice for generating all collisions of every random walk. The fact that there is no a priori upper bound for such a dimension s in the case of integral equations means that the sequence used must be uniform over the infinite-dimensional unit cube. A sequence such as the Halton sequence would suffice, for example, and this is the sequence that Chelson employed in [88].

These ideas can be illustrated using the simple, one-dimensional random walk problem introduced in Section 3.1. We first generated ten random walks using a conventional pseudorandom number generator to make the required decisions, and computed the sample mean, $m_{10}^{(1)}$. The results of that simulation are listed in the table below:

Particle   1   2   3   4   5   6   7   8   9   10
X          5   2   2   2   5   5   3   3   6    3

$$ m_{10}^{(1)} = 3.6, \qquad |m_{10}^{(1)} - E[X]| = |3.6 - e| \approx 0.9, \qquad \frac{\sigma}{\sqrt{10}} = \frac{0.87509}{3.162} \approx 0.28. $$

Here, $\sigma = \sqrt{\sigma^2}$ is the population standard deviation. These results show that we had rather bad luck with our sample of ten particles. If we want a 95% probability of a relative error no worse than 1/100, using simple probabilistic arguments we would need to take $n \ge 3.84\,(100)^2 (0.77/2.72)^2 \approx 3{,}100$ random walk samples.

Next, we used the Van der Corput sequence instead of pseudorandom numbers to select step sizes for the random walks. That produces, for n = 10 particles:

Particle   1   2   3   4   5   6   7   8   9   10
X          3   3   3   2   3   3   2   3   3    2

$$ m_{10}^{(2)} = 2.7, \qquad |m_{10}^{(2)} - E[X]| = |2.7 - e| \approx 0.0183. $$

It appears as though we have improved our estimate of e. However, continuing the process (i.e., incorporating the results of additional particle histories generated in this way) does not improve the estimate further. In fact, this particular quasi-random sequence produces estimates that converge (rapidly!) to 2 2/3 rather than to e. Obviously, great care is needed in implementing quasi-Monte Carlo methods, even for such simple problems.

The difficulty here is that the correlation which is intrinsic in the Van der Corput sequence has defeated the Markov property in the execution of particle random walks and has, therefore, not provided a faithful simulation of the underlying physical process. If we are careful to restore a kind of statistical independence in using quasi-random numbers, which is accomplished here by using components of a higher dimensional quasi-random vector sequence (in fact, the Halton sequence can again be used) to generate the individual steps of each random walk, we can indeed improve on the use of pseudorandom sequences. When this was executed for model problem 1, we found:

Particle   1   2   3   4   5   6   7   8   9   10
X          3   3   3   3   2   7   2   3   3    3

$$ m_{10}^{(3)} = 3.2, \qquad |m_{10}^{(3)} - E[X]| = |3.2 - e| \approx 0.4818. $$

This provides a better estimate than the one obtained using pseudorandom numbers, and the advantage in employing the quasi-random sequence in place of the pseudorandom one can be shown to increase as the number of samples grows.

The basis for analyzing error when using quasi-Monte Carlo methods is the Koksma–Hlawka inequality [90, 91]. It provides a method for bounding the difference between an integral and an average of integrand values. In one variable, this result takes the following form:

Theorem 1. Let $Q = \{x_1, x_2, \ldots\} \subset [0,1)$ and let f be a function of bounded variation on [0,1]. Then

$$ |\delta_N(f)| = \left| \frac{1}{N} \sum_{i=1}^{N} f(x_i) - \int_0^1 f(t)\, dt \right| \le D_N(Q)\, V(f) \qquad (3.10) $$

where $D_N(Q)$ is the discrepancy of Q and $V(f)$ is the total variation of f.

The Koksma–Hlawka inequality has been extended to functions of many variables and provides a rigorous (deterministic) upper bound for the difference between a Monte Carlo-type sum and an integral of a function f of bounded variation. It can be applied to any set of points $x_1, \ldots, x_N$ that lie in the domain of the function f. This idea of bounding such differences by a product of two factors, one of which describes the "smoothness" of the integrand and the other the uniformity of the set of points used in the evaluation, has been generalized by making use of the theory of reproducing kernel Hilbert spaces. In this more general setting, the space of integrands is regarded as a reproducing kernel Hilbert space, and the resulting inequality makes use of the norm in this space to characterize the smoothness of the integrand, while the factor that replaces the discrepancy $D_N$ measures the uniformity of the point set in a way that generalizes the classical definition of discrepancy. The interested reader should examine [93] and references cited therein.
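To make the third experiment above concrete, here is a sketch (my illustration, not the author's code) in which the k-th step of walker n is taken from the k-th coordinate of the n-th Halton point, so that different steps use different pairwise coprime bases and a kind of statistical independence between steps is restored, as described in the text.

    def radical_inverse(n, base):
        value, inv_base = 0.0, 1.0 / base
        while n > 0:
            n, digit = divmod(n, base)
            value += digit * inv_base
            inv_base /= base
        return value

    # One base per step; in this problem walks essentially never need this many steps.
    PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)

    def halton_walk_steps(n):
        """Steps for walker n to leave the unit slab when its k-th step size is the
        k-th coordinate of the n-th Halton point, phi_{p_k}(n)."""
        position, steps = 0.0, 0
        while position <= 1.0:
            if steps >= len(PRIMES):
                raise RuntimeError("extend PRIMES for unusually long walks")
            position += radical_inverse(n, PRIMES[steps])
            steps += 1
        return steps

    n_walkers = 10_000
    estimate = sum(halton_walk_steps(n) for n in range(1, n_walkers + 1)) / n_walkers
    print(estimate)   # should drift toward E[X] = e as the number of walkers grows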

A Koksma–Hlawka-type inequality has been established for the transport equation [88, 89, 93], which is shown in these references to be an infinite-dimensional extension of the finite-dimensional quadrature problem. That is,

$$ |\delta_N(\xi)| \le \hat{C}\, D_N \qquad (3.11) $$

where $\delta_N(\xi) \equiv \frac{1}{N} \sum_{i=1}^{N} \xi(\omega_i) - \int_{\Omega} \xi\, d\mu$. The constant $\hat{C}$ in Eq. 3.11 measures the variation of $\xi$; reductions in $\hat{C}$ can be accomplished by importance sampling and similar conventional variance reduction mechanisms, as described in [88, 93]. But improvements in the rate of convergence as $N \to \infty$ based on the Koksma–Hlawka inequality can be obtained only as a result of the rate of decrease of the factor $D_N$. The key question then becomes: How rapidly can $D_N$ converge to 0 for arbitrary sequences or point sets? The answer is widely believed to be $D_N = O[(\log N)^S / N]$, where S is the "effective" dimension of the problem (though slightly more rapid convergence cannot yet be ruled out theoretically).

Integral equations are really infinite-dimensional problems in the sense that, in general, no a priori upper bound exists for the number of steps in a random walk. However, a finite effective dimension of such a transport problem might be given by the product of the average number of steps and the dimension of the physical phase space for the underlying transport model. In this case, the effective dimension is essentially the average number of decisions needed to simulate a random walk in the transport process being modeled. Sequences in s dimensions whose discrepancies are $O[(\log N)^s / N]$ are the ones typically used in quasi-Monte Carlo implementations, and these are often referred to as low-discrepancy sequences. A very general family of low-discrepancy sequences are the (t,s)-sequences [10, 94, 95]. These sequences generalize to arbitrary base previous constructions for base 2 [96] and any prime base [97, 98], and are generally believed to possess the lowest discrepancies of any known sequences. Accordingly, these sequences are in wide use for quasi-Monte Carlo implementations.

Let S denote either the actual dimension of a multidimensional integral to be estimated or the effective dimension of a discrete or continuous random walk problem to be simulated, as defined, for example, above. For fixed S, quasi-Monte Carlo methods will be superior to pseudorandom methods as $N \to \infty$ because of the inequality (3.11) and because $(\log N)^S / N$ becomes much smaller than $N^{-1/2}$ in this limit. However, for practical values of N and moderate values of S, such a comparison may well favor the pseudorandom convergence rate. In fact, when S is only 3, N must be larger than $10^7$ for quasi-random sampling to be expected to improve upon pseudorandom sampling based on the error upper bounds expressed in (3.10) or (3.11). There is also some evidence in support of the conjecture that N must be exponential in S before the advantages of quasi-Monte Carlo methods over conventional Monte Carlo using pseudorandom sequences can be realized [56].


over conventional Monte Carlo using pseudorandom sequences can be realized [56]. This means that, for all practical purposes, $S$ cannot be too large. It is when $S$ is too large that hybrid sequences have often been used successfully.

Another problem with the use of the Koksma–Hlawka inequality to bound quasi-Monte Carlo errors is that the individual terms $V(f)$, $D_N$ appearing on the right side of (3.10) (or the terms $\hat{C}$, $D_N$ on the right side of (3.11)) are difficult to estimate. This creates another argument in favor of hybrid sequences, especially those designed with components of randomness that enable a statistical analysis of the error, bypassing the Koksma–Hlawka inequality. Even with these caveats against the routine use of purely quasi-random sequences for estimating high-dimensional integrals or solutions of transport problems, they have been amazingly successful in certain situations in which the Koksma–Hlawka bounds suggest they should not be. Examples of this sort, arising, for example, in stochastic financial modeling, have inspired a good deal of research aimed at obtaining a deeper understanding of the pros and cons of the use of quasi-random sequences.

Figure 3.3 compares the theoretical asymptotic convergence rates associated with pseudorandom Monte Carlo and quasi-random Monte Carlo implemented in two dimensions ($S = 2$). The line with slope $-1$ is shown since it provides a theoretical optimum (at least asymptotically) for quasi-Monte Carlo implementations whose error analysis is based on the Koksma–Hlawka inequality. While this graph is instructive for values of the sample size $N$ that are sufficiently large, for practical values of $N$ there is no assurance that quasi-random errors will be smaller than

[Figure 3.3: log–log plot of error versus N, the number of random samples, showing the pseudorandom convergence rate, the quasi-random (S = 2) rate, and a reference line with slope −1.]

Fig. 3.3 Comparison of pseudorandom, quasi-random (S = 1, 2) convergence rates


pseudorandom ones, as discussed above. Notice that the quasi-random ($S = 2$) and pseudorandom lines cross for modest values of $N$.

We have alluded earlier to the fact that good lattice point methods, or their extensions to lattice rules, can provide extremely good results (i.e., estimates of finite-dimensional integrals) when the integrand is a periodic function of each of its variables. In fact, the unusual effectiveness of the simple trapezoidal rule when applied to periodic functions, which has been well known for some time, is an illustration of this phenomenon. The explanation lies in the fact that the analysis of the error made in applying lattice rules to periodic functions is quite different from both of the error analyses discussed so far: statistically based for pseudorandom Monte Carlo or for some hybrid methods, and based on the Koksma–Hlawka inequality for quasi-Monte Carlo methods. The usual error analysis for lattice methods applied to periodic integrands instead makes use of number-theoretic estimates of certain exponential sums. The interested reader might consult Chapter 5 of [10]. Any elaboration here would take us much too far afield.
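As a concrete illustration of these competing rates, the following minimal Python sketch compares plain pseudorandom sampling with a two-dimensional Halton low-discrepancy sequence for a smooth integrand on the unit square; the integrand $e^{x+y}$, the Halton bases 2 and 3, and the sample sizes are illustrative assumptions only. For smooth, low-dimensional problems of this kind the quasi-random error typically decays at nearly the $O[(\log N)^S/N]$ rate and is noticeably smaller than the pseudorandom $O(N^{-1/2})$ error, in the spirit of Fig. 3.3, although, as emphasized above, this advantage erodes as $S$ grows.

```python
import numpy as np

def radical_inverse(n, base):
    """Van der Corput radical inverse of the integer n in the given base."""
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def halton_2d(n_points):
    """First n_points terms of the 2-D Halton sequence (bases 2 and 3)."""
    return np.array([[radical_inverse(i, 2), radical_inverse(i, 3)]
                     for i in range(1, n_points + 1)])

f = lambda x, y: np.exp(x + y)        # smooth test integrand on [0,1]^2
exact = (np.e - 1.0) ** 2             # its exact integral

rng = np.random.default_rng(2024)
for n in (1_000, 10_000, 100_000):
    xp = rng.random((n, 2))           # pseudorandom points
    xq = halton_2d(n)                 # low-discrepancy points
    err_p = abs(f(xp[:, 0], xp[:, 1]).mean() - exact)
    err_q = abs(f(xq[:, 0], xq[:, 1]).mean() - exact)
    print(f"N={n:>7d}  pseudorandom error={err_p:.2e}  Halton error={err_q:.2e}")
```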

3.5.3 The Hybrid Case

We have just seen that a major drawback to the use of purely quasi-random methods – especially for random walk problems – is the lack of an effective and low-cost error analysis when they are used. By contrast, use of the sample standard deviation or variance in conventional Monte Carlo applications as a measure of uncertainty in the output is simple and inexpensive. Quite recently, Halton [99] has advocated analyzing low-discrepancy sequences as though they were independent and identically distributed uniform sequences, but this device entails generating low-discrepancy sequences in dimensions much higher than actually needed for the simulation itself, and there are some disadvantages in doing this.

A similar error analysis has been suggested by Okten [100] when employing hybrid sequences. Okten's idea is to define a universe consisting of a number of different low-discrepancy sequences of the appropriate dimension for the problem under study (e.g., of dimension $s$ when estimating $s$-dimensional integrals, or infinite-dimensional when solving integral equations). One then draws such a sequence at random from this universe and uses it to perform the simulation. Repetition of this process a finite number of times then permits conventional statistical analysis to be applied rigorously to the resulting estimates. Okten has presented evidence [77, 79, 100] that this can be an effective strategy.
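The statistical idea behind such randomized sequences can be sketched in a few lines of Python. The sketch below is an illustration only: rather than Okten's universe of distinct low-discrepancy sequences, it uses a simpler and widely used randomization, a Cranley–Patterson random shift of a single Halton sequence, so that each shifted replicate gives an unbiased estimate and the replicates are independent. Ordinary sample statistics can then be applied to the replicate means.

```python
import numpy as np

def radical_inverse(n, base):
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def halton_2d(n_points):              # same construction as in the preceding sketch
    return np.array([[radical_inverse(i, 2), radical_inverse(i, 3)]
                     for i in range(1, n_points + 1)])

f = lambda x, y: np.exp(x + y)
exact = (np.e - 1.0) ** 2

rng = np.random.default_rng(7)
N, M = 4096, 16                       # quasi-random points per replicate, replicates
base_points = halton_2d(N)

estimates = []
for _ in range(M):
    shift = rng.random(2)                     # independent uniform shift
    pts = (base_points + shift) % 1.0         # Cranley-Patterson rotation
    estimates.append(f(pts[:, 0], pts[:, 1]).mean())

estimates = np.array(estimates)
std_err = estimates.std(ddof=1) / np.sqrt(M)  # ordinary i.i.d. statistics apply
print(f"estimate = {estimates.mean():.6f} +/- {1.96 * std_err:.1e} "
      f"(exact {exact:.6f})")
```

Because the randomization, rather than the integrand, supplies the statistical fluctuations, the reported uncertainty has the usual confidence-interval interpretation while the underlying points retain their low discrepancy.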

3.5.4 Current State of the Art

Confidence interval theory is routinely applied to conventional pseudorandom Monte Carlo implementations. For quasi-Monte Carlo implementations, there are no known efficient ways of estimating the error bounds that result from application of


the Koksma–Hlawka inequality. Certain hybrid sequences – for example, randomized low-discrepancy sequences – enable a conventional statistical error analysis to be used, and this seems to be one of the major reasons for employing hybrid sequences for difficult Monte Carlo simulations. Use of such an error analysis circumvents the thorny problem of estimating variations and discrepancies in order to obtain rigorous deterministic error bounds for quasi-random methods that have no random component at all. Many open questions remain concerning how best to utilize low-discrepancy and/or hybrid-generating sequences, and the question of effective error estimation for these methods is far from settled.

3.6 Error Reduction

3.6.1 Introduction

The period following World War II was marked by a shift in emphasis from weapons development to peacetime applications of simulation methods. The design of nuclear reactors for power generation, both for use in naval propulsion systems and for commercial application, focused attention anew on Monte Carlo methods. This interest, in turn, accelerated the development of such methods, especially inasmuch as the limitations of deterministic solutions of the transport equation, or of various approximations to that equation, became better understood in the 1950s and 1960s. For example, while formulation in terms of the transport equation was frequently replaced by the simpler diffusion equation (which was then often solved by finite difference methods), the inaccuracies inherent in this approach could not be assessed without having "benchmark" transport calculations available.

The early development of Monte Carlo methods concentrated on the solution of fairly specific families of reactor physics problems and the development of techniques designed to improve the efficiency of their solution. For example, the need to deal with shielding problems – for which most of the useful information is carried in analog histories that occur only very rarely – gave rise to the study of importance sampling [7, 8, 11, 101–106]. In the review [13], considerable attention was paid to the use of importance sampling to achieve variance reduction, especially in problems involving deep penetration. Provided that sufficiently many random samples could be processed in early pseudorandom Monte Carlo codes, statistical fluctuations could be expected to be low enough to meet critical design criteria (at least most of the time!), albeit with sizeable computing expenditures. Eventually, however, the demands for increased accuracy and speed – especially in the naval reactors program and in the rapidly expanding peacetime applications – inspired the quest for additional clever error reduction strategies.

In spite of the gains made in understanding how to use Monte Carlo methods to solve an increasing number of important physical problems, the method seems to have been used only as a last resort during the early period following World War II.


For reactor calculations generally, diffusion theory was dominant. Computations based on finite-difference approximations to the diffusion equation were routinely used for the design of nuclear reactors. There were, no doubt, several valid reasons for this. Monte Carlo calculations required large amounts of computer time and produced only statistical estimates of a few quantities at a time. While diffusion theory produced only an approximation to the solution of the transport equation of uncertain quality (and a rigorous examination of the approximation error was impractical), it provided at least qualitative knowledge of the particle behavior everywhere in the phase space – a distinct advantage. Another factor might have been the growing realization that Monte Carlo simulations making use of importance sampling could produce anomalous results; that is, certain rare occurrences of high-weight particles could produce an "effective bias" that was not well understood at the time.

For pseudorandom Monte Carlo implementations, error reduction is usually equated with variance reduction because the $N$-sample error declines at the rate $\sigma N^{-1/2}$, where $\sigma$ is the standard deviation of the estimating random variable. The analogous issue for quasi-Monte Carlo methods is to seek reductions in the constant $\hat{C}$ appearing in the inequality (3.11). In both cases, it is the fluctuations in the sample-to-sample values of the estimator that determine how large the Monte Carlo error is for any sample size. It is natural to inquire whether variance reduction methods developed for pseudorandom Monte Carlo can be directly applied to quasi-Monte Carlo implementations, as one might perhaps expect. There has not been a systematic effort to convert variance reduction schemes for pseudorandom Monte Carlo into error reduction schemes for quasi-random Monte Carlo, but we will report on what seems to be known in this regard with respect to each strategy discussed in this section.

As the number and kind of error reduction strategies has grown, it has become increasingly difficult to organize them all into a short list of "types." In the first part of Section 3.6.2, however, we will mainly rely on the classification schemes used by earlier authors [8, 11, 107] for describing variance reduction strategies such as control variates, importance sampling, stratified sampling, and the use of expected values. We will complete this section by discussing a few other error reduction strategies that seemed sufficiently important to treat here. Since we cannot hope for an exhaustive treatment of the subject of error reduction, we will content ourselves with tracing what we think are the most important developments since the publication of [13], especially as they relate to transport applications.
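As a small illustration of the link between variance and error, and of the importance sampling idea mentioned above, the Python sketch below estimates a rare "transmission" probability $P(X > a)$ for an exponentially distributed path length (a deliberately crude surrogate for a deep-penetration problem); the model, the biased exponentially stretched sampling density, and all parameter values are assumptions made for illustration. Both estimators are unbiased, but the importance-sampled one has a much smaller standard deviation $\sigma$, and hence a much smaller $\sigma N^{-1/2}$ error for the same number of samples.

```python
import numpy as np

rng = np.random.default_rng(1)
a, N = 10.0, 100_000                 # "optical thickness" and sample size
exact = np.exp(-a)                   # P(X > a) for X ~ Exp(1)

# Analog sampling: score 1 whenever the exponential step exceeds a.
x = rng.exponential(1.0, N)
analog = (x > a).astype(float)

# Importance sampling: draw from Exp(rate lam) and carry the likelihood ratio
# w(y) = f(y)/q(y) = exp(-(1 - lam) * y) / lam as a particle weight.
lam = 0.1
y = rng.exponential(1.0 / lam, N)    # numpy parameterizes by the mean 1/lam
weighted = np.where(y > a, np.exp(-(1.0 - lam) * y) / lam, 0.0)

for name, s in (("analog", analog), ("importance", weighted)):
    est, sig = s.mean(), s.std(ddof=1)
    print(f"{name:>10s}: estimate={est:.3e}  sigma={sig:.3e}  "
          f"error ~ sigma/sqrt(N) = {sig / np.sqrt(N):.1e}   (exact {exact:.3e})")
```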

3.6.2 Control Variates

The classical method of control variates has been well known in the statistical community for many years. In that context, the method has found application in the design of certain types of sample surveys. For instance, suppose one is interested in estimating the expected value $y$ of a random variable $\eta$ and one can observe the


outcomes of a control variable $\xi$ for which the expected value $x$ is known. Then $y$ can be estimated as $\bar{\eta} + x - \bar{\xi}$, where $\bar{\eta}$, $\bar{\xi}$ are the sample means of $\eta$, $\xi$. If $\xi$ and $\eta$ are highly positively correlated, this method will be much more effective than if the simple estimate $\bar{\eta}$ were used for $y$. When formulated as a technique for estimating integrals by Monte Carlo, the idea is to represent the integrand $f$ as the sum of a function $\varphi$ that mimics the behavior of $f$ but whose integral is known (or easier to estimate than that of $f$) and a remainder function $f - \varphi$. Then Monte Carlo is used only to estimate the integral of $f - \varphi$, and a substantial reduction in variance or variation can result.

A similar idea finds use in transport applications, based on the linearity of the transport operator:

$$L(F_\alpha + F_\beta) = Q_\alpha + Q_\beta \qquad\qquad (3.12)$$

where $L(F_\alpha) = Q_\alpha$, $L(F_\beta) = Q_\beta$, and $L$ is the transport operator. One way to make use of linearity, if the problem being studied is described by the equation $L(F_\alpha) = Q_\alpha$, is to identify a source $Q_\beta$ such that the sum problem (3.12) has a simple, analytically known solution. For example, in one-energy transport problems with an isotropic source $Q_\alpha$ one can simply define the source $Q_\beta$ to be isotropic and of sufficient strength in each region so that the ratio $R$ of source strength to absorption cross section is constant across the entire geometry. This implies that the flux is also isotropic and constant; in fact, the scalar flux is equal to $R$. The method then finds use in converting the "$\alpha$-problem" to the "$\beta$-problem"; solving the $\beta$-problem by Monte Carlo might provide significant advantages over simulating the $\alpha$-problem directly.

In general terms, this technique was described in [11] as the superposition principle. It was used by these authors to estimate both thermal flux averages and resonance escape probabilities. In both of these applications, the method was responsible for large gains in efficiency and accuracy. For the resonance escape application, this was especially true when the escape probability $p_{\mathrm{res}}$ is close to 1, which is the situation that presents the greatest computational challenge when conventional simulation of the "$\alpha$-problem" is employed. The method makes use of an analytic calculation of the narrow resonance approximation as a control [108].

Although it is usually identified by a different name, the method of antithetic variates could equally well be described as a control variate method with negative rather than positive correlation. Instead of seeking a random variable that is strongly positively correlated with the original one and which has a known expectation, one seeks a random variable with the same expectation as the original one that is strongly negatively correlated with it. Then forming the average of the two random variables produces a new random variable with the same mean but reduced variance. The idea was first presented in [5] and can be a powerful method for lowering the resulting error. Furthermore, this very simple idea can be extended in several ways, as was already made apparent in [5]. For example, if one is able to construct $n$ mutually antithetic random variables with identical means, their average, or some appropriately chosen weighted average, has the potential to provide unbiased estimates of the same mean with greatly reduced variance.
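A minimal Python sketch of these two devices follows; the integrand $e^x$ and the control $\varphi(x) = 1 + x$ are illustrative assumptions rather than examples taken from the references above. The control-variate estimator applies Monte Carlo only to $f - \varphi$ and adds back the known integral of $\varphi$, while the antithetic estimator averages the integrand at the negatively correlated points $x$ and $1 - x$. Both leave the mean unchanged while reducing the sample-to-sample fluctuations.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000

f = lambda x: np.exp(x)          # integrand; exact integral over [0,1] is e - 1
phi = lambda x: 1.0 + x          # control that mimics f; its integral, 3/2, is known
phi_integral = 1.5

x = rng.random(N)

plain = f(x)                                   # crude estimator
controlled = (f(x) - phi(x)) + phi_integral    # Monte Carlo applied only to f - phi
antithetic = 0.5 * (f(x) + f(1.0 - x))         # negatively correlated pair average

for name, s in (("plain", plain),
                ("control variate", controlled),
                ("antithetic", antithetic)):
    print(f"{name:>16s}: estimate={s.mean():.5f}  sigma={s.std(ddof=1):.4f}"
          f"  (exact {np.e - 1:.5f})")
```

Note that the antithetic estimator uses two evaluations of $f$ per sample, so a fair efficiency comparison should charge it for that extra cost.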


The antithetic variates method has recently enjoyed a resurgence of interest in problems arising in the finance community, problems that entail estimating integrals of functions that are well approximated by linear functions, for which the method is exceptionally well suited.

In [109] (see also [110]), Halton applied the superposition principle iteratively to solve matrix problems by means of a technique that he called "sequential Monte Carlo." The goal was to improve the Monte Carlo estimates steadily so that geometric convergence of the error to zero could be accomplished. In our view, this idea – which has recently been extended to treat transport problems [111–114] – marks one of the more important developments in Monte Carlo over the last 40 years, because the idea of using Monte Carlo sampling to produce global solutions of problems – e.g., the transport flux or collision density everywhere – is so striking. We will devote a bit more attention to it here for that reason.

The main idea underlying this method is easy to describe. How can one build a feedback loop, or "learning" mechanism, into the Monte Carlo algorithm by means of which a larger and larger part of the problem can serve as a control variate, leaving a smaller and smaller portion to be estimated stochastically? Since matrix problems can be solved by Monte Carlo by generating random walks on a discrete index set consisting of $N$ objects, where the matrix is of order $N \times N$, it is possible to encompass both matrix and (continuous) transport problems within the same formulation by studying random walks on a general (either discrete or continuous) state space, and we will adopt this point of view shortly. Of course, in implementing such a sequential (or adaptive) algorithm, if the feedback mechanism is imposed after each random walk, the additional overhead might easily overwhelm the improvement caused by the increased information content. Accordingly, the approach taken has been to process random walks in stages, each consisting of many samples, and to revise the sampling method at the end of each adaptive stage. The goal then becomes to achieve

$$E_k < \lambda E_{k-1} < \cdots < \lambda^k E_0, \qquad 0 < \lambda < 1,$$

where $E_k$ denotes the error after the $k$th adaptive stage.
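The staged idea can be made concrete for a tiny matrix problem $x = Hx + b$. The Python sketch below is an illustration in the spirit of sequential Monte Carlo rather than a reproduction of the algorithm of [109]: the matrix, source, termination probability, and sample sizes are arbitrary assumptions, and the random-walk estimator is a standard collision estimator. At each stage the current approximation serves as the control, only the residual equation is sampled, and the correction is added back; the printed errors typically shrink by a roughly constant factor per stage, mimicking the geometric behavior sought above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative data for x = H x + b with norm of H well below 1.
H = np.array([[0.10, 0.30, 0.10],
              [0.20, 0.10, 0.30],
              [0.10, 0.20, 0.10]])
b = np.array([1.0, 2.0, 0.5])
x_exact = np.linalg.solve(np.eye(3) - H, b)

n = len(b)
p_kill = 0.3                     # termination probability at each collision

def walk_estimate(source, start):
    """Collision estimator for one random walk of the problem c = H c + source."""
    state, weight, score = start, 1.0, 0.0
    while True:
        score += weight * source[state]
        if rng.random() < p_kill:                     # absorb: the walk ends
            return score
        nxt = rng.integers(n)                         # uniform transition kernel
        weight *= H[state, nxt] * n / (1.0 - p_kill)  # likelihood-ratio weight
        state = nxt

def mc_solve(source, walks_per_state):
    """Monte Carlo estimate of every component of the solution of c = H c + source."""
    return np.array([np.mean([walk_estimate(source, i) for _ in range(walks_per_state)])
                     for i in range(n)])

# Sequential stages: the current approximation is the control variate; only the
# residual problem is sampled, so the statistical error contracts stage by stage.
y = np.zeros(n)
for stage in range(5):
    residual = b + H @ y - y
    y = y + mc_solve(residual, walks_per_state=2000)
    print(f"stage {stage}: max error = {np.max(np.abs(y - x_exact)):.2e}")
```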
