Random Numbers for. Computer Simulation. Diplomarbeit zur Erlangung des Magistergrades an der Naturwissenschaftlichen Fakult at der Universit at Salzburg.
Random Numbers for Computer Simulation Diplomarbeit zur Erlangung des Magistergrades an der Naturwissenschaftlichen Fakultat der Universitat Salzburg eingereicht von Hannes Leeb Salzburg, im Janner 1995
The core of this thesis originated from a series of discussions with my supervisor Peter Hellekalek and with Karl Entacher, Ferdinand Osterreicher, and Maximilian Thaler. I owe them.
1
Dr. Who: Hence this new device. Romana: What is it? Dr. Who: Ah, well, it's called a randomizer and it's tted to the guidance system. It operates under a very complex scientic principle called potluck. { Dr. Who, The Armageddon Factor
2
Contents 1 Original sin
5
2 Running for randomness { heat 1
9
3 Running for randomness { heat 2
25
4 Getting a `good one'
38
2.1 The notion of a sequence of random numbers : : : : : : : : : 9 2.2 Scoring sequences of random numbers with statistical tests : 11 2.3 Two ways out : : : : : : : : : : : : : : : : : : : : : : : : : : : 18 3.1 Finite sequences of random numbers in 0 1 : : : : : : : : : : 25 3.2 Three propositions : : : : : : : : : : : : : : : : : : : : : : : : 28 3.3 A critical remark : : : : : : : : : : : : : : : : : : : : : : : : : 36 4.1 4.2 4.3 4.4
Pre-testing : : : : : : : : : : : : : : : Employing known defects : : : : : : : Cross-testing : : : : : : : : : : : : : : Employing deterministic error bounds
5 Some generators
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
39 44 58 61
70
5.1 Preliminaries : : : : : : : : : : : : : : : : : : : : : : : : : : : 70 5.2 LCG : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 72 3
5.3 ICG : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 82 5.4 EICG : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 89 5.5 Other generators : : : : : : : : : : : : : : : : : : : : : : : : : 95
A Expectation and convexity A.1 A.2 A.3 A.4
The idea : : : : : : : : : : : : The formal statement : : : : Some utilities : : : : : : : : : Proof of the formal statement
B Lattices B.1 B.2 B.3 B.4
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
: : : :
Basic lattice properties : : : : : : : : : : : : Minkowski-reduced lattice bases : : : : : : : Covering lattices with parallel hyperplanes : Linear transformations of lattices : : : : : :
4
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
96
96 98 100 103
109
109 117 119 124
Chapter 1
Original sin Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin. { John von Neumann, 1951
(1.0.1) We will get involved into right this: computer algorithms for producing random numbers. Consider the C function declarations below. double sin( double x ) int rand()
The sin() function is well known, and implementations of rand() are what we are looking for. This might be not much of a problem { both sin() and rand() should approximate well-dened mathematical objects, the Sinus function and a sequence of independent random variables uniformly distributed on f0 : : : RAND MAX ; 1g1. The function sin() is easily compared to the mathematical object it should approximate an implementation which returns 3.14 for sin(0.0) will certainly not be considered a well suited one. But what about an implementation of rand() which returns 12345, 1406932606, 654583775, 1449466924 on four successive calls? (1.0.2) Ripley describes the common attitude towards this problem in 93, p.2] as follows: \The rst thing needed for a stochastic simulation is 1
RAND MAX is a predened integer depending on which system you are using.
5
a source of randomness. This is often taken for granted but is of fundamental importance." In 93, p.14], he continues: \Many users of simulation are content to remain ignorant of how such numbers were produced, merely calling standard functions to produce them. Such attitudes are dangerous, for random numbers are the foundations of our simulation edice, and problems at higher levels are frequently traced back to faulty foundations." (1.0.3) Ever since my rst acquaintance with the subject of random number generation, I failed to cope with the apparent paradox of employing deterministic algorithms to produce random numbers. By the time, uneasyness turned from suspicion to the assertion that the whole concept per se did not make much sense. Finally, I was able to prove: In general, it is not possible to rate any nite sequence of numbers more `random' than any other. With this, an implementation of rand() which returns 0 all over again cannot be considered worse than your favorite `random' number generator. But somehow intuition2 as well as a whole world full of applications seem to oppose this attitude3 . (1.0.4) It is common belief today that computers are capable of simulating almost everything { even quantities that are indeterminate and random. The crucial point in simulating randomness with a computer is that the latter is what the former is denitly not: deterministic. But, somehow, this concept obviously passed the test of application in practice. The various elds, ranging from physics to operations research and civil engineering, in which stochastic simulation is employed whitness the attraction of this method almost every general-purpose programming language and even o-the-shelf spreadsheet programs like Microsoft Excel include a `random' number generator. Here, we cannot treat the numerous techniques of stochastic simulation (among the many books on this subject, we refer the reader to 8], 48], 57], or 93]) just let us observe that a method which proves to work well in so many applications can hardly be completely unfounded. You are likely to prefer your favorite algorithm for rand() to a 0-repeating rand(). There are exceptions like Zaremba 116] who considers the concept of simulating randomness on a computer as \spurious". But, somehow, this idea failed to become popular. 2
3
6
(1.0.5) The problem we face here is this: users expect mathematicians to propose deterministic algorithms { programs { `well suited' for generating `random' numbers. They have reason to do so, since mathematicians started using such algorithms in the rst place. To be precise: the whole story started in the 1940s, and von Neumann, Metropolis, Ulam, and Lehmer may be named among the pioneers in the eld. John von Neumann apparently conjectured the potential4 of computers for stochastic simulation in 1945 when he wrote 84]: \It the computer] will certainly open up a new approach to mathematical statistics the approach by computed experiments : : : " During the '40s, computer based stochastic simulation remained restricted to secret projects of the U.S. Department of Defense. The publication of `The Monte Carlo method' by N. Metropolis and S.M. Ulam 81] in 1949 denotes the beginning of the `ocial' history of this method (according to 87, p.11]). Two years later, D.H. Lehmer 73] proposed the linear congruential generator which { with a slight generalization by Thomson 110] and, independently, Rotenberg 97] { was to become today's most widely used method for random number generation. (1.0.6) Before we actually propose some algorithms for generating `random' numbers, let us state the problem a little more precise. The mathematical object we want to model is a sequence (Xn )n0 of random variables which are stochastically independent and equidistributed on 0 1 (or, in some examples, on f0 1g). We are looking for algorithms to produce a sequence of numbers (xn )n0 , such that the xn can be taken as realizations of the Xn . Since every stochastic simulation on a computer is expected to end in nite time, it can only consume a nite amount of `random' numbers we can therefore restrict our attention to the nite sequences or vectors X = (Xn)Nn=0;1 and x = (xn)Nn=0;1. Most stochastic models are based on random variables with distributions dierent from the uniform distribution on 0 1 however, the latter is used as a `point of common reference' from which the desired distribution is obtained by various transformation methods (which are, among others, discussed by Devroye in 23] or Ripley in 93, Chapter 3, 4]). It is interesting to note that the advent of computer based stochastic simulation happened to coincide with the advent of the rst really bad `random' number generator: the middle-square method has many undesirable properties which are discussed in detail by Knuth in 59, pp.3{8]. 4
7
(1.0.7) In Chapter 2, we rst reect on how to assign some degree of randomness to a nite sequence of random numbers, restricting our considerations to numbers and random variables in f0 1g: we compare the numbers x = (xn)Nn=0;1 to the mathematical object they should model, the random ;1. In this way, we nd that, in general, any sequence variables X = (Xn )Nn=0 x is as `good' as any other sequence y any random number generator is as `good' as any other, no matter which sequence it produces. Therefore, it is impossible to nd a generator which is `good' in general. Fortunately, the term `in general' is essential for the above statement to hold. If some information about the simulation problem or a class of simulation problems is available, we will be able to assign some degree of quality to generators, and dierent generators will dier in quality. Our intuitive preference for some generators to others5 simply presupposes certain assumptions about our simulation problem. We develop a mathematical device which forces us to make these assumptions explicit and which enables us to make reasonable assessments of a random number generator's quality. In Chapter 3, we generalize our observations on random numbers in f0 1g to the more interesting case of random numbers in 0 1. Applications of our approach for assessing a generator's quality are given in Chapter 4. The applications mostly use existing mechanisms like statistical tests, but provide a deeper understanding of their relevance. Finally, we present and study some random number generators in Chapter 5. The two appendices contain some arguments and derivations needed by or being related to the main text. Both are entirely self-contained and can be understood without reading any of the other chapters. However, within the main text, we point out when an appendix might or should be read.
Think of simulating games of roulette with a sequence of integers in f0 : : : 36g: you will almost certainly rate a sequence which repeats `Zero' all over again as `worse' than any sequence without such obvious regularity. 5
8
Chapter 2
Running for randomness { heat 1 I would like, ah, if I may,
: : : to take you on a strange journey.
{ Narrator, The Rocky Horror Picture Show
2.1 The notion of a sequence of random numbers (2.1.1) Before we start searching for random number generators that pro;1 , we give a precise duce a `good' random number sequence x = (xn )Nn=0 denition of this notion. Everybody seems to have a clear idea of what random numbers are, but when asked to give an explicit denition, things get complicated. Knuth 59, p.2] observes: \People who think about this topic almost invariably get into philosophical discussions about what the word `random' means. In a sense, there is no such thing as a random number for example, is 2 a random number?" (2.1.2) The mathematical object we are about to model, the vector ;1 is uniquely dened by the axioms of N random variables X = (Xn )Nn=0 of Kolmogorov 61]. But how to dene a sequence x of random numbers? Knuth describes this problem in 59, p.142] as this: \The mathematical theory of probability and statistics carefully sidesteps the question it refrains from making absolute statements, and instead expresses everything in terms 9
of how much probability is to be attached to statements involving random sequences of events. The axioms of probability theory are set up so that abstract probabilities can be computed readily, but nothing is said about what probability really signies, or how this concept can be applied meaningfully to the actual world." Similar opinions are expressed by Ripley in 93, p.14] and Schnorr in 103, p.8]. There are of course attempts to give the notion of a sequence of random numbers a precise meaning. We refer the interested reader to Kac ?] or Lagarias 65] for surveys and to Knuth 59, Section 3.1, 3.5 ] or Schnorr 103] for more thorough treatises. However, as Anderson notes in 3], be \warned that you will nd many mathematical esoterica that, at present, have very little to do with the actual generation of random numbers on today's computers." In our opinion, the most interesting approach is given by Compagner in 12] in particular, this approach complies with what we are going to nd out in this and the following chapter. However, none of the various concepts succeeded in becoming a widely accepted standard so far. The lack of a standard denition of a sequence of random numbers has led to some rather peculiar arguments for the quality of random number generators: in 41, p.177], Frederickson et. al. claim about their generators (which they call pseudo-random trees) that there is \no good reason to believe that any other family of pseudo-random trees oers any advantages." (2.1.3) For our purposes, we will get along with a very simple denition of a random number sequence. Its simplicity is admissible because the sequences we are concerned with are computer-generated and therefore not random anyway. Moreover, it is admissible since we will show that every x 2 0 1N is as good in approximating the random vector X as any other1 y 2 0 1N . Our denition is derived from Hammersley and Handscomb 48, p.25]: \The essential feature common to all Monte Carlo computations is that at some point we have to substitute for a random variable a corresponding set of actual values, having the statistical properties of the random variable. The values that we substitute are called random numbers : : :]." We will treat the problem of the `quality' of a random number sequence separately, so we can dene it without demanding any statistical properties. 1 This seems to be quite obvious for N = 1 as indicated by Knuth above but for large N , say, N = 106 , intuition plays a trick on us.
10
Denition 2.1 A sequence x = (xn)Nn=0;1 2 0 1N is called a nite se-
quence of random numbers, if { while performing a stochastic simulation { x is substituted for a sequence X = (Xn)Nn=0;1 of independent and on 0 1 equidistributed random variables. For convenience, we will drop the adjective `nite' and refer to x as a sequence of random numbers, knowing it is nite anyway. Instead of numbers in 0 1, numbers in f0 1g or vectors in 0 1s are required sometimes. We dene a nite sequence of random numbers in f0 1g or of random vectors in analogy to above (i.e., say, as N vectors in 0 1s which { while performing a stochastic simulation { will be substituted for N independent random quantities which are equidistributed on 0 1s). In the sense of this denition, a sequence of numbers is called `random' simply because it is being used as substitute for a sequence of random variables. One may expect that this denition allows a sequence of random numbers to vary signicantly in quality: intuitively, one would assign much ;1 than to most of the other less randomness to the sequence (xn = 0)Nn=0 conceivable sequences of length N . However, besides intuition, we have no means of assessing this quality so far. This is a major problem to be solved, because, as Niederreiter 87, p.2] notes: \The success of a Monte Carlo calculation often stands or falls with the `quality' of the random samples that are used, where, by `quality', we mean how well the random samples reect true randomness." From what we have seen before, it is clear that a precise notion of quality is not easy to nd. The above quotes from Hammersley and Handscomb as well as Niederreiter suggest that certain statistical properties will play a major role in measuring a random sequence's quality.
2.2 Scoring sequences of random numbers with statistical tests ;1 of random (2.2.1) When assessing the quality of a sequence x = (xn )Nn=0 numbers, we try to nd out how closely it matches the mathematical object ;1 of random variables. In this it should model, the sequence X = (Xn )Nn=0 vein, we have to compare two things which defy direct comparison: random variables which are indeterminate by denition and pre-determined numbers. All we can do is check whether or not x gives a good representation
11
of certain statistical properties of X or { putting it the other way round { whether the stochastic object X gives a good description of the behavior of the numbers x. Stated formally, given x, we have to choose one of the two alternatives below.
H0 : The numbers xn can be considered as realizations of the random variables Xn . H1 : The above statement does not hold. This is the traditional setup for a statistical test. Throughout the literature, random numbers are assumed to have statistical properties of random variables and to pass statistical tests. Lehmer, for example, denes a random sequence in 73] as \a vague notion embodying the idea of a sequence : : :] whose digits pass a certain number of tests, traditional with statisticans and depending somewhat on the uses to which the sequence is to be put." Or with the words of Knuth 59, p.38]: \The theory of statistics provides us with some quantitative measures for randomness. : : :] If a sequence behaves randomly with respect to tests T1 T2 : : : Tn , we cannot be sure in general that it will not be a miserable failure when it is subjected to a further test Tn+1 yet each test gives us more and more condence in the randomness of the sequence." For more examples of this approach, see Hammersley and Handscomb 48, p.25] or Ripley 93, p.15]. We will see later in (2.3.7) and Section 4.1 if the condence suggested by Knuth has a rmer foundation than bare intuition. (2.2.2) Given two sequences x y of random numbers and a set of tests F , we can identify the set Fx of tests in F passed by x and the analogous set Fy for y. There are several possible ways to compare x and y we consider Criterion 2.1 x is at least as good as y if x passes at least as many tests as y: #Fy #Fx : In addition, we try2
Criterion 2.2 x is at least as good as y if x passes at least the tests passed by y: Fy Fx: 2
We would like to thank Otmar Lendl for suggesting this one.
12
(2.2.3) To apply these criteria to specic x and y, we need a tighter grip on the sets F , Fx , and Fy and therefore on the decision mechansim used by statistical tests in general. Let us study the basic function of statistical tests on a simple problem: given the results of N successive coin tosses, we are asked to decide if the coin is fair. Setting xn = 1 for `heads' on the n-th toss and xn = 0 otherwise (n = 0 : : : N ; 1), we can formalize the problem as follows. Given the sample x = (x0 : : : xN ;1), we have to decide between the following two hypotheses.
H0 : The xn are realizations of random variables Xn which are independent and equidistributed on f0 1g. H1 : The above statement does not hold. Unfortunately, H0 is so vague that it renders every 0-1-sequence of length N possible, each with probability 1=2N . Whatever values the xn may have, each design is compatible with H0. Whenever one decides in favor of H1 , misjudgement is therefore always possible. If the possibility of a wrong decision cannot be removed, one tries to decide so that at least the probability of a wrong decision is small. Thus: if H0 holds, we want the event H1 of choosing H1 to be quite improbable. Denoting `quite improbable' by some probability with 0 < 1, say, = 0:05, we want
P ( H1jH0) = :
We can picture a statistical test as a decision rule T which, for every x 2 f0 1gN , either yields T (x) = 0 if one decides in favor of H0 or T (x) = 1 if H1 is favored. A decision for H1 while H0 holds is quite improbable if
P (T (X) = 1) = : Such a T is called statistical test with level of signicance sometimes, we will refer to it as {test, for short. Deciding whether a coin is fair, one usually focuses on the number S (x) of `heads' which should be close to its expected value N=2. If the number of `heads' deviates heavily from this, say, by more than some constant c, there might be reason to favor H1 and regard the coin as unfair. It remains to choose c such that N P S (X) ; 2 > c = : 13
With this, the statistical test T with level of signicance is complete: we set T (x) = 0 if jS (x) ; N=2j c, T (x) = 1 otherwise, and get P (T (X) = 1) = . (2.2.4) Observe that this framework of identifying an {test with an event T with P (T = 1) = is quite universal. As L'Ecuyer notes in 69, p.94], \any function of a nite set of i.i.d. uniform random variables can be used as a statistic to dene a test of hypothesis, if its distribution is known." (The same argument is put forward by Marsaglia in 80, p.6]). This holds in particular for the events T we considered: every event T with probability denes an {test. On the other hand, every statistical test can be expressed as an event T with probability . We can conclude that the set of all possible {tests equals the set of events T with P (T = 1) = . Picturing statistical tests as events is not very useful for actually performing a test. The reader should keep in mind that we are after sets of statistical tests, specically F , Fx , and Fy which are needed to apply Criterion 2.1 and 2.2. (2.2.5) Let us now consider a similar problem which is exactly what we are looking for: given the results x0 : : : xN ;1 of N successive coin tosses simulated by a computer program, we are asked to decide if the xn are suciently random, i.e. if they behave like realizations of random variables Xn which are statistically independent and equidistributed on f0 1g. Again, we have to choose one out of two hypotheses 3 .
H0 : The xn are realizations of random variables Xn which are independent and equidistributed on f0 1g. H1 : The above statement does not hold. And again, we search for a statistical test T with level of signicance . In the example above, we had T (x) = 0 if jS (x) ; N=2j c and T (x) = 1 otherwise, which was some sort of `natural' choice nobody being interested in the fairness of a coin would reasonably check if the nonoverlapping pairs (x0 x1) (x2 x3) : : : are equidistributed on f0 1g f0 1g or if the number of `runs' in the sequence4 is close to its expected value (N + 1)=2. Note this problem is formally equivalent to the one above. ;1 is the number of maximum length The number of runs in the binary sequence (xn )Nn=0 blocks xn : : : xn+k in the sequence which consist entirely of 1s or entirely of 0s. 3
4
14
In the present context however, tests based on these aspects are perfectly reasonable5 ! We have no `natural' choice for a statistical test T for a user who takes the xn as input for his simulation, any statistical aspect could be of fundamental importance. Having no information about the statistical aspects the user considers relevant, we have to treat all statistical tests as equally important. Concerning the two criteria, this means the set F should include all statistical tests with level of signicance equal to6 . If we consider all {tests as equally important, we may take interest in the total number #F of these tests. A statistical test with level of signicance is an event T on f0 1gN with P (T = 1) = . Any event T on f0 1gN is, in turn, uniquely dened by the set A of those elements x for which T (x) = 1 given A, the event can be written as T = 1A . For an event 1A , we have P (1A (X) = 1) = #A=2N (Note that demanding P (1A(X) = 1) = implicitly requires that is rational and 2N is an integer). Hence there are as many {tests as there are subsets of f0 1gN with 2N elements. The set F of all {tests is n o F = 1A : A is a subset of f0 1gN with #A = 2N and their number is N ! #F = 22N : Now that we know F , we can focus on the set Fx of tests from F passed by x. Since the sample x passes an {test T = 1A if and only if x 2= A, we get Fx = f1A : 1A 2 F x 2= Ag and elementary combinatorics yields N ! #Fx = 2 2;N 1 : Doesn't something strike you mind? : : : #Fx is independent of x! For any two samples x and y 2 f0 1gN , we have #Fx = #Fy : 5 Among others, aspects like these are checked by Knuth's tests on random number generators in 59, Section 3.3]. 6 We could as well have allowed F to contain all tests with level of signicance or with level of signicance between two bounds 0 and 00 the generalization is so obvious that it is left to the reader.
15
In words: Each binary sequence of length N passes exactly the same number of statistical tests. What about the criteria we wanted to use for rating sequences of random numbers? Criterion 2.1 is obviously useless since it rates any x as `good' as any other y. Criterion 2.2 is useless as well: assume we rate x as actually better than y on the basis of Criterion 2.2. Then
Fy Fx in the sense that Fy is a proper subset of Fx . This would imply #Fy < #Fx which { as we have seen above { cannot be. One might argue that this phenomenon stems from our sampling from the simple set f0 1g, but in vain. Sampling from f0 1 2 : : : M ; 1g, we get the same results as above with just 2N replaced by M N in the formulas. Moreover, the same problem occurs if we sample from the continuum 0 1 instead of a nite set. The reader might believe this by taking the discrete case as an intuitive basis or see Chapter 3 for a formal proof. (2.2.6) This is where we end up: There is, in general, no reason to regard any x as more random than any other. There is, in general, no reason to attribute randomness to any such x. The reader will now sense the reason in our rather unusual Denition 2.1 of a nite sequence of random numbers. This problem is well known, of course although seldom stated explicitly, it surfaces all through the literature. Niederreiter notes in 87, p.164]: \Early in the history of the Monte Carlo method, it already became clear that `truly random' numbers are ctious from a practical point of view. Therefore users have resorted to pseudorandom numbers (abbreviated PRN) that can be readily generated in the computer by deterministic algorithms 16
with relatively few input parameters. : : :] It should be clear that a deterministic sequence of numbers cannot perform well under all imaginable tests for randomness." These observations, although sobering, are particularly important for the computer{generated sequences we are searching for: concerning the `quality' of a given x, it is unimportant how the numbers were obtained. No matter how elaborate the physical source of randomness, no matter how sophisticated the random number generator, the produced sequence is as random as any other. Once the sample is known, all the randomness seems to evaporate in a pu of logic7 . Knuth 59, p.2] observes: \In a sense, there is no such thing as a random number" he continues 59, p.145]: \In a similar vein, one may argue that there is no way to judge whether a nite sequence is random or not any particular sequence is just as likely as any other one." Apart from the fact it is already being done rather successfully, we have reason to worry if substituting nite sequences of random numbers for random variables in stochastic simulations makes any sense at all: \the quality of a generator can never be proven by any statistical test.", as L'Ecuyer notes in 69, p.94], and we have seen this holds in a very general sense. Knuth 59, p.161] parries with intuition: \Still, nearly everyone would agree that the sequence 011101001 is `more random' than 101010101, and even the latter sequence is `more random' than 000000000." In fact, Knuth's parry is based on more than just a `feeling': if you take the three sequences above to simulate a fair coin, the third sequence is quite likely to produce the worst result. (2.2.7) All the trouble started when we regarded all {tests as equally important the reason for this was that we had no information about which statistical aspect might be relevant for the user in his simulation. The set F of all {tests is simply too large to render a given sample superior in quality to any of the others. If intuition and real-world experience teach us to prefer certain sequences
x for certain simulation problems, we apparently base our preference on some restricted test set F or on some nonuniform weighting of relevance on 7 It is discomfortingly easy to see that even the strong law of large numbers seizes to hold once they are determined, i.e. once the random variables are replaced by actual values.
17
this set. Considering some tests as very important and others as virtually irrelevant (for a given simulation problem) seems to be the key for nding `good' random sequences. It is just as Niederreiter 87, p.164] recommends: \Therefore the user of PRN must be aware of the specic desirable statistical properties of the random samples in the current computational project and must choose PRN that are known to pass the corresponding statistical tests." In real-world applications, preferences based on such restrictions or weightings of relevance actually perform quite well: for most simulations you could conceive, your favorite random number sequence actually will behave ;1, although there is no reason more random than the sequence (xn = 0)Nn=0 for this in general. Let us set out to nd a criterion of quality which forces the hidden restrictions and weightings of relevance imposed by intuition to surface.
2.3 Two ways out (2.3.1) Considering the set F of all statistical tests with level of signicance , we found this set to be too large to attribute any special status of `randomness' to a nite sequence x of random numbers. The above quote from Niederreiter already suggests that, for choosing a `good' x, one should focus on \the specic desirable statistical properties of the random samples in the current computational project". This means we are free to ignore some tests (if they are based on statistical properties which are of no interest), while others should be more emphasized. But how to nd the important tests, how to sort out the irrelevant ones? An idea of Hammersley and Handscomb in 48, p.25] might help: \one of the tests that might have been applied is whether or not the random numbers yield an unbiased or a reliable answer to the Monte Carlo problem under study, and it is really only this test that interests us when we are ultimately concerned only with a nal numerical solution to a particular problem. Taken in this second vein, the other tests are irrelevant". There is truth in this: what else could a user be interested in than in a good approximative answer to his simulation problem? (2.3.2) A given `Monte Carlo problem under study' is usually comprised of two successive approximations. The rst one is pictured by Ripley 93, p.1] as follows: \In its technical sense simulation involves using a model to 18
produce results, rather than experiment with the real system under study (which may not yet exist). For example, simulation is used to explore the extraction of oil from an oil reserve. If the model has a stochastic element, we have stochastic simulation". So the basic subject of the user's interest is some `real system' TR , which either cannot be described by a deterministic model, or whose deterministic model is far too complex to be tackled by today's mathematical methods directly. He therefore develops a stochastic model, i.e. a random variable8 TS coinciding with TR `in the mean', and focuses his attention on the expectation E (TS ). This rst approximation, the modelling of a real-world object by a mathematical, stochastic one is { for our purpose { of no interest the quality of this modelling is solely in the user's responsibility. Given TS , Ripley 93, p.3] continues: \To make use of a model one has two choices: 1. To bring mathematical analysis to bear to try to understand the model's behavior. : : :] 2. To experiment with the model." If the user can derive E (TS ) by analytical methods, he should do so and forget about any computer simulation9 . Otherwise, he has to construct an estimator F for E (TS ), which will be simulated on a computer. In general, an estimator F for E (TS ) is a random variable with
E (F ) = E (TS ) and
V (F ) V (TS ):
To be simulated using a sequence of random numbers, F has to take the ;1 , where the Xn are independent random variform F (X) with X = (Xn )Nn=0 ables, each equidistributed on 0 1. X will be substituted by some random numbers in the actual simulation. Observe that the movement from TS to Any model which has a stochastic element depends on a more or less complex random quantity expressing the stochastic element. Since the model is governed by the random quantity, the model is a random quantity itself. 9 Computer simulations of a stochastic model are often believed capable of revealing some extra information which could not be derived by analytical methods but how can a computer{program make statements which are not contained in the mathematical model it is derived from? 8
19
F is no approximation since the random variables' expectations coincide.
The second approximation is what we are concerned with: the user chooses some sequence of numbers x, computes F (x), and uses this value as an approximation of E (F ) = E (TS ) (Note that in our terminology, the computation of F (x) already is the whole stochastic simulation10 ). (2.3.3) When should we regard the numbers x as `good' with respect to the simulation F ? Well, obviously if #x (F ) := jF (x) ; E (F )j is small. By introducing #x (F ) to measure the quality of x in simulating F , we have preserved the idea that \it is really only this test that interests us when we are ultimately concerned only with a nal numerical solution to a practical problem." Moreover, #x still does permit us to perform statistical tests. Let T be an {test, i.e. P (T = 1) = E (T ) = . If < 1=2 { which is not much of a restriction since nobody would reasonably apply a test with is likely to yield the wrong result { a sequence x passes T if and only if #x (T ) < 1=2. We can conclude that #x gives a measure more general and more exible than statistical tests alone. Given F , #x (F ) measures the deviation of the computed result from the desired result11 . So far, we have de facto eliminated the notion of randomness the problem is reduced to approximating E (F ) which in turn is an integral. However, if the user knows x to be good with respect to a given problem F , he may use these numbers to simulate F acting as if they were random their behavior is { with respect to F { close to the expected behavior of random variables, and the elimination of randomness does not really matter. As Knuth 59, p.3] states (about computer{generated sequences): \The answer is that the sequence isn't random, but it appears to be. : : :] Being `apparently random' is perhaps all that can be said about any random sequence anyway." (2.3.4) The measure of quality #x developed so far has one severe aw: we can assess the quality of x with respect to only one problem F , For example, if TS is a function of three random variables (the stochastic element in the model), i.e. TS = f (X1 X2 X3 ), a quite popularPchoice for F is to x a large number ;1 M , say, M = 104 , N 3M , and set F (X) := 1=M M n=0 f (X3n X3n+1 X3n+2 ). 11 A notion of quality similar in spirit but not formally is developed by Fuss in 42] using Petri nets to generate random decisions. 10
20
so all we can nd is a good special-purpose sequence x. This is not what people are looking for they demand sequences of numbers which are good for simulating a variety of problems, a set F of problems F . Given a set of problems F and a sequence x of random numbers, we consider
Criterion 2.3 x is good with respect to F if sup #x := sup f#x (F ) : F 2 Fg is small.
Based on this criterion, the existence of good sequences with respect to a certain set F of practical relevance is shown by the theory of good lattice points (see Hlawka 54] P and Korobov 62]). The set F contains all functions of the form F = 1=N Ni=1 f , where f is a function with in some sense rapidly decreasing Fourier coecients. But, as Larcher and Traunfellner note in 67], \especially for dimensions s 3, it turned out to be a challenge to give fast algorithms for nding good lattice points". A similar approach based on Haar instead of Fourier series was initiated by Sobol' (see 105, 106]) and developed to its full extent by Niederreiter (see 87, Chapter 4]). The shift from Fourier to Haar series has the advantage that methods to construct good samples for problems in higher dimensions have been found. The good samples x for this set F are called LP {sets by Sobol' and (t-m-s){ nets by Niederreiter12 . Shifting again, this time from Haar to Walsh series, Larcher and Traunfellner show in 66, 67] that (t-m-s){nets are in some sense the best possible choice for approximating problems from the corresponding F . In 101], Schmid gives an introduction to (t-m-s){nets together with implementations for serial and parallel computer architectures. For our present purpose, however, Criterion 2.3 is not exible enough: the supremum tends to focus on `local' properties of #x . If, for example, x 6= y and F is comprised of some statistical tests with level of signicance < 1=2, then we have sup #x < sup #y if and only if
8T 2 F : T (x) = 0 9T 2 F : T (y) = 1: 12
and
Niederreiter's notion of a (t-m-s){net is a generalization of Sobol's LP {set.
21
The supremum assigns each sequence to one of two distinct sets: those sequences which pass all tests in F (the `good' set) and those which fail at least one of them (the `bad' set). This leaves us with no possibility to dierentiate the `good' set. A possible remedy might be to extend F by additional problems or tests. But doing so, we stumble over the next inexibility of Criterion 2.3: adding or removing just one problem from F can completely change our evaluation of sup #x ! (2.3.5) To get a more exible criterion, recall that we have conjectured the necessity of F being weighted in (2.2.7). For a nite set of problems F , let w be a nonnegative weighting function on F . Excluding innite weights, P we may assume F 2F w(F ) = 1. With this, we regard x as `good' with respect to F and w if X Ew (#x ) := #x (F )w(F ) F 2F
is small { if #x is small in the (weighted) mean. We generalize this to problem sets F of arbitrary size by employing a measure13 on F in
Criterion 2.4 Let (F R ) be a probability space such that #x is a random variable. The nite sequence x of random numbers is good with respect to the problem set F and the probability measure on F if E (#x ) := is small.
Z
F
#x d
Although Criterion 2.4 seems to be completely dierent from Criterion 2.3, we will show in Section 4.2 that under certain assumptions they are in fact equivalent. In the spirit of this criterion, the search for good random number sequences becomes a two step process: rst the user supplies some probabilistic description of the problem(s) he is interested in, and then we propose If you are not familiar with the notion of measure, just think of it as generalization of the weighting function w on nite F . The weighted sum over a nite set is generalized to an integral over an innite set. 13
22
a good number sequence according to the given information. Observe how much responsibility is loaded on the user this way. If his description is inadequate, so may be the proposed number sequence. But observe also that this is not just a mathematician's trick to avoid being blamed for the proposal of bad number sequences since no nite sequence of numbers is random by itself, the user who wants to simulate a random variable using well-determined numbers has to take the burden of specifying which properties of randomness are relevant for him. (2.3.6) There are some useful interpretations of E (#x ): as described above, can assign dierent `grades of interest' to subsets of the problem set F . Subsets the user considers vital are assigned high measure, while low measure is assigned to subsets which he thinks are interesting enough to be contained in F but not very important. If a sequence x is good with respect to F and in the above sense, the average weighted error by which x approximates the problems in F is small. A second interpretation is from the mathematician's point of view. Given the set F and the measure , the mathematician faces the following problem. He has to propose a sequence x to the user. According to the probability measure , the user will randomly select a problem F from F , estimate E (F ) by F (x), and punish the mathematician (by lowering his reputation, cutting his salary, applying physical violence, or whatever else he considers appropriate) proportional to the resulting estimation error. E (#x ) is the punishment the mathematician can expect choosing x as to minimize the expected punishment seems a very reasonable approach. (2.3.7) The reader may object that measures on such complex sets as F are hard to nd and that, even if an appropriate can be found, the computation of E(#x ) is { technically { not always possible. There is reason in this, of course. However, in all examples given in Chapter 4, we will get along with much less than the full-edged measure , and even the precise extent of the problem set F will almost never be required. To demonstrate how this is possible, recall Niederreiter's recommendation from 87, p.164] to \be aware of the specic desirable statistical properties of the random samples in the current computational project and : : :] choose PRN that are known to pass the corresponding statistical tests." Suppose G is a set of estimators for a gambler's gain in successive rounds of roulette following one of several strategies14. #x (G) is the approximation 14
There is quite a variety of ways to simulate this just think of the various conceivable
23
error produced by x when simulating a given G 2 G . Next, suppose H is a set of statistical tests with level of signicance < 1=2, each of which checks for the statistical properties desirable for simulating the gambler's gain15 , and let $x (T ) := jT (x) ; j be the error16 produced by x when subjected to a test T 2 H. Now let F := G H and let (F R ) be a probability space such that #x and $x are random variables. In general, precise knowledge of (F R ) is required to compute E(#x ) and E($x ). Now suppose all we get to know is that the random quantities #x and $x are positively correlated. Informally, this means if x passes a test randomly selected from H (yielding $x < 1=2), #x is likely to be small, too. Given only this information, we will show without much eort in Section 4.1 that
E (#x j$x < 1=2) < E(#x ) < E(#x j$x 1=2) for any probability space (F R ) for which the positive correlation of #x and $x holds. In this way, Knuth's argument 59, p.38] { which was based on no more than intuition in the light of Criterion 2.1 and 2.2 { does make sense: \If a sequence behaves randomly with respect to tests T1 T2 : : : Tn, we cannot be sure in general that it will not be a miserable failure when it is subjected to a further test Tn+1 yet each test gives us more and more condence in the randomness of the sequence." When the hidden assumption of positive correlation is made explicit, the `condence' can be proven.
algorithms to transform the random numbers in x into lottery results. 15 Again, there are many ways to check for these properties think of varying the level of signicance of the test, the sample size considered, the test statistic, or the region of acceptance. 16 This is x (T ) < 1=2 if x passes the test T and x (T ) 1=2 otherwise.
24
Chapter 3
Running for randomness { heat 2 Expect the unexpected! { Douglas Adams, The Hitch Hiker's Guide to the Galaxy
We have seen that no nite 0-1-sequence is per se more random than any other. In this chapter, we will state this in a more general context, for any nite sequence of numbers in 0 1. The reader who is convinced that analogous results hold for this more complex set and who does not want to get involved into mathematical details may skip this chapter. Here, we will use no ideas not already presented before, but we hope the mathematical generalization will provide a deeper insight in the phenomenon's nature.
3.1 Finite sequences of random numbers in
0 1
(3.1.1) Our previous considerations were restricted to the particular set f0 1gN . We have tried to nd out if a sequence x of N numbers xn in f0 1g can be used to model a sequence X of N random variables Xn which are independent and equidistributed on f0 1g. The sequence X is a random variable with values in f0 1gN and { due to the independence of the Xn { it is equidistributed on f0 1gN . Thus X is a random variable on the probability space (f0 1gN P (f0 1gN ) p), where P (f0 1gN ) is the family of all subsets of f0 1gN and p is the normalized counting measure on f0 1gN . 25
Searching for a `good' sequence of random numbers, we tried to model the random variable X by means of a specic sample x 2 f0 1gN (see (2.2.5)). Now we will do the same for the more complex probability space (0 1N B 01 N N ). B 01 N is the Borel -algebra on 0 1N with respect to the standard topology and N is the N -dimensional Lebesque measure restricted to 0 1N . This probability space has the following three helpful mathematical properties which we will use in the proofs. (3.1.2) The real-valued random variables X1 : : : XN are independent and equidistributed on 0 1 if and only if the vector X = (X1 : : : XN ) is a random variable on 0 1N with probability measure N . So the probability space (0 1N B 01 N N ) is exactly what we are concerned with when searching for a `good' sequence of random numbers in 0 1. (3.1.3) For x y 2 0 1N , we dene x + y as the component-wise sum ;1 and y = (yn )N ;1, let of x and y, reduced modulo 1: for x = (xn )Nn=0 n=0
x + y := (xn + yn mod 1)Nn=0;1:
Then (0 1N +) is an abelian group.
(3.1.4) The measure N is invariant under translation: for any A 2
B 01 and any c 2 0 1N , we have N (A + c) = N (A): N
Another way to state this is to observe that, for any c 2 0 1N , the translation Tc : x 7;! x + c is a measure-preserving function. The background of this is the fact that there is a topology on 0 1N such that (0 1N + ) is a compact abelian group. is very similar to the standard topology in fact, the Borel -algebra with respect to is equal to B 01 N . For any compact abelian group, the existence of an unique translation-invariant probability measure, the so called Haar measure can be proven. In our case this measure is just N . For more information on compact abelian groups and Haar measures, see Cohn 11, Section 9.2]. We can, of course, state our problem of nding a `good' x for a general compact abelian group instead of the specic group (0 1N + ). Proposition 3.1 and 3.3 as well as Part 2 of Proposition 3.2, which we prove in this 26
chapter, can easily be proven in this general case. The proofs of Part 1 and 3 of Proposition 3.2 however, which are in fact the most interesting ones, rely heavily on the specic structure of (0 1N + ). (3.1.5) Just as we considered the set of all statistical tests on f0 1gN with level of signicance , we will now consider the analogous set of all {tests on 0 1N as well as more complex sets of problems. A problem set might be, say, for any given random variable Y , the set F of all estimators for E (Y ) or the set F of all such estimators whose variances are less than some > 0, and so on. In general, any F L1 (0 1N R) can be taken as a problem set, this is any set of real-valued functions F whose expectations E (F ) exist and are nite1 . We will show that no `good' random numbers can be found as long as the test set F is invariant under translation. So if F is a test function from F (which serves some purpose like being a statistical test or estimating some E (Y )), then each translated test function F Tc (which is a statistical test or an estimator for E (Y ) as well) is in F , too. Let us state this formally.
Denition 3.1 A set F L1(0 1N R) is called unbiased if 8F 2 F 8c 2 0 1N : F Tc 2 F : It is easy to see that the set of all -tests or the set of all estimators for a given random variable's expectation is unbiased because the translation Tc is measure-preserving. (3.1.6) To evaluate the `quality' of a given sample x 2 0 1N , we proceed as in Section 2.3: for any simulation problem F 2 L1 (0 1N R), we measure the `quality' of x in simulating F by #x (F ) := jE (F ) ; F (x)j where E (F ) is the integral of F with respect to N . For a given test set F and a xed error bound > 0, we consider the set Fx of those functions whose expectations x approximates by an error less than : Fx := fF 2 F : #x (F ) < g : 1 The symbol L1 (0 1N R) denotes Rthe set of all real-valued, measurable functions on 0 1N for which the Lebesque R integral 01 N jF jdN is well-dened and nite. Note that E (F ) is just dened as
01 N
FdN .
27
Note that Fx implicitly depends on . For the rest of this chapter, we assume that is some xed, positive error bound. The quality of x with respect to F depends on the size of Fx . We will show that, for unbiased test sets F , the size of Fx does not depend on x. Before we can do so, we have to deal with a phenomenon which did not occur in the nite probability space on f0 1gN considered before: the existence of nonempty sets of measure zero. The function 1, which assigns 0 to each x, is a statistical test with level of signicance 0 { it accepts every sample. This statistical test is of course meaningless. The function 1fxg is a statistical test with level of signicance 0 as well, but it rejects the sample x since 1fxg(x) = 1. However, this test looks quite pathological, too. We refer to all functions which have the same property as trivial functions.
Denition 3.2 A function F 2 L1(0 1N R) is called trivial if n o N x 2 0 1N : #x < 2 f0 1g:
3.2 Three propositions We have presented four criteria to measure the quality of a given sample x in Chapter 2. Here, we will show that whenever F L1(0 1N R) is unbiased, the rst three criteria do not allow the proposal of a `good' sequence of random numbers x 2 0 1N .
Proposition 3.1 For any unbiased set F L1(0 1N R) and any x y 2 0 1N , there exists a bijection $ from Fx onto Fy . Remark: If F is unbiased, Proposition 3.1 yields 8x y 2 0 1N : #Fx = #Fy: In this case, Criterion 2.1 is useless. Proof: Let F 2 L(0 1N R) be unbiased and let x y 2 0 1N be xed. We will construct a bijective and measure-preserving function T : 0 1N ! 0 1N , use T to dene the mapping $ : Fx ;! Fy
F 7;! F T 28
and show that $ is bijective. If ;y is the inverse of y in (0 1N +), we dene T as the translation by x ; y: T := Tx;y : T is bijective, because (0 1N +) is an abelian group, and it is measurepreserving because N is invariant under translation. To see that $ is well-dened, let F 2 Fx : #x (F ) < : Because F T (y) = F (x) and because T is measure-preserving, we have #y (F T ) < which means $(F ) 2 Fy . To show the injectivity of $, let F F 0 2 Fx and
F T = F 0 T: Since T is bijective, it immediately follows that F = F 0 . To show the surjectivity of $, let G 2 Fy . Using the same argument as in showing that $ is well-dened, we get
G Ty;x 2 Fx and, due to the commutativity of +, $(G Ty;x ) = G
With this, the mapping $ is a bijection from Fx onto Fy . 2 Of course, the existence of a bijection between two sets does not imply they are of equal size. There is, for example, a bijection from 0 1=2 onto 0 1. But since 0 1=2 is a proper subset of 0 1, 0 1=2 0 1, the former is usually considered smaller than the latter2. We show that such an inclusion cannot occur between the sets Fx and Fy when F is unbiased and non-trivial. 2 Observe that we use the symbol A B to denote that A is a proper subset of B , this is A B and B n A = 6 . 29
Proposition 3.2 Let F L1(0 1N R) be unbiased and let x 2 0 1N be xed.
1. For almost all y 2 0 1N , the following holds: if
Fx Fy then F is a set of trivial functions.
2. For all y 2 0 1N , the following holds: if
Fx Fy then
Fy Fx+2(y;x) Fx+3(y;x) : : ::
3. If x and y 2 0 1N are rational in each coordinate, then
Fx 6 Fy :
Remark: If we take two samples x and y at random, and if we happen to nd Fx Fy , then Part 1 states that almost certainly y is superior to x only with respect to trivial functions. Part 2 states that whenever such an inclusion occurs, we can construct a sequence (x + n(y ; x))1 n=0
of samples where each element is better than its predecessor in this case, we cannot consider y as `good' because we can actually compute an innite number of samples which are `better'. For computer simulations, Part 3 is the most interesting one. It states that for those sequences x, y of random numbers which can be represented in a computer's nite-precision oatingpoint arithmetic, an inclusion of Fx in Fy cannot occur at all. As a corollary, this renders Criterion 2.2 useless whenever F is unbiased. The Parts 2 and 3 are easy to prove, but Part 1 is not. To prove it, we use the bijection $ from Fx onto Fy as constructed in the proof of Proposition 3.1: each F 2 Fx is translated to F Tx;y , which is an element of Fy . We show that this translation of functions in F corresponds to the translation of some measurable subsets of 0 1N . The notion of ergodicity applied to these subsets is the key to Part 1. Recall the denition of ergodicity3 .
For more than just the denition of this very interesting notion, we refer the reader to Parry 89], Petersen 91], or Walters 111]. 3
30
Denition 3.3 A measure-preserving mapping T : 0 1N ;! 0 1N is called ergodic if, for any A 2 B 01 , the following implication holds: T ;1(A) = A =) N (A) 2 f0 1g: N
Besides the notion of ergodicity, we need four lemmata to prove Proposition 3.2. The rst two are used to state that the translation Tc is ergodic for almost all c. The third shows that the translation of functions in F corresponds to the translation of certain measurable subsets of 0 1N . The fourth translates the concept of trivial functions to these measurable subsets.
Lemma 3.1 Let c = (cn)Nn=0;1 2 0 1N . Then Tc is ergodic if and only if the numbers c0 : : : cN ;1, and 1 are rationally independent.
We do not prove this result here. The interested reader might consult Parry 89, Chapter 1]. Recall that the rational independence of cP0 : : : cN ;1 1 means that there are no integers h0 : : : hN ;1 k satisfying N ;1 hn cn = k. n=0
Lemma 3.2 The translation Tc is ergodic for almost all c 2 0 1N . Proof: We show that A := fc = (cn)Nn=0;1 2 0 1N : c0 : : : cN ;1 1 are rationally dependent g
is a set of measure zero. The condition of rational dependence of the cn and 1 can be read as `c lies on a hyperplane dened by the vector h and the scalar k, where h is a nonzero N -dimensional integer vector and k is an integer'. A is a subset of the union of all these hyperplanes. Since any hyperplane has measure zero and since the set of the above hyperplanes is countable, their union and its subset A have measure zero, too. 2
Lemma 3.3 Let F L1(0 1N R) be unbiased. Then there is a set G F
with
8x 2 0 1N 8F 2 G : 9CxF 2 B 01
N
such that 1.
Fx =
F 2G
fF Tc : c 2 CxF g : 31
2. The sets in the above union are mutually disjoint. 3. 8x y 2 0 1N 8F 2 G : CyF = CxF + (x ; y):
Proof: Let F be unbiased. We dene an equivalence relation on F
as
F G () 9c 2 0 1N : F = G Tc:
It is easy to see that is in fact an equivalence relation: it is reexive since there is a neutral element, the zero-vector in the group (0 1N +) it is symmetric since, for each c, there is an inverse element in the group, too nally, it is transitive also because (0 1N +) is a group. In the following argument, we dene G as a set of representatives4 of F with respect to , and construct the sets CxF accordingly. Since G is a set of representatives and since F is unbiased, we have o n F= F Tc : c 2 0 1N F 2G
and this union is taken over disjoint sets. Now we represent the set Fx in a similar form: Fx = fF 2n F : #x (F ) < g o = F Tc : c 2 0 1N #x (F Tc) < =
F 2G
n
F 2G
n
oo
F Tc : c 2 c 2 0 1N : #x (F Tc) < :
As above, the union is taken over disjoint sets. We dene the sets CxF as
n o CxF := c 2 0 1N : #x (F Tc) <
(1) which are measurable because F is measurable itself. With this, the Parts 1 and 2 are proved. For Part 3, recall the mapping $ we used in the proof of Proposition 3.1: $ is a bijective function from Fx to Fy , which maps F to F Tx;y . Let F 2 G . On one hand, we have Fy = F Tc : c 2 CyF 4
F 2G Which does exist if we assume the Axiom of Choice.
32
and on the other, since $\Phi$ is bijective,
$$\mathcal{F}_y = \Phi(\mathcal{F}_x) = \bigcup_{F \in \mathcal{G}} \Phi\left( \left\{ F \circ T_c : c \in C_x^F \right\} \right) = \bigcup_{F \in \mathcal{G}} \left\{ F \circ T_{c'} : c' \in C_x^F + (x - y) \right\}.$$
Because the sets $C_x^F$ and $C_y^F$ are uniquely determined by (1), we have
$$C_y^F = C_x^F + (x - y). \qquad □$$
Lemma 3.4 Let $\mathcal{F} \subseteq L^1([0,1)^N, \mathbb{R})$ be unbiased and let the sets $\mathcal{G}$, $C_z^F$ ($z \in [0,1)^N$, $F \in \mathcal{G}$) be defined as in Lemma 3.3. If
$$\exists x \in [0,1)^N \ \forall F \in \mathcal{G} : \lambda_N(C_x^F) \in \{0, 1\} \qquad (2)$$
then $\mathcal{F}$ is a set of trivial functions.
Proof: Let $\mathcal{F} \subseteq L^1([0,1)^N, \mathbb{R})$ be unbiased, let $x \in [0,1)^N$, and suppose (2) holds for the given $x$. It is sufficient to show that $\mathcal{G}$ is a set of trivial functions, because $\mathcal{G}$ is a set of representatives of $\mathcal{F}$ with respect to the translations $T_c$ and because, if $F \in \mathcal{G}$ is trivial, each $F \circ T_c$ is trivial, too. Let $F \in \mathcal{G}$. Due to (1) we have
$$C_x^F = \left\{ c \in [0,1)^N : \Delta_x(F \circ T_c) < \varepsilon \right\} = \left\{ c \in [0,1)^N : \Delta_{x+c}(F) < \varepsilon \right\} = \left\{ c' - x \in [0,1)^N : \Delta_{c'}(F) < \varepsilon \right\} = \left\{ c' \in [0,1)^N : \Delta_{c'}(F) < \varepsilon \right\} - x =: A - x.$$
Since $\lambda_N(C_x^F) \in \{0,1\}$ and since $\lambda_N$ is invariant under translation, we get that $\lambda_N(A) \in \{0,1\}$ and that $F$ is trivial. □

Now we are ready to prove Proposition 3.2. We show Parts 1 to 3 in reverse order because the argument runs smoother this way.
Proof of Proposition 3.2: Let $\mathcal{F} \subseteq L^1([0,1)^N, \mathbb{R})$ be unbiased, $x \in [0,1)^N$, and let $\mathcal{G}$ and the sets $C_z^F$ ($z \in [0,1)^N$, $F \in \mathcal{G}$) be as in Lemma 3.3. For a fixed $y \in [0,1)^N$, let $\Phi : \mathcal{F}_x \to \mathcal{F}_y$ be defined as in the proof of Proposition 3.1: $F$ is mapped to $\Phi(F) = F \circ T_{x-y}$. Finally, let $y \in [0,1)^N$ be such that
$$\mathcal{F}_x \subsetneq \mathcal{F}_y. \qquad (3)$$
For Part 3, let both $x$ and $y$ be rational in each coordinate. Then the same holds for $c := x - y$. Let $0 = (0, \ldots, 0)$ be the zero-vector in $[0,1)^N$. Since $c$ is rational, there is an integer $n > 0$ such that $nc = 0$ in the abelian group $([0,1)^N, +)$. For $F \in \mathcal{G}$, (1) and (3) imply⁵ that $C_x^F \subseteq C_y^F$. Because of Lemma 3.3, Part 3, for any $F \in \mathcal{G}$, we have
$$C_x^F \subseteq C_y^F = C_x^F + c \subseteq C_y^F + c = C_x^F + 2c \subseteq \cdots \subseteq C_x^F + nc = C_x^F.$$
With Lemma 3.3, this yields $\mathcal{F}_x = \mathcal{F}_y$, which is a contradiction to (3). Hence (3) cannot hold whenever $x$ and $y$ are rational in each coordinate. This proves Part 3.

To show Part 2, note that $\Phi(\mathcal{F}_x) = \mathcal{F}_y$, so (3) can be written as
$$\mathcal{F}_x \subsetneq \Phi(\mathcal{F}_x) = \mathcal{F}_{x+(y-x)}.$$
Inductively, this yields
$$\mathcal{F}_{x+n(y-x)} \subsetneq \mathcal{F}_{x+(n+1)(y-x)}$$
for any integer $n \geq 0$ and proves Part 2.

To prove Part 1, let $y \in [0,1)^N$ be such that $T_{x-y}$ is ergodic. Observe that this holds for almost all $y$⁶. With Lemma 3.4, it is sufficient to show that, for any $F \in \mathcal{G}$, the measure $\lambda_N(C_x^F)$ is either 0 or 1. Because of (1) and (3), the following relations between $C_x^F$ and $C_y^F$ are possible⁷:
$$C_x^F = C_y^F \quad \text{or} \quad C_x^F \subsetneq C_y^F.$$
In the first case, $C_x^F = C_y^F$, Lemma 3.3, Part 3 yields $C_x^F = C_y^F = C_x^F + (x - y)$ or, equivalently,
$$C_x^F = T_{x-y}^{-1}(C_x^F).$$
Since $T_{x-y}$ is ergodic, $\lambda_N(C_x^F)$ is either 0 or 1. In the second case, $C_x^F \subsetneq C_y^F$, the already proven Part 2 of this proposition yields that, for $c := x - y$, we have
$$\forall n \geq 0 : C_x^F + nc \subsetneq C_x^F + (n+1)c.$$
Because $\lambda_N$ is invariant under translation, the measure of each $C_x^F + nc$ is equal to $\lambda_N(C_x^F)$. Now we define
$$C := \bigcup_{n=0}^{\infty} \left( C_x^F + nc \right).$$
Then, because the measure $\lambda_N$ is finite and therefore continuous from below⁸, we have
$$\lambda_N(C) = \lambda_N(C_x^F).$$
Since $C_x^F + nc \subsetneq C_x^F + (n+1)c$, we have $C = \bigcup_{n=1}^{\infty} (C_x^F + nc)$. Hence $C + c = C$ or, equivalently,
$$T_c^{-1}(C) = C.$$
Since $T_c$ is ergodic, the measure $\lambda_N(C) = \lambda_N(C_x^F)$ is either 0 or 1. □

⁵ If we can choose $c \in C_x^F \setminus C_y^F$, then (1) gives $\Delta_x(F \circ T_c) < \varepsilon$ and $\Delta_y(F \circ T_c) \geq \varepsilon$. But this means $F \circ T_c \in \mathcal{F}_x \setminus \mathcal{F}_y$, which is a contradiction to (3).
⁶ Due to Lemma 3.2, $T_y$ is ergodic for almost all $y \in [0,1)^N$. Since the translation $T_{x-2y}$ is measure-preserving, the mapping $T_{x-y} = T_y \circ T_{x-2y}$ is ergodic for almost all $y$, too.
⁷ As we have seen in the proof of Part 3, the case $C_x^F \setminus C_y^F \neq \emptyset$ is impossible.
⁸ A measure $\mu$ is continuous from below if, for arbitrary measurable sets $A_n$ and $A$ with $\bigcup_{n=1}^{\infty} A_n = A$, the relation $\lim_{n \to \infty} \mu\left( \bigcup_{i=1}^{n} A_i \right) = \mu(A)$ holds.

The failure of Criterion 2.3 when $\mathcal{F}$ is unbiased is stated in
Proposition 3.3 Let $\mathcal{F} \subseteq L^1([0,1)^N, \mathbb{R})$ be unbiased. Then
$$\forall x, y \in [0,1)^N : \sup_{\mathcal{F}} \Delta_x = \sup_{\mathcal{F}} \Delta_y.$$

Proof: Let $\mathcal{F} \subseteq L^1([0,1)^N, \mathbb{R})$ be unbiased and let $x, y \in [0,1)^N$. Recall the bijective mapping
$$\Phi : \mathcal{F}_x \longrightarrow \mathcal{F}_y, \quad F \longmapsto F \circ T_{x-y}$$
we used in the proof of Proposition 3.1. By the same argument as used to show that $\Phi$ is well-defined, we get $\Delta_x(F) = \Delta_y(F \circ T_{x-y})$ which, because of the bijectivity of $\Phi$, completes the proof. □
3.3 A critical remark

(3.3.1) Note that the condition of unbiasedness we demanded for $\mathcal{F}$ is quite restrictive. We could make use of Criteria 2.1, 2.2, and 2.3 by considering sets $\mathcal{F}$ which are not unbiased (for an example of this, see our remark concerning Criterion 2.3 in (2.3.4)). Anyway, all we wanted to say was that, in general, no finite sample can be considered more random than any other (see (1.0.3)). With the example of unbiased sets $\mathcal{F}$, and in particular since $L^1([0,1)^N, \mathbb{R})$ is unbiased, this is shown. Moreover, searching for good general-purpose random number sequences to run stochastic simulations on computers, one almost invariably stumbles over unbiased sets $\mathcal{F}$. Think of a user who wants to simulate $Y$, a [your-simulation-problem-here], on a computer. In particular, if $Y$ is a complex quantity, there are numerous possible estimators for $E(Y)$, and each of them can be implemented on a computer in numerous different ways. The user does not want a separate random number sequence for each individual estimator $F$, but one which performs well for all $F$⁹. Although additional

⁹ Using the same sequence with different estimators is quite useful for debugging the code and comparing the estimators.
knowledge about the relevant estimators may be available, the information which actually reaches the mathematician is often no more than that `one searches for a sequence well suited for simulating $Y$'. In this case, the corresponding set $\mathcal{F}$ is
$$\mathcal{F} = \left\{ F \in L^1([0,1)^N, \mathbb{R}) : E(F) = E(Y) \right\}$$
which is, of course, unbiased. As already hinted in (2.3.6) and as we will see in the next chapter, the use of a probability space over $\mathcal{F}$ provides us with a quite handy tool for making justified proposals of which sequence to use.
Chapter 4
Getting a `good one'

With a bit of a mind flip
-- Riff Raff, The Rocky Horror Picture Show
(4.0.2) We have seen that it is impossible to attribute a special status of randomness to any fixed sequence $x = (x_n)_{n=0}^{N-1}$ of $N$ numbers in general. On the other hand, we are looking for good random numbers for computer simulation. Such numbers can be found only if some information about the simulation problem is available. We have presented Criteria 2.3 and 2.4 to take such information into account. Criterion 2.3 defines good sequences with respect to a restricted set¹ of simulations $\mathcal{F}$. For practical applications, the problem with restricted sets $\mathcal{F}$ is twofold. First, those $\mathcal{F}$, for which the quality of a sequence $x$ with respect to Criterion 2.3 can actually be proven, are comprised of rather primitive simulation problems (see (2.3.4)). Most of the interesting problems (think of, say, discrete event simulations) are of a more complex nature. In principle, it might be possible to find good sequences for more complex $\mathcal{F}$; unfortunately there are many technical difficulties on the way, which are very hard to surmount even for rather primitive $\mathcal{F}$. Second, even if we can prove the quality of a sequence $x$ with respect to Criterion 2.3 and $\mathcal{F}$, the question of whether a given simulation problem actually belongs to $\mathcal{F}$ or not is mostly a matter of guessing².

¹ To those who have read Chapter 3: with problem sets $\mathcal{F}$ which are biased, Criteria 2.1 and 2.2 will also work.
² Think of, say, the problem set $\mathcal{F}$ described in (2.3.4), for which the (t,m,s)-nets are proven to be good. $\mathcal{F}$ is made up of functions $F = \frac{1}{N} \sum_{n=1}^{N} f$, where $f$ is a function with in some sense rapidly decreasing Walsh coefficients. The rate of decrease of a given function's Walsh coefficients is not easy to estimate, let alone to compute!
In this chapter, we will apply Criterion 2.4 in some common situations. To apply this criterion, we need some information about the simulation problem, just as in the application of Criterion 2.3. The advantage of Criterion 2.4 is that it requires much less information and that the assumptions on the simulation problem are less restrictive and mostly intuitively convincing. But nothing comes for free, of course. The assessments of quality based on Criterion 2.4 are much weaker than those based on Criterion 2.3. Whereas the latter proposes sequences which are good `with certainty', the proposals based on the former are good only `in the mean'. We present four examples of applying Criterion 2.4. The aim is not to find completely new arguments for a sequence's quality, but to use existing arguments found in the literature, and to point out under which assumptions they are reasonable. You may view these examples as plug-ins, saying: If the user can supply us with this-or-that information about his simulation problem, then we advise him to use this-or-that sequence. The point is that the advice can actually be derived from the given information. Thus the above if-then-argument should better read: If the user is willing to make this-or-that assumption about his simulation problem, then these assumptions imply the preference of this-or-that sequence.
4.1 Pre-testing

(4.1.1) A sequence $x$ is often believed to be good if it behaves well in certain statistical tests. We have already encountered this attitude in (2.2.1). If a sequence passes, say, ten statistical tests, you cannot be sure it will be a good choice for your simulation, but usually the ten tests it passed will increase your confidence in the sequence. Suppose a user wants to simulate the flow of aerosol particles in the human lung. More precisely, he is interested in how particles are deposited in
bronchial airway bifurcations. The model is roughly described as follows³. The positions of a number of particles are randomly selected at the inlet of the bifurcation. The number of particles is chosen large enough so that the resulting deposition patterns can be considered significant. Each particle enters the bifurcation at the selected position in a stream of inhaled air. Its trajectory within the bifurcation is governed by the rules of inertial impaction, interception, gravitational settling, and Brownian motion. The particle either leaves the bifurcation at one of its outlets or is deposited somewhere within. This model is used with varying bifurcation geometries, varying flow profiles at the inlet boundary, and varying particle sizes. Note that this is a stochastic model. Apart from randomly selecting particle positions at the inlet, the particle trajectories have a stochastic element. The first three rules governing each trajectory are obeyed by solving the corresponding differential equations numerically. To obey the fourth rule in a computer simulation, random numbers are used. The Brownian motion of a particle is modeled by incrementally changing its position by some three-dimensional random vector $\xi$ after a fixed time interval. Independent realizations $\xi_0, \xi_1, \ldots$ of the random vector $\xi$ are obtained from a sequence $x$ of random numbers and an algorithm to compute the $\xi_n$ given $x$. Now think of a statistical test which checks if the $\xi_0, \xi_1, \ldots$ behave as independent realizations of $\xi$ should, i.e. a test whether the realizations are correctly distributed. Learning that $x$ behaves well in this test might well increase the user's confidence in the sequence. We will see that this increase of confidence is not only convincing on an intuitive basis, but can in fact be derived using Criterion 2.4 from a moderate assumption about the test's relevance for the simulation.

³ For a more complete description of the model and various simulation results, see Balashazy and Hofmann [4].

(4.1.2) Let $\mathcal{G}$ be a set of simulations, let $\mathcal{H}$ be a set of statistical tests with level of significance $\alpha$ ($0 < \alpha < 1/2$), let $\mathcal{F} := \mathcal{G} \cup \mathcal{H}$, and let $(\mathcal{F}, \mathcal{R}, \mu)$ be a probability space. To fix ideas, think of $\mathcal{G}$ being the set of all simulations of a model of particle-flow in bronchial bifurcations with varying bifurcation geometries, flow profiles, and particle sizes. Or think of $\mathcal{G}$ containing many different models for the particle-flow, all of which are mathematical images of the same real-world phenomenon. The set $\mathcal{H}$ might be thought of as containing all statistical tests based on the random jumps $\xi_0, \xi_1, \ldots$ computed from a
sequence of random numbers, or just one statistical test with different sample sizes, different levels of significance, and so on. For a pair $(F, T) \in \mathcal{G} \times \mathcal{H}$ and a sequence $x$,
$$\Delta_x(F) := |F(x) - E(F)|$$
measures the quality of $x$ in simulating $F$ and
$$\Lambda_x(T) := |T(x) - E(T)|$$
measures the behavior of $x$ in the test $T$⁴. Additionally, we assume that the $\sigma$-algebra $\mathcal{R}$ on $\mathcal{F}$ is such that these quantities are real-valued random variables (we have already encountered this setup in (2.3.7)). It is of course not easy to describe the mathematical object $(\mathcal{F}, \mathcal{R}, \mu)$ explicitly, especially for a user who just wants to simulate the flow of particles in the bronchiae. Fortunately, we do not need a complete, explicit description but just one property of $\mu$. Suppose the user assumes that the random quantities $\Delta_x$ and $\Lambda_x$ are positively correlated. The positive correlation can be thought of as expressing that the tests in $\mathcal{H}$ check for certain properties which are relevant for the simulations in $\mathcal{G}$. We have the following result. Whenever the positive correlation of $\Delta_x$ and $\Lambda_x$ holds, we have
$$E(\Delta_x \mid \Lambda_x < 1/2) < E(\Delta_x) < E(\Delta_x \mid \Lambda_x \geq 1/2).$$
Thus, if $x$ passes a test randomly selected from $\mathcal{H}$, the user has to be more optimistic about the error $x$ is expected to produce in simulating a problem randomly selected from $\mathcal{G}$.

Proof: Recall that the positive correlation of $\Delta_x$ and $\Lambda_x$ means that their covariance is positive:
$$E(\Delta_x \Lambda_x) - E(\Delta_x) E(\Lambda_x) > 0. \qquad (1)$$
Now let $\Xi_x$ be the event of $x$ passing a test from $\mathcal{H}$:
$$\Xi_x(T) := \begin{cases} 1 & \text{if } \Lambda_x(T) < 1/2, \\ 0 & \text{otherwise} \end{cases} \;=\; \frac{1 - \alpha - \Lambda_x(T)}{1 - 2\alpha}. \qquad (2)$$
⁴ Recall our observation from (2.3.3): if $T$ is a test with level of significance $\alpha < 1/2$, then $x$ passes the test if and only if $\Lambda_x(T) < 1/2$.
To verify (2), recall that $\Lambda_x(T) = \alpha < 1/2$ if and only if $T(x) = 0$. Since $\Delta_x$ is positively correlated to $\Lambda_x$, it is negatively correlated to $\Xi_x$. To derive this formally, simply use (2) to express $\Lambda_x$ by means of $\Xi_x$, and substitute this in (1); this yields
$$E(\Delta_x \Xi_x) - E(\Delta_x) E(\Xi_x) < 0.$$
This is in fact equivalent to the positive correlation of $\Delta_x$ and $\Lambda_x$ since $\Xi_x$ was obtained from $\Lambda_x$ by a linear transformation. Next, observe that for the above negative correlation to hold, the terms $E(\Delta_x)$ and $E(\Xi_x)$ both must be positive⁵. With this, we can transform the above equation to
$$\frac{E(\Delta_x \Xi_x)}{E(\Delta_x) E(\Xi_x)} < 1. \qquad (3)$$
Now we have
$$E(\Delta_x \Xi_x) = E(\Delta_x \Xi_x \mid \Xi_x = 1) P(\Xi_x = 1) + E(\Delta_x \Xi_x \mid \Xi_x = 0) P(\Xi_x = 0) = E(\Delta_x \mid \Xi_x = 1) P(\Xi_x = 1). \qquad (4)$$
Employing the fact that $E(\Xi_x) = P(\Xi_x = 1)$, we can substitute (4) in (3) to get
$$\frac{E(\Delta_x \mid \Xi_x = 1)}{E(\Delta_x)} < 1$$
which is equivalent to
$$E(\Delta_x \mid \Xi_x = 1) < E(\Delta_x).$$
The proof of the second inequality is analogous, now with the auxiliary event $\Xi_x := 1$ if $x$ fails the test and 0 otherwise. □

(4.1.3) A similar argument can be applied to choose between two sequences $x$ and $y$ when the user cannot decide which one he should prefer. If he considers the quantities $\Delta_z$ and $\Lambda_z$ as positively correlated for any sequence $z$ (so that $\Delta_z$ and the passing event are negatively correlated), we advise him to use that sequence which passes the test (if one

⁵ All involved random variables are nonnegative and, therefore, their expectations are also nonnegative. Hence $E(\Delta_x)$ and $E(\Xi_x)$ must both be positive, since otherwise the inequality of negative correlation cannot hold.
of them fails). This is because his indifference about which sequence to use means
$$E(\Delta_x) = E(\Delta_y).$$
Assuming that $y$ fails the test and $x$ passes it, we get
$$E(\Delta_x \mid \Lambda_x < 1/2) < E(\Delta_y \mid \Lambda_y \geq 1/2).$$
Moreover, even if the test is evaluated only for one of the samples, a reasonable decision is possible. If, say, we only know that $\Lambda_x < 1/2$, we get
$$E(\Delta_x \mid \Lambda_x < 1/2) < E(\Delta_y).$$

(4.1.4) There is of course no need to restrict the pre-testing to sets $\mathcal{H}$ of statistical tests only; any set of functions whose evaluation amounts to answering a yes-no question, functions which assume only two possible values, can be used. If, say, $\mathcal{H}$ is a set of functions $T$ which can only take the values $T(x) = 0$ or $T(x) = 1$ (where $T(x) = 0$ is interpreted as $x$ passing the `test' $T$) and if positive correlation between the corresponding random variables $\Delta_x$ and $\Lambda_x$ is assumed, then pre-testing is applicable.

(4.1.5) So this is the justification for applying statistical tests to random number sequences. If the user considers a test as relevant for his simulation, his confidence in a sequence necessarily increases if it passes the test. This argument is especially useful since we can apply a battery of statistical tests to a sequence before the user actually starts looking for a good one. The selection of these tests is of course more or less arbitrary, but this does not matter as long as the user can find a test which checks for a property he considers relevant. The first simple tests for random numbers were designed "in the early days of making tables of random digits" [80, p.6] in the work of Kendall and Babington-Smith [58]. These and a few more tests suggested by MacLaren and Marsaglia in [76] were compiled in Knuth's [59, Section 3.3]. An application of (slightly modified) tests from Knuth's battery to specific sequences was performed by L'Ecuyer in [68].
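To make the notion of a pass/fail test concrete, here is a minimal C sketch (ours) of the oldest of these tests, the frequency (equidistribution) test; the cell count $K = 10$, the sample size, and the critical value 16.919 (the tabulated chi-square quantile for 9 degrees of freedom at level $\alpha = 0.05$) are illustrative choices, not prescriptions from the text:

    #include <stdio.h>
    #include <stdlib.h>

    #define K 10  /* number of cells partitioning [0,1) */

    /* returns 0 if x passes (T(x) = 0 in the notation above), 1 if it fails */
    int frequency_test(const double *x, int N)
    {
        int count[K] = {0};
        double expected = (double)N / K, chi2 = 0.0;
        for (int n = 0; n < N; n++)
            count[(int)(x[n] * K)]++;     /* assumes 0 <= x[n] < 1 */
        for (int k = 0; k < K; k++) {
            double d = count[k] - expected;
            chi2 += d * d / expected;
        }
        return chi2 >= 16.919;            /* chi-square quantile, 9 df, alpha = 0.05 */
    }

    int main(void)
    {
        double x[1000];
        for (int n = 0; n < 1000; n++)
            x[n] = rand() / (RAND_MAX + 1.0);
        printf("frequency test: %s\n",
               frequency_test(x, 1000) ? "failed" : "passed");
        return 0;
    }

The batteries discussed below consist of far more stringent tests, but all of them share this structure: a statistic is computed from $x$ and compared against a quantile determined by the level of significance.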
Marsaglia [80] designed a battery of what he calls "stringent tests, because they seem more difficult to pass than the mild tests that have become standard." He gives a description of his tests and the results of applying them to various random number sequences. The most extensively applied test battery is due to Fishman and Moore [38]. In addition to presenting individual tests, they also present an "omnibus test" which is the conjunction of the individual tests. In [39], they apply their tests to a whole class of random number generators: out of over 534 million candidate generators, they select the 414 `best' with respect to some reasonable criterion⁶, subject them to their tests, and present the results and the five best generators. The same procedure is applied by Fishman in [37] to the whole of another class of random number generators and to parts of yet a third.

(4.1.6) However, the utility of test batteries is restricted by the requirement that the tests must check for properties which the user considers relevant for his simulation. If he cannot find a test which checks for relevant properties, probably the best advice for using the argument presented in this section is given by Marsaglia [80, p.6]: "If the RNG [random number generator] is to be used for a particular problem, one should try to create a test based on a similar problem for which the underlying distributions are known …". With the considerations of (4.1.4), we add that any test will do, statistical or not, if only it is relevant.
4.2 Employing known defects

(4.2.1) By a defect of a sequence $x$, we understand a simulation problem $F$ which $x$ approximates `very badly'. In our terminology, this is to say that $\Delta_x(F) \geq \varepsilon$ for some error bound $\varepsilon$ determining the defect's severeness.

⁶ Their selection is based on a normalized version $c/\nu_s$ of the spectral test $1/\nu_s$. We introduce the spectral test informally in (4.2.4) and formally in Section B.3. $c$ depends on the generator and the tested dimension such that results of different generators are comparable.
We have already seen that any two samples $x$ and $y$ are `good' in approximating about the same number of simulation problems⁷. It is clear that the same holds for the number of simulation problems in which $x$ and $y$ are `bad'⁸. Thus: Any sample is as defective as any other. We know $x = (0,0,0,0,0,0,0,0,0,0,0,0)$ and $y = (1,1,0,0,1,1,1,1,1,0,0,0)$ have the same number of defects, so -- in general -- $x$ is `as good as' $y$. However, if we are going to simulate the average number of `heads' in 12 tosses of a fair coin (taking 1 as `head' and 0 as `tail'), we will tend to prefer $y$. Although this preference for $y$ is quite obvious, it exemplifies one thing: the crucial point about a sequence's defects is not their presence but their relevance for the simulation problem at hand. The more we know about the defects of $x$, the better we can decide for which simulations $x$ should not be used. If no defect of $x$ is explicitly known, we may -- simulating a given problem $F$ -- stumble right over one of its worst deficiencies. In this sense, a sound knowledge of the defects of $x$ holds in favor of the sequence.

(4.2.2) Concerning known defects, the study of deterministic sequences of random numbers, produced by so-called random number generators⁹, can be very valuable. In such a sequence $x = (x_n)_{n=0}^{N-1}$, each number $x_n$ is obtained by an explicit computational rule which -- for virtually all random number generators in use today -- is some sort of recursion, say,
$$x_0 := 0, \quad x_n := f(x_{n-1}) \quad (1 \leq n < N).$$

⁷ We saw this with respect to statistical tests and finite binary sequences in Chapter 2 and with respect to arbitrary unbiased test sets and finite sequences in $[0,1)$ in Chapter 3.
⁸ To those who have read Chapter 3: for a problem set $\mathcal{F}$, those functions in which a sample $x$ is defective with respect to $\varepsilon$ form just the set $\mathcal{F} \setminus \mathcal{F}_x$. Since the sets $\mathcal{F}_x$ and $\mathcal{F}_y$ are `of the same size' for almost all $x$ and $y$ whenever $\mathcal{F}$ is unbiased and non-trivial (Propositions 3.1 and 3.2), the same holds for their complements.
⁹ If the finite sequence $x$ is generated by a deterministic algorithm, some authors refer to $x$ as a sequence of pseudo-random numbers and to the algorithm as a pseudo-random number generator. In accordance with Definition 2.1, and because we have seen that there is no probabilistic way to separate the finite sequences of `truly'-random numbers from those of pseudo-random numbers, we omit the adjective `pseudo' in this text. Any finite sequence $x$ of numbers is a sequence of random numbers if it is used as a substitute for random variables in a simulation. If $x$ is generated by a deterministic algorithm, we refer to this algorithm as a random number generator.
The output $x$ of a random number generator has one important advantage over a nondeterministic $y$: $x$ is completely determined by $f$ and $x_0$, so any defect of $x$ can -- in principle -- be derived from $f$ and $x_0$. On the other hand, the defects of $y$ depend on all the $y_0, \ldots, y_{N-1}$; if there is no mathematical structure in the numbers $y_n$, its defects cannot be derived but just observed¹⁰. As Knuth notes in [59, p.75]: "Although it is always possible to test a random number generator using the [statistical] methods in the previous section, it is far better to have `a priori tests,' i.e., theoretical results that tell us in advance how well those tests will come out. Such theoretical results give us much more understanding about the generation methods than empirical, `trial-and-error' results do." Although it is not always easy to infer structural deficiencies of a random number generator's output from its algorithm, it can be done in some cases. For some random number generators, we are able to prove they are defective for a whole class of simulation problems, so we know in advance that they will fail the corresponding tests. Conversely, for some other random number generators, we are able to prove that they are not defective for a whole class of problems; they are known in advance to pass the corresponding tests. In the rest of this section, we describe these generators and their defects along with how they can be used in conjunction with Criterion 2.4. For the sake of simplicity, this section is rather informal, focusing on the defects' consequences rather than on their cause. The precise and formal treatise is postponed to Chapter 5.

(4.2.3) Today, the most widely used generator is in fact the one with

¹⁰ There is yet another important advantage of using a random number generator to produce $x$. Since its output is repeatable, so is the computed simulation result $F(x)$. The ability to obtain reproducible results is appreciated in scientific work in general and in computing in particular, as noted by Ripley in [95, p.153]: "For example, Ripley and Kirkland [96] […] show some summaries of the simulation of a Markov random field which show an abrupt change at one point in the supposedly converging iterative process. Because a repeatable sequence was used, the process could be run up to just before that point and stopped, so the critical phase could be examined in detail." A similar statement is given by Halton in [47, p.2]. Without the ability to repeat the sequence $x$, one is unable to decide whether a given simulation result $F(x)$ is valid or simply due to some programming error!
the apparently most serious defect: the linear congruential generator¹¹ and its lattice structure. A linear congruential generator (LCG, for short) with integer parameters $M$, $a$, $b$, and $y_0$ computes its values by the recursion
$$x_0 := \frac{y_0}{M}, \quad x_n := a x_{n-1} + \frac{b}{M} \pmod 1 \quad (0 < n < N)$$
where the period $N$ of the generator depends on $M$, $a$, and $b$. We refer the reader to Definition 5.1 for an exact definition and to (5.2.2) for conditions to achieve maximal period length. As can be seen from the definition of the $x_n$, the LCG's recursion is defined by a straight line which is wrapped around the unit square by the modulo-operation; for an LCG with $M = 64$, $a = 5$, and $b = 21$, the recursion $f$ has the following form¹²:
[Figure 1: the graph of the recursion $f$ of the LCG with $M = 64$, $a = 5$, $b = 21$ -- a straight line wrapped around the unit square]

If we plot the points $(x_0, x_1), (x_1, x_2), \ldots, (x_{N-2}, x_{N-1}), (x_{N-1}, x_0)$ produced by this LCG, they lie on this wrapped line:

¹¹ See (1.0.5) for who proposed this generator.
¹² If we used another LCG with larger values of $M$ or $a$, the basic appearance of $f$ would be the same, but it would have significantly more (and maybe steeper) linear branches, which are hard to plot.
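A minimal C sketch (ours) that generates exactly these pairs for the toy parameters above; the seed $y_0 = 0$ is our arbitrary choice, working on the integers $y_n = M x_n$ avoids floating-point round-off, and plotting the output reproduces the figure below:

    #include <stdio.h>

    int main(void)
    {
        const unsigned M = 64, a = 5, b = 21;  /* the LCG of Figure 1 */
        unsigned y = 0;                        /* seed y_0 = 0 */
        for (unsigned n = 0; n < M; n++) {     /* this LCG has full period N = M */
            unsigned y_next = (a * y + b) % M;
            /* one overlapping pair (x_n, x_{n+1}) per output line */
            printf("%f %f\n", (double)y / M, (double)y_next / M);
            y = y_next;
        }
        return 0;
    }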
[Figure 2: the overlapping pairs $(x_n, x_{n+1})$ of this LCG -- all points lie on the wrapped line of Figure 1]

Linear patterns like this were known for quite a time¹³ without being recognized as an inherent flaw of this kind of generator. Even eighteen years after the first proposal of the LCG, Chambers [10] argued: "With [linear congruential] generators having sufficiently long periods such patterns are no longer evident."¹⁴ The first proof that linear patterns are inherent to the LCG was given by Marsaglia [78] in 1968: "if n-tuples $(u_1, u_2, \ldots, u_n)$, $(u_2, u_3, \ldots, u_{n+1})$, … of uniform variates produced by the [linear congruential] generator are viewed as points in the unit cube of n dimensions, then all the points will be found to lie in a relatively small number of parallel hyperplanes." This structure of overlapping tuples¹⁵ occurs with any LCG in any dimension. Moreover, the points do not only fall into quite a few parallel hyperplanes, but also exhibit a regular behavior within these. As Beyer et al. [5] and Smith [104] showed in 1971, the $s$-dimensional points produced by an LCG form a (shifted) lattice¹⁶ in $[0,1)^s$ (intermediate cases, where the points are merely contained in a (shifted) lattice, or form or are contained

¹³ See, for example, Greenberger [43, 44].
¹⁴ There is of course reason in Chambers' observation. If the period is large and if we plot only a small fraction of all the points, then indeed the linear patterns are no longer evident; see Figure 4b.
¹⁵ See also Marsaglia's papers [79, ?].
¹⁶ See Figure 2 to take a look at the lattice for $s = 2$.
in the union of several (shifted) lattices, exist; see [1, Section 3.3]. Handling these intermediate cases would add some technical difficulties but not more clarity. Therefore, we only consider the standard case of points forming a (shifted) lattice in $[0,1)^s$). Without going into detail, it is clear that this is definitely not the expected behavior of the random variables which the LCG should approximate. Thus: let $T$ be any test which checks a sequence $z$ of random numbers for the presence of lattice or hyperplane structures in some dimension $s > 1$. We know that any $x$ produced by an LCG will fail $T$. If the user considers $T$ as relevant for his simulation problem, Criterion 2.4 can be applied just as in Section 4.1, yielding the result that $x$ should not be used.

(4.2.4) The lattice structure of the LCG enables us not only to avoid it in certain circumstances but also to select a `good' LCG, i.e. parameters $M$, $a$, and $b$ such that the corresponding LCG is `better' than others. If you look at the lattices produced by the following two LCGs¹⁷,
[Figure 3a and Figure 3b: the two-dimensional lattices of two LCGs with $M = 1024$; see footnote 17 for the parameters]
you are likely to prefer the second generator. Both of them produce a regular pattern, but the first generator seems to be much worse. Marsaglia's work in 1968 [78] has triggered a series of papers on how to describe the LCG's lattice in higher dimension. Several figures of merit

¹⁷ Figure 3a was produced by an LCG with $M = 1024$, $a = 1021$, and $b = 21$; Figure 3b was produced by an LCG with the same parameters except for $a$, which was set to 997. The points in Figure 3a are packed so closely on just four line-segments that they cannot be perceived as individual points any more.
have been developed to facilitate finding an LCG whose points fall on more parallel hyperplanes, whose parallel hyperplanes are packed more closely, or whose lattice's unit cell is more cube-like. Considering the generators shown in Figure 3, think of a (not necessarily statistical) test $T$ which `rejects' a sample $z$ if the two-dimensional points of $z$ are covered by 10 or fewer parallel lines and `accepts' $z$ otherwise. If we consider $T$ as relevant for whatever problem we are going to simulate, we have reason to prefer the generator in Figure 3b to the one in Figure 3a. To consider a test $T$ as `relevant', one has to understand the defects $T$ can detect and the properties of randomness $T$ tests for. The better these properties are understood, the better one can judge whether or not they are relevant for a given simulation problem. The following figures of merit are quite easy to understand.
The maximal distance $1/\nu_s$ of parallel hyperplanes which cover the LCG's lattice in $\mathbb{R}^s$; this is called the spectral test.
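As an aside: for $s = 2$ and a full-period LCG, $\nu_2$ is the length of the shortest nonzero integer vector $(u, v)$ with $u + av \equiv 0 \pmod{M}$ (this characterization is Knuth's [59, Section 3.3.4]), and for toy moduli it can be found by brute force. A C sketch (ours), run on the two generators of Figure 3:

    #include <stdio.h>
    #include <math.h>

    /* shortest dual-lattice vector; intended for toy moduli (m*m must not overflow) */
    static double nu2(long long a, long long m)
    {
        long long best = m * m;                  /* (u, v) = (m, 0) is always a solution */
        for (long long v = 1; v * v < best; v++) {
            long long u = (m - a * v % m) % m;   /* u = -a*v mod m, in [0, m) */
            if (u > m - u) u = m - u;            /* fold to the shorter residue */
            if (u * u + v * v < best) best = u * u + v * v;
        }
        return sqrt((double)best);
    }

    int main(void)
    {
        printf("LCG(1024, 1021): 1/nu_2 = %.3f\n", 1.0 / nu2(1021, 1024));
        printf("LCG(1024,  997): 1/nu_2 = %.3f\n", 1.0 / nu2(997, 1024));
        return 0;
    }

This should print roughly $1/\nu_2 \approx 0.316$ for the generator of Figure 3a and $1/\nu_2 \approx 0.037$ for that of Figure 3b, quantifying the visual difference between the two plots.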
One interpretation of this is quite obvious: the larger $1/\nu_s$, the further apart are the LCG's hyperplanes, and so the larger is the parallelepiped in $[0,1)^s$ which is never hit by one of the points. If the simulation $F(x)$ depends on the distribution of the $s$-dimensional points produced by the sequence $x$ and if the LCG $x$ has a large value of $1/\nu_s$, then $F(x)$ will ignore large areas of $[0,1)^s$. If those areas happen to be important for $F$, the simulation $F(x)$ will ignore important aspects of the simulation problem $F$. A different but not entirely unlike interpretation is given by Knuth in [59, p.90]: "If we take truly random numbers between 0 and 1, and round or truncate them to finite accuracy so that each is an integer multiple of $1/\nu$ for some given number $\nu$, then the t-dimensional points […] we obtain will have an extremely regular character when viewed through a microscope. […] We shall call $\nu_2$ the two-dimensional accuracy of the random number generator, since the pairs of successive numbers have a fine structure that is essentially good to one part in $\nu_2$." See (5.2.5) and Section B.3 for a formal specification of $1/\nu_s$ and for how to compute this quantity. It is interesting to note that Coveyou and MacPherson introduced the spectral test in 1967 [14] without actually employing the $s$-dimensional lattice structure. "Instead of working with the grid structure of successive points, they considered random number generators as sources
of t-dimensional `waves'. The inverse of the distance of parallel adjacent hyperplanes [covering the lattice] […] in their original treatment were the wave `frequencies', or points in the `spectrum' defined by the random number generator, with low-frequency waves being the most damaging to randomness; hence the name spectral test" (from [59, p.110])¹⁸. The approach of Coveyou and MacPherson suggests that the spectral test might be used as a general figure of merit for rating sequences of random numbers. But when the sequence $x$ has no lattice structure, Niederreiter observes in [87, p.168] that the "difficulty here is to find a convincing quantitative formulation of this idea." An alternative approach which conserves the basic concept of Coveyou and MacPherson and which is applicable for any sequence $x$ is presented by Hellekalek in [50]: the diaphony (due to Zinterhof [117]; see also the corresponding footnote on page 62) can be viewed as a weighted spectral test. While the spectral test computes the lowest wave frequency, the diaphony computes a weighted sum of all wave frequencies where lower frequencies are more emphasized.

The relation $q_s$ of the shortest to the longest edge of the unit cell of the lattice in $[0,1)^s$, called the Beyer-quotient. Due to Beyer [5], the Beyer-quotient gives a good description of the unit cell of the lattice. If we look at Figure 3, we tend to prefer the generator for which $q_2$ is closer to 1. One interpretation of $q_s$ is given by L'Ecuyer in [69, p.90]: "A unit cell of the lattice is determined by the vectors of a Minkowski-reduced lattice base (MRLB) […]. It is traditionally accepted that `better' generators are obtained when the unit cells of the lattice are more `cubic-like' (i.e. when the vectors of the MRLB have about the same size). The ratio $q_t$ of the sizes of the shortest and the longest vectors of a MRLB is called the Beyer-quotient." Another interpretation can be derived from Knuth's concept of $s$-dimensional accuracy. Suppose we take "truly random" points in $[0,1)^s$ and represent them with coordinates relative to the MRLB. If we round or truncate each coordinate to an integer value (performing some wraparound at the unit cube's boundaries), the resulting rounded points

¹⁸ This `indirect' introduction of $1/\nu_s$ is probably one of the reasons why this quantity was misunderstood for quite a long time. See, for example, [?, p.278] for an unquotable statement on the spectral test.
will lie just on the lattice spanned by the MRLB. A Beyer-quotient $q_s$ close to 1 means that we lose about the same amount of accuracy in each coordinate, while a small value of $q_s$ means that one coordinate loses significantly more accuracy than some other. We refer to (5.2.5) and Section B.2 for a more complete treatise of $q_s$ and Minkowski-reduced lattice bases and for how to compute these numerical quantities. Some more figures of merit describing the $s$-dimensional lattice of an LCG have been proposed, but we do not present them here since we could not find reasonable interpretations by which the user can judge their relevance for his simulation problem. We refer the interested reader to [28], [78], [?], [86], or [92].

(4.2.5) Whenever lattice tests or tests for the presence of hyperplane structures are considered relevant, we can not only advise the user not to use an LCG but also which kind of generator to use instead: an inversive generator. In fact, there are two kinds of inversive generators, the inversive congruential generator (ICG) and the explicit inversive congruential generator (EICG). The ICG, proposed in 1986 by Eichenauer and Lehn [31], computes its numbers by means of a recursion just as the LCG, but its recursion is based on a nonlinear function. For a prime number $p$ and suitable integer parameters $a$, $b$, and $y_0$, the ICG computes the sequence $x = (x_n)_{n=0}^{p-1}$ as
$$x_0 := \frac{y_0}{p}, \quad x_n := f(p, a, b, x_{n-1}) \quad (0 < n < p)$$
where $f(p, a, b, x)$ is a `highly' nonlinear function in $x$. We give a more complete description of the ICG in Section 5.3. The EICG, proposed in 1993 by Eichenauer-Hermann [34], uses no recursion but, given the desired index $n$, facilitates the explicit computation of $x_n$. Just as the ICG, it uses a prime number $p$ and integer parameters $a$, $b$, and $n_0$ to compute $x = (x_n)_{n=0}^{p-1}$ as
$$x_n := g(p, a, b, n_0, n) \quad (0 \leq n < p).$$
A more complete description of this generator can be found in Section 5.4.
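In both cases the nonlinearity comes from modular inversion. The following C sketch assumes the standard definitions from the literature (the precise definitions used in this text follow in Sections 5.3 and 5.4): the ICG iterates $y_n = (a\,\overline{y_{n-1}} + b) \bmod p$ and the EICG computes $y_n = \overline{a(n_0 + n) + b} \bmod p$, where $\overline{y}$ is the multiplicative inverse modulo $p$, extended by $\overline{0} := 0$, and $x_n = y_n / p$; the parameters below are purely illustrative:

    #include <stdio.h>
    #include <stdint.h>

    /* inverse modulo the prime p via Fermat's little theorem: y^(p-2) mod p;
       products stay below 2^62 since p < 2^31, so plain uint64_t suffices */
    static uint64_t inv(uint64_t y, uint64_t p)
    {
        if (y == 0) return 0;                 /* convention: inv(0) := 0 */
        uint64_t r = 1;
        for (uint64_t e = p - 2; e; e >>= 1) {
            if (e & 1) r = r * y % p;
            y = y * y % p;
        }
        return r;
    }

    int main(void)
    {
        const uint64_t p = 2147483647u;       /* the Mersenne prime 2^31 - 1 */
        uint64_t a = 1, b = 1, n0 = 0, y = 0; /* illustrative parameters only */
        for (uint64_t n = 1; n <= 5; n++) {
            y = (a * inv(y, p) + b) % p;                        /* ICG step  */
            uint64_t z = inv((a * ((n0 + n) % p) + b) % p, p);  /* EICG value */
            printf("ICG: %.9f   EICG: %.9f\n", (double)y / p, (double)z / p);
        }
        return 0;
    }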
The output of inversive generators has a property which renders them the alternative to the LCG whenever hyperplane defects are considered relevant¹⁹: the $s$-dimensional points produced by an inversive generator avoid the hyperplanes in $\mathbb{R}^s$. So with respect to hyperplane structures, inversive generators behave exactly as random variables are expected to. Since points avoiding the hyperplanes cannot form a lattice, the ICG and the EICG are proven to pass exactly those tests which the LCG fails!

(4.2.6) Considering the undesirable lattice structure of the LCG and the lack of this structure in inversive generators, one might argue as Marsaglia does in [?, p.250]: "The conclusion is that [linear] congruential random number generators are not suitable for precision Monte Carlo use." This statement is not completely valid. Of course, inversive generators perform better than the LCG -- if lattice or hyperplane structures are concerned. Without this `if', there is no reason to reject the LCG. Knuth warns in [59, p.90] not to overestimate the LCG's lattice structure: "At first glance we might think that such a systematic behavior is so nonrandom as to make [linear] congruential generators quite worthless; but more careful reflection, remembering that [the period] m is quite large in practice, provides a better insight." If a simulation problem consumes only a small fraction of a generator's output, using too few points for the LCG's lattice to show up, it is rather hard to judge if an LCG or an inversive generator should be used²⁰:

¹⁹ For a precise statement of this result, see (5.3.5) for the ICG and (5.4.5) for the EICG.
²⁰ Figure 4a is a plot of the points $\{(x_n, x_{n+1}) : 0 \leq n < 1000\}$ obtained from the EICG with $M = 2^{31} - 1$, $a = 1$, $b = 0$, and $y_0 = 0$; Figure 4b is a plot of the same set of points obtained from the LCG with $M = 2^{31} - 1$, $a = 742938285$, $b = 0$, and $y_0 = 1$, which Fishman and Moore analyzed in [39].
[Figure 4a and Figure 4b: 1000 overlapping pairs $(x_n, x_{n+1})$ of the EICG and of the LCG described in footnote 20]
Even in their first proposal of the ICG in [31, p.322], Eichenauer and Lehn stress that "one cannot recommend the application of nonlinear congruential generators […] instead of linear congruential generators […] in general. But they should be applied if one has the feeling that there is something wrong with the simulation results and one suspects that this is caused by the lattice structure of the linear congruential generator." Careful examination of the simulation problem is necessary to judge whether a lattice structure can be considered relevant or not. There are simulation problems whose examination does in fact indicate that lattice structures might be highly relevant. Ripley [92] considers the minimal distance of $k$ independent points in $[0,1)^2$. If such points are simulated by an LCG, they lie on a regular lattice. The shortest possible distance of two lattice points is just the length $\|m\|$ of the shortest side of the lattice's unit cell. Although the minimal distance is expected to decrease as $k$ increases, an LCG-based simulation will never yield minimal distances below $\|m\|$. Simulations of this minimal distance, where an ICG scores significantly better than an LCG, were performed by Eichenauer and Lehn in [31]. However, these simulations have to be taken with a grain of salt. The involved generators have a very short period of less than 300000, while most real-world applications require period lengths of about at least $2^{31}$. Although it is, in principle, possible to repeat the experiment in [31] for generators with larger period lengths, sampling more points for the LCG's lattice to show up, the computational complexity of the test-statistic -- which is $O(\text{sample size}^2)$ -- hindered us from doing so. In short, we could not produce results as devastating as those in [31] for LCGs with large period lengths.
Simulations where inversive generators are found to be always at least as good as, and sometimes significantly better than, even the best LCG proposed in [39] (whose period is $2^{31} - 2$) have been performed by Entacher [35, 36], Leeb [70, 71, 72], and Wegenkittl [112].

(4.2.7) A particularly serious defect common to many random number generators is the presence of a special kind of long-range correlations. If we use a sequence $x = (x_n)_{n=0}^{N-1}$ to form the points $(x_n, x_{n+s})$ for a fixed shift $s$, these points should be randomly scattered in the unit square; in particular, the numbers $x_n$ and $x_{n+s}$ should be empirically uncorrelated²¹. For a variety of generators, the contrary is true. For specific integers $s$ called critical distances, the points $(x_n, x_{n+s})$ concentrate on a few parallel lines in the unit square. The presence of this kind of long-range correlation in a sequence $r = (r_n)_{n=0}^{N-1}$ can be devastating in a simulation which uses more than $s$ random numbers. As De Matteis and Pagnutti note in [21, p.67]: "In any case, the event simulated by means of $r_n$ will be correlated with that simulated by $r_{n+s}$ for every $n$ and wherever one starts in the sequence, i.e. each event will keep a memory of what happened $s$ events before." Just as the LCG's lattice structure, this gives reason not to use generators which exhibit critical distances -- if this defect is considered relevant. Again, this `if' is important. De Matteis and Pagnutti remark on the long-range correlations in [21, p.6]: "How harmful this may be depends on the particular application and also on the quantity of numbers required." The user is responsible for analyzing his simulation problem and evaluating the relevance of the long-range correlation defect. The first indication of this defect was noticed in the LCG in the early '60s by Coveyou [13], Greenberger [43], and Peach [90]. It is interesting how many years passed until it was analyzed to its full extent. Various long-range correlations were of course observed since then (like, say, by De Matteis and Faleschini [17] in 1963, Dieter [24] in 1968, Ahrens, Dieter and Grube [2] in 1970, Dieter and Ahrens [27] in 1971, Neuman and Merrick [83] in 1976, Neuman and Martin [82] in 1976, Holmild and Rynefors [56] in 1978, and

²¹ $x$ is used to model the sequence $X = (X_n)_{n=0}^{N-1}$ of independent random variables. Since $X_n$ and $X_{n+s}$ are stochastically independent, they are uncorrelated, and so should be $x_n$ and $x_{n+s}$.
Hill [53] in 1979), but even in 1987 Bowman and Robinson [7] noticed the defect merely by "Detailed examination of examples …". Finally, in 1988, De Matteis and Pagnutti proved the presence of critical distances in a class of LCGs for specific values of $s$ in [18] (see also [19]), a result which was generalized for arbitrary $s$ in 1989 by Eichenauer-Hermann and Grothe in [33]. One may suspect this defect to be a direct consequence of the simple linear recursion of the LCG and that using more complicated or nonlinear recursions might be a remedy. Unfortunately, long-range correlations were observed also in Tausworthe generators²² by Neuman and Martin in [82], and the existence of critical distances $s$ in Wichmann-Hill generators²³ was shown by De Matteis and Pagnutti in [21]. Moreover, in [20], the above authors gave a quite restrictive but nevertheless sufficient condition under which any generator with a recursion of order one, be it linear or nonlinear, has this defect. We show the existence of critical distances in the LCG in (5.2.6) and for any congruential generator forming a lattice in higher dimensions in Section B.4.

(4.2.8) For sequences which exhibit critical distances, these can be used to select one whose long-range correlation between the $n$-th and the $(n+s)$-th number takes shape later, for larger values of $s$. To be specific, let $x = (x_n)_{n=0}^{N-1}$ and $y = (y_n)_{n=0}^{N-1}$ be sequences which exhibit critical distances. For some integer $l$, say, $l = 1000$, we define²⁴
$$s^\star(x) := \min\{ s : \text{the } (x_n, x_{n+s}) \text{ are covered by at most } l \text{ parallel lines} \}$$
for $x$ and $s^\star(y)$ as the analogous number for $y$ (note that both sequences are finite, so the minima are well-defined). If we find $s^\star(x) > s^\star(y)$, we know that the specific long-range correlation in $x$ forms later than in $y$. If we consider this concentration on at most $l$ parallel lines as relevant, we can apply Criterion 2.4 to prefer $x$ since it will pass a corresponding test which $y$ will fail.

(4.2.9) If the user considers the mere presence of critical distances
56
as relevant for his simulation problem25 , he should use an EICG. For this generator, the absence of critical distances s (except for the trivial one, s = 0) is proven by the following result. The points (xn xn+s ) produced by an EICG avoid the lines in the unit square. This property, which is a consequence of Niederreiter's 85, Theorem 1], is stated more properly in (5.4.6). A similar but slightly weaker result for the ICG is given in (5.3.6). (4.2.10) All the random number generators presented so far yield, when all their numbers are produced, a one-dimensional grid equal or similar to f1=N : : : N ; 1=N g, sometimes including 0 (see (5.2.1), (5.3.2), and (5.4.2) for the exact form of this grid for the LCG, ICG, and EICG, respectively). As with the lattice produced by the LCG, it is clear that producing such excessive uniformity is not the behavior one would expect from random variables. If a user's simulation problem will consume, say, k random numbers, and if he considers excessive uniformity as a relevant defect, he has reason not to use a generator whose period N is close to k (if the generator exhibits excessive uniformity). It is not easy to assess which fraction of a generator's period N can be used without p risking excessive uniformity. A common advice is to use no more than N of a total of N random numbers, but as MacLaren objects in 77, p.47]: \This rule appears to be passed primarily by word of mouth, because nobody seems to know either where it originated or a reliable reference to its reason." MacLaren observes excessive uniformity in a number of generators in simulating a certain quantity when using more than N 2=3 numbers. However, he is cautious not to recommend this fraction as a denite safety limit: \This does not mean that all analyses will start to give incorrect results p at that point, but that at least some will do so." Besides the fractions N and N 2=3, using no more than N=100 is suggested by De'ak in 22, p.156], N=1000 by De Matteis and Pagnutti in 18, p.607], and an argument for using no more p than N=200 is given by Ripley in 93, p.26]. 25 As he may, for example, in certain forms of stochastic simulation on parallel computers see 18] and 19].
57
Although we do not feel able to favor one of these fractions, we can conclude that a generator with large period length might be preferred to another whose period is signicantly shorter if the user expects his simulation to require a suciently large amount of random numbers. To get a feeling of how many numbers certain simulations consume today and thus how large the period of the employed generator should be, we quote Halton 47, p.3]: \In a typical conventional particle-transport calculation, using nonbranching random walks, we may compute some 103 { 108 random walks, averaging perhaps 102 { 106 steps each, with every step requiring around 10 random numbers this adds up to a need for something of the order of 106 { 1015 random numbers."
4.3 Cross-testing (4.3.1) Donald Knuth ends his treatise on random numbers in 59, p.173] with the following advice: \The most prudent policy for a person to follow is to run each Monte Carlo program at least twice using quite dierent sources of random numbers, before taking the answers of the program seriously". This recommendation seems to be quite sensible and convincing. Suppose we simulate some problem F twice, using two sequences x and y of random numbers. If we nd F (x) and F (y) to be completely dierent, we are in the unpleasant situation of knowing that one result is signicantly better than the other : : : without knowing which. Conversely, if we nd F (x) to be close to F (y), this seems to give rise to some optimism that both simulation results are rather good. This optimism is well exemplied by Dudewicz and Ralley in 29, p.3]: \it is often desired to re-run at least part of a simulation study using a quite dierent generator (after which, if the two sets of results agree, one has somewhat more condence that generator regularities were not of such a serious nature as to vitiate the simulation study)." As long as the term `quite dierent generator' is unspecied, having `somewhat more condence' is completely unfounded. All that can be derived from F (x) being close to F (y) is that
$$|F(x) - F(y)| = |(F(x) - E(F)) - (F(y) - E(F))| \geq \big|\, |F(x) - E(F)| - |F(y) - E(F)| \,\big| = |\Delta_x(F) - \Delta_y(F)|. \qquad (1)$$
If $c := |F(x) - F(y)|$ is small, this only implies that the pair $(\Delta_x(F), \Delta_y(F))$, viewed as a point in the plane, lies within a diagonal stripe of width $c\sqrt{2}$:

[Figure: the stripe $\{(\Delta_x, \Delta_y) : |\Delta_x - \Delta_y| \leq c\}$ around the diagonal of the $(\Delta_x, \Delta_y)$-plane]
Without further assumptions, the point $(\Delta_x(F), \Delta_y(F))$ can be -- albeit within this stripe -- arbitrarily far from the origin, so both approximation errors $\Delta_x(F)$ and $\Delta_y(F)$ can be arbitrarily large.

(4.3.2) As with the pre-testing presented in Section 4.1, the above optimism is based on a hidden assumption. The samples $x$ and $y$ are considered `quite different' in the sense that they are unlikely to be both simultaneously `bad' in a simulation. Given this assumption, we will see that the above optimism is not only well-founded but in fact necessary. Let $\mathcal{F}$ be a set of simulation problems. For $F \in \mathcal{F}$, let $\Delta_x(F) := |F(x) - E(F)|$ and let $\Delta_y(F)$ be defined analogously. Furthermore, for a fixed $\varepsilon > 0$, let
$$\Xi_\varepsilon(F) := \begin{cases} 1 & \text{if } |\Delta_x(F) - \Delta_y(F)| \leq \varepsilon, \\ 0 & \text{otherwise.} \end{cases}$$
The event $\Xi_\varepsilon$ is equal to 1 if and only if the point $(\Delta_x(F), \Delta_y(F))$ lies within the diagonal stripe of width $\varepsilon\sqrt{2}$.
Let $(\mathcal{F}, \mathcal{R}, \mu)$ be a probability space such that $\Delta_x$, $\Delta_y$, and $\Xi_\varepsilon$ are random variables. Then we have the following result. Whenever the random variables $\Delta_x$ and $\Xi_\varepsilon$ are negatively correlated, then
$$E(\Delta_x \mid \Xi_\varepsilon = 1) < E(\Delta_x) < E(\Delta_x \mid \Xi_\varepsilon = 0). \qquad (2)$$
Note that the negative correlation of $\Delta_x$ and $\Xi_\varepsilon$ expresses the assumption
that $x$ and $y$ are unlikely to be simultaneously both `bad' in a simulation problem. Note, too, that this `unlikeliness' depends not only on $x$ and $y$ alone, but also on the set of problems $\mathcal{F}$ and their `weighting' $\mu$ as well. The proof of (2) is simply a transcription of the proof of our statement on pre-testing given in (4.1.2), just with the positive correlation assumed there replaced by a negative one; there is no need to repeat it here. Now what is the consequence of this if we observe that our simulations $F(x)$ and $F(y)$ produce approximately the same result? If
$$|F(x) - F(y)| \leq \varepsilon$$
then (1) implies that
$$\Xi_\varepsilon(F) = 1.$$
Hence, assuming the negative correlation of $\Delta_x$ and $\Xi_\varepsilon$ and observing $|F(x) - F(y)| \leq \varepsilon$, we are forced to be optimistic about the approximation error produced by $x$. In this sense, Knuth's recommendation is indeed the "most prudent policy for a person to follow".

(4.3.3) One might ask if the assumption of negative correlation can be replaced by a weaker one. In general, this is not possible. If we set $\varepsilon := |F(x) - F(y)|$ and run the proof of (2) backwards, we get that (2) implies the negative correlation of $\Delta_x$ and $\Xi_\varepsilon$: let
$$E(\Delta_x \mid \Xi_\varepsilon = 1) < E(\Delta_x).$$
If we assume that $P(\Xi_\varepsilon = 1)$ is defined and positive, this implies
$$\frac{E(\Delta_x \mid \Xi_\varepsilon = 1) P(\Xi_\varepsilon = 1)}{E(\Delta_x) P(\Xi_\varepsilon = 1)} < 1.$$
Using the same argument as in (4.1.2) yields
$$\frac{E(\Delta_x \Xi_\varepsilon)}{E(\Delta_x) E(\Xi_\varepsilon)} < 1,$$
which means that $\Delta_x$ and $\Xi_\varepsilon$ are negatively correlated. In this sense, the negative correlation of $\Delta_x$ and $\Xi_\varepsilon$ and the optimism expressed in (2) are equivalent.
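As a hypothetical illustration of Knuth's advice (the two toy generators and the integrand below are our arbitrary choices, not recommendations), a cross-check of a simple integration problem might look like this in C:

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    static double lcg_step(uint64_t *y, uint64_t a, uint64_t b, uint64_t m)
    {
        *y = (a * *y + b) % m;
        return (double)*y / (double)m;
    }

    static double f(double u) { return sqrt(1.0 - u * u); }  /* E(f) = pi/4 */

    int main(void)
    {
        const int N = 100000;
        uint64_t y1 = 1, y2 = 1;
        double fx = 0.0, fy = 0.0;
        for (int n = 0; n < N; n++) {
            fx += f(lcg_step(&y1, 69069u, 1u, (uint64_t)1 << 32));  /* source 1 */
            fy += f(lcg_step(&y2, 16807u, 0u, 2147483647u));        /* source 2 */
        }
        fx /= N; fy /= N;
        printf("F(x) = %.6f  F(y) = %.6f  |F(x)-F(y)| = %.2e\n",
               fx, fy, fabs(fx - fy));
        return 0;
    }

If the printed difference is small, the argument above licenses optimism about both results -- provided one is willing to assume the negative correlation of $\Delta_x$ and $\Xi_\varepsilon$ for these two sources.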
4.4 Employing deterministic error bounds

(4.4.1) One might question the applicability of deterministic error bounds when dealing with a probabilistic notion of goodness. Deterministic error bounds seem to render the application of deterministic notions of quality such as Criterion 2.3 more appropriate, and we have already remarked in (4.0.2) that Criterion 2.3 is in some sense much stronger than Criterion 2.4. So why use the strong bounds with the weak criterion? In this section we present two examples of applying Criterion 2.4 in conjunction with a deterministic error bound, from which we draw two interesting conclusions. The first example shows that whenever Criterion 2.3 is applicable, Criterion 2.4 is applicable as well, and both criteria suggest the same sequences as being `good'. In this sense, the weak, probabilistic Criterion 2.4 is consistent with the strong, deterministic Criterion 2.3. The second example gives a scenario where a deterministic error bound is available but no `good' sequence can be found with Criterion 2.3. In this case, Criterion 2.4 will turn out to be applicable. In both examples we will use the following result, which is introduced, stated formally, and proved later on, in Appendix A. Let $\mathbf{T}$ be a mapping from $\mathcal{F}$ into $\mathbb{R}^s$ and let $t$ be a point in $\mathbb{R}^s$. There exists a probability space $(\mathcal{F}, \mathcal{R}, \mu)$ with $E(\mathbf{T}) = t$ (taking expectations component-wise) if and only if $t$ lies in the convex hull of the set $\mathbf{T}(\mathcal{F})$. With this, the problem of proving the existence of a probability space $(\mathcal{F}, \mathcal{R}, \mu)$ with $E(\mathbf{T}) = t$ is equivalent to solving a relatively simple geometric problem. We know that using a result before proving it is not the
way to run a smooth argument; regrettably, the proof of the above is so completely off the track of thought we have followed so far that we cannot help but do just this.

(4.4.2) In both examples, we will use a deterministic error bound known as Koksma's inequality²⁶. This inequality concerns functions $F_f$ of the form
$$F_f(x) = \frac{1}{N} \sum_{n=0}^{N-1} f(x_n)$$
where $f$ is a real-valued function on the closed unit interval²⁷ with bounded variation²⁸. Recall that approximative solutions of the problem $F_f$ are approximations of the integral of $f$. If $f$ is integrable and $X = (X_n)_{n=0}^{N-1}$ is a sequence of $N$ stochastically independent random variables, each of which is equidistributed on $[0,1]$, then the expectation of $F_f(X)$ equals the expectation of $f(X_0)$, which in turn is $\int f(X_0)\,d\lambda$: by means of Fubini's theorem and the independence of the $X_n$, we have
$$E(F_f(X)) = \int_{[0,1]^N} F_f(X)\,d\lambda_N = \frac{1}{N} \sum_{n=0}^{N-1} \int_{[0,1]^N} f(X_n)\,d\lambda_N = \frac{1}{N} \sum_{n=0}^{N-1} \int_{[0,1]} f(X_n)\,d\lambda = \int_{[0,1]} f(X_0)\,d\lambda.$$

²⁶ Other deterministic error bounds are available, and we might have used them instead. A particularly interesting alternative is an error bound based on the diaphony, which was introduced by Zinterhof in [117]; see also Stegbuchner [107]. The advantage of this notion is that, in contrast to the discrepancy on which Koksma's inequality is based, the diaphony of $N$ points in $\mathbb{R}^s$ can actually be computed with reasonable effort for $s \geq 1$. Moreover, as pointed out on page 51, the diaphony has an interesting relation to the spectral test. The reason for choosing Koksma's inequality is twofold. First of all, Koksma's inequality is widely known, and -- mostly in the work of Niederreiter -- ratings for various types of random number generators with respect to this inequality are available. The second, personal reason is that the concept of Criterion 2.4 was conceived in the contemplation of Koksma's inequality.
²⁷ A generalization for the more interesting case of $f$ being a function of more than one variable is available. Since we intend to use Koksma's inequality primarily for illustrative purposes, the special case of $f$ depending on just one variable will suffice. Moreover, the generalization has all the properties we require from the one-dimensional case, so all we are going to show for functions $f$ of one variable applies also to functions $f$ of many variables.
²⁸ This is a technical requirement which we will describe in the following.
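In code, the problem $F_f$ and its approximation error are immediate; a C sketch with our toy choices $f(u) = u^2$, so that $E(F_f) = \int_0^1 u^2\,du = 1/3$, and an arbitrarily picked multiplicative LCG:

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void)
    {
        const uint64_t a = 16807, m = 2147483647u;  /* illustrative generator */
        uint64_t y = 1;
        const int N = 10000;
        double sum = 0.0;
        for (int n = 0; n < N; n++) {
            y = a * y % m;
            double u = (double)y / m;
            sum += u * u;                            /* f(u) = u^2 */
        }
        printf("F_f(x) = %.6f   Delta_x(F_f) = %.2e\n",
               sum / N, fabs(sum / N - 1.0 / 3.0));
        return 0;
    }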
For these problems $F_f$, Koksma's inequality (due to Koksma [60]; see also [87, Theorem 2.9]) provides an upper bound for the approximation error $\Delta_x(F_f) := |F_f(x) - E(F_f)|$. If $f$ has bounded variation $V(f)$ on $[0,1]$, then
$$\Delta_x(F_f) \leq V(f)\,D_N^*(x). \qquad (1)$$
We do not prove Koksma's inequality here. For a proof of it and its generalizations for functions of more than one variable, the Koksma-Hlawka inequality, the reader is referred to Niederreiter [87, Theorems 2.9 and 2.11]. Before we describe the quantities on the right side of Koksma's inequality in detail, observe its formal appearance. The error $\Delta_x(F_f)$ depends on both $x$ and $f$ simultaneously. With Koksma's inequality, the error is bounded by the product of two quantities $D_N^*(x)$ and $V(f)$, where the first depends only on $x$ and the second only on $f$. The contributions of $x$ and $f$ to the error bound are -- in this sense -- independent! The quantity $D_N^*(x)$, the star-discrepancy, is defined as the rating of $x$ with respect to Criterion 2.3 and the set
$$\mathcal{I} := \left\{ F_{1_{[0,t]}} : 0 \leq t \leq 1 \right\}.$$
Stated formally, this means
$$D_N^*(x) := \sup_{\mathcal{I}} \Delta_x = \sup_{0 \leq t \leq 1} \left| F_{1_{[0,t]}}(x) - E(F_{1_{[0,t]}}(X)) \right|.$$
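In dimension one, $D_N^*$ can actually be computed exactly via a standard closed formula over the sorted sample $x_{(1)} \leq \cdots \leq x_{(N)}$, namely $D_N^* = \max_i \max\left( i/N - x_{(i)},\, x_{(i)} - (i-1)/N \right)$ (see, e.g., Niederreiter [87]). A C sketch, run here on the full period of the toy LCG from Section 4.2 (for which all multiples $k/64$ occur, so it should print $1/64$):

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    double star_discrepancy(double *x, size_t N)  /* sorts x in place */
    {
        qsort(x, N, sizeof x[0], cmp);
        double d = 0.0;
        for (size_t i = 1; i <= N; i++) {
            double hi = (double)i / N - x[i - 1];
            double lo = x[i - 1] - (double)(i - 1) / N;
            if (hi > d) d = hi;
            if (lo > d) d = lo;
        }
        return d;
    }

    int main(void)
    {
        double x[64];
        unsigned y = 0;
        for (int n = 0; n < 64; n++) {   /* the LCG(64, 5, 21) of Figure 1 */
            y = (5 * y + 21) % 64;
            x[n] = y / 64.0;
        }
        printf("D_N^* = %f\n", star_discrepancy(x, 64));
        return 0;
    }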
One might ask how a sequence's rating with respect to the particular set $\mathcal{I}$ can be used to bound its rating with respect to arbitrary functions of bounded variation. A hint is provided by a closer examination of the functions $F_{1_{[0,t]}}$. For one, we have
$$F_{1_{[0,t]}}(x) = \frac{1}{N} \sum_{n=0}^{N-1} 1_{[0,t]}(x_n) = \frac{1}{N}\,\#\{ x_n : x_n \leq t,\ 0 \leq n < N \}$$
and for the other
E (F1 0t] (X)) =
Z
1 0t](X0)d = (0 t]) = t:
So F1 0t] (x) is the empirical distribution function of the numbers xn and E (F1 0t] (X)) is the equidistribution's distribution function, both evaluated at t. From this point of view, the star-discrepancy is the distance of these two distribution functions with respect to the supremum-norm. Introduced as such, the star-discrepancy is a special case of what is called KolmogorovSmirnov statistic in the literature (see Bury 9, Section 6.11 { 6.14] or Knuth 59, Section 3.3]). For functions of the form F1 0t] , the following inequality holds due to the denition of DN? (x): #x (F1 0t] ) DN? (x): This is generalized for more interesting functions than simple indicators by the concept of variation. The quantity V (f ), the variation of f on 0 1], is dened as
V (f ) := sup
(X n
n=1
)
jf (an ) ; f (an;1)j : n 1 0 a0 < a1 < : : : < an 1 :
Note that if f is non-decreasing or non-increasing, then V (f ) = jf (0) ; f (1)j. Hence, for F1 0t] 2 I , we have V (1 0t]) = t and supfV (1 0t]) : F1 0t] 2 Ig = 1. (4.4.3) With this we are ready to use Koksma's inequality with the deterministic Criterion 2.3. For a xed real number c 0, let Fc be the set of all problems Ff for which the variation of f is at most c, i.e.
Fc := fFf : V (f ) cg : For this set, the following equality29 holds: sup #x = cDN? (x):
(2)
Fc 29 The idea for this is from Niederreiter's 87, Theorem 2.12]. A consequence of his much stronger result is that, instead of all functions Ff with V (f ) c, only those might be used for which f is continuously dierentiable innitely often.
64
In the sense of Criterion 2.3, the sequence x is considered `good' with respect to the set Fc if cDN? (x) is small. Just as with Koksma's inequality the contribution of x to the error bound was found to be independent of the contribution of the function f , we nd here that the contribution of x to supFc #x is independent of Fc . This is particularly useful since it causes us to regard x as `good' if DN? (x) is small, regardless of whatever value c might have. Proof of (2): Avoiding trivialities30 , let c > 0 be xed. Note that #x (Fcf ) and V (cf ) both are { by denition { linear in c: #x (Fcf ) = c#x (Ff ) and
V (cf ) = cV (f ):
Instead of considering the whole set Fc , we will nd that a quite small subset Ic is representative with respect to the supremum of #x . Let
n o Ic := Fc1 0 ] : 0 t 1 : Note that I1 = I . Since Fc is a superset of Ic , we have of course sup #x sup #x : t
Fc
Ic
On the other hand, there is31 sup #x sup fDN? (x)V (f ) : Ff 2 Fc g Fc
= = = =
cDN? (x) c sup #x I sup c#x I sup #x : Ic
These two inequalities imply (2). 2 30 For c = 0, Fc contains only those Ff for which V (f ) = 0, i.e. for which f is constant. For these Ff , the error x (Ff ) is always zero and (2) does always hold, independent of x. 31 In this series of (in-)equalities, we make use of Koksma's inequality, the denitions of Fc and DN? (x), the linearity of x (Fcf ) in c, and the denition of Ic . 65
Now suppose we search for a sequence x which should be `good' with respect to Fc by applying Criterion 2.4 instead of Criterion 2.3. Will we prefer sequences with small star-discrepancy? Will the application of the weak criterion suggest the same sequences as the strong one? There is no answer to this question in general. We might consider a statistical test T as highly relevant, which a sequence x passes, but which a sequence y with smaller star-discrepancy utterly fails. As described in Section 4.1, applying Criterion 2.4 in this case indicates that x should be preferred although its star-discrepancy is larger. But doing so, we would have used information { the relevance of the test T and the behavior of x and y in this test { which Criterion 2.3 per denition does not take into account. Let us try to apply Criterion 2.4 using no more information than we have needed to apply Criterion 2.3 in the rst place. Let (Fc R ) be a probability space such that #x is a real-valued random quantity. Suppose all we know about this probability space is the value32 of c in particular, suppose R and are unknown to us. What can be derived about the expected approximation error E (#x )? We can of course use the deterministic error bound (2): since #x supFc #x with certainty, we have E(#x ) cDN? (x): (3) But how accurate is this upper bound? The measure is not known to us, and the expectation of #x with respect to this unknown measure can be quite far away from cDN? (x). The point is this: If only c is known but is not, then the upper bound in (3) is best possible. By `best possible', we understand that there are so many probability spaces consistent with the provided information (the value of c), i.e. there are so many -algebrae R on Fc and so many measures on these R that all, that can be said about E (#x ) is that it is at most cDN? (x). If nothing else is known, E (#x ) can be arbitrarily close to cDN? (x). Lacking additional information, we regard x as `good' with respect to Criterion 2.4 if cDN? (x) is small. In particular, we regard a sequence with small star-discrepancy as `good', regardless of the actual value of c. Proof that (3) is best possible: Avoiding trivialities, let c > 0 be xed and known to us. Let R be a consistent -algebra on Fc , i.e. one 32
This information is required by Criterion 2.3, too.
66
for which #x is measurable. Finally, let M be the set of all probability measures on (Fc R) (note that M is the set of all probability measures consistent with the available information and with R). For the set
E := fE(#x ) : 2 Mg we show
sup E = cDN? (x) which means that (3) is best possible. Proposition A.1 states that E is equal to the convex hull of the possible values #x : E = conv #x (Fc): The one-dimensional convex set conv #x (Fc ) is an interval33 and, due to (2), its upper endpoint is cDN? (x). 2 (4.4.4) Since Criterion 2.3 worked well for the set Fc and any nite
c 0, one might expect it to be applicable to the set F1 := fFf : V (f ) < 1g
too. F1 is the union of all the sets Fc taken over all values c 0 and therefore it is a superset of each specic Fc this and (2) yield
8c 0 : cDN? (x) sup #x: F1
Hence supF1 #x = 1, independent of the star-discrepancy of x. For this set F1, no `good' sequence x can be found with Criterion 2.3 since supF1 #x equals 1 for any sequence x. This is not the expected result. For any nite c 0 and the corresponding set Fc , Criterion 2.3 caused us to prefer a sequence with small star-discrepancy. Intuition suggests that for the union F1 of these sets,
sequences with small star-discrepancy should be preferable, too. Of course, if V (f ) is nite but arbitrarily large, (2) implies that #x (Ff ) can also be arbitrarily large. The intuitive preference for sequences with small star-discrepancy seems to be based on the assumption that although V (f ) can be arbitrarily large, it is not expected to be. Let us express this assumption in the terminology of Criterion 2.4. 33
See page 106 for a proof of this.
67
Let (F1 R ) be a probability space such that #x and the mapping V : Ff 7! V (f ) are real-valued random variables. Suppose we know that E(V ) c, but R and are not known to us. Due to Koksma's inequality and E(V ) c, we have
E(#x ) cDN? (x):
(4)
In analogy to the previous example, the following holds. In this state of information, the upper bound in (4) is best possible. Hence, if no more information is available, then (4) is all that can be derived about E (#x ). We regard a sequence x as `good' with respect to Criterion 2.4 and the given information if cDN? (x) is small since the `quality' of x does not depend on the actual value of c, we regard a sequence as `good' with respect to the available information if DN? (x) is small, regardless of c. Proof that (4) is best possible: Let R be any -algebra on F1 for which #x and V are measurable. As a auxiliary tool, we dene the mapping ; : F1 ;! R2 Ff 7;! (#x(Ff ) V (f )) which is measurable since both #x and V are measurable. What we have to show is the existence of probability measures on (F1 R) which are consistent with the available information (i.e. E (V ) c) and for which E (#x ) gets arbitrarily close to cDN? (x). Due to Proposition A.1, this is equivalent to showing that there are points (r s) in the convex hull of ;(F1 ) with s c for which r gets arbitrarily close to cDN? (x). The set Fc is contained in F1 and so are the corresponding convex hulls. In (4.4.3) we have shown that sup fr : r 2 convf#x (Fc)gg = cDN? (x): Since
f(r s) 2 conv ;(F1) : s cg = conv ;(Fc) 68
we have sup fr : (r s) 2 conv ;(F1 ) s cg = sup fr : (r s) 2 conv ;(Fc )g = sup fr : r 2 conv #x (Fc )g = cDN? (x):
69
2
Chapter 5
Some generators He deals the cards to nd the answer the sacred geometry of chance the hidden law of a probable outcome. The numbers lead a dance. { Sting, Shape of My Heart
5.1 Preliminaries (5.1.1) Apart from designing statistical tests, the best assistance mathematics can provide to a user searching for a `good' random sequence is the analysis of random number generators. For these deterministic algorithms for generating random numbers, the presence or absence of certain defects can be proven. Understanding the nature of these defects and assessing their relevance for his simulation problem, the user can judge whether or not a given generator can be considered adequate (`good'). In the following, we give a precise denition of the generators presented in Section 4.2, and state and prove the existence or absence of the already mentioned defects. The reader who is not interested in the mathematical details involved and who is content to believe what we have said about the generators so far may skip this chapter. (5.1.2) The generators we are about to consider and in fact virtually all generators in use today are congruential generators. They pro70
;1 of integers which are spread evenly in some set duce a sequence (un )Nn=0 f0 1 : : : M ; 1g and transform these integers to a sequence (xn)Nn=0;1 of random numbers in 0 1 by setting xn := un =M . The reason for this is twofold. Performing all calculations in integer arithmetic except for the nal scaling assures that no propagating round-o errors are introduced in the actual generation of the random numbers on a computer. In this way, a random number generator will produce the same sequence regardless of the oating point representation used by the computer it is run on1 . The other reason for rst computing an integer sequence is the mathematical structure of the set f0 : : : M ; 1g, which we will employ in the following and which is briey sketched below a more complete discussion of these algebraic and number theoretic topics is given by Lidl and Niederreiter in 75].
(5.1.3) For a positive integer M , let ZM := f0 1 : : : M ; 1g be the system of all residues modulo M . Furthermore, let Z?M be the set of all numbers from ZM which are coprime to M . To obtain some arithmetic structure on ZM , we use the modulo operation a mod M , which yields the residue of the integer division of a by M . The addition of two elements a and b of ZM is dened as (a + b) mod M and their multiplication as (a b) mod M: We will omit the trailing mod and write a + b, a b, or simply ab when it is clear that we are operating in ZM . With these conventions, we get the following: (ZM +) and (Z?M ) both are abelian groups. Moreover, if M = p is prime, then (Zp + ) is a nite eld2 . This holds provided the integer computations are correctly implemented, which is not always the case (see 1, Section 3.1.2] or 100]). Fortunately, it seems to have become unpopular nowadays to believe that a correct implementation of a random number generator is as good as an incorrect one since its output is supposed to be random anyway: : : 2 For a 2 Z?p , its inverse a;1 is the uniquely determined number such that a a;1 = 1 in Zp . The inverse can be obtained by the Euclidean algorithm: since a and p are coprime, there are integers k and l such that ka + lp = 1 thus a;1 = k mod p. Another way to get a;1 is Fermat's theorem, which states that ap;1 = 1 in Zp therefore a;1 = ap;2 mod p. 1
71
Except for eld-isomorphisms, there is a unique nite eld with p elements called Fp Zp can therefore be identied with the general eld Fp . The integral domain of all polynomials in x over Fp is denoted by Fp x]. Working with polynomials from Fp x] is very similar to working with polynomials over R. The fundamental theorem of algebra3 holds in Fp x], and two polynomials from Fp x] are equal if and only if their coecients coincide. Since (Zp + ) is a eld, there is { for s 1 { an s-dimensional vector space over Zp which we denote by Zsp or Fps .
5.2 LCG Denition 5.1 Let a b u0 2 ZM . The linear congruential generator, LCG, for short, with parameters M , a, b, and u0 denes a sequence (un )n0 in ZM by un := a un;1 + b (n > 0) and a sequence (xn )n0 of random numbers in 0 1 by xn := uMn
(n 0):
(5.2.1) The sequence (un )n0 of an LCG is dened by a recursion of order one on the nite set ZM . Hence it will eventually repeat itself and so will the corresponding sequence of random numbers. Since this is usually considered a major defect, we adopt the convention of using only as many of ;1 and (xn )N ;1 with the xn as do not enter a cycle formally, we use (un )Nn=0 n=0 N dened as N := minfj > 0 : 9i < j : ui = uj g: Moreover, we will consider only LCGs whose sequences are purely periodic (i.e. x0 = xN ) in this case N is called the period of the corresponding generator. ;1 by lcg(M a b u0) and the sequence We denote the sequence (un )Nn=0 N ; 1 (xn )n=0 by LCG(M a b u0). In the following, we will consider the s-tuples (xn : : : xn+s;1 ) formed ;1 and the analogous sequence of s-tuples formed from the sequence (xn )Nn=0 3 If h 2 Fpx] is a non-zero polynomial of degree k, then h has at most k roots in Fp. 72
;1. To avoid technical diculties, we always implicitly assume from (un )Nn=0 that the indices are reduced modulo N . This is, say,
(xN ;1 xN : : : xN +s;2 ) = (xN ;1 x0 : : : xs;2 ): In this way, we do not have to bother with a wrap-around in the s-tuples when n approaches N . (5.2.2) Since we are operating on the M -element set ZM , the period of an LCG is at most M . Necessary and sucient conditions on the parameters a, b, and u0 to achieve the maximal possible period length are given in Knuth 59, Section 3.2., Theorem A, B, and C] or Ripley 93, Theorem 2.1, 2.2, and 2.3]. For our present purpose, we restrict ourselves to the following three standard types4 of maximal possible period LCGs which are either mathematically interesting or computationally ecient. 1. M a power of 2, a = 5 mod 8, b odd. In this case, we get a period of N = M . 2. M = p prime, a a primitive root5 modulo p, b = 0, u0 6= 0. In this case, we get a period of N = p ; 1. 3. M 4 a power of 2, a = 5 mod 8, b = 0, u0 odd. In this case, we get a period of N = M=4. Note that for each type, the set fun : 0 n < N g of the integers produced by an LCG is suspiciously regularly distributed: it is equal to ZM for Type6 1 and equal to ZM n f0g for Type7 2 for Type 3, it is equal to all numbers of the form 2k + 1 in ZM , where k 0 is either an even or an odd integer 4 These standard types of LCGs are obtained from Ripley 92] or from Niederreiter 87, p.169]. 5 a 2 Zp is a primitive root modulo p if the set of powers fa1 a2 : : : ap;1 g is equal to ? Zp ,6 or, in other words, if a generates Z?p . The recursion operates in the M -element set ZM where it achieves the maximal possible period of M since it produces exactly M dierent values, each value in ZM is produced exactly once. 7 In this case, the recursion operates in the (M ; 1)-element set ZM n f0g and has a period of M ; 1.
73
depending on whether the number l, for which u0 = 2l + 1, is even or odd8 . The regular pattern of the un in ZM is transposed to a regular pattern of the xn in 0 1. (5.2.3) To prove the existence of a lattice structure in the LCG's output, let s 2 be a xed integer and let
P := fxn = (xn xn+1 : : : xn+s;1 ) : 0 n < N g be the set of s-dimensional points produced by the LCG. To describe the lattice-like structure of P , we need the following formal denition of an sdimensional lattice, which is given according to Gruber and Lekkerkerker 46, Section 1.3, Denition 1].
Denition 5.2 Let a0 : : : as;1 2 Rs be linearly independent. The set (sX ) ;1 ) := kiai : ki 2 Z i=0
is called the s-dimensional lattice with basis fa0 : : : as;1 g.
A lattice shifted by a vector x is denoted by x + ). The lattice structure of the set P produced by an LCG is described by the following proposition which, as its proof, is given according to Ripley 92] (see also 87, Theorem 7.6]).
Proposition 5.1 For an LCG of Type 1 and 3, there exists a lattice ) such
that
P = 0 1s\(x0 + )):
For an LCG of Type 2, there exists a lattice ) such that
P = ]0 1s\): A simple inductive argument shows that, if u0 = 2(2 k) + 1, the LCG's recursion operates in the (M=4)-element set f4 i + 1 : 0 i < M=4g where it achieves a period of M=4. Conversely, if u0 = 2(2 k + 1) + 1 = 4 k + 3, then the recursion operates in the (M=4)-element set f4 i + 3 : 0 i < M=4g where it again achieves a period of M=4. 8
74
Proof of Proposition 5.1: We prove Proposition 5.1 only for LCGs of Type 1 the proof for the remaining types uses the same basic idea and diers only in some details. The idea is based on the following trick to represent the j -th component of the integer vector un = (un : : : un+s;1 ). For any lcg(M a b u0), we have un+j = uj + aj (un ; u0)
(0 j < s)
in ZM as is easily proved by induction9 . ;1 Let (un )M n=0 = lcg(M a b u0) be of Type 1. Since the period of this generator is M , we may assume10 u0 = 0. For a xed index n, the above representation of un+j yields
un+j = uj + aj un
(0 j < s)
in ZM . Since addition and multiplication in ZM are dened via reduction modulo M , there are uniquely determined integers kj for which we have
un+j = uj + aj un + kj M
(0 j < s)
in R. For the corresponding xn , this means
aj u + k xn+j = xj + M n j
(0 j < s):
(1)
To extend this equality to the vectors xn , we dene the s linearly independent vectors b := 1 (1 a a2 : : : as;1) 0
M
bj := ej the j -vector of the standard ordered basis of Rs (1 j < s): 9
Let the index n be xed. The equality is trivial for j = 0. Supposing it holds for some
j 0, we get
un+j+1 = aun+j + b = a(uj + aj (un ; u0 )) + b = auj+1 + aj+1 (un ; u0 )
in ZM . ;1 M ;1 10 The sequences lcg(Ma b 0) = (un )M n=0 and lcg(M a b u0 ) = (vn )n=0 of Type 1 dier only by a cyclic permutation in particular, both sequences yield the same set P .
75
Note that we use 0-based indices. Hence the standard ordered basis of Rs is fe0 : : : es;1 g, All the components of ei are 0 except for the (i + 1)-st, which is 1. With this, we can write (1) in vector form:
xn = x0 + un b0 +
sX ;1 j =1
kj bj :
Hence every point xn lies on the shifted lattice x0 + ), where ) is spanned by the lattice basis fb0 : : : bs;1 g11. In other words,
P 0 1s\(x0 + )): Conversely, if we take an arbitrary point p from 0 1s\(x0 + )), it has the form sX ;1 p = x0 + l0b0 + lj bj (lj 2 Z): j =1
Since p lies in 0 1s, its rst coordinate is between 0 and 1: 0 x0 +l0 M1 < 1. Recalling that we assumed x0 = 0 and multiplying this inequality by M , we get 0 l0 < M i.e. l0 2 ZM . Since the integers un run through the whole set ZM , there is l0 = un for some n. Thus
p = x0 + un b0 +
sX ;1 j =1
lj bj :
This representation of p is quite similar to the above representation of xn it remains to show that li = kj for 1 j < s. As we have noted before, the integers kj are uniquely determined by the reduction modulo M , i.e. by demanding that un+j 2 ZM or, equivalently, that xn+j 2 0 1. In other words: the integers k1 : : : ks;1 are the only ones for which sX ;1 x0 + un b0 + kj bj 2 0 1s: j =1
11 This lattice basis is the same for LCGs of Type 1 and 2. The lattice basis for a Type 3 LCG is similar, just with M replaced by M=4 in the denition of b0 .
76
Thus li = ki for 1 i < s, p = xn , and 0 1s \ (x0 + )) P : With this, the two sets are equal.
2
(5.2.4) Although we have proven the existence of a (shifted) lattice formed by an LCG's s-dimensional points with Proposition 5.1 and although we obtained a basis for the lattice in the proof, this particular basis does not seem to be well suited for assessing the lattice's quality. Considering the lattices formed by the LCGs in Figure 3 on page 49 and looking at the corresponding basis vectors as obtained from the proof, we have to admit that these lattice bases are not those we would have chosen by intuition. For the generator in Figure 3a, the lattice basis found in the proof is 1 1021 ) b0 = ( 1024 1024 b1 = (0 1) whereas the corresponding lattice basis for the generator in Figure 3b is 1 997 ) b0 = ( 1024 1024 b1 = (0 1): In the contrary to the resulting lattices, these bases do not seem to be much dierent. The basis suggested by the visual perception of a 2- or 3-dimensional lattice is one whose vectors have the shortest possible length. This feature is provided by a Minkowski-reduced lattice basis (MRLB, for short) which is dened by the following { non-constructive { procedure: given an s-dimensional lattice ), a MRLB fm0 : : : ms;1g is obtained by rst choosing a basis vector m0 as short as possible and, if m0 : : : mk;1 (0 < k < s) have been chosen, choosing a basis vector mk as short as possible. Note that as a consequence of this procedure, the vectors of a MRLB are ordered according to their length: jjm0jj jjm1jj : : : jjms;1 jj. Although an MRLB of a lattice does always exist, it is, in general, not unique. More precisely, for s 7, one can nd a lattice ) and two MRLBs fm0 : : : ms;1g and fn0 : : : ns;1g of ) with jjmkjj 6= jjnkjj for some k. For s < 7, however, this is not possible. Since Minkowski-reduced lattice bases are used only as an auxiliary notion in the next paragraph of this section, we will dene and discuss it and the 77
above claims more formally later, in Section B.2 in the appendix. Algorithms to compute a MRLB for a given LCG and dimensions s = 2, 3 s 6, and s 7, respectively, are given by A*erbach in 1, Chapter 3, Algorithms MB.2, MB.3-6, and MB.n]12. (5.2.5) The notion of a Minkowski-reduced lattice basis gives us a precise formulation of the intuitive idea of choosing a basis of successively shortest possible vectors. With this, we can dene the gures of merit for assessing the quality of a lattice, which we informally introduced in (4.2.4). The Beyer-quotient qs was pictured there as `the relation qs of the shortest to the longest edge of the unit cell of the lattice'. Since the vectors of a MRLB span what is intuitively called a unit cell, we can { at least for s 6 { dene according to Beyer 5]: Let ) be the s-dimensional lattice of an LCG as in Proposition 5.1. For a MRLB fm0 : : : ms;1 g of ), the quantity
qs := jjjjmm0jjjj s;1
is called the corresponding Beyer-quotient. Because the lengths of the vectors of a MRLB are uniquely determined only for s 6, the same may apply to the Beyer-quotient see Section B.2 for a discussion of this. It is clear that 0 < qs 1. Values of qs close to 1 are considered `good' since the corresponding unit cell is more cube-like. The second gure of merit presented in (4.2.4), the spectral test, was introduced as `the maximal distance 1= s of parallel hyperplanes which cover ;1 and s N , the LCG's lattice in Rs '. For an LCG(M a b u0) = (xn )Nn=0 13 we have the following : Although A$erbach's last algorithm, MB.n, works for arbitrary dimensions, it should be used only with greatest care and suspicion because it possibly produces wrong results: given any basis of an s-dimensional lattice % as input, the algorithm transforms it into supposedly equivalent bases until it ends up with a MRLB. Two sets of s linearly independent vectors span the same lattice % if and only if they are transformed to each other by a unimodular matrix (i.e. an integral matrix with determinant 1 see 46, Section 1.3, Theorem 7]). Due to an oral communication with Niederreiter 88], non-unimodular matrices can occur in the progress of algorithm MB.n, and so its result can be a basis spanning a lattice %0 dierent from %. 13 For s > N , the N s ; 1 points produced by the LCG are all contained in just one hyperplane in Rs and should therefore not be used anyway. 12
78
Let ) be the s-dimensional lattice of an LCG as in Proposition 5.1. Then ) denes the so called dual lattice )?14 . The value of the spectral test is 1 = 1
s
jjmjj
where m is the shortest nonzero vector in )?. Thus computing the spectral test amounts to nding the shortest vector in )?, which can be found as described in Dieter 25], Knuth 59, Section 3.3, Algorithm S], or Ripley 93, p.37] (a FORTRAN code for dimensions s 8 is given in 93, Section B.3]). We postpone the proof of the above statement to Section B.3 in the appendix. The reason is that the complete discussion of the spectral test is lengthy and not as much concerned with LCGs than with lattices in general. We avoid the detour of handling the spectral test here. (5.2.6) Having proven the existence of a lattice structure in the output of an LCG, it is fairly easy to describe the specic long-range correlations, the critical distances, in its sequence. The following15 is a consequence of understanding what Eichenauer-Hermann and Grothe showed in 33].
Proposition 5.2 For an LCG (xn)Nn=0;1 of Type 1 to 3, the points (xn xn+s ) (0 n < N ) have a lattice structure in R2 . ;1 Let (xn )M n=0 = LCG(M a b 0) be of Type 1 (We prove this proposition only for this type. The proof for the remaining types is analogous). Since the LCG is purely periodic with period M and the proposition is trivial for s = 0, we can assume 0 < s < M . Let xn = (xn : : : xn+s ) be the (s + 1)-dimensional points obtained from the
Proof:
14 % does not only dene %?. Given a lattice basis of %, a lattice basis of %? can be computed see (B.1.3) 15 There is a slight generalization of this, whose statement and proof unfortunately require too many technical details from lattice theory to be stated here. The interested reader may consult Section B.4 or, most likely, the whole Appendix B.
79
LCG and let P = fxn : 0 n < M g be the set of all these points. From Proposition 5.1, we know that
P = 0 1s+1\(x0 + )) where the lattice ) is spanned by the vectors b := 1 (1 a a2 : : : as) 0
M
bj := ej
(0 < j s):
To obtain the tuples (xn xn+s ), we use the 2 (s + 1)-matrix
T :=
1 0 ::: 0 0 0 0 ::: 0 1
!
which denes a function from Rs+1 to R2 , mapping x 2 Rs+1 to T x. With this, we have T P = f(xn xn+s ) : 0 n < M g : The proof is complete if we can show that is a lattice in R2
)0 := T ) and that
;
T P = 0 12\ T x0 + )0 :
(2) (3)
To prove (2), observe the following. ) is the set of all integer linear combinations of the bi (0 i s). T is linear, so )0 = T ) is the set of all integer linear combinations of the T bi . Moreover, for 0 < i < s, we have T bi = (0 0). Hence )0 is the set of all integer linear combinations of T b0 = 1=M (1 as) and T bs = (0 1). Because these two vectors are linearly independent, )0 is a lattice in R2 . To prove (3), we show that the sets on each side of the equation are included in each other. For one, we have
T P = T 0 1s+1\(x0 + )) T 0 1s+1\T (x0 + )) = 0 12\(T x0 + )0): 80
The second inclusion needs some extra consideration. First note that Zs+1 is a subset16 of ). Any point z 2 Rs+1 can be written as the sum of two other points z = bzc + fzg where the coordinates of bzc are all integers and the coordinates of fzg are all between 0 inclusive and 1 exclusive. From the denition of a lattice, we see that for any two points p q 2 ), their sum p + q is as well contained in ). For any lattice point p, we have bx0 + pc 2 Zs+1 and this yields
fx0 + pg = x0 + (p ; bx0 + pc) 2 x0 + ): Now we are ready for the second inclusion of (3). For
T x0 + x0 2 0 12\(T x0 + )0) there is a lattice point p 2 ) with
T (x0 + p) = T x0 + x0: Choosing x such that
x0 + x := fx0 + pg implies x0 + x 2 0 1s+1\(x0 + )). Since T x0 + x0 2 0 12, the rst and the last coordinate of x0 + p are in 0 1. This and the denition of x0 + x = fx0 + pg yield T (x0 + x) = T x0 + x0
which completes the proof. 2 16 The lattice Zs+1 is spanned by the standard ordered basis fe0 e1 : : : es g of Rs+1 , so it is contained in % if the ei are. For i = 1 : : : s, the ei are of course in % because they are part of a lattice basis. For i = 0, we have
e0 = M b0 ; which is thus contained in %.
81
X s
i=1
ai bi
5.3 ICG (5.3.1) The inversive congruential generator (ICG) is a recursive congruential generator using a recursion of order one, just like the LCG. But unlike linear generators, the ICG's recursion is based on a highly nonlinear function. For a (large) prime p and a 2 Zp , let17
a :=
(
a;1 : a 2 Z?p 0 : a=0
in Zp .
Denition 5.3 Let p be a (large) prime and a b u0 2 Zp. The inversive congruential generator, ICG, for short, with parameters p, a, b, and u0 denes a sequence (un )n0 in Zp by un := a un;1 + b
(n > 0)
and a sequence (xn )n0 of random numbers in 0 1 by
xn := upn
(n 0):
(5.3.2) An ICG is of course periodic for the same reason an as LCG. The same considerations as in (5.2.1) apply to the ICG. In particular, we only consider ICG which are purely periodic with some period length N , ;1. And and for these ICG, we agree to use only the rst N numbers (xn )Nn=0 as before, it seems desirable to choose the ICG's parameters as to maximize its period length. We denote the sequence (un )pn;=01 by icg(p a b u0) and the sequence (xn )pn;=01 by ICG(p a b u0). A necessary and sucient condition under which ICG(p a b u0) has the maximal period length N = p is given by Flahive and Niederreiter in 40]. This necessary and sucient condition, and even a just sucient one given in 31] or 87, Theorem 8.4], are such that their statement and their proof require some background in the theory of nite elds that we do not want 17 Note that we are not computing in R but in the nite eld Zp now. As pointed out in (5.1.3), for any a 2 Z?p , there is a unique element a;1 2 Z?p such that aa;1 = 1 in Zp . 82
to introduce here. In the following, whenever we refer to a generator as an ICG, we simply assume it has maximal period p. Moreover, since we only present results for the totality fu0 : : : up;1g = Zp of an ICG's output, we additionally assume u0 = 0. Thus, when talking about an ICG, we always assume it has the form ICG(p a b 0) = (xn )pn;=01 : A description of algorithms to compute the parameters for maximal period ICGs is given by Hellekalek in 51]. For p 231 ; 1 being a Mersenne prime, tables of parameters for maximal period ICGs are presented by Hellekalek, Mayer, and Weingartner in 52]. (5.3.3) To study the structure of the s-dimensional points xn = (xn : : : xn+s;1 ) of an ICG, we switch to the corresponding integer s-tuples
un = (un : : : un+s;1 ) which we view as points not in Rs but in the nite vector space Zsp . We
will nd this to be technically convenient. As in (5.2.1), we assume that the indices of the un and xn are reduced modulo the generator's period p. Switching from xn to un does not really matter since they dier only by a constant scaling factor p any structure of the xn is inherited by the un , and vice versa. The change from Rs to Zsp however, from an innite vector space to a nite one with dierent arithmetical structure, needs some explanation. Zsp is contained in Rs because we have
Zsp = 0 ps\Zs: A hyperplane Hnc in Rs , dened by a nonzero vector n 2 Rs and a constant c 2 R, is the set of those points x 2 Rs for which18 < n x > = c. A hyperplane in, say, Z2p is a discrete set of integer points. If we picture these as points in R2 , we nd that they all lie on one straight line which is folded when it leaves the square 0 p2. More generally, a hyperplane in Zsp 18 For two points x = (x0 : : : xs;1) and y = (y0 : : : ys;1) in Rs , we denote their inner product by : s;1 X := xi yi : i=0
83
corresponds19 to a nite number of parallel hyperplanes in Rs . Conversely, any hyperplane in Rs which contains at least s anely independent points from 0 ps\Zs corresponds20 to a single hyperplane in Zsp . We are about to show that the s-dimensional points of an ICG do literally avoid the hyperplanes in Zsp . Due to the above correspondence of hyperplanes in Zsp an Rs , hyperplanes in Rs are also avoided. In this sense, switching from Rs to Zsp is meaningful. The technical convenience gained by changing to Zsp will become apparent in the forthcoming proofs. (5.3.4) To prove that the un avoid the hyperplanes in Zsp , we use a technique due to Niederreiter 87, Theorem 8.6]. We show that for most of these points, the condition
un is contained in a given hyperplane can be translated to
n is root of a corresponding nonzero polynomial of degree s over
Zp.
Since such a polynomial has at most s roots in Zp, it will follow that at most s of these points are contained in any given hyperplane. The primary device to translate between the above conditions for a full period icg(p a b 0) = (un )pn;=01 is the mapping
P : Zp ;! Zp x 7;! ax + b
For n 2 Zsp n f0g and c 2 Zp , the hyperplane fx 2 Zsp : = c (in Zp )g in Zsp corresponds to those hyperplanes Hnc+kp (k 2 Z) in Rs which intersect 0 ps \Zs. 19
Let Hnc be a hyperplane in Rs containing s anely independent points p0 : : : ps;1 from 0ps \Zsp . The pi are contained in the vector space Zsp , and it is easy to see that they are anely independent in Zsp . Hence there is a uniquely determined hyperplane Hn c in Zsp which contains all the pi. As we have seen above, this hyperplane in Zsp corresponds to some of the hyperplanes Hn c +kp (k 2 Z) in Rs. Since we assumed that the pi are covered by one hyperplane Hnc in Rs , we can conclude that for one integer k0 , we have Hnc = Hn c+k p . 20
0
0
0
0
0
0
84
and its iterates
P j (x) :=
(
x : j = 0 P (P j;1 (x)) : j > 0:
For these, we have21
Lemma 5.1 Let (un)pn;=01 = icg(p a b 0) be a full period ICG. If 1 j < p
and
in Zp , then22
x 6= ;aui
(0 i < j )
P j (x) = uj xx++auauj : j ;1
Proof: We use induction on j . Throughout the proof, we always compute in Zp . Let j = 1 and x 6= au0 (which means x 6= 0 since we chose u0 = 0). With this, the denition of P and the basic recurrence of the ICG yield P 1 (x) = b + ax = bx x+ a = b x +x ab au1 : = u1 xx + + au0 Now suppose the induction hypothesis holds for any integer between 1 and j inclusive with 1 < j + 1 < p. Let x 2 Zp such that
x 6= ;aui
(0 i < j + 1):
Before we can apply the induction hypothesis to P j +1 (x) = P j (P (x)), we have to verify its applicability for P (x). By the choice of x, we have
ax + b 6= ;ui + b 21 22
(0 i < j + 1):
See 87, Equation (8.6)]. Instead of ab;1 , we also write ab .
85
The left side of this equation is equal to P (x). The denition of ui = aui;1 +b yields that the right side is ;ui + b = ;aui;1 . This means (0 i < j )
P (x) 6= ;aui
and the induction hypothesis is applicable for P (x):
P j +1 (x) = P j (P (x)) = uj axax++b b++auauj j ;1 ax + u j +1 = uj ax + u j a + u j +1 = uj a + u xx j a + u x j +1 = auj + x = uj +1 xx++auauj +1 : j
Hence the induction hypothesis holds for j + 1, too.
2
(5.3.5) With Lemma 5.1, it is fairly easy to prove that ICGs avoid the planes, as was originally shown by Eichenauer-Hermann in 32] (the proof given here stems from Niederreiter 87, p.184]).
Proposition 5.3 Let s 2 and let ICG(p a b u0) be a full period ICG. Then any hyperplane in Zsp contains at most s of the points un = (un : : : un+s;1) (0 n < p) for which the rst s ; 1 coordinates are nonzero. The statement is trivial for s p even without the condition of nonzero coordinates, because the ICG can produce only p s distinct points. Otherwise, s ; 1 of those u0 : : : up;1 are excluded from the proposition for which xn = un =p lies on one of the `left-sided' faces of 0 1s. This is not much of a restriction if s is small. If s is large and, in particular, if s is close to p, all except just p ; s of the xn lie on the `left-sided' faces of 0 1s in this case, you probably would not want to use the given ICG anyway and prefer a generator with longer period instead. 86
Proof: Throughout the proof, we always compute in Zp. The proposition trivially holds for s p as noted above therefore let s < p. Without loss of generality, we assume u0 = 0. The ICG(p a b 0) has the full period p, so the sequence (un)pn;=01 visits every element of Zp exactly once. Hence f(un un+1 : : : un+s;1 ) : 0 n < pg
= (P 0 (x) P 1(x) : : : P s;1 (x)) : x 2 Zp : Let us see how the condition `the rst s ; 1 coordinates of un are nonzero' can be formulated for the second set above. A point in the second set has a nonzero rst coordinate if P 0 (x) = x 6= 0 by the choice of u0 = 0, this means x 6= ;au0. If we have x 6= ;au0 , then Lemma 5.1 yields that P 1 (x) 6= 0 if x 6= ;au1 . If we have x 6= ;au0 and x 6= ;au1 , then Lemma 5.1 yields that P 2 (x) 6= 0 if x 6= ;au2 . Proceeding this way, we get that a point (P 0 (x) : : : P s;1 (x)) in the second set has its rst s ; 1 coordinates nonzero if and only if x 6= ;aui for 0 i < s ; 1. We have to show that the set of points P := f(un : : : un+s;1 ) : 0 n < p un : : : un+s;2 6= 0g intersects any given hyperplane in Zsp in at most s points. With the above considerations, we have o n P = (P 0(x) : : : P s;1(x)) : x 2 Zp n f;au0 : : : ;aus;2g : Now let b = (b0 : : : bs;1) 2 Zsp n f0g and c 2 Zp . A point (P 0 (x) : : : P s;1 (x)) 2 P lies on the hyperplane dened by b and c if and only if sX ;1 bi P i (x) ; c = 0: i=0
Lemma 5.1 is applicable to the coordinates of any point in P . With this, the above hyperplane-equation translates to sX ;1 b0x + bi ui xx++auaui ; c = 0: i;1 i=1 Clearing denominators, this is equivalent to h(x) = 0, where h is a polynomial over Zp:
h(x) = (b0x ; c)
sY ;1 i=1
(x + aui;1 ) +
sX ;1 i=1
87
biui(x + aui)
sY ;1
j=1 j 6= i
(x + auj ;1 ):
We will show that h has at most s roots in Zp , which essentially completes the proof: in the above representation of P , every point is uniquely identied by a particular x 2 Zp n f;au0 : : : ;aus;2 g. If h has at most s roots in Zp, it has of course at most s roots in Zp n f;au0 : : : ;aus;2g, so at most s points of P lie on the hyperplane dened by b and c. The degree deg(h) of h is at most s, so we are nished if h is not the zero-polynomial. If h were the zero-polynomial, the coecient b0 of xs would be zero. Then, for 1 k s ; 1, we would have
h(;auk;1 ) = bk uk (;auk;1 + auk ) = 0:
sY ;1
j=1 j 6= k
(;auk;1 + auj ;1 )
From u0 = 0 and 1 k s ; 1 < p, we conclude that uk 6= 0. For i 6= j in Zp, we have ui 6= uj . Therefore ;auk;1 + auk 6= 0 and
;auk;1 + auj;1 6= 0 (j = 1 : : : s ; 1 j 6= k): With this, all terms of h(;auk;1 ) = 0 except bk are nonzero. Thus, for 1 k s ; 1, we have bk = 0. We get that the vector b = (b0 : : : bs;1) is equal to the zero vector 0, which contradicts the fact that we have chosen b 2 Zsp n f0g in the rst place. So h is not the zero-polynomial. 2 (5.3.6) The avoidance of hyperplanes by the s-dimensional points of an ICG does not strictly imply the absence of long-range correlations. We show at least the absence of critical distances for reasonably small shifts s.
Proposition 5.4 Let ICG(p a b u0) be a full period ICG. Then any hyperplane in Z2p contains at most two of the points (un un+s ) (0 n < p) for which un un+1 : : : un+s;1 are all nonzero.
Proof: We will proceed as in the proof of Proposition 5.3. Without
loss of generality, we assume u0 = 0. Since the sequence is purely periodic with period p, we can assume s < p. Moreover, to avoid trivialities, let 1 < s < p. 88
The proposition states that any hyperplane dened by b = (b0 b1) 2 Z2p n f0g and c 2 Zp contains at most two points from P := nf(un un+s) : 0 n < p 0 62 fun un+1 : : : un+so;1gg = (P 0 (x) P s(x)) : x 2 Zp n f;au0 : : : ;aus;1 g : A point (P 0 (x) P s(x)) from P lies on the hyperplane dened by b and c if
b0P 0(x) + b1P s (x) ; c = 0: Lemma 5.1 holds for the coordinates of every point from P , so the above condition translates to b0x + b1us xx++auaus ; c = 0: s;1 Clearing denominators, we get that x is a root of the polynomial
h(x) = (b0x ; c)(x + aus;1 ) + b1us (x + aus ) over Zp . Since deg(h) 2, h has at most 2 roots in Zp and therefore at most 2 roots in Zp n f;au0 : : : ;aus;1 g { if it is not the zero-polynomial. Suppose h were the zero-polynomial. Then the coecient b0 of x2 would be zero. Moreover, we would have
h(;aus;1 ) = b1us (;aus;1 + aus ) = 0: As before, this would yield that b1 is zero, too. We would get that b, which was chosen from Zsp n f0g, is equal to 0. This is a contradiction, so h is not the zero-polynomial. 2
5.4 EICG (5.4.1) The last generator we will study in detail is the explicit inversive congruential generator. It uses the same inversion function as the ICG does, but it is a nonrecursive generator. 89
Denition 5.4 Let p be a (large) prime and let a b n0 2 Zp. The explicit
inversive congruential generator, EICG, for short, with parameters p, a, b, and n0 denes a sequence (un )n0 in Zp by
un := a (n0 + n) + b
(n 0)
and a sequence (xn )n0 of random numbers in 0 1 by
xn := upn
(n 0):
(5.4.2) Choosing parameters p a b, and n0 to obtain the maximal possible period length is particularly easy. If only p is prime and a 6= 0 in Zp , any choice of b and n0 gives an EICG with period p producing fu0 : : : up;1g = Zp. The reason for this is that due to the group structure of Zp, the function f (n) := a (n0 + n) + b (being composed of bijective functions on Zp ) is itself a bijection on Zp . Hence, for 0 n < p, the resulting values f (n) = un are all distinct. For n p, we have n mod p = n0 for some n0 2 Zp , and hence un = un0 . In the following, we only consider maximal period EICGs. For a 2 Z?p and b n0 2 Zp , we denote the sequence (un )pn;=01 by eicg(p a b n0) and the sequence (xn )pn;=01 by EICG(p a b n0). (5.4.3) Since virtually any choice of parameters denes a full period EICG, there seems to exist quite a lot of dierent EICGs for a given prime modulus p. We will see that this is not exactly true. First of all, for a 6= 0, the EICGs EICG(p a b 0) and EICG(p a b n0) dier only by a cyclic permutation. Therefore, considering the totality fx0 : : : xp;1g produced by EICG(p a b n0), we can always assume n0 = 0. Next, we show that one of the EICG's parameters is redundant23 .
Lemma 5.2 Let a 2 Z?p be xed. For any b 2 Zp, we have eicg(p a b 0) = eicg(p a 0 ab): Conversely, for any n0 2 Zp , we have
eicg(p a 0 n0) = eicg(p a n0a 0): The idea for this stems from Karl Entacher's observation that the 2-dimensional scatter-plots of two EICGs with the same p, a, and n0 but dierent values of b look exactly alike. The same idea is implicitly contained in Niederreiter's 85, p.5]. 23
90
For any choice of b 6= 0 and n0 6= 0, the resulting eicg(p a b n0) diers from eicg(p a 0 0) by just a cyclic permutation, and any cyclic permutation of eicg(p a 0 0) is equal to eicg(p a b 0) for some b. The reason we introduced the EICG including the redundant parameter in the rst place is technical and will soon become apparent. Proof: All calculations in the proof are carried out in Zp. Let a 2 Z?p, b 2 Zp , let (un)pn;=01 = eicg(p a b 0), and (vn)pn;=01 = eicg(p a 0 ab). With this, we have u0 = a (0) + b = b = a(ab) + 0 = v0 : Note that un can be computed recursively as un = un;1 + a and the same holds for vn , too. So both the un and the vn depend on their predecessors by a recursion of order 1. We get u1 = v1 and, inductively, (un )pn;=01 = (vn )pn;=01 . This proves the rst part of the lemma. Proving the second part is equally elementary. 2 Finally, there is an obvious relation between the sequences eicg(p a 0 0) and eicg(p 1 0 0).
Lemma 5.3 Let a 2 Z?p. The sequence eicg(p a 0 0) is obtained by selecting every a-th element from eicg(p 1 0 0).
Remark: As we did for the LCG and ICG, we implicitly assume that the
index n of the numbers xn and un produced by an EICG are reduced modulo p. Hence selecting every a-th element from eicg(p 1 0 0) = (un )pn;=01 means selecting u0 ua modp u2a modp : : : u(p;1)a modp . Proof: Let (un)pn;=01 = eicg(p a 0 0) and (vn)pn;=01 = eicg(p 1 0 0). We have un = an = 1 (an) = van : in Zp . 2 ? With these observations, for any a 2 Zp and b n0 2 Zp , the sequence eicg(p a b n0) is obtained from eicg(p 1 0 0) as follows: 91
select every a-th element from eicg(p 1 0 0), and to the selected elements, apply a cyclic permutation whose `shift' is determined by b and n0 .
(5.4.4) Lemma 5.2 and 5.3 show that all maximal period EICGs for a given prime modulus p are closely related more precisely, they show that every EICG(p a b n0) is obtained from EICG(p 1 0 0) by selecting and then rotating a subsequence of equal stride. In this sense, for a given prime p, the corresponding EICGs are linearly related24 to each other. This observation gives us a new way to look at the s-tuples un = (un un+1 : : : un+s;1 ) obtained from an eicg(p a 0 0) = (un )pn;=01 . The i-th coordinate un+i of un is just the n-th element of the sequence eicg(p a ia 0) = (un+i )pn;=01 . The result we are about to state concerns a more general form of s-tuples. We consider the s EICGs (u(ni))pn;=01 = eicg(p ai bi 0)
(0 i < s)
with ai 2 Z?p to construct the s-tuples
(s;1) un := (u(0) n : : : un )
(0 n < p):
The following result for these general s-tuples is due to Niederreiter 85].
Proposition 5.5 If a0b0 : : : as;1bs;1 are mutually dierent in Zp, then any hyperplane in Zsp contains at most s of the un whose coordinates are
nonzero. If the hyperplane contains 0, then in contains at most s ; 1 of these points.
This is the strongest form of nonlinearity we have encountered so far: take the sequence eicg(p 1 0 0) = (un )pn;=01 and form s new sequences (u(ni) )pn;=01 , each by selecting and then rotating an equal-strided subsequence of (un )pn;=01 . If only the s new sequences are all formed in a dierent way25 , the resulting s-tuples un avoid the hyperplanes in Zsp (except for at most s points which 24 25
Refer back to Denition 5.4 for the cause of this linear relationship. This is what the condition on a0 b0 : : : as;1 bs;1 in Proposition 5.5 translates to.
92
are excluded from the proposition). The same considerations as in (5.3.5) apply to these excluded points. Proof: We will proceed as we did in the proof of Proposition 5.3. Throughout this proof, all calculations are performed in Zp . Let the a0b0 : : : as;1bs;1 be mutually dierent. A hyperplane Hbc in Zsp is uniquely identied by a vector b = (b0 : : : bs;1) 2 Zsp n f0g and a scalar c 2 Zp . Due to the denition of the u(ni) , the coordinates of un are nonzero if and only if
n 62 f;a0 b0 : : : ;as;1bs;1 g : For such n, we have un 2 Hbc if and only if 0 = c; = c;
sX ;1 i=0 sX ;1
bi u(ni)
bi : i=0 ai n + bi
Clearing denominators, we get that n is a root of the polynomial
h(x) = c
sY ;1
sX ;1
i=0
i=0
(ai x + bi ) ;
bi
sY ;1
j=0 j 6= i
(aj x + bj ):
If c 6= 0, then h is a nonzero polynomial26 of degree s over Zp . Since such h has at most s roots in Zp , the hyperplane Hbc contains at most s of the un with n 62 f;a0 b0 : : : ;as;1 bs;1 g. If c = 0, i.e. if 0 2 Hbc , we get
h(x) = ;
sX ;1 i=0
bi
sY ;1
j=0 j 6= i
(aj x + bj )
To avoid trivial sequences, we assumed all the ai are nonzero in (5.4.2). Therefore the coecient of xs in the above equation is nonzero if c is. 26
93
whose degree is at most s ; 1. It remains to show that h is not the zeropolynomial. b is not the zero-vector, so one of its coordinates is nonzero. For bk 6= 0, we have
h(;ak bk ) = ;bk = bk
sY ;1
j=0 j 6= k
sY ;1
j=0 j 6= k
(;aj ak bk + bj )
aj (ak bk ; aj bj ):
bk is nonzero by the choice of k and the aj are all nonzero by assumption nally, the terms (ak bk ; aj bj ) were all assumed to be nonzero, so h is not the zero-polynomial. 2 (5.4.5) With Proposition 5.5, it is fairly obvious that the points (xn : : : xn+s;1 ) produced by the EICG(p a 0 0) = (xn )pn;=01 have no lattice structure. Applying the proposition to the special case (0 i < s) (0 i < s)
ai = a bi = i a
yields that the points xn = (xn : : : xn+s;1 ) avoid the hyperplanes in Rs (with s exceptions). The restriction to EICG(p a 0 0) does not really matter. The sequence EICG(p a 0 0) diers from EICG(p a b n0) just by a cyclic permutation and so do the corresponding s-tuples. (5.4.6) In a similar vein, Proposition 5.5 is used to prove the absence of critical distances in the points (xn xn+s ) formed from EICG(p a 0 0) = (xn )pn;=01 . For any 27 0 < s < p, the special case
a0 = a b0 = 0
a1 = a b1 = s a
yields that the points (xn xn+s ) avoid the lines in R2 (with just two points being excluded from the proposition). 27 As before, the case s = 0 is trivial and the case s p can be reduced to s0 < p with 0 s = s mod p.
94
With this, the EICG does not exhibit the special kind of of long-range correlation inherent to the LCG. Although a similar result was shown for the ICG in Proposition 5.4, the present result for the EICG is stronger. For an ICG, all except s of the points (xn xn+s ) avoid the lines in R2 for an EICG, we have the same for all but just two of these points.
5.5 Other generators (5.5.1) So far, we have presented three generators in detail { the LCG being one of the oldest and the most widely used, and the ICG and EICG being new generators. These three types were chosen because the LCG's defects are thoroughly explored and because the inversive generators can be proven to lack just these defects28 . A number of other random number generators in use today had to be excluded from this text29. Many of them have both theoretically and practically desirable as well as undesirable properties which, though sometimes dierent in the details, are similar to those we encountered so far. We refer the reader to the surveys 3], 30], 65], 69], 86], 94], or 95]. A more detailed discussion of specic random number generators is given by A*erbach in 1, Chapter 3], Knuth in 59, Section 3.2.1 and 3.2.2], Niederreiter in 87, Chapter 4, 7 { 10], Ripley in 93, Section 2.2, 2.3, and 2.5], and Weingartner in 113].
28 Regrettably, we know of no exploration of the defects inherent to inversive generators (which do of course exist see Chapter 2 and 3). We wonder what they are and for which sorts of simulation they might be considered relevant (in the sense of Section 4.2). 29 The main reason for this is the fact that we have limited our study to those generators for which the most qualitative results were available.
95
Appendix A
Expectation and convexity A.1 The idea (A.1.1) Most introductory textbooks on probability note that the expectation of certain random variables (as, say, events) can be pictured as the barycenter or center of gravity obtained by distributing mass among the possible values proportional to their probabilities. The center of gravity of a nite number of points is usually dened as follows1 .
Denition A.1 Let x0 : : : xn;1 be points in Rs . If a total mass of > 0 is distributed among them such that xi is assigned the mass i 0, then nX ;1 1 x := ixi i=0
is the corresponding center of gravity of the points x0 : : : xn;1 .
It is clear that the center of gravity x does not change when we replace by = = P 1 and i by i = hence we can assume without loss of generality ;1 i = 1. Doing so, the distribution of mass becomes a probthat = ni=0 ability distribution and the center of gravity x becomes the corresponding 1 For our purpose, it suces to use this special case of the more general center of gravity as dened by Leichtwei& in 74, Denition 3.3].
96
expectation. There is yet another name for this x: in the theory of convex sets, it is called a convex combination of the points x0 : : : xn;1. For a nite number of points, we have found three concepts from quite dierent areas which all denote the same thing: the physical concept of `center of gravity', the probabilistic concept of `expectation', and the geometric concept of `convex combination'. In this chapter, we investigate the extent of this mutual equivalence. (A.1.2) The set of all possible convex combinations which can be formed from a set A of points is called the convex hull of A.
Denition A.2 For A Rs , the set (nX ) ;1 nX ;1 conv A := ixi : n 1 xi 2 A i 0 i = 1 i=0
i=0
is called the convex hull of A.
Suppose A contains only a nite number of points. Then there are two additional interpretations of conv A besides the geometric one we used to dene it. conv A is the set of all possible centers of gravity obtained from all conceivable distributions of mass among the points in A, and conv A is the set of all expectations a random variable with values in A can assume. With the identication of `center of gravity' and `expectation' in mind and recalling the fact that even more general probability measures can be viewed as `distribution of mass', one might suspect this result holds also for more general sets: Let A Rs and x 2 Rs . There is a probability distribution on A such that x is the corresponding expectation if and only if x 2 conv A. Before we set out to prove it, we develop a more formal representation of this statement in the next section.
97
A.2 The formal statement (A.2.1) Let (+ A) be a measurable space and let
X : + ;! Rs ! 7;! (X0(!) : : : Xs;1(!)) be measurable. For a probability measure on (+ A), let E (X) := (E(X0) : : : E(Xs;1 )) be the expectation of X with respect to . Finally, let M := fprobability measures on (+ A) such that E(X) 2 Rs g E := fE(X) : 2 Mg C := conv X(+): With these conventions, we can express our claim from (A.1.2) as
Proposition A.1
E = C:
Remark: Since the set C does not depend on the -algebra A on +, we conclude that the set E of possible nite expectations of X is independent of A. (A.2.2) The idea for Proposition A.1 is derived from De Finetti's `Theory of Probability' 15]: in Section 3.4 of his book, he notes that E = C , i.e. that E is the closure of C . This is due to the fact that De Finetti uses a concept of probability measure which must be nite- but not necessarily -additive. The most widely accepted notion of probability measure due to Kolmogorov 61] which we use in this text must be -additive. Due to the requirement of -additivity, we get that E equals C instead of C . To see that the dierence between nite-additive measures and additive measures is responsible for getting E = C and E = C , respectively, consider the following example. Let X be a real-valued random variable and let ]0 1 be the set of its possible values2 . For nite-additive measures, 2 This is X : ' 7!]0 1 is measurable with respect to some -algebra on ' and the Borel -algebra on ]0 1, and X(') =]0 1. 98
we have E = 0 1] due to 15, Section 3.4] and, for -additive measures, we claim that E = ]0 1. Every number y 2]0 1 is the expectation of a point measure concentrated on y. Since each such point measure is -additive, the relation E ]0 1 holds for -additive probability measures. Now suppose there is a -additive probability measure with E (X) = 0 (This is the only case which needs consideration. Since X is positive, the expectation of X cannot be negative, and expectations 1 are handled analogously). For any positive integer n, we have 0 = E (XjX < 1=n)P (X < 1=n) + E(XjX 1=n)P (X 1=n) which implies3 that P (X 1=n) = (fX 1=ng) = 0: We get a sequence of measurable sets fX 1=ng with fX 1=ng = fX > 0g: n1
Each of the events fX 1=ng has measure 0, but the limit fX > 0g has measure 1. Thus is not continuous from below. On the other hand, we have assumed that is -additive, which implies continuity from below. Since this is a contradiction, we conclude that E(X) = 0 is impossible. The point of all this is the following: a nite-additive measure is -additive if and only if it is continuous from below. So the dierence between the nite-additive and -additive notions of `measure' concerning E is that { in this example { we get E = 0 1] in the nite-additive and E =]0 1 in the -additive case. (A.2.3) We expected a short search through the literature would certainly reveal a proof of Proposition A.1 or some similar statement. We were astonished to nd none of these. The best we can come up with are two rather unsatisfying references. The rst is Leichtwei, 74, Satz 3.6], where a special case of Proposition A.1 is shown. Second, Stoelinga apparently4 3 X is nonnegative and so are both of the conditional expectations E (Xj : : :). Since the probabilities P(: : :) are nonnegative too, both expressions in the sum above must be zero. On the other hand, we have 1=n E (XjX 1=n), which implies P(X 1=n) = 0. 4 According to 6, Section 2.6]. We would like to thank Johann Linhart for providing us with these references.
99
proved a special case from a geometric point of view in his Ph.D. thesis 108]. However, we were unable to locate his work in time moreover, locating his work would not have been sucient either since it is in Dutch.
A.3 Some utilities

(A.3.1) Before we prove Proposition A.1, we recall some notions and lemmata from the fields of convex geometry and measure theory. For two points a = (a_0, …, a_{s−1}) and b = (b_0, …, b_{s−1}) in R^s, we denote their inner product Σ_{i=0}^{s−1} a_i b_i by ⟨a, b⟩. By H = H_{n,c}, we denote the hyperplane defined by the vector n ∈ R^s \ {0} and the constant c ∈ R:
H_{n,c} := {y ∈ R^s : ⟨n, y⟩ = c}.
By H⁺ and H⁻, we denote the closed half-spaces defined by the hyperplane H; this is H⁺_{n,c} := {y ∈ R^s : ⟨n, y⟩ ≥ c}, and H⁻_{n,c} is defined analogously. Let x ∈ R^s and A ⊆ R^s. A hyperplane H is said to separate x and A if
A ⊆ H⁻ and x ∈ H⁺
or vice versa. We say that H_{n,c} strongly separates x and A if there is an ε > 0 such that both H_{n,c−ε} and H_{n,c+ε} separate x and A. Finally, we say that x and A are (strongly) separable if there is a hyperplane which (strongly) separates x and A. Note that if x and A are separable, then
∃ n ∈ R^s \ {0} ∀ a ∈ A : ⟨n, x⟩ ≥ ⟨n, a⟩.
If x and A are strongly separable, then the same statement as above holds with the '≥' replaced by '>'.

Let n ∈ R^s \ {0} and c > 0. For any x ∈ H_{n,0} and any y ∈ H_{n,c}, we have ||x − y|| = ||y − x||, and y − x is contained in H_{n,c}; conversely, every point of H_{n,c} is of this form (take x = 0). So
d(H_{n,0}, H_{n,c}) = inf {||y|| : y ∈ H_{n,c}}.
Any y ∈ H_{n,c} can be represented as the sum of two vectors y = m + y₀, where m := (c/||n||²) n is contained in H_{n,c} and y₀ ∈ H_{n,0}. Note that, given m, y₀ is uniquely determined by y. We get
||y||² = ⟨m + y₀, m + y₀⟩ = ||m||² + ||y₀||² + 2⟨m, y₀⟩ = ||m||² + ||y₀||² + 2 (c/||n||²) ⟨n, y₀⟩.
Since y₀ ∈ H_{n,0}, the last term is equal to zero. Hence
||y||² ≥ ||m||².
This observation and the triangle inequality yield
||m|| ≤ ||y|| ≤ ||m|| + ||y₀||.
¹¹ Assuming C ⊆ Z, this lemma is similar to [39, Equation (13)].
¹² This is an immediate consequence of Kronecker's Approximation Theorem [63, 64]; see also Hlawka [55, p. 3].
Note that y₀ ∈ H_{n,0} is not only uniquely determined by y, but any such y₀ uniquely determines a point y = m + y₀ in H_{n,c}. The infimum of ||y|| taken over all points in H_{n,c} is therefore equal to ||m||. Hence

d(H_{n,0}, H_{n,c}) = ||m|| = |c| / ||n||,

which means that d(H_{n,C}) is equal to the infimum of all values of the form |c| / ||n|| for positive c ∈ C. The value of the spectral test for a lattice Λ can thus be written as

1/σ_s = sup { (1/||n||) inf {|c| : c ∈ C \ {0}} : H_{n,C} is a cover of Λ }.

(B.3.4) In this formula for 1/σ_s, we consider all covers of a lattice. We now show that it is sufficient to restrict to covers of a certain kind which are uniquely defined by the lattice itself.
Proposition B.6 Let H_{n,C} be a cover of Λ. Then d(H_{n,C}) > 0 if and only if

αn ∈ Λ* is primitive

for some α > 0.
Proof: Let H_{n,C} be a cover of Λ. Observe that, for any α > 0, the definitions of a cover and of its spacing yield that H_{n,C} = H_{αn,αC}. For the 'if'-part, let c₀ := d(H_{n,C}) > 0. Since C is a group and c₀ is its smallest positive element (Lemma B.2 and B.3), we have C = c₀Z. This and the above observation yield
H_{n,C} = H_{n,c₀Z} = H_{(1/c₀)n, Z}.
Let α := 1/c₀ (which is, of course, positive). Since Z = {⟨αn, p⟩ : p ∈ Λ}, the vector αn is in the dual lattice Λ*. Let n′ ∈ Λ* be of the form n′ = γαn with γ > 0. To prove that αn is primitive, we have to show that γ is an integer. Since n′ ∈ Λ*, ⟨n′, p⟩ is an integer for any p ∈ Λ. Because c₀ is the smallest positive element in the discrete group C = c₀Z, there is a p₀ ∈ Λ such that ⟨n, p₀⟩ = c₀. The choice of α and p₀ yields

⟨n′, p₀⟩ = γ ⟨αn, p₀⟩ = γ (1/c₀) ⟨n, p₀⟩ = γ (1/c₀) c₀ = γ,
so γ is an integer. For the 'only if'-part, let αn ∈ Λ* be primitive. Then H_{n,C} = H_{αn,αC} is a cover of Λ and αC is a subgroup of Z. Since αC ≠ {0} (otherwise, the whole lattice Λ would be contained in just one hyperplane H_{αn,0}, which is impossible), it contains a smallest nonzero element. With Proposition B.3, we get
d(H_{n,C}) = d(H_{αn,αC}) = (1/||αn||) inf {|c| : c ∈ αC \ {0}} > 0.  □
There is, in fact, more information in the above proof than stated in Proposition B.6. In the 'if'-part, we assumed that d(H_{n,C}) > 0 and concluded that H_{n,C} = H_{αn,Z} with αn ∈ Λ* primitive. Note that the spacing of this cover is d(H_{αn,Z}) = 1/||αn||. In accordance with Knuth [59, Section 3.3.4] and Ripley [93, Theorem 2.13], we can conclude: the value of the spectral test for a lattice Λ is

1/σ_s = 1/||m||,

where m is the shortest nonzero vector in the dual lattice Λ*.
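This characterization suggests a direct, if naive, way to compute the spectral test in small cases. For a multiplicative congruential generator u_{n+1} = a·u_n (mod m), the dual lattice belonging to the normalized overlapping pairs (x_n, x_{n+1}) consists of the integer vectors (k_0, k_1) with k_0 + a·k_1 ≡ 0 (mod m) (cf. Knuth [59, Section 3.3.4]). The following C sketch, with parameters of our own choosing, finds the shortest such nonzero vector by enumeration and prints σ_2 and the maximal hyperplane spacing 1/σ_2.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const long long m = 2147483647LL;  /* modulus 2^31 - 1 */
        const long long a = 16807LL;       /* the 'minimal standard' multiplier */

        /* (m, 0) is always in the dual lattice; start with it. */
        long long best0 = m, best1 = 0;
        double best = (double)m;

        /* By Minkowski's theorem, the shortest dual vector has norm at most
           about 1.13 * sqrt(m), so |k1| <= 1.2 * sqrt(m) + 1 suffices. */
        long long K = (long long)(1.2 * sqrt((double)m)) + 1;
        for (long long k1 = 1; k1 <= K; k1++) {
            long long k0 = (-(a % m) * k1) % m;  /* k0 = -a*k1 (mod m) ... */
            if (k0 < 0) k0 += m;
            if (k0 > m / 2) k0 -= m;             /* ... reduced to (-m/2, m/2] */
            double norm = sqrt((double)(k0 * k0 + k1 * k1));
            if (norm < best) { best = norm; best0 = k0; best1 = k1; }
        }
        printf("shortest dual vector (%lld, %lld), sigma_2 = %.2f\n",
               best0, best1, best);
        printf("maximal hyperplane spacing 1/sigma_2 = %.3e\n", 1.0 / best);
        return 0;
    }

For moduli of realistic size in higher dimensions, one replaces the enumeration by shortest-vector algorithms as in Dieter [25].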
B.4 Linear transformations of lattices

(B.4.1) Throughout this section, let

Λ = { Σ_{i=0}^{s−1} k_i b_i : k_i ∈ Z }

be the s-dimensional lattice spanned by the b_i and, for s ≥ t, let

T : R^s → R^t

be linear and surjective. This mapping is uniquely determined by a t×s matrix which we also call T. The image of Λ under T is

TΛ = { Σ_{i=0}^{s−1} k_i T b_i : k_i ∈ Z }.
In this section, we give a sufficient condition on T for TΛ to be a lattice.

(B.4.2) The set TΛ already looks quite similar to a lattice: it is made up of all integer linear combinations of the vectors T b_i. The only problem is that these s vectors are not necessarily linearly independent in R^t, and some of them may even be equal to 0. For another reason why TΛ need not be a lattice, recall our observation from (B.3.3): the set C := {a + b√2 : a, b ∈ Z} is dense in R. Setting Λ := Z × Z and T = (1 √2) yields that TΛ = C is dense in R¹ and therefore cannot be a lattice, as the following sketch illustrates.
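The density of C can be observed numerically. The following C sketch, our own illustration, considers for growing bounds B the numbers a + b√2 with integers |b| ≤ B (and, for each b, the best matching integer a) and prints the smallest positive value found; it shrinks towards 0, so TΛ contains points arbitrarily close to one another and cannot be discrete.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double SQRT2 = sqrt(2.0);
        for (long B = 1; B <= 100000; B *= 10) {
            double min_pos = 1e300;
            for (long b = -B; b <= B; b++) {
                /* For fixed b, the smallest positive a + b*sqrt(2) is attained
                   at the integer a just below or just above -b*sqrt(2). */
                long a = (long)floor(-b * SQRT2);
                for (long da = 0; da <= 1; da++) {
                    double v = (double)(a + da) + b * SQRT2;
                    if (v > 0.0 && v < min_pos) min_pos = v;
                }
            }
            printf("B = %6ld: smallest positive a + b*sqrt(2) = %.3e\n", B, min_pos);
        }
        return 0;
    }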
(B.4.3) In some cases, however, TΛ is a lattice; one of them is of special interest when studying congruential random number generators.

Proposition B.7 If every row-vector of T is in Λ* (up to a constant scaling factor), then TΛ is a lattice.
Proof: Let every row-vector n_i (0 ≤ i < t) of T be in Λ* (up to a constant scaling factor). Without loss of generality, we assume that the scaling factor is 1, so n_i ∈ Λ*. We show that TΛ is a discrete subgroup of R^t which is not contained in a (t−1)-dimensional subspace of R^t. With this and Proposition B.2, TΛ is a lattice.
From the above representation of TΛ, it is clear that with any two points x and y, it contains x ± y as well. Hence TΛ is a group. Λ is a lattice and therefore not contained in an (s−1)-dimensional subspace of R^s. Since T is surjective, TR^s is equal to R^t. Using the linearity of T, we get that TΛ is not contained in a (t−1)-dimensional subspace of R^t. For a basis {b_0, …, b_{s−1}} of Λ, we have

T b_i = (⟨n_0, b_i⟩, …, ⟨n_{t−1}, b_i⟩)   (0 ≤ i < s).

All the n_j are in Λ*, so all the T b_i are in Z^t. The set TΛ is made up of all integer combinations of the T b_i, so TΛ ⊆ Z^t. Finally, since Z^t is discrete, so is TΛ.  □
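The crux of the proof, TΛ ⊆ Z^t, is easily verified numerically for a concrete example. In the following C sketch, the lattice basis, the two dual row-vectors and the tolerance are toy choices of our own; the program maps a patch of lattice points through T and checks that all image coordinates are integers.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* A lattice basis in R^2: */
        const double b0[2] = { 1.0 / 3.0, 0.0 };
        const double b1[2] = { 1.0 / 3.0, 0.5 };
        /* Rows of T chosen in the dual lattice: <row, b0> and <row, b1>
           are integers for both rows. */
        const double T[2][2] = { { 3.0, 2.0 }, { 0.0, 2.0 } };

        int all_integer = 1;
        for (int k0 = -5; k0 <= 5; k0++)
            for (int k1 = -5; k1 <= 5; k1++) {
                /* p = k0*b0 + k1*b1 runs through a patch of the lattice. */
                double p0 = k0 * b0[0] + k1 * b1[0];
                double p1 = k0 * b0[1] + k1 * b1[1];
                for (int r = 0; r < 2; r++) {
                    double y = T[r][0] * p0 + T[r][1] * p1;
                    if (fabs(y - floor(y + 0.5)) > 1e-9) all_integer = 0;
                }
            }
        printf("T p integral for all sampled lattice points: %s\n",
               all_integer ? "yes" : "no");
        return 0;
    }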
(B.4.4) We have to stress that the given condition for TΛ being a lattice, although sufficient, is far from being necessary. For example, consider the lattice Λ := Z × Z and the bijective linear transformation defined by

T := ( 1   √2
       1  −√2 ).

Since √2 is irrational, the row-vectors of T cannot be scaled to lie in Λ* \ {0}. However, we will see that TΛ is a lattice. Using the same argument as in the proof above yields that TΛ is a subgroup of R² which is not contained in a one-dimensional subspace of R². It remains to show that TΛ is discrete. Observe that for any k ∈ Z, we have

T {(k, l) : l ∈ Z} = {(k + l√2, k − l√2) : l ∈ Z}
                   = (k, k) + {l(√2, −√2) : l ∈ Z} =: k(1, 1) + L.

It is clear that L is a discrete set: the euclidean distance of each point to its nearest neighbours in L is equal to 2. Moreover, note that L is contained in a hyperplane through the origin. We have

TΛ = ⋃_{k∈Z} T {(k, l) : l ∈ Z}
   = ⋃_{k∈Z} ( k(1, 1) + L ).
Each individual set k(1, 1) + L is discrete. For k ≠ k′, the set k′(1, 1) + L is just k(1, 1) + L shifted by the nonzero vector (k′ − k)(1, 1). Since these two sets are disjoint and their distance is positive, their union is discrete, too. By induction, it follows that the union of all the sets k(1, 1) + L for k ∈ Z, i.e. TΛ, is discrete.

(B.4.5) The reason we have presented Proposition B.7 at all is its use for studying critical distances in congruential generators, i.e. generators (x_n)_{n=0}^{N−1} whose numbers have the form x_n = u_n/M for u_n, M ∈ Z (virtually all random number generators well suited for computer implementation are of this type; see (5.1.2)). We will see that the existence of a lattice structure in the s-dimensional points (x_n, …, x_{n+s−1}) leads to a lattice structure in the 2-dimensional points (x_n, x_{n+s−1}). As noted in (5.2.6), the latter can be written as T applied to the former, where T is the 2×s matrix

T := ( 1 0 … 0 0
       0 0 … 0 1 ).

For this T, we have the following result¹³.
Proposition B.8 Let Λ be a lattice in R^s which contains Z^s; furthermore, let Λ′ := TΛ. Then

Λ′ is a lattice    (1)

and, for any x ∈ R^s,

T((x + Λ) ∩ [0, 1[^s) = (T x + Λ′) ∩ [0, 1[².    (2)
The meaning of the condition Z^s ⊆ Λ is explained in

Lemma B.4 Let Λ be a lattice containing Z^s. Then there exists a positive integer M such that

Λ ⊆ (1/M) Z^s.

¹³ Which is not as obvious as it seems; see (B.4.4).
Proof: First, we show that Λ is made up of rational points, i.e. Λ ⊆ Q^s, and then we choose M accordingly. Assume there is a lattice point p ∈ Λ \ Q^s. Since at least one coordinate of p is irrational, the points¹⁴

{kp}   (k ∈ Z)

are all distinct. Since the s coordinates of each {kp} are between 0 and 1, we have ||{kp}|| ≤ √s. This means that Λ contains an infinite number of points {kp} whose norm is at most √s. But Λ is discrete, so this cannot be; Λ is made up of rational points. Next, observe that any p ∈ Λ can be uniquely represented as the sum of two lattice points p = ⌊p⌋ + {p}, where {p} ∈ [0, 1[^s. There is just a finite number of lattice points in [0, 1[^s, all of which are rational. We can choose an integer M > 0 such that all the {p} are in (1/M) Z^s. Since the integer points ⌊p⌋ are in (1/M) Z^s too, the proof is complete.  □

Proof of Proposition B.8: It suffices to show (1); the proof of (2) is completely equivalent to the proof of the corresponding part of Proposition 5.2. Let Λ be an s-dimensional lattice as in the proposition and let T be defined as above. With Lemma B.4, there is an integer M > 0 such that Λ ⊆ (1/M) Z^s. Multiplying each row-vector n_i (i = 0, 1) of T by M, we get M n_0, M n_1 ∈ Λ*. Due to Proposition B.7, the points (x_n, x_{n+s−1}) form a lattice in R².  □
¹⁴ See (5.2.6) for the meaning of the notation {x} and ⌊x⌋ for s-dimensional points x.
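To see Proposition B.8 at work, consider a multiplicative congruential generator u_{n+1} = a·u_n (mod m) with x_n = u_n/m. Then u_{n+s−1} ≡ a^{s−1}·u_n (mod m), so every pair (x_n, x_{n+s−1}) lies in the 2-dimensional lattice {(z_0, z_1)/m : z_1 ≡ a^{s−1}·z_0 (mod m)}, which contains Z². The following C sketch, with toy parameters of our own choosing, verifies this congruence along the orbit; together with the brute-force search given after (B.3.4), with a replaced by a^{s−1} mod m, this yields the critical distance for such pairs.

    #include <stdio.h>

    int main(void)
    {
        const long long m = 2147483647LL;  /* 2^31 - 1 */
        const long long a = 16807LL;
        const int s = 4;                   /* look at the pairs (x_n, x_{n+3}) */

        /* c = a^(s-1) mod m */
        long long c = 1;
        for (int i = 0; i < s - 1; i++) c = (c * a) % m;

        long long u = 1;                   /* seed u_0 */
        int ok = 1;
        for (int n = 0; n < 1000; n++) {
            /* u_{n+s-1}, obtained by stepping the generator s-1 times */
            long long v = u;
            for (int i = 0; i < s - 1; i++) v = (v * a) % m;
            /* lattice condition: u_{n+s-1} - a^(s-1)*u_n = 0 (mod m) */
            if ((v - (c * u) % m) % m != 0) ok = 0;
            u = (u * a) % m;
        }
        printf("all pairs satisfy the lattice congruence: %s\n", ok ? "yes" : "no");
        return 0;
    }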
Bibliography

[1] L. Afflerbach. Die Gütebewertung von Pseudo-Zufallszahlen-Generatoren aufgrund theoretischer Analysen und algorithmischer Berechnungen. Grazer Mathematische Berichte, 309, 1990.
[2] J. Ahrens, U. Dieter, and A. Grube. Pseudo-random numbers: a new proposal for the choice of multiplicators. Computing, 6:121–138, 1970.
[3] S.L. Anderson. Random number generators on vector supercomputers and other advanced architectures. SIAM Rev., 32:221–251, 1990.
[4] I. Balásházy and W. Hofmann. Particle deposition in airway bifurcations for inspiratory flow. Part II: calculations of particle trajectories and deposition patterns. Hungarian Academy of Sciences, Central Research Institute for Physics, Budapest, 1992.
[5] W.A. Beyer, R.B. Roof, and D. Williamson. The lattice structure of multiplicative congruential pseudo-random vectors. Math. Comp., 25:345–363, 1971.
[6] T. Bonnesen and W. Fenchel. Theorie der konvexen Körper. Chelsea Publishing Company, New York, 1971. Reprint; first published by Springer, Berlin, 1934. In German.
[7] K.O. Bowman and M.T. Robinson. Studies of random number generators for parallel processing. In M.T. Heath, editor, Proc. Second Conference on Hypercube Multiprocessors, pages 445–453, Philadelphia, 1987. SIAM.
[8] P. Bratley, B.L. Fox, and L.E. Schrage. A Guide to Simulation. Springer, New York, 2nd edition, 1987.
[9] K.V. Bury. Statistical Models in Applied Science. Robert E. Krieger Publishing Company, Inc., Malabar, Florida, reprint edition, 1986.
[10] R.P. Chambers. Random number generation. IEEE Spectrum, 4:48–56, 1967.
[11] D.L. Cohn. Measure Theory. Birkhäuser, Boston, 1980.
[12] A. Compagner. Definitions of randomness. Am. J. Phys., 59:700–705, 1991.
[13] R.R. Coveyou. Serial correlation in the generation of pseudo-random numbers. J. Assoc. Comput. Mach., 7:72–74, 1960.
[14] R.R. Coveyou and R.D. MacPherson. Fourier analysis of uniform random number generators. J. Assoc. Comput. Mach., 14:100–119, 1967.
[15] B. De Finetti. Theory of Probability, volume 1. John Wiley, New York, 1979.
[16] A. De Matteis, J. Eichenauer-Herrmann, and H. Grothe. Computation of critical distances within multiplicative congruential pseudorandom number sequences. J. Comp. Appl. Math., 39:49–55, 1992.
[17] A. De Matteis and B. Faleschini. Some arithmetical properties in connection with pseudo-random numbers. Boll. Unione Mat. Ital., 18:171–184, 1963.
[18] A. De Matteis and S. Pagnutti. Parallelization of random number generators and long-range correlations. Numer. Math., 53:595–608, 1988.
[19] A. De Matteis and S. Pagnutti. A class of parallel random number generators. Parallel Comput., 13:193–198, 1990.
[20] A. De Matteis and S. Pagnutti. Long-range correlations in linear and nonlinear random number generators. Parallel Comput., 14:207–210, 1990.
[21] A. De Matteis and S. Pagnutti. Long-range correlation analysis of the Wichmann-Hill random number generator. Statistics and Computing, 3:67–70, 1993.
[22] I. Deák. Uniform random number generators for parallel computers. Parallel Comput., 15:155–164, 1990.
[23] L.P. Devroye. Non-Uniform Random Variate Generation. Springer, New York, 1986.
[24] U. Dieter. Autokorrelation multiplikativ erzeugter Pseudo-Zufallszahlen. Operations Research Verfahren, 6:69–85, 1968.
[25] U. Dieter. How to calculate shortest vectors in a lattice. Math. Comp., 29:827–833, 1975.
[26] U. Dieter. Erzeugung von gleichverteilten Zufallszahlen. In Jahrbuch Überblicke Mathematik 1993, pages 25–44, Braunschweig, 1993. Vieweg.
[27] U. Dieter and J.H. Ahrens. An exact determination of serial correlations of pseudo-random numbers. Numer. Math., 17:101–123, 1971.
[28] U. Dieter and J.H. Ahrens. Uniform random numbers. Inst. f. Math. Stat., Technische Hochschule Graz, Graz, 1974.
[29] E.J. Dudewicz and T.G. Ralley. The Handbook of Random Number Generation and Testing With TESTRAND Computer Code, volume 4 of American Series in Mathematical and Management Sciences. American Sciences Press, Inc., Columbus, Ohio, 1981.
[30] W.F. Eddy. Random number generators for parallel processors. J. Comp. Appl. Math., 31:63–71, 1990.
[31] J. Eichenauer and J. Lehn. A non-linear congruential pseudo random number generator. Statist. Papers, 27:315–326, 1986.
[32] J. Eichenauer-Herrmann. Inversive congruential pseudorandom numbers avoid the planes. Math. Comp., 56:297–301, 1991.
[33] J. Eichenauer-Herrmann and H. Grothe. A remark on long-range correlations in multiplicative congruential pseudo random number generators. Numer. Math., 56:609–611, 1989.
[34] J. Eichenauer-Herrmann. Statistical independence of a new class of inversive congruential pseudorandom numbers. Math. Comp., 60:375–384, 1993.
[35] K. Entacher. Selected random number generators in the run test. Preprint, Mathematics Institute, University of Salzburg.
[36] K. Entacher. Selected random number generators in the run test II: subsequence behavior. Article in preparation, Mathematics Institute, University of Salzburg.
[37] G.S. Fishman. Multiplicative congruential random number generators with modulus 2^β: an exhaustive analysis for β = 32 and a partial analysis for β = 48. Math. Comp., 54:331–344, 1990.
[38] G.S. Fishman and L.R. Moore. A statistical evaluation of multiplicative congruential random number generators with modulus 2³¹ − 1. J. Amer. Statist. Assoc., 77:129–136, 1982.
[39] G.S. Fishman and L.R. Moore. An exhaustive analysis of multiplicative congruential random number generators with modulus 2³¹ − 1. SIAM J. Sci. Statist. Comput., 7:24–45, 1986.
[40] M. Flahive and H. Niederreiter. On inversive congruential generators for pseudorandom numbers. In G.L. Mullen and P.J.-S. Shiue, editors, Finite Fields, Coding Theory, and Advances in Communications and Computing, pages 75–80, New York, 1992. Dekker.
[41] P. Frederickson, R. Hiromoto, and T.L. Jordan. Pseudo-random trees in Monte Carlo. Parallel Comput., 1:175–180, 1984.
[42] H. Fuss. Simulating 'fair' random numbers. In G.C. Vansteenkiste, E.J.H. Kerckhoffs, L. Dekker, and J.C. Zuidervaart, editors, Proceedings of the 2nd European Simulation Congress, Sept. 9–12, 1986, pages 252–257, Belgium, 1986. Simulation Councils, Inc.
[43] M. Greenberger. An a priori determination of serial correlation in computer generated random numbers. Math. Comp., 15:383–389, 1961.
[44] M. Greenberger. Method in randomness. Comm. ACM, 8:177–179, 1965.
[45] P.M. Gruber. Geometry of numbers. In P.M. Gruber and J.M. Wills, editors, Handbook of Convex Geometry, volume B, pages 739–764, Amsterdam, 1993. Elsevier Science Publishers B.V.
[46] P.M. Gruber and C.G. Lekkerkerker. Geometry of Numbers. Elsevier Science Publishers B.V., Amsterdam, 2nd edition, 1987.
[47] J.H. Halton. Pseudo-random trees: multiple independent sequence generators for parallel and branching computations. J. Comp. Physics, 84:1–56, 1989.
[48] J.M. Hammersley and D.C. Handscomb. Monte Carlo Methods. Chapman and Hall, London, 1983.
[49] F. Härtel. Zufallszahlen für Simulationsmodelle. PhD thesis, Hochschule St. Gallen für Wirtschafts-, Rechts- und Sozialwissenschaften, St. Gallen, 1994.
[50] P. Hellekalek. General discrepancy estimates V: diaphony and the spectral test. Preprint, Mathematics Institute, University of Salzburg.
[51] P. Hellekalek. Study of algorithms for primitive polynomials. Report D5H-1, CEI-PACT Project, WP5.1.2.1.2, Research Institute for Software Technology, University of Salzburg, Austria, 1994.
[52] P. Hellekalek, M. Mayer, and A. Weingartner. Implementation of algorithms for IMP-polynomials. Report D5H-2, CEI-PACT Project, WP5.1.2.1.2, Research Institute for Software Technology, University of Salzburg, Austria, 1994.
[53] G.W. Hill. Cyclic properties of pseudo-random sequences of Mersenne prime residues. Comput. J., 22:80–85, 1979.
[54] E. Hlawka. Zur angenäherten Berechnung mehrfacher Integrale. Mh. Math., 66:140–151, 1962.
[55] E. Hlawka, editor. Theorie der Gleichverteilung. Bibliographisches Institut, Mannheim, 1970.
[56] L. Holmlid and K. Rynefors. Uniformity of congruential pseudo-random number generators. Dependence on length of number sequence and resolution. J. Comp. Physics, 26:297–306, 1978.
[57] M.H. Kalos and P.A. Whitlock. Monte Carlo Methods, volume 1: Basics. John Wiley, New York, 1986.
[58] D.G. Kendall and B. Babington-Smith. Randomness and random sampling numbers. J. Royal Statist. Soc., 101:146–166, 1938.
[59] D.E. Knuth. The Art of Computer Programming, volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA, 2nd edition, 1981.
[60] J.F. Koksma. Een algemeene stelling uit de theorie der gelijkmatige verdeeling modulo 1. Mathematica B (Zutphen), 11:7–11, 1942/1943.
[61] A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung. In Ergebnisse der Mathematik und ihrer Grenzgebiete, volume 2, Berlin, 1933.
[62] N.M. Korobov. Anwendung zahlentheoretischer Methoden auf Probleme der numerischen Mathematik. Moskau, 1963.
[63] L. Kronecker. Die Periodensysteme von Functionen reeller Variablen. In Mathematische Werke, volume 3, pages 31–46, New York, 1968. Chelsea Publishing Company. Reprint.
[64] L. Kronecker. Näherungsweise ganzzahlige Auflösung linearer Gleichungen. In Mathematische Werke, volume 3, pages 47–109, New York, 1968. Chelsea Publishing Company. Reprint.
[65] J.C. Lagarias. Pseudorandom numbers. Statistical Science, 8:31–39, 1993.
[66] G. Larcher. A class of low-discrepancy point-sets and its application to numerical integration by number-theoretical methods. Grazer Mathematische Berichte. To appear.
[67] G. Larcher and C. Traunfellner. On the numerical integration of Walsh-series by number-theoretical methods. Math. Comp., 63:277–291, 1994.
[68] P. L'Ecuyer. Efficient and portable combined random number generators. Comm. ACM, 31:742–774, 1988.
[69] P. L'Ecuyer. Random numbers for simulation. Comm. ACM, 33:85–97, 1990.
[70] H. Leeb. On the digit test. In P. Hellekalek, G. Larcher, and P. Zinterhof, editors, Tagungsband zum 1. Salzburger Minisymposium über Pseudozufallszahlen und Quasi-Monte Carlo Methoden am 18. Nov. 1994, Salzburg. Mathematics Institute, University of Salzburg. To appear.
[71] H. Leeb. Selected random number generators in the digit test. Preprint, Mathematics Institute, University of Salzburg.
[72] H. Leeb. Selected random number generators in the digit test II: subsequence behavior. Article in preparation, Mathematics Institute, University of Salzburg.
[73] D.H. Lehmer. Mathematical methods in large-scale computing units. In Proc. 2nd Sympos. on Large-Scale Digital Calculating Machinery, Cambridge, MA, 1949, pages 141–146, Cambridge, MA, 1951. Harvard University Press.
[74] K. Leichtweiß. Konvexe Mengen. Springer, Berlin, 1980.
[75] R. Lidl and H. Niederreiter. Finite Fields. Addison-Wesley, Reading, MA, 1983.
[76] M.D. MacLaren and G. Marsaglia. Uniform random number generators. J. Assoc. Comput. Mach., 12:83–89, 1965.
[77] N.M. MacLaren. A limit on the usable length of a pseudorandom sequence. J. Statist. Comput. Simul., 42:47–54, 1992.
[78] G. Marsaglia. Random numbers fall mainly in the planes. Proc. Nat. Acad. Sci. USA, 61:25–28, 1968.
[79] G. Marsaglia. Regularities in congruential random number generators. Numer. Math., 16:8–10, 1970.
[80] G. Marsaglia. A current view of random number generators. In L. Billard, editor, Computer Science and Statistics: The Interface, pages 3–10, Amsterdam, 1985. Elsevier Science Publishers B.V.
[81] N. Metropolis and S.M. Ulam. The Monte Carlo method. J. Amer. Statist. Assoc., 44:335–341, 1949.
[82] F. Neuman and C.F. Martin. The autocorrelation structure of Tausworthe pseudo-random number generators. IEEE Trans. Comput., C-25:460–464, 1976.
[83] F. Neuman and R. Merrick. Autocorrelation peaks in congruential pseudorandom number generators. IEEE Trans. Comput., C-25:457–460, 1976.
[84] J. von Neumann. Memorandum on the program of the high speed computer. Institute of Advanced Study, Princeton, NJ, 1945.
[85] H. Niederreiter. On a new class of pseudorandom numbers for simulation methods. J. Comput. Appl. Math. To appear.
[86] H. Niederreiter. Recent trends in random number generation and random vector generation. Ann. Oper. Res., 31:323–345, 1991.
[87] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia, 1992.
[88] H. Niederreiter. Oral communication, Nov. 18, 1994.
[89] W. Parry. Topics in Ergodic Theory. Cambridge University Press, Cambridge, 1981.
[90] P. Peach. Bias in pseudo-random numbers. J. Amer. Statist. Assoc., 56:610–618, 1961.
[91] K. Petersen. Ergodic Theory. Cambridge University Press, Cambridge, 1983.
[92] B.D. Ripley. The lattice structure of pseudo-random number generators. Proc. Roy. Soc. London Ser. A, 389:197–204, 1983.
[93] B.D. Ripley. Stochastic Simulation. John Wiley, New York, 1987.
[94] B.D. Ripley. Uses and abuses of statistical simulation. Mathematical Programming, 42:53–68, 1988.
[95] B.D. Ripley. Thoughts on pseudorandom number generators. J. Comput. Appl. Math., 31:153–163, 1990.
[96] B.D. Ripley and M.D. Kirkland. Iterative simulation methods. J. Comput. Appl. Math., 31:165–172, 1990.
[97] A. Rotenberg. A new pseudo-random number generator. J. Assoc. Comput. Mach., 7:75–77, 1960.
[98] S.S. Ryshkov. On the reduction theory of positive quadratic forms. Dokl. Akad. Nauk SSSR, 198:1028–1031, 1971. = Soviet Math. Dokl. 12, 946–950 (1971).
[99] S.S. Ryshkov. On Hermite, Minkowski and Venkov reduction of positive quadratic forms in n variables. Dokl. Akad. Nauk SSSR, 207:1054–1056, 1972. = Soviet Math. Dokl. 13, 1676–1679 (1973).
[100] G. Sawitzki. Another random number generator which should be avoided. Statist. Soft. Newsl., 11:81–82, 1985.
[101] W.C. Schmid. Zur numerischen Integration von Walsh-Reihen. Master's thesis, University of Salzburg, 1993.
[102] R. Schneider. Convex Bodies: The Brunn–Minkowski Theory. Press Syndicate of the University of Cambridge, Cambridge, 1993.
[103] C.P. Schnorr. Zufälligkeit und Wahrscheinlichkeit, volume 218 of Lecture Notes in Math. Springer, Berlin, 1971.
[104] C.S. Smith. Multiplicative pseudo-random number generators with prime modulus. J. Assoc. Comput. Mach., 18:587–593, 1971.
[105] I.M. Sobol'. The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. i Mat. Fiz., 7:784–802, 1967. In Russian.
[106] I.M. Sobol'. Multidimensional Quadrature Formulas and Haar Functions. Izdat. "Nauka", Moscow, 1969. In Russian.
[107] H. Stegbuchner. Zur quantitativen Theorie der Gleichverteilung mod 1. Arbeitsberichte, Mathematisches Institut der Universität Salzburg, Salzburg, Austria, 1980.
[108] T.G. Stoelinga. Convexe Puntverzamelingen. PhD thesis, Groningen, 1932. See Zentralblatt für Mathematik und ihre Grenzgebiete, volume 5, pages 256–257, Springer, Berlin, 1933.
[109] R.C. Tausworthe. Random numbers generated by linear recurrence modulo two. Math. Comp., 19:201–209, 1965.
[110] W.E. Thomson. A modified congruence method for generating pseudorandom numbers. Comp. J., 1:83–86, 1958.
[111] P. Walters. Ergodic Theory – Introductory Lectures, volume 458 of Lecture Notes in Math. Springer, Berlin, 1975.
[112] S. Wegenkittl. Empirical testing of pseudorandom number generators. Master's thesis, University of Salzburg, 1995. In preparation.
[113] A. Weingartner. Nonlinear congruential pseudorandom number generators. Master's thesis, University of Salzburg, 1994.
[114] B.A. Wichmann and I.D. Hill. An efficient and portable pseudo-random number generator. Appl. Statist., 31:188–190, 1982.
[115] B.A. Wichmann and I.D. Hill. Building a random-number generator. Byte, 12:127–128, 1987.
[116] S.K. Zaremba. The mathematical basis of Monte Carlo and quasi-Monte Carlo methods. SIAM Rev., 10:303–314, 1968.
[117] P. Zinterhof. Über einige Abschätzungen bei der Approximation von Funktionen mit Gleichverteilungsmethoden. Sitzungsber. Österr. Akad. Wiss. Math.-Natur. Kl. II, 185:121–132, 1976.
Curriculum vitae

Name: Hannes Leeb
Date of birth: September 4, 1968
Place of birth: Klagenfurt, Austria
Parents: Peter and Barbara Leeb

Education:
1979: Volksschule in Ebene Reichenau
1987: Matura, Gymnasium BRG in Salzburg