Efficient generation of random vectors by using the ratio-of-uniforms method with ellipsoidal envelopes

C. J. Pérez (1), J. Martín (1), C. Rojano (2), and F. J. Girón (2)

(1) Departamento de Matemáticas, Universidad de Extremadura, Avda. de la Universidad s/n, Cáceres, Spain, 10071
(2) Departamento de Estadística e Investigación Operativa, Universidad de Málaga, Campus de Teatinos s/n, Málaga, Spain, 29071

Abstract: Stochastic simulation is widely used to validate procedures and provide guidance for both theoretical and practical problems. Random variate generation is the basis of stochastic simulation. Applying the ratio-of-uniforms method to generate random vectors requires the ability to generate points uniformly in a suitable region of the space. Starting from the observation that, for many multivariate distributions, the multidimensional objective region can be covered by a hyper-ellipsoid more tightly than by a hyper-rectangle, a new algorithm to generate from multivariate distributions is proposed. Due to the computational saving it can produce, this method becomes an appealing statistical tool to generate random vectors from families of standard and nonstandard multivariate distributions. It is particularly interesting for generating from densities known up to a multiplicative constant, for example, from those arising in Bayesian computation. The proposed method is applied and its efficiency is shown for some classes of distributions.

Key Words: Acceptance-rejection, Ratio-of-uniforms, Stochastic simulation.

1 Introduction

The role of simulation in all sciences has increased in importance during the past several years. Today, truly complex models often can only be handled computationally by simulation-based techniques. In particular, simulation has become an essential tool in Bayesian computation. At the core of a simulation method is random variate generation. For this reason, many generation techniques have been developed. Some references on random variate generators are Devroye (1986), Ripley (1987) and Gentle (1998). Markov Chain Monte Carlo (MCMC) methods are also procedures to generate samples from multivariate distributions (see, e.g., Gilks et al. (1998)). These methods have revolutionized Bayesian statistics. However, a critical issue is detecting convergence. Exact simulation (in the sense that samples are generated from the true distribution and not from an approximation of it) is preferred when it is possible and efficient. Nevertheless, MCMC methods are sometimes the only available alternative. Simulation programs usually provide a collection of random variate generators for many standard distributions. However, the situation changes considerably when the user is interested in nonstandard distributions. The current research trend is to develop algorithms that are valid for large families of distributions. The programs based on these algorithms are called universal (also automatic or black-box) generators. In the last decade, some universal algorithms have been developed (see Hörmann et al. (2004) for a monograph on this topic).


A general exact variate generation technique is the ratio-of-uniforms method, first proposed by Kinderman and Monahan (1977). The idea behind this method is very simple: each component of the variate can be obtained by calculating the ratio of coordinates of a point uniformly generated in a suitable region of the space. This method is very appealing, since it produces simple algorithms. It is particularly interesting in Bayesian inference because most Bayesian computation is focused on the calculation of posterior expectations. By using this method, it is not necessary to calculate the normalization constant. The main problem in the application of the ratio-of-uniforms method is the uniform generation in the objective region. When an arbitrary bounded region is obtained, acceptance-rejection methods must be used. In this case, the region is enclosed within a bounding hyper-rectangle. In multidimensional problems, low acceptance rates are often obtained, so this method becomes computationally inefficient. This is the reason why the ratio-of-uniforms method is not used as often as desirable in multivariate settings. This paper proposes a new efficient algorithm for sampling from multivariate distributions based on the ratio-of-uniforms method. The outline of the paper is as follows. In Section 2, some background on the ratio-of-uniforms method is presented. The generation method is proposed in Section 3. Section 4 discusses some computational issues of the proposed approach. In Section 5, some illustrative examples showing significant increases in efficiency in terms of acceptance rates are presented. Finally, the main conclusions are drawn in Section 6.

2 Ratio-of-uniforms method

As mentioned before, the original version was proposed by Kinderman and Monahan (1977). Later, several improvements were proposed. Barbu (1982) presented two new methods based on transformations of uniform random numbers. Vaduva et al. (1983) and Vaduva (1984) generalized them to the multivariate case. Stefanescu and Vaduva (1987) presented a class of general algorithms containing, as particular cases, the results given in Kinderman and Monahan (1977) and Vaduva (1984). Wakefield et al. (1991) introduced a generalized ratio-of-uniforms method and showed that relocating the required density via the mode can greatly improve the computational efficiency of the method. Finally, see Barabesi (1993) for an extensive review and Jones and Lunn (1996) for an interesting contribution with a pedagogical aim. Without loss of generality, the following version from Wakefield et al. (1991) will be used in this paper.

Theorem 1 Suppose h is a positive integrable function over V, a subset of R^k. Suppose further that the vector X = (X_1, X_2, ..., X_{k+1}) is uniformly distributed over

    C_{h,r} = \{(x_1, x_2, \ldots, x_{k+1}) : 0 < x_1 \le [h(x_2/x_1^r, \ldots, x_{k+1}/x_1^r)]^{1/(rk+1)}\},

where r \ge 0. Then V = (V_1, V_2, \ldots, V_k), where V_i = X_{i+1}/X_1^r, has density f = h / \int_V h.

The original version is recovered when k = r = 1. In this paper, the objective region C_{h,r} will be referred to as C_h, because r will be a fixed quantity (usually r = 1 or r = 1/2) and the generated samples do not depend on it. For many distributions, the region C_h is bounded and convex. Properties of C_h can be found in Hörmann et al. (2004). The basic requirement for applying this method is the ability to generate points uniformly in C_h. Except for particular regions (spheres, rectangles, simplices, ...), uniform generation is not straightforward. When the region is bounded, a general technique consists of using the minimal bounding hyper-rectangle containing C_h (see Wakefield et al. (1991) for conditions and formulae to calculate the hyper-rectangle). Then, the
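For concreteness, the case k = r = 1 of Theorem 1 can be sketched with a standard normal target h(v) = exp(-v^2/2). The rectangle bounds 0 < x_1 \le 1 and |x_2| \le \sqrt{2/e} follow from the classical Kinderman-Monahan construction; this is an illustration under those assumptions, not code from the paper.

```python
import math
import random

def ratio_of_uniforms_normal(n, seed=0):
    """Sample n draws from N(0, 1) with the basic ratio-of-uniforms
    method (k = r = 1): draw (x1, x2) uniformly in the bounding
    rectangle, accept whenever 0 < x1 <= sqrt(h(x2/x1)), return x2/x1."""
    rng = random.Random(seed)
    h = lambda v: math.exp(-0.5 * v * v)   # density known up to a constant
    b = math.sqrt(2.0 / math.e)            # |x2| <= sqrt(2/e) bounds Ch
    out = []
    while len(out) < n:
        x1 = rng.uniform(0.0, 1.0)         # 0 < x1 <= sup sqrt(h) = 1
        x2 = rng.uniform(-b, b)
        if x1 > 0.0 and x1 * x1 <= h(x2 / x1):
            out.append(x2 / x1)
    return out
```

Each accepted pair lies uniformly in C_h, so the ratio x_2/x_1 is an exact draw from the normalized density.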


traditional acceptance-rejection technique can be used to generate points uniformly in C_h. The remaining problem is that the acceptance rate decreases exponentially as the dimension rises. The following example is very illustrative in this sense. Let S_k be the unit sphere in dimension k. The minimal bounding hyper-rectangle containing S_k is [-1, 1]^k, so the ratio of the volume of the unit sphere to the volume of the rectangle is given by:

    \frac{\pi^{k/2}}{\Gamma(k/2 + 1)\, 2^k},

and, therefore, the volume of S_k is a vanishingly small proportion of the volume of [-1, 1]^k as the dimension k rises. This fact suggests that tighter-fitting enclosing regions should be used in order to increase the acceptance rates. Possible choices are unions of non-overlapping subregions (Kinderman and Monahan (1977)), parallelograms (Cheng and Feast (1979)) and polygons (Dagpunar (1988)), among others. Leydold (2000) presented an algorithm that uses polygonal envelopes and squeezes. The usefulness of this technique is its capacity to construct envelopes and squeezes automatically for large classes of univariate distributions, including all log-concave distributions, resulting in very fast algorithms. This method is, in some sense, equivalent to the transformed density rejection technique. The next section proposes an algorithm that uses a hyper-ellipsoid as the enclosing region.
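The collapse of this volume ratio can be computed directly; a quick stdlib sketch (illustrative, not from the paper):

```python
import math

def sphere_to_cube_ratio(k):
    """Volume of the unit k-sphere divided by the volume of [-1, 1]^k:
    pi^(k/2) / (Gamma(k/2 + 1) * 2^k)."""
    return math.pi ** (k / 2) / (math.gamma(k / 2 + 1) * 2 ** k)

ratios = [sphere_to_cube_ratio(k) for k in range(1, 11)]
# The ratio (i.e. the acceptance rate) collapses as k grows:
# about 0.524 for k = 3, but already below 0.0025 for k = 10.
```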

3 Random vector generation

Sometimes, multidimensional regions C_h coming from the application of the ratio-of-uniforms method can be included within a hyper-ellipsoid with a better acceptance rate than within a hyper-rectangle. This fact motivates a strategy that allows for improvements in terms of acceptance rates. Assume that the (k+1)-dimensional region C_h is bounded (sufficient conditions can be found in Wakefield et al. (1991)). The acceptance-rejection technique will be used to generate points uniformly in C_h. Firstly, points will be generated uniformly in a hyper-ellipsoid¹ and then the points lying in the objective region C_h will be stored. In order to generate uniformly in ellipsoids, Pearson type II distributions are used. These distributions are easy to generate from, as is shown in Johnson (1987). The p-dimensional Pearson type II distribution is defined in Kotz (1975). It is denoted by PTII_p(µ, Σ, s), and its density function is:

    g(x; \mu, \Sigma, s) = \frac{\Gamma(p/2 + s + 1)}{\Gamma(s + 1)\, \pi^{p/2}}\, |\Sigma|^{-1/2} \left[1 - (x - \mu)' \Sigma^{-1} (x - \mu)\right]^s,

having as support the region defined by (x - µ)' Σ^{-1} (x - µ) ≤ 1 and shape parameter s > -1. The parameters µ and Σ are location and scale parameters, respectively. The expectation and covariance matrix (see, e.g., Johnson (1987)) are given by:

    E[X] = \mu,   (1)

    Cov[X] = \frac{\Sigma}{2s + p + 2} = \Lambda.   (2)

When choosing s = 0, a uniform distribution over the ellipsoid E_p(µ, Σ) = {x ∈ R^p : (x - µ)' Σ^{-1} (x - µ) ≤ 1} is obtained.

¹ From now on, the hyper-ellipsoid and the hyper-rectangle will be referred to as ellipsoid and rectangle, respectively.
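Generating from PTII_p(µ, Σ, 0) reduces to sampling uniformly in the unit ball (a random direction with radius density p r^{p-1}) and mapping through a Cholesky factor of Σ. A minimal stdlib-only sketch under that standard construction, not the paper's code:

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def uniform_in_ellipsoid(mu, sigma, rng):
    """One draw from PTII_p(mu, sigma, s = 0), i.e. uniform on the
    ellipsoid (x - mu)' sigma^{-1} (x - mu) <= 1."""
    p = len(mu)
    y = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in y))
    r = rng.random() ** (1.0 / p)          # radius: P(R <= r) = r^p
    d = [r * v / norm for v in y]          # uniform point in the unit ball
    L = cholesky(sigma)
    return [mu[i] + sum(L[i][k] * d[k] for k in range(p)) for i in range(p)]
```

Since (Ld)' Σ^{-1} (Ld) = d'd ≤ 1, every draw lands inside the ellipsoid.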


A (k+1)-dimensional Pearson type II distribution, with s = 0 and suitable estimations µ̂ and Σ̂ such that the ellipsoid E_{k+1}(µ̂, Σ̂) covers the region C_h, is considered. The ellipsoid must also have minimum volume in order to achieve a high acceptance rate. The argument for building the ellipsoid is as follows. For arbitrary m > 1, µ and Λ, set:

    E_{k+1}(\mu, m\Lambda) = \{x \in R^{k+1} : (x - \mu)' (m\Lambda)^{-1} (x - \mu) \le 1\} = \{x \in R^{k+1} : (x - \mu)' \Lambda^{-1} (x - \mu) \le m\}.

Now, for each x ∈ C_h, it is required that x ∈ E_{k+1}(µ, mΛ). So m is chosen such that:

    m \ge \max_{x \in C_h} (x - \mu)' \Lambda^{-1} (x - \mu).   (3)

The smallest ellipsoid that satisfies (3) is obtained when:

    m = \max_{x \in C_h} (x - \mu)' \Lambda^{-1} (x - \mu).   (4)

Note from equation (2) (Σ = mΛ, s = 0, p = k + 1) that m ≥ 3 + k. In fact, the closer m (satisfying inequality (3)) is to 3 + k, the greater the acceptance probability is. The theoretical acceptance probability is the ratio of the volume of the objective region to the volume of the ellipsoid, i.e.:

    p_e = \frac{\Gamma[(k+3)/2] \int_V h(v)\, dv}{(rk + 1)(\pi m)^{(k+1)/2} \sqrt{|\Lambda|}}.   (5)

In order to solve (4), µ and Λ are replaced by µ̂ and Λ̂, with some (slight) loss in efficiency. Then, m is obtained so that (3) is guaranteed to hold with µ̂ and Λ̂ in place of µ and Λ. Note that this does not compromise the exactness of the algorithm. In order to set µ̂ and Λ̂, it is proposed to generate a pilot random sample uniformly in the objective region. This can be done by using the traditional acceptance-rejection method, i.e., enclosing the objective region within a bounding (k+1)-dimensional rectangle. Note that this step may be computationally costly. However, only a sample of small size is necessary. This sample will not only be used in the estimation of the parameters but also in the generation of samples from the density of interest. The proposed estimators for a practical implementation are:

    \hat\mu = \bar{x},

    \hat\Lambda = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})',

leading to the ellipsoid of interest E_{k+1}(µ̂, Σ̂), where Σ̂ = mΛ̂.

The acceptance probability given in (5) can be estimated by dividing the number of points generated in the objective region C_h by the total number of points generated in the ellipsoid E_{k+1}(µ̂, Σ̂). The following algorithm is recommended when the acceptance rate of the traditional acceptance-rejection technique is low and a pilot sample of small size can be generated in C_h. This is usual for many multivariate distributions, especially in low and moderate dimensions (some illustrative examples are shown in Section 5). The inputs of the algorithm are a density h known up to a normalization constant and the value of r, whereas the output is a random sample from f = h / \int h.

Algorithm 1 Follow the next steps:

1. Generate a pilot uniform random sample in C_h by traditional acceptance-rejection.

2. Calculate the estimations µ̂ and Λ̂.


3. Compute m and set Σ̂ = mΛ̂.

4. Repeat steps 5 and 6 until the necessary sample size has been obtained.

5. Generate x ∼ PTII_{k+1}(µ̂, Σ̂, 0).

6. (a) If x ∈ C_h, then x is a random vector uniformly generated in C_h.
   (b) Otherwise, return to step 5.

7. Take (x_2/x_1^r, x_3/x_1^r, ..., x_{k+1}/x_1^r).

The next section discusses some computational issues related to the optimization problem and other aspects of the proposed approach.
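The whole pipeline can be sketched end-to-end for a toy one-dimensional target (k = r = 1, h(v) = exp(-v^2/2)). One caveat: the paper computes m by interval-based global optimization, whereas this sketch only approximates it by a grid search over the bounding rectangle, so it is a heuristic illustration under those assumptions, not a guaranteed implementation.

```python
import math
import random

rng = random.Random(1)
h = lambda v: math.exp(-0.5 * v * v)
in_Ch = lambda x1, x2: 0.0 < x1 and x1 * x1 <= h(x2 / x1)
b1, b2 = 1.0, math.sqrt(2.0 / math.e)          # bounding rectangle of Ch

# Step 1: pilot uniform sample in Ch by traditional acceptance-rejection.
pilot = []
while len(pilot) < 500:
    x1, x2 = rng.uniform(0.0, b1), rng.uniform(-b2, b2)
    if in_Ch(x1, x2):
        pilot.append((x1, x2))

# Step 2: estimate mu and Lambda from the pilot sample.
n = len(pilot)
mu = [sum(p[i] for p in pilot) / n for i in range(2)]
lam = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in pilot) / n
        for j in range(2)] for i in range(2)]
det = lam[0][0] * lam[1][1] - lam[0][1] * lam[1][0]
inv = [[lam[1][1] / det, -lam[0][1] / det],
       [-lam[1][0] / det, lam[0][0] / det]]
quad = lambda u, v: (inv[0][0] * (u - mu[0]) ** 2
                     + 2.0 * inv[0][1] * (u - mu[0]) * (v - mu[1])
                     + inv[1][1] * (v - mu[1]) ** 2)

# Step 3: approximate m = max of the quadratic form over Ch (grid search,
# a stand-in for the paper's guaranteed optimizer), round it up in the
# second decimal place, and set Sigma = m * Lambda.
m = max(quad(i / 300 * b1, j / 300 * b2)
        for i in range(1, 301) for j in range(-300, 301)
        if in_Ch(i / 300 * b1, j / 300 * b2))
m = math.ceil(m * 100) / 100
sigma = [[m * lam[i][j] for j in range(2)] for i in range(2)]

# Cholesky factor of the 2x2 matrix Sigma.
l11 = math.sqrt(sigma[0][0])
l21 = sigma[1][0] / l11
l22 = math.sqrt(sigma[1][1] - l21 * l21)

# Steps 4-7: draw uniformly in the ellipsoid (Pearson type II, s = 0),
# keep the points falling in Ch, and return the ratios x2 / x1.
draws = []
while len(draws) < 2000:
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    norm = math.hypot(g1, g2)
    r = math.sqrt(rng.random())                # uniform radius in a disc
    d1, d2 = r * g1 / norm, r * g2 / norm
    x1, x2 = mu[0] + l11 * d1, mu[1] + l21 * d1 + l22 * d2
    if in_Ch(x1, x2):
        draws.append(x2 / x1)
```

The accepted pairs are uniform in C_h, so the stored ratios are (approximately, given the grid-based m) draws from the standard normal target.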

4 Computational issues

Each variate generator has a certain fixed problem-specific computational setup cost and a certain marginal cost for the generation of new draws. In this approach, the setup cost is related to the computation of the ellipsoid. It is the sum of two partial costs: the uniform generation of the presample (pilot sample) and the calculation of the solution of the constrained global optimization problem. The extra setup cost with respect to the traditional ratio-of-uniforms method comes from solving the optimization problem, since the presample is obtained by running the traditional acceptance-rejection technique on the rectangle and can be used to obtain samples from the density of interest. Therefore, the generation of the presample cannot be considered an extra computational effort in relation to the traditional method. With respect to the optimization problem, it is remarkable that it has many characteristics that make it easy to solve. The optimization problem is given by:

    \max_x\ (x - \hat\mu)' \hat\Lambda^{-1} (x - \hat\mu)

    s.t.\ x_1^{rk+1} - h(x_2/x_1^r, x_3/x_1^r, \ldots, x_{k+1}/x_1^r) \le 0,   (6)

    0 < x_1 \le b_1, \quad a_i \le x_i \le b_i, \ i = 2, \ldots, k+1,   (7)

where the bounds in (7) are the ones previously obtained for the minimal bounding rectangle. Note that the information provided by the bounds in (7) is redundant, since a point satisfying the constraint given in (6) also satisfies the bounds given in (7). Nevertheless, this information is useful for solving the problem with several optimization techniques, including the one recommended here. Note also that the maximization can similarly be considered on ∂C_h. Then, the constraint (6) must be substituted by:

    x_1^{rk+1} - h(x_2/x_1^r, x_3/x_1^r, \ldots, x_{k+1}/x_1^r) = 0.   (8)

This optimization problem consists of a continuous (infinitely differentiable) objective function constrained to a compact region determined by (6). This fact ensures the existence of at least one maximum point. Moreover, the maximum point(s) is (are) achieved at the boundary given by (8). Sometimes, the resulting problem can be solved analytically in an exact way. This could be practical for some low-dimensional problems, but it is not the usual case. Several constrained global optimization techniques can be applied to solve this problem. However, if an appropriate optimization technique is not chosen, it might happen that m is underestimated, leading to an ellipsoid which might not completely contain the objective region. This problem is overcome by imposing some regularity conditions on the density h known up to a proportionality constant, so that an efficient implementation of the interval-arithmetic global optimization method can be derived.


Until fairly recently, it was thought that no numerical algorithm could guarantee to have found the global solution of a general nonlinear optimization problem. This is probably true for algorithms using standard arithmetic, but not for algorithms based on interval arithmetic. Several authors independently had the idea of bounding the errors by computing with intervals. However, interval analysis can be considered to have begun with the work of Moore (1966), who transformed this simple idea into a viable tool for error analysis. Interval analysis uses interval arithmetic to represent real numbers rather than floating point arithmetic. An interval is defined by a lower and an upper bound and semantically means the range of all real numbers enclosed by these bounds. By using interval arithmetic, the optimization problems can be solved with a guarantee that the computed bounds on the location and value of a solution are numerically correct. If there are multiple solutions, all will be found and correctly bounded. It is also guaranteed that the solution(s) is (are) global and not just local (see, e.g., Ratschek and Rokne (1988) and Hansen and Walster (2004)). Interval methods for global optimization exist that do not require differentiability (see Ratschek and Rokne (1988) or Moore et al. (1992)). However, these methods are much slower than those that take advantage of it. In order to speed up the computation considerably, it must be assumed that the objective function is twice continuously differentiable and the constraint function is continuously differentiable (see Hansen and Walster (2004) for theoretical results). In the optimization problem addressed here, the objective function is always twice continuously differentiable, since it is a quadratic function, whereas the density known up to a proportionality constant h must be assumed to be continuously differentiable. 
This is the condition that must be required of h so that the constraint function is continuously differentiable and the optimization problem can be efficiently solved. Note that this condition is not very restrictive because it is satisfied by many multivariate densities (Cauchy, Normal, t, ...). Then, if it holds, a possible algorithm consists of dividing the region successively into subregions and estimating the lower and upper bounds of the objective function on each subregion. By discarding subregions where the global solution cannot exist and applying the interval Newton's method to solve the Lagrange equations, the global solution with a rigorous error bound is always found. The interval form of Newton's method is used to achieve rapid convergence. Besides the previous implementation, there exists specialized global optimization software that uses interval analysis to obtain proven global solutions. An interesting example is the Branch And Reduce Optimization Navigator (BARON), which can handle nonlinear optimization problems with up to thousands of constraints and variables (see Tawarmalani and Sahinidis (2004) and Sahinidis (2005)). This software implements a global optimization algorithm of the branch-and-bound type, combined with interval analysis, in a framework containing convergence acceleration techniques. BARON derives its name from its combining interval analysis and duality in its "reduce" arsenal with enhanced "branch and bound" concepts, as it winds its way through the hills and valleys of complex optimization problems in search of global solutions. Note also that it is supported by a deterministic algorithm and not by a stochastic one. The optimization problem can be solved through an easy-to-use GAMS/AMPL-like interface. For more information about BARON, see Sahinidis (2005) and references therein.
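The branch-and-bound idea behind these tools can be illustrated with a toy interval class. This is only a sketch: real interval software rounds each bound outward in floating point, which is omitted here. The example brackets the maximum of f(x) = x(1 - x) on [0, 1] (true value 1/4) between a guaranteed upper bound from interval extensions on sub-boxes and an attained lower bound from point evaluations.

```python
class Interval:
    """Minimal interval arithmetic: every operation returns an interval
    that encloses all possible real results (no outward rounding)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(ps), max(ps))

def bound_max(f, lo, hi, pieces):
    """Two-sided bracket for max f over [lo, hi]: the interval extension
    on each sub-box gives an upper bound that the maximum cannot exceed,
    while evaluating f at midpoints gives an attained lower bound."""
    w = (hi - lo) / pieces
    upper = max(f(Interval(lo + i * w, lo + (i + 1) * w)).hi
                for i in range(pieces))
    mid = lambda i: lo + (i + 0.5) * w
    lower = max(f(Interval(mid(i), mid(i))).lo for i in range(pieces))
    return lower, upper

f = lambda x: x * (Interval(1.0, 1.0) - x)   # f(x) = x (1 - x), max 1/4
lower, upper = bound_max(f, 0.0, 1.0, 64)    # lower <= 0.25 <= upper
```

Refining the subdivision shrinks the bracket around the true maximum; this certified-upper-bound property is exactly what prevents m from being underestimated.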
In the case of the optimization problem addressed here, the nature of the objective function and the condition imposed on h facilitate finding the global solution with an affordable cost. Moreover, our empirical experience in the optimization problems arising from the application of the ratio-of-uniforms method is that the global solution is generally found with many other solvers that do not use interval arithmetic. However, due to the characteristics of the problem, it is recommended to use interval-based optimization techniques to ensure that a bound for the global maximum value can be achieved with


certainty. In practice, the user can apply interval-based optimization techniques to find m with a certain decimal accuracy. Then, m is rounded up in the last accurate decimal place to provide an optimal rounded value denoted by m*. It is recommended to use two-decimal-place accuracy. Note that m is always greater than 3 + k, so the change produced by rounding up in the second decimal place does not represent a significant part of the value of m and, more importantly, has a minimal effect on the acceptance rate. Obtaining more than two accurate decimal places in the optimization problem leads to a computational effort that gives almost no significant improvement in terms of acceptance rates. This setup cost can be tolerated when a large number of draws from the same distribution is needed. For example, when drawing from a conditional distribution in a Gibbs sampler, only one or at most a small number of draws will be needed, so the method is not recommended in that setting. However, it can be applied to generate observations from a candidate density in the Metropolis-Hastings independence sampler. The computational saving produced when generating new draws is the main advantage of the proposed approach. The marginal cost consists of generating a point uniformly in the ellipsoid and examining whether it is included in the objective region. The process is analogous for the rectangle in the traditional approach. Most of the computation time comes from examining the inclusion of the point in the objective region. Therefore, the marginal cost can be considerably reduced by using the proposed approach. The proposed method can be combined with other techniques to improve the procedure in terms of acceptance rates, as in the case of relocating the density via the mode. Also, the execution time can be diminished by using specific properties of distribution families, as in the case of densities leading to convex objective regions.
In that case, the convex hull of the points of the pilot sample is fully included in the objective region. This fact can be used to simplify and speed up checking whether a point is inside the objective region. Note that for every log-concave density and r = 1, the objective region is convex (see Hörmann et al. (2004)). Up to now, an analogous (simple) condition for the convexity of the objective region with r ≠ 1 is not known. This approach can lead to computationally efficient programs, valid for diverse families of multivariate distributions. Empirical tests on a wide range of standard multivariate distributions (e.g., Cauchy, Normal, t, ...) and nonstandard ones (e.g., those arising in the Bayesian context) show that an important computational saving is obtained. The following section illustrates the application of the method to some classes of distributions.

5 Illustrative examples

The proposed method is applied to some examples that have been obtained from the literature on stochastic simulation. The objective of this section is to illustrate how the method is applied and to show the improvements obtained with respect to the traditional approach. The first example was studied by Chen et al. (2000) to sample a correlation. The one-dimensional distribution involved leads to an ellipsoid-like region that is graphically presented for illustration purposes. The second example was studied by Wakefield et al. (1991) to extend and improve the basic ratio-of-uniforms method. The distributions involved are bidimensional and serve to show how the proposed approach can be applied in combination with other techniques, such as relocating the density via the mode. Finally, the third example was considered by Evans and Swartz (1995) to approximate 10-dimensional integrals that arise in the Bayesian analysis of a linear statistical model. This example, which involves a multivariate posterior distribution, is used to show how the proposed approach can be applied in Bayesian contexts with dimensions higher than 1 or 2.


Example 1. Sampling a correlation ρ. Assume that D = {y_i = (y_{1i}, y_{2i})', i = 1, 2, ..., n} is a random sample from a bivariate normal distribution N_2(0, ∆), where

    \Delta = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.

Assuming a uniform prior U(-1, 1) for ρ, the posterior density of ρ is given by:

    p(\rho \mid D) \propto (1 - \rho^2)^{-n/2} \exp\left\{ -\frac{1}{2(1 - \rho^2)} (S_{11} - 2\rho S_{12} + S_{22}) \right\},   (9)

where -1 < ρ < 1 and S_{rs} = \sum_{i=1}^{n} y_{ri} y_{si} for r, s = 1, 2.

Generating from (9) is not trivial. Chen et al. (2000) studied this problem and considered a Metropolis-Hastings algorithm with a "de-constraint" transformation to sample ρ. The values S11 = 106.538, S12 = 78.043, and S22 = 108.291 are obtained from n = 100 observations. In order to generate from p(ρ|D), knowledge of the normalization constant is not required. Without loss of generality, the second member of (9) is multiplied by 10^{32} (in order to avoid a very low order of magnitude) and, therefore, the function h proportional to the density is:

    h(\rho) = 10^{32} (1 - \rho^2)^{-50} \exp\left\{ \frac{78.043\, \rho - 107.415}{1 - \rho^2} \right\}.

Note that h(ρ) is continuously differentiable for -1 < ρ < 1. Therefore, the constraint function in the optimization problem is continuously differentiable. The region C_h (with r = 1) is included in the rectangle I = [0, 6.973] × [-4.375 · 10^{-10}, 4.969]. The acceptance rate obtained by the traditional acceptance-rejection method is p_a = 0.0711. This is a very low value. However, a pilot sample of small size can be generated. A pilot sample of size 500 is generated and the following estimations are obtained:

    \hat\mu = \begin{pmatrix} 3.804 \\ 2.678 \end{pmatrix}, \qquad \hat\Lambda = \begin{pmatrix} 2.853 & 2.037 \\ 2.037 & 1.468 \end{pmatrix}.
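The constants in h(ρ) can be verified numerically (an illustration, not the paper's code): with S11 = 106.538, S12 = 78.043 and S22 = 108.291, the exponent -(S11 - 2ρS12 + S22)/(2(1 - ρ^2)) simplifies to (78.043 ρ - 107.4145)/(1 - ρ^2), which the paper rounds to 107.415.

```python
import math

S11, S12, S22 = 106.538, 78.043, 108.291

def exponent_direct(rho):
    # Exponent of the posterior (9), up to the factor (1 - rho^2)^{-n/2}.
    return -(S11 - 2.0 * rho * S12 + S22) / (2.0 * (1.0 - rho ** 2))

def exponent_simplified(rho):
    # The same exponent written with the collapsed constants.
    return (S12 * rho - (S11 + S22) / 2.0) / (1.0 - rho ** 2)

def h(rho):
    """The unnormalized posterior of Example 1 (times 10^32)."""
    return 1e32 * (1.0 - rho ** 2) ** -50 * math.exp(exponent_simplified(rho))
```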

It is obtained that the optimal value with two accurate decimal places is m = 5.37. Then, by using m* = 5.38, the estimation of Σ̂ is:

    \hat\Sigma = \begin{pmatrix} 15.350 & 10.958 \\ 10.958 & 7.896 \end{pmatrix}.

The ellipsoid constructed by using µ̂ and Σ̂ fully contains the objective region. The shaded region in Figure 1 represents C_h. The boundaries of I and E_2(µ̂, Σ̂) are also drawn. Now, the new acceptance rate is:

    p_e = \frac{\mathrm{Vol}(C_h)}{\mathrm{Vol}(E_2(\hat\mu, \hat\Sigma))} = 0.7396,

which greatly improves the one obtained by the traditional acceptance-rejection method, i.e., p_a = 0.0711. Now, the acceptance probabilities obtained by using relocation to the mode, as in Wakefield et al. (1991), are presented. The acceptance probability for the proposed approach goes from 0.7396 (without relocation) to 0.7664 (with relocation). Since the objective region itself has an ellipsoidal shape, relocation is not very helpful. The opposite happens for the traditional approach: the acceptance probability goes from 0.0711


Figure 1: Region C_h (shaded) and boundaries of I and E_2(µ̂, Σ̂).

(without relocation) to 0.7317 (with relocation). Comparing both probabilities with relocation, it is seen that the improvement is not significant. The next example shows how the proposed approach obtains significant improvements by using relocation.

Example 2. Bivariate distributions. Suppose h(v1, v2) is of bivariate normal × bivariate Student form. Such forms arise as prior × likelihood specifications in certain Bayesian models. In particular, suppose the bivariate normal density is proportional to the function

    h_1(v_1, v_2) = \exp\left\{ -\frac{1}{2(1 - \rho_1^2)} \left[ (y_1 - v_1)^2 - 2\rho_1 (y_1 - v_1)(y_2 - v_2) + (y_2 - v_2)^2 \right] \right\},

and the bivariate Student distribution has two degrees of freedom and is proportional to the function

    h_2(v_1, v_2) = \left( 1 + \frac{1}{2(1 - \rho_2^2)} (v_1^2 - 2\rho_2 v_1 v_2 + v_2^2) \right)^{-2},

where y1, y2, ρ1 and ρ2 are constants and h = h1 h2. Note that h is continuously differentiable. Therefore, the constraint function in the optimization problem is also continuously differentiable. Let p_a denote the acceptance rate obtained by Wakefield et al. (1991) from a simulation of size 125000 using r = 1/2 in Theorem 1. Let p_e denote the acceptance rate obtained when generating exactly 30000 values uniformly in the objective region by using the proposed method. For each case, a pilot sample of size 1000 has been generated and the optimal rounded-up value m* has been obtained. Table 1 shows the acceptance rates for some cases.

              ρ1 = 0, ρ2 = 0     ρ1 = 0.5, ρ2 = 0.5     ρ1 = 0.9, ρ2 = 0.9
  y1    y2     pa       pe         pa        pe            pa        pe
   0     0    0.522    0.676      0.452     0.724         0.227     0.658
   1     1    0.509    0.631      0.432     0.651         0.216     0.706
   1    −1    0.509    0.683      0.465     0.629         0.254     0.519
   5    −5    0.126    0.223      0.116     0.095         0.062     0.035

Table 1: Acceptance rates.

Except for two cases, the proposed method yields better acceptance rates. The reason is that a significant part of the ellipsoid volume is placed into the region {(x1, x2, x3) : x1 ≤ 0}, which is not feasible. A possible solution is the relocation of the mode. All the remaining 10 cases show a significant increase in terms of acceptance rates. Let p_{ca} and p_{ce} be the acceptance rates with relocation obtained by Wakefield et al. (1991) and the proposed method, respectively. The results are shown in Table 2.

              ρ1 = 0, ρ2 = 0     ρ1 = 0.5, ρ2 = 0.5     ρ1 = 0.9, ρ2 = 0.9
  y1    y2    pca      pce        pca       pce           pca       pce
   0     0    0.522    0.676      0.452     0.724         0.227     0.658
   1     1    0.520    0.636      0.441     0.708         0.220     0.664
   1    −1    0.520    0.705      0.478     0.653         0.277     0.709
   5    −5    0.533    0.763      0.429     0.767         0.196     0.732

Table 2: Acceptance rates with relocation.

Table 2 shows that the proposed method with relocation to the mode gives higher acceptance rates than the approach in Wakefield et al. (1991), which also uses relocation. The increase in acceptance rate is particularly important for the cases with ρ1 = ρ2 = 0.9. The acceptance rates for these cases range between 0.196 and 0.277 by using the traditional method, whereas they range between 0.658 and 0.732 by using the proposed method.

Example 3. Bayesian linear model. Suppose that data (X, y) are observed, where X ∈ R^{45×9} has x_{ij} = 1 for 5(j − 1) + 1 ≤ i ≤ 5j and x_{ij} = 0 otherwise, and y ∈ R^{45}. The statistical model is then specified by y = Xβ + σz, where β ∈ R^9, σ ∈ (0, +∞) and the error z ∈ R^{45} is a sample of size 45 from a distribution in the family S = {Student*(ν) : ν ∈ (2, +∞)}, where Student*(ν) denotes the Student distribution with ν degrees of freedom standardized to have variance 1, i.e. the density is given by

    g_\nu(z) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \Gamma\left(\frac{1}{2}\right) \sqrt{\nu - 2}} \left( 1 + \frac{z^2}{\nu - 2} \right)^{-(\nu+1)/2}

for −∞ < z < +∞. Therefore, there are nine location parameters β_i, 1 ≤ i ≤ 9, a scale parameter σ and a shape parameter ν. In this example, Evans and Swartz (1995) assumed that ν was known and made the assignment ν = 3. The restriction on the degrees of freedom of the Student distribution ensures that all response variables y_i have mean and variance. This model corresponds to five independent observations


from each of the nine distributions specified by (βi , σ) for i = 1, 2, . . . , 9. The j-th observation from the i-th distribution is denoted by yij . Choosing the improper prior π(β, σ) = π(β, σ) ∝ 1/σ on (β, σ), the posterior distribution for the 10-dimensional parameter (β, σ) is proportional to ( ) 5 9 ∏ ∏ 1 yij − β1 xi1 − · · · − β9 xi9 g3 . σ σ i=1 j=1 This is a reasonably complicated function to generate from. It makes sense to make a transformation from σ to log σ so that the variable in the integration is unconstrained in R. Denoting θ = (θ1 , . . . , θ9 , θ10 ) = (β1 , . . . , β9 , log σ), the posterior density is proportional to f (θ|y) ∝ h(θ) = exp(−45 θ10 )

9 ∏ 5 ∏ i=1 j=1

( g3

yij − θi exp(θ10 )

) ,
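As a minimal sketch of this setup, the standardized Student density g_ν and the unnormalized log-posterior log h(θ) can be coded as follows. Working on the log scale avoids underflow, since h takes very small values over most of the space. The array y below stands for a hypothetical 9×5 array of observations y_ij, not the values used by Evans and Swartz (1995); the function names are illustrative.

```python
import math

def g_std_student(z, nu):
    """Student density with nu > 2 degrees of freedom, standardized to variance 1."""
    c = math.gamma((nu + 1) / 2) / (
        math.gamma(nu / 2) * math.gamma(0.5) * math.sqrt(nu - 2))
    return c * (1.0 + z * z / (nu - 2)) ** (-(nu + 1) / 2)

def log_h(theta, y):
    """Unnormalized log-posterior log h(theta) for the Bayesian linear model,
    with theta = (beta_1, ..., beta_9, log sigma) and y a 9x5 array y[i][j]."""
    log_sigma = theta[9]
    s = -45.0 * log_sigma  # the exp(-45 * theta_10) factor
    for i in range(9):
        for j in range(5):
            z = (y[i][j] - theta[i]) * math.exp(-log_sigma)
            s += math.log(g_std_student(z, 3))
    return s
```

Since h is only known up to a multiplicative constant, any constant added to log h is irrelevant for acceptance-rejection purposes.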

and the objective is to approximate integrals of the form

R(m) = ∫_{R^10} m(θ)h(θ) dθ / ∫_{R^10} h(θ) dθ = ∫_{R^10} m(θ)f(θ|y) dθ,        (10)

for functions m : R^10 −→ R whose posterior expectations are to be approximated. In order to obtain a numerical example, Evans and Swartz (1995) fixed the values of the parameters and generated the errors. The same values obtained by them will be used to approximate the integrals given in the last term of (10) for m(θ) = θ_i, i = 1, 2, . . . , 10 and m(θ) = θ_i², i = 1, 2, . . . , 10. Note that R(m) = E_f(m(θ)). Then, the sample mean of m(θ) computed from a sample generated from the posterior distribution (in this case by using the proposed method with h) is an unbiased estimator of R(m). In this case, h is also continuously differentiable.

In order to generate a sample from the posterior distribution, the ratio-of-uniforms method is applied with r = 1. The acceptance rate obtained by applying the traditional acceptance-rejection method is pa = 0.00010048, a very low value. A pilot sample of size 5000 is generated and the estimations μ̂ and Σ̂ are obtained. Then, the acceptance rate increases to pe = 0.02483. The computational saving is outstanding since, on average, it is necessary to generate 10000 points to obtain 1 valid point by using the traditional method, whereas, by using the proposed approach, it is enough to generate 40 points to obtain 1 valid point.

The next step is to approximate the integrals. The estimation obtained by using Laplace's method is denoted by R̂L, and R̂N denotes the asymptotic normality approximation. These estimations were calculated in Evans and Swartz (1995). R̂e denotes the approximation obtained by applying Monte Carlo integration to the sample of size 1000 generated by using the proposed method. Table 3 presents the estimations for R(θ_i) and R(θ_i²), i = 1, 2, . . . , 10, with three decimal places. In the first case, R̂N(θ_i) = R̂L(θ_i). In the second case, the fully exponential Laplace approximation obtained by Evans and Swartz (1995) is denoted by R̂F.

In Table 3, it can be observed that Monte Carlo integration applied to the i.i.d. sample generated with the proposed method yields good estimations. The proposed method can also be used to estimate integrals via importance sampling and to generate candidate observations in the Metropolis-Hastings independence sampler.
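The multivariate ratio-of-uniforms step with r = 1 can be sketched as follows: a point (u, v) is drawn uniformly in a bounding region, accepted when u^{d+1} ≤ h(v/u), and in that case θ = v/u is a draw from the density proportional to h. The sketch below uses a bounding hyper-rectangle, as in the traditional method; the bounds u_max, v_lo and v_hi are assumed to be available, and the proposed method would replace the rectangle by a covering hyper-ellipsoid.

```python
import math
import random

def rou_rectangle(log_h, d, u_max, v_lo, v_hi, n, rng=random):
    """Ratio-of-uniforms with r = 1 and a bounding hyper-rectangle.

    Accepts (u, v) when (d + 1) * log(u) <= log_h(v / u) and returns
    theta = v / u; log_h is the log of the (unnormalized) target density.
    """
    sample = []
    while len(sample) < n:
        u = rng.uniform(0.0, u_max)
        if u <= 0.0:
            continue  # guard against log(0)
        v = [rng.uniform(v_lo[k], v_hi[k]) for k in range(d)]
        theta = [vk / u for vk in v]
        if (d + 1) * math.log(u) <= log_h(theta):
            sample.append(theta)
    return sample

# Usage: standard normal target (d = 1), where the exact bounds are known:
# u_max = sup h(x)^(1/2) = 1 and |v| <= sup |x| h(x)^(1/2) = sqrt(2/e).
log_h_normal = lambda t: -0.5 * t[0] * t[0]
draws = rou_rectangle(log_h_normal, 1, 1.0,
                      [-math.sqrt(2 / math.e)], [math.sqrt(2 / math.e)], 2000)
```

Posterior expectations R(m) are then approximated by the sample mean of m(θ) over the accepted draws.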


i     R(θi)     R̂L(θi)    R̂e(θi)    R(θi²)    R̂L(θi²)   R̂N(θi²)   R̂F(θi²)   R̂e(θi²)
1      2.043     2.018     2.048     4.263     4.073     4.141     4.313     4.284
2      0.095     0.116     0.095     0.081     0.014     0.061     0.062     0.081
3      0.696     0.883     0.691     0.721     0.780     0.914     0.747     0.715
4      0.018     0.029     0.022     0.069     0.001     0.046     0.039     0.068
5     −0.426    −0.408    −0.441     0.270     0.166     0.231     0.273     0.280
6     −0.147    −0.172    −0.148     0.107     0.030     0.082     0.094     0.114
7     −0.287    −0.271    −0.268     0.212     0.073     0.177     0.214     0.209
8     −0.650    −0.629    −0.643     0.488     0.396     0.437     0.505     0.478
9      0.319     0.370     0.322     0.206     0.137     0.208     0.200     0.204
10    −0.073    −0.232    −0.076     0.033     0.054     0.079     0.033     0.034

Table 3: True values and estimations for R(θi ) and R(θi2 ).

6 Conclusion

The ratio-of-uniforms method is useful for the design of generation algorithms suitable for large families of univariate distributions. When generalizing to higher dimensions, the principles of the ratio-of-uniforms method remain the same, but the acceptance rate decreases quickly as the dimension rises. That everything becomes much more difficult in the multidimensional setting is a commonplace in all fields of computational mathematics.

The method proposed in this paper is based on the multivariate version of the ratio-of-uniforms method. It is recommended for continuously differentiable densities for which the acceptance rate of the traditional acceptance-rejection technique is low, but for which a pilot random sample of small size can be generated uniformly in the objective region. The minimal bounding hyper-rectangle is replaced by a properly chosen hyper-ellipsoid covering the objective region. This may improve the acceptance rates for many standard and nonstandard multivariate distributions. Higher acceptance rates are not guaranteed for all multivariate densities, but only when the corresponding objective region is more tightly enclosed in a hyper-ellipsoid than in a hyper-rectangle. However, this approach leads to good results for many multivariate distributions. For example, for unimodal multivariate distributions for which the mode (or an approximation of it) is known, better acceptance rates can be obtained after a relocation of the mode. In this case, the geometric interpretation of the ratio-of-uniforms method shows that the objective region can be fitted by a hyper-ellipsoid more tightly than by a hyper-rectangle.

The proposed approach is based on simple theory and, when applied to suitable densities, yields computer programs that maintain reasonable speed regardless of the setup cost. Empirical evidence on a wide range of standard multivariate distributions (e.g. Cauchy, Normal, t, ...) and nonstandard ones (e.g. those arising in the Bayesian context) shows that an important computational saving is obtained. Some illustrative examples from the literature, presented in Section 5, show the benefits of the proposed method. Finally, a remarkable property is that the ratio-of-uniforms method, and hence the modification proposed here, allows generation from densities known only up to a multiplicative constant. This fact makes the approach especially interesting for Bayesian computation.
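The core geometric ingredient of the approach, drawing points uniformly inside the covering hyper-ellipsoid, can be sketched as follows. This is a minimal sketch under illustrative names: mu, L and c stand for the estimated center, a Cholesky factor of the estimated covariance matrix and a scaling radius obtained from the pilot sample, and are not the paper's notation.

```python
import math
import random

def uniform_in_ellipsoid(mu, L, c, rng=random):
    """Draw a point uniformly from the hyper-ellipsoid
    {x : (x - mu)^T (L L^T)^{-1} (x - mu) <= c^2}."""
    d = len(mu)
    # uniform direction on the unit sphere
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    # radius distributed as U^(1/d) makes the point uniform in the unit ball
    r = rng.random() ** (1.0 / d)
    z = [c * r * x / norm for x in g]
    # affine map: unit ball -> ellipsoid
    return [mu[k] + sum(L[k][j] * z[j] for j in range(d)) for k in range(d)]
```

Points drawn this way replace the uniform draws on the bounding hyper-rectangle; each drawn point is then tested and transformed exactly as in the standard ratio-of-uniforms step.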


Acknowledgements

The authors are thankful to Luke Tierney and two anonymous referees for valuable comments and insightful suggestions. This research was partially supported by Ministerio de Educación y Ciencia, Spain (Project TSI2004-06801-C04-03).

References

Barabesi, L., 1993. Random variate generation by using the ratio-of-uniforms method. Ricerca-Monografie n. 1. Università degli Studi di Siena, Dipartimento di Metodi Quantitativi, Collana di Pubblicazioni.

Barbu, G., 1982. On computer generation of random variables by transformations of uniform variables. Bulletin mathématique de la Société des Sciences Mathématiques de la République Socialiste de Roumanie 26 (74) (2), 129–139.

Chen, M., Shao, Q., Ibrahim, J. G., 2000. Monte Carlo Methods in Bayesian Computation. Springer Series in Statistics. Springer.

Cheng, R. C. H., Feast, G. M., 1979. Some simple gamma variate generators. Applied Statistics 28, 290–295.

Dagpunar, J., 1988. Principles of Random Variate Generation. Clarendon Press, Oxford Science Publications.

Devroye, L., 1986. Non-Uniform Random Variate Generation. Springer-Verlag.

Evans, M., Swartz, T., 1995. Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science 10 (3), 254–272.

Gentle, J. E., 1998. Random Number Generation and Monte Carlo Methods. Statistics and Computing. Springer-Verlag.

Gilks, W. R., Richardson, S., Spiegelhalter, D. J., 1998. Markov Chain Monte Carlo in Practice. Chapman and Hall.

Hansen, E., Walster, G. W., 2004. Global Optimization Using Interval Analysis. Pure and Applied Mathematics. Marcel Dekker.

Hörmann, W., Leydold, J., Derflinger, G., 2004. Automatic Nonuniform Random Variate Generation. Statistics and Computing. Springer.

Johnson, M. E., 1987. Multivariate Statistical Simulation. John Wiley and Sons.

Jones, M. C., Lunn, A. D., 1996. Transformations and random variate generation: generalised ratio-of-uniforms methods. Journal of Statistical Computation and Simulation 55, 49–55.

Kinderman, A. J., Monahan, J. F., 1977. Computer generation of random variables using the ratio of uniform deviates. ACM Transactions on Mathematical Software 3 (3), 257–260.

Kotz, S., 1975. Multivariate distributions at a cross road. Statistical Distributions in Scientific Work 1, 247–270.

Leydold, J., 2000. Automatic sampling with the ratio-of-uniforms method. ACM Transactions on Mathematical Software 26 (1), 78–98.

Moore, R. E., 1966. Interval Analysis. Prentice-Hall.

Moore, R. E., Hansen, E. R., Leclerc, A., 1992. Rigorous methods for global optimization. In: Floudas and Pardalos (Eds.), Recent Advances in Global Optimization, pp. 321–342.

Ratschek, H., Rokne, J., 1988. New Computer Methods for Global Optimization. Ellis Horwood Series in Mathematics and Its Applications. Ellis Horwood Limited.

Ripley, B. D., 1987. Stochastic Simulation. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons.

Sahinidis, N. V., 2005. BARON: Branch And Reduce Optimization Navigator. User's Manual, Version 4.0. Available at http://archimedes.scs.uiuc.edu/baron/manuse.pdf.

Stefanescu, S., Vaduva, I., 1987. On computer generation of random vectors by transformations of uniformly distributed vectors. Computing 39, 141–153.

Tawarmalani, M., Sahinidis, N. V., 2004. Global optimization of mixed-integer nonlinear programs: A theoretical and computational study. Mathematical Programming 99 (3), 563–591.

Vaduva, I., 1984. Computer generation of random vectors based on transformation of uniformly distributed vectors. In: Proceedings of the Seventh Conference on Probability Theory, Vol. I, Brasov, pp. 589–598.

Vaduva, I., Stoica, M., Odagescu, I., 1983. Simulation of Economic Processes (Rom.). Technical Publishing House, Bucharest.

Wakefield, J. C., Gelfand, A. E., Smith, A. F. M., 1991. Efficient generation of random variates via the ratio-of-uniforms method. Statistics and Computing 1, 129–133.
