Models and simulation techniques from stochastic geometry


Models and simulation techniques from stochastic geometry

Wilfrid S. Kendall
Department of Statistics, University of Warwick


A course of 5 × 75 minute lectures at the Madison Probability Intern Program, June-July 2003

Introduction


Stochastic geometry is the study of random patterns, whether of points, line segments, or objects. Related reading: Stoyan et al. (1995); Barndorff-Nielsen et al. (1999); Møller and Waagepetersen (2003) and the recent expository essays edited by Møller (2003); the Stochastic Geometry and Statistical Applications section of Advances in Applied Probability.



Contents

1 A rapid tour of point processes 7
2 Simulation for point processes 45
3 Perfect simulation 75
4 Deformable templates and vascular trees 103
5 Multiresolution Ising models 123
A Notes on the proofs in Chapter 5 145


Chapter 1

A rapid tour of point processes

Contents

1.1 The Poisson point process 8
1.2 Boolean models 23
1.3 General definitions and concepts 27
1.4 Markov point processes 33
1.5 Further issues 40

1.1. The Poisson point process


The simplest example of a random point pattern is the Poisson point process, which models “completely random” distribution of points in the plane R2 , or 3-space R3 , or other spaces. It is a basic building block in stochastic geometry and elsewhere, with many symmetries which typically reduce calculations to computations of area or volume.


Figure 1.1: Which of these point clouds exhibits interaction?


Definition 1.1 Let X be a random point process in a window W ⊆ R^n defined as a collection of random variables X(A), where A runs through the Borel subsets of W ⊆ R^n. The point process X is Poisson of intensity λ > 0 if the following are true:

1. X(A) is a Poisson random variable of mean λ Leb(A);
2. X(A1), . . . , X(An) are independent for disjoint A1, . . . , An.

Examples: stars in the night sky; patterns used to define collections of random discs; . . . or space-time structures; Brownian excursions indexed by local time(!).


• In dimension 1 we can construct X using a Markov chain (“number of points in [0, t]”) and simulate it using Exponential random variables for inter-point spacings. Higher dimensions are trickier! See Theorem 1.5 below. • We can replace λ Leb (and W ⊆ R^n) by a nonnegative σ-finite measure µ on a measurable space X, in which case X is an inhomogeneous Poisson point process with intensity measure µ. We should also require µ to be diffuse (µ({x}) = 0 for every point x) in order to avoid the point process faux pas of placing more than one point onto a single location.

Exercise 1.2 Show that any measurable map will carry one (perhaps inhomogeneous) Poisson point process into another, so long as – the image of the intensity measure is a diffuse measure; – and we note that the image Poisson point process is degenerate if the image measure fails to be σ-finite.


Exercise 1.3 Show that in particular we can superpose two independent Poisson point processes to obtain a third, since the sum of independent Poisson random variables is itself Poisson.


• It is a theorem that the second, independent scattering, requirement of Definition 1.1 is redundant! In fact even the first requirement is over-elaborate:

Theorem 1.4 Suppose a random point process X satisfies

P[X(A) = 0] = exp(−µ(A))   for all measurable A.

Then X is an (inhomogeneous) Poisson point process of intensity measure µ.

– This can be proved directly. I find it more satisfying to view the result as a consequence of Choquet capacity theory, since this approach extends to cover a theory of random closed sets (Kendall 1974). Compare inclusion-exclusion arguments and “correlation functions” used in the lattice case in statistical mechanics (Minlos 1999).


– We don’t have to check every measurable set:1 it is enough to consider sets A running through the finite unions of rectangles!


– But we cannot be too constrained in our choice of various A (the family of convex A is not enough).


1 Certainly we need check only the bounded measurable sets.


• An alternative approach2 lends itself to simulation methods for Poisson point processes in bounded windows: condition on the total number of points.


Theorem 1.5 Suppose the window W ⊆ Rn has finite area. The following constructs X as a Poisson point process of finite intensity λ: – X(W ) is Poisson of mean λ Leb(W );


– Conditional on X(W) = n, the random variables X(A) are constructed as point-counts of a pattern of n independent points U1, . . . , Un uniformly distributed over W. So X(A) = #{i : Ui ∈ A}.


Exercise 1.6 Generalize Theorem 1.5 to (inhomogeneous) Poisson point processes and to the case of W of infinite area/volume.


2 The construction described in Theorem 1.5 is the same as the one producing the link via conditioning between Poisson and multinomial contingency table models.


Exercise 1.7 Independent thinning. Suppose each point ξ of X, produced as in Theorem 1.5, is deleted independently of all other points with probability 1 − p(ξ) (so p(ξ) is the retention probability). Show that the resulting thinned random point pattern is an inhomogeneous Poisson point process with intensity measure λ p(u) Leb(du).

Exercise 1.8 Formulate the generalization of the above exercise which uses position-dependent thinning on inhomogeneous Poisson point processes.
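Independent thinning as in Exercises 1.7 and 1.8 is mechanically simple; here is a minimal Python sketch (the helper names and the 1000-point example pattern are my own choices, not from the lectures), applying a position-dependent retention probability to a given planar pattern.

```python
import random

def thin(pattern, retention):
    # Keep each point xi independently with probability retention(xi);
    # applied to a Poisson pattern of intensity lambda, this yields an
    # inhomogeneous Poisson pattern with intensity lambda * retention(u) Leb(du).
    return [xi for xi in pattern if random.random() < retention(xi)]

random.seed(0)
pattern = [(random.random(), random.random()) for _ in range(1000)]
# retention probability p(x, y) = x: points survive more often to the right
thinned = thin(pattern, lambda p: p[0])
print(len(thinned))  # roughly half survive, since the mean retention is 1/2
```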


• The process produced by conditioning on X(W ) = n has a claim to being even more primitive than the Poisson point process.

Definition 1.9 A Binomial point process with n points defined on a window W is produced by scattering n points independently and uniformly over W .

However you should notice:

– point counts in disjoint subsets are now (perhaps weakly) negatively correlated: the more in one region, the fewer can fall in another;
– the total number n of points is fixed, so we cannot extend the notion of a (homogeneous) Binomial point process to windows of infinite measure;
– if you like that sort of thing, the Binomial point process is the simplest representative of a class of point processes (with finite total number of points) analogous to canonical ensembles in statistical physics. The Poisson point process relates similarly to grand canonical ensembles (grand, because points are indistinguishable and we randomize over the total number of particles). Of course these are trivial cases because there is no interaction between points — but more of that later!


Here’s a worry.3 It is easy to prove that the construction of Theorem 1.5 produces point count random variables which satisfy Definition 1.1 and therefore delivers a Poisson point process. But what about the other way round? Can we travel from Poisson point counts to a random point pattern? Or is there some subtle concealed random behaviour in Theorem 1.5 going beyond the labelling of points, which cannot be extracted from a point count specification? The answer is that the construction and the point count definition are equivalent.


3 Perhaps you think this is more properly a neurosis to be dealt with by personal counselling?! Consider that a neurotic concern here is what allows us to introduce measure-theoretic foundations which are most valuable in further theory.


Theorem 1.10 Any Poisson point process given by Definition 1.1 induces a unique probability measure on the space of locally finite point patterns, when this is furnished with the σ-algebra generated by point counts.

Here is a sketch of the proof.


Take W = [0, 1)^2. For each n = 1, 2, . . . , make a dyadic dissection of [0, 1)^2 into 4^n different sub-rectangles [k2^{−n}, (k + 1)2^{−n}) × [ℓ2^{−n}, (ℓ + 1)2^{−n}). For a given level n, visualize the pattern Pn obtained by painting sub-rectangles black if they are occupied by points, white otherwise. The point counts for each of these 4^n sub-rectangles are independent Poisson of mean 4^{−n}λ; by subadditivity of probability,

P[one or more point counts > 1] ≤ 4^n (1 − (1 + 4^{−n}λ) exp(−4^{−n}λ)) = 4^n (4^{−n}λ)^2 θ exp(−4^{−n}λθ)   for some θ ∈ (0, 1)

(use the mean value theorem for x ↦ 1 − (1 + x) exp(−x)), which is summable in n. So by Borel-Cantelli we deduce that for all large enough n the black sub-rectangles in pattern Pn contain just one Poisson point each. Ordering these points randomly yields the construction in Theorem 1.5.


Simulation techniques for Poisson point processes are rather direct and straightforward: here are two possibilities. 1. We can use Theorem 1.5 to reduce the problem to the algorithm


• draw N = n from a Poisson distribution of mean λ Leb(W);
• draw n independent values from a Uniform distribution over the required window.

def poisson_process(region ← W, intensity ← alpha):
    pattern ← []
    for i ∈ range(poisson(alpha)):
        pattern.append(uniform(region ← W))
    return pattern
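A runnable Python rendering of this pseudocode for a rectangular window (a sketch with my own helper names; `poisson_count` uses Knuth's multiplication method, which is adequate for modest means):

```python
import math
import random

def poisson_count(mean):
    # Knuth's method: multiply uniforms until the product drops below
    # exp(-mean); the number of extra factors needed is Poisson(mean).
    threshold, product, n = math.exp(-mean), random.random(), 0
    while product > threshold:
        product *= random.random()
        n += 1
    return n

def poisson_process(window, intensity):
    # Theorem 1.5: draw a Poisson(intensity * Leb(W)) total count,
    # then scatter that many independent uniform points over W.
    (x0, x1), (y0, y1) = window
    area = (x1 - x0) * (y1 - y0)
    return [(random.uniform(x0, x1), random.uniform(y0, y1))
            for _ in range(poisson_count(intensity * area))]

random.seed(1)
pattern = poisson_process(((0.0, 1.0), (0.0, 1.0)), intensity=100.0)
print(len(pattern))  # a single Poisson(100) draw
```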


2. Quine and Watson (1984) describe a generalization of the 1-dimensional “Markov chain” method mentioned above. Suppose we wish to simulate a Poisson point process of intensity λ in ball(o, R). We describe the two-dimensional case; the generalization to higher dimensions is straightforward.4 Using polar coordinates and a power transform of the radius, produce a measure-preserving map

(0, R^2/2] × (0, 2π] → ball(o, R),   (s, θ) ↦ (√(2s), θ),   (1.1)

where (√(2s), θ) is expressed in polar coordinates, and (0, R^2/2] × (0, 2π] and the ball(o, R) are endowed with the measure λ Leb. Being measure-preserving, the map preserves Poisson point processes. But the following Exercise 1.11 shows a Poisson(λ) point process on the product space (0, R^2/2] × (0, 2π] can be thought of as a 1-dimensional Poisson(2λπ) point process of “half-squared radii” on (0, R^2/2] (which may be constructed using a sequence of independent Exponential(2λπ) random variables), each point of which is associated with an “angle” which is uniformly distributed on (0, 2π].


4 The Quine-Watson method is particularly suited to the construction of Poisson point processes which are to be used to build geometric tessellations . . .
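A sketch of the Quine-Watson radial construction in Python (my code, assuming the representation of Exercise 1.11): half-squared radii are accumulated as Exponential(2πλ) increments until they leave (0, R^2/2], and each is paired with an independent uniform angle.

```python
import math
import random

def quine_watson_disc(radius, intensity):
    # Half-squared radii of a planar Poisson(intensity) process arrive as a
    # one-dimensional Poisson(2 * pi * intensity) process on (0, radius**2 / 2],
    # i.e. with Exponential(2 * pi * intensity) spacings; each half-squared
    # radius s gets an independent uniform angle.
    points, s = [], 0.0
    while True:
        s += random.expovariate(2.0 * math.pi * intensity)
        if s > radius * radius / 2.0:
            return points
        r, theta = math.sqrt(2.0 * s), random.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))

random.seed(2)
pts = quine_watson_disc(radius=1.0, intensity=200.0)
print(len(pts))  # a Poisson(200 * pi) draw, about 628 on average
```

A side benefit: the points emerge ordered by distance from o, which is convenient for the tessellation constructions mentioned in the footnote.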


Exercise 1.11 Suppose X is a Poisson point process of intensity λ on the rectangle [a, b] × [0, 1]. Show that it may be realized as a one-dimensional Poisson point process on [a, b] of intensity λ, enriched by attaching to each one-dimensional point an independent Uniform([0, 1]) random variable to provide the second coordinate.

Simply suppose the construction is given by Theorem 1.5, show the first coordinate of each point provides the required one-dimensional Poisson point process, note the second coordinate is uniformly distributed as required!


Exercise 1.12 Why is it irrelevant to this proof that the map image omits the centre o of ball(o, R)?


Exercise 1.13 Check the details of modifying the Quine-Watson construction for higher dimensions, to show how to build a d-dimensional Poisson point process using independent sequences of Exponential random variables and random variables uniformly distributed on S d−1 .

Exercise 1.14 Consider the following method of parametrizing lines: measure the perpendicular distance s from the origin and the angle θ made by the perpendicular with a reference line. How should you modify the Quine-Watson construction to produce a Poisson line process which is invariant under isometries of the plane? What about invariant Poisson hyperplane processes?


What do large cells look like? In the early 1940s D.G. Kendall conjectured a large cell should approximate a disk. We now know this is true, and more besides! (Miles 1995; Kovalenko 1997; Kovalenko 1999; Hug et al. 2003)


Given a Poisson point process, we may construct a locally finite point pattern using the methods of Theorem 1.10, and enumerate the points of the pattern using some arbitrary rule. So we may speak of summing h(ξ) over the pattern points ξ; by Fubini the result will not depend on the arbitrary ordering so long as h is nonnegative.


Theorem 1.15 If X is a Poisson point process of intensity λ and if h(x, φ) is a nonnegative measurable function of a point x and a locally finite point pattern φ then

E[ Σ_{ξ∈X} h(ξ, X \ {ξ}) ] = ∫_{R^d} E[h(u, X)] λ Leb(du),   (1.2)

and indeed by direct generalization

E[ Σ_{disjoint ξ1,...,ξn∈X} h(ξ1, . . . , ξn, X \ {ξ1, ξ2, . . . , ξn}) ] = ∫_{R^d} · · · ∫_{R^d} E[h(u1, . . . , un, X)] λ^n Leb(du1) · · · Leb(dun).   (1.3)


Exercise 1.16 Use Theorem 1.5 to prove (1.2) for h(u, X) which is positive only when u lies in a bounded window W and which depends on X only through the intersection of the pattern X with the window W. Hence deduce the full Theorem 1.15.

This delivers a variation on the independent scattering property of the Poisson point process.

Corollary 1.17 If g is a nonnegative measurable function of a locally finite point pattern φ then

E[g(X \ {u}) | u ∈ X] = E[g(X)].

Use Theorem 1.15 to calculate E[ Σ_{ξ∈X} I[ξ ∈ ball(u, ε)] g(X \ {ξ}) ]. Interpret the expectation as ∫_{ball(u,ε)} E[g(X \ {u}) | u ∈ X] λ Leb(du).

We say the Palm distribution for a Poisson point process is the Poisson point process itself with the conditioning point added in!
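The identity (1.2) of Theorem 1.15 is easy to check by Monte Carlo. In this sketch (my code, unit square window so λ Leb(W) = λ) the test function h(u, φ) is the indicator that φ puts no point within distance r of u, and both sides of (1.2) are estimated from the same simulations.

```python
import math
import random

def poisson_unit_square(lam):
    # Theorem 1.5 on the unit square: Poisson(lam) count, uniform scatter.
    threshold, product, n = math.exp(-lam), random.random(), 0
    while product > threshold:
        product *= random.random()
        n += 1
    return [(random.random(), random.random()) for _ in range(n)]

def isolated(u, pattern, r):
    # h(u, phi) = 1 if no point of phi lies within distance r of u
    return all(math.hypot(u[0] - p[0], u[1] - p[1]) > r for p in pattern)

random.seed(3)
lam, r, runs = 30.0, 0.1, 400
lhs = rhs = 0.0
for _ in range(runs):
    x = poisson_unit_square(lam)
    # left side of (1.2): sum of h(xi, X \ {xi}) over the pattern points
    lhs += sum(isolated(p, [q for q in x if q is not p], r) for p in x)
    # right side of (1.2): lam * integral of E[h(u, X)] over the window,
    # estimated by one uniformly placed u per run
    u = (random.random(), random.random())
    rhs += lam * isolated(u, x, r)
print(lhs / runs, rhs / runs)  # the two estimates should agree
```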

1.2. Boolean models

In stochastic geometry the counterpart to the Poisson point process is the Boolean model: the set-union of (possibly random) geometric figures or grains located one at each point or germ of an underlying Poisson point process. Much of the amenability of the Poisson point process is inherited by the Boolean model, though its analysis can nevertheless lead to very substantial calculations and theory.


Boolean models have the advantage of being simple enough that one can carry out explicit computations. They have a long history of application, for example being used by Robbins (1944, 1945) to study bombing problems (how heavily to bomb a beach in order to ensure a high probability of landmine-clearance . . . ).


Bombing problems and coverage: refer to Hall (1988, §3.5), Molchanov (1996a); also note the application of Exercise 1.14.


Definition 1.18 A Boolean model Ξ is defined in terms of a germ process, which is a Poisson point process X of intensity λ, and a grain distribution, which defines a random set5. It is a random set which is formed by the union

∪_{ξ∈X} (U(ξ) + ξ),

where the U(ξ) are independent realizations of the random set, one for each germ or point ξ of the germ process. (Notice: the grains should also be independent of the germs!)

The classic computation for a Boolean model Ξ is to determine the probability P[o ∈ Ξ] that the resulting random set covers a specified point (here, the origin o). This reduces to a matter of geometry: if o is covered then it must be covered by at least one grain, so just think about where the corresponding germ might lie. If U is non-random then the germ will have to lie in the reflection Û of U in the origin, so restrict attention to germs lying in Û. If U is random then generalize this: thin each germ ξ of the germ process using the probability P[ξ ∈ Û] that

5 The random set is often not random at all (a disk of fixed radius, say) or is random in a simple parametrized manner (a disk of random radius, or a random convex polygon defined as the intersection of a sequence of random half-spaces); so we will follow Robbins (1944, 1945) in treating this informally, rather than developing the later and profound theory of random sets (Matheron 1975)!


the corresponding grain covers o. The result is an inhomogeneous Poisson point process of intensity measure λ P[u ∈ Û] Leb(du): calculate the probability of this process containing no points at all to produce the following:


Theorem 1.19 Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U. Then the area / volume fraction is

P[o ∈ Ξ] = 1 − exp(−λ E[Leb(U)]).   (1.4)

A similar argument produces the generalization

Theorem 1.20 Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U. Let K be a compact region. Then the probability of Ξ hitting K is

P[Ξ ∩ K ≠ ∅] = 1 − exp(−λ E[Leb(Û ⊕ K)]),   (1.5)

where Û ⊕ K is a Minkowski sum:

K1 ⊕ K2 = {x1 + x2 : x1 ∈ K1, x2 ∈ K2}.   (1.6)
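Theorem 1.19 is simple to verify by simulation. The sketch below (my code, not from the lectures) uses disk grains of fixed radius ρ with germs on the unit square and tests coverage of the centre point; since the centre lies further than ρ from the boundary, no edge correction is needed.

```python
import math
import random

def poisson_count(mean):
    # Knuth's multiplication method for a Poisson(mean) variate
    threshold, product, n = math.exp(-mean), random.random(), 0
    while product > threshold:
        product *= random.random()
        n += 1
    return n

def coverage_estimate(lam, rho, trials):
    # Boolean model: germs Poisson(lam) on the unit square, grains
    # closed disks of fixed radius rho; estimate P[centre is covered].
    o, hits = (0.5, 0.5), 0
    for _ in range(trials):
        n = poisson_count(lam)  # Leb(W) = 1, so the mean count is lam
        germs = ((random.random(), random.random()) for _ in range(n))
        if any(math.hypot(gx - o[0], gy - o[1]) <= rho for gx, gy in germs):
            hits += 1
    return hits / trials

random.seed(4)
lam, rho = 40.0, 0.05
estimate = coverage_estimate(lam, rho, trials=2000)
exact = 1.0 - math.exp(-lam * math.pi * rho * rho)  # (1.4) with Leb(U) fixed
print(estimate, exact)  # should agree to Monte Carlo error
```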


Remember, from the discussion prior to Theorem 1.15, that we may view the germs as a sequence of points (using some arbitrary method of ordering); so simulation of Ξ need only depend on generating a suitable sequence of independent random sets U1 , U2 , . . . . Harder analytical questions include: • how to estimate the probability of complete coverage?


• in a high-intensity Boolean model, what is the approximate statistical shape of a vacant region?


• as intensity increases, how to describe the transition to infinite connected regions of coverage?


Useful further reading for these questions: Hall (1988), Meester and Roy (1996), Molchanov (1996b).

1.3. General definitions and concepts

1.3.1. General Point Processes

Other approaches to point processes include: • random measures (nonnegative integer-valued, charge singletons with 0, 1);


• measures on exponential measure spaces;


reflected in notation (a point process X can have points belonging to it (ξ ∈ X), can be used for point counts (X(E) when E is a subset of the ground space), and can be used to provide a reference measure so as to define other point processes via densities). Carter and Prenter (1972) establish the essential equivalences. We can define general point processes as probability measures on the space of locally finite point patterns, following Theorem 1.10. This allows us immediately to define stationary point processes (the distribution respects translation symmetries, at least locally) and isotropic point processes (the same, but for rotations). The distribution of a general point process is characterized by its void probabilities as in Theorem 1.4. To each point process X (whether stationary, isotropic, or not) there is an intensity measure given by Λ(E) = E[X(E)]. (For stationary point processes this has to be a multiple of Lebesgue measure!)

We can also define the Palm distribution for X:

Definition 1.21 E[g(X) | u ∈ X] is required to solve

E[ Σ_{ξ∈X∩ball(x,ε)} f(ξ) g(X \ {ξ}) ] = ∫_{ball(x,ε)} f(u) E[g(X) | u ∈ X] Λ(du).   (1.7)

A somewhat dual notion is the conditional intensity: informally, if x is a location and φ is a locally finite point pattern not including the location x then the conditional intensity λ(x; φ) is the infinitesimal probability of a point pattern X containing x if X \ {x} = φ:


λ(x; φ) = P[x ∈ X, X \ {x} = φ] / P[X \ {x} = φ].   (1.8)


We shall not define this in rigorous generality, as its definition will be immediate for the Markov point processes to which we will turn below.


These considerations provide a general context for figuring out constructions which move away from the “no interactions” property of Poisson point processes.


1. Poisson cluster processes (can be viewed as Boolean models in which the grains are simply finite random clusters of points). Neyman-Scott processes arise from clusters of independent points.


2. Randomize intensity measure: “doubly stochastic” or Cox process.


Exercise 1.22 If the number of points in a Neyman-Scott cluster is Poisson, then the Neyman-Scott point process is also a Cox process.

Line processes present an interesting issue here! Davidson conjectured that a second-order stationary line process with no parallel lines must be Cox. Kallenberg (1977) gave a counterexample: Stoyan et al. (1995, Figure 8.3) provides an illustration.


3D variants with applications in Parkhouse and Kelly (1998).

3. Or apply rejection sampling to a Poisson point process, rejection probability depending on statistics computed from the point pattern. Compare the Gibbs measure formalism of statistical mechanics.


What sort of statistics might we use?

• estimation of intensity λ by λ̂(X) = X(W)/Leb(W) (using sufficient statistic X(W));

• measuring how close the point pattern is to a specific location using the spherical contact distribution function

Hs(r) = P[dist(o, X) ≤ r],   (1.9)

estimated via a Boolean model Ξr based on balls U of fixed radius r by

Leb(Ξr ∩ W)/Leb(W) = Leb(( ∪_{ξ∈X} (U + ξ) ) ∩ W)/Leb(W);   (1.10)

• measuring how close pattern points come to each other using the nearest neighbour distance function

D(r) = P[dist(o, X) ≤ r | o ∈ X].   (1.11)

Interpret conditioning on the null event “o ∈ X” using Palm distributions;

• the related reduced second moment measure

K(r) = (1/λ) E[#{ξ ∈ X : dist(o, ξ) ≤ r} − 1 | o ∈ X],   (1.12)

estimating λK(r) by the number sr(X) of (ordered) neighbour pairs (ξ1, ξ2) in X for which dist(ξ1, ξ2) ≤ r (use the Slivnyak-Mecke Theorem 1.15).

We ignore the important issue of edge effects!


Exercise 1.23 Use Theorem 1.15 to prove that for a homogeneous Poisson point process D(r) = Hs(r) and K(r) = Leb(ball(o, r)).

Here is a prototype of the use of these statistics in rejection algorithms:

Exercise 1.24 Consider the following rejection simulation algorithm, noting that λ̂ = X(W)/Leb(W) to connect with previous discussion:

def reject():
    X ← poisson_process(region ← W, intensity ← lamda)
    while uniform(0, 1) > theta ** number(X):
        X ← poisson_process(region ← W, intensity ← lamda)
    return X

Show that the resulting point process is Poisson of intensity λθ.

Use Theorem 1.5 to reduce this to a question about rejection sampling of Poisson random variables.
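Following that hint, since the acceptance probability θ^{#X} depends only on the count, Exercise 1.24 can be checked with Poisson variates alone; a quick simulation (my code, not from the lectures) confirms the accepted counts behave like Poisson draws with mean λθ.

```python
import math
import random

def poisson_count(mean):
    # Knuth's multiplication method for a Poisson(mean) variate
    threshold, product, n = math.exp(-mean), random.random(), 0
    while product > threshold:
        product *= random.random()
        n += 1
    return n

def accepted_count(lam, theta):
    # Propose N ~ Poisson(lam); accept with probability theta**N.
    # The accepted counts should be distributed as Poisson(lam * theta).
    while True:
        n = poisson_count(lam)
        if random.random() < theta ** n:
            return n

random.seed(5)
lam, theta = 6.0, 0.5
samples = [accepted_count(lam, theta) for _ in range(2000)]
print(sum(samples) / len(samples))  # close to lam * theta = 3.0
```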

1.3.2. Marked Point Processes


We can think of Boolean models as derived from marked Poisson point processes; to each germ point assign a random mark encoding the geometric data (such as disk radius) of the grain. In effect the Poisson point process now lives on Rd × M where M is the mark space. • Poisson cluster processes as mentioned above: the grains are finite random clusters; • segment and disk processes; • line and (hard) rod processes;


• fibre and surface processes.

Compare Exercise 1.11; the intensity measure will factorize as µ × ν so long as the marks are independent, where µ is the intensity of the basic Poisson point process and the measure ν is the distribution of the mark. But beware: it can be meaningful to consider improper mark distributions. Examples include:

Decomposition of Brownian motion into excursions;


Marked planar Poisson point process for which the marks are disks of random radius R, density proportional to 1/r if r ≤ threshold.

1.4. Markov point processes


Consider interacting point processes generated by rejection sampling (Exercise 1.24 is a simple example!). For the sake of simplicity, we suppose here that all our processes are two-dimensional. The general procedure is to use a statistic t(X) in the algorithm


def reject():
    X ← poisson_process(region ← W, intensity ← lamda)
    while uniform(0, 1) > exp(theta * t(X)):
        X ← poisson_process(region ← W, intensity ← lamda)
    return X

We can now use standard arguments from rejection simulation. The resulting point process distribution has a density

exp(θ t(X)) / constantθ   (1.13)

with respect to the Poisson point process X of intensity λ, for normalizing constant

constantθ = E[exp(θ t(X)) | X is Poisson(λ)].


If θt(X) > 0 is possible then the above algorithm runs into trouble (how do you accept a point pattern with probability greater than 1?!). However the probability density at Equation (1.13) can still make sense. Importance sampling could of course implement this. We can do better if the following holds:


Definition 1.25 A point pattern density f(X) satisfies Ruelle stability if

f(X) ≤ A exp(B #(X))   (1.14)

for constants A > 0, B; where #(X) is the number of points in X.


Exercise 1.26 6 Show that if the density exp(θt(X)) is Ruelle stable with constants as in 1.25 then the rejection algorithm above will produce a point pattern with the correct density if applied to a Poisson point process of intensity λe^B instead of λ, with acceptance probability changed to exp(θt(X) − B#(X))/A instead of exp(θt(X)).


Use t(X) = sr(X), the number of ordered pairs separated by at most r, with γ = exp(θ) ∈ (0, 1), to obtain the celebrated Strauss point process (Strauss 1975).

6 Thanks for the comment, Tom!
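For γ = exp(θ) ∈ (0, 1) the acceptance probability γ^{sr(X)} is at most one, so the plain rejection algorithm above applies directly. Here is a sketch of a Strauss sampler on the unit square (my code; the parameter values are chosen so that the acceptance rate stays workable):

```python
import math
import random

def poisson_unit_square(lam):
    # Poisson(lam) count on the unit square, then uniform scattering
    threshold, product, n = math.exp(-lam), random.random(), 0
    while product > threshold:
        product *= random.random()
        n += 1
    return [(random.random(), random.random()) for _ in range(n)]

def s_r(pattern, r):
    # number of ordered pairs of distinct points at distance <= r
    return sum(1
               for i, p in enumerate(pattern)
               for j, q in enumerate(pattern)
               if i != j and math.hypot(p[0] - q[0], p[1] - q[1]) <= r)

def strauss(lam, gamma, r):
    # Rejection sampler for density proportional to gamma**s_r(X)
    # with respect to Poisson(lam); valid for gamma in (0, 1].
    while True:
        x = poisson_unit_square(lam)
        if random.random() < gamma ** s_r(x, r):
            return x

random.seed(6)
x = strauss(lam=20.0, gamma=0.3, r=0.05)
print(len(x), s_r(x, 0.05))  # accepted patterns carry few close pairs
```

Note that for denser patterns or larger r the acceptance rate collapses, which is one motivation for the MCMC methods developed in later chapters.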


The statistic t(X) = Leb(Ξr ∩ W), for γ = exp(−θ), results in a process known as the area-interaction point process (Baddeley and van Lieshout 1995), also known in statistical physics as the Widom-Rowlinson interpenetrating spheres model (Widom and Rowlinson 1970) in the case θ > 1. The (Papangelou) conditional intensity is easily defined for this kind of weighted Poisson point process, which (borrowing statistical mechanics terminology) is commonly known as a Gibbs point process: if the density of the Gibbs process with respect to a unit rate Poisson point process is given by f(X) then the conditional intensity at x may be defined as

λ(x; φ) = f(φ ∪ {x}) / f(φ).   (1.15)


Exercise 1.27 Compute the conditional intensity for the Poisson point process of intensity λ (use its density with respect to a unit-intensity Poisson point process; see Exercise 1.24).


Exercise 1.28 Compute the conditional intensity for the Strauss point process.


Exercise 1.29 Compute the conditional intensity for the area-interaction point process.


All these conditional intensities λ(x; φ) share a striking property: they do not depend on the point pattern φ outside of some neighborhood of x. This is an appealing property for a point process intended to model interactions: it is captured in Ripley and Kelly’s (1977) definition of a Markov point process:

Definition 1.30 A Gibbs point process is a Markov point process with respect to a neighbourhood relation ∼ if its density is everywhere positive7, and its conditional intensity λ(x; φ) depends only on x and {y ∈ φ : y ∼ x}.

The celebrated Hammersley-Clifford theorem tells us that the probability density for a Markov point process can always be factorized as a product of interactions, functions of ∼-cliques of points of the process. Ripley and Kelly (1977) present a proof for the point process case, while Baddeley and Møller (1989) provide a significant extension. A recent treatment of Markov point processes in book form is given by Van Lieshout (2000).

7 The positivity requirement ensures that the ratio expression for λ(x; φ) in Equation (1.15) is well-defined! It can be relaxed to a hereditary property.


However care must be taken to ensure they are well-defined: the danger of simply writing down a statistic t(X) and defining the density by Equation (1.13) is that the total integral E[exp(θt(X)) | X is Poisson(λ)] may diverge, so that the normalizing constant is not well-defined. (This is the case for the Strauss point process when γ > 1, as shown by Kelly and Ripley 1976. This particular point process cannot therefore be used to model clustering — a major motivation for considering the more amenable area-interaction point process.)


Exercise 1.31 Show that E[γ^{sr(X)}] diverges for γ > 1 if the interaction radius r exceeds the maximum separation of potential points in the window W.


Exercise 1.32 Bound exp(−θ Leb(Ξr ∩ W )) above in terms of the number of points in W (show the interaction satisfies Ruelle-stability), and hence deduce that the area-interaction point process is defined for all real θ.


Can we bias using perimeter instead of area of Ξr in area-interaction, or even “holiness” (Euler characteristic: number-of-components minus number-of-holes)? Like area, these characteristics are local (they lead to formal densities whose conditional intensities satisfy the local requirement of Definition 1.30). The Hadwiger characterization theorem (Hadwiger 1957) more-or-less says that linear combinations of these three measures exhaust all local geometric possibilities! Can the result be normalized?

Exercise 1.33 Use the argument of Exercise 1.32 to show that exp(θ perimeter(Ξr ∩ W)) can be normalized for all θ.

Exercise 1.34 Argue similarly to show that exp(θ euler(Ξr ∩ W )) can be normalized for all θ > 0.


There can be no more components than disks/grains of the Boolean model Ξr .


What about exp(θ euler(Ξr ∩ W )) for θ < 0? Key remark is Theorem 4.3 of Kendall et al. (1999): a union of N closed disks in the plane (of whatever radii, not necessarily the same for each disk) can have at most 2N − 5 holes.


Separate, trickier argument: disks can be replaced by polygons satisfying a “uniform wedge” condition.


There are counterexamples . . . .

1.5. Further issues

Before moving on to develop machinery for simulating realizations of Markov point processes, we make brief mention of a couple of recent pieces of work.

1.5.1. Poisson point processes and set-indexed martingales


There are striking applications of set-indexed martingales to distributional results for the areas of adaptively random regions. Recall the notion of a stopping time T for a filtration {Ft}. Here is a variant on the standard definition: we require

{ω : ∆(ω) ⊆ [0, t]} ∈ Ft        (1.16)

where

∆(ω) = {s : s ≤ T(ω)} .

This makes sense for a partially-ordered family of closed sets (ordered by set-inclusion); hence martingales, and an Optional Sampling Theorem (Kurtz 1980). Zuyev (1999) used this to give an elegant proof of results of Møller and Zuyev (1996):


Theorem 1.35 Suppose ∆ is a compact stopping set for X a Poisson point process of intensity α, such that

P[X(∆) = n] > 0   and does not depend on α .

Then the conditional distribution of Leb(∆) given X(∆) = n is Gamma(n, α).


Examples: the minimal centred ball containing n points (easy: compare Exercise 1.11); regions constructed using Delaunay or Voronoi tessellations.

Proof: use the likelihood ratio for α (which furnishes a martingale), hence

E[ e^{z Leb(∆)} | X(∆) = n, α ] = E[ e^{z Leb(∆)} I[X(∆) = n] αⁿ α₀⁻ⁿ e^{−(α−α₀) Leb(∆)} | α = α₀ ] / P[X(∆) = n | α = α₀ ] .


Figure: (a) Delaunay circumdisk; (b) Voronoi flower.


Cowan et al. (2003) use this technique to consider the extent to which these examples lead to decompositions as in the minimal centred ball.

1.5.2. Stationary random countable dense sets


We conclude this lecture with a cautionary example! Without straying very far from the theory of random point processes one can end up in the bad part of the Measure Theory town. Suppose for whatever reason one wishes to consider random point sets X which are countable but are also dense (the set of directions of a line process?). We need to be precise about what we can observe, so that we genuinely treat the different points of X on equal terms. So suppose we can observe any event in the hit-or-miss σ-algebra generated by [X ∩ A] ≠ ∅, as A runs through the Borel sets. Suppose that X is stationary. (We thus exclude the situations considered by Aldous and Barlow (1981), where account is taken of how various points are constructed — in effect the dense point process is obtained by projection from a conventional locally finite point process.) There are two ways to define such sets: (a) as a measurable subset of Ω × R; (b) as a Borel-set-valued random variable, hit-or-miss measurable.


These two definitions are equivalent for the case of closed sets (Himmelberg 1975, Thm. 3.5). But (warning!) the set of countable sets is not hit-or-miss measurable, though constructive and weak countability are equivalent (section theorem). What can we say about stationary random countable dense sets?


Theorem 1.36 (Kendall 2000, Thm. 4.3, specialized to Euclidean case.) If E is any hit-or-miss measurable event then either P [E] = 0 or P [E] = 1. Whether 0 or 1 depends on E but not on X!


Exercise 1.37 Compare these two stationary random countable dense sets: (a) For Y a unit-intensity Poisson point process on R × (0, ∞), the countable set obtained by projection onto the first coordinate. (b) The set Q + Z, where Q is the set of rational numbers, and Z is a Uniform([0, 1]) random variable. They are very different, but appear indistinguishable . . . .


This bad behaviour should not be surprising: for example Vitali’s non-measurable set depends on there being no measurable random variable taking values in Q + Z. (Contrast our use of enumeration for locally finite point sets in Theorem 1.15.) Careless thought betrays intuition!


Chapter 2

Simulation for point processes

Contents

2.1 General theory . . . . . . . . . . . . . . . . . . . 48
2.2 Measure-theoretic approach . . . . . . . . . . . . 51
2.3 MCMC for Markov point processes . . . . . . . . 54
2.4 Uniqueness and convergence . . . . . . . . . . . . 60
2.5 The MCMC zoo . . . . . . . . . . . . . . . . . . . 61
2.6 Speed of convergence . . . . . . . . . . . . . . . . 66


Computations with Poisson process, Boolean model, cluster processes can be very direct, and simulation is usually direct too. But what about Markov point processes?


• Simple-minded rejection procedures (§1.4) are usually infeasible.


• What about accept-reject procedures sequentially in small sub-windows, thus constructing a point-pattern-valued Markov chain whose invariant distribution is the desired point process?


• Or proceed to limit: add/delete single points?


Markov chain Monte Carlo (MCMC): the Gibbs sampler, Metropolis-Hastings! The Poisson point process arises as the invariant distribution of a spatial immigration-death process, after Preston (1977), also Ripley (1977, 1979). To deal with more general Markov point processes it is convenient first to work through the general theory of detailed balance equations for Markov chains on general state-spaces. We will show how to use this to derive simulation algorithms for our example point processes above, and discuss the more general Metropolis-Hastings (MH) formulation after Metropolis et al. (1953) and Hastings (1970), which allows us to view MCMC as a sequence of proposals of random moves which may or may not be rejected.

Simple example of detailed balance

Underlying the Poisson point process example: the process X of total number of points alters as follows


• birth in time interval [0, ∆t) at rate α∆t;
• death in time interval [0, ∆t) at rate X∆t.


Then we can solve

P[X(t) = n − 1 and birth in [t, t + ∆t)] ≈ π(n − 1) × α∆t
    = π(n) × n∆t ≈ P[X(t) = n and death in [t, t + ∆t)]

by

π(n) = (α/n) π(n − 1) = . . . = αⁿ e^{−α} / n! .
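The detailed-balance solution above can be spot-checked numerically. Below is a minimal sketch (the rate α = 2, the discretization step and the run length are all arbitrary illustrative choices, not from the lectures): it simulates the immigration-death chain on the number of points and compares the empirical occupation frequencies with the Poisson(α) weights.

```python
import random, math

def simulate_counts(alpha, steps, dt=0.01, seed=1):
    """Simulate the birth-death chain on the number of points:
    birth with probability alpha*dt, death with probability n*dt per step."""
    rng = random.Random(seed)
    n, counts = 0, {}
    for _ in range(steps):
        u = rng.random()
        if u < alpha * dt:              # birth in [t, t + dt)
            n += 1
        elif u < alpha * dt + n * dt:   # death in [t, t + dt)
            n -= 1
        counts[n] = counts.get(n, 0) + 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def poisson_pmf(k, alpha):
    return alpha ** k * math.exp(-alpha) / math.factorial(k)

emp = simulate_counts(alpha=2.0, steps=1_000_000)
for k in range(6):
    print(k, round(emp.get(k, 0.0), 3), round(poisson_pmf(k, 2.0), 3))
```

The discretized chain satisfies the detailed balance equations exactly, so the empirical frequencies should approach the Poisson(α) distribution as the run lengthens.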

2.1. General theory


• Ergodic theory for Markov chains: invariant distribution

π(y) = Σₓ π(x)p(x, y)

or (continuous time)

Σₓ π(x)q(x, y) = 0 .

• Detailed balance:
  – reversibility: π(x)p(x, y) = π(y)p(y, x) ;
  – dynamic reversibility: π(x)p(x, y) = π(ỹ)p(ỹ, x̃) ;


Exercise 2.1 Show that detailed balance (whether reversibility or dynamic reversibility!) implies that π is an invariant distribution.
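Exercise 2.1 can be checked numerically on a toy example. The following sketch (the three-state target π and the uniform proposal are arbitrary illustrative choices) constructs a Metropolis kernel in detailed balance with a given π and verifies that πP = π:

```python
import numpy as np

# Target distribution on three states and a Metropolis kernel built from
# a symmetric proposal, so detailed balance holds by construction.
pi = np.array([0.5, 0.3, 0.2])
q = np.full((3, 3), 1 / 3)  # symmetric proposal

P = np.zeros((3, 3))
for x in range(3):
    for y in range(3):
        if x != y:
            P[x, y] = q[x, y] * min(1.0, pi[y] / pi[x])
    P[x, x] = 1.0 - P[x].sum()  # rejected proposals stay put

# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for all x, y
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)

# Hence invariance: pi P = pi
assert np.allclose(pi @ P, pi)
print("detailed balance and invariance verified")
```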


• improper distributions: does Σₓ π(x) converge?
• estimate ∫ θ(y) dπ(y) using limₙ→∞ (1/n) Σₘ₌₁ⁿ θ(Yₘ) ;


Definition 2.2 Geometric ergodicity: there are ρ ∈ (0, 1) and R(x) > 0 with

‖π − p⁽ⁿ⁾(x, ·)‖_tv ≤ R(x)ρⁿ

for all n, all starting points x.


Definition 2.3 Uniform ergodicity: there are ρ ∈ (0, 1), R > 0 not depending on the starting point x such that

‖π − p⁽ⁿ⁾(x, ·)‖_tv ≤ Rρⁿ

for all n, uniformly over all starting points x.

2.1.1. History


• Metropolis et al. (1953)
• Hastings (1970)
• Simulation tests


• Geman and Geman (1984)
• Gelfand and Smith (1990)
• Geyer and Møller (1994), Green (1995)


• ...

Definition 2.4 Generic Metropolis algorithm:
• Propose: move x → y with probability r(x, y);


• Accept: the move with probability α(x, y).

Required: balance π(x)r(x, y)α(x, y) = π(y)r(y, x)α(y, x).


Definition 2.5 Hastings ratio: set α(x, y) = min{Λ(x, y), 1} where

Λ(x, y) = π(y)r(y, x) / (π(x)r(x, y)) .
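Definitions 2.4 and 2.5 can be put into code as a minimal sketch. The target (a standard Gaussian), the proposal scale and the run length below are arbitrary illustrative choices; with a symmetric random-walk proposal r(x, y) = r(y, x), the Hastings ratio reduces to π(y)/π(x).

```python
import math, random

def metropolis(log_pi, x0, scale=1.0, n=40_000, seed=0):
    """Random-walk Metropolis: propose y = x + scale*Z with Z ~ N(0,1),
    accept with probability min(1, pi(y)/pi(x)) (symmetric proposal)."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n):
        y = x + scale * rng.gauss(0.0, 1.0)
        log_ratio = log_pi(y) - log_pi(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y  # accept; otherwise stay at x
        out.append(x)
    return out

# Target: standard Gaussian, log pi(x) = -x^2/2 up to a constant
chain = metropolis(lambda x: -0.5 * x * x, x0=3.0)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print(round(mean, 2), round(var, 2))
```

The empirical mean and variance should settle near 0 and 1 respectively, the moments of the target.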

2.2. Measure-theoretic approach


We follow Tierney (1998) closely in presenting a treatment of the Hastings ratio for general discrete-time Markov chains, since in stochastic geometry one frequently deals with chains on quite complicated state spaces. Here is the key detailed balance calculation in general form: compare the “fluxes” π(dx)p(x, dy) and π(dy)p(y, dx). Both are absolutely continuous with respect to their sum: set

Λ(y, x) / (Λ(y, x) + 1) = π(dx)p(x, dy) / (π(dx)p(x, dy) + π(dy)p(y, dx))

defining a possibly infinite Λ(y, x) up to nullsets of π(dx)p(x, dy) + π(dy)p(y, dx). Now Λ(y, x) is positive off a null-set N1 of π(dx)p(x, dy) and finite off a null-set N2 of π(dy)p(y, dx). Moreover N1 and N2 can be chosen as transposes of each other under the symmetry (x, y) ↔ (y, x), because they are defined in terms of whether Λ(y, x)/(Λ(y, x) + 1) is 0 or 1, and so we may arrange1 for

Λ(x, y)/(Λ(x, y) + 1) + Λ(y, x)/(Λ(y, x) + 1) = 1 .

1 working to null-sets of π(dx)p(x, dy) + π(dy)p(y, dx) throughout!

Applying simple algebra, we deduce Λ(y, x)Λ(x, y) = 1 off N1 ∪ N2. We know 0 < Λ(y, x), Λ(x, y) < ∞ off N1 ∪ N2, so

min{1, Λ(x, y)} × Λ(y, x)/(Λ(y, x) + 1)
    = min{Λ(x, y), 1} × Λ(y, x)/(Λ(y, x) + 1)
    = min{Λ(y, x)Λ(x, y), Λ(y, x)} / (Λ(y, x) + 1)
    = min{1, Λ(y, x)} × Λ(x, y)/(Λ(x, y) + 1)        (2.1)

(using Λ(y, x)Λ(x, y) = 1 and 1/(Λ(y, x) + 1) = Λ(x, y)/(Λ(x, y) + 1)), and therefore

min{1, Λ(x, y)} π(dx)p(x, dy) = min{1, Λ(y, x)} π(dy)p(y, dx) .

The kernel min{1, Λ(x, y)}p(x, dy) may have a deficit (integral over y less than one): compensate by adding an atom

(1 − ∫ min{1, Λ(x, u)}p(x, du)) δx(dy)

to obtain a reversible kernel (actually a Metropolis-Hastings kernel!)

(1 − ∫ min{1, Λ(x, u)}p(x, du)) δx(dy) + min{1, Λ(x, y)}p(x, dy)


In case π(dx)p(x, dy) and π(dy)p(y, dx) are absolutely continuous with respect to each other,

Λ(x, y) = π(dy)p(y, dx) / (π(dx)p(x, dy))        (2.2)

is exactly the Hastings ratio. Otherwise it arises via the Metropolis map L1 projection arguments of Billera and Diaconis (2001), measurable case.


We note in passing that the Hastings ratio (as opposed to other possible accept/reject rules) yields the largest spectral gap and minimizes the asymptotic variance of additive functionals (Peskun 1973; Tierney 1998).

2.3. MCMC for Markov point processes


We now apply the above to simulation of a point process in a bounded window, whose distribution2 has density f(ξ) with respect to a Poisson point process measure π1(dξ) of unit intensity (Geyer and Møller 1994; Green 1995). We need to design a point-pattern-valued Markov chain whose limiting distribution will be f(ξ)π1(dξ), and we will constrain the Markov chain to evolve solely by adding and deleting points. Let λ(x; ξ) = f(ξ ∪ {x})/f(ξ) be the Papangelou conditional intensity for the target point process, as defined in Equation (1.15). Suppose the point process satisfies some condition such as local stability, namely

λ(x; ξ) ≤ γ        (2.3)

for some positive constant γ. We can weaken this: we need require only that ∫ λ(u; ξ) du remain suitably bounded.

Exercise 2.6 Take as target the Poisson point process of rate β > 0. Write down the simulation algorithm. Check that the resulting kernel is reversible with respect to πβ, the distribution measure of the Poisson point process of rate β.

There is a very easy approach: recognize that there can be no spatial interaction, so it suffices to set up detailed balance equations for the Markov chain which counts the total number of points . . . .

2.3.2. Continuous time case


In the treatment above it is tempting to make αn as large as possible, to maximize the rate of change per time-step. However we can also consider the limiting case of a continuous-time model and reformulate the simulation in event-based terms, thus producing a spatial birth-and-death process. For small δ, set αn = (n + 1)δ for small n, so as to maintain Equation (2.4). (For larger n use smaller αn . . . .) Then:

• suppose the current value of X is X = ξ. While #(ξ) is not too large, choose from:
  – Death: with probability #(ξ)δ, choose a point y of ξ at random and delete it from ξ.
  – Birth: generate a new point x with sub-probability density λ(x; ξ)δ, and add it to ξ (in case of probability deficit, do nothing!).

Taking the limit as δ → 0, and reformulating as a continuous time Markov chain:


• suppose the current value of X is X = ξ;
• wait an Exponential(#(ξ) + ∫_W λ(x; ξ) dx) random time, then
  – Death: with probability #(ξ)/(#(ξ) + ∫_W λ(x; ξ) dx), choose a point y of ξ at random and delete it from ξ;
  – Birth: otherwise generate a new point x with probability density proportional to λ(x; ξ), and add it to ξ.


Exercise 2.7 Verify that the above algorithm produces an invariant distribution with Papangelou conditional intensity λ(x; ξ).


Exercise 2.8 Determine a spatial birth-and-death process with invariant distribution which is a Strauss point process.

Example of Strauss process target.


Exercise 2.9 Determine a spatial birth-and-death process with invariant distribution which is an area-interaction point process.


Note that the technique underlying these exercises suggests the following: a local stability bound λ(x; ξ) ≤ M implies that the point process can be expressed as some complicated and dependent thinning of a Poisson point process of intensity M . We shall substantiate this in Chapter 3, when we come to discuss perfect simulation for point processes.
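Exercise 2.8 and the thinning remark above can be prototyped together. The sketch below (parameters β, γ, r, the unit-square window, the run length, and the helper name `strauss_cond_intensity` are all illustrative assumptions, not from the lectures) runs a spatial birth-and-death process whose births are proposed from a dominating Poisson rate β and thinned by the Strauss conditional intensity, with deaths at unit rate per point:

```python
import random, math

def strauss_cond_intensity(x, xi, beta, gamma, r):
    """Papangelou conditional intensity for the Strauss process:
    lambda(x; xi) = beta * gamma**t, t = number of r-close points of xi."""
    t = sum(1 for y in xi if math.dist(x, y) < r)
    return beta * gamma ** t

def birth_death_strauss(beta=50.0, gamma=0.5, r=0.1, horizon=200.0, seed=3):
    """Spatial birth-and-death process on the unit square: deaths at rate 1
    per point; births proposed at dominating rate beta and accepted with
    probability lambda(x; xi)/beta, i.e. a thinning of Poisson(beta)."""
    rng = random.Random(seed)
    xi, t = [], 0.0
    while True:
        total_rate = len(xi) + beta   # total death rate + dominating birth rate
        t += rng.expovariate(total_rate)
        if t > horizon:
            return xi
        if rng.random() < len(xi) / total_rate:        # death event
            xi.pop(rng.randrange(len(xi)))
        else:                                          # proposed birth
            x = (rng.random(), rng.random())
            if rng.random() < strauss_cond_intensity(x, xi, beta, gamma, r) / beta:
                xi.append(x)

pattern = birth_death_strauss()
print(len(pattern))
```

The acceptance ratio λ(x; ξ)/β ≤ 1 is exactly the local stability bound in action: the Strauss pattern appears as a dependent thinning of the dominating Poisson process of intensity β, as discussed above.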

2.4. Uniqueness and convergence


We have been careful to describe the target distributions as invariant distributions rather than equilibrium distributions, because we have not yet introduced useful methods to determine whether our chains actually converge to their invariant distributions! As we have already hinted in Chapter 1, this can be a problem; indeed one can write down a Markov chain which purports to have the attractive case Strauss point process as invariant distribution, even though this point process is not well-defined! The full story here involves discussions of φ-recurrence and Harris recurrence, which we reluctantly omit for reasons of time. Fortunately there is a relatively simple technique which often works to ensure uniqueness of and convergence to an invariant distribution, once existence is established. Suppose we can ensure that the chain X visits the empty-pattern state infinitely often, and with finite mean recurrence time (the empty-pattern state ∅ is an ergodic atom). Then the coupling approach to renewal theory (Lindvall 1982; Lindvall 2002) can be applied to show that the distribution of Xn converges in total variation to its invariant distribution, which therefore must be unique. This pleasantly low-tech approach foreshadows the discussion of small sets later in this chapter, and also anticipates important techniques in perfect simulation.

2.5. The MCMC zoo


The attraction of MCMC is that it allows us to produce simulation algorithms for all sorts of point processes and geometric processes. It is helpful however to have in mind a general categorization: a kind of “zoo” of different MCMC algorithms.

• Here are some important special cases of the MH sampler, classified by type of proposal distribution. Let the target distribution be π (it could be a Bayesian posterior . . . ).
  – Independence sampler: Choose q(x, y) = q(y) (independent of current location x) which you hope is “well-matched” to π;
  – Random walk sampler: Choose q(x, y) = q(y − x), a random walk increment based on the current location x. Tails of the target distribution π must not be too heavy and (e.g.) density contours not too sharply curved (Roberts and Tweedie 1996b);
  – Langevin algorithms (also called “Metropolis-adjusted Langevin”): Choose q(x, y) = q(y − x − σ²g(x)/2), where g(x) is formed using the “logarithmic gradient” of the density of π(dx) at x (possibly adjusted), and σ² is the jump variance. Behaves badly if the logarithmic gradient tends to zero at infinity; a condition requiring something of the order “tails must not be too light” (Roberts and Tweedie 1996a).


• Slice sampler: task is to draw from density f (x). Here is a one-dimensional example! (But note, this is only interesting because it can be made to work in many dimensions . . . ) Suppose f unimodal. Define g0(y), g1(y) implicitly by the requirement that [g0(y), g1(y)] is the superlevel set {x : f (x) ≥ y}.


def mcmc_slice_move(x):
    y ← uniform(0, f(x))
    return uniform(g0(y), g1(y))


Figure 2.1: Slice sampler


Roberts and Rosenthal (1999, Theorem 12) show rapid convergence (order of 530 iterations!) under a specific variation of log-concavity. Relates to notion of “auxiliary variables”.
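The move of Figure 2.1 can be sketched in runnable form. Since the exact endpoints g0(y), g1(y) of the superlevel set are usually unavailable, the sketch below uses Neal-style stepping-out and shrinkage to locate the slice (this variation, and the Gaussian test target with its parameters, are assumptions for illustration, not part of the lectures):

```python
import math, random

def slice_sample(f, x0, n=20_000, w=1.0, seed=0):
    """One-dimensional slice sampler for a unimodal density f (known up to
    a constant): draw y ~ Uniform(0, f(x)), then draw x uniformly from the
    superlevel set {x : f(x) >= y}, bracketed by stepping out then shrinking."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n):
        y = rng.uniform(0.0, f(x))
        lo, hi = x - w, x + w
        while f(lo) >= y:        # step out until [lo, hi] covers the slice
            lo -= w
        while f(hi) >= y:
            hi += w
        while True:              # shrink the bracket until the slice is hit
            z = rng.uniform(lo, hi)
            if f(z) >= y:
                x = z
                break
            if z < x:
                lo = z
            else:
                hi = z
        out.append(x)
    return out

draws = slice_sample(lambda t: math.exp(-0.5 * t * t), x0=0.0)
m = sum(draws) / len(draws)
print(round(m, 2))
```

For the unimodal Gaussian target the stepping-out bracket always contains the true superlevel interval, so the draws have the correct stationary law.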

• Simulated tempering:


– Embed the target distribution in a family of distributions indexed by some “temperature” parameter. For example, embed density f(x) (defined over a bounded region for x) in the family of normalized densities

(f(x))^{1/τ} / ∫ (f(u))^{1/τ} du ;


– Choose weights β0, β1, . . . , βk for selected temperatures τ0 = 1 < τ1 < . . . < τk ;


– Now do Metropolis-Hastings for target density


β0 (f(x))^{1/τ0} = β0 f(x)                          at level 0
β1 (f(x))^{1/τ1} / ∫ (f(u))^{1/τ1} du               at level 1
. . .
βk (f(x))^{1/τk} / ∫ (f(u))^{1/τk} du               at level k


where moves either propose changes in x, or propose changes in level i (NB: choice of the βi ?);


– Finally, sample chain only when it is at level 0.


• Markov-deterministic hybrid: Need to sample exp(−E(q))/Z. Introduce “momenta” p and aim to draw from

π(p, q) = exp(−E(q) − ½|p|²) / Z′ .

Fact: for Hamiltonian H(p, q) = E(q) + ½|p|², the dynamical system

dq = (∂H/∂p) dt = p dt
dp = −(∂H/∂q) dt = −(∂E/∂q) dt

has π(p, q) as invariant distribution (Liouville's theorem). So alternate between (a) evolving using the dynamics and (b) drawing from the Gaussian marginal for p. Discretization means (a) is not exact; this can be fixed by use of MH rejection techniques. Cf. dynamic reversibility.


• Dynamic importance weighting (Liu, Liang, and Wong 2001): augment the Markov chain X by adding an importance parameter W; replace the search for equilibrium (E[θ(Xn)] → ∫ θ(x)π(dx)) by the requirement

lim_{n→∞} E[Wn θ(Xn, Wn)] ∝ ∫ θ(x)π(dx) .


• Evolutionary Monte Carlo (Liang and Wong 2000, 2001): use ideas of genetic algorithms (multiple samples, “exchanging” sub-components) and tempering.

• and many more ideas besides . . . .

2.6. Speed of convergence

There is of course an issue as to how fast the convergence to equilibrium is. This can be analytically challenging (Diaconis 1988)!

2.6.1. Convergence and symmetry: card shuffling


Example: transposition random walk X on the permutation group of n cards. Equilibrium occurs at strong uniform times T (Aldous and Diaconis 1987):

Definition 2.10 The random stopping time T is a strong uniform time for the Markov chain X (whose equilibrium distribution is π) if

P[T = k, Xk = s] = πs × P[T = k] .

Broder: notion of checked cards. Transpose by choosing 2 cards at random: LH, RH. Check the card pointed to by LH if either LH, RH are the same unchecked card, or LH is unchecked and RH is checked.

Inductive claim: given the number of checked cards, the positions in the pack of the checked cards, and the list of values of cards, the map determining which checked card has which value is uniformly random.

Set Tm to be the time till m cards are checked. Then T = Tn is a sum of independent increments Tm+1 − Tm, and is a strong uniform time by induction. How big is T ?


Now Tm+1 − Tm is Geometrically distributed with success probability (n − m)(m + 1)/n². So the mean of T is

E[T] = Σ_{m=0}^{n−1} E[Tm+1 − Tm] = (n²/(n + 1)) Σ_{m=0}^{n−1} ( 1/(n − m) + 1/(m + 1) ) ≈ 2n log n .
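The checked-cards argument above is easy to simulate. In the sketch below (deck size n = 52 and the number of repetitions are arbitrary illustrative choices), only the checked statuses matter for the timing, since LH and RH are uniform over positions; the permutation itself need not be tracked:

```python
import random, math

def checked_time(n, rng):
    """Time for Broder's checked-cards process: pick positions LH, RH
    uniformly; check the LH card if LH == RH and it is unchecked, or if
    LH is unchecked and RH is checked."""
    checked = [False] * n
    m, t = 0, 0
    while m < n:
        t += 1
        lh, rh = rng.randrange(n), rng.randrange(n)
        if not checked[lh] and (lh == rh or checked[rh]):
            checked[lh] = True
            m += 1
    return t

rng = random.Random(1)
n, reps = 52, 300
est = sum(checked_time(n, rng) for _ in range(reps)) / reps
print(round(est), round(2 * n * math.log(n)))
```

With m cards checked, a check occurs with probability (n − m)(m + 1)/n² per step, matching the Geometric increments above; the simulated mean sits near (though somewhat above) the 2n log n asymptotic for n = 52.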


MOREOVER: T = Σ_{m=0}^{n−1} (Tm+1 − Tm) is a discrete version of the time to complete infection for a simple epidemic (without recovery) in n individuals, starting with 1 infective, with individuals contacting at rate n². A classic calculation tells us T/(2n log n) has a limiting distribution. NOTE: group representation theory gives the correct asymptotic, ½ n log n.


The MCMC approach to substitution decoding also produces a Markov chain moving by transpositions on a permutation group of around n = 60 symbols. However it takes much longer to reach equilibrium than order of 60 log(60) ≈ 250 transpositions. Conditioning on data can have a drastic effect on convergence rates!

2.6.2. Convergence and small sets


The most accessible theory (as expounded for example in Meyn and Tweedie 1993) uses the notion of small sets for Markov chains. In effect, small set theory allows us to argue as if (suitably regular) Markov chains are actually discrete, even if defined on very general state-spaces. In fact this discreteness can even be formalized as the existence of a latent discrete structure.

Definition 2.11 C is a small set (of order k) if there is a positive integer k, positive c and a probability measure µ such that for all x ∈ X and E ⊆ X

p⁽ᵏ⁾(x, E) ≥ c × I[x ∈ C] µ(E) .

Markov chains can be viewed as regenerating at small sets. The notion is useful for recurrence conditions of Lyapunov type: compare the low-tech “empty pattern” method. Small sets play a crucial rôle in notions of periodicity, recurrence, positive recurrence, and geometric ergodicity, making the link to discrete state-space theory (Meyn and Tweedie 1993).


Foster-Lyapunov recurrence on small sets (Mykland, Tierney, and Yu 1995; Roberts and Tweedie 1996a; Roberts and Tweedie 1996b; Roberts and Rosenthal 2000; Roberts and Rosenthal 1999; Roberts and Polson 1994)


Theorem 2.12 (Meyn and Tweedie 1993) Positive recurrence holds if one can find a small set C, a constant β > 0, and a non-negative function V bounded on C such that for all n > 0

E[V(Xn+1) | Xn] ≤ V(Xn) − 1 + β I[Xn ∈ C] .        (2.6)

Proof: Let N be the random time at which X first (re-)visits C. It suffices3 to show E[N | X0] < V(X0) + constant < ∞ (then use small-set regeneration). By iteration of (2.6), we deduce E[V(Xn) | X0] < ∞ for all n. If X0 ∉ C then (2.6) tells us that n ↦ V(Xn∧N) + n ∧ N defines a nonnegative supermartingale (since I[Xn∧N ∈ C] = 0 if n < N). Consequently

E[N | X0] ≤ E[V(XN) + N | X0] ≤ V(X0) .

3 This martingale approach can be reformulated as an application of Dynkin's formula.

If X0 ∈ C then the above can be used to show

E[N | X0] = E[1 × I[X1 ∈ C] | X0] + E[E[N | X1] I[X1 ∉ C] | X0]
    ≤ P[X1 ∈ C | X0] + E[1 + V(X1) | X0]
    ≤ P[X1 ∈ C | X0] + V(X0) + β

where the last step uses Inequality (2.6) applied when I[Xn−1 ∈ C] = 1.  □

This idea can be extended to give bounds on convergence rates.


Theorem 2.13 (Meyn and Tweedie 1993) Geometric ergodicity holds if one can find a small set C, positive constants λ < 1, β, and a function V ≥ 1 bounded on C such that

E[V(Xn+1) | Xn] ≤ λV(Xn) + β I[Xn ∈ C] .        (2.7)

Proof: Define N as in Theorem 2.12. Iterating (2.7), we deduce E[V(Xn) | X0] < ∞ and more specifically that

n ↦ V(Xn∧N)/λ^{n∧N}

is a nonnegative supermartingale. Consequently

E[V(XN)/λ^N | X0] ≤ V(X0) .

Using the facts that V ≥ 1, λ ∈ (0, 1) and Markov's inequality we deduce

P[N > n | X0] ≤ λⁿ V(X0) ,

which delivers the required geometric ergodicity.  □


Bounds may not be tight, but there are many small sets for most Markov chains (not just the trivial singleton sets, either!). In fact small sets can be used to show:

Theorem 2.14 (Kendall and Montana 2002) If the Markov chain has a measurable transition density p(x, y) then the two-step density p⁽²⁾(x, y) can be expressed (non-uniquely) as a non-negative countable sum

p⁽²⁾(x, y) = Σᵢ fᵢ(x) gᵢ(y) .

Hence in such cases small sets of order 2 abound!


This relates to the classical theory of small sets and would not surprise its originators; see for example Nummelin 1984.


Proof: Key Lemma, a variation of Egoroff's Theorem: Let p(x, y) be an integrable function on [0, 1]². Then we can find subsets Aε ⊂ [0, 1], increasing as ε decreases, such that


(a) for any fixed Aε the “L¹-valued function” px is uniformly continuous on Aε: for any η > 0 we can find δ > 0 such that |x − x′| < δ and x, x′ ∈ Aε implies

∫₀¹ |px(z) − px′(z)| dz < η ;

(b) every point x in Aε is of full relative density: as u, v → 0 so

Leb([x − u, x + v] ∩ Aε) / (u + v) → 1 .  □

Moreover one can discover latent discrete Markov chains representing all such Markov chains at period 2.

However one cannot get below order 2:


Theorem 2.15 (Kendall and Montana 2002) There are Markov chains which possess no small sets of order 1.

Proof: Key Theorem: There exist Borel measurable subsets E ⊂ [0, 1]² of positive area which are rectangle-free, so that if A × B ⊆ E then Leb(A × B) = 0. (Stirling's formula, discrete approximations, Borel-Cantelli.)  □


Figure 2.2: No small sets!


Chapter 3

Perfect simulation

Contents

3.1 CFTP . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2 Fill's (1998) method . . . . . . . . . . . . . . . . 92
3.3 Sundry other matters . . . . . . . . . . . . . . . . 94


Establishing bounds on convergence to equilibrium is important if we are to get a sense of how to do MCMC. It would be very useful if one could finesse the whole issue by modifying the MCMC algorithm so as to produce an exact draw from equilibrium, and this is precisely the idea of perfect simulation. A graphic illustration of this in the context of stochastic geometry (specifically, the dead leaves model) is to be found at http://www.warwick.ac.uk/statsdept/staff/WSK/dead.html.

3.1. CFTP


There are two main approaches to perfect simulation. One is Coupling from the Past (CFTP), introduced by Propp and Wilson (1996). Many variations have arisen: we will describe the original Classic CFTP, and two variants Small-set CFTP and Dominated CFTP. Part of the fascination of this area is the interplay between theory and actual implementation: we will present illustrative simulations.

3.1.1. Classic CFTP


Consider reflecting random walk run From The Past (Propp and Wilson 1996):

• no fixed start: run all starts simultaneously (Coupling From The Past);
• extend past till all starts coalesce by t = 0;


• 3-line proof: perfect draw from equilibrium.


Figure 3.1: Classic CFTP.

Implementation:
- “re-use” randomness;


- sample at t = 0, not at coalescence;
- control coupling efficiently using monotonicity, boundedness;
- uniformly ergodic.
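These implementation points can be sketched for the reflecting random walk itself. The sketch below (state-space size K = 10, the doubling schedule, and the sample size are arbitrary illustrative choices) re-uses the randomness, samples at t = 0, and exploits monotonicity by sandwiching all starts between the minimal and maximal ones:

```python
import random

K = 10  # state space {0, ..., K}

def update(x, u):
    """Monotone update for the reflecting walk: step +1 if u < 1/2,
    else -1, holding at the boundaries (equilibrium: uniform on 0..K)."""
    step = 1 if u < 0.5 else -1
    return min(max(x + step, 0), K)

def cftp(seed):
    """Classic CFTP (Propp-Wilson): extend the past, re-using randomness,
    until the maximal and minimal starts have coalesced by time 0."""
    rng = random.Random(seed)
    us = []            # us[i] drives the transition from time -(i+1) to -i
    t = 1
    while True:
        while len(us) < t:
            us.append(rng.random())
        lo, hi = 0, K  # all starts lie between these two (monotonicity)
        for i in range(t - 1, -1, -1):
            lo, hi = update(lo, us[i]), update(hi, us[i])
        if lo == hi:
            return lo  # value at time 0: a perfect equilibrium draw
        t *= 2

sample = [cftp(s) for s in range(1000)]
mean = sum(sample) / len(sample)
print(mean)
```

Detailed balance shows the equilibrium of this walk is uniform on {0, ..., K}, so the perfect draws should average about K/2.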


Here is the 3-line proof, with a couple of extra lines to help explain!


Theorem 3.1 If coalescence is almost sure then CFTP delivers a sample from the equilibrium distribution of the Markov chain X corresponding to the random input-output maps F(−u,v] .


Proof: For each time-range [−n, ∞) use the input-output maps F(−n,t] to define

X_t^{−n} = F(−n,t](0)   for −n ≤ t .

We have assumed a finite coalescence time −T for F. Then

X_0^{−n} = X_0^{−T}   whenever −n ≤ −T ;        (3.1)
L(X_0^{−n}) = L(X_n^0) .        (3.2)

If X converges to an equilibrium π then

dist(L(X_0^{−T}), π) = lim_n dist(L(X_0^{−n}), π) = lim_n dist(L(X_n^0), π) = 0        (3.3)

(dist is total variation distance), hence giving the required result.  □


Here is a simple version of perfect simulation applied to the Ising model, following Propp and Wilson (1996) (but they do something much cleverer!). Here the probability of a given ±1-valued configuration S is proportional to

exp( β Σ_{x∼y} Sx Sy )


and our recipe is: pick a site x, compute the conditional probability of Sx = +1 given the rest of the configuration S,

ρ = exp(β Σ_{y:y∼x} Sy) / ( exp(β Σ_{y:y∼x} Sy) + exp(−β Σ_{y:y∼x} Sy) ) = 1 / (1 + exp(−2β Σ_{y:y∼x} Sy)) ,

and set Sx = +1 with probability ρ, else set Sx = −1. Boundary conditions are “free” in the simulation: for small β we can make a perfect draw from the Ising model over the infinite lattice, by coupling in space as well as time (Kendall 1997b; Häggström and Steif 2000). We can do better for Ising: Huber (2003) shows how to convert Swendsen-Wang methods to CFTP.

3.1.2. A little history


Kolmogorov (1936):1 Consider an inhomogeneous (finite state) Markov transition kernel Pij(s, t). Can it be represented as the kernel of a Markov chain begun in the indefinite past? Yes! Apply diagonalization arguments to select a convergent subsequence from the probability systems arising from progressively earlier and earlier starts:

P[X(0) = k] = Qk(0)
P[X(−1) = j, X(0) = k] = Qj(−1) Pjk(−1, 0)
P[X(−2) = i, X(−1) = j, X(0) = k] = Qi(−2) Pij(−2, −1) Pjk(−1, 0)
. . .

1 Thanks to Thorisson (2000) for this reference . . . .

3.1.3. Dead Leaves


Nearly the simplest possible example of perfect simulation.

• Dead leaves falling in the forests at Fontainebleau;
• Equilibrium distribution models complex images;


• Computationally amenable even in classic MCMC framework; JJ

II

J

I

• But can be sampled perfectly with no extra effort! (And uniformly ergodic) How can this be done? see Kendall and Th¨onnes (1999) [HINT: try to think non-anthropomorphically . . . .] Occlusion CFTP. Related work: Jeulin (1997), Lee, Mumford, and Huang (2001).

Page 82 of 163

Go Back

Full Screen

Close

Figure 3.2: Falling leaves of perfection. Quit

3.1.4. Small-set CFTP Home Page

Does CFTP depend on monotonicity? No: we can use small sets. Consider a simple Markov chain on [0, 1] with triangular kernel.


Figure 3.3: Small-set CFTP.


• Note “fixed” small triangle;
• this can couple Markov chain steps;
• basis of small-set CFTP (Murdoch and Green 1998).

Extends to more general cases (“partitioned multi-gamma sampler”). Uniformly ergodic.
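A sketch of the single-small-set case (the particular kernel is an illustrative choice of mine): write the transition density as p(y|x) = ε q(y) + (1 − ε) r(y|x), with q not depending on x. Scanning back from time 0, the most recent ε-regeneration couples all chains at once, so the output is a Geometric(ε) number of residual-kernel draws applied to a single draw from q — the representation of Exercise 3.2 below.

```python
import random

def small_set_cftp(eps, rng):
    """One-small-set CFTP (multigamma coupler) on [0,1].  Each update
    uses the minorizing kernel q = Uniform(0,1) with probability eps,
    else the residual kernel r(.|x) = Uniform(x/2, x/2 + 1/2) (an
    arbitrary illustrative choice).  Scanning back from time 0, the
    number of steps to the most recent regeneration is Geometric(eps)."""
    k = 1
    while rng.random() >= eps:      # look one step further into the past
        k += 1
    x = rng.random()                # regeneration draw from q at time -k
    for _ in range(k - 1):          # the k-1 more recent steps use r(.|x)
        x = x / 2.0 + rng.random() / 2.0
    return x
```

With ε = 1/2 and this residual kernel the stationary mean works out to exactly 1/2, which makes a convenient sanity check.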


Exercise 3.2 Write out the details of small-set CFTP. Deduce how to represent the output of simple small-set CFTP (uses just one small set) in terms of a Geometric random variable and draws from a single kernel.


Exercise 3.3 One small set is usually not enough - coalescence takes much too long! Work out how to use several small sets to do partitioned small-set CFTP (“partitioned multi-gamma sampler”) (Murdoch and Green 1998).

3.1.5. Point patterns


Recall Widom-Rowlinson model (Widom and Rowlinson 1970) from physics, “area-interaction” model in statistics (Baddeley and van Lieshout 1995). The point pattern X has density (with respect to Poisson process):

exp( − log(γ) × (area of union of discs centred on X) )


Figure 3.4: Perfect area-interaction.

• Represent as two-type pattern (red crosses, blue disks);


• inspect structure to detect monotonicity;
• attractive case γ > 1: CFTP uses “bounded” state-space (Häggström et al. 1999) in the sense of being uniformly ergodic.

All cases can be dealt with using Dominated CFTP (Kendall 1998).

3.1.6. Dominated CFTP


Foss and Tweedie (Foss and Tweedie 1998) show classic CFTP is essentially equivalent to uniform ergodicity (rate of convergence to equilibrium does not depend on start-point). Classic CFTP needs vertical coalescence: every possible start from time −T leads to the same result at time 0. Can we lift this uniform ergodicity requirement?


Figure 3.5: Horizontal CFTP.


CFTP almost works with horizontal coalescence: all sufficiently early starts from a specific location * lead to same result at time 0. But how to identify when this has happened?


Recall Lindley's equation for the GI/G/1 queue, the identity

Wn+1 = max{0, Wn + Sn − Xn+1} = max{0, Wn + ηn}

and its development

Wn+1 = max{0, ηn, ηn + ηn−1, . . . , ηn + ηn−1 + . . . + η1}

leading by time-reversal to

Wn+1 = max{0, η1, η1 + η2, . . . , η1 + η2 + . . . + ηn}

and thus the steady-state expression

W∞ = max{0, η1, η1 + η2, . . .} .

Loynes (1962) discovered a coupling application to queues with general inputs, which pre-figures CFTP.
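The development above is in fact a pathwise identity between the forward recursion and the maximum of reversed partial sums, easy to check in code (a small sketch; η is any array of increments ηk = Sk − Xk+1):

```python
def lindley_forward(etas):
    """W_{n+1} via the recursion W_{k+1} = max{0, W_k + eta_k}, W_1 = 0."""
    w = 0.0
    for eta in etas:
        w = max(0.0, w + eta)
    return w

def reversed_maximum(etas):
    """W_{n+1} = max{0, eta_n, eta_n + eta_{n-1}, ..., eta_n + ... + eta_1}."""
    best = total = 0.0
    for eta in reversed(etas):
        total += eta
        best = max(best, total)
    return best

print(lindley_forward([1.0, -2.0, 3.0]))   # 3.0
print(reversed_maximum([1.0, -2.0, 3.0]))  # 3.0
```

Loynes' time-reversal then replaces the reversed partial maxima by forward ones — same distribution, and monotone in n, giving the perfect steady-state draw W∞.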


Dominated CFTP idea: Generate target chains using a dominating process for which equilibrium is known. Domination allows us to identify horizontal coalescence by checking starts from maxima given by the dominating process. Thus we can apply CFTP to Markov chains which are merely geometrically ergodic (Kendall 1998; Kendall and Møller 2000; Kendall and Thönnes 1999; Cai and Kendall 2002) or worse (geometric ergodicity ≠ Dominated CFTP!).


(a) Perfect Strauss point process. (b) Perfect conditioned Boolean model.


Figure 3.6: Two examples of non-uniformly ergodic CFTP.


Description² of dominated CFTP (Kendall 1998; Kendall and Møller 2000) for the birth-death algorithm for a point process satisfying the local stability condition

λ(x; ξ) ≤ γ .


Original algorithm: points x die at unit death rate, are born in pattern ξ at total birth rate ∫_W λ(u; ξ) du, with density λ(x; ξ) / ∫_W λ(u; ξ) du, so local birth rate λ(x; ξ).

Perfect modification: underlying supply of randomness is a birth-and-death process with unit death rate, uniform birth rate γ per unit area. Mark points independently with (for example) Uniform(0, 1) marks. Define lower and upper processes L and U as exhibiting the minimal and maximal birth rates available in the range of configurations between L and U:

minimal birth rate at x = min{ λ(x; ξ) : L ⊆ ξ ⊆ U } ,
maximal birth rate at x = max{ λ(x; ξ) : L ⊆ ξ ⊆ U } .

Use marks to do CFTP till the current iteration delivers L(0) = U(0), then return the agreed configuration! Example uses Poisson point patterns as marks for area-interaction, following Kendall (1997a).
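The same construction in a toy one-dimensional reduction (entirely my own simplification, to show the mechanics): the state is just the number of points, with birth probability p(x) ≤ p_max < 1/2 standing in for λ(x; ξ) ≤ γ. The dominating chain uses p_max, has a known geometric equilibrium, and — being reversible — can be simulated into the past; marks are drawn conditionally on its transitions, and lower and upper processes use minimal and maximal birth rates over the sandwich, exactly as above.

```python
import math
import random

def dominated_cftp(p, p_max, rng):
    """Dominated CFTP for a chain on {0,1,2,...}: from x, go up with
    probability p(x) <= p_max < 1/2, else down (reflected at 0).
    The dominating chain D uses p_max; its equilibrium is geometric
    with ratio rho = p_max/(1 - p_max), and D is reversible, so it
    can be extended backwards in time using its own forward kernel."""
    rho = p_max / (1.0 - p_max)
    # D(0) ~ geometric equilibrium
    path = [int(math.log(1.0 - rng.random()) / math.log(rho))]
    marks = []   # marks[t] drives the forward step from time -(t+1) to -t
    T = 1
    while True:
        while len(marks) < T:                 # extend D further into the past
            prev = path[-1]
            older = prev + 1 if rng.random() < p_max else max(prev - 1, 0)
            path.append(older)
            if prev == older + 1:             # forward birth: mark < p_max
                marks.append(rng.random() * p_max)
            else:                             # forward death/stay: mark >= p_max
                marks.append(p_max + rng.random() * (1.0 - p_max))
        lo, hi = 0, path[T]                   # sandwich all starts below D(-T)
        for t in range(T - 1, -1, -1):        # run the sandwich forward to 0
            u = marks[t]
            vals = [x + 1 if u < p(x) else max(x - 1, 0)
                    for x in range(lo, hi + 1)]
            lo, hi = min(vals), max(vals)
        if lo == hi:                          # horizontal coalescence
            return lo
        T *= 2
```

Coalescence L(0) = U(0) is detected as lo == hi, and the agreed value is an exact equilibrium draw even though no uniform ergodicity is assumed.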


“It’s a thinning procedure, Jim, but not as we know it.” Dr McCoy, Star Trek, stardate unknown.


² Recall remark about dependent thinning!

All this suggests a general formulation: Let X be a Markov chain on X which exhibits equilibrium behaviour. Embed the state-space X in a partially ordered space (Y, ⪯) so that X is at the bottom of Y, in the sense that for any y ∈ Y, x ∈ X,

y ⪯ x implies y = x.

We may then use Theorem 3.1 to show:

Theorem 3.4 Define a Markov chain Y on Y such that Y evolves as X after it hits X; let Y(−u, t) be the value at time t of a version of Y begun at time −u,

(a) of fixed initial distribution L(Y(−T, −T)) = L(Y(0, 0)), and
(b) obeying funnelling: if −v ≤ −u ≤ t then Y(−v, t) ⪯ Y(−u, t).

Suppose coalescence occurs: P[Y(−T, 0) ∈ X] → 1 as T → ∞. Then lim Y(−T, 0) can be used for a CFTP draw from the equilibrium of X.


So how far can we go? It looks nearly possible to define the dominating process using the “Liapunov function” V of Theorem 2.13. There is an interesting link – Rosenthal (2002) draws it even closer – nevertheless:

Existence of a Liapunov function doesn't ensure dominated CFTP! The relevant Equation (2.7) is:

E[ V(Xn+1) | Xn ] ≤ λ V(Xn) + β I[Xn ∈ C] .

However one can cook up perverse examples in which this supermartingale inequality is satisfied, but there is bad failure of the stochastic dominance required to use V (X) for dominated CFTP . . . :-(.

3.2. Fill's (1998) method


The alternative to CFTP is Fill's algorithm, which at first sight appears quite different (though we here describe the beautiful work of Fill et al. 2000, which shows they are profoundly linked). The underlying notion is that of a strong uniform time T, which we have already encountered:

P[T = t, X(t) = s] = π(s) × P[T = t]

Fill’s method is best explained using “blocks” to model input-output maps representing a Markov chain.


Figure 3.7: The “block” model for input-output maps.

First recall that CFTP can be viewed in a curiously redundant fashion as follows:


• Draw from equilibrium X(−T) and run forwards;
• continue to increase T until X(0) is coalesced;
• return X(0).


Key observation: By construction, X(−T) is independent of X(0) and T so . . .

• Condition on a convenient X(0);
• Run X backwards to a fixed time −T;


• Draw blocks conditioned on the X transitions;


• If coalescence then return X(−T ) else repeat.

Figure 3.8: Conditioning the blocks on the reversed chain.

See Fill (1998), Thönnes (1999), Fill et al. (2000).

Exercise 3.5 Can you figure out how to do a dominated version of Fill's method?


Exercise 3.6 Produce a perfect version of the slice sampler! (Mira, Møller, and Roberts 2001)

3.3. Sundry other matters


Layered Multishift: Wilson (2000b) and further work by Corcoran and others. Basic idea: how to draw simultaneously from Uniform(x, x + 1) for all x ∈ R, and to try to couple the draws? Answer: draw a uniformly random unit-span integer lattice, . . . .³

Exercise 3.7 Figure out how to replace “Uniform(x, x + 1)” by any other location-shifted family of continuous densities. (Hint: mixtures of uniforms . . . .)

Montana used these ideas to do perfect simulation of nonlinear AR(1) in his PhD thesis.
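The one-line answer for the uniform case (a sketch): a single uniform draw u selects the random lattice Z + u, and each x is sent to the unique lattice point in [x, x + 1). Every output is Uniform(x, x + 1) in distribution, yet all outputs are coupled and monotone in x.

```python
import math

def multishift_uniform(xs, u):
    """Layered multishift coupler for Uniform(x, x+1): u ~ Uniform(0,1)
    selects the unit-span lattice Z + u; each x maps to the unique
    lattice point in [x, x+1)."""
    return [math.ceil(x - u) + u for x in xs]

print(multishift_uniform([0.0, 0.1, 0.9, 2.0], 0.25))
# [0.25, 0.25, 1.25, 2.25]
```

Note how the draws for x = 0.0 and x = 0.1 coincide: nearby chains are coupled together.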


³ Uniformly random lattices in dimension 2 play a part in line process theory.


Read-once randomness: Wilson (2000a). The clue is in:

Exercise 3.8 Figure out how to do CFTP, using the “input-output” blocks picture of Figure 3.7, but under the constraint that you aren't allowed to save any blocks!


Perfect simulation for Peierls contours (Ferrari et al. 2002). The Ising model can be reformulated in an important way by looking only at the contours (lines separating ±1 values). In fact these form a “non-interacting hard-core gas”, thus permitting (at least in theory) Ferrari et al. (2002) to apply their variant of perfect simulation (the Backwards-Forwards Algorithm)! Compare also Propp and Wilson (1996).


Perfect simulated tempering (Møller and Nicholls 1999; Brooks et al. 2002).


3.3.1. Efficiency


Burdzy and Kendall (2000): Coupling of couplings: . . .

|pt(x1, y) − pt(x2, y)| ≤ |P[X1(t) = y | X1(0) = x1] − P[X2(t) = y | X2(0) = x2]|
    ≤ |P[X1(t) = y | τ > t, X1(0) = x1] − P[X2(t) = y | τ > t, X2(0) = x2]| × P[τ > t | X(0) = (x1, x2)] .

Suppose |pt(x1, y) − pt(x2, y)| ≈ c exp(−µ2 t) while P[τ > t | X(0) = (x1, x2)] ≈ c exp(−µ t). Let X* be a coupled copy of X but begun at (x2, x1):

|P[X1(t) = y | τ > t, X1(0) = x1] − P[X2(t) = y | τ > t, X2(0) = x2]|
    = |P[X1(t) = y | τ > t, X(0) = (x1, x2)] − P[X1*(t) = y | τ > t, X*(0) = (x2, x1)]| .

So let σ be the time when X, X* couple; the last expression is

    ≤ P[σ > t | τ > t, X(0) = (x1, x2)]   ( ≈ c exp(−µ0 t) ) .

Thus µ2 ≥ µ0 + µ. See also Kumar and Ramesh (2001).

3.3.2. Example: perfect epidemic


Context
• many models: SIR, SEIR, Reed-Frost
• data: knowledge about removals


• draw inferences about other aspects
  – infection times
  – infectivity parameter
  – removal rate
  – ...


Perfect MCMC for Reed-Frost model

• based on explicit formula for size of epidemic
• (O'Neill 2001; Clancy and O'Neill 2002)


Question


Can we apply perfect simulation to SIR epidemic conditioned on removals?
• IDEA
  – SIR → compartment model


  – conditioning on removals is reminiscent of coverage conditioning for Boolean model
• consider SIR-epidemic-valued process, considered as event-driven simulation, driven by birth-and-event processes producing infection and removal events . . .
• then use reversibility, perpetuation, crossover to derive perfect simulation algorithm for conditioned case.

• PRO: conditioning on removals appears to make structure “monotonic”


• CON: Perpetuation and nonlinear infectivity rate combine to make funnelling difficult
• So: attach independent streams of infection proposals to each level k = I + R to “localize” perpetuation


• Speed-up by cycling through:


  – replace all unobserved marked removal proposals (thresholds)
  – try to alter all observed removal marks (perpetuate . . . )


  – replace all infection proposals (perpetuate . . . )

(Reminiscent of the H-vL-M method for perfect simulation of the attractive area-interaction point process.)

Consistent theoretical framework


View Poisson processes in time as 0:1-valued processes over “infinitesimal pixels”:

P[ X(infinitesimal pixel) = 1 ] = λ × measure(infinitesimal pixel)

• replace thresholds: work sequentially through “infinitesimal pixels”, reject replacement if it violates conditioning
• replace removal marks: can incorporate into above step
• replace infection proposals: work sequentially through “infinitesimal pixels” first at level 1, then level 2, then level 3 . . . (TV scan); reject replacement if it violates conditioning.


Theorem 3.9 Resulting construction is monotonic (so enables CFTP)


Proof: Establish monotonicity for
• fastest path of infection (what is observed)
• slowest feasible path of infection (what is used for perpetuation)  □


Implementation:


It works!
• speed-up?
• scaling?


• refine?
• generalize?

Applications
• “omnithermal” variant to produce estimate of likelihood surface
• generalization using crossover to produce full Bayesian analysis


Chapter 4

Deformable templates and vascular trees

Contents

4.1 MCMC and statistical image analysis . . . . 104
4.2 Deformable templates . . . . 106
4.3 Vascular systems . . . . 107

4.1. MCMC and statistical image analysis

How to use MCMC on images? Huge literature, beginning with Geman and Geman (1984). Simple example: reconstruct a white/black ±1 image: observation Xij = ±1, “true image” Sij = ±1, X|S independent with

P[Xij = xij | Sij] = exp(φ Sij xij) / (2 cosh(φ)) ,

while the prior for S is an Ising model, probability mass function proportional to

exp( β Σ_{(i,j)∼(i′,j′)} Sij Si′j′ ) .

Posterior for S|X proportional to

exp( β Σ_{(i,j)∼(i′,j′)} Sij Si′j′ + φ Σ_{ij} Xij Sij )

(Ising model with inhomogeneous external magnetic field φX), and we can sample from this using MCMC algorithms. Easy CFTP for this example.
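The heat-bath conditional for this posterior just adds the external-field term φ Xij to the plain Ising update (a sketch, free boundary; the function name is mine):

```python
import math

def posterior_prob_plus(S, X, i, j, beta, phi):
    """P[S_ij = +1 | rest of S, data X] for the posterior proportional
    to exp(beta sum S S' + phi sum X S): the data enter simply as an
    inhomogeneous external field phi * X_ij."""
    n = len(S)
    s = sum(S[a][b]
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < n and 0 <= b < n)
    return 1.0 / (1.0 + math.exp(-2.0 * (beta * s + phi * X[i][j])))
```

The update is monotone in S for fixed data X, so the monotone CFTP used for the plain Ising model carries over unchanged — hence “easy CFTP”.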


“Low-level image analysis”: contrast “high-level image analysis”. For example, pixel array {(i, j)} representing small squares in a window [0, 1]², and replace S by a Boolean model (Baddeley and van Lieshout 1993).


Figure 4.1: Scheme for high-level image analysis.

van Lieshout and Baddeley (2001), Loizeaux and McKeague (2001) provide perfect examples.

4.2. Deformable templates

Grenander and Miller (1994): to answer “how can knowledge in complex systems be represented?” use “deformable templates” (image algebras).

• configuration space;

• configurations (prior probability measure from interactions) → image;
• groups of similarities allow deformations;


• template is representation within similarity equivalence class;


• “jump-diffusion”: analogue of Metropolis-Hastings. Besag makes the link in the discussion of Grenander and Miller (1994), suggesting discrete-time Langevin-Metropolis-Hastings moves.

Expect to need long MCMC runs; Grenander and Miller (1994) use SIMD technology.

4.3. Vascular systems

We illustrate using an application to vascular structure in medical images (Bhalerao et al. 2001; Thönnes et al. 2002, 2003). Figures:

Figure 4.2(a): cerebral vascular system (application: preplanning of surgical procedures).

Figure 4.2(b): retinal vascular system (application: diabetic retinopathy).


(a) AVI movie of cerebral vascular system.

(b) Retinal vascular system: 400² pixel image.


Figure 4.2: Examples of vascular structures.



Plan: use random forests of planar (or spatial!) trees as configurations, since the underlying vascular system is nearly (but not exactly!) tree-like. Note: our tree edges are graph vertices in the Grenander-Miller formalism. Vary tree number, topologies and geometries using Metropolis-Hastings moves. Hence sample from the posterior so as to partly reconstruct the vascular system.

Issues:
• figuring out effective reduction of the very large datasets which can be involved, and how to link the model to the data,
• careful formulation of the prior stochastic geometry model,
• choice of the moves so as not to slow down the MCMC algorithm,
• and use of multiscale information.


4.3.1. Data


For computational feasibility, use multiresolution Fourier Transform (Wilson, Calway, and Pearson 1992) based on dyadic recursive partitioning using quadtree. (Compare wavelets.)


Figure 4.3: Multiresolution summary (planar case!).


Partitioning: based on mean-squared error criterion and the extent to which summarizing feature is elongated: alternative approach uses Akaike’s information criterion (Li and Bhalerao 2003). Figure 4.4(a) shows a typical feature, which in the planar case is a weighted bivariate Gaussian kernel, while Figure 4.4(b) shows the result of summarizing a 400 × 400 pixel retinal image using 1000 weighted bivariate Gaussian kernels.


(a) A typical feature for the MFT.

(b) 1000 kernels for a retina.

Figure 4.4: Decomposition into features using quadtree.


This procedure can be viewed as a spatial variant of a treed model (Chipman et al. 2002), a variation on CART (Breiman et al. 1984) in which different models are fitted to divisions of the data into disjoint subsets, with further divisions being made recursively until satisfactory models are obtained. Thus subsequent Bayesian inference can be viewed as conditional on the treed model inference.


Figure 4.5: Final result of the Multiresolution Fourier Transform.


4.3.2. Likelihood

Link summarized data (mixture of weighted bivariate Gaussian kernels k(x, y, e)) to configuration, assuming additive Gaussian noise:

f(x, y) = Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) + noise ,
f̃(x, y) = Σ_{e∈E} k(x, y, e) .

Split likelihood by rectangles Ri:

ℓ(F | f̃) = Σ_{i=1}^{N} ℓi(F | f̃) + constant

ℓi(F | f̃) = − (1/2σ²) Σ_{(x,y)∈Ri} [ f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) ]²
          = − (1/2σ²) ∬ [ f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) ]² hi(x, y) dx dy

Extend from Ri to all of R², enabling closed-form computation:

ℓi(F | f̃) ≈ − (1/2σ²) ∬_{R²} [ f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) ]² gi(x, y) dx dy   (4.1)

4.3.3. Prior

Prior using AR(1) processes for location and width, extending along forests of trees τ produced by subcritical binary Galton-Watson processes:

length_e = α × length_e′ + Gaussian innovation   (4.2)

for autoregressive parameter α ∈ (0, 1).

(a) sub-critical binary branching process; (b) AR(1) process on edges; (c) each edge represented by a weighted Gaussian kernel.

Figure 4.6: Components of the prior.
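Sampling from this prior is straightforward (a minimal sketch of my own; widths would be handled exactly like lengths, and no positivity constraint is imposed on the Gaussian innovations): each vertex gets each of two potential daughters independently with probability p < 1/2 — giving family sizes with probabilities (1 − p)², 2p(1 − p), p² — and each daughter edge length follows the recursion (4.2).

```python
import random

def sample_prior_tree(p, alpha, sigma, rng, root_length=1.0):
    """Subcritical binary Galton-Watson tree with AR(1) edge lengths.
    Returns the root node (a nested dict) and the total tree size."""
    root = {"length": root_length, "children": []}
    stack = [root]
    size = 1
    while stack:
        node = stack.pop()
        for _slot in range(2):          # two potential daughters
            if rng.random() < p:
                child = {"length": alpha * node["length"]
                                   + rng.gauss(0.0, sigma),
                         "children": []}
                node["children"].append(child)
                stack.append(child)
                size += 1
    return root, size
```

Subcriticality (mean family size 2p < 1) makes the tree finite almost surely, with mean total size 1/(1 − 2p).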

4.3.4. MCMC implementation

We have now formulated likelihood and model. It remains to decide on a suitable MCMC process.


Figure 4.7: Refined scheme for high-level image analysis.


Changes are local (hence slow). Compute MH accept-reject probabilities using AR(1) location distributions and

P[τ] = ∏_{n=0,1,2} ∏_{v∈τ : v has n children} P[ node has n children ]   (4.3)


Figure 4.8: Reversible jump MCMC proposals.

Computations for topological moves. Suppose the family-size distribution is


(1 − p)² ,  2p(1 − p) ,  p² .

Hastings ratio for adding a specified leaf ℓ ∈ ∂ext(τ):

H(τ; ℓ) = [ P[τ ∪ {ℓ}] / #(∂int(τ ∪ {ℓ}) ∪ ∂ext(τ ∪ {ℓ})) ] / [ P[τ] / #(∂int(τ) ∪ ∂ext(τ)) ] .

Use induction on τ: P[τ ∪ {ℓ}] = P[τ] p(1 − p), . . .

H(τ; ℓ) = p(1 − p) × ( 1 + #(V(τ)) + #(∂ext(τ)) ) / ( 2 + #(V(τ)) + #(∂ext(τ)) (+1) )

where the +1 is added to the denominator if ℓ is added to a vertex which already has one daughter. It is clear that this Hastings ratio is always smaller than p(1 − p), and is approximately equal to p(1 − p) for large trees τ. Consequently removals of leaves will always be accepted, and additions will usually be accepted, but there is a computational slowdown as the tree size increases.
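Equation (4.3) and the induction step P[τ ∪ {ℓ}] = P[τ] p(1 − p) can be checked mechanically if daughters are treated as ordered (left/right), each slot filled independently with probability p; the binary-address encoding below is my own.

```python
def tree_prob(nodes, p):
    """P[tau] under the binary Galton-Watson prior, Equation (4.3).
    nodes: a set of binary addresses; "" is the root, and v+"0", v+"1"
    are the left and right daughters of v."""
    prob = 1.0
    for v in nodes:
        for slot in ("0", "1"):
            prob *= p if v + slot in nodes else 1.0 - p
    return prob

p = 0.3
t0 = {""}                 # root only
t1 = t0 | {"0"}           # add a leaf
t2 = t1 | {"1"}           # add a second daughter of the root
print(round(tree_prob(t1, p) / tree_prob(t0, p), 6))  # 0.21 = p(1-p)
print(round(tree_prob(t2, p) / tree_prob(t1, p), 6))  # 0.21 = p(1-p)
```

Whichever external leaf is added — a first daughter or a second — the probability picks up exactly the factor p(1 − p).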


Geometric ergodicity is impossible: consider the second eigenvalue λ2 of the Markov kernel, by Rayleigh-Ritz:

1 − λ2 = inf_{φ≠0} [ Σx Σy |φ(x) − φ(y)|² p(x, y)π(y) ] / [ Σx Σy |φ(x) − φ(y)|² π(x)π(y) ]
       ≤ 2 [ Σ_{x∈S} Σ_{y∈S^c} p(x, y)π(y) ] / π(S)

whenever π(S) < 1/2. Taking S to be the set of all trees of depth n or more, and using the fact that such trees must have at least n sites, it follows that

1 − λ2 ≤ (2/n) p(1 − p) → 0 .

Experiments:

MCMC for prior model


First we implement only the moves relating to a single tree, and only to that tree's topology. We change the tree just one leaf at a time. Convergence is clearly very slow! Moreover the animation clearly indicates the problem is to do with incipient criticality phenomena.


MCMC for prior model with more global changes


Moves dealing with sub-trees will speed things up greatly! We won’t use these as such – however we will be working with a whole forest of such trees, and using more “global” moves of join and divide.


MCMC for occupancy conditioned prior model


Conditioning on data will also help! Here is a crude parody of conditioning (pixels are black/white depending on whether or not they intersect the planar tree).

4.3.5. Results


Figure 4.9 presents snapshots (over about 50k moves) of a first try, using accept/reject probabilities for a Metropolis-Hastings algorithm based on the above prior and likelihood.


Figure 4.9: AVI movie of fixed-width MCMC.

We improve the algorithm by adding


(1) a further AR(1) process to model the width of blood vessels;
(2) a Langevin component to the Metropolis-Hastings proposal distribution, biasing the proposal according to the gradient of the posterior density (following Grenander and Miller 1994 and especially Besag's remarks!);
(3) the possibility of proposing addition (and deletion) of features of length 2;


(4) gradual refinement of feature proposals from coarse to fine levels.


Figure 4.10: AVI movies of improved MCMC.

4.3.6. Refining the starting state


We improve matters by choosing a better starting state for the algorithm, using a variant of Kruskal's algorithm. This generates a minimal spanning tree of a graph as follows: Let G = (V, E) be a graph with edges weighted by cost. A spanning tree is a subgraph of G which is (a) a tree and (b) spans all vertices. We wish to find a spanning tree τ of minimal cost:

    def kruskal(edges):
        edgelist = sorted(edges)      # sort so cheapest edge is first
        tree = []
        for e in edgelist:
            if cyclefree(tree, e):    # check no cycle is created
                tree.append(e)
        return tree

Actually we need a binary tree (so amend cyclefree accordingly!) and the cost depends on the edges to which it is proposed to join e, but a heuristic modification of Kruskal's algorithm produces a binary forest which is a good candidate for a starting position for our algorithm. Figure 4.11 shows a result; notice how the algorithm begins with many small trees, which are progressively dropped or joined to other trees.
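A runnable version of the sketch, with cyclefree implemented by union-find (a standard choice, not necessarily what was used here; the (cost, u, v) edge format is mine):

```python
def kruskal(n_vertices, edges):
    """Kruskal's algorithm on vertices 0..n_vertices-1; edges are
    (cost, u, v) triples.  Union-find plays the role of cyclefree:
    an edge is accepted only if its endpoints lie in different
    components."""
    parent = list(range(n_vertices))

    def find(x):                       # root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):   # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                   # no cycle is created
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree

mst = kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)])
print(mst)   # [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```

The binary-forest variant would additionally cap vertex degrees and allow stopping before the forest merges into a single tree.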


Figure 4.11: AVI movie of MCMC using starting position provided by Kruskal’s algorithm.


Chapter 5

Multiresolution Ising models


Contents


5.1 Generalized quad-trees . . . . 125
5.2 Random clusters . . . . 129
5.3 Percolation . . . . 132
5.4 Comparison . . . . 139
5.5 Simulation . . . . 141
5.6 Future work . . . . 143


The vascular problem discussed above involves multi-resolution image analysis described, for example, in Wilson and Li (2002). This represents the image on different scales: the simplest example is that of a quad-tree (a tree graph in which each node has four daughters and just one parent), for which each node carries a random value 0 (coding the colour white) or 1 (coding the colour black), and the random values interact in the manner of a Markov point process with their neighbours. In the vascular case the 0/1 value is replaced by a line segment summarizing the linear structure which best explains the local data at the appropriate resolution level, but for now we prefer to analyze the simpler 0/1 case. Conventional multi-resolution models have interactions only running up and down the tree; however in our case we clearly need interactions between neighbours at the same resolution level as well. This complicates the analysis!


A major question is to determine the extent to which values at high resolution levels are correlated with values at lower resolution levels (Kendall and Wilson 2003). This is related to much recent interest in the question of phase transitions for Ising models on non-Euclidean random graphs. We shall show how geometric arguments lead to a schematic phase portrait.

5.1. Generalized quad-trees


Define Qd as the graph whose vertices are the cells of all dyadic tessellations of R^d, with edges connecting each cell u to its 2d neighbours, and also to its parent M(u) (the covering cell in the tessellation at the next resolution down) and its 2^d daughters (the cells which it covers in the next resolution up).


Case d = 1:


Neighbours at the same level are also connected. Remark: No spatial symmetry! However: Qd can be constructed using an iterated function system generated by elements of a skew-product group R⁺ ⊗_s R^d.

Further define:


• Qd;r as subgraph of Qd at resolution levels of r or higher;

• Qd (o) as subgraph formed by o and all its descendants.


• Remark: there are many graph-isomorphisms Su;v between Qd;r and Qd;s, with natural Zd-action;
• Remark: there are graph homomorphisms injecting Q(o) into itself, sending o to x ∈ Q(o) (semi-transitivity).

Simplistic analysis


Define Jλ to be the strength of the neighbour interaction, Jτ the strength of the parent interaction. If Sx = ±1 then the probability of a configuration is proportional to exp(−H) where

H = − (1/2) Σ_{⟨x,y⟩∈E(G)} J⟨x,y⟩ (Sx Sy − 1) ,   (5.1)

for J⟨x,y⟩ = Jλ, Jτ as appropriate. If Jλ = 0 then the free Ising model on Qd(o) is a branching process (Preston 1977; Spitzer 1975); if Jτ = 0 then the Ising model on Qd(o) decomposes into a sequence of d-dimensional classical (finite) Ising models. So we know there is a phase change at (Jλ, Jτ) = (0, ln(5/3)) (results from branching processes), and expect one at (Jλ, Jτ) = (ln(1 + √2), 0+) (results from the 2-dimensional Ising model). But is this all that there is to say?


There has been a lot of work on percolation and Ising models on non-Euclidean graphs (Grimmett and Newman 1990; Newman and Wu 1990; Benjamini and Schramm 1996; Benjamini et al. 1999; Häggström and Peres 1999; Häggström et al. 1999; Lyons 2000; Häggström et al. 2002). Mostly this concerns graphs admitting substantial symmetry (unimodular, quasi-transitive), so that one can use Häggström's remarkable Mass Transport method:


Theorem 5.1 Let Γ ⊂ Aut(G) be unimodular and quasi-transitive. Given a mass transport m(x, y, ω) ≥ 0 which is “diagonally invariant”, set M(x, y) = ∫_Ω m(x, y, ω) dω. Then the expected total transport out of any x equals the expected total transport into x:

Σ_{y∈Γx} M(x, y) = Σ_{y∈Γx} M(y, x) .

Sadly our graphs do not have quasi-transitive symmetry :-(. Furthermore the literature mostly deals with phase transition results “for sufficiently large values of the parameter”. This is an invaluable guide, but image analysis needs actual estimates!

5.2. Random clusters


A similar problem, concerning Ising models on products of trees with Euclidean lattices, is treated by Newman and Wu (1990). We follow them by exploiting the celebrated Fortuin-Kasteleyn random cluster representation (Fortuin and Kasteleyn 1972; Fortuin 1972a; Fortuin 1972b): the Ising model is the marginal site process at q = 2 of a site/bond process derived from a dependent bond percolation model with configuration probability P_{q,p} proportional to

q^C × ∏_{⟨x,y⟩∈E(G)} (p⟨x,y⟩)^{b⟨x,y⟩} (1 − p⟨x,y⟩)^{1−b⟨x,y⟩}

(where b⟨x,y⟩ indicates whether or not ⟨x,y⟩ is open, and C is the number of connected clusters of vertices). Site spins are chosen to be the same in each cluster, independently of other clusters, with equal probabilities for ±1. We find

p⟨x,y⟩ = 1 − exp(−J⟨x,y⟩) .   (5.2)

Argue for general q (Potts model): spot the Gibbs sampler!
(I) given bonds, choose connected component colours independently and uniformly from 1, . . . , q;
(II) given colouring, open bonds ⟨x, y⟩ independently with probability p⟨x,y⟩ (only between vertices of equal colour).
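Steps (I) and (II) together make one sweep of the Swendsen-Wang algorithm. A compact sketch (plain Python; the edge-list graph encoding and function names are mine). Note that in step (II) bonds may only be opened between sites of equal colour — that constraint is what makes the conditional probabilities of Exercise 5.2 come out right.

```python
import random

def sw_step(colors, edges, p, q, rng):
    """One Gibbs sweep for the random-cluster/Potts coupling:
    (II) open each bond <u,v> independently with probability p,
         but only if colour(u) == colour(v);
    (I)  recolour each resulting connected cluster independently
         and uniformly from 1..q."""
    n = len(colors)
    parent = list(range(n))

    def find(x):                    # union-find root, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:              # step (II): bond update
        if colors[u] == colors[v] and rng.random() < p:
            parent[find(u)] = find(v)

    cluster_colour = {}             # step (I): colour update
    new_colors = []
    for v in range(n):
        root = find(v)
        if root not in cluster_colour:
            cluster_colour[root] = rng.randrange(1, q + 1)
        new_colors.append(cluster_colour[root])
    return new_colors
```

For q = 2 and p⟨x,y⟩ = 1 − exp(−J⟨x,y⟩) this is exactly the Ising sampler; Huber (2003), mentioned earlier, turns such sweeps into CFTP.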


Exercise 5.2 Check step (II) gives correct conditional probabilities!

Now compute the conditional probability that site o gets colour 1, given neighbours split into components C1, . . . , Cr with colour 1, and other neighbours in set C0.


Exercise 5.3 Use inclusion-exclusion to show it is

P[ colour(o) = 1 | neighbours ] ∝ ∏_{y : colour(y) ≠ 1} (1 − p⟨o,y⟩) .

Deduce the interaction strength is J⟨x,y⟩ = − log(1 − p⟨x,y⟩), as stated in Equation (5.2) above.

FK-comparison inequalities

If q ≥ 1 and A is an increasing event then

P_{q,p}(A) ≤ P_{1,p}(A)   (5.3)
P_{q,p}(A) ≥ P_{1,p′}(A)   (5.4)

where

p′⟨x,y⟩ = p⟨x,y⟩ / ( p⟨x,y⟩ + (1 − p⟨x,y⟩)q ) = p⟨x,y⟩ / ( q − (q − 1)p⟨x,y⟩ ) .

Since P_{1,p} is bond percolation (bonds open or not independently of each other), we can find out about phase transitions by studying independent bond percolation.

5.3. Percolation


Independent bond percolation on products of trees with Euclidean lattices has been studied by Grimmett and Newman (1990), and these results were used in the Newman and Wu work on the Ising model. So we can make good progress by studying independent bond percolation on Qd, using pτ for parental bonds, pλ for neighbour bonds.

Theorem 5.4 There is almost surely no infinite cluster in Qd;0 (and consequently in Qd(o)) if

2^d τ X_λ ( 1 + √(1 − X_λ^{−1}) ) < 1 ,

where X_λ is the mean size of the percolation cluster at the origin for λ-percolation in Z^d.


Modelled on Grimmett and Newman (1990, §3 and §5); the bound comes from matrix spectral asymptotics.
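The sufficient condition of Theorem 5.4 is easy to evaluate once X_λ is known. In the sketch below X_λ is treated as a supplied input (the values are illustrative, not computed from λ), and 2^d is the reading adopted here for the reconstructed bound:

```python
import math

def no_infinite_cluster(d, tau, X_lambda):
    """Theorem 5.4 sufficient condition (as reconstructed):
    2^d * tau * X_lambda * (1 + sqrt(1 - 1/X_lambda)) < 1."""
    return (2 ** d) * tau * X_lambda * (1.0 + math.sqrt(1.0 - 1.0 / X_lambda)) < 1.0

# X_lambda = 1 (lambda = 0: every lattice cluster is a single site)
# reduces the condition to 2^d * tau < 1
assert no_infinite_cluster(2, 0.24, 1.0)
assert not no_infinite_cluster(2, 0.26, 1.0)
# a larger mean cluster size X_lambda forces tau to be smaller still
assert not no_infinite_cluster(2, 0.24, 2.0)
```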

[Schematic phase diagram in the (τ, λ) plane: no infinite clusters in a region near the origin, whose boundary involves the curve 1 + √(1 − X_λ^{−1}) and meets the τ axis near 2^{−d}; infinite clusters (may or may not be unique) beyond it; a "?" marks unresolved territory. The story so far: small λ, small to moderate τ.]

Case of small τ

Need d = 2 for mathematical convenience. Use a Borel-Cantelli argument and planar duality to show, for supercritical λ > 1/2 (that is, supercritical with respect to planar bond percolation!), that all but finitely many of the resolution layers Ln = [1, 2^n] × [1, 2^n] of Q2(o) have just one large cluster each, of diameter larger than a constant × n. Hence . . .

Theorem 5.5 When λ > 1/2 and τ is positive there is one and only one infinite cluster in Q2(o).

[Schematic phase diagram in the (τ, λ) plane for d = 2: just one unique infinite cluster for λ > 1/2 and τ > 0; no infinite clusters near the origin; infinite clusters (may or may not be unique) for larger τ (axis marked at 1/4 and 1); a "?" marks unresolved territory near λ = 1/2. The story so far: adds small τ for case d = 2.]

Uniqueness of infinite clusters

The Grimmett and Newman (1990) work was remarkable in pointing out that as τ increases there is a further phase change, from many infinite clusters to just one, for λ > 0. The work of Grimmett and Newman carries through for Qd(o). However, the relevant bound is improved by a factor of √2 if we take into account the hyperbolic structure of Qd(o)!

Theorem 5.6 If τ < 2^{−(d−1)/2} and λ > 0 then there cannot be just one infinite cluster in Qd;0.

Method: sum weights of "up-paths" in Qd;0 starting and ending at level 0. For fixed s and start point there are infinitely many such up-paths containing s λ-bonds, but no more than (1 + 2d + 2^d)^s of them which cannot be reduced by "shrinking" excursions. Hence control the mean number of open up-paths stretching more than a given distance at level 0.

Contribution to upper bound on second phase transition:

Theorem 5.7 If τ > √(2/3) then the infinite cluster of Q2;0 is almost surely unique for all positive λ.

Method: prune bonds, branching processes, 2-dimensional comparison . . .

[Schematic phase diagram in the (τ, λ) plane for d = 2: no infinite clusters near the origin; just one unique infinite cluster for λ > 1/2; many infinite clusters for small λ and moderate τ; a "?" marks unresolved territory (axis marked at τ = 1/4, 0.707, 0.816, 1 and λ = 1/2). The story so far: includes uniqueness transition for case d = 2.]

5.4. Comparison

We need to apply the Fortuin-Kasteleyn comparison inequalities (5.3) and (5.4). The event “just one infinite cluster” is not increasing, so we need more. Newman and Wu (1990) show it suffices to establish a finite island property for the site percolation derived under adjacency when all infinite clusters are removed. Thus:

[Schematic diagram in the (τ, λ) plane marking the "finite islands" region (axis marked at λ = 1/2 and τ = 1/4, 0.707, 0.816, 1).]

Comparison arguments then show the following schematic phase diagram for the Ising model on Q2(o):

[Schematic phase diagram for the Ising model on Q2(o), in the (Jτ, Jλ) plane. Annotated regions: "Unique Gibbs state"; "All nodes substantially correlated with each other in case of free boundary conditions"; "Free boundary condition is mixture of the two extreme Gibbs states (spin 1 at boundary, spin −1 at boundary)"; "Root influenced by wired boundary". Marked values: Jλ = 0.693, 1.099; Jτ = 0.288, 0.511, 0.762, 1.190, 1.228, 2.292.]

5.5. Simulation

Approximate simulations confirm the general story (compare our original animation, which had Jτ = 2, Jλ = 1/4):


http://www.dcs.warwick.ac.uk/~rgw/sira/sim.html


(1) Only 200 resolution levels;

(2) At each level, 1000 sweeps in scan order;

(3) At each level, simulate square sub-region of 128 × 128 pixels conditioned by mother 64 × 64 pixel region;

(4) Impose periodic boundary conditions on 128 × 128 square region;


(5) At the coarsest resolution, all pixels set white. At subsequent resolutions, ‘all black’ initial state.
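The scheme above can be sketched as follows, at much reduced size so it runs quickly (the actual simulation used 128 × 128 daughter regions, 1000 sweeps and 200 levels). This sketch assumes single-site heat-bath updates in scan order for ±1 spins; each daughter level starts "all black", couples to its mother pixel with strength Jτ and to its four periodic lattice neighbours with strength Jλ:

```python
import math
import random

def refine(mother, J_lambda, J_tau, sweeps, rng):
    """One resolution step: simulate the 2m x 2m daughter level conditioned on
    its m x m mother level, using heat-bath sweeps in scan order with periodic
    boundary conditions; the daughter level starts 'all black' (-1)."""
    m = len(mother)
    n = 2 * m
    s = [[-1] * n for _ in range(n)]
    for _ in range(sweeps):
        for x in range(n):
            for y in range(n):
                h = J_lambda * (s[(x - 1) % n][y] + s[(x + 1) % n][y]
                                + s[x][(y - 1) % n] + s[x][(y + 1) % n])
                h += J_tau * mother[x // 2][y // 2]       # coupling to mother pixel
                p_up = 1.0 / (1.0 + math.exp(-2.0 * h))   # heat-bath probability of +1
                s[x][y] = 1 if rng.random() < p_up else -1
    return s

rng = random.Random(2)
level = [[1] * 4 for _ in range(4)]        # coarsest resolution: all pixels white (+1)
for _ in range(3):                         # a few refinement steps: 8, 16, 32 pixels
    level = refine(level, J_lambda=0.25, J_tau=2.0, sweeps=20, rng=rng)
assert len(level) == 32 and all(v in (-1, 1) for row in level for v in row)
```

The coupling values Jλ = 0.25, Jτ = 2 mirror the original animation mentioned above.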


Simulation snapshots: (a) Jλ = 1, Jτ = 0.5; (b) Jλ = 1, Jτ = 1; (c) Jλ = 1, Jτ = 2; (d) Jλ = 0.5, Jτ = 0.5; (e) Jλ = 0.5, Jτ = 1; (f) Jλ = 0.5, Jτ = 2; (g) Jλ = 0.25, Jτ = 0.5; (h) Jλ = 0.25, Jτ = 1; (i) Jλ = 0.25, Jτ = 2.


5.6. Future work

This is about the free Ising model on Q2(o). Image analysis more naturally concerns the case of prescribed boundary conditions (say, the image at the finest resolution level . . . ). Question: will boundary conditions at "infinite fineness" propagate back to finite resolution? Series and Sinaĭ (1990) show the answer is yes for the analogous problem on the hyperbolic disk (2-dimensional, all bond probabilities the same). Gielis and Grimmett (2002) point out (e.g., in the Z^3 case) that these boundary conditions translate to a conditioning for the random-cluster model, and investigate using large deviations.

Project: do the same for Q2(o) . . . and get quantitative bounds?

Project: generalize from Qd to graphs with appropriate hyperbolic structure.


Appendix A

Notes on the proofs in Chapter 5

A.1. Notes on proof of Theorem 5.4

The mean size of the cluster at o is bounded above by

Σ_{n=0}^∞ Σ_{t:|t|=n} X_λ τ^n (X_λ − 1)^{T(t)} X_λ^{n−T(t)}
    ≤ Σ_{n=0}^∞ X_λ (τ X_λ)^n Σ_{t:|t|=n} (1 − X_λ^{−1})^{T(t)}
    ≤ Σ_{n=0}^∞ X_λ (2^d τ X_λ)^n Σ_{j:|j|=n} (1 − X_λ^{−1})^{T(j)}
    ≤ Σ_{n=0}^∞ X_λ (2^d τ X_λ)^n ( 1 + √(1 − X_λ^{−1}) )^n .

For the last step, use spectral analysis of the matrix representation

Σ_{j:|j|=n} (1 − X_λ^{−1})^{T(j)} = (1 1) [ 1 , 1 ; 1 − X_λ^{−1} , 1 ]^n (1 1)^T .
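The spectral asymptotics can be checked numerically: the top eigenvalue of the 2 × 2 matrix above is 1 + √a with a standing in for 1 − X_λ^{−1}, so the bracketed quantity grows at exactly that rate. A small self-contained check (values illustrative):

```python
import math

a = 0.6                                    # stands in for 1 - 1/X_lambda
M = [[1.0, 1.0], [a, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(n):
    """(1 1) M^n (1 1)^T, the matrix form of the sum over j with |j| = n."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        P = matmul(P, M)
    return sum(P[i][j] for i in range(2) for j in range(2))

# successive ratios converge to the top eigenvalue 1 + sqrt(a)
ratio = bracket(31) / bracket(30)
assert abs(ratio - (1.0 + math.sqrt(a))) < 1e-6
```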


A.2. Notes on proof of Theorem 5.5

Uniqueness: For the negative exponent ξ(1 − λ) of the dual connectivity function, set

ℓ_n = ( n log 4 + (2 + ε) log n ) / ξ(1 − λ) .

More than one "ℓ_n-large" cluster in L_n forces the existence of an open path in the dual lattice longer than ℓ_n. Now use Borel-Cantelli . . . . On the other hand, super-criticality will mean some distant points in L_n are interconnected.

Existence: consider 4^{n−⌊n/2⌋} points in L_{n−1} and specified daughters in L_n. Study the probability that

(a) the parent percolates more than ℓ_{n−1},
(b) parent and child are connected,
(c) the child percolates more than ℓ_n.

Now use Borel-Cantelli again . . . .


A.3. Notes on proof of Theorem 5.6

Two relevant lemmata:

Lemma A.1 Consider u ∈ L_{s+1} ⊂ Qd and v = M(u) ∈ L_s ⊂ Qd. There are exactly 2^d solutions in L_{s+1} of

M(x) = S_{u;v}(x) .

One is x = u. The others are the remaining 2^d − 1 vertices y such that the closure of the cell representing y intersects the vertex shared by the closures of the cells representing u and M(u). Finally, if x ∈ L_{s+1} does not solve M(x) = S_{u;v}(x) then

‖S_{u;v}(x) − S_{u;v}(u)‖_{s,∞} > ‖M(x) − M(u)‖_{s,∞} .

Lemma A.2 Given distinct v and y in the same resolution level, count the pairs of vertices u, x in the resolution level one step higher such that (a) M(u) = v; (b) M(x) = y; (c) S_{u;v}(x) = y. There are at most 2^{d−1} such pairs.

A.4. Notes on proof of Theorem 5.7

Prune! Then a direct connection is certainly established across the boundary between the cells corresponding to two neighbouring vertices u, v in L_0 if

(a) the τ-bond leading from u to the relevant boundary is open;
(b) a τ-branching process (formed by using τ-bonds mirrored across the boundary) survives indefinitely, where this branching process has family-size distribution Binomial(2, τ^2);
(c) the τ-bond leading from v to the relevant boundary is open.

Then there are infinitely many chances of making a connection across the cell boundary.
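The survival probability of the Binomial(2, τ²) branching process in step (b) is governed by the smallest fixed point of its generating function, which can be found by simple iteration (note the mean family size is 2τ², so indefinite survival requires τ > 1/√2):

```python
def extinction_prob(tau, iters=200):
    """Extinction probability of the Binomial(2, tau^2) branching process:
    the smallest fixed point of f(s) = (1 - tau^2 + tau^2 * s)^2,
    obtained by iterating f from s = 0."""
    t2 = tau * tau
    s = 0.0
    for _ in range(iters):
        s = (1.0 - t2 + t2 * s) ** 2
    return s

# mean family size is 2 * tau^2, so indefinite survival needs tau > 1/sqrt(2)
assert extinction_prob(0.6) > 0.999          # subcritical: dies out
assert extinction_prob(0.9) < 1.0 - 1e-3     # supercritical: survives with positive probability
```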

A.5. Notes on proof of the finite island property

Notion of the "cone boundary" ∂c(S) of a finite subset S of vertices: the collection of daughters v of S such that Qd(v) ∩ S = ∅. Use induction on S, building it up layer Ln on layer Ln−1, to obtain an isoperimetric bound: #(∂c(S)) ≥ (2^d − 1)#(S). Hence deduce

P[S in island at u] ≤ (1 − p_τ(1 − η))^{(2^d − 1)n}

where #(S) = n and η = P[u not in infinite cluster of Qd(u)]. Upper bound on the number N(n) of self-avoiding paths S of length n beginning at u_0: N(n) ≤ (1 + 2d + 2^d)(2d + 2^d)^n. Hence an upper bound on the mean size of the island:

Σ_{n=0}^∞ (1 + 2d + 2^d)(2d + 2^d)^n η_br^{n(1 − 2^{−d})} ,

where η_br is the extinction probability for the branching process based on the Binomial(2^d, p_τ) family-size distribution.
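Under the reconstructed reading of the counts above (2d lattice neighbours plus 2^d daughters per vertex), the mean island size bound is a geometric series which is easy to evaluate numerically; η_br is computed by iterating the Binomial(2^d, p_τ) generating function, and the parameter values are illustrative only:

```python
def eta_br(d, p_tau, iters=200):
    """Extinction probability of a branching process with Binomial(2^d, p_tau)
    family sizes, via iteration of its generating function from s = 0."""
    s = 0.0
    for _ in range(iters):
        s = (1.0 - p_tau + p_tau * s) ** (2 ** d)
    return s

def island_mean_size_bound(d, p_tau, n_max=200):
    """Sum of (1 + 2d + 2^d) * ((2d + 2^d) * eta_br^(1 - 2^-d))^n over n;
    finite exactly when the geometric ratio is below 1."""
    ratio = (2 * d + 2 ** d) * eta_br(d, p_tau) ** (1.0 - 2.0 ** (-d))
    if ratio >= 1.0:
        return float("inf")
    return sum((1 + 2 * d + 2 ** d) * ratio ** n for n in range(n_max))

assert island_mean_size_bound(2, 0.2) == float("inf")   # eta_br near 1: bound diverges
assert island_mean_size_bound(2, 0.99) < float("inf")   # eta_br tiny: bound is finite
```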

Bibliography

Aldous, D. and M. T. Barlow [1981]. On countable dense random sets. Lecture Notes in Mathematics 850, 311–327.

Aldous, D. J. and P. Diaconis [1987]. Strong uniform times and finite random walks. Advances in Applied Mathematics 8(1), 69–97.

Baddeley, A. J. and J. Møller [1989]. Nearest-neighbour Markov point processes and random sets. Int. Statist. Rev. 57, 89–121.

Baddeley, A. J. and M. N. M. van Lieshout [1993]. Stochastic geometry models in high-level vision. Journal of Applied Statistics 20, 233–258.

Baddeley, A. J. and M. N. M. van Lieshout [1995]. Area-interaction point processes. Annals of the Institute of Statistical Mathematics 47, 601–619.

Barndorff-Nielsen, O. E., W. S. Kendall, and M. N. M. van Lieshout (Eds.) [1999]. Stochastic Geometry: Likelihood and Computation. Number 80 in Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.

Benjamini, I., R. Lyons, Y. Peres, and O. Schramm [1999]. Group-invariant percolation on graphs. Geom. Funct. Anal. 9(1), 29–66.

Benjamini, I. and O. Schramm [1996]. Percolation beyond Z^d, many questions and a few answers. Electronic Communications in Probability 1, no. 8, 71–82 (electronic).

Bhalerao, A., E. Thönnes, W. S. Kendall, and R. G. Wilson [2001]. Inferring vascular structure from 2d and 3d imagery. In W. J. Niessen and M. A. Viergever (Eds.), Medical Image Computing and Computer-Assisted Intervention, proceedings of MICCAI 2001, Volume 2208 of Springer Lecture Notes in Computer Science, pp. 820–828. Springer-Verlag. Also: University of Warwick Department of Statistics Research Report 392.

Billera, L. J. and P. Diaconis [2001]. A geometric interpretation of the Metropolis-Hastings algorithm. Statistical Science 16(4), 335–339.

Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone [1984]. Classification and Regression Trees. Statistics/Probability. Wadsworth.

Brooks, S. P., Y. Fan, and J. S. Rosenthal [2002]. Perfect forward simulation via simulated tempering. Research report, University of Cambridge.

Burdzy, K. and W. S. Kendall [2000, May]. Efficient Markovian couplings: examples and counterexamples. The Annals of Applied Probability 10(2), 362–409. Also University of Warwick Department of Statistics Research Report 331.

Cai, Y. and W. S. Kendall [2002, July]. Perfect simulation for correlated Poisson random variables conditioned to be positive. Statistics and Computing 12, 229–243. Also: University of Warwick Department of Statistics Research Report 349.

Carter, D. S. and P. M. Prenter [1972]. Exponential spaces and counting processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 21, 1–19.


Chipman, H., E. George, and R. McCulloch [2002]. Bayesian treed models. Machine Learning 48, 299–320.

Clancy, D. and P. D. O'Neill [2002]. Perfect simulation for stochastic models of epidemics among a community of households. Research report, School of Mathematical Sciences, University of Nottingham.

Cowan, R., M. Quine, and S. Zuyev [2003]. Decomposition of gamma-distributed domains constructed from Poisson point processes. Advances in Applied Probability 35(1), 56–69.

Diaconis, P. [1988]. Group Representations in Probability and Statistics, Volume 11 of IMS Lecture Notes Series. Hayward, California: Institute of Mathematical Statistics.

Ferrari, P. A., R. Fernández, and N. L. Garcia [2002]. Perfect simulation for interacting point processes, loss networks and Ising models. Stochastic Process. Appl. 102(1), 63–88.

Fill, J. [1998]. An interruptible algorithm for exact sampling via Markov Chains. The Annals of Applied Probability 8, 131–162.

Fill, J. A., M. Machida, D. J. Murdoch, and J. S. Rosenthal [2000]. Extension of Fill's perfect rejection sampling algorithm to general chains. Random Structures and Algorithms 17(3-4), 290–316.

Fortuin, C. M. [1972a]. On the random-cluster model. II. The percolation model. Physica 58, 393–418.

Fortuin, C. M. [1972b]. On the random-cluster model. III. The simple random-cluster model. Physica 59, 545–570.

Fortuin, C. M. and P. W. Kasteleyn [1972]. On the random-cluster model. I. Introduction and relation to other models. Physica 57, 536–564.

Foss, S. G. and R. L. Tweedie [1998]. Perfect simulation and backward coupling. Stochastic Models 14, 187–203.


Gelfand, A. E. and A. F. M. Smith [1990]. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85(410), 398–409.

Geman, S. and D. Geman [1984]. Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.

Geyer, C. J. and J. Møller [1994]. Simulation and likelihood inference for spatial point processes. Scand. J. Statist. 21, 359–373.

Gielis, G. and G. R. Grimmett [2002]. Rigidity of the interface for percolation and random-cluster models. Journal of Statistical Physics 109, 1–37. See also arXiv:math.PR/0109103.

Green, P. J. [1995]. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.

Grenander, U. and M. I. Miller [1994]. Representations of knowledge in complex systems (with discussion and a reply by the authors). Journal of the Royal Statistical Society (Series B: Methodological) 56(4), 549–603.

Grimmett, G. R. and C. Newman [1990]. Percolation in ∞ + 1 dimensions. In Disorder in physical systems, pp. 167–190. New York: The Clarendon Press Oxford University Press. Also at

Hadwiger, H. [1957]. Vorlesungen über Inhalt, Oberfläche und Isoperimetrie. Berlin: Springer-Verlag.

Häggström, O., J. Jonasson, and R. Lyons [2002]. Explicit isoperimetric constants and phase transitions in the random-cluster model. The Annals of Probability 30(1), 443–473.

Häggström, O. and Y. Peres [1999]. Monotonicity of uniqueness for percolation on Cayley graphs: all infinite clusters are born simultaneously. Probability Theory and Related Fields 113(2), 273–285.

Häggström, O., Y. Peres, and R. H. Schonmann [1999]. Percolation on transitive graphs as a coalescent process: relentless merging followed by simultaneous uniqueness. In Perplexing problems in probability, pp. 69–90. Boston, MA: Birkhäuser Boston.

Häggström, O. and J. E. Steif [2000]. Propp-Wilson algorithms and finitary codings for high noise Markov random fields. Combin. Probab. Computing 9, 425–439.

Häggström, O., M. N. M. van Lieshout, and J. Møller [1999]. Characterisation results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli 5(5), 641–658. Was: Aalborg Mathematics Department Research Report R-96-2040.

Hall, P. [1988]. Introduction to the theory of coverage processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: John Wiley & Sons.

Hastings, W. K. [1970]. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.

Himmelberg, C. J. [1975]. Measurable relations. Fundamenta mathematicae 87, 53–72.

Huber, M. [2003]. A bounding chain for Swendsen-Wang. Random Structures and Algorithms 22(1), 43–59.

Hug, D., M. Reitzner, and R. Schneider [2003]. The limit shape of the zero cell in a stationary Poisson hyperplane tessellation. Preprint 03-03, Albert-Ludwigs-Universität Freiburg.

Jeulin, D. [1997]. Conditional simulation of object-based models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, pp. 271–288. World Scientific.

Kallenberg, O. [1977]. A counterexample to R. Davidson's conjecture on

line processes. Math. Proc. Camb. Phil. Soc. 82(2), 301–307.

Kelly, F. P. and B. D. Ripley [1976]. A note on Strauss's model for clustering. Biometrika 63, 357–360.

Kendall, D. G. [1974]. Foundations of a theory of random sets. In E. F. Harding and D. G. Kendall (Eds.), Stochastic Geometry. John Wiley & Sons.

Kendall, W. S. [1997a]. On some weighted Boolean models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, Singapore, pp. 105–120. World Scientific. Also: University of Warwick Department of Statistics Research Report 295.

Kendall, W. S. [1997b]. Perfect simulation for spatial point processes. In Bulletin ISI, 51st session proceedings, Istanbul (August 1997), Volume 3, pp. 163–166. Also: University of Warwick Department of Statistics Research Report 308.

Kendall, W. S. [1998]. Perfect simulation for the area-interaction point process. In L. Accardi and C. C. Heyde (Eds.), Probability Towards 2000, New York, pp. 218–234. Springer-Verlag. Also: University of Warwick Department of Statistics Research Report 292.

Kendall, W. S. [2000, March]. Stationary countable dense random sets. Advances in Applied Probability 32(1), 86–100. Also University of Warwick Department of Statistics Research Report 341.

Kendall, W. S. and J. Møller [2000, September]. Perfect simulation using dominating processes on ordered state spaces, with application to locally stable point processes. Advances in Applied Probability 32(3), 844–865. Also University of Warwick Department of Statistics Research Report 347.

Kendall, W. S. and G. Montana [2002, May]. Small sets and Markov transition densities. Stochastic Processes

and Their Applications 99(2), 177–194. Also University of Warwick Department of Statistics Research Report 371.

Kendall, W. S. and E. Thönnes [1999]. Perfect simulation in stochastic geometry. Pattern Recognition 32(9), 1569–1586. Also: University of Warwick Department of Statistics Research Report 323.

Kendall, W. S., M. van Lieshout, and A. Baddeley [1999]. Quermass-interaction processes: conditions for stability. Advances in Applied Probability 31(2), 315–342. Also University of Warwick Department of Statistics Research Report 293.

Kendall, W. S. and R. G. Wilson [2003, March]. Ising models and multiresolution quad-trees. Advances in Applied Probability 35(1), 96–122. Also University of Warwick Department of Statistics Research Report 402.

Kingman, J. F. C. [1993]. Poisson processes, Volume 3 of Oxford Studies in Probability. New York: The Clarendon Press Oxford University Press. Oxford Science Publications.

Kolmogorov, A. N. [1936]. Zur Theorie der Markoffschen Ketten. Math. Annalen 112, 155–160.

Kovalenko, I. [1997]. A proof of a conjecture of David Kendall on the shape of random polygons of large area. Kibernet. Sistem. Anal. (4), 3–10, 187. English translation: Cybernet. Systems Anal. 33 461-467.

Kovalenko, I. N. [1999]. A simplified proof of a conjecture of D. G. Kendall concerning shapes of random polygons. J. Appl. Math. Stochastic Anal. 12(4), 301–310.

Kumar, V. S. A. and H. Ramesh [2001]. Coupling vs. conductance for the Jerrum-Sinclair chain. Random Structures and Algorithms 18(1), 1–17.

Kurtz, T. G. [1980]. The optional sampling theorem for martingales indexed by directed sets. The Annals of Probability 8, 675–681.


Lee, A. B., D. Mumford, and J. Huang [2001]. Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model. International Journal of Computer Vision 41, 35–59.

Li, W. and A. Bhalerao [2003]. Model based segmentation for retinal fundus images. In Proceedings of Scandinavian Conference on Image Analysis SCIA'03, Göteborg, Sweden, pp. to appear.

Liang, F. and W. H. Wong [2000]. Evolutionary Monte Carlo sampling: Applications to Cp model sampling and change-point problem. Statistica Sinica 10, 317–342.

Liang, F. and W. H. Wong [2001, June]. Real-parameter evolutionary Monte Carlo with applications to Bayesian mixture models. Journal of the American Statistical Association 96(454), 653–666.

Lindvall, T. [1982]. On coupling of continuous-time renewal processes. Journal of Applied Probability 19(1), 82–89.

Lindvall, T. [2002]. Lectures on the coupling method. Mineola, NY: Dover Publications Inc. Corrected reprint of the 1992 original.

Liu, J. S., F. Liang, and W. H. Wong [2001, June]. A theory for dynamic weighting in Monte Carlo computation. Journal of the American Statistical Association 96(454), 561–573.

Loizeaux, M. A. and I. W. McKeague [2001]. Perfect sampling for posterior landmark distributions with an application to the detection of disease clusters. In Selected Proceedings of the Symposium on Inference for Stochastic Processes, Volume 37 of Lecture Notes - Monograph Series, pp. 321–331. Institute of Mathematical Statistics.

Loynes, R. [1962]. The stability of a queue with non-independent interarrival and service times. Math. Proc. Camb. Phil. Soc. 58(3), 497–520.

Lyons, R. [2000]. Phase transitions on nonamenable graphs. J. Math. Phys. 41(3), 1099–1126. Probabilistic techniques in equilibrium and nonequilibrium statistical physics.

Matheron, G. [1975]. Random Sets and Integral Geometry. Chichester: John Wiley & Sons.

Meester, R. and R. Roy [1996]. Continuum percolation, Volume 119 of Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press.

Metropolis, N., A. V. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller [1953]. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1092.

Meyn, S. P. and R. L. Tweedie [1993]. Markov Chains and Stochastic Stability. New York: Springer-Verlag.

Miles, R. E. [1995]. A heuristic proof of a long-standing conjecture of D. G. Kendall concerning the shapes of certain large random polygons. Advances in Applied Probability 27(2), 397–417.

Minlos, R. A. [1999]. Introduction to Mathematical Statistical Physics, Volume 19 of University Lecture Series. American Mathematical Society.

Mira, A., J. Møller, and G. O. Roberts [2001]. Perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 63(3), 593–606. See (Mira, Møller, and Roberts 2002) for corrigendum.

Mira, A., J. Møller, and G. O. Roberts [2002]. Corrigendum: perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 64(3), 581.

Molchanov, I. S. [1996a]. A limit theorem for scaled vacancies of the Boolean model. Stochastics and Stochastic Reports 58(1-2), 45–65.

Molchanov, I. S. [1996b]. Statistics of the Boolean Models for Practitioners and Mathematicians. Chichester: John Wiley & Sons.

Møller, J. (Ed.) [2003]. Spatial Statistics and Computational Methods, Volume 173 of Lecture Notes in Statistics. Springer-Verlag.

Møller, J. and G. Nicholls [1999]. Perfect simulation for sample-based inference. Technical Report R-99-2011, Department of Mathematical Sciences, Aalborg University.

Møller, J. and R. Waagepetersen [2003]. Statistical inference and simulation for spatial point processes. Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.

Møller, J. and S. Zuyev [1996]. Gamma-type results and other related properties of Poisson processes. Advances in Applied Probability 28(3), 662–673.

Murdoch, D. J. and P. J. Green [1998]. Exact sampling from a continuous state space. Scandinavian Journal of Statistics Theory and Applications 25, 483–502.

Mykland, P., L. Tierney, and B. Yu [1995]. Regeneration in Markov chain samplers. Journal of the American Statistical Association 90(429), 233–241.

Newman, C. and C. Wu [1990]. Markov fields on branching planes. Probability Theory and Related Fields 85(4), 539–552.

Nummelin, E. [1984]. General irreducible Markov chains and nonnegative operators. Cambridge: Cambridge University Press.

O'Neill, P. D. [2001]. Perfect simulation for Reed-Frost epidemic models. Statistics and Computing 13, 37–44.

Parkhouse, J. G. and A. Kelly [1998]. The regular packing of fibres in three dimensions. Philosophical Transactions of the Royal Society, London, Series A 454(1975), 1889–1909.

Peskun, P. H. [1973]. Optimum Monte Carlo sampling using Markov chains. Biometrika 60, 607–612.

Preston, C. J. [1977]. Spatial birth-and-death processes. Bull. Inst. Internat. Statist. 46, 371–391.

Propp, J. G. and D. B. Wilson [1996]. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms 9, 223–252.

Quine, M. P. and D. F. Watson [1984]. Radial generation of n-dimensional Poisson processes. Journal of Applied Probability 21(3), 548–557.

Ripley, B. D. [1977]. Modelling spatial patterns (with discussion). Journal of the Royal Statistical Society (Series B: Methodological) 39, 172–212.

Ripley, B. D. [1979]. Simulating spatial patterns: dependent samples from a multivariate density. Applied Statistics 28, 109–112.

Ripley, B. D. and F. P. Kelly [1977]. Markov point processes. J. Lond. Math. Soc. 15, 188–192.

Robbins, H. E. [1944]. On the measure of a random set I. Annals of Mathematical Statistics 15, 70–74.

Robbins, H. E. [1945]. On the measure of a random set II. Annals of Mathematical Statistics 16, 342–347.

Roberts, G. O. and N. G. Polson [1994]. On the geometric convergence of the Gibbs sampler. Journal of the Royal Statistical Society (Series B: Methodological) 56(2), 377–384.

Roberts, G. O. and J. S. Rosenthal [1999]. Convergence of slice sampler Markov chains. Journal of the Royal Statistical Society (Series B: Methodological) 61(3), 643–660.

Roberts, G. O. and J. S. Rosenthal [2000]. Recent progress on computable bounds and the simple slice sampler. In Monte Carlo methods (Toronto, ON, 1998), pp. 123–130. Providence, RI: American Mathematical Society.

Roberts, G. O. and R. L. Tweedie [1996a]. Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli 2, 341–364.

Roberts, G. O. and R. L. Tweedie [1996b]. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83, 96–110.

Rosenthal, J. S. [2002]. Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability 7, no. 13, 123–128 (electronic).

Series, C. and Y. Sinaĭ [1990]. Ising models on the Lobachevsky plane. Comm. Math. Phys. 128(1), 63–76.

Spitzer, F. [1975]. Markov random fields on an infinite tree. The Annals of Probability 3(3), 387–398.

Stoyan, D., W. S. Kendall, and J. Mecke [1995]. Stochastic geometry and its applications (Second ed.). Chichester: John Wiley & Sons. (First edition in 1987 joint with Akademie-Verlag, Berlin).

Strauss, D. J. [1975]. A model for clustering. Biometrika 63, 467–475.

Thönnes, E. [1999]. Perfect simulation of some point processes for the impatient user. Advances in Applied Probability 31(1), 69–87. Also University of Warwick Statistics Research Report 317.

Thönnes, E., A. Bhalerao, W. S. Kendall, and R. G. Wilson [2002]. A Bayesian approach to inferring vascular tree structure from 2d imagery. In W. J. Niessen and M. A. Viergever (Eds.), Proceedings IEEE ICIP-2002, Volume II, Rochester, NY, pp. 937–939. Also: University of Warwick Department of Statistics Research Report 391.

Thönnes, E., A. Bhalerao, W. S. Kendall, and R. G. Wilson [2003]. Bayesian analysis of vascular structure. In LASR 2003, Volume to appear.

Thorisson, H. [2000]. Coupling, stationarity, and regeneration. New York: Springer-Verlag.


Tierney, L. [1998]. A note on Metropolis-Hastings kernels for general state spaces. The Annals of Applied Probability 8(1), 1–9.

van Lieshout, M. N. M. [2000]. Markov point processes and their applications. Imperial College Press.

van Lieshout, M. N. M. and A. J. Baddeley [2001, October]. Extrapolating and interpolating spatial patterns. CWI Report PNA-R0117, CWI, Amsterdam.

Widom, B. and J. S. Rowlinson [1970]. New model for the study of liquid-vapor phase transitions. Journal of Chemical Physics 52, 1670–1684.

Wilson, D. B. [2000a]. How to couple from the past using a read-once source of randomness. Random Structures and Algorithms 16(1), 85–113.

Wilson, D. B. [2000b]. Layered Multishift Coupling for use in Perfect Sampling Algorithms (with a primer on CFTP). In N. Madras (Ed.), Monte Carlo Methods, Volume 26 of Fields Institute Communications, pp. 143–179. American Mathematical Society.

Wilson, R., A. Calway, and E. R. S. Pearson [March 1992]. A Generalised Wavelet Transform for Fourier Analysis: The Multiresolution Fourier Transform and Its Application to Image and Audio Signal Analysis. IEEE Transactions on Information Theory 38(2), 674–690.

Wilson, R. G. and C. T. Li [2002]. A class of discrete multiresolution random fields and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 42–56.

Zuyev, S. [1999]. Stopping sets: gamma-type results and hitting properties. Advances in Applied Probability 31(2), 355–366.
