The Stochastic Geometry and Statistical Applications section of Advances .....
nevertheless its analysis can lead to very substantial calculations and theory.
Home Page
Title Page
Contents
JJ
II
J
I
Page 1 of 163
Models and simulation techniques from stochastic geometry Wilfrid S. Kendall Department of Statistics, University of Warwick
Go Back
Full Screen
Close
Quit
A course of 5 × 75 minute lectures at the Madison Probability Intern Program, June-July 2003
Introduction Home Page
Title Page
Stochastic geometry is the study of random patterns, whether of points, line segments, or objects. Related reading: Stoyan et al. (1995);
Contents
Barndorff-Nielsen et al. (1999); JJ
II
Møller and Waagepetersen (2003) and the recent expository essays edited by Møller (2003);
J
I
The Stochastic Geometry and Statistical Applications section of Advances in Applied Probability.
Page 2 of 163
Go Back
Full Screen
Close
Quit
Home Page
This screen layout is not very suitable for printing. However, if a printout is desired then here is a button to do the job! Print
Title Page
Contents
JJ
II
J
I
Page 3 of 163
Go Back
Full Screen
Close
Quit
Home Page
Title Page
Contents
JJ
II
J
I
Page 4 of 163
Go Back
Full Screen
Close
Quit
Home Page
Title Page
Contents
JJ
II
J
I
Page 5 of 163
Go Back
Full Screen
Close
Quit
Contents 1
A rapid tour of point processes
7
2
Simulation for point processes
45
3
Perfect simulation
75
4
Deformable templates and vascular trees
103
5
Multiresolution Ising models
123
A Notes on the proofs in Chapter 5
145
Home Page
Title Page
Contents
JJ
II
J
I
Page 6 of 163
Go Back
Full Screen
Close
Quit
Home Page
Title Page
Contents
JJ
II
J
I
Chapter 1
A rapid tour of point processes
Page 7 of 163
Contents Go Back
Full Screen
Close
Quit
1.1 1.2 1.3 1.4 1.5
The Poisson point process . . . . Boolean models . . . . . . . . . . General definitions and concepts Markov point processes . . . . . Further issues . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
8 23 27 33 40
1.1. The Poisson point process Home Page
Title Page
Contents
JJ
II
J
I
The simplest example of a random point pattern is the Poisson point process, which models “completely random” distribution of points in the plane R2 , or 3-space R3 , or other spaces. It is a basic building block in stochastic geometry and elsewhere, with many symmetries which typically reduce calculations to computations of area or volume.
Page 8 of 163
Go Back
Full Screen
Close
Quit
Figure 1.1: Which of these point clouds exhibits interaction?
Home Page
Title Page
Contents
JJ
II
J
I
Page 9 of 163
Go Back
Definition 1.1 Let X be a random point process in a window W ⊆ Rn defined as a collection of random variables X(A), where A runs through the Borel subsets of W ⊆ Rn . The point process X is Poisson of intensity λ > 0 if the following are true: 1. X(A) is a Poisson random variable of mean λ Leb(A); 2. X(A1 ), . . . , X(An ) are independent for disjoint A1 , . . . , An . Examples: stars in the night sky;
patterns used to define collections of random discs;
Full Screen
. . . or space-time structures; Close
Brownian excursions indexed by local time(!). Quit
Home Page
Title Page
Contents
JJ
II
J
I
Page 10 of 163
Go Back
• In dimension 1 we can construct X using a Markov chain (“number of points in [0, t]”) and simulate it using Exponential random variables for inter-point spacings. Higher dimensions are trickier! See Theorem 1.5 below. • We can replace λ Leb (and W ⊆ Rn ) by a nonnegative σ-finite measure µ on a measurable space X , in which case X is an inhomogeneous Poisson point process with intensity measure µ. We should also require µ to be diffuse (P [µ({x}) = 0] = 1 for every point {x}) in order to avoid the point process faux pas of placing than one point onto a single location.
Exercise 1.2 Show that any measurable map will carry one (perhaps inhomogeneous) Poisson point process into another, so long as – the image of the intensity measure is a diffuse measure; – and we note that the image Poisson point process is degenerate if the image measure fails to be σ-finite.
Full Screen
Close
Quit
Exercise 1.3 Show that in particular we can superpose two independent Poisson point processes to obtain a third, since the sum of independent Poisson random variables is itself Poisson.
Home Page
Title Page
• It is a theorem that the second, independent scattering, requirement of Definition 1.1 is redundant! In fact even the first requirement is over-elaborate:
Theorem 1.4 Suppose a random point process X satisfies P [X(A) = 0]
Contents
JJ
II
J
I
Page 11 of 163
Go Back
=
exp (−µ(A))
for all measurable A .
Then X is an (inhomogeneous) Poisson point process of intensity measure µ.
– This can be proved directly. I find it more satisfying to view the result as a consequence of Choquet capacity theory, since this approach extends to cover a theory of random closed sets (Kendall 1974). Compare inclusion-exclusion arguments and “correlation functions” used in the lattice case in statistical mechanics (Minlos 1999).
Full Screen
– We don’t have to check every measurable set:1 it is enough to consider sets A running through the finite unions of rectangles!
Close
– But we cannot be too constrained in our choice of various A (the family of convex A is not enough).
Quit
1 Certainly
we need check only the BOUNDED measurable sets.
Home Page
• An alternative approach2 lends itself to simulation methods for Poisson point processes in bounded windows: condition on the total number of points.
Title Page
Contents
Theorem 1.5 Suppose the window W ⊆ Rn has finite area. The following constructs X as a Poisson point process of finite intensity λ: – X(W ) is Poisson of mean λ Leb(W );
JJ
II
J
I
– Conditional on X(W ) = n, the random variables X(A) are constructed as point-counts of a pattern of n independent points U1 , . . . , Un uniformly distributed over W . So X(A)
=
# {i : Ui ∈ A} .
Page 12 of 163
Go Back
Full Screen
Exercise 1.6 Generalize Theorem 1.5 to (inhomogeneous) Poisson point processes and to the case of W of infinite area/volume.
Close
Quit
2 The construction described in Theorem 1.5 is the same as the one producing the link via conditioning between Poisson and multinomial contingency table models.
Home Page
Title Page
Contents
JJ
II
J
I
Page 13 of 163
Go Back
Full Screen
Close
Quit
Exercise 1.7 Independent thinning. Suppose each point ξ of X, produced as in Theorem 1.5, is deleted independently of all other points with probability 1 − p(ξ) (so p(ξ) is the retention probability). Show that the resulting thinned random point pattern is an inhomogeneous Poisson point process with intensity measure p(u) Leb( du).
Exercise 1.8 Formulate the generalization of the above exercise which uses position-dependent thinning on inhomogeneous Poisson point processes.
Home Page
Title Page
Contents
JJ
II
J
I
Page 14 of 163
Go Back
Full Screen
Close
Quit
• The process produced by conditioning on X(W ) = n has a claim to being even more primitive than the Poisson point process.
Definition 1.9 A Binomial point process with n points defined on a window W is produced by scattering n points independently and uniformly over W .
However you should notice – point counts in disjoint subsets are now (perhaps weakly) negatively correlated: the more in one region, the fewer can fall in another; – the total number n of points is fixed, so we cannot extend the notion of a (homogeneous) Binomial point process to windows of infinite measure; – if you like that sort of thing, the Binomial point process is the simplest representative of a class of point processes (with finite total number of points) analogous to canonical ensembles in statistical physics. The Poisson point process relates similarly to grand canonical ensembles (grand, because points are indistinguishable and we randomize over the total number of particles). Of course these are trivial cases because there is no interaction between points — but more of that later!
Home Page
Title Page
Contents
JJ
II
J
I
Here’s a worry.3 It is easy to prove that the construction of Theorem 1.5 produces point count random variables which satisfy Definition 1.1 and therefore delivers a Poisson point process. But what about the other way round? Can we travel from Poisson point counts to a random point pattern? Or is there some subtle concealed random behaviour in Theorem 1.5 going beyond the labelling of points, which cannot be extracted from a point count specification? The answer is that the construction and the point count definition are equivalent.
Page 15 of 163
Go Back
Full Screen
Close
Quit
3 Perhaps you think this is more properly a neurosis to be dealt with by personal counselling?! Consider that a neurotic concern here is what allows us to introduce measure-theoretic foundations which are most valuable in further theory.
Home Page
Title Page
Theorem 1.10 Any Poisson point process given by Definition 1.1 induces a unique probability measure on the space of locally finite point patterns, when this is furnished with the σ-algebra generated by point counts.
Contents
Here is a sketch of the proof. JJ
II
J
I
Page 16 of 163
Go Back
Full Screen
Close
Quit
Take W = [0, 1)2 . For each n = 1, 2, . . . , make a dyadic dissection of [0, 1)2 into 4n different sub-rectangles [k2−n , (k + 1)2−n ) × [`2−n , (` + 1)2−n ). For a given level n, visualize the pattern Pn obtained by painting sub-rectangles black if they are occupied by points, white otherwise. The point counts for each of these 4n sub-rectangles are independent Poisson of mean 4−n λ: by subadditivity of probability: P [one or more point counts > 1] ≤ 4n (1 − (1 + 4−n λ) exp −4−n λ = 4n (4−n λ)2 θ exp −4−n λθ for some θ ∈ (0, 1) (use the mean value theorem for x 7→ 1 − (1 + x) exp(−x)), which is summable in n. So by Borel-Cantelli we deduce that for all large enough n the black sub-rectangles in pattern Pn contain just one Poisson point each. Ordering these points randomly yields the construction in Theorem 1.5.
Home Page
Simulation techniques for Poisson point processes are rather direct and straightforward: here are two possibilities. 1. We can use Theorem 1.5 to reduce the problem to the algorithm
Title Page
Contents
JJ
II
J
I
Page 17 of 163
Go Back
Full Screen
Close
Quit
• draw N = n from a Poisson random variable distribution; • draw n independent values from a Uniform distribution over the required window.
def poisson process (region ← W, intensity ← alpha) : pattern ← [ ] for i ∈ range (poisson (alpha)) : pattern.append (uniform (region ← W )) return pattern
Home Page
Title Page
Contents
JJ
II
J
I
Page 18 of 163
Go Back
Full Screen
2. Quine and Watson (1984) describe a generalization of the 1-dimensional “Markov chain” method mentioned above. Suppose we wish to simulate a Poisson point process of intensity λ in ball(o, R). We describe the twodimensional case; the generalization to higher dimensions is straightforward. 4 Using polar coordinates and power transform of radius, produce a measure-preserving map (0, R2 /2] × (0, 2π]
→
(s, θ)
7→
ball(o, R) √ ( 2s, θ)
(1.1)
√ where ( 2s, θ) is expressed in polar coordinates, and [0, R2 /2] × [0, 2π] and the ball(o, R) are endowed with the measure λ Leb. Being measure-preserving, the map preserves point processes. But the following Exercise 1.11 shows a Poisson(λ) point process on the product space [0, R2 /2] × [0, 2π] can be thought of as a 1dimensional Poisson(2λπ) point process of “half-squared radii” on [0, R2 /2] (which may be constructed using a sequence of independent Exponential(2λπ) random variables), each point of which is associated with an “angle” which is uniformly distributed on [0, 2π].
Close
Quit
4 The Quine-Watson method is particularly suited to the construction of Poisson point processes which are to be used to build geometric tessellations . . .
Home Page
Title Page
Contents
JJ
II
J
I
Page 19 of 163
Exercise 1.11 Suppose X is a Poisson point process of intensity λ on the rectangle [a, b] × [0, 1]. Show that it may be realized as a one-dimensional Poisson point process on [a, b] of intensity λ, enriched by attaching to each one-dimensional point an independent Uniform([0, 1]) random variable to provide the second coordinate.
Simply suppose the construction is given by Theorem 1.5, show the first coordinate of each point provides the required one-dimensional Poisson point process, note the second coordinate is uniformly distributed as required!
Go Back
Full Screen
Close
Quit
Exercise 1.12 Why is it irrelevant to this proof that the map image omits the centre o of ball(o, R)?
Home Page
Title Page
Contents
JJ
II
J
I
Page 20 of 163
Exercise 1.13 Check the details of modifying the Quine-Watson construction for higher dimensions, to show how to build a d-dimensional Poisson point process using independent sequences of Exponential random variables and random variables uniformly distributed on S d−1 .
Exercise 1.14 Consider the following method of parametrizing lines: measure s the perpendicular distance from the origin and θ the angle made by the perpendicular to a reference line. How should you modify the Quine-Watson construction to produce a Poisson line process which is invariant under isometries of the plane? What about invariant Poisson hyperplane processes?
Go Back
Full Screen
Close
Quit
What do large cells look like? In the early 1940s D.G. Kendall conjectured a large cell should approximate a disk. We now know this is true, and more besides! (Miles 1995; Kovalenko 1997; Kovalenko 1999; Hug et al. 2003)
Home Page
Title Page
Given a Poisson point process, we may construct a locally finite point pattern using the methods of Theorem 1.10, and enumerate the points of the pattern using some arbitrary rule. So we may speak of summing h(ξ) over the pattern points ξ; by Fubini the result will not depend on the arbitrary ordering so long as h is nonnegative.
Contents
JJ
II
J
I
Page 21 of 163
Go Back
Full Screen
Theorem 1.15 If X is a Poisson point process of intensity λ and if h(x, φ) is a nonnegative measurable function of a point x and a locally finite point pattern φ then X Z E h(ξ, X \ {ξ}) = E [h(u, X)] λ Leb( du) , (1.2) and indeed by direct generalization X E h(ξ1 , . . . , ξn , X \ {ξ1 , ξ2 , . . . , ξn }) Z ··· Rd ···Rd
Quit
=
disjoint ξ1 ,...,ξn ∈X
Z Close
Rd
ξ∈X
(1.3) n
E [h(u1 , . . . , un , X)] λ Leb( du1 ), . . . , Leb( dun ) .
Home Page
Title Page
Exercise 1.16 Use Theorem 1.5 to prove (1.2) for h(u, X) which is positive only when u lies in a bounded window W and which depends on X only on the intersection of the pattern X with the window W . Hence deduce the full Theorem 1.15.
Contents
This delivers a variation on the independent scattering property of the Poisson point process. JJ
II
J
I
Page 22 of 163
Corollary 1.17 If g is a nonnegative measurable function of a locally finite point pattern φ then E [g(X \ {u})|u ∈ X]
=
E [g(X)]
Go Back
Full Screen
Close
Quit
hP i Use Theorem 1.15 to calculate E ξ∈X I [ξ ∈ ball(u, ε)]g(X \ {ξ}) . Interpret R the expectation as ball(u,ε) E [g(X \ {u})|u ∈ X] λ Leb( du)
We say the Palm distribution for a Poisson point process is the Poisson point process itself with the conditioning point added in!
1.2. Boolean models Home Page
Title Page
Contents
JJ
II
J
I
In stochastic geometry the counterpart to the Poisson point process is the Boolean model; the set-union of (possibly random) geometric figures or grains located one at each point or germ of an underlying Poisson point process. Much of the amenability of the Poisson point process is inherited by the Boolean model, though nevertheless its analysis can lead to very substantial calculations and theory.
Page 23 of 163
Go Back
Boolean models have the advantage of being simple enough that one can carry explicit computation. They have a long application history, for example being used by Robbins (1944,1945) to study bombing problems (how heavily to bomb a beach in order to ensure a high probability of landmine-clearance . . . ).
Full Screen
Close
Bombing problems and coverage: refer to Hall (1988, §3.5), Molchanov (1996a); also note application of Exercise 1.14. Quit
Home Page
Title Page
Contents
JJ
II
J
I
Page 24 of 163
Go Back
Full Screen
Close
Quit
Definition 1.18 A Boolean model Ξ is defined in terms of a germ process which is a Poisson point process X of intensity λ, and a grain distribution which defines a random set5 . It is a random set which is formed by the union [ (U (ξ) + ξ) , ξ∈X
where the U (ξ) are independent realizations of the random set, one for each germ or point ξ of the germ process. (Notice: the grains should also be independent of the germs!) The classic computation for a Boolean model Ξ is to determine the probability P [o ∈ Ξ] that the resulting random set covers a specified point (here, the origin o). This reduces to a matter of geometry: if o is covered then it must be covered by at least one grain, so just think about where the corresponding germ might lie. ˆ of U in the If U is non-random then the germ will have to lie in the reflection U ˆ . If U is random then generalize origin, so restrict attention to germs lying in U h i ˆ that this: thin each germ ξ of the germ process using the probability P ξ ∈ U 5 The random set is often not random at all (a disk of fixed radius, say) or is random in a simple parametrized manner (a disk of random radius, or a random convex polygon defined as the intersection of a sequence of random half-spaces); so we will follow Robbins (1944,1945) in treating this informally, rather than developing the later and profound theory of random sets (Matheron 1975)!
Home Page
the corresponding grain coversho. Theiresult is an inhomogeneous Poisson point ˆ Leb( du): calculate the probability of this process of intensity measure P u ∈ U process containing no points at all to produce the following:
Title Page
Contents
JJ
II
J
I
Page 25 of 163
Go Back
Full Screen
Theorem 1.19 Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U . Then the area / volume fraction is P [o ∈ Ξ]
=
1 − exp (−λ E [Leb(U )]) .
(1.4)
A similar argument produces the generalization
Theorem 1.20 Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U . Let K be a compact region. Then the probability of Ξ hitting K is h i ˆ ⊕ K) , P [Ξ ∩ K 6= ∅] = 1 − exp −λ E Leb(U (1.5) ˆ ⊕ K is a Minkowski sum: where U
Close
K1 ⊕ K2 Quit
=
{x1 + x2 : x1 ∈ K1 , x2 ∈ K2 } .
(1.6)
Home Page
Title Page
Contents
Remember, from the discussion prior to Theorem 1.15, that we may view the germs as a sequence of points (using some arbitrary method of ordering); so simulation of Ξ need only depend on generating a suitable sequence of independent random sets U1 , U2 , . . . . Harder analytical questions include: • how to estimate the probability of complete coverage?
JJ
II
• in a high-intensity Boolean model, what is the approximate statistical shape of a vacant region?
J
I
• as intensity increases, how to describe the transition to infinite connected regions of coverage?
Page 26 of 163
Go Back
Full Screen
Close
Quit
Useful further reading for these questions: Hall (1988), Meester and Roy (1996), Molchanov (1996b).
1.3. General definitions and concepts Home Page
1.3.1. General Point Processes Title Page
Other approaches to point processes include: • random measures (nonnegative integer-valued, charge singletons with 0, 1);
Contents
• measures on exponential measure spaces; JJ
II
J
I
Page 27 of 163
Go Back
Full Screen
Close
Quit
reflected in notation (a point process X can have points belonging to it (ξ ∈ X), can be used for point counts (X(E) when E is a subset of the ground space, and can be used to provide a reference measure so as to define other point processes via densities). Carter and Prenter (1972) establish essential equivalences. We can define general point processes as probability measures on the space of locally finite point patterns, following Theorem 1.10. This allows us immediately to define stationary point processes (the distribution respects translation symmetries, at least locally) and isotropic point processes (the same, but for rotations). The distribution of a general point process is characterized by its void probabilities as in Theorem 1.4. To each point process X (whether stationary, isotropic, or not) there is an intensity measure given by Λ(E) = E [X(E)]. (For stationary point processes this has to be a multiple of Lebesgue measure!)
We can also define the Palm distribution for X: Home Page
Definition 1.21 E [g(X)|u ∈ X] is required to solve Title Page
Contents
E
X
f (ξ)g(X \ {ξ})
=
ξ∈X∩ball(x,ε)
Z JJ
II
J
I
Page 28 of 163
f (u)E [g(X)|u ∈ X] Λ( du) . (1.7) ball(x,ε)
A somewhat dual notion is the conditional intensity: informally, if x is a location and φ is a locally finite point pattern not including the location x then the conditional intensity λ(x; φ) is the infinitesimal probability of a point pattern X containing x if X \ {x} = φ:
Go Back
λ(x; φ)
=
P [x ∈ X|X \ {x} = φ] . P [X \ {x} = φ]
(1.8)
Full Screen
Close
Quit
We shall not define this in rigorous generality, as its definition will be immediate for the Markov point processes to which we will turn below.
Home Page
These considerations provide a general context for figuring out constructions which move away from the “no interactions” property of Poisson point processes.
Title Page
1. Poisson cluster processes (can be viewed as Boolean models in which the grains are simply finite random clusters of points). Neyman-Scott processes arise from clusters of independent points.
Contents
2. Randomize intensity measure: “doubly stochastic” or Cox process.
JJ
II
J
I
Page 29 of 163
Go Back
Exercise 1.22 If the number of points in a Neyman-Scott cluster is Poisson, then the Neyman-Scott point process is also a Cox process.
Line processes present an interesting issue here! Davidson conjectured that a second-order stationary line process with no parallel lines is Cox. Kallenberg (1977) counterexample: Stoyan et al. (1995, Figure 8.3) provides an illustration.
Full Screen
Close
Quit
3D variants with applications in Parkhouse and Kelly (1998). 3. Or apply rejection sampling to a Poisson point process, rejection probability depending on statistics computed from the point pattern. Compare Gibbs measure formalism of statistical mechanics.
Home Page
Title Page
Contents
JJ
II
J
I
Page 30 of 163
What sort of statistics might we use? ˆ • estimation of intensity λ by λ(X) = X(W )/ Leb(W ) (using sufficient statistic X(W )); • measuring how close the point pattern is to a specific location using the spherical contact distribution function Hs (r)
Full Screen
Close
Quit
P [dist(o, X) ≤ r] ,
(1.9)
estimated via a Boolean model Ξr based on balls U of fixed radius r by [ (U + ξ) ∩ W / Leb(W ) ; Leb(Ξr ∩ W )/ Leb(W ) = Leb ξ∈X
(1.10) • measuring how close pattern points come to each other using the nearest neighbour distance function D(r)
Go Back
=
=
P [dist(o, X) ≤ r|o ∈ X] .
(1.11)
Interpret conditioning on null event “o ∈ X” using Palm distributions; • The related reduced second moment measure 1 K(r) = E [#{ξ ∈ X : dist(o, ξ) ≤ r} − 1|o ∈ X] , (1.12) λ estimating λK(r) by the number sr (X) of (ordered) neighbour pairs (ξ1 , ξ2 ) in X for which dist(ξ1 , ξ2 ) ≤ r (use Slivynak-Mecke Theorem 1.15).
We ignore the important issue of edge effects! Home Page
Title Page
Contents
JJ
II
J
I
Page 31 of 163
Go Back
Full Screen
Close
Quit
Exercise 1.23 Use Theorem 1.15 to prove that for a homogeneous Poisson point process D(r) = Hs (r) and K(r) = Leb(ball(o, r)). Here is a prototype of use of these statistics in rejection algorithms:
Exercise 1.24 Consider the following rejection simulation algorithm, notˆ = X(W )/ Leb(W ) to connect with previous discussion: ing that λ def reject () : X ← poisson process (region ← W, intensity ← lamda) while uniform (0, 1) > theta ∗ ∗(number (X)) : X ← poisson process (region ← W, intensity ← lamda) return X Show that the resulting point process is Poisson of intensity λ + θ.
Use Theorem 1.5 to reduce this to a question about rejection sampling of Poisson random variables.
1.3.2. Marked Point Processes Home Page
Title Page
Contents
JJ
II
J
I
We can think of Boolean models as derived from marked Poisson point processes; to each germ point assign a random mark encoding the geometric data (such as disk radius) of the grain. In effect the Poisson point process now lives on Rd × M where M is the mark space. • Poisson cluster processes as mentioned above: the grains are finite random clusters; • segment and disk processes; • line and (hard) rod processes;
Page 32 of 163
Go Back
Full Screen
• fibre and surface processes. Compare Exercise 1.11; the intensity measure will factorize as µ × ν so long as the marks are independent, where µ is the intensity of the basic Poisson point process and the measure ν is the distribution of the mark. But beware! it can be meaningful to consider improper mark distributions. Examples include: Decomposition of Brownian motion into excursions;
Close
Marked planar Poisson point process for which the marks are disks of random radius R, density proportional to 1/r if r ≤ threshold. Quit
1.4. Markov point processes Home Page
Title Page
Consider interacting point processes generating by rejection sampling (Exercise 1.24 is simple example!). For the sake of simplicity, we suppose here that all our processes are two-dimensional. The general procedure is to use a statistic t(X) in the algorithm
Contents
JJ
II
J
I
Page 33 of 163
def reject () : X ← poisson process (region ← W, intensity ← lamda) while uniform (0, 1) > exp (theta ∗ t (X)) : X ← poisson process (region ← W, intensity ← lamda) return X We can now use standard arguments from rejection simulation. The resulting point process distribution has a density constantθ × exp(θt(X))
Go Back
(1.13)
with respect to the Poisson point process X of intensity λ, for normalizing constant Full Screen
constantθ Close
Quit
=
E [exp(θt(X))|X is Poisson(λ)] .
Home Page
If θt(X) > 0 is possible then the above algorithm runs into trouble (how do you reject a point pattern with probability greater than 1?!) However the probability density at Equation (1.13) can still make sense. Importance sampling could of course implement this. We can do better if the following holds:
Title Page
Contents
Definition 1.25 A point pattern density f (X) satisfies Ruelle stability if f (X)
JJ
≤
A exp(B#(X))
(1.14)
II
for constants A > 0, B; where #(X) is the number of points in X. J
I
Page 34 of 163
Go Back
Exercise 1.26 6 Show that if the density exp(θt(X)) is Ruelle stable with constants as in 1.25 then the rejection algorithm above will produce a point pattern with the correct density if applied to a Poisson point process of density λeB instead of λ, and rejection probability is changed to exp(θt(X) − B#(X))/A instead of exp(θt(X)).
Full Screen
Close
Quit
Use t(X) = sr (X), for γ = exp(θ) ∈ (0, 1), the number of ordered pairs separated by at least r, to obtain the celebrated Strauss point process (Strauss 1975). 6 Thanks
for the comment, Tom!
Home Page
Title Page
Contents
JJ
II
J
I
The statistic t(X) = Leb(Ξr ∩ W ), for γ = exp(−θ) results in a process known as the area-interaction point process (Baddeley and van Lieshout 1995), also known in statistical physics as the Widom-Rowlinson interpenetrating spheres model (Widom and Rowlinson 1970) in the case θ > 1. The (Papangelou) conditional intensity is easily defined for this kind of weighted Poisson point process, which (borrowing statistical mechanics terminology) is commonly known as a Gibbs point process: if the density of the Gibbs process with respect to a unit rate Poisson point process is given by f (X) then the conditional intensity at x may be defined as λ(x; φ)
=
f (φ ∪ {x} . f (φ)
(1.15)
Page 35 of 163
Go Back
Exercise 1.27 Compute the conditional intensity for the Poisson point process of intensity λ (use its density with respect to a unit-intensity Poisson point process; see Exercise 1.24).
Full Screen
Close
Quit
Exercise 1.28 Compute the conditional intensity for the Strauss point process.
Home Page
Exercise 1.29 Compute the conditional intensity for the area-interaction point process.
Title Page
Contents
JJ
II
J
I
Page 36 of 163
Go Back
Full Screen
Close
Quit
All these conditional intensities λ(x; φ) share a striking property: they do not depend on the point pattern φ outside of some neighborhood of x. This is an appealing property for a point process intended to model interactions: it is captured in Ripley and Kelly’s (1977) definition of a Markov point process:
Definition 1.30 A Gibbs point process is a Markov point process with respect to a neighbourhood relation ∼ if its density is everywhere positive7 , and its conditional intensity λ(x; φ) depends only on x and {y ∈ φ : y ∼ x}. The celebrated Hammersley-Clifford theorem tells us that the probability density for a Markov point process can always be factorized as a product of interactions, functions of ∼-cliques of points of the process. Ripley and Kelly (1977) presents a proof for the point process case, while Baddeley and Møller (1989) provides a significant extension. A recent treatment of Markov point processes in book form is given by Van Lieshout (2000). 7 The positivity requirement ensures that the ratio expression for λ(x; φ) in Equation (1.15) is welldefined! It can be relaxed to a hereditary property.
Home Page
Title Page
Contents
JJ
However care must be taken to ensure they are well-defined: the danger of simply writing down a statistic t(X) and defining the density by Equation (1.13) is that the total integral E [exp(θt(X)|X is Poisson(λ)] may diverge so that the normalizing constant is not well-defined. (This is the case for the Strauss point process when γ > 1, as shown by Kelly and Ripley 1976. This particular point process cannot therefore be used to model clustering — a major motivation for considering the more amenable area-interaction point process.)
II
J
I
Exercise 1.31 Show that E γ sr (X) diverges for γ > 1 if the interaction radius r exceeds the maximum separation of potential points in the window W.
Page 37 of 163
Go Back
Full Screen
Close
Quit
Exercise 1.32 Bound exp(−θ Leb(Ξr ∩ W )) above in terms of the number of points in W (show the interaction satisfies Ruelle-stability), and hence deduce that the area-interaction point process is defined for all real θ.
Home Page
Title Page
Contents
JJ
II
J
I
Page 38 of 163
Can we bias using perimeter instead of area of Ξr in area-interaction, or even “holiness” (Euler characteristic: number-of-components minus number-of-holes)? Like area, these characteristics are local (lead to formal densities whose conditional intensities satisfy the local requirement of Theorem 1.30). The Hadwiger characterization theorem (Hadwiger 1957) more-or-less says that linear combinations of these three measures exhaust all local geometric possibilities! Can the result be normalized? Exercise 1.33 Use the argument of Exercise 1.32 to show that exp(θ perimeter(Ξr ∩ W )) can be normalized for all θ.
Exercise 1.34 Argue similarly to show that exp(θ euler(Ξr ∩ W )) can be normalized for all θ > 0.
Go Back
Full Screen
Close
Quit
There can be no more components than disks/grains of the Boolean model Ξr .
Home Page
Title Page
What about exp(θ euler(Ξr ∩ W )) for θ < 0? Key remark is Theorem 4.3 of Kendall et al. (1999): a union of N closed disks in the plane (of whatever radii, not necessarily the same for each disk) can have at most 2N − 5 holes.
Contents
JJ
II
J
I
Separate, trickier argument: disks can be replaced by polygons satisfying a “uniform wedge” condition.
Page 39 of 163
Go Back
Full Screen
Close
Quit
There are counterexamples . . . .
1.5. Further issues Home Page
Before moving on to develop machinery for simulating realizations of Markov point processes, we make brief mention of a couple of recent pieces of work. Title Page
1.5.1. Poisson point processes and set-indexed martingales Contents
JJ
II
J
I
Page 40 of 163
There are striking applications of set-indexed martingales to distributional results for the areas of adaptively random regions. Recall the notion of stopping time T for a filtration {Ft }. Here is a variant on the standard definition: we require {ω : ∆(ω) t} ∈ Ft
where
∆(ω) = {ω : t T (ω)} .
(1.16)
Makes sense for partially-ordered family of closed sets (using set-inclusion); hence martingales, and Optional Sampling Theorem (Kurtz 1980). Zuyev (1999) used this to give an elegant proof of results of Møller and Zuyev (1996):
Go Back
Full Screen
Close
Theorem 1.35 Suppose ∆ is a compact stopping set for X a Poisson point process of intensity α, such that P [X(∆) = n]
>
0
and does not depend on α .
Then Leb(∆) given X(∆) is Gamma(n, α). Quit
Home Page
Title Page
Contents
JJ
II
J
I
Examples: minimal centred ball containing n points (easy: compare Exercise 1.11); regions constructed using Delaunay or Voronoi tessellations. Proof: use likelihood ratio for α (furnishing a martingale), hence h i E ez Leb(∆) | X(∆) = n, α = α = z Leb(∆) P e I [X(∆) = n]αn α0−n e−(α−α0 ) Leb(∆) | α = α0 . P [X(∆) = n | α = α0 ]
Page 41 of 163
Go Back
Full Screen
(a) Delaunay circumdisk
(b) Voronoi flower
Close
Quit
Cowan et al. (2003) use this technique to consider the extent to which these examples lead to decompositions as in the minimal centred ball.
1.5.2. Stationary random countable dense sets Home Page
Title Page
Contents
JJ
II
J
I
Page 42 of 163
Go Back
We conclude this lecture with a cautionary example! Without straying very from the theory of random point processes one can end up in the bad part of the Measure Theory town. Suppose for whatever reason one wishes to consider random point sets X which are countable but are also dense (set of directions of a line process?). We need to be precise about what we can observe, so that we genuinely treat the different points of X on equal terms. So suppose we can observe any event in the hit-or-miss σ-algebra generated by [X ∩ A] 6= ∅, as A runs through the Borel sets. Suppose that X is stationary. (We thus exclude the situations considered by Aldous and Barlow (1981), where account is taken of how various points are constructed — in effect the dense point process is obtained by projection from a conventional locally finite point process.) There are two ways to define such sets: (a) measurable subset of Ω × R; (b) Borel-set-valued random variable, hit-or-miss measurable.
Full Screen
Close
Quit
Equivalent for case of closed sets (Himmelberg 1975, Thm. 3.5). But (warning!) the set of countable sets is not hit-or-miss measurable, though constructive and weak countability are equivalent (section theorem). What can we say about stationary random countable dense sets?
Home Page
Title Page
Theorem 1.36 (Kendall 2000, Thm. 4.3, specialized to Euclidean case.) If E is any hit-or-miss measurable event then either P [E] = 0 or P [E] = 1. Whether 0 or 1 depends on E but not on X!
Contents
JJ
II
J
I
Page 43 of 163
Exercise 1.37 Compare these two stationary random countable dense sets: (a) For Y a unit-intensity Poisson point process on R × (0, ∞), the countable set obtained by projection onto the first coordinate. (b) The set Q + Z, where Q is the set of rational numbers, and Z is a Uniform([0, 1]) random variable. They are very different, but appear indistinguishable . . . .
Go Back
Full Screen
Close
Quit
This bad behaviour should not be surprising: for example Vitali’s non-measurable set depends on there being no measurable random variable taking values in Q + Z. (Contrast our use of enumeration for locally finite point sets in Theorem 1.15.) Careless thought betrays intuition!
Home Page
Title Page
Contents
JJ
II
J
I
Page 44 of 163
Go Back
Full Screen
Close
Quit
Home Page
Title Page
Contents
JJ
II
J
I
Chapter 2
Simulation for point processes
Page 45 of 163
Contents Go Back
Full Screen
Close
Quit
2.1 2.2 2.3 2.4 2.5 2.6
General theory . . . . . . . . . . . Measure-theoretic approach . . . . MCMC for Markov point processes Uniqueness and convergence . . . . The MCMC zoo . . . . . . . . . . . Speed of convergence . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
48 51 54 60 61 66
Home Page
Computations with Poisson process, Boolean model, cluster processes can be very direct, and simulation is usually direct too. But what about Markov point processes?
Title Page
• Simple-minded rejection procedures (§1.4) are usually infeasible. Contents
JJ
II
• What about accept-reject procedures sequentially in small sub-windows, thus constructing a point-pattern-valued Markov chain whose invariant distribution is the desired point process?
J
I
• Or proceed to limit: add/delete single points?
Page 46 of 163
Go Back
Full Screen
Close
Quit
Markov chain Monte Carlo (MCMC) Gibbs’ sampler, Metropolis-Hastings! Poisson point process as invariant distribution of spatial immigration-death process after Preston (1977), also Ripley (1977, 1979). To deal with more general Markov point processes it is convenient first to work through the general theory of detailed balance equations for Markov chains on general state-spaces. We will show how to use this to derive simulation algorithms for our example point processes above, and discuss the more general MetropolisHastings (MH) formulation after Metropolis et al. (1953) and Hastings (1970), which allows us to view MCMC as a sequence of proposals of random moves which may or may not be rejected.
Simple example of detailed balance Home Page
Underlying the Poisson point process example: the process X of total number of points alters as follows
Title Page
• birth in time interval [0, ∆t) at rate α∆t; • death in time interval [0, ∆t) at rate X∆t.
Contents
Then we can solve JJ
II
J
I
P [X(t) = n − 1 and birth in [t, t + ∆t)] ≈ π(n − 1) × α∆t = π(n) × n∆t ≈ P [X(t) = n and death in [t, t + ∆t)] by
Page 47 of 163
Go Back
Full Screen
Close
Quit
π(n)
=
α π(n − 1) n
= ... =
αn e−α . n!
2.1. General theory Home Page
Title Page
• Ergodic theory for Markov chains: invariant distribution X π(y) = π(x)p(x, y) x
Contents
or (continuous time) X
JJ
II
J
I
π(x)q(x, y) = 0 .
x
• Detailed balance; – reversibility π(x)p(x, y) = π(y)p(y, x) ; – dynamic reversibility π(x)p(x, y) = π(˜ y )p(˜ y, x ˜) ; Page 48 of 163
Go Back
Full Screen
Close
Quit
Exercise 2.1 Show detailed balance (whether reversibility or dynamical reversibility!) implies invariant distribution.
Home Page
P • improper distributions: does x π(x) converge? R Pn • estimate θ(y) dπ(y) using limn→∞ n1 m=1 θ(Ym ) ;
Title Page
Definition 2.2 Geometric ergodicity: there is R(x) > 0 with Contents
kπ − p(n) (x, ·)ktv ≤ R(x)ρn JJ
II
J
I
for all n, all starting points x.
Page 49 of 163
Definition 2.3 Uniform ergodicity: there are ρ ∈ (0, 1), R > 0 not depending on the starting point x such that
Go Back
kπ − p(n) (x, ·)ktv ≤ Rρn
Full Screen
Close
Quit
for all n, uniformly for all starting points x;
2.1.1. History Home Page
Title Page
• Metropolis et al. (1953) • Hastings (1970) • Simulation tests
Contents
JJ
II
• Geman and Geman (1984) • Gelfand and Smith (1990) • Geyer and Møller (1994), Green (1995)
J
I
Page 50 of 163
• ... Definition 2.4 Generic Metropolis algorithm: • Propose: move x → y with probability r(x, y);
Go Back
• Accept: move with probability α(x, y). Required: balance π(x)r(x, y)α(x, y) = π(y)r(y, x)α(y, x).
Full Screen
Definition 2.5 Hastings ratio: set α(x, y) = min{Λ(x, y), 1} where Close
Quit
Λ(x, y)
=
π(y)r(y, x) . π(x)r(x, y)
2.2. Measure-theoretic approach Home Page
Title Page
Contents
JJ
II
J
I
Page 51 of 163
We follow Tierney (1998) closely in presenting a treatment of the Hastings ratio for general discrete-time Markov chains, since in stochastic geometry one frequently deals with chains on quite complicated state spaces. Here is the key detailed balance calculation in general form: compare the “fluxes” π( dx)p(x, dy) and π( dy)p(y, dx). Both are absolutely continuous with respect to their sum: set Λ(y, x) Λ(y, x) + 1
=
π( dx)p(x, dy) π( dx)p(x, dy) + π( dy)p(y, dx)
defining a possibly infinite Λ(y, x) up to nullsets of π( dx)p(x, dy)+π( dy)p(y, dx). Now Λ(y, x) is positive off a null-set N1 of π( dx)p(x, dy) and finite off a null-set N2 of π( dy)p(y, dx). Moreover N1 and N2 can be chosen as transposes of each other under the symmetry (x, y) ↔ (y, x), because they are defined in terms of whether Λ(y, x)/(Λ(y, x) + 1) is 0 or 1, and so we may arrange1 for
Go Back
Λ(x, y) Λ(y, x) + Λ(y, x) + 1 Λ(x, y) + 1
Full Screen
=
1.
Close
Quit
1 working
to null-sets of π( dx)p(x, dy) + π( dy)p(y, dx) throughout!
Applying simple algebra, we deduce Λ(y, x)Λ(x, y) = 1 off N1 ∪ N2 . We know Home Page
Title Page
0 < Λ(y, x), Λ(x, y) < ∞ so
Contents
= II
J
I
Page 52 of 163
Go Back
Λ(y, x) Λ(y, x) + 1
Λ(y, x) Λ(x, y) × Λ(y, x) + 1 Λ(x, y) Λ(x, y) min {Λ(y, x), Λ(y, x)Λ(x, y)} (Λ(y, x) + 1)Λ(x, y) Λ(x, y) = min {Λ(y, x), 1} (2.1) Λ(x, y) + 1
min {1, Λ(x, y)}
JJ
off N1 ∪ N2 ,
=
min {1, Λ(x, y)}
and therefore min {1, Λ(x, y)} π( dx)p(x, dy) = min {1, Λ(y, x)} π( dy)p(y, dx) . The kernel min{1, Λ(x, y)}p(x, dy) may have a deficit (integral over y less than one): compensate by adding an atom Z 1 − min{1, Λ(x, u)}p(x, du) δx ( dy)
Full Screen
Close
Quit
to obtain a reversible kernel (actually a Metropolis-Hastings kernel!) Z 1 − min{1, Λ(x, u)}p(x, du) δx ( dy) + min{1, Λ(x, y)}p(x, dy)
Home Page
Title Page
In case π( dx)p(x, dy) and π( dy)p(y, dx) are absolutely continuous with respect to each other, π( dy)p(y, dx) Λ(x, y) = (2.2) π( dx)p(x, dy) is exactly the Hastings ratio. Otherwise it arises via the Metropolis map L1 projection arguments of Billera and Diaconis (2001), measurable case.
Contents
JJ
II
J
I
Page 53 of 163
Go Back
Full Screen
Close
Quit
We note in passing that the Hastings ratio (as opposed to other possible accept/reject rules) yields the largest spectral gap and minimizes the asymptotic variance of additive functionals (Peskun 1973; Tierney 1998).
2.3. MCMC for Markov point processes Home Page
Title Page
Contents
JJ
II
J
I
We now apply the above to simulation of a point process in a bounded window, whose distribution2 has density f (ξ) with respect to a Poisson point process measure π1 ( dξ) of unit intensity (Geyer and Møller 1994; Green 1995). We need to design a point-pattern-valued Markov chain whose limiting distribution will be f (ξ)π1 ( dξ), and we will constrain the Markov chain to evolve solely by adding and deleting points. Let λ(x; ξ) = f (ξ ∪ {x})/f (ξ) be the Papangelou conditional intensity for the target point process, as defined in Equation (1.15). Suppose the point process satisfies some condition such as local stability, namely λ(x; ξ)
Page 54 of 163
Go Back
≤
γ
(2.3)
for some positive constant γ. We can weaken this: we need require only that Z λ(u; ξ) du
0. Write down the simulation algorithm. Check that the resulting kernel is reversible with respect to πβ , the distribution measure of the Poisson point process of rate β.
There is a very easy approach: recognize that there can be no spatial interaction, so it suffices to set up detailed balance equations for the Markov chain which counts the total number of points . . . .
2.3.2. Continuous Time case Home Page
Title Page
Contents
JJ
II
J
I
Page 58 of 163
In the treatment above it is tempting to make αn as large as possible, to maximize the rate of change per time-step. However we can also consider the limiting case of a continuous-time model and reformulate the simulation in event-based terms, thus producing a spatial birth-and-death process. For small δ, set αn = (n + 1)δ for n to maintain Equation (2.4). (For larger n use smaller αn . . . .) Then • suppose the current value of X is X = ξ. While #(ξ) is not too large, choose from: – Death: with probability #(ξ)δ, choose a point y of ξ at random and delete it from ξ. – Birth: generate a new point x with sub-probability density λ(x; ξ)δ, and add it to ξ (in case of probability deficit, do nothing!); Taking the limit as δ → 0, and reformulating as a continuous time Markov chain:
Go Back
Full Screen
Close
Quit
• suppose the current value of X is X = ξ: R • wait an Exponential(#(ξ) + W λ(x; ξ) dx) random time then R – Death: with probability #(ξ)/(#(ξ)+ W λ(x; ξ) dx), choose a point y of ξ at random and delete it from ξ; – Birth: otherwise generate a new point x with probability density proportional to λ(x; ξ), and add it to ξ.
Home Page
Exercise 2.7 Verify that the above algorithm produces an invariant distribution with Papangelou conditional intensity λ(x; ξ).
Title Page
Contents
JJ
II
J
I
Exercise 2.8 Determine a spatial birth-and-death process with invariant distribution which is a Strauss point process.
Example of Strauss process target.
Page 59 of 163
Exercise 2.9 Determine a spatial birth-and-death process with invariant distribution which is an area-interaction point process. Go Back
Full Screen
Close
Quit
Note that the technique underlying these exercises suggests the following: a local stability bound λ(x; ξ) ≤ M implies that the point process can be expressed as some complicated and dependent thinning of a Poisson point process of intensity M . We shall substantiate this in Chapter 3, when we come to discuss perfect simulation for point processes.
2.4. Uniqueness and convergence Home Page
Title Page
Contents
JJ
II
J
I
Page 60 of 163
Go Back
Full Screen
Close
Quit
We have been careful to describe the target distributions as invariant distributions rather than equilibrium distributions, because we have not yet introduced useful methods to determine whether our chains actually converge to their invariant distributions! As we have already hinted in Chapter 1, this can be a problem; indeed one can write down a Markov chain which purports to have the attractive case Strauss point process as invariant distribution, even though this point process is not well-defined! The full story here involves discussions of φ-recurrence and Harris recurrence, which we reluctantly omit for reasons of time. Fortunately there is a relatively simple technique which often works to ensure uniqueness of and convergence to an invariant distribution, once existence is established. Suppose we can ensure that the chain X visits the empty-pattern state infinitely often, and with finite mean recurrence time (the empty-pattern state ∅ is an ergodic atom). Then the coupling approach to renewal theory (Lindvall 1982; Lindvall 2002) can be applied to show that the distribution of Xn converges in total variation to its invariant distribution, which therefore must be unique. This pleasantly low-tech approach foreshadows the discussion of small sets later in this chapter, and also anticipates important techniques in perfect simulation.
2.5. The MCMC zoo Home Page
Title Page
Contents
JJ
II
J
I
Page 61 of 163
Go Back
Full Screen
Close
Quit
The attraction of MCMC is that it allows us to produce simulation algorithms for all sorts of point processes and geometric processes. It is helpful however to have in mind a general categorization: a kind of “zoo” of different MCMC algorithms. • Here are some important special cases of the MH sampler classified by type of proposal distribution. Let the target distribution be π (could be Bayesian posterior . . . ). – Independence sampler: Choose q(x, y) = q(y) (independent of current location x) which you hope is “well-matched” to π; – Random walk sampler: Choose q(x, y) = q(y − x) to be a random walk increment based on current location x. Tails of target distribution π must not be too heavy and (eg) density contours not too sharply curved (Roberts and Tweedie 1996b); – Langevin algorithms (Also called, “Metropolis-adjusted Langevin”): Choose q(x, y) = q(y − σ 2 g(x)/2) where g(x) is formed using x and the “logarithmic gradient” of the density of π( dx) at x (possibly adjusted) (σ 2 is the jump variance). Behaves badly if logarithmic gradient tends to zero at infinity; a condition requiring something of the order “tails must not be too light” (Roberts and Tweedie 1996a).
Home Page
• Slice sampler: task is to draw from density f (x). Here is a one-dimensional example! (But note, this is only interesting because it can be made to work in many dimensions . . . ) Suppose f unimodal. Define g0(y), g1(y) implicitly by the requirement that [g0(y), g1(y)] is the superlevel set {x : f (x) ≥ y}.
Title Page
Contents
JJ
II
J
I
def mcmc slice move (x) : y ← uniform (0, f (x)) return uniform (g0 (y), g1 (y))
Page 62 of 163
Go Back
Figure 2.1: Slice sampler Full Screen
Close
Quit
Roberts and Rosenthal (1999, Theorem 12) show rapid convergence (order of 530 iterations!) under a specific variation of log-concavity. Relates to notion of “auxiliary variables”.
• Simulated tempering: Home Page
Title Page
– Embed target distribution in a family of distributions indexed by some “temperature” parameter. For example, embed density f (x) (defined over bounded region for x) in family of normalized
Contents
R
(f (x))1/τ ; (f (u))1/τ du
JJ
II
– Choose weights β0 , β1 , . . . , βk for selected temperatures τ0 = 1 < τ1 < . . . τk ;
J
I
– Now do Metropolis-Hastings for target density
Page 63 of 163
βo (f (x))1/τ0 = β0 f (x) Z β1 (f (x))1/τ1 / (f (u))1/τ1 du
at level 0 at level 1 ...
Go Back 1/τk
βk (f (x))
Z /
1/τk
(f (u))
du
at level k
Full Screen
where moves either propose changes in x, or propose changes in level i; (NB: choice of βi ?) Close
Quit
– Finally, sample chain only when it is at level 0.
Home Page
Title Page
Contents
JJ
II
J
I
Page 64 of 163
• Markov-deterministic hybrid: Need to sample exp(−E(q))/Z. Introduce “momenta” p and aim to draw from 1 π(p, q) = exp −E(q) − |p|2 /Z 0 . 2 Fact: for Hamiltonian H(p, q) = E(q) + 12 |p|2 , the dynamical system dq dp
= (∂H/∂p) dt = p = −(∂H/∂q) dt = −(∂E/∂q)
has π(p, q) as invariant distribution (Liouville’s theorem). So alternate between (a) evolving using dynamics and (b) drawing from Gaussian marginal for p. Discretization means (a) is not exact; can fix this by use of MH rejection techniques. Cf : dynamic reversibility.
Go Back
Full Screen
Close
Quit
Home Page
Title Page
• Dynamic importance weighting (Liu, Liang, and Wong 2001): augment Markov chain X by adding an R importance parameter W , replace search for equilibrium (E [θ(Xn )] → θ(x)π( dx)) by requirement Z lim E [Wn θ(Xn , Wn )] ∝ θ(x)π( dx) . n→∞
Contents
JJ
II
J
I
Page 65 of 163
Go Back
Full Screen
Close
Quit
• Evolutionary Monte Carlo Liang and Wong (2000, 2001): use ideas of genetic algorithms (multiple samples, “exchanging” sub-components) and tempering. • and many more ideas besides . . . .
2.6. Speed of convergence Home Page
There is of course an issue as to how fast is the convergence to equilibrium. This can be analytically challenging (Diaconis 1988)! Title Page
2.6.1. Convergence and symmetry: card shuffling Contents
JJ
II
J
I
Page 66 of 163
Example: transposition random walk X on permutation group of n cards. Equilibrium occurs at strong uniform times T (Aldous and Diaconis 1987): Definition 2.10 The random stopping time T is a strong uniform time for the Markov chain X (whose equilibrium distribution is π) if P [T = k, Xk = s]
=
πs × P [T = k] .
Broder: notion of checked cards. Transpose by choosing 2 cards at random; LH, RH. Check card pointed to by LH if either LH, RH are the same unchecked card;
Go Back
or LH is unchecked, RH is checked. Full Screen
Close
Inductive claim: given number of checked cards, positions in pack of checked cards, list of values of cards; the map determining which checked card has which value is uniformly random. Pn−1 Set Tm to be the time till m cards checked. The independent sum T = m=0 Tm is a strong uniform time by induction. How big is T ?
Quit
Home Page
Title Page
Now Tm+1 −Tm is Geometrically distributed with success probability (n−m)(m+ 1)/n2 . So mean of T is # n−1 " n−1 X n2 1 X 1 Tm = + ≈ 2n log n . E n+1 n−m m+1 m=0 m=0
Contents
JJ
II
J
I
Pn−1 MOREOVER: T = m=0 Tm is a discrete version of time to complete infection for simple epidemic (without recovery) in n individuals, starting with 1 infective, individuals contacting at rate n2 . A classic calculation tells us T /(2n log n) has a limiting distribution. NOTE: group representation theory: correct asymptotic is 12 n log n.
Page 67 of 163
Go Back
Full Screen
Close
Quit
The MCMC approach to substitution decoding also produces a Markov chain moving by transpositions on a permutation group of around n = 60 symbols. However it takes much longer to reach equilibrium than order of 60 log(60) ≈ 250 transpositions. Conditioning on data can have a drastic effect on convergence rates!
2.6.2. Convergence and small sets Home Page
Title Page
Contents
JJ
II
J
I
The most accessible theory (as expounded for example in Meyn and Tweedie 1993) uses the notion of small sets for Markov chains. In effect, small set theory allows us to argue as if (suitably regular) Markov chains are actually discrete, even if defined on very general state-spaces. In fact this discreteness can even be formalized as the existence of a latent discrete structure.
Definition 2.11 C is a small set (of order k) if there is a positive integer k, positive c and a probability measure µ such that for all x ∈ X and E ⊆ X p(k) (x, E)
Page 68 of 163
Go Back
Full Screen
Close
Quit
≥
c × I [x ∈ C] µ(E) .
Markov chains can be viewed as regenerating at small sets. The notion is useful for recurrence conditions of Lyapunov type: compare low-tech “empty pattern” method. Small sets play a crucial rˆole in notions of periodicity, recurrence, positive recurrence, geometric ergodicity, making link to discrete state-space theory (Meyn and Tweedie 1993).
Home Page
Foster-Lyapunov recurrence on small sets (Mykland, Tierney, and Yu 1995; Roberts and Tweedie 1996a; Roberts and Tweedie 1996b; Roberts and Rosenthal 2000; Roberts and Rosenthal 1999; Roberts and Polson 1994)
Title Page
Contents
JJ
II
J
I
Page 69 of 163
Go Back
Full Screen
Theorem 2.12 (Meyn and Tweedie 1993) Positive recurrence holds if one can find a small set C, a constant β > 0, and a non-negative function V bounded on C such that for all n > 0 E [V (Xn+1 )|Xn ]
V (Xn ) − 1 + β I [Xn ∈ C] .
(2.6)
Proof: Let N be the random time at which X first (re-)visits C. It suffices3 to show E [N |X0 ] < V (X0 ) + constant < ∞ (then use small-set regeneration). By iteration of (2.6), we deduce E [V (Xn )|X0 ] < ∞ for all n. If X0 6∈ C then (2.6) tells us n 7→ V (Xn∧N ) + n ∧ N defines a nonnegative supermartingale (I X(n∧N ) ∈ C = 0 if n < N ) . Consequently E [N |X0 ]
Close
Quit
≤
3 This
≤
E [V (XN ) + N |X0 ]
≤
V (X0 ) .
martingale approach can be reformulated as an application of Dynkin’s formula.
If X0 ∈ C then the above can be used to show Home Page
Title Page
Contents
JJ
II
J
I
Page 70 of 163
Go Back
Full Screen
Close
Quit
E [N |X0 ]
= E [1 × I [X1 ∈ C]|X0 ] + E [E [N |X1 ] I [X1 6∈ C]|X0 ] ≤ P [X1 ∈ C|X0 ] + E [1 + V (X1 )|X0 ] ≤ P [X1 ∈ C|X0 ] + V (X0 ) + β
where the last step uses Inequality (2.6) applied when I [Xn−1 ∈ C] = 1.
This idea can be extended to give bounds on convergence rates. Home Page
Title Page
Contents
JJ
II
J
I
Theorem 2.13 (Meyn and Tweedie 1993) Geometric ergodicity holds if one can find a small set C, positive constants λ < 1, β, and a function V ≥ 1 bounded on C such that E [V (Xn+1 )|Xn ]
≤
λV (Xn ) + β I [Xn ∈ C] .
(2.7)
Proof: Define N as in Theorem 2.12. Iterating (2.7), we deduce E [V (Xn )|X0 ] < ∞ and more specifically n
7→
V (Xn∧N )/λn∧N
Page 71 of 163
Go Back
Full Screen
is a nonnegative supermartingale. Consequently E V (XN )/λN |X0 ≤
V (X0 ) .
Using the facts that V ≥ 1, λ ∈ (0, 1) and Markov’s inequality we deduce P [N > n|X0 ]
≤
λn V (X0 ) ,
Close
which delivers the required geometric ergodicity. Quit
Home Page
Title Page
Contents
JJ
Bounds may not be tight, but there are many small sets for most Markov chains (not just the trivial singleton sets, either!). In fact small sets can be used to show: Theorem 2.14 (Kendall and Montana 2002) If the Markov chain has a measurable transition density p(x, y) then the two-step density p(2) (x, y) can be expressed (non-uniquely) as a non-negative countable sum X p(2) (x, y) = fi (x)gi (y) .
II
Hence in such cases small sets of order 2 abound! J
I
Page 72 of 163
Go Back
Full Screen
Close
Quit
This relates to the classical theory of small sets and would not surprise its originators; see for example Nummelin 1984.
Home Page
Proof: Key Lemma, variation of Egoroff’s Theorem: Let p(x, y) be an integrable function on [0, 1]2 . Then we can find subsets Aε ⊂ [0, 1], increasing as ε decreases, such that
Title Page
Contents
JJ
II
J
I
Page 73 of 163
Go Back
Full Screen
Close
Quit
(a) for any fixed Aε the “L1 -valued function” px is uniformly continuous on Aε : for any η > 0 we can find δ > 0 such that |x − x0 | < δ and x, x0 ∈ Aε implies Z 1 |px (z) − px0 (z)| dz < η ; 0
(b) every point x in Aε is of full relative density: as u, v → 0 so Leb([x − u, x + v] ∩ Aε ) u+v
→
1.
Moreover one can discover latent discrete Markov chains representing all such Markov chains at period 2.
However one cannot get below order 2: Home Page
Title Page
Contents
JJ
II
J
I
Theorem 2.15 (Kendall and Montana 2002) There are Markov chains which possess no small sets of order 1. Proof: Key Theorem: There exist Borel measurable subsets E ⊂ [0, 1]2 of positive area which are rectangle-free, so that if A × B ⊆ E then Leb(A × B) = 0. (Stirling’s formula, discrete approximations, Borel-Cantelli.)
Page 74 of 163
Go Back
Full Screen
Close
Quit
Figure 2.2: No small sets!
Home Page
Title Page
Contents
JJ
II
J
I
Chapter 3
Perfect simulation
Page 75 of 163
Contents Go Back
Full Screen
Close
Quit
3.1 3.2 3.3
CFTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fill’s (1998) method . . . . . . . . . . . . . . . . . . . . . Sundry other matters . . . . . . . . . . . . . . . . . . . .
77 92 94
Home Page
Title Page
Contents
JJ
II
J
I
Page 76 of 163
Go Back
Full Screen
Close
Quit
Establishing bounds on convergence to equilibrium is important if we are to get a sense of how to do MCMC. It would be very useful if one could finesse the whole issue by modifying the MCMC algorithm so as to produce an exact draw from equilibrium, and this is precisely the idea of perfect simulation. A graphic illustration of this in the context of stochastic geometry (specifically, the dead leaves model) is to be found at http://www.warwick.ac.uk/statsdept/staff/WSK/dead.html.
3.1. CFTP Home Page
Title Page
Contents
JJ
II
J
I
Page 77 of 163
Go Back
Full Screen
Close
Quit
There are two main approaches to perfect simulation. One is Coupling from the Past (CFTP), introduced by Propp and Wilson (1996). Many variations have arisen: we will describe the original Classic CFTP, and two variants Small-set CFTP and Dominated CFTP. Part of the fascination of this area is the interplay between theory and actual implementation: we will present illustrative simulations.
3.1.1. Classic CFTP Home Page
Title Page
Consider reflecting random walk run From The Past (Propp and Wilson 1996): • no fixed start: run all starts simultaneously (Coupling From The Past); • extend past till all starts coalesce by t = 0;
Contents
JJ
II
J
I
• 3-line proof: perfect draw from equilibrium.
Page 78 of 163
Go Back
Figure 3.1: Classic CFTP. Implementation: - “re-use” randomness;
Full Screen
Close
- sample at t = 0 not at coalescence; - control coupling efficiently using monotonicity, boundedness; - uniformly ergodic.
Quit
Here is the 3-line proof, with a couple of extra lines to help explain! Home Page
Title Page
Theorem 3.1 If coalescence is almost sure then CFTP delivers a sample from the equilibrium distribution of the Markov chain X corresponding to the random input-output maps F(−u,v] .
Contents
JJ
II
J
I
Proof: For each time-range [−n, ∞) use input-output maps F(−n,t] to define Xt−n
Page 79 of 163
=
F(−n,t] (0)
for − n ≤ t .
We have assumed finite coalescence time −T for F . Then X0−n L(X0−n )
= =
X0−T L(Xn0 )
whenever − n ≤ −T ;
(3.1) (3.2)
Go Back
If X converges to an equilibrium π then Full Screen
Close
Quit
dist(L(X0−T ), π)
lim dist(L(X0−n ), π) = lim dist(L(Xn0 ), π) = 0 n n (3.3) (dist is total variation) hence giving the required result. =
Home Page
Title Page
Here is a simple version of perfect simulation applied to the Ising model, following Propp and Wilson (1996) (but they do something much cleverer!). Here the probability of a given ±1-valued configuration S is proportional to XX Sx Sy exp β x∼y
Contents
JJ
II
J
I
Page 80 of 163
Go Back
Full Screen
Close
Quit
and our recipe is, pick a site x, compute conditional probability of Sx = +1 given rest of configuration S, P X exp(β y:y∼x (-1) × Sy ) P ρ = = exp(−2β Sy ) , exp(β y:y∼x (1) × Sy ) y:y∼x and set Sx = +1 with probability ρ, else set Sx = −1. Boundary conditions are “free” in the simulation: for small β we can make a perfect draw from Ising model over infinite lattice, by coupling in space as well as time (Kendall 1997b; H¨aggstr¨om and Steif 2000). We can do better for Ising: Huber (2003) shows how to convert Swendsen-Wang methods to CFTP.
3.1.2. A little history Home Page
Title Page
Contents
JJ
II
J
I
Kolmogorov (1936): 1 Consider an inhomogeneous (finite state) Markov transition kernel Pij (s, t). Can it be represented as the kernel of a Markov chain begun in the indefinite past? Yes! Apply diagonalization arguments to select a convergent subsequence from the probability systems arising from progressively earlier and earlier starts: P [X(0) = k] P [X(−1) = j, X(0) = k] P [X(−2) = i, X(−1) = j, X(0) = k]
= = = ...
Page 81 of 163
Go Back
Full Screen
Close
Quit
1 Thanks
to Thorisson (2000) for this reference . . . .
Qk (0) Qj (−1)Pjk (−1, 0) Qi (−2)Pij (−2, −1)Pjk (−1, 0)
3.1.3. Dead Leaves Home Page
Title Page
Nearly the simplest possible example of perfect simulation. • Dead leaves falling in the forests at Fontainebleau; • Equilibrium distribution models complex images;
Contents
• Computationally amenable even in classic MCMC framework; JJ
II
J
I
• But can be sampled perfectly with no extra effort! (And uniformly ergodic) How can this be done? see Kendall and Th¨onnes (1999) [HINT: try to think non-anthropomorphically . . . .] Occlusion CFTP. Related work: Jeulin (1997), Lee, Mumford, and Huang (2001).
Page 82 of 163
Go Back
Full Screen
Close
Figure 3.2: Falling leaves of perfection. Quit
3.1.4. Small-set CFTP Home Page
Does CFTP depend on monotonicity? No: we can use small sets. Consider a simple Markov chain on [0, 1] with triangular kernel.
Title Page
Contents
JJ
II
J
I
Figure 3.3: Small-set CFTP.
Page 83 of 163
• Note “fixed” small triangle; Go Back
• this can couple Markov chain steps; Full Screen
• basis of small-set CFTP (Murdoch and Green 1998). Close
Quit
Extends to more general cases (“partitioned multi-gamma sampler”). Uniformly ergodic.
Home Page
Title Page
Exercise 3.2 Write out the details of small-set CFTP. Deduce how to represent the output of simple small-set CFTP (uses just one small set) in terms of a Geometric random variable and draws from a single kernel.
Contents
JJ
II
J
I
Page 84 of 163
Go Back
Full Screen
Close
Quit
Exercise 3.3 One small set is usually not enough - coalescence takes much too long! Work out how to use several small sets to do partitioned small-set CFTP (“partitioned multi-gamma sampler”) (Murdoch and Green 1998).
3.1.5. Point patterns Home Page
Title Page
Recall Widom-Rowlinson model (Widom and Rowlinson 1970) from physics, “area-interaction” model in statistics (Baddeley and van Lieshout 1995). The point pattern X has density (with respect to Poisson process): exp (− log(γ) × (area of union of discs centred on X))
Contents
JJ
II
J
I
Page 85 of 163
Figure 3.4: Perfect area-interaction. Go Back
• Represent as two-type pattern (red crosses, blue disks); Full Screen
Close
Quit
• inspect structure to detect monotonicity; • attractive case γ > 1: CFTP uses “bounded” state-space (H¨aggstr¨om et al. 1999) in sense of being uniformly ergodic. All cases can be dealt with using Dominated CFTP (Kendall 1998).
3.1.6. Dominated CFTP Home Page
Title Page
Contents
JJ
II
J
I
Foss and Tweedie (Foss and Tweedie 1998) show classic CFTP is essentially equivalent to uniform ergodicity (rate of convergence to equilibrium does not depend on start-point). For classic CFTP needs vertical coalescence: every possible start from time −T leads to same result at time 0. Can we lift this uniform ergodicity requirement?
Page 86 of 163
Go Back
Figure 3.5: Horizontal CFTP. Full Screen
Close
Quit
CFTP almost works with horizontal coalescence: all sufficiently early starts from a specific location * lead to same result at time 0. But how to identify when this has happened?
Home Page
Recall Lindley’s equation for the GI/G/1 queue identity Wn+1
Title Page
Contents
JJ
II
=
I
Page 87 of 163
Go Back
Full Screen
Close
Quit
=
max{0, Wn + ηn }
and its development Wn+1
max{0, ηn , ηn + ηn−1 , . . . , ηn + ηn−1 + . . . + η1 }
=
leading by time-reversal to Wn+1
J
max{0, Wn + Sn − Xn+1 }
=
max{0, η1 , η1 + η2 , . . . , η1 + η2 + . . . + ηn }
and thus the steady-state expression W∞
=
max{0, η1 , η1 + η2 , . . .} .
Loynes (1962) discovered a coupling application to queues with general inputs, which pre-figures CFTP.
Home Page
Title Page
Contents
JJ
II
J
I
Dominated CFTP idea: Generate target chains using a dominating process for which equilibrium is known. Domination allows us to identify horizontal coalescence by checking starts from maxima given by the dominating process. Thus we can apply CFTP to Markov chains which are merely geometrically ergodic (Kendall 1998; Kendall and Møller 2000; Kendall and Th¨onnes 1999; Cai and Kendall 2002) or worse (geometric ergodicity 6= Dominated CFTP!).
Page 88 of 163
Go Back
Full Screen
(a) Perfect Strauss point process. (b) Perfect conditioned Boolean model.
Close
Figure 3.6: Two examples of non-uniformly ergodic CFTP.
Quit
Home Page
Description2 of dominated CFTP (Kendall 1998; Kendall and Møller 2000) for birth-death algorithm for point process satisfying the local stability condition λ(x; ξ)
≤
γ.
Title Page
Contents
JJ
II
J
I
Original RAlgorithm: points x die at unitRdeath rate, are born in pattern ξ at total birth rate W λ(u; ξ) du, density λ(x; ξ)/ W λ(u; ξ) du, so local birth rate λ(x; ξ). Perfect modification: underlying supply of randomness is birth-and-death process with unit death rate, uniform birth rate γ per unit area. Mark points independently with (example) Uniform(0, 1) marks. Define lower and upper processes L and U as exhibiting minimal and maximal birth rates available in range of configurations between L and U : minimal birth rate at x = maximal birth rate at x =
Page 89 of 163
Go Back
min{λ(x; ξ) : L ⊆ ξ ⊆ U } , max{λ(x; ξ) : L ⊆ ξ ⊆ U } .
Use marks to do CFTP till current iteration delivers L(0) = U (0), then return the agreed configuration! Example uses Poisson point patterns as marks for areainteraction, following Kendall (1997a).
Full Screen
“It’s a thinning procedure, Jim, but not as we know it.” Dr McCoy, Startrek, stardate unknown.
Close
Quit
2 Recall
remark about dependent thinning!
Home Page
All this suggests a general formulation: Let X be a Markov chain on X which exhibits equilibrium behaviour. Embed the state-space X in a partially ordered space (Y, ) so that X is at bottom of Y, in the sense that for any y ∈ Y, x ∈ X ,
Title Page
yx Contents
JJ
II
J
I
implies
y = x.
We may then use Theorem 3.1 to show: Theorem 3.4 Define a Markov chain Y on Y such that Y evolves as X after it hits X ; let Y (−u, t) be the value at t a version of Y begun at time −u, (a) of fixed initial distribution L(Y (−T, −T )) = L(Y (0, 0)), and
Page 90 of 163
Go Back
Full Screen
Close
Quit
(b) obeying funnelling: if −v ≤ −u ≤ t then Y (−v, t) Y (−u, t). Suppose coalescence occurs: P [Y (−T, 0) ∈ X ] → 1 as T → ∞. Then lim Y (−T, 0) can be used for a CFTP draw from the equilibrium of X.
Home Page
So how far can we go? It looks nearly possible to define the dominating process using the “Liapunov function” V of Theorem 2.13. There is an interesting link – Rosenthal (2002) draws it even closer – nevertheless:
Title Page
Existence of Liapunov function doesn’t ensure dominated CFTP! The relevant Equation (2.7) is:
Contents
E [V (Xn+1 )|Xn ] JJ
II
J
I
Page 91 of 163
Go Back
Full Screen
Close
Quit
≤
λV (Xn ) + β I [Xn ∈ C] .
However one can cook up perverse examples in which this supermartingale inequality is satisfied, but there is bad failure of the stochastic dominance required to use V (X) for dominated CFTP . . . :-(.
3.2. Fill’s (1998) method Home Page
Title Page
Contents
JJ
II
J
I
The alternative to CFTP is Fill’s algorithm, which at first sight appears quite different (though we here describe the beautiful work of Fill et al. 2000, which shows they are profoundly linked). The underlying notion is that of a strong uniform time T , which we have already encountered. P [T = t, X(t) = s]
=
π(s) × P [T = t]
Fill’s method is best explained using “blocks” to model input-output maps representing a Markov chain.
Page 92 of 163
Go Back
Figure 3.7: The “block” model for input-output maps. First recall that CFTP can be viewed in a curiously redundant fashion as follows:
Full Screen
Close
• Draw from equilibrium X(−T ) and run forwards; • continue to increase T until X(0) is coalesced; • return X(0).
Quit
Key observation: By construction, X(−T ) is independent of X(0) and T so . . . Home Page
• Condition on a convenient X(0); • Run X backwards to a fixed time −T ;
Title Page
• Draw blocks conditioned on the X transitions; Contents
JJ
II
J
I
Page 93 of 163
Go Back
• If coalescence then return X(−T ) else repeat.
Figure 3.8: Conditioning the blocks on the reversed chain. See Fill (1998), Th¨onnes (1999), Fill et al. (2000). Exercise 3.5 Can you figure out how to do a dominated version of Fill’s method?
Full Screen
Close
Quit
Exercise 3.6 Produce a perfect version of the slice sampler! (Mira, Møller, and Roberts 2001)
3.3. Sundry other matters Home Page
Title Page
Contents
JJ
II
J
I
Page 94 of 163
Go Back
Layered Multishift: Wilson (2000b) and further work by Corcoran and others. Basic idea: how to draw simultaneously from Uniform(x, x + 1) for all x ∈ R, and to try to couple the draws? Answer: draw a uniformly random unit span integer lattice, . . . . 3 Exercise 3.7 Figure out how to replace “Uniform(x, x + 1)” by any other location-shifted family of continuous densities. (Hint: mixtures of uniforms . . . .) Montana used these ideas to do perfect simulation of nonlinear AR(1) in his PhD thesis.
Full Screen
Close
Quit
3 Uniformly
random lattices in dimension 2 play a part in line process theory.
Home Page
Title Page
Read-once randomness: Wilson (2000a). The clue is in: Exercise 3.8 Figure out how to do CFTP, using the “input-output” blocks picture of Figure 3.7, but under the constraint that you aren’t allowed to save any blocks!
Contents
JJ
II
J
I
Page 95 of 163
Perfect simulation for Peirls’ contours (Ferrari et al. 2002). The Ising model can be reformulated in an important way by looking only at the contours (lines separating ±1 values). In fact these form a “non-interacting hard-core gas”, thus permitting (at least in theory) Ferrari et al. (2002) to apply their variant of perfect simulation (Backwards-Forwards Algorithm)! Compare also Propp and Wilson (1996).
Go Back
Perfect simulated tempering (Møller and Nicholls 1999; Brooks et al. 2002). Full Screen
Close
Quit
3.3.1. Efficiency Home Page
Title Page
Contents
JJ
II
J
I
Page 96 of 163
Go Back
Burdzy and Kendall (2000): Coupling of couplings:. . . |pt (x1 , y) − pt (x2 , y)| ≤ ≤ |P [X1 (t) = y|X1 (0) = x1 ] − P [X2 (t) = y|X2 (0) = x2 ]| ≤ |P [X1 (t) = y|τ > t, X1 (0) = x1 ] − P [X2 (t) = y|τ > t, X2 (0) = x2 ]| × ×P [τ > t|X(0) = (x1 , x2 )] Suppose |pt (x1 , y) − pt (x2 , y)| ≈ c exp(−µ2 t) while P [τ > t|X(0) = (x1 , x2 )] ≈ c exp(−µt) Let X ∗ be a coupled copy of X but begun at (x2 , x1 ): |P [X1 (t) = y|τ > t, X1 (0) = x1 ] − P [X2 (t) = y|τ > t, X2 (0) = x2 ] | = |P [X1 (t) = y|τ > t, X(0) = (x1 , x2 )] − P [X1∗ (t) = y|τ > t, X ∗ (0) = (x2 , x1 )] | So let σ be time when X, X ∗ couple:
Full Screen
≤ P [σ > t|τ > t, X(0) = (x1 , x2 )] (≈ c exp(−µ0 t))
Close
Quit
Thus µ2 ≥ µ0 + µ. See also Kumar and Ramesh (2001).
3.3.2. Example: perfect epidemic Home Page
Title Page
Context • many models: SIR, SEIR, Reed-Frost • data: knowledge about removals
Contents
JJ
II
J
I
• draw inferences about other aspects – infection times – infectivity parameter – removal rate – ...
Page 97 of 163
Perfect MCMC for Reed-Frost model Go Back
• based on explicit formula for size of epidemic • (O’Neill 2001; Clancy and O’Neill 2002)
Full Screen
Close
Quit
Question Home Page
Title Page
Can we apply perfect simulation to SIR epidemic conditioned on removals? • IDEA – SIR → compartment model
Contents
JJ
II
J
I
Page 98 of 163
Go Back
Full Screen
Close
Quit
– conditioning on removals is reminiscent of coverage conditioning for Boolean model • consider SIR-epidemic-valued process, considered as event-driven simulation, driven by birth-and-event processes producing infection and removal events . . . • then use reversibility, perpetuation, crossover to derive perfect simulation algorithm for conditioned case.
• PRO: conditioning on removals appears to make structure “monotonic” Home Page
Title Page
• CON: Perpetuation and nonlinear infectivity rate combine to make funnelling difficult • So: attach independent streams of infection proposals to each level k = I + R to “localize” perpetuation
Contents
• Speed-up by cycling through: JJ
II
J
I
– replace all unobserved marked removal proposals (thresholds) – try to alter all observed removal marks (perpetuate . . . )
Page 99 of 163
Go Back
Full Screen
Close
Quit
– replace all infection proposals (perpetuate . . . ) (Reminiscent of H-vL-M method for perfect simulation of attractive areainteraction point process)
Consistent theoretical framework Home Page
Title Page
Contents
JJ
II
J
I
View Poisson processes in time as 0 : 1-valued processes over “infinitesimal pixels”: infinitesimal infinitesimal P X = 1 = λ × measure pixel pixel • replace thresholds: work sequentially through “infinitesimal pixels”, reject replacement if it violates conditioning • replace removal marks: can incorporate into above step • replace infection proposals: work sequentially through “infinitesimal pixels” first at level 1, then level 2, then level 3 . . . (TV scan); reject replacement if it violates conditioning.
Page 100 of 163
Theorem 3.9 Resulting construction is monotonic (so enables CFTP) Go Back
Full Screen
Close
Proof: Establish monotonicity for • fastest path of infection (what is observed) • slowest feasible path of infection (what is used for perpetuation)
Quit
Implementation: Home Page
Title Page
It works! • speed-up? • scaling?
Contents
JJ
II
J
I
Page 101 of 163
Go Back
Full Screen
Close
Quit
• refine? • generalize? Applications • “omnithermal” variant to produce estimate of likelihood surface • generalization using crossover to produce full Bayesian analysis
Home Page
Title Page
Contents
JJ
II
J
I
Page 102 of 163
Go Back
Full Screen
Close
Quit
Home Page
Title Page
Contents
JJ
II
J
I
Page 103 of 163
Chapter 4
Deformable templates and vascular trees
Go Back
Contents Full Screen
Close
Quit
4.1 4.2 4.3
MCMC and statistical image analysis . . . . . . . . . . . . 104 Deformable templates . . . . . . . . . . . . . . . . . . . . 106 Vascular systems . . . . . . . . . . . . . . . . . . . . . . . 107
4.1. MCMC and statistical image analysis Home Page
Title Page
How to use MCMC on images? Huge literature, beginning with Geman and Geman (1984). Simple example: reconstruct white/black ±1 image: observation Xij = ±1, “true image” Sij = ±1, X|S independent
Contents
P [Xij = xij |Sij ] JJ
II
J
I
Full Screen
Close
exp (φSij xij ) cosh(φ)
(i,j)∼(i0 ,j 0 )
Posterior for S|X proportional to X exp β (i,j)∼(i0 ,j 0 )
Sij Si0 j 0 + φ
X
Xij Sij
ij
(Ising model with inhomogeneous external magnetic field φX), and we can sample from this using MCMC algorithms. Easy CFTP for this example
Quit
2
while prior for S is Ising model, probability mass function proportional to X exp β Sij Si0 j 0
Page 104 of 163
Go Back
=
Home Page
“Low-level image analysis”: contrast “high-level image analysis”. For example, pixel array {(i, j)} representing small squares in a window [0, 1]2 , and replace S by a Boolean model (Baddeley and van Lieshout 1993).
Title Page
Contents
JJ
II
J
I
Page 105 of 163
Go Back
Full Screen
Close
Quit
Figure 4.1: Scheme for high-level image analysis. van Lieshout and Baddeley (2001), Loizeaux and McKeague (2001) provide perfect examples.
4.2. Deformable templates Home Page
Grenander and Miller (1994): to answer “how can knowledge in complex systems be represented? use “deformable templates” (image algebras). Title Page
• configuration space; Contents
• configurations (prior probability measure from interactions) → image; • groups of similarities allow deformations;
JJ
II
• template is representation within similarity equivalence class; J
I
Page 106 of 163
Go Back
Full Screen
Close
Quit
• “jump-diffusion”: analogue of Metropolis-Hastings. Besag makes link in discussion of Grenander and Miller (1994), suggesting discrete-time LangevinMetropolis-Hastings moves. Expect to need long MCMC runs; Grenander and Miller (1994) use SIMD technology.
4.3. Vascular systems Home Page
We illustrate using an application to vascular structure in medical images (Bhalerao et al. 2001, Th¨onnes et al. 2002, 2003). Figures: Title Page
Figure 4.2(a) cerebral vascular system (application: preplanning of surgical procedures). Contents
Figure 4.2(b) retinal vascular system (application: diabetic retinopathy). JJ
II
J
I
Page 107 of 163
Go Back
Full Screen
(a) AVI movie of cerebral vascular system.
(b) Retina vascular system: 4002 pixel image.
Close
Figure 4.2: Examples of vascular structures. Quit
a
Home Page
Title Page
Contents
JJ
II
J
I
Plan: use random forests of planar (or spatial!) trees as configurations, since the underlying vascular system is nearly (but not exactly!) tree-like. Note: our tree edges are graph vertices in the Grenander-Miller formalism. Vary tree number, topologies and geometries using Metropolis-Hastings moves. Hence sample from posterior so as to partly reconstruct the vascular system. Issues: • figuring out effective reduction of the very large datasets which can be involved, how to link the model to the data, • careful formulation of the prior stochastic geometry model, • choice of the moves so as not to slow down the MCMC algorithm, • and use of multiscale information.
Page 108 of 163
Go Back
Full Screen
Close
Quit
4.3.1. Data Home Page
Title Page
For computational feasibility, use multiresolution Fourier Transform (Wilson, Calway, and Pearson 1992) based on dyadic recursive partitioning using quadtree. (Compare wavelets.)
Contents
JJ
II
J
I
Page 109 of 163
Go Back
Figure 4.3: Multiresolution summary (planar case!). Full Screen
Close
Quit
Home Page
Title Page
Partitioning: based on mean-squared error criterion and the extent to which summarizing feature is elongated: alternative approach uses Akaike’s information criterion (Li and Bhalerao 2003). Figure 4.4(a) shows a typical feature, which in the planar case is a weighted bivariate Gaussian kernel, while Figure 4.4(b) shows the result of summarizing a 400 × 400 pixel retinal image using 1000 weighted bivariate Gaussian kernels.
Contents
JJ
II
J
I
Page 110 of 163
Go Back
Full Screen
Close
Quit
(a) A typical feature for the MFT.
(b) 1000 kernels for a retina.
Figure 4.4: Decomposition into features using quadtree.
Home Page
Title Page
This procedure can be viewed as a spatial variant of a treed model (Chipman et al. 2002), a variation on CART (Breiman et al. 1984) in which different models are fitted to divisions of the data into disjoint subsets, with further divisions being made recursively until satisfactory models are obtained. Thus subsequent Bayesian inference can be viewed as conditional on the treed model inference.
Contents
JJ
II
J
I
Page 111 of 163
Go Back
Full Screen
Figure 4.5: Final result of Multiresolution Fourier Transform Close
Quit
4.3.2. Likelihood Home Page
Link summarized data (mixture of weighted bivariate Gaussian kernels k(x, y, e) to configuration, assuming additive Gaussian noise. Title Page
f (x, y) = Contents
JJ
II
J
I
XX
→
k(x, y, e) + noise
f˜(x, y)
=
τ ∈F e∈τ
X
k(x, y, e) .
e∈E
Split likelihood by rectangles Ri . `(F|f˜)
=
N X
`i (F|f˜) + constant
i=1 Page 112 of 163
`i (F|f˜)
= −
Go Back
≈
1 2σ 2
1 − 2 2σ
h
X
f˜(x, y) −
h
f˜(x, y) −
Ri
i2 k(x, y, e)
τ ∈F e∈τ
(x,y)∈Ri
ZZ
XX
XX
i2 k(x, y, e) hi (x, y) dx dy
τ ∈F e∈τ
Full Screen
Extend from Ri to all R2 , enabling closed form computation. Close
`i (F|f˜) Quit
≈
−
1 2σ 2
ZZ R2
h
f˜(x, y) −
XX τ ∈F e∈τ
i2 k(x, y, e) gi (x, y) dx dy (4.1)
4.3.3. Prior Home Page
Prior using AR(1) processes for location, width, extending along forests of trees τ produced by subcritical binary Galton-Watson processes.
Title Page
lengthe Contents
JJ
II
J
I
Page 113 of 163
Go Back
Full Screen
Close
Quit
=
α × lengthe0 + Gaussian innovation
for autoregressive parameter α ∈ (0, 1).
(a) sub-critical binary (b) AR(1) process on (c) each edge reprebranching process. edges. sented by a weighted Gaussian kernel.
Figure 4.6: Components of the prior.
(4.2)
4.3.4. MCMC implementation Home Page
We have now formulated likelihood and model. It remains to decide on a suitable MCMC process.
Title Page
Contents
JJ
II
J
I
Page 114 of 163
Go Back
Full Screen
Figure 4.7: Refined scheme for high-level image analysis. Close
Quit
Home Page
Title Page
Changes are local (hence slow). Compute MH accept-reject probabilities using AR(1) location distributions and Y Y P [τ ] = P [ node has n children ] (4.3) n=0,1,2 v∈τ :v has n children
Contents
JJ
II
J
I
Page 115 of 163
Go Back
Full Screen
Close
Figure 4.8: Reversible jump MCMC proposals. Quit
Computations for topological moves. Suppose family-size distribution is Home Page
Title Page
Contents
JJ
II
J
I
Page 116 of 163
Go Back
Full Screen
Close
Quit
(1 − p)2 , 2p(1 − p) , p2 Hastings ratio for adding a specified leaf ` ∈ ∂ext (τ ): H(τ ; `)
=
P [τ ∪ {`}] /#(∂int (τ ∪ {`}) ∪ ∂ext (τ ∪ {`})) . P [τ ∪ {`}] /#(∂int (τ ) ∪ ∂ext (τ ))
Use induction on τ : P [τ ∪ {`}] = P [τ ] p(1 − p), . . . H(τ ; `)
=
p(1 − p)
1 + #(V (τ )) + #(∂ext (τ )) 2 + #(V (τ )) + #(∂ext (τ ))+1
where +1 is added to the denominator if ` is added to a vertex which already has one daughter. It is clear that this Hastings ratio is always smaller than p(1 − p), and is approximately equal to p(1 − p) for large trees τ . Consequently removals of leaves will always be accepted, and additions will usually be accepted, but there is a computational slowdown as the tree size increases.
Home Page
Title Page
Contents
JJ
II
J
I
Page 117 of 163
Go Back
Full Screen
Close
Quit
Geometric ergodicity is impossible: consider second eigenvalue λ2 of the Markov kernel by Rayleigh-Ritz: P P 2 x y |φ(x) − φ(y)| p(x, y)π(y) 1 − λ2 = inf P P 2 φ6=0 x y |φ(x) − φ(y)| π(x)π(y) P P x∈S y∈S c p(x, y)π(y) ≤ 2 π(S) whenever π(S) < 1/2. Taking S to be the set of all trees of depth n or more, and using the fact that such trees must have at least n sites, it follows 1 − λ2
≤
1 2 p(1 − p) n
→
0.
Experiments: Home Page
MCMC for prior model Title Page
JJ
II
First we implement only the moves relating to a single tree, and only to that tree’s topology. We change the tree just one leaf at a time. Convergence is clearly very slow! moreover the animation clearly indicates the problem is to do with incipient criticality phenomena.
J
I
MCMC for prior model with more global changes
Contents
Page 118 of 163
Go Back
Moves dealing with sub-trees will speed things up greatly! We won’t use these as such – however we will be working with a whole forest of such trees, and using more “global” moves of join and divide.
Full Screen
MCMC for occupancy conditioned prior model Close
Quit
Conditioning on data will also help! Here is a crude parody of conditioning (pixels are black/white depending on whether or not they intersect the planar tree).
4.3.5. Results Home Page
Title Page
Figure 4.9 presents snapshots (over about 50k moves) of a first try, using accept/reject probabilities for a Metropolis-Hastings algorithm based on the above prior and likelihood.
Contents
JJ
II
J
I
Page 119 of 163
Go Back
Full Screen
Close
Quit
Figure 4.9: AVI movie of fixed-width MCMC.
We improve the algorithm by adding Home Page
Title Page
Contents
(1) a further AR(1) process to model the width of blood vessels; (2) a Langevin component to the Metropolis-Hasting proposal distribution, biasing the proposal according to the gradient of the posterior density (following Grenander and Miller 1994 and especially Besag’s remarks!); (3) the possibility of proposing addition (and deletion) of features of length 2;
JJ
II
J
I
(4) gradual refinement of feature proposals from coarse to fine levels.
Page 120 of 163
Go Back
Full Screen
Close
Figure 4.10: AVI movies of improved MCMC. Quit
4.3.6. Refining the starting state Home Page
Title Page
Contents
JJ
II
J
I
Page 121 of 163
Go Back
Full Screen
Close
Quit
We improve matters by choosing a better starting state for the algorithm, using a variant of Kruskal’s algorithm. This generates a minimal spanning tree of a graph as follows: Let G = (V, E) be a graph with edges weighted by cost. A spanning tree is a subgraph of G which is (a) a tree and (b) spans all vertices. We wish to find a spanning tree τ of minimal cost: def kruskal (edges) : edgelist ← edges.sort () # sort so cheapest edge is first tree ← [ ] for e ∈ edgelist : if cyclefree (tree, e) : # check no cycle is created tree.append (e) return tree Actually we need a binary tree (so amend cyclefree accordingly!) and the cost depends on the edges to which it is proposed to join e, but a heuristic modification of Kruskal’s algorithm produces a binary forest which is a good candidate for a starting position for our algorithm. Figure 4.11 shows a result; notice how now the algorithm begins with many small trees, which are progressively dropped or joined to other trees.
Home Page
Title Page
Contents
JJ
II
J
I
Page 122 of 163
Go Back
Full Screen
Close
Quit
Figure 4.11: AVI movie of MCMC using starting position provided by Kruskal’s algorithm.
Home Page
Title Page
Contents
JJ
II
J
I
Chapter 5
Multiresolution Ising models
Page 123 of 163
Contents Go Back
Full Screen
Close
Quit
5.1 5.2 5.3 5.4 5.5 5.6
Generalized quad-trees Random clusters . . . . Percolation . . . . . . . Comparison . . . . . . . Simulation . . . . . . . Future work . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
125 129 132 139 141 143
Home Page
Title Page
Contents
JJ
II
J
I
Page 124 of 163
The vascular problem discussed above involves multi-resolution image analysis described, for example, in Wilson and Li (2002). This represents the image on different scales: the simplest example is that of a quad-tree (a tree graph in which each node has four daughters and just one parent), for which each node carries a random value 0 (coding the colour white) and 1 (coding the colour black), and the random values interact in the manner of a Markov point process with their neighbours. In the vascular case the 0/1 value is replaced by a line segment summarizing the linear structure which best explains the local data at the appropriate resolution level, but for now we prefer to analyze the simpler 0/1 case. Conventional multi-resolution models have interactions only running up and down the tree; however in our case we clearly need interactions between neighbours at the same resolution level as well. This complicates the analysis!
Go Back
Full Screen
Close
Quit
A major question is to determine the extent to which values at high resolution levels are correlated with values at lower resolution levels (Kendall and Wilson 2003). This is related to much recent interest in the question of phase transitions for Ising models on non-Euclidean random graphs. We shall show how geometric arguments lead to a schematic phase portrait.
5.1. Generalized quad-trees Home Page
Title Page
Define Qd as graph whose vertices are cells of all dyadic tessellations of Rd , with edges connecting each cell u to its 2d neighbours, and also its parent M(u) (covering cell in tessellation at next resolution down) and its 2d daughters (cells which it covers in next resolution up).
Contents
JJ
II
J
I
Case d = 1:
Page 125 of 163
Go Back
Full Screen
Close
Quit
Neighbours at same level also are connected. Remark: No spatial symmetry! However: Qd can be constructed using an iterated function system generated by elements of a skew-product group R+ ⊗s Rd .
Further define: Home Page
Title Page
Contents
JJ
II
J
I
Page 126 of 163
• Qd;r as subgraph of Qd at resolution levels of r or higher;
• Qd (o) as subgraph formed by o and all its descendants.
Go Back
Full Screen
Close
Quit
• Remark: there are many graph-isomorphisms Su;v between Qd;r and Qd;s , with natural Zd -action; • Remark: there are graph homomorphisms injecting Q(o) into itself, sending o to x ∈ Q(o) (semi-transitivity).
Simplistic analysis Home Page
Title Page
Contents
JJ
II
J
I
Page 127 of 163
Define Jλ to be strength of neighbour interaction, Jτ to be strength of parent interaction. If Sx = ±1 then probability of configuration is proportional to exp(−H) where X 1 Jhx,yi (Sx Sy − 1) , (5.1) H=− 2 hx,yi∈E(G)
for Jhx,yi = Jλ , Jτ as appropriate. If Jλ = 0 then the free Ising model on Qd (o) is a branching process (Preston 1977; Spitzer 1975); if Jτ = 0 then the Ising model on Qd (o) decomposes into sequence of d-dimensional classical (finite) Ising models. So we know there is a phase change at (Jλ , Jτ ) = (0, ln(5/3)) (results from branching processes), √ and expect one at (Jλ , Jτ ) = (ln(1 + 2), 0+) (results from 2-dimensional Ising model). But is this all that there is to say?
Go Back
Full Screen
Close
Quit
Home Page
Title Page
There has been a lot of work on percolation and Ising models on non-Euclidean graphs (Grimmett and Newman 1990; Newman and Wu 1990; Benjamini and Schramm 1996; Benjamini et al. 1999; H¨aggstr¨om and Peres 1999; H¨aggstr¨om et al. 1999; Lyons 2000; H¨aggstr¨om et al. 2002). Mostly this concerns graphs admitting substantial symmetry (unimodular, quasi-transitive), so that one can use H¨aggstr¨om’s remarkable Mass Transport method:
Contents
JJ
II
J
I
Page 128 of 163
Go Back
Full Screen
Close
Quit
Theorem 5.1 Let Γ ⊂ Aut(G) be unimodular and quasi-transitive. Given mass transport m(x, y, ω) ≥ 0 “diagonally invariant”, set M (x, y) = R m(x, y, ω) dω. Then the expected total transport out of any x equals Ω the expected total transport into x: X X M (x, y) = M (y, x) . y∈Γx
y∈Γx
Sadly our graphs do not have quasi-transitive symmetry :-(. Furthermore the literature mostly deals with phase transition results “for sufficiently large values of the parameter”. This is an invaluable guide, but image analysis needs actual estimates!
5.2. Random clusters Home Page
Title Page
Contents
JJ
II
J
I
Page 129 of 163
Go Back
Full Screen
Close
Quit
A similar problem, concerning Ising models on products of trees with Euclidean lattices, is treated by Newman and Wu (1990). We follow them by exploiting the celebrated Fortuin-Kasteleyn random cluster representation (Fortuin and Kasteleyn 1972; Fortuin 1972a; Fortuin 1972b): The Ising model is the marginal site process at q = 2 of a site/bond process derived from a dependent bond percolation model with configuration probability Pq,p proportional to qC ×
Y
(phx,yi )bhx,yi × (1 − phx,yi )1−bhx,yi .
hx,yi∈E(G)
(where bhx,yi indicates whether or not hx, yi is closed, and C is the number of connected clusters of vertices). Site spins are chosen to be the same in each cluster independently of other clusters with equal probabilities for ±1. We find phx,yi = 1 − exp(−Jhx,yi ) ;
(5.2)
Argue for general q (Potts model): spot the Gibbs’ sampler! (I) given bonds, chose connected component colours independently and uniformly from 1, . . . , q; (II) given colouring, open bonds hx, yi independently, probability phx,yi .
Home Page
Title Page
Exercise 5.2 Check step (II) gives correct conditional probabilities! Now compute conditional probability that site o gets colour 1 given neighbours split into components C1 , . . . , Cr with colour 1, and other neighbours in set C0 .
Contents
Exercise 5.3 Use inclusion-exclusion to show it is this: JJ
II
J
I
Page 130 of 163
! Y 1 P [colour(o) = 1|neighbours] ∝ ×q (1 − phx,yi ) + q y ! Y Y + (1 − phx,yi ) 1− 1 − phx,zi y∈C0
Go Back
z∈C1 ∪...∪Cr
=
Y
(1 − phx,yi ) .
y:colour(y)6=1 Full Screen
Close
Quit
Deduce interaction strength is Jhx,yi = − log(1 − phx,yi ) as stated in Equation (5.2) above.
FK-comparison inequalities Home Page
If q ≥ 1 and A is an increasing event then Pq,p (A) ≤ P1,p (A) Pq,p (A) ≥ P1,p0 (A)
Title Page
Contents
where p0hx,yi =
JJ
II
J
I
Page 131 of 163
Go Back
Full Screen
Close
Quit
(5.3) (5.4)
phx,yi phx,yi = . phx,yi + (1 − phx,yi )q q − (q − 1)phx,yi
Since P1,p is bond percolation (bonds open or not independently of each other), we can find out about phase transitions by studying independent bond percolation.
5.3. Percolation Home Page
Title Page
Contents
JJ
II
J
I
Page 132 of 163
Go Back
Independent bond percolation on products of trees with Euclidean lattices have been studied by Grimmett and Newman (1990), and these results were used in the Newman and Wu work on the Ising model. So we can make good progress by studying independent bond percolation on Qd , using pτ for parental bonds, pλ for neighbour bonds.
Theorem 5.4 There is almost surely no infinite cluster in Qd;0 (and consequently in Qd (o)) if q 2d τ Xλ 1 + 1 − Xλ−1 < 1, where Xλ is the mean size of the percolation cluster at the origin for λpercolation in Zd .
Full Screen
Close
Quit
Modelled on Grimmett and Newman (1990, §3 and §5). Get from matrix spectral asymptotics.
1+
q
1 − Xλ−1
1 Home Page
Title Page
λ Contents
? JJ
II
J
I
Infinite clusters here
Page 133 of 163
0 Go Back
No infinite clusters here 0
(may or may not be unique)
2
−d
1 τ
Full Screen
Close
Quit
The story so far: small λ, small to moderate τ .
Case of small τ Home Page
Title Page
Contents
JJ
II
J
I
Page 134 of 163
Go Back
Full Screen
Close
Quit
Need d = 2 for mathematical convenience. Use Borel-Cantelli argument and planar duality to show, for supercritical λ > 1/2 (that is, supercritical with respect to planar bond percolation!), all but finitely many of the resolution layers Ln = [1, 2n ] × [1, 2n ] of Q2 (o) have just one large cluster each of diameter larger than constant × n. Hence . . . Theorem 5.5 When λ > 1/2 and τ is positive there is one and only one infinite cluster in Q2 (o).
1 Home Page
Just one unique infinite cluster here
Title Page
λ Contents
JJ
II
J
I
1/2
?
Page 135 of 163
0 Go Back
Full Screen
Close
Quit
No infinite clusters here 0
Infinite clusters here (may or may not be unique)
1/4
τ
The story so far: adds small τ for case d = 2.
1
Uniqueness of infinite clusters Home Page
Title Page
Contents
JJ
II
J
I
Page 136 of 163
Go Back
Full Screen
Close
Quit
The Grimmett and Newman (1990) work was remarkable in pointing out that as τ increases so there is a further phase change, from many to just one infinite cluster for λ > 0. The work of Grimmett and Newman carries √ through for Qd (o). However the relevant bound is improved by a factor of 2 if we take into account the hyperbolic structure of Qd (o)! Theorem 5.6 If τ < 2−(d−1)/2 and λ > 0 then there cannot be just one infinite cluster in Qd;0 .
Method: sum weights of “up-paths” in Qd;0 starting, ending at level 0. For fixed s and start point there are infinitely many such up-paths containing s λ-bonds; but no more than (1 + 2d + 2d )s which cannot be reduced by “shrinking” excursions. Hence control the mean number of open up-paths stretching more than a given distance at level 0.
Contribution to upper bound on second phase transition: Home Page
Title Page
Contents
JJ
II
J
I
Page 137 of 163
Go Back
Full Screen
Close
Quit
p Theorem 5.7 If τ > 2/3 then the infinite cluster of Q2:0 is almost surely unique for all positive λ. Method: prune bonds, branching processes, 2-dim comparison . . .
1
Home Page
Just one unique infinite cluster here
Title Page
Contents
JJ
II
J
λ
1/2
I
Page 138 of 163 0
Go Back
Infinite clusters here
?
0
(may or may not be unique)
No infinite clusters here
Many infinite clusters 1
1/4 τ
0.707
0.816
The story so far: includes uniqueness transition for case d = 2. Full Screen
Close
Quit
5.4. Comparison Home Page
Title Page
We need to apply the Fortuin-Kasteleyn comparison inequalities (5.3) and (5.4). The event “just one infinite cluster” is not increasing, so we need more. Newman and Wu (1990) show it suffices to establish a finite island property for the site percolation derived under adjacency when all infinite clusters are removed. Thus:
Contents
JJ
II
J
I
1
λ
finite islands
1/2
Page 139 of 163
Go Back 0
Full Screen
0
1
1/4 τ
Close
Quit
0.707
0.816
Home Page
Comparison arguments then show the following schematic phase diagram for the Ising model on Q2 (o):
Title Page
Contents
JJ
II
J
I
All nodes substantially
Free boundary condition
correlated with each
is mixture of the two extreme
other in case of free
Gibbs states (spin 1 at
boundary conditions
boundary, spin −1 at boundary)
1.099
0.693
Page 140 of 163
Go Back Unique
J Full Screen
λ
Gibbs state
J Close
Quit
τ
Root influenced by wired boundary
0.288
0.511 0.762
1.228 1.190
2.292
5.5. Simulation Home Page
Title Page
Approximate simulations confirm the general story (compare our original animation, which had Jτ = 2, Jλ = 1/4):
Contents
http://www.dcs.warwick.ac.uk/˜rgw/sira/sim.html
JJ
II
J
I
(1) Only 200 resolution levels;
(2) At each level, 1000 sweeps in scan order; Page 141 of 163
Go Back
Full Screen
(3) At each level, simulate square sub-region of 128 × 128 pixels conditioned by mother 64 × 64 pixel region;
(4) Impose periodic boundary conditions on 128 × 128 square region;
Close
Quit
(5) At the coarsest resolution, all pixels set white. At subsequent resolutions, ‘all black’ initial state.
Home Page
Title Page
Contents
JJ
II
J
I
(a) Jλ = 1, Jτ = 0.5
(b) Jλ = 1, Jτ = 1
(c) Jλ = 1, Jτ = 2
(d) Jλ = 0.5, Jτ = 0.5
(e) Jλ = 0.5, Jτ = 1
(f) Jλ = 0.5, Jτ = 2
(g) Jλ = 0.25, Jτ = 0.5
(h) Jλ = 0.25, Jτ = 1
(i) Jλ = 0.25, Jτ = 2
Page 142 of 163
Go Back
Full Screen
Close
Quit
5.6. Future work Home Page
Title Page
Contents
JJ
II
J
I
Page 143 of 163
Go Back
Full Screen
Close
Quit
This is about the free Ising model on Q2 (o). Image analysis more naturally concerns the case of prescribed boundary conditions (say, image at finest resolution level . . . ). Question: will boundary conditions at “infinite fineness” propagate back to finite resolution? Series and Sina˘ı (1990) show answer is yes for analogous problem on hyperbolic disk (2-dim, all bond probabilities the same). Gielis and Grimmett (2002) point out (eg, in Z3 case) these boundary conditions translate to a conditioning for random cluster model, and investigate using large deviations. Project: do same for Q2 (o) . . . and get quantitative bounds? Project: generalize from Qd to graphs with appropriate hyperbolic structure.
Home Page
Title Page
Contents
JJ
II
J
I
Page 144 of 163
Go Back
Full Screen
Close
Quit
Home Page
Title Page
Contents
JJ
II
J
I
Page 145 of 163
Go Back
Full Screen
Close
Quit
Appendix A
Notes on the proofs in Chapter 5
A.1. Notes on proof of Theorem 5.4 Home Page
Mean size of cluster at o bounded above by ∞ X X
Title Page
n−T (t)
Xλ τ n (Xλ − 1)T (t) Xλ
n=0 t:|t|=n Contents
≤ JJ
Xλ (τ Xλ )n
n=0
II
≤ J
∞ X
I
∞ X
Page 146 of 163
(1 − Xλ−1 )T (t)
t:|t|=n
Xλ (2d τ Xλ )n
n=0
≈
X
X
(1 − Xλ−1 )T (j)
j:|j|=n ∞ X
n q Xλ (2d τ Xλ )n 1 + 1 − Xλ−1 .
n=0 Go Back
Full Screen
For last step, use spectral analysis of matrix representation n X 1 1 1 −1 T (j) 1 1 (1 − Xλ ) = . 1 1 − Xλ−1 1 j:|j|=n
Close
Quit
A.2. Notes on proof of Theorem 5.5 Home Page
Title Page
Contents
JJ
II
J
I
Uniqueness: For negative exponent ξ(1 − λ) of dual connectivity function, set `n
=
(n log 4 + (2 + ) log n) ξ(1 − λ) .
More than one “`n -large” cluster in Ln forces existence of open path in dual lattice longer than `n . Now use Borel-Cantelli . . . . On the other hand super-criticality will mean some distant points in Ln are interconnected. Existence: consider 4n−[n/2] points in Ln−1 and specified daughters in Ln . Study probability that (a) parent percolates more than `n−1 ,
Page 147 of 163
Go Back
(b) parent and child are connected, (c) child percolates more than `n . Now use Borel-Cantelli again . . . .
Full Screen
Close
Quit
A.3. Notes on proof of Theorem 5.6 Home Page
Two relevant lemmata: Title Page
Lemma A.1 Consider u ∈ Ls+1 ⊂ Qd and v = M(u) ∈ Ls ⊂ Qd . There are exactly 2d solutions in Ls+1 of Contents
M(x) JJ
II
J
I
Page 148 of 163
=
Su;v (x) .
One is x = u. Others are the remaining 2d − 1 vertices y such that closure of cell representing y intersects vertex shared by closures of cells representing u and M(u). Finally, if x ∈ Ls+1 does not solve M(x) = Su;v (x) then kSu;v (x) − Su;v (u)ks,∞
>
kM(x) − M(u)ks,∞ .
Go Back
Full Screen
Lemma A.2 Given distinct v and y in the same resolution level. Count pairs of vertices u, x in the resolution level one step higher, such that (a) M(u) = v; (b) M(x) = y; (c) Su;v (x) = y.
Close
Quit
There are at most 2d−1 such vertices.
A.4. Notes on proof of Theorem 5.7 Home Page
Prune! Then a direct connection is certainly established across the boundary between the cells corresponding to two neighbouring vertices u, v in L0 if Title Page
(a) the τ -bond leading from u to the relevant boundary is open; Contents
JJ
II
J
I
(b) a τ -branching process (formed by using τ -bonds mirrored across the boundary) survives indefinitely, where this branching process has family-size distribution Binomial(2, τ 2 ); (c) the τ -bond leading from v to the relevant boundary is open.
Page 149 of 163
Go Back
Full Screen
Close
Quit
Then there are infinitely many chances of making a connection across the cell boundary.
A.5. Notes on proof of infinite island property Home Page
Title Page
Notion of “cone boundary” ∂c (S) of finite subset S of vertices: collection of daughters v of S such that Qd (v) ∩ S = ∅. Use induction on S, building it layer Ln on layer Ln−1 to obtain an isoperimetric bound: #(∂c (S)) ≥ (2d − 1)#(S). Hence deduce
Contents
P [S in island at u] JJ
II
J
I
Page 150 of 163
≤
(2d −1)n
(1 − pτ (1 − η))
where #(S) = n and η = P [u not in infinite cluster of Qd (u)]. Upper bound on number N (n) of self-avoiding paths S of length n beginning at u0 : N (n) ≤ (1 + 2d + 2d )(2d + 2d )n . Hence upper bound on the mean size of the island: ∞ X
Go Back
n(1−2−d )
(1 + 2d + 2d )(2d + 2d )n ηbr
,
n=0 Full Screen
Close
Quit
where ηbr is extinction probability for branching process based on Binomial(2d , pτ ) family distribution.
Home Page
Title Page
Bibliography
Contents
JJ
II
J
I
Page 151 of 163
Go Back
Full Screen
Close
Quit
Aldous, D. and M. T. Barlow [1981]. On countable dense random sets. Lecture Notes in Mathematics 850, 311– 327. Aldous, D. J. and P. Diaconis [1987]. Strong uniform times and finite random walks. Advances in Applied Mathematics 8(1), 69–97. Baddeley, A. J. and J. Møller [1989]. Nearest-neighbour Markov point processes and random sets. Int. Statist. Rev. 57, 89–121. Baddeley, A. J. and M. N. M. van Lieshout [1993]. Stochastic geometry models in high-level
vision. Journal of Applied Statistics 20, 233–258. Baddeley, A. J. and M. N. M. van Lieshout [1995]. Areainteraction point processes. Annals of the Institute of Statistical Mathematics 47, 601–619. Barndorff-Nielsen, O. E., W. S. Kendall, and M. N. M. van Lieshout (Eds.) [1999]. Stochastic Geometry: Likelihood and Computation. Number 80 in Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC. Benjamini, I., R. Lyons, Y. Peres, and
Home Page
Title Page
Contents
JJ
II
J
I
Page 152 of 163
Go Back
Full Screen
Close
Quit
O. Schramm [1999]. Group-invariant percolation on graphs. Geom. Funct. Anal. 9(1), 29–66. Benjamini, I. and O. Schramm [1996]. Percolation beyond Zd , many questions and a few answers. Electronic Communications in Probability 1, no. 8, 71–82 (electronic). Bhalerao, A., E. Th¨onnes, W. S. Kendall, and R. G. Wilson [2001]. Inferring vascular structure from 2d and 3d imagery. In W. J. Niessen and M. A. Viergever (Eds.), Medical Image Computing and Computer-Assisted Intervention, proceedings of MICCAI 2001, Volume 2208 of Springer Lecture Notes in Computer Science, pp. 820–828. Springer-Verlag. Also: University of Warwick Department of Statistics Research Report 392. Billera, L. J. and P. Diaconis [2001]. A geometric interpretation of the Metropolis-Hastings algorithm. Statistical Science 16(4), 335–339. Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone [1984]. Clas-
sification and Regression. Statistics/Probability. Wadsworth. Brooks, S. P., Y. Fan, and J. S. Rosenthal [2002]. Perfect forward simulation via simulated tempering. Research report, University of Cambridge. Burdzy, K. and W. S. Kendall [2000, May]. Efficient Markovian couplings: examples and counterexamples. The Annals of Applied Probability 10(2), 362–409. Also University of Warwick Department of Statistics Research Report 331. Cai, Y. and W. S. Kendall [2002, July]. Perfect simulation for correlated Poisson random variables conditioned to be positive. Statistics and Computing 12, 229–243. Also: University of Warwick Department of Statistics Research Report 349. Carter, D. S. and P. M. Prenter [1972]. Exponential spaces and counting processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 21, 1–19.
Home Page
Title Page
Contents
JJ
II
J
I
Page 153 of 163
Go Back
Full Screen
Close
Quit
Chipman, H., E. George, and R. McCulloch [2002]. Bayesian treed models. Machine Learning 48, 299–320. Clancy, D. and P. D. O’Neill [2002]. Perfect simulation for stochastic models of epidemics among a community of households. Research report, School of Mathematical Sciences, University of Nottingham. Cowan, R., M. Quine, and S. Zuyev [2003]. Decomposition of gammadistributed domains constructed from Poisson point processes. Advances in Applied Probability 35(1), 56–69. Diaconis, P. [1988]. Group Representations in Probability and Statistics, Volume 11 of IMS Lecture Notes Series. Hayward, California: Institute of Mathematical Statistics. Ferrari, P. A., R. Fern´andez, and N. L. Garcia [2002]. Perfect simulation for interacting point processes, loss networks and Ising models. Stochastic Process. Appl. 102(1), 63–88.
Fill, J. [1998]. An interruptible algorithm for exact sampling via Markov Chains. The Annals of Applied Probability 8, 131–162. Fill, J. A., M. Machida, D. J. Murdoch, and J. S. Rosenthal [2000]. Extension of Fill’s perfect rejection sampling algorithm to general chains. Random Structures and Algorithms 17(3-4), 290–316. Fortuin, C. M. [1972a]. On the randomcluster model. II. The percolation model. Physica 58, 393–418. Fortuin, C. M. [1972b]. On the randomcluster model. III. The simple random-cluster model. Physica 59, 545–570. Fortuin, C. M. and P. W. Kasteleyn [1972]. On the random-cluster model. I. Introduction and relation to other models. Physica 57, 536–564. Foss, S. G. and R. L. Tweedie [1998]. Perfect simulation and backward coupling. Stochastic Models 14, 187–203.
Home Page
Title Page
Contents
JJ
II
J
I
Page 154 of 163
Go Back
Full Screen
Close
Quit
Gelfand, A. E. and A. F. M. Smith [1990]. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85(410), 398–409. Geman, S. and D. Geman [1984]. Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741. Geyer, C. J. and J. Møller [1994]. Simulation and likelihood inference for spatial point processes. Scand. J. Statist. 21, 359–373. Gielis, G. and G. R. Grimmett [2002]. Rigidity of the interface for percolation and random-cluster models. Journal of Statistical Physics 109, 1–37. See also arXiv:math.PR/0109103., Green, P. J. [1995]. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.
Grenander, U. and M. I. Miller [1994]. Representations of knowledge in complex systems (with discussion and a reply by the authors). Journal of the Royal Statistical Society (Series B: Methodological) 56(4), 549– 603. Grimmett, G. R. and C. Newman [1990]. Percolation in ∞ + 1 dimensions. In Disorder in physical systems, pp. 167–190. New York: The Clarendon Press Oxford University Press. Also at Hadwiger, H. [1957]. Vorlesungen u¨ ber Inhalt, Oberfl¨ache und Isoperimetrie. Berlin: Springer-Verlag. H¨aggstr¨om, O., J. Jonasson, and R. Lyons [2002]. Explicit isoperimetric constants and phase transitions in the random-cluster model. The Annals of Probability 30(1), 443–473. H¨aggstr¨om, O. and Y. Peres [1999]. Monotonicity of uniqueness for percolation on Cayley graphs: all infinite clusters are born simultane-
Home Page
Title Page
Contents
JJ
II
J
I
Page 155 of 163
Go Back
Full Screen
Close
Quit
ously. Probability Theory and Related Fields 113(2), 273–285. H¨aggstr¨om, O., Y. Peres, and R. H. Schonmann [1999]. Percolation on transitive graphs as a coalescent process: relentless merging followed by simultaneous uniqueness. In Perplexing problems in probability, pp. 69– 90. Boston, MA: Birkh¨auser Boston. H¨aggstr¨om, O. and J. E. Steif [2000]. Propp-Wilson algorithms and finitary codings for high noise Markov random fields. Combin. Probab. Computing 9, 425–439. H¨aggstr¨om, O., M. N. M. van Lieshout, and J. Møller [1999]. Characterisation results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli 5(5), 641–658. Was: Aalborg Mathematics Department Research Report R-96-2040. Hall, P. [1988]. Introduction to the theory of coverage processes. Wiley Series in Probability and Mathematical Statistics: Probability and Math-
ematical Statistics. New York: John Wiley & Sons. Hastings, W. K. [1970]. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109. Himmelberg, C. J. [1975]. Measurable relations. Fundamenta mathematicae 87, 53–72. Huber, M. [2003]. A bounding chain for Swendsen-Wang. Random Structures and Algorithms 22(1), 43–59. Hug, D., M. Reitzner, and R. Schneider [2003]. The limit shape of the zero cell in a stationary poisson hyperplane tessellation. Preprint 03-03, Albert-Ludwigs-Universit¨at Freiburg. Jeulin, D. [1997]. Conditional simulation of object-based models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, pp. 271– 288. World Scientific. Kallenberg, O. [1977]. A counterexample to R. Davidson’s conjecture on
Home Page
Title Page
Contents
JJ
II
J
I
Page 156 of 163
Go Back
Full Screen
Close
Quit
line processes. Math. Proc. Camb. Phil. Soc. 82(2), 301–307. Kelly, F. P. and B. D. Ripley [1976]. A note on Strauss’s model for clustering. Biometrika 63, 357–360. Kendall, D. G. [1974]. Foundations of a theory of random sets. In E. F. Harding and D. G. Kendall (Eds.), Stochastic Geometry. John Wiley & Sons. Kendall, W. S. [1997a]. On some weighted Boolean models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, Singapore, pp. 105–120. World Scientific. Also: University of Warwick Department of Statistics Research Report 295. Kendall, W. S. [1997b]. Perfect simulation for spatial point processes. In Bulletin ISI, 51st session proceedings, Istanbul (August 1997), Volume 3, pp. 163–166. Also: University of Warwick Department of Statistics Research Report 308.
Kendall, W. S. [1998]. Perfect simulation for the area-interaction point process. In L. Accardi and C. C. Heyde (Eds.), Probability Towards 2000, New York, pp. 218–234. SpringerVerlag. Also: University of Warwick Department of Statistics Research Report 292. Kendall, W. S. [2000, March]. Stationary countable dense random sets. Advances in Applied Probability 32(1), 86–100. Also University of Warwick Department of Statistics Research Report 341. Kendall, W. S. and J. Møller [2000, September]. Perfect simulation using dominating processes on ordered state spaces, with application to locally stable point processes. Advances in Applied Probability 32(3), 844–865. Also University of Warwick Department of Statistics Research Report 347. Kendall, W. S. and G. Montana [2002, May]. Small sets and Markov transition densities. Stochastic Processes
Home Page
Title Page
Contents
JJ
II
J
I
Page 157 of 163
Go Back
Full Screen
Close
Quit
and Their Applications 99(2), 177– 194. Also University of Warwick Department of Statistics Research Report 371. Kendall, W. S. and E. Th¨onnes [1999]. Perfect simulation in stochastic geometry. Pattern Recognition 32(9), 1569–1586. Also: University of Warwick Department of Statistics Research Report 323. Kendall, W. S., M. van Lieshout, and A. Baddeley [1999]. Quermassinteraction processes: conditions for stability. Advances in Applied Probability 31(2), 315–342. Also University of Warwick Department of Statistics Research Report 293. Kendall, W. S. and R. G. Wilson [2003, March]. Ising models and multiresolution quad-trees. Advances in Applied Probability 35(1), 96–122. Also University of Warwick Department of Statistics Research Report 402. Kingman, J. F. C. [1993]. Poisson processes, Volume 3 of Oxford Stud-
ies in Probability. New York: The Clarendon Press Oxford University Press. Oxford Science Publications. Kolmogorov, A. N. [1936]. Zur theorie der markoffschen ketten. Math. Annalen 112, 155–160. Kovalenko, I. [1997]. A proof of a conjecture of David Kendall on the shape of random polygons of large area. Kibernet. Sistem. Anal. (4), 3– 10, 187. English translation: Cybernet. Systems Anal. 33 461-467. Kovalenko, I. N. [1999]. A simplified proof of a conjecture of D. G. Kendall concerning shapes of random polygons. J. Appl. Math. Stochastic Anal. 12(4), 301–310. Kumar, V. S. A. and H. Ramesh [2001]. Coupling vs. conductance for the Jerrum-Sinclair chain. Random Structures and Algorithms 18(1), 1– 17. Kurtz, T. G. [1980]. The optional sampling theorem for martingales indexed by directed sets. The Annals of
Probability 8, 675–681. Home Page
Title Page
Contents
JJ
II
J
I
Page 158 of 163
Go Back
Full Screen
Close
Quit
Lee, A. B., D. Mumford, and J. Huang [2001]. Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model. International Journal of Computer Vision 41, 35–59. Li, W. and A. Bhalerao [2003]. Model based segmentation for retinal fundus images. In Proceedings of Scandinavian Conference on Image Analysis SCIA’03, G¨oteborg, Sweden, pp. to appear. Liang, F. and W. H. Wong [2000]. Evolutionary Monte Carlo sampling: Applications to Cp model sampling and change-point problem. Statistica Sinica 10, 317–342. Liang, F. and W. H. Wong [2001, June]. Real-parameter evolutionary Monte Carlo with applications to Bayesian mixture models. Journal of the American Statistical Association 96(454), 653–666. Lindvall, T. [1982]. On coupling of
continuous-time renewal processes. Journal of Applied Probability 19(1), 82–89. Lindvall, T. [2002]. Lectures on the coupling method. Mineola, NY: Dover Publications Inc. Corrected reprint of the 1992 original. Liu, J. S., F. Liang, and W. H. Wong [2001, June]. A theory for dynamic weighting in Monte Carlo computation. Journal of the American Statistical Association 96(454), 561–573. Loizeaux, M. A. and I. W. McKeague [2001]. Perfect sampling for posterior landmark distributions with an application to the detection of disease clusters. In Selected Proceedings of the Symposium on Inference for Stochastic Processes, Volume 37 of Lecture Notes - Monograph Series, pp. 321–331. Institute of Mathematical Statistics. Loynes, R. [1962]. The stability of a queue with non-independent interarrival and service times. Math. Proc. Camb. Phil. Soc. 58(3), 497–520.
Home Page
Title Page
Contents
JJ
II
J
I
Page 159 of 163
Go Back
Lyons, R. [2000]. Phase transitions on nonamenable graphs. J. Math. Phys. 41(3), 1099–1126. Probabilistic techniques in equilibrium and nonequilibrium statistical physics. Matheron, G. [1975]. Random Sets and Integral Geometry. Chichester: John Wiley & Sons. Meester, R. and R. Roy [1996]. Continuum percolation, Volume 119 of Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press. Metropolis, N., A. V. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller [1953]. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1092.
Full Screen
Meyn, S. P. and R. L. Tweedie [1993]. Markov Chains and Stochastic Stability. New York: Springer-Verlag.
Close
Miles, R. E. [1995]. A heuristic proof of a long-standing conjecture of D. G. Kendall concerning the shapes of
Quit
certain large random polygons. Advances in Applied Probability 27(2), 397–417. Minlos, R. A. [1999]. Introduction to Mathematical Statistical Physics, Volume 19 of University Lecture Series. American Mathematical Society. Mira, A., J. Møller, and G. O. Roberts [2001]. Perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 63(3), 593–606. See (Mira, Møller, and Roberts 2002) for corrigendum. Mira, A., J. Møller, and G. O. Roberts [2002]. Corrigendum: perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 64(3), 581. Molchanov, I. S. [1996a]. A limit theorem for scaled vacancies of the Boolean model. Stochastics and Stochastic Reports 58(1-2), 45–65. Molchanov, I. S. [1996b]. Statistics of the Boolean Models for Practition-
Home Page
Title Page
Contents
JJ
II
J
I
Page 160 of 163
Go Back
Full Screen
Close
Quit
ers and Mathematicians. Chichester: John Wiley & Sons.
of Statistics Theory and Applications 25, 483–502.
Møller, J. (Ed.) [2003]. Spatial Statistics and Computational Methods, Volume 173 of Lecture Notes in Statistics. Springer-Verlag.
Mykland, P., L. Tierney, and B. Yu [1995]. Regeneration in Markov chain samplers. Journal of the American Statistical Association 90(429), 233–241.
Møller, J. and G. Nicholls [1999]. Perfect simulation for sample-based inference. Technical Report R-99-2011, Department of Mathematical Sciences, Aalborg University. Møller, J. and R. Waagepetersen [2003]. Statistics inference and simulation for spatial point processes. Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC. Møller, J. and S. Zuyev [1996]. Gammatype results and other related properties of Poisson processes. Advances in Applied Probability 28(3), 662– 673. Murdoch, D. J. and P. J. Green [1998]. Exact sampling from a continuous state space. Scandinavian Journal
Newman, C. and C. Wu [1990]. Markov fields on branching planes. Probability Theory and Related Fields 85(4), 539–552. Nummelin, E. [1984]. General irreducible Markov chains and nonnegative operators. Cambridge: Cambridge University Press. O’Neill, P. D. [2001]. Perfect simulation for Reed-Frost epidemic models. Statistics and Computing 13, 37– 44. Parkhouse, J. G. and A. Kelly [1998]. The regular packing of fibres in three dimensions. Philosophical Transactions of the Royal Society, London, Series A 454(1975), 1889–1909. Peskun, P. H. [1973]. Optimum Monte
Home Page
Title Page
Contents
JJ
II
J
I
Page 161 of 163
Go Back
Full Screen
Close
Quit
Carlo sampling using Markov chains. Biometrika 60, 607–612. Preston, C. J. [1977]. Spatial birth-anddeath processes. Bull. Inst. Internat. Statist. 46, 371–391. Propp, J. G. and D. B. Wilson [1996]. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms 9, 223–252. Quine, M. P. and D. F. Watson [1984]. Radial generation of n-dimensional Poisson processes. Journal of Applied Probability 21(3), 548–557. Ripley, B. D. [1977]. Modelling spatial patterns (with discussion). Journal of the Royal Statistical Society (Series B: Methodological) 39, 172–212. Ripley, B. D. [1979]. Simulating spatial patterns: dependent samples from a multivariate density. Applied Statistics 28, 109–112. Ripley, B. D. and F. P. Kelly [1977]. Markov point processes. J. Lond. Math. Soc. 15, 188–192.
Robbins, H. E. [1944]. On the measure of a random set I. Annals of Mathematical Statistics 15, 70–74. Robbins, H. E. [1945]. On the measure of a random set II. Annals of Mathematical Statistics 16, 342–347. Roberts, G. O. and N. G. Polson [1994]. On the geometric convergence of the Gibbs sampler. Journal of the Royal Statistical Society (Series B: Methodological) 56(2), 377–384. Roberts, G. O. and J. S. Rosenthal [1999]. Convergence of slice sampler Markov chains. Journal of the Royal Statistical Society (Series B: Methodological) 61(3), 643–660. Roberts, G. O. and J. S. Rosenthal [2000]. Recent progress on computable bounds and the simple slice sampler. In Monte Carlo methods (Toronto, ON, 1998), pp. 123–130. Providence, RI: American Mathematical Society. Roberts, G. O. and R. L. Tweedie [1996a]. Exponential convergence of
Home Page
Title Page
Contents
JJ
II
J
I
Page 162 of 163
Go Back
Full Screen
Close
Quit
Langevin diffusions and their discrete approximations. Bernoulli 2, 341–364.
Verlag, Berlin). Strauss, D. J. [1975]. A model for clustering. Biometrika 63, 467–475.
Roberts, G. O. and R. L. Tweedie [1996b]. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83, 96–110.
Th¨onnes, E. [1999]. Perfect simulation of some point processes for the impatient user. Advances in Applied Probability 31(1), 69–87. Also University of Warwick Statistics Research Report 317.
Rosenthal, J. S. [2002]. Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability 7, no. 13, 123–128 (electronic).
Th¨onnes, E., A. Bhalerao, W. S. Kendall, and R. G. Wilson [2002]. A Bayesian approach to inferring vascular tree structure from 2d imagery. In W. J. Niessen and M. A. Viergever (Eds.), Proceedings IEEE ICIP-2002, Volume II, Rochester, NY, pp. 937–939. Also: University of Warwick Department of Statistics Research Report 391.
Series, C. and Y. Sina˘ı [1990]. Ising models on the Lobachevsky plane. Comm. Math. Phys. 128(1), 63–76. Spitzer, F. [1975]. Markov random fields on an infinite tree. The Annals of Probability 3(3), 387–398. Stoyan, D., W. S. Kendall, and J. Mecke [1995]. Stochastic geometry and its applications (Second ed.). Chichester: John Wiley & Sons. (First edition in 1987 joint with Akademie
Th¨onnes, E., A. Bhalerao, W. S. Kendall, and R. G. Wilson [2003]. Bayesian analysis of vascular structure. In LASR 2003, Volume to appear. Thorisson, H. [2000]. Coupling, stationarity, and regeneration. New York:
Springer-Verlag. Home Page
Title Page
Contents
JJ
II
J
I
Page 163 of 163
Go Back
Full Screen
Close
Quit
Tierney, L. [1998]. A note on MetropolisHastings kernels for general state spaces. The Annals of Applied Probability 8(1), 1–9. van Lieshout, M. N. M. [2000]. Markov point processes and their applications. Imperial College Press. van Lieshout, M. N. M. and A. J. Baddeley [2001, October]. Extrapolating and interpolating spatial patterns. CWI Report PNA-R0117, CWI, Amsterdam. Widom, B. and J. S. Rowlinson [1970]. New model for the study of liquidvapor phase transitions. Journal of Chemical Physics 52, 1670–1684. Wilson, D. B. [2000a]. How to couple from the past using a readonce source of randomness. Random Structures and Algorithms 16(1), 85–113. Wilson, D. B. [2000b]. Layered Multishift Coupling for use in Perfect Sampling Algorithms (with a primer
on CFTP). In N. Madras (Ed.), Monte Carlo Methods, Volume 26 of Fields Institute Communications, pp. 143–179. American Mathematical Society. Wilson, R., A. Calway, and E. R. S. Pearson [March 1992]. A Generalised Wavelet Transform for Fourier Analysis: The Multiresolution Fourier Transform and Its Application to Image and Audio Signal Analysis. IEEE Transactions on Information Theory 38(2), 674–690. Wilson, R. G. and C. T. Li [2002]. A class of discrete multiresolution random fields and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 42–56. Zuyev, S. [1999]. Stopping sets: gammatype results and hitting properties. Advances in Applied Probability 31(2), 355–366.