Tree-valued Markov limit dynamics

Habilitationsschrift

Anita Winter
Mathematisches Institut
Universität Erlangen–Nürnberg
Bismarckstraße 1 1/2
91054 Erlangen
GERMANY
[email protected]

Contents

Many many thanks ...  5

Introduction  9
  Part I: Real trees and metric measure spaces  10
  Part II: Examples of prominent real trees and mm-spaces  13
  Part III: Tree-valued Markov dynamics  15
  Notes  20

Chapter 1. State spaces I: R-trees  21
  1.1. The Gromov-strong topology  22
  1.2. A complete metric: The Gromov-Hausdorff metric  22
  1.3. Gromov-Hausdorff and the Gromov-strong topology coincide  24
  1.4. Compact sets in Xc  25
  1.5. 0-hyperbolic spaces and R-trees  25
  1.6. R-trees with 4 leaves  28
  1.7. Length measure  29
  1.8. Rooted R-trees  31
  1.9. Rooted subtrees and trimming  34
  1.10. Compact sets in T  40
  1.11. Weighted R-trees  41
  1.12. Distributions of random (weighted) real trees  47

Chapter 2. State spaces II: The space of metric measure trees  49
  2.1. The Gromov-weak topology  50
  2.2. A complete metric: The Gromov-Prohorov metric  53
  2.3. Distance distribution and modulus of mass distribution  59
  2.4. Compact sets in M  64
  2.5. Gromov-Prohorov and Gromov-weak topology coincide  69
  2.6. Ultra-metric measure spaces  70
  2.7. Compact metric measure spaces  71
  2.8. Distributions of random metric measure spaces  73
  2.9. Equivalent metrics  75

Chapter 3. Examples of limit trees I: Branching trees  83
  3.1. Excursions  83
  3.2. The Brownian continuum random tree  87
  3.3. Aldous’s line-breaking representation of the Brownian CRT  89
  3.4. Campbell measure facts: Functionals of the Brownian CRT  91
  3.5. Existence of the reactant branching trees  99
  3.6. Random evolutions: Proof of Theorem 3.5.3  103

Chapter 4. Examples of limit trees II: Coalescent trees  111
  4.1. Λ-coalescent measure trees  112
  4.2. Spatially structured Λ-coalescent trees  115
  4.3. Scaling limit of spatial Λ-coalescent trees on Z^d, d ≥ 3  120
  4.4. Scaling limit of spatial Kingman coalescent trees on Z^2  125

Chapter 5. Root growth and Regrafting  131
  5.1. A deterministic construction  134
  5.2. Introducing randomness  138
  5.3. Connection to Aldous’s line-breaking construction  141
  5.4. Recurrence, stationarity and ergodicity  145
  5.5. Feller property  147
  5.6. Asymptotics of the Aldous-Broder algorithm  156
  5.7. An application: The Rayleigh process  158

Chapter 6. Subtree Prune and Regraft  163
  6.1. A symmetric jump measure on (Twt, dGHwt)  164
  6.2. Dirichlet forms  167
  6.3. An associated Markov process  169
  6.4. The trivial tree is essentially polar  172

Chapter 7. Tree-valued Fleming-Viot dynamics  181
  7.1. The tree-valued Fleming-Viot martingale problem  181
  7.2. Duality: A unique solution  186
  7.3. Approximating tree-valued Moran dynamics  189
  7.4. Compact containment: Limit dynamics exist  191
  7.5. Limit dynamics are tree-valued Fleming-Viot dynamics  193
  7.6. Limit dynamics yield continuous paths  200
  7.7. Proof of the main results (Theorems 7.1.6 and 7.3.1)  201
  7.8. Long-term behavior  202
  7.9. The measure-valued Fleming-Viot process as a functional  203
  7.10. More general resampling mechanisms and extensions  206
  7.11. Application: Sample tree lengths distributions  208

Index  219

Bibliography  221

Many many thanks ...

Es fehlte mir weniger am Zutrauen zu promovieren. Das wäre der zweite Schritt vor dem ersten gewesen. [It was not so much the confidence to do a doctorate that I lacked; that would have been taking the second step before the first.]
Christiane Leidinger(1)

Submitting my habilitation thesis is a good occasion to review the long and often difficult way I have covered and, of course, to thank all of the people who gave advice and supported me on the way to the Habilitation degree. Being aware of my social-cultural and educational background, I go beyond the common focus at the end of a habilitation project on the years after the PhD and want to stress and acknowledge that it took the present thesis each day of the last 35 years to finally become real.

Thanks and love to my mother Claudia Winter, who has supported my craving for studying from my earliest days, for example, by finding me popular scientific books in mathematics and physics of which she had only guessed how much I appreciated them, and for finding and defending a space of my own in our much too small Prenzlauer Berg apartment. And to my sisters Jeanette and Simone Winter for relieving my feelings of becoming a stranger to my family. I'm also very grateful to my father's middle school teacher and later friend of the family Brunhilde Lorff, who guided me in feminism more than 30 years ago.

I want to thank my teachers Elke Elsing, Frau Holländer, Frau Matthes and Frau Steinbrecher at the Anne and Anton Saefkow School, who supported me in many aspects and who for the first time related me to the idea of studying at a university - and even abroad - at the age of only 13. By that time I had just joined a circle in the Mathematische Schülergesellschaft (MSG) Leonhard Euler at the Humboldt University instructed by Ingmar Lehman, who introduced me to the academic way of approaching mathematical problems and who believed in my abilities to an extent that I could not ignore them anymore. Thanks to him and to all the people involved in the organization of the yearly 10-day MSG summer camps, which I happily remember as very challenging. And also to Günter Last, who taught me probability during my last year of high school and who advised my first scientific thesis, written as part of the requirements of my Abitur.

(1) KlasseN Dissertation Über Arbeitertöchter und das Promovieren aus der Bildungsferne, [Lei06]


After I entered university - despite an optimum preparation - I lost self-confidence with each day I was sitting in class, was too intimidated to raise a single question, and in the end was often too confused to be able to summarize what I had just learned. The strong feeling of alienation often kept me from studying with my fellow students. It was only after 3 years of chosen isolation that I attended a seminar with Andreas Greven on Interacting Particle Systems, which all of a sudden excited and inspired me in many ways. I want to thank Uta Freiberg, who also attended the seminar, for letting me re-enjoy talking about mathematics.

My great gratitude goes to Andreas Greven, who later supervised my Diploma and PhD theses and with whom I have continued to work in different projects since. He introduced me to the models arising in population genetics on which I seem to have built a career now, he encouraged and supported me in traveling to international conferences, but in the first place he advised and emotionally supported me over all the years. I also felt emotionally supported by my former colleague and office mate Achim Klenke, whom I would like to thank particularly for sharing personal experiences on many occasions during my Diploma and PhD theses.

Since I started my PhD I have traveled to many places. The incomplete list of people whom I want to thank for invitations to research stays, meetings and seminars includes Siva Athreya, Ellen Baake, Don Dawson, Frank den Hollander, Allison Etheridge, Steve Evans, Nina Gantert, Andreas Greven, Achim Klenke, Vlada Limic, Terry Lyons, Ed Perkins, Theo Sturm, Silke Rolles, Jan Swart, Alain-Sol Sznitman, Anton Wakolbinger, Ruth Williams, ...

A huge impact definitely came from my first international workshop, on Stochastic Partial Differential Equations, and the attached summer school at the University of British Columbia in Vancouver in 1997, at which for the first time I met several of the people who later became my co-authors. This journey, as well as a research stay one and a half years later at the Fields Institute in Toronto, was made possible by a grant of Don Dawson, who - along with Anton Wakolbinger - later refereed my PhD thesis and has since been following my track. I would like to thank both, as well as Steve Evans and Andreas Greven, for writing letters of recommendation, often on very short notice. I would also like to acknowledge the additional financial support from the Edith and Otto Haupt Foundation and the unconventional help of Wolfgang Schmidt, which made the trip to Vancouver possible.

It was one of my stays at the Fields Institute in Toronto from which I brought a poster with the striking message "Gays and lesbians are our teachers, students, parents, doctors, ..." which, once hung in the Mathematical Institute in Erlangen, created heated disputes. I would like to thank everybody who was supportive during that time, which was difficult for me, in particular Tanja Dierkes, Andreas Greven, Andreas Knauf, Peter Pfaffelhuber and Iljana Zähle, who took a firm stand in the discussion. This is a welcome opportunity to also thank all my gay and lesbian (to (may)be) colleagues


at the department for coming out to me, whether or not they would like to see their names written here.

The academic year 2002/03 I spent with a DFG research fellowship at the University of California in Berkeley, where I worked with Steve Evans and Jim Pitman. I would like to thank them and all the people with whom I interacted during that year. I am particularly grateful to Steve, who introduced me to the central theme of this thesis, which is the space of real trees and the Gromov-Hausdorff distance, and from whom I learned a lot about writing (and finishing) a paper. Working and specifically writing with him is a great pleasure to me.

In the summer semester 2004 I did my first academic outing in the humanities by attending a very inspiring reading seminar on "Written Identities" instructed by Doris Feldmann at the English Department at Erlangen University. As a consequence, in February 2005 - although still troubled with relating the mathematician that I am with lesbian-feminist research - I took the courage to join the lfq network "Netzwerk lesbisch-feministisch-queerer Forschung" initiated a year before by Christiane Leidinger.

Coming back from Berkeley to Erlangen I also started to take Hebrew lessons, a bold venture in a region which is home to not many native speakers. However, there is a class at the Bildungszentrum Nürnberg which had been taught for more than 10 years by Ganja Benari. Although, by the time I joined, the class was far beyond my knowledge of Hebrew, I found in Ganja and in all the women taking part in the course excellent teachers. So many thanks to Batja, Dorothee, Edith, Ganja, Hiltrud, Margot, Renate, Rosemarie, ...

The present thesis was written during a research stay at the Technion in Haifa funded by the Aly Kaufman Foundation. I am very grateful to Leonid Mytnik for inviting and encouraging me to come and to all the local probability people for their great hospitality. Thanks also to Chen Weider, who rented to me his wonderful sea view apartment, which has been my home for the last months and in which I enjoyed writing huge parts of this thesis.

I particularly acknowledge my co-authors whose work with me appears in this thesis: Steven N. Evans, Andreas Greven, Vlada Limic, Peter Pfaffelhuber, Jim Pitman and Lea Popovic, as well as all my other collaborators Siva Athreya, Michael Eckhoff, Janos Engländer, Leonid Mytnik, Anja Sturm, Rongfeng Sun and Iljana Zähle. I am also thankful to all the anonymous referees for the thorough reading of the papers. The revisions based on their reports often improved the presentation. Further thanks to Michael Eckhoff, Grit Paechnatz, Peter Pfaffelhuber, Ulrike Tisch and Iljana Zähle, who kindly proof-read various parts of the manuscript.

Special thanks go to my house mates and friends Heike Herzog, Grit Paechnatz and Kathrin Schmidt, who have been with me through all the ups and downs in the last years. And to the physiotherapists Perla Ben Simon, Silke Kruse and Dorit Thümer, who professionally worked with me. I also extend thanks for support that runs far beyond the bounds of collegiality to


Lisa Beck, Nina Gantert, Vlada Limic, Lea Popovic and Iljana Zähle. And to Peter Pfaffelhuber, from whom I learned how to collaborate with people who may have different scholarly interests from my own.

An incomplete list of other colleagues and friends whom I would like to thank for advice and support includes David Aldous, Nihat Ay, Michaela Baetz, Nadja Bennewitz, Dieter Binz, Juditha Cofman, Claudia Dempel, Gabriele Dennert, Axel Ebinger, Silvia Eichner, Oye Felde, Simone Fischer, Orit Furman, Walter Hofmann, Tobias Jäger, Gerhard Keller, Edith Kellinghusen, Julia Kempe, Manfred Kronz, Christiane Leidinger, Alexander Lepke, Heike Lepke, Anna Levit, Wolfgang Löhr, Oded Regev, Rosi Ringer, Gerhard Scheibel, Frank Schiller, Johanna Schmidt, Sarah Schmiedel, Christoph Schumacher, J(.) Seipel, Thomas Springer, Ljiljana Stamenkovic, Andrea Stroux, Sreekar Vadlamani, Stefanie Weigel, Silvia Wendler, Yael Zbar, Helga Zech, ...

Anita Winter
Haifa, June 2007

Introduction

In the present thesis we study random trees and tree-valued Markov dynamics which arise in the limit of discrete trees and discrete tree-valued Markov chains, respectively, after a suitable rescaling, as the number of vertices tends to ∞.

Random trees appear frequently in the mathematical literature. Prominent examples are random binary search trees as a special case of random recursive trees ([DH05]), ultra-metric structures in spin-glasses (see, for example, [BK06, MPV87]), spanning trees (see, for example, [AS92, KL96, PW98, BGL98]), etc. In branching models trees arise, for example, as the Kallenberg tree and the Yule tree in the (sub-)critical or supercritical, respectively, Galton-Watson process which is conditioned on "survival" ([Kal77, EO94]). A huge enterprise in biology is phylogenetic analysis, which reconstructs the "family trees" of a collection of taxa. An introduction to mathematical aspects of the subject is given in [SS03]. Due to the enormous diversity of life, phylogenetics often leads to the consideration of very large trees and therefore calls for an investigation of limits of finite trees. For example, by taking continuum (mass) limits we have for branching models the Brownian continuum random tree (Brownian CRT) or the Brownian snake ([Ald91b, Ald93, LG99a]). More general branching mechanisms lead to more general genealogies, such as Lévy trees, the infinite-variance offspring distribution counterpart of the Brownian CRT ([DLG02]), the Poisson snake ([AS02]) or the reactant trees arising in catalytic branching systems ([GPW06b]), to name just a few. In population models with a fixed population size (or total mass), the genealogical trees can be generated by coalescent processes, for example, the Kingman coalescent tree ([Kin82a, Ald93, Eva00a]) or the Λ-coalescent measure trees ([GPW06a]).

Many results towards convergence of finite trees toward a limit tree have been obtained by considering the asymptotic behavior of functionals of ensembles of random trees such as their height, total number of vertices, averaged branching degree, etc. (see, for example, [CP00, Win02, PR04, CKMR05]). With a series of papers [Ald91a, Ald91b, Ald93] (see also [LG99a, Pit02]) Aldous suggested a much stronger notion of convergence of random trees. The main difficulty Aldous had to overcome was to map the sequence


of trees into a space of all the "tree-like" objects which may arise in the limit as the number of vertices tends to infinity.

First, following a long tradition, he relied on the connection between trees and continuous paths. Encoding trees by continuous functions allows one to think of weak convergence of random trees as weak convergence of continuous functions with respect to the uniform topology on compacta. In particular examples this approach may have at least two drawbacks. Although there is a classical bijection between rooted planar trees and lattice paths (see, for example, [DM00]), there seems to be no obvious way to uniquely associate a continuous path to a limit tree. Moreover, the uniform topology is a rather strong topology.

Secondly, Aldous noticed that a finite leaf-labeled tree with edge lengths is isomorphic to a compact subset of ℓ1, i.e., the space of non-negative summable sequences, equipped with the Hausdorff topology. With this encoding weak convergence of random trees translates into weak convergence of the associated closed subsets of ℓ1, where the space of closed subsets of the metric space ℓ1 is equipped, as usual, with the Hausdorff topology.

Aldous's approach is extremely powerful. In particular, it allowed him to show that a suitably rescaled family of Galton-Watson trees, conditioned to have total population size n, converges as n → ∞ to the Brownian continuum random tree, which can be thought of as the tree inside a standard Brownian excursion. More recently his approach was applied in [HMPW] to identify the self-similar fragmentation tree as the scaling limit of discrete fragmentation trees and, in doing so, to confirm in a strong way that the whole trees grow at the same speed as the mean height of a randomly chosen leaf.

Both approaches, via continuous paths and via compact subsets of ℓ1, are designed for leaf-labeled trees. If one wants to rescale unlabeled trees one may be tempted to invent a labelling. Since the choice of a labelling playing the role of "coordinates" is arbitrary, it may not always be handy to work in this setting. One should rather be more consistent and follow an intrinsic - that is, "coordinate-free" - path.

Before we motivate this, we note that there is quite a large literature on other approaches to "geometrizing" and "coordinatizing" spaces of trees. The first construction of codes for labeled trees without edge-length goes back to 1918: Prüfer ([Prü18]) sets up a bijection between labeled trees of size n and the points of {1, 2, ..., n}^{n−2}. Phylogenetic trees are identified with points in matching polytopes in [DH98], and [BHV01a] equips the space of finite phylogenetic trees with a fixed number of leaves with a metric that makes it a cell-complex with non-positive curvature.

The thesis is split into three parts.

Part I: Real trees and metric measure spaces

In the first part of the present thesis we develop systematically the topological properties of possible state spaces and characterize the corresponding convergence in distribution.


In Chapter 1 we follow the path of the so-called T-theory (see [Dre84, DMT96, Ter97]) to extend the definition of a tree with edge lengths by allowing behavior such as infinite total edge length and vertices with infinite branching degree. T-theory takes finite trees to be just metric spaces with certain characteristic properties and then defines a more general class of tree-like metric spaces called real trees or R-trees. We note that one of the primary impetuses for the development of T-theory was to provide mathematical tools for concrete problems in the reconstruction of phylogenies. We also note that R-trees have been objects of intensive study in geometric group theory (see, for example, the surveys [Sha87, Mor92, Sha91, Bes02] and the recent book [Chi01]).

Once we have an extended notion of trees as just particular abstract metric spaces (or, more correctly, isometry classes of metric spaces), we need a means of convergence. We will follow one aspect of Aldous's philosophy: embed a sequence of trees isometrically in one and the same metric space and then decide convergence based on whether or not the sequence of isometric copies converges. However, rather than using Aldous's particular embedding into the space of compact subsets of ℓ1, we say that the sequence converges if and only if there is some common compact metric space in which the sequence can be embedded isometrically such that the sequence of isometric copies converges. We refer to this topology as the Gromov-strong topology and show that a metric on the space of compact metric spaces generating the Gromov-strong topology is provided by the well-studied Gromov-Hausdorff metric (compare, for example, [BBI01] and references therein). The Gromov-Hausdorff metric originated in geometry as a means of making sense of intuitive notions such as the convergence to Euclidean space of a re-scaled integer lattice as the grid size approaches zero. We remark in passing that the papers [Pau88, Pau89] are an application of the Gromov-Hausdorff metric to the study of R-trees that is quite different from what we present here.

In some applications one is mainly interested in rooted real trees, that is, real trees (X, r) with a distinguished point ρ ∈ X that we may think of as a common ancestor. We therefore also introduce the space of rooted compact real trees and equip it with the rooted Gromov-Hausdorff metric, sometimes also referred to as the pointed Gromov-Hausdorff metric. This distance was introduced in geometry as a means of making sense of the convergence to Euclidean space of a sphere when viewed from a fixed point (for example, the North Pole) as the radius approaches infinity.

An important preliminary step is to show that it is possible to equip the space of pairs of compact real trees and their accompanying weights with the weighted Gromov-Hausdorff metric. We note that a Gromov-Hausdorff like metric on more general metric spaces equipped with measures was introduced in [Stu06]. The latter metric is based on the Wasserstein-L2 metric between measures, whereas ours is based on the Prohorov metric.


Aldous's philosophy has a second aspect, which we exploit in Chapter 2. For a motivation one needs to have a more detailed look at the proof of his invariance principle for branching trees. In order to define convergence, Aldous codes trees not only as separable and complete metric spaces - satisfying some special properties of the metric characterizing them as trees - which are embedded into ℓ1^+, but in addition equips them with a probability measure. The idea of convergence in distribution of a "consistent" family of finite random trees then follows Kolmogorov's theorem, which characterizes convergence of R-indexed stochastic processes with regular paths. That is, a sequence has a unique limit provided a tightness condition holds on path space and the "finite-dimensional distributions" converge. The analogs of finite-dimensional distributions are "subtrees spanned by finitely many randomly chosen leaves", and Aldous's notion of convergence has been successful not only in showing convergence of branching trees but also in constructing limit coalescent trees which cannot be represented by continuous functions.

To follow Aldous's approach without using his particular embedding, we endow the space of separable and complete real trees carrying a probability measure with the following topology: a sequence of trees (equipped with a probability measure) converges to a limit tree (equipped with a probability measure) if and only if all randomly sampled finite subtrees converge to the corresponding limit subtrees. The resulting topology is referred to as the Gromov-weak topology. Since the construction of the topology works not only for tree-like metric spaces, but also for the space (of measure preserving isometry classes) of metric measure spaces, we formulate everything within this framework. We will see that the Gromov-weak topology on the space of metric measure spaces is Polish. In fact, we metrize the space of metric measure spaces equipped with the Gromov-weak topology by the Gromov-Prohorov metric, which combines the two concepts of metrizing the space of metric spaces and the space of probability measures on a given metric space in a straightforward way. Moreover, we present a number of equivalent metrics which might be useful in different contexts. This then allows us to discuss convergence of random variables taking values in that space. We characterize compact sets and tightness via quantities which are reasonably easy to compute.

The most important ideas on metric measure spaces are contained in Gromov's book, Chapter 3½ of [Gro99]. Several of the results presented here are stated in [Gro99] in a different set-up. While Gromov focuses on geometric aspects, we provide the tools necessary to do probability theory on the space of metric measure spaces. Further related topologies on particular subspaces of isometry classes of complete and separable metric spaces have already been considered in [Stu06] and [EW06] (where the weighted Gromov-Hausdorff metric discussed in Chapter 1 was introduced). Convergence in one of these two


topologies implies convergence in the Gromov-weak topology but not vice versa.

Part II: Examples of prominent real trees and mm-spaces

In the second part of the thesis we illustrate the theory with examples of limit trees arising in the theory of branching and coalescing.

In Chapter 3 we reconsider branching trees as one source of random tree models. Following the tradition of coding branching trees by excursions, we first recall the connection between trees and continuous paths. We then define Aldous's Brownian continuum random tree as the tree associated with the path of a Brownian excursion. We will illustrate how this allows one to calculate certain functionals of the Brownian continuum random tree explicitly via Itô's excursion measure.

An explicit representation of the Brownian continuum random tree is given by Aldous's line-breaking construction ([Ald91a, Ald93]). This technique allows for studying explicit distributions. Since we are going to apply this construction later in Chapter 5, in this chapter we recall Aldous's results and reformulate them in terms of real trees and metric measure trees to illustrate the theory developed so far. Note that line-breaking constructions for generalizations of the Brownian continuum random tree can be found in [AP06, HMPW].

It is shown in [Ald91a, Ald91b, Ald93] (see also [LG99a, Pit02]) that a suitably rescaled family of Galton-Watson trees, conditioned to have total population size n, converges as n → ∞ to the Brownian continuum random tree, which can therefore be considered as the "family tree" of Feller's branching diffusion. For branching models with interaction between the branching "individuals", excursions coding the genealogy have been investigated only recently. Following [GPW06b], in this chapter we describe the genealogy of catalytic branching particle models and their diffusion limits by a consistent family of contour processes.

In Chapter 4 we discuss coalescent processes as another source of random tree models (see the recent review [Ald99] for references). The Kingman coalescent was introduced in [Kin82a] as a model for the genealogies of a neutral population model. Spatially structured Kingman coalescents with locally infinitely many particles were first introduced in [DEF+99, GLW05]. They appear as duals of interacting systems such as the voter model, the stepping stone model and interacting Fleming-Viot processes. The Kingman coalescent is a prominent representative of the class of Λ-coalescents, which were introduced in [Pit99] (see also [Sag99]) and allow for possibly multiple collisions reflecting an infinite variance of the population's offspring distribution. Such processes have since been the subject of much applied and theoretical work (see, for example, [MS01], [BG05], [BBC+05], [BBS06]). The spatially structured Λ-coalescent was introduced in [LS06].


In [Eva00b, DEF+99] it was shown how the Kingman coalescent and a system of coalescing Brownian motions on the circle are naturally associated with a random compact metric space. In this chapter we will illustrate the theory developed in Chapter 2 to decide for which Λ-coalescents the genealogies are described by a metric measure space. We will also show that the suitably rescaled spatially structured coalescent trees on Z^d, d ≥ 2, converge towards the Kingman measure tree. For that we rely on estimates obtained in [GLW05, LS06, GLW07].

Note that not only do the trees themselves get large; the number of possible phylogenetic trees also grows rapidly with an increasing number of taxa. For example, the number of trees with n labeled leaves is

(0.1)

(2n − 3)!! = (2n − 3) · (2n − 5) · ... · 3 · 1

(compare, for example, Chapter 3 of [Fel03]). In particular, if n = 100 then the number of possible trees is ∼ 3.5 · 10^184. If we try to use statistical methods to find the "best" tree to fit a given set of data, then rather than attempting the impossible, namely an exhaustive search through the enormously large space of all possible trees, one is definitely more successful performing a random search. Markov chains that move through a space of finite trees are an important ingredient in Markov chain Monte Carlo algorithms for simulating distributions on spaces of trees in Bayesian tree reconstruction and in simulated annealing algorithms in maximum likelihood and maximum parsimony(2) tree reconstruction (see, for example, [Fel03] for a comprehensive overview of the field).

Usually, such chains are based on a set of simple rearrangements that transform a tree into a "neighboring" tree. One widely used set of moves is the nearest neighbor interchanges (NNI) (see, for example, [Fel03, BRST02, BHV01b, AS01]). Two other standard sets of moves that are implemented in several phylogenetic software packages but seem to have received less theoretical attention are the subtree prune and re-graft (SPR) moves and the tree bisection and re-connection (TBR) moves, which were first described in [SO90] and are further discussed in [Fel03, AS01, SS03]. We note that an NNI move is a particular type of SPR move and that an SPR move is a particular type of TBR move, and, moreover, that every TBR operation is either a single SPR move or the composition of two such moves (see, for example, Section 2.6 of [SS03]). Chains based on other moves are investigated in [DH02, Ald00, Sch02].

(2) Maximum parsimony tree reconstruction is based on finding the phylogenetic tree and inferred ancestral states that minimize the total number of obligatory inferred substitution events on the edges of the tree.
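As a quick sanity check on the count quoted above for n = 100, the double factorial in (0.1) can be evaluated directly. The following short Python sketch (the helper name num_trees is ours, not part of the thesis) confirms that (2 · 100 − 3)!! has 185 decimal digits, i.e. is of the order 10^184 quoted above.

```python
from math import prod

def num_trees(n):
    """(2n - 3)!! = (2n - 3)(2n - 5) ... 3 * 1, the count in (0.1)."""
    return prod(range(2 * n - 3, 0, -2))

count = num_trees(100)
print(len(str(count)))   # 185 digits, so the count is of order 10^184
```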


Once more, because of the exponential growth of the state space with an increasing number of vertices, discrete tree-valued Markov chains are - even though they are easy to construct by standard theory - hard to analyze with respect to their qualitative properties. It therefore seems reasonable to pass to a continuum limit and to construct certain limit dynamics and study them with methods from stochastic analysis.

Markov dynamics with values in the space of "infinite" or continuum trees have been constructed only recently. These include excursion-path-valued Markov processes with continuous sample paths - which can therefore be thought of as tree-valued diffusions - as investigated in [Zam03, Zam02, Zam01], and dynamics working with real trees, for example, the so-called root growth with re-grafting (RGRG) ([EPW06]), the wild chain ([AE99]), the so-called subtree prune and re-graft move (SPR) ([EW06]), the limit random mapping ([EL07]) and the tree-valued Fleming-Viot dynamics. While the RGRG dynamics have a projective property allowing for an explicit construction of the Feller semigroup as the limit semigroup of "finite" tree-valued dynamics arising in an algorithm for constructing uniform spanning trees, the SPR dynamics and the limit random mapping were constructed as candidates for the limit of "finite" tree-valued dynamics using Dirichlet forms. Unfortunately, Dirichlet forms are often inadequate for proving convergence theorems, as opposed to, for example, generator or martingale problem characterizations of Markov processes. The first example of tree-valued Markov dynamics constructed as the solution of a well-posed martingale problem was given with the tree-valued Fleming-Viot dynamics in [GPW07].

Part III: Tree-valued Markov dynamics

In the third part of the present thesis we use different techniques from Markov process theory to construct three of the above tree-valued limit dynamics.

Tree-valued Markov processes also appear in contexts other than phylogenetic analysis. For example, a number of these processes appear in combinatorics in connection with spanning trees. One such process is the Root growth with regrafting dynamics, which we construct in Chapter 5 as the limit dynamics of the Aldous-Broder algorithm (see [Bro89, Ald90] and Figure 0.1 for an illustration). The Aldous-Broder algorithm is a Markov chain on the space of rooted combinatorial trees with N vertices that has the uniform tree as its stationary distribution. We construct and study a Markov process on the space of all rooted compact real trees that has the Brownian continuum random tree as its stationary distribution and arises as the scaling limit as N → ∞. The resulting process evolves via alternating deterministic root growth and random jumps due to re-grafting and is an example of a piecewise-deterministic Markov process. A general framework for such processes was introduced in [Dav84] as an abstraction of numerous examples in queueing and control theory, and this line of research was extensively developed in the subsequent monograph [Dav93].



Figure 0.1. Pruning off the subtree S of the tree T and regrafting it at the root ρ (figure courtesy of Steve Evans)

A more general formulation in terms of martingales and additive functionals can be found in [JS96]. Some other appearances of such processes are [EP98, CDP01, DC99, Cai93, Cos90].

The crucial feature of the root growth with re-grafting dynamics is that they have a simple projective structure: If one follows the evolution of the points in a rooted subtree of the initial tree along with that of the points added at later times due to root growth, then these points together form a rooted subtree at each period in time and this subtree evolves autonomously according to the root growth with re-grafting dynamics. The presence of this projective structure suggests that one can make sense of the notion of running the root growth with re-grafting dynamics starting from an initial "tree" that has exotic behavior such as infinitely many leaves, points with infinite branching, and infinite total edge length - provided that this "tree" can be written as the increasing limit of a sequence of finite trees in some appropriate sense. One of our main objectives is to give rigorous statements and proofs of these and related facts.

Once the extended process has been constructed, we gain a new perspective on objects such as standard Brownian excursion and the associated random triangulation of the circle (see [Ald94a, Ald94b, Ald00]). For example, suppose we follow the height (that is, distance from the root) of some point in the initial tree. It is clear that this height evolves autonomously as a one-dimensional piecewise-deterministic Markov process (a simulation sketch is given after the following list) that:
• increases linearly at unit speed (due to growth at the root),
• makes jumps at rate x when it is in state x (due to cut points falling on the path that connects the root to the point we are following),
• jumps from state x to a point that is uniformly distributed on [0, x] (due to re-grafting at the root).
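Here is a minimal simulation sketch of this height process (the function name simulate_height and the unit-time sampling scheme are ours, not part of the thesis). Between jumps the height grows at unit speed, so the waiting time s to the next jump solves xs + s²/2 = E with E standard exponential; the empirical tail is then compared with the Rayleigh tail e^{−x²/2} that appears in (0.2) below.

```python
import math
import random

def simulate_height(T=10000.0, seed=0):
    """Simulate the piecewise-deterministic Markov process described above:
    unit-speed linear growth, jumps at rate x when in state x, and a jump
    lands uniformly on [0, current height].  Returns samples taken at unit
    time intervals (a sketch, not an exact analysis)."""
    rng = random.Random(seed)
    t, x = 0.0, 0.0
    samples, next_sample = [], 1.0
    while t < T:
        # cumulative jump hazard over [0, s] is x*s + s^2/2; invert against Exp(1)
        e = rng.expovariate(1.0)
        s = math.sqrt(x * x + 2.0 * e) - x
        while next_sample <= t + s and next_sample <= T:
            samples.append(x + (next_sample - t))   # height grew linearly so far
            next_sample += 1.0
        t += s
        x = rng.uniform(0.0, x + s)                 # re-grafting jump
    return samples

samples = simulate_height()
x0 = 1.0
empirical_tail = sum(1 for v in samples if v > x0) / len(samples)
print(empirical_tail, math.exp(-x0 ** 2 / 2))       # both close to 0.61
```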


We call such a process a Rayleigh process because, as we will show in Section 5.7, this process converges to the standard Rayleigh stationary distribution R on R+ given by

(0.2)    R(]x, ∞[) = e^{−x²/2},    x ≥ 0,

(thus R is also the distribution of the Euclidean length of a two-dimensional standard Gaussian random vector or, up to a scaling constant, the distribution of the distance to the closest point to the origin in a standard planar Poisson process). Now, if B^ex := {B^ex_u; u ∈ [0, 1]} is standard Brownian excursion and U is an independent uniform random variable on [0, 1], then there is a valid sense in which 2B^ex_U has the law of the height of a randomly sampled leaf of the Brownian CRT, and this accords with the well-known result

(0.3)    P{2B^ex_U ∈ dx} = R(dx).

As we noted earlier, Markov chains that move through a space of finite trees are an important ingredient for several algorithms in phylogenetic analysis. In Chapter 6 we investigate with the Subtree Prune and Regraft (SPR) the asymptotics of one of the standard sets of moves that are implemented in several phylogenetic software packages, first described in [SO90] and further discussed in [Fel03, AS01, SS03]. In an SPR move, a binary tree T (that is, a tree in which all non-leaf vertices have degree three) is cut "in the middle of an edge" to give two subtrees, say T' and T''. Another edge is chosen in T', a new vertex is created "in the middle" of that edge, and the cut edge in T'' is attached to this new vertex. Lastly, the "pendant" cut edge in T' is removed along with the vertex it was attached to in order to produce a new binary tree that has the same number of vertices as T (see Figure 0.2 for an illustration). As remarked in [AS01], the SPR operation is of particular interest as it can be used to model biological processes such as horizontal gene transfer(3) and recombination. Section 2.7 of [SS03] provides more background on this point as well as a comment on the role of SPR moves in the two phenomena of lineage sorting and gene duplication and loss.

The main emphasis is to construct a candidate for the limit dynamics as the number of vertices tends to infinity. We do not, in fact, prove that the suitably rescaled Markov chain with SPR moves converges to the process we construct. Rather, we use Dirichlet form techniques to establish the existence of a process that has the dynamics one would expect from such a limit.

(3) Horizontal gene transfer is the transfer of genetic material from one species to another. It is a particularly common phenomenon among bacteria.



Figure 0.2. An SPR move. The dashed subtree attached to vertex x in the top tree is re-attached at a new vertex y that is inserted into the edge (b, c) in the bottom tree to make two edges (b, y) and (y, c). The two edges (a, x) and (b, x) in the top tree are merged into a single edge (a, b) in the bottom tree.

Unfortunately, although Dirichlet form techniques provide powerful tools for constructing and analyzing symmetric Markov processes, they are notoriously inadequate for proving convergence theorems.

In Chapter 7 we construct and study the evolution of the genealogical structure for two related classes of neutral multi-type population models, which are called the Moran model and the Fleming-Viot process. In both models individuals have a genetic type, the population size is constant, and the genetic decomposition changes due to random dynamics called resampling.

The Moran model (MM) can be described as follows: consider a population of finitely many individuals which carry genetic types. Each pair resamples at constant rate. Resampling of a pair means that one individual dies and the other one reproduces (see Figure 0.3 for an illustration). For many purposes sufficient information about the population is contained in the probability measure on type space given by the empirical measure of the current population. The Fleming-Viot process (FV) is the measure-valued diffusion which arises in the limit of large populations (compare, for example, [FV78, FV79, Daw93, Eth01]).

It is well known that there exist moment dualities between the Moran model or the Fleming-Viot process on the one hand and the Kingman coalescent on the other (see, for example, [Kin82a, DGV95, GLW05]). This duality has a strong version showing that the Kingman coalescent describes the genealogy of a sample taken from the infinitely old population. This is helpful, for example, for the analysis of the long-time behavior.
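The resampling mechanism of the Moran model is easy to simulate. The following sketch (the function name moran_step and the convention that each ordered pair resamples at rate one are ours; the thesis only specifies a constant resampling rate) runs the finite model until fixation, i.e. until the empirical measure on types becomes degenerate.

```python
import random

def moran_step(types, rate=1.0, rng=random):
    """One resampling event of the Moran model: an ordered pair (i, j) is
    chosen uniformly, individual j dies and individual i reproduces.  The
    waiting time is exponential with total rate  rate * N * (N - 1)."""
    n = len(types)
    wait = rng.expovariate(rate * n * (n - 1))
    i, j = rng.sample(range(n), 2)
    types[j] = types[i]
    return wait

types = list(range(10))      # 10 individuals, initially all of distinct type
t = 0.0
while len(set(types)) > 1:   # run until one type has fixed
    t += moran_step(types)
print(f"type {types[0]} fixed at time {t:.2f}")
```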


Figure 0.3. The graphical representation of a Moran model of size N = 5. By resampling, the genealogical relationships between individuals change. Arrows between lines indicate resampling events: the individual at the tip dies and the other one reproduces. At any time, the genealogical relationships of the individuals •, which are currently alive, can be read from this graphical representation.

The main goal is to change the static point of view on fixed-time genealogies to dynamically evolving genealogies. That is, we construct the tree-valued Moran dynamics and the tree-valued Fleming-Viot dynamics, which are strong Markov processes modeling the evolution of the genealogies as time varies. We do this by well-posed martingale problems.

Evolving genealogies (of skeletons) in exchangeable population models - including the models under consideration - have been described by look-down processes ([DK96, DK98, DK99]). Even though not formulated in terms of trees, look-down processes contain all information available in the model, and hence all information about the trees. At the same time they encode a lot of information which for many purposes is not needed. For example, the crucial point in the construction of "look-down" processes is the use of labels as coordinates, while we are interested in developing the stochastic analysis which allows for a coordinate-free description of tree-valued dynamics. A first approach in this direction has been taken in spatial settings via so-called historical processes in the context of branching models ([DP91]) and of interacting Fleming-Viot processes ([GLW05]).

The martingale problem formulation allows us to show that the suitably rescaled tree-valued Moran dynamics converge towards the tree-valued Fleming-Viot dynamics as the population size tends to infinity. Another important consequence of the construction of the tree-valued Moran and Fleming-Viot dynamics as solutions of well-posed martingale problems is that it allows us to characterize functionals of these processes which are again strong Markov processes, and which can therefore also be characterized as solutions of well-posed martingale problems. As an application we


show that the measure-valued Fleming-Viot diffusion is randomly embedded in the tree-valued Fleming-Viot dynamics. In addition we study the dynamics of the averaged total length of a sub-tree spanned by a finite set of "individuals", where the average is taken with respect to sampling the "individuals" according to the probability measure associated with the tree. Another related interesting functional, the diameter of the tree as a metric space, representing the time to the most recent common ancestor, has been studied earlier in [PW06] using the look-down construction of [DK98].

Notes

Portions of this thesis have been previously published in joint papers. In Chapter 1 we collect facts on spaces of rooted real trees and weighted real trees which were presented in [EPW06] and [EW06], respectively. Chapter 2 is essentially [GPW06a], updated by the results stated in Sections 2.6 and 2.7, which appear in [GPW07]. In Chapter 3 we discuss the connection between excursion paths and trees. The path decomposition result for standard Brownian motion presented in Section 3.4 appeared in [EW06]. Sections 3.5 and 3.6 summarize parts of the results obtained in [GPW06b]. In Chapter 4 we illustrate how coalescent trees can be associated with an ultra-metric measure space. Section 4.1 is taken from [GPW06a], while the construction and convergence of the spatially structured coalescent in Sections 4.2, 4.3 and 4.4 are novel. Chapter 5 is essentially [EPW06]. Chapter 6 follows [EW06]. Chapter 7 is about to be submitted as [GPW07].

CHAPTER 1

State spaces I: R-trees

In this chapter we consider spaces of real trees that are compact metric spaces with "tree-like" properties and give a notion of convergence such that a sequence of trees converges to a limit tree if and only if the sequence and the limit tree can be embedded isometrically into one and the same compact metric space in which the image of the sequence converges, as a sequence of closed subsets of a complete metric space, to the image of the corresponding limit. The resulting topology is referred to as the Gromov-strong topology.

The chapter is then organized as follows. In Section 1.2 we recall two equivalent definitions of the Gromov-Hausdorff metric as a candidate for a complete metric which generates the Gromov-strong topology. In Section 1.3 we prove that the topology generated by the Gromov-Hausdorff metric coincides with the Gromov-strong topology. As a technical preparation, in Section 1.4 we recall a criterion for a set of compact metric spaces to be pre-compact. In Section 1.5 we consider the subspaces of 0-hyperbolic spaces and real trees. In particular, we show that the space (of isometry classes) of compact real trees is separable and complete. In Section 1.6 we discuss the fact that trees can be reconstructed from their leaf-to-leaf distances. In Section 1.7 we explain how compact real trees are associated with a natural length measure.

In Chapter 5 we construct the Root Growth with Regrafting dynamics via a limiting procedure in which a general rooted compact real tree is approximated "from the inside" by an increasing sequence of finite subtrees. To prepare the construction we collect some results and properties on the set of isometry classes of rooted compact R-trees equipped with the rooted Gromov-Hausdorff distance in Section 1.8. We then establish some facts about rooted subtrees and their trimmings in Section 1.9. This also yields a characterization of the compact sets in the space (of isometry classes) of compact real trees, which we state in Section 1.10.

In Chapter 6 we construct with the Subtree Prune and Regraft process a candidate for a limit dynamics of a Markov chain in which subtrees are pruned at randomly chosen edges and regrafted at randomly picked vertices. While choosing an edge corresponds in the limit to choosing a point according to the (normalized) length measure associated with the tree, picking a vertex at random needs more thought since there is no canonical weight associated with a real tree. In Section 1.11 we therefore equip real


trees with a probability measure and provide, with the weighted Gromov-Hausdorff metric, a distance which takes this extra structure into account. Finally, Section 1.12 discusses convergence in distribution of random (rooted, weighted) real trees and characterizes tightness.

1.1. The Gromov-strong topology

The ordinary Hausdorff metric between two subsets A1, A2 of a metric space (X, d) is defined as

(1.1)

dH (A1 , A2 ) := inf{ε > 0; A1 ⊆ Uε (A2 ) and A2 ⊆ Uε (A1 )},

where (1.2)

(1.2)    Uε(A) := {x ∈ X; d(x, A) ≤ ε}.
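For finite subsets of a metric space the two suprema hidden in (1.1)-(1.2) can be computed directly. The following few lines of Python are a sketch for finite sets only (the function name hausdorff_distance is ours).

```python
def hausdorff_distance(A, B, d):
    """Hausdorff distance between two finite, non-empty subsets A, B of a
    metric space with distance function d, cf. (1.1)-(1.2)."""
    sup_a = max(min(d(a, b) for b in B) for a in A)   # sup over A of d(a, B)
    sup_b = max(min(d(a, b) for a in A) for b in B)   # sup over B of d(b, A)
    return max(sup_a, sup_b)

# example: two subsets of the real line with the usual metric
print(hausdorff_distance([0.0, 1.0], [0.0, 2.5], lambda x, y: abs(x - y)))  # 1.5
```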

Given a metric space (X, r), denote by P(X, r) the space of closed subsets of X equipped with the Hausdorff metric. Recall from Proposition 7.3.7 in [BBI01] that if (X, r) is complete then P(X, r) is complete, and from Proposition 7.3.8 in [BBI01] that if (X, r) is compact then P(X, r) is compact. In the following we refer to the topology generated by the Hausdorff metric as the Hausdorff topology.

Denote by Xc the space of isometry classes of compact metric spaces. If no confusion is possible, we denote by X = (X, r) the isometry class of a metric space.

Remark 1.1.1. Since the space (of isometry classes) of compact metric spaces can, of course, be metrized such that this space becomes compact, we have to be careful to deal with sets in the sense of the Zermelo-Fraenkel axioms. The way out is to define Xc as the space of isometry classes of those compact metric spaces whose elements are not metric spaces themselves. ¤

We are then in a position to define the Gromov-strong topology on Xc.

Definition 1.1.2 (Gromov-strong topology). A sequence (Xn)n∈N is said to converge Gromov-strongly to X in Xc, as n → ∞, if and only if there exists a compact metric space (Z, rZ) and isometric embeddings ϕ, ϕ1, ϕ2, ... of (X, r), (X1, rX1), (X2, rX2), ..., respectively, into (Z, rZ) such that (ϕn(Xn))n∈N converges to ϕ(X) in (Z, rZ) in the Hausdorff topology, as n → ∞.

1.2. A complete metric: The Gromov-Hausdorff metric

In this section we introduce the Gromov-Hausdorff metric dGH on Xc and prove that the metric space (Xc, dGH) is complete. In Section 1.3 we will see that the Gromov-Hausdorff metric generates the Gromov-strong topology. The following definition can be found, for example, in [Gro99, BH99, BBI01].


Definition 1.2.1 (Gromov-Hausdorff distance). We define the Gromov-Hausdorff distance, dGH(X1, X2), between two metric spaces (X1, rX1) and (X2, rX2) as the infimum of dH^{(Z,rZ)}(X1', X2') over all metric spaces X1' and X2' that are isomorphic to X1 and X2, respectively, and that are subspaces of some common metric space (Z, rZ).

Remark 1.2.2 (Gromov-Hausdorff metric on Xc). The Gromov-Hausdorff distance defines a finite metric on the space of all isometry classes of compact metric spaces (see, for example, Theorem 7.3.30 in [BBI01]). ¤

We point out that a direct application of Definition 1.2.1 requires an optimal embedding into a new metric space (Z, rZ). While this definition is conceptually appealing, it often turns out not to be so useful for explicit computations in concrete examples.

A re-formulation of the Gromov-Hausdorff distance is suggested by the following observation. Suppose that two metric spaces (X1, rX1) and (X2, rX2) are close in the Gromov-Hausdorff distance, as witnessed by isometric embeddings ϕ1 and ϕ2 into some common space (Z, rZ). The map that associates each point x1 ∈ X1 with a point x2 ∈ X2 such that rZ(ϕ1(x1), ϕ2(x2)) is minimal should then be close to an isometry onto its image, and a similar remark holds with the roles of X1 and X2 reversed.

In order to quantify the observation of the previous paragraph, we require some more notation.

Definition 1.2.3 (Distortion). Let (X1, rX1) and (X2, rX2) be metric spaces. The distortion of a relation R ⊆ X1 × X2 is defined by

(1.3)    dis(R) := sup{|rX1(x1, y1) − rX2(x2, y2)|; (x1, x2), (y1, y2) ∈ R}.

Example 1.2.4 (Distortion of a function). Let (X1, rX1) and (X2, rX2) be metric spaces. The distortion of a function f : X1 → X2 is defined as the distortion of the corresponding relation

(1.4)    Rf := {(x, f(x)); x ∈ X1} ⊆ X1 × X2.

That is,

(1.5)    dis(f) := dis(Rf) = sup{|rX1(x, y) − rX2(f(x), f(y))| : x, y ∈ X1}. ¤

Definition 1.2.5 (Correspondence). A relation R ⊆ X1 × X2 is said to be a correspondence between sets X1 and X2 if for each x1 ∈ X1 there exists at least one x2 ∈ X2 such that (x1, x2) ∈ R, and for each y2 ∈ X2 there exists at least one y1 ∈ X1 such that (y1, y2) ∈ R.

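Definitions 1.2.3 and 1.2.5 are easy to turn into code for small finite metric spaces. The sketch below (function names ours) computes the distortion of a relation given as a list of index pairs and, anticipating Proposition 1.2.6 below, obtains dGH for two tiny spaces by brute force over all correspondences; the enumeration is exponential and only meant as an illustration.

```python
from itertools import combinations

def distortion(R, d1, d2):
    """Distortion (1.3) of a relation R (index pairs) between two finite
    metric spaces given by their distance matrices d1 and d2."""
    return max(abs(d1[x1][y1] - d2[x2][y2]) for (x1, x2) in R for (y1, y2) in R)

def gromov_hausdorff(d1, d2):
    """Brute-force d_GH for tiny spaces: half the minimal distortion over
    all correspondences (anticipating Proposition 1.2.6)."""
    n1, n2 = len(d1), len(d2)
    pairs = [(i, j) for i in range(n1) for j in range(n2)]
    best = float("inf")
    for k in range(1, len(pairs) + 1):
        for R in combinations(pairs, k):
            covers_1 = {p for p, _ in R} == set(range(n1))
            covers_2 = {q for _, q in R} == set(range(n2))
            if covers_1 and covers_2:            # R is a correspondence
                best = min(best, 0.5 * distortion(R, d1, d2))
    return best

# equilateral triangles with side lengths 1 and 2
d1 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
d2 = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
print(gromov_hausdorff(d1, d2))   # 0.5
```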

The following re-formulation of the Gromov-Hausdorff metric is taken from Theorem 7.3.25 in [BBI01].

Proposition 1.2.6. For two elements (X1, rX1) and (X2, rX2) in Xc,

(1.6)    dGH((X1, rX1), (X2, rX2)) = ½ inf_R dis(R),

where the infimum is taken over all correspondences R between X1 and X2.

1.3. Gromov-Hausdorff and the Gromov-strong topology coincide

The definition of the Gromov-Hausdorff metric uses an embedding into a common metric space. Convergence in the Gromov-Hausdorff topology can as well be formulated in terms of an embedding into a common metric space.

Theorem 1.3.1. Let (X, rX), (X1, rX1), (X2, rX2), ... be in Xc. Then dGH(Xn, X) → 0, as n → ∞, if and only if the sequence (Xn)n∈N converges Gromov-strongly to X in Xc, as n → ∞.

We prepare the proof of Theorem 1.3.1 with the following lemma.

Lemma 1.3.2 (Extension of metrics via relations). Assume that (X1, rX1) and (X2, rX2) are metric spaces and R ⊆ X1 × X2 is a non-empty relation between X1 and X2. Let for x1 ∈ X1 and x2 ∈ X2,

(1.7)    r^R_{X1⊔X2}(x1, x2) := inf{rX1(x1, x1') + ½ dis(R) + rX2(x2, x2') : (x1', x2') ∈ R}.

Then the following hold:
(i) r^R_{X1⊔X2} defines a metric on the disjoint union X1 ⊔ X2.
(ii) The metric rXi equals the metric r^R_{X1⊔X2} restricted to Xi, for i = 1, 2.
(iii) r^R_{X1⊔X2}(x1, x2) = ½ dis(R), for any pair (x1, x2) ∈ R.
(iv) With π1 and π2 denoting the projection operators on X1 and X2, respectively,

(1.8)    dH^{(X1⊔X2, r^R_{X1⊔X2})}(π1R, π2R) = ½ dis(R).

Proof. It is not hard to check that r^R_{X1⊔X2} defines a metric on X1 ⊔ X2 which extends the metrics on X1 and X2. In particular, r^R_{X1⊔X2}(x1, x2) = ½ dis(R), for any pair (x1, x2) ∈ R, and therefore (1.8) holds. ¤
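The glueing formula (1.7) is also straightforward to implement for finite spaces. The following sketch (the function name glued_metric and the point encoding are ours) builds the metric r^R on the disjoint union from two distance matrices and a relation given as index pairs.

```python
def glued_metric(d1, d2, R):
    """The metric r^R on the disjoint union X1 ⊔ X2 from (1.7), for finite
    spaces with distance matrices d1, d2 and a non-empty relation R of index
    pairs.  Points are encoded as (0, i) for X1 and (1, j) for X2."""
    half_dis = 0.5 * max(abs(d1[x1][y1] - d2[x2][y2])
                         for (x1, x2) in R for (y1, y2) in R)
    def r(p, q):
        (sp, ip), (sq, iq) = p, q
        if sp == sq:                                 # both points on one side
            return (d1 if sp == 0 else d2)[ip][iq]
        if sp == 1:                                  # ensure p lies in X1
            (sp, ip), (sq, iq) = q, p
        return min(d1[ip][a] + half_dis + d2[b][iq] for (a, b) in R)
    return r

# two-point spaces at mutual distances 1 and 3, related by the identity pairs
r = glued_metric([[0, 1], [1, 0]], [[0, 3], [3, 0]], [(0, 0), (1, 1)])
print(r((0, 0), (1, 0)))   # related points are at distance dis(R)/2 = 1.0
```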

Proof of Theorem 1.3.1. The "if" direction is clear. So we come immediately to the "only if" direction. If dGH(Xn, X) → 0, as n → ∞, then we find correspondences Rn between Xn and X such that dis(Rn) → 0, as n → ∞, by (1.6). Using these, we define recursively metrics rZn on Zn := X ⊔ ⊔_{k=1}^{n} Xk. First, set Z1 := X ⊔ X1 and rZ1 := r^{R1}_{Z1} as in (1.7). In


the nth step, we are given a metric on Zn. Consider the canonical isometric embedding ϕ from X to Zn and define the relation R̃n ⊆ Zn × Xn+1 by

(1.9)    R̃n := {(z, x) ∈ Zn × Xn+1 : (ϕ^{−1}(z), x) ∈ Rn+1},

and set rZn+1 := r^{R̃n}_{Zn+1}. By this procedure we end up with a metric rZ on Z := X ⊔ ⊔_{n∈N} Xn and isometric embeddings ϕ, ϕ1, ϕ2, ... between X, X1, X2, ... and Z, respectively, such that

(1.10)    dH^{(Z,rZ)}(ϕn(Xn), ϕ(X)) = ½ dis(Rn) → 0, as n → ∞.

W.l.o.g. we can assume that Z is complete. Otherwise we just embed everything into the completion of Z. To verify compactness of (Z, rZ) it is therefore sufficient to show that Z is totally bounded (see, for example, Theorem 1.6.5 in [BBI01]). For that purpose fix ε > 0, and let n ∈ N. Since X is compact, we can choose a finite ε/2-net S in X. Then for all x ∈ Z with rZ(x, X) < ε/2 there exists x' ∈ S such that rZ(x, x') < ε. Moreover, dH(ϕn(Xn), ϕ(X)) < ε for all but finitely many n ∈ N. For the remaining ϕn(Xn) choose finite ε-nets and denote their union by S̃. In this way, S ∪ S̃ is a finite set, and {Bε(s) : s ∈ S ∪ S̃} is a covering of Z. ¤

1.4. Compact sets in Xc Since the Gromov-Hausdorff topology is a relatively weak one, one may expect that it has relatively many compact sets. The following criterion for a set to be pre-compact in the Gromov-Hausdorff topology is taken from Theorem 7.4.15 in [BBI01]. Proposition 1.4.1 (A criterion for pre-compactness). A set Γ ⊆ Xc is pre-compact if it is uniformly totally bounded, i.e., © ª • the set diam(X) : X ∈ Γ is bounded, and • for all ε > 0 there is a number N such that every X ∈ Γ can be covered by at most N balls of radius ε.

1.5. 0-hyperbolic spaces and R-trees Definition 1.5.1 (0-hyperbolic metric space). A metric space (X, r) is said to be 0-hyperbolic with respect to a point v ∈ X if and only if (1.11)

(x · y)v ≥ (x · z)v ∧ (y · z)v ,

for all x, y, z ∈ X, where for x, y ∈ X, ¢ 1¡ (1.12) (x · y)y := r(x, v) + r(y, v) − r(x, y) . 2

26

1. STATE SPACES I: R-TREES

Remark 1.5.2. If a metric space (X, r) is 0-hyperbolic with respect to a point v ∈ X then it is 0-hyperbolic with respect to all points in X. In the following we will therefore refer to a metric space which is 0hyperbolic with respect to a point v ∈ X as simply being 0-hyperbolic. ¤ Lemma 1.5.3 (Equivalence to “Four point condition”). A metric space (X, r) is 0-hyperbolic if and only if it satisfies the so-called four point condition, i.e., r(x1 ,x2 ) + r(x3 , x4 ) (1.13) ≤ max{r(x1 , x3 ) + r(x2 , x4 ), r(x1 , x4 ) + r(x2 , x3 )} for all x1 , . . . , x4 ∈ X. Example 1.5.4 (Ultra-metric space). Recall that a metric space (X, r) is said to be ultra-metric if (1.14)

r(u, w) ≤ r(u, v) ∨ r(v, w),

for all u, v, w ∈ X. It is easy to verify that each ultra-metric space (X, r) is 0-hyperbolic. Definition 1.5.5 (R-tree). A complete hyperbolic metric space (X, d) is said to be an R-tree if it is path-connected. See Section 1.6 for more elaboration on trees spanned by four vertices. Compare also with Figure 1.1 that shows all possible shapes. We refer the reader to ([Dre84, DT96, DMT96, Ter97]) for background on R-trees. In particular, [Chi01] shows that a number of other definitions are equivalent to the one above. Remark 1.5.6. A particularly useful fact is that a complete metric space (X, r) is an R-tree if it satisfies the following axioms: Axiom 1 (Unique geodesics) For all x, y ∈ X there exists a unique isometric embedding φx,y : [0, r(x, y)] → X such that φx,y (0) = x and φx,y (r(x, y)) = y. Axiom 2 (Loop-free) For every injective continuous map ψ : [0, 1] → X one has ψ([0, 1]) = φψ(0),ψ(1) ([0, r(ψ(0), ψ(1))]). Axiom 1 says simply that there is a unique “unit speed” path between any two points, whereas Axiom 2 implies that the image of any injective path connecting two points coincides with the image of the unique unit speed path, so that it can be re-parameterized to become the unit speed path. Thus, Axiom 1 is satisfied by many other spaces such as Rd with the usual metric, whereas Axiom 2 expresses the property of “treeness” and is only satisfied by Rd when d = 1. ¤

1.5. 0-HYPERBOLIC SPACES AND R-TREES

27

The following result states that 0-hyperbolic metric spaces can be isometrically embedded into a real tree (see, Theorem 3.38 in [Eva06]). Proposition 1.5.7 (The real tree spanned by an 0-hyperbolic space). Let (X, r) be a 0-hyperbolic metric spaces. There exists a R-tree (X 0 , r0 ) and an isometry ϕ : X → X 0 . Example 1.5.8 (Ultra-metric spaces can be embedded into real trees). Proposition 1.5.7 says, in particular, that every complete ultra-metric space (U, rU ) can be isometrically embedded into an R-tree (X, rX ). ¤ Let (T, dGH ) be the metric space of isometry classes of compact R-trees equipped with dGH . We will be a little loose and sometimes refer to an R-tree as an element of T rather than as a class representative of an element. The following results says that, at the very least, T equipped with the Gromov-Hausdorff distance is a “reasonable” space on which to do probability theory. Theorem 1.5.9. The metric space (T, dGH ) is complete and separable. The following result is useful in the proof of Theorem 1.5.9. Lemma 1.5.10. The set T of compact R-trees is a closed subset of the space of compact metric spaces equipped with the Gromov-Hausdorff distance. Proof. It suffices to note that the limit of a sequence in T is pathconnected (see, for example, Theorem 7.5.1 in [BBI01]) and satisfies the four point condition (1.13), (indeed, as remarked after Proposition 7.4.12 in [BBI01], there is a “meta–theorem” that if a feature of a compact metric space can be formulated as a continuous property of distances among finitely many points, then this feature is preserved under Gromov-Hausdorff limits). ¤ Proof of Theorem 1.5.9. We start by showing separability. Given a compact R-tree, T , and ε > 0, let Sε be a finite ε-net in T . For a, b ∈ T , let (1.15)

[a, b[ := φa,b ( [0, r(a, b)[ )

and

]a, b[ := φa,b ( ]0, r(a, b)[ )

be the unique half open and open, respectively, arc between them, and write Tε for the subtree of T spanned by Sε , that is, [ ¯ (1.16) Tε := [x, y] and rTε := r¯Tε . x,y∈Sε

Obviously, Tε is still an ε-net for T , and hence dGH (Tε , T ) ≤ dH (Tε , T ) ≤ ε. Now each Tε is just a “finite tree with edge-lengths” and can clearly be approximated arbitrarily closely in the dGH -metric by trees with the same tree topology (that is, “shape”), and rational edge-lengths. The set of

28

1. STATE SPACES I: R-TREES

(I)

x ¡ 3 ¡

x4@ @¡y3,4

(II)

¡ x3

x1@ @¡

(III)

¡x4 ¡

x1@ @¡

y1,2 x1 ¡¡@@ x2

x2 ¡¡@@ x4

x2 ¡¡@@ x3

(IV) x1 x2

x3

¡ ¡ @ @¡ ¡@ ¡ @

x4

Figure 1.1. shows the 4 different shapes of a labeled tree with 4 leaves. isometry types of finite trees with rational edge-lengths is countable, and so (T, dGH ) is separable. It remains to establish completeness. It suffices by Lemma 1.5.10 to show that any Cauchy sequence in T converges to some compact metric space, or, equivalently, any Cauchy sequence in T has a subsequence that converges to some metric space. Let (Tn )n∈N be a Cauchy sequence in T. By Exercise 7.4.14 and Theorem 7.4.15 in [BBI01], a sufficient condition for this sequence to have a subsequential limit is that for every ε > 0 there exists a positive number N = N (ε) such that every Tn contains an ε-net of cardinality N . Fix ε > 0 and n0 = n0 (ε) such that dGH (Tm , Tn ) < ε/2 for m, n ≥ n0 . Let Sn0 be a finite (ε/2)-net for Tn0 of cardinality N . Then by (1.6) for each n ≥ n0 there exists a correspondence Rn between Tn0 and Tn such that dis(Rn ) < ε. For each x ∈ Tn0 , choose fn (x) ∈ Tn such that (x, fn (x)) ∈ Rn . Since for any y ∈ Tn with (x, y) ∈ Rn , rTn (y, fn (x)) ≤ dis(Rn ), for all n ≥ n0 , the set fn (Sn0 ) is an ε-net of cardinality N for Tn , n ≥ n0 . ¤

1.6. R-trees with 4 leaves For the sake of reference and establishing some notation, we record here some well-known facts about reconstructing trees from a knowledge of the distances between the leaves. We remark that the fact that trees can be reconstructed from their collection of leaf-to-leaf distances (plus also the leafto-root distances for rooted trees) is of huge practical importance in so-called distance methods for inferring phylogenetic trees from DNA sequence data, and the added fact that one can build such trees by building subtrees for each collection of four leaves is the starting point for the sub-class of distance methods called quartet methods. We refer the reader to [Fel03, SS03] for an extensive description of these techniques and their underlying theory. Lemma 1.6.1. The isometry class of an unrooted tree (T, r) with four leaves is uniquely determined by the distances between the leaves of T . Proof. Let {x1 , x2 , x3 , x4 } be the set of leaves of T . The tree T has one of four possible shapes:

1.7. LENGTH MEASURE

29

Consider case (I), and let y1,2 be the uniquely determined branch point on the tree that lies on the arcs [x1 , x2 ] and [x1 , x3 ], and y3,4 be the uniquely determined branch point on the tree that lies on the arcs [x3 , x4 ] and [x1 , x3 ]. Observe that r(x1 , y1,2 ) = (x2 · x3 )x1 = (x2 · x4 )x1 r(x2 , y1,2 ) = (x1 · x3 )x2 = (x1 · x4 )x2 (1.17)

r(x3 , y3,4 ) = (x4 · x1 )x3 = (x4 · x2 )x3

r(x4 , y3,4 ) = (x3 · x1 )x4 = (x3 · x2 )x4 1 r(y1,2 , y3,4 ) = (r(x1 , x4 ) + r(x2 , x3 ) − r(x1 , x2 ) − r(x3 , x4 )), 2 Similar observations for the other cases show that if we know the shape of the tree, then we can determine its edge-lengths from leaf-to-leaf distances. Note also that 1 χ(I) (T ) := (r(x1 , x3 ) + r(x2 , x4 ) − r(x1 , x2 ) − r(x3 , x4 )) 2  for shape (I),  (1.18)  >0 0. Then the following hold. (i) If dGHroot ((X1 , ρ1 ), (X2 , ρ2 )) < ε, then there exists a root-invariant 2ε-isometry from (X1 , ρ1 ) to (X2 , ρ2 ). (ii) If there exists a root-invariant ε-isometry from (X1 , ρ1 ) to (X2 , ρ2 ), then 3 dGHroot ((X1 , ρ1 ), (X2 , ρ2 )) ≤ ε. 2 Proof. (i) Let dGHroot ((X1 , ρ1 ), (X2 , ρ2 )) < ε. By Proposition 1.8.6 there exists a correspondence Rroot between X1 and X2 such that (ρ1 , ρ2 ) ∈ Rroot and dis(Rroot ) < 2ε. Define f : X1 → X2 by setting f (ρ1 ) = ρ2 , and choosing f (x) such that (x, f (x)) ∈ Rroot for all x ∈ X1 \ {ρ1 }. Clearly, dis(f ) ≤ dis(Rroot ) < 2ε. To see that f (X1 ) is an 2ε-net for X2 , let x2 ∈ X2 , and choose x1 ∈ X1 such that (x1 , x2 ) ∈ Rroot . Then rX2 (f (x1 ), x2 ) ≤ rX1 (x1 , x1 ) + dis(Rroot ) < 2ε. (ii) Let f be a root-invariant ε-isometry from (X1 , ρ1 ) to (X2 , ρ2 ). Define a correspondence Rfroot ⊆ X1 × X2 by (1.29)

Rfroot := {(x1 , x2 ) : rX2 (x2 , f (x1 )) ≤ ε}.

Then (ρ1 , ρ2 ) ∈ Rfroot and Rfroot is indeed a correspondence since f (X1 ) is a ε-net for X2 . If (x1 , x2 ), (y1 , y2 ) ∈ Rfroot , then |rX1 (x1 , y1 ) − rX2 (x2 , y2 )| ≤ |rX2 (f (x1 ), f (y1 )) − rX1 (x1 , y1 )| (1.30)

+ rX2 (x2 , f (x1 )) + rX2 (f (x1 ), y2 ) < 3ε.

Hence dis(Rfroot ) < 3ε and, by (1.26), dGHroot ((X1 , ρ1 ), (X2 , ρ2 )) ≤ 23 ε.

¤

The second preparatory result we need is the following compactness criterion, which is the analogue of Theorem 7.4.15 in [BBI01] (note also Exercise 7.4.14 in [BBI01]) and can be proved the same way, using Lemma 1.8.9 in place of Corollary 7.3.28 in [BBI01] and noting that the analogue of Lemma 1.5.10 holds for Troot . Lemma 1.8.10 (A criterion for pre-compactness in Troot ). A subset Γ ⊆ Troot is pre-compact if for every ε > 0 there exists a positive integer N (ε) such that each T ∈ Γ has an ε-net with at most N (ε) points.

34

1. STATE SPACES I: R-TREES

Proof of Theorem 1.8.7. The proof follows very much the same lines as that of Theorem 1.5.9. The proof of separability is almost identical. The key step in establishing completeness is again to show that a Cauchy sequence in Troot has a subsequential limit. This can be shown in the same manner as in the proof of Theorem 1.5.9, with an appeal to Lemma 1.8.10 replacing one to Theorem 7.4.15 and Exercise 7.4.14 in [BBI01]. ¤ 1.9. Rooted subtrees and trimming There are situations, as for example the Root Growth with Regrafting dynamics which we construct in Chapter 5, which involve a limiting procedure in which a general rooted compact R-tree is approximated “from the inside” by an increasing sequence of finite subtrees. We therefore need to establish some facts about such approximations. To begin with, we require a notation for one tree being a subtree of another, with both trees sharing the same root. We need to incorporate the fact that we are dealing with equivalence classes of trees rather than trees themselves. A rooted subtree of (T, r, ρ) ∈ Troot is an element (T ∗ , r∗ , ρ∗ ), ∈ Troot that has a class representative that is a subspace of a class representative of (T, r, ρ), with the two roots coincident. Equivalently, any class representative of (T ∗ , r∗ , ρ∗ ) can be isometrically embedded into any class representative of (T, r, ρ) via an isometry that maps roots to roots. We write T ∗ ¹root T and note that ¹root is an partial order on Troot . All of the “wildness” in a compact R-tree happens “at the leaves”. For example, if T ∈ Troot has a point x at which infinite branching occurs (so that the removal of x would disconnect T into infinitely many components), then any open neighborhood of x must contain infinitely many leaves, while for each η > 0 there are only finitely many leaves y such that x ∈ [ρ, y] with r(x, y) > η. A natural way in which to produce a finite subtree that approximates a given tree is thus to fix η > 0 and trim off the fringe of the tree by removing those points that are not at least distance η from at least one leaf. Formally, for η > 0 define Rη : Troot → Troot to be the map that assigns to (T, ρ) ∈ Troot the rooted subtree (Rη (T, ρ), ρ) that consists of ρ and points a ∈ T for which the subtree © ª (1.31) S T,a := x ∈ T : a ∈ [ρ, x[ (that is, the subtree above a ) has height greater than or equal to η. Equivalently, © ª (1.32) Rη (T, ρ) := x ∈ T : ∃ y ∈ T x ∈ [ρ, y], rT (x, y) ≥ η ∪ {ρ}. In particular, if T has height at most η, then Rη (T, ρ) is just the trivial tree consisting of the root ρ. Remark 1.9.1. Notice that the map described in (1.32) maps a metric space into a sub-space. However, since isometric spaces are mapped into

1.9. ROOTED SUBTREES AND TRIMMING

35

isometric sub-spaces, we may think of Rη as a map from Troot into Troot . ¤ Lemma 1.9.2 (Properties of the trimming map). (i) The range of Rη consists of finite rooted trees. (ii) The map Rη is continuous. (iii) The family of maps (Rη )η>0 is a semigroup, i.e., Rη0 ◦ Rη00 = Rη0 +η00 , for all η 0 , η 00 > 0. In particular, Rη0 (T, ρ) ¹root Rη00 (T, ρ), for all η 0 ≥ η 00 > 0. (iv) For any (T, ρ) ∈ Troot and η ≥ 0, dGHroot ((T, ρ), (Rη (T, ρ), ρ)) ≤ dH (T, Rη (T, ρ)) ≤ η. Proof. (i) Fix (T, r, ρ) ∈ Troot . Let E ⊂ Rη (T, ρ) be the leaves of Rη,ρ , that is, the points that have no subtree above them. We have to show that E is finite. However, if a1 , a2 , ... are infinitely many points in E \ {ρ}, then we can find points b1 , b2 , ... in T such that bi is in the subtree above ai and r(ai , bi ) ≥ η. It follows that inf i6=j r(bi , bj ) ≥ 2η, which contradicts the compactness of T . (ii) Suppose that (T 0 , r0 , ρ0 ) and (T 00 , r00 , ρ00 ) are two compact trees with dGHroot ((T 0 , ρ0 ), (T 00 , ρ00 )) < ε. By part (i) of Lemma 1.8.9, there exists a root-invariant 2ε-isometry f : T 0 → T 00 . Recall that this means, f (ρ0 ) = ρ00 , dis(f ) < 2ε, and f (T 0 ) is an 2ε-net for T 00 . For a ∈ Rη (T 0 , ρ0 ), let f¯(a) be the unique point in Rη (T 00 , ρ00 ) that is closest to f (a). We will show that f¯ : Rη (T 0 , ρ0 ) → Rη (T 00 , ρ00 ) is a root-invariant 25ε-isometry and hence, by part (ii) of Lemma 1.8.9, dGHroot (Rη (T 0 , ρ0 ), Rη (T 00 , ρ00 )) ≤ 32 25ε. We first show that © ª (1.33) sup r00 (f (a), f¯(a)) : a ∈ Rη (T 0 , ρ0 ) ≤ 8ε. Fix a ∈ Rη (T 0 , ρ0 ) and let b ∈ T 0 be a point in the subtree above a such that r0 (a, b) ≥ η. Denote the most recent common ancestor of f (a) and f (b) on T 00 by f (a) ∧00 f (b). Then ¡ ¢ r00 f (a) ∧00 f (b), f (a) ¢ 1 ¡ 00 r (f (a), f (b)) + r00 (ρ00 , f (a)) − r00 (ρ00 , f (b)) 2 ¯ 1 ¡¯¯ 00 r (f (a), f (b)) − r0 (a, b)¯ ≤ 2 ¯ ¯ ¯ ¯¢ +¯r00 (ρ00 , f (a)) − r0 (ρ0 , a)¯ + ¯r00 (ρ00 , f (b)) − r0 (ρ0 , b)¯

= (1.34)

≤ 3ε.

36

1. STATE SPACES I: R-TREES

a f (b)

a b a a

f (a) ∧00 f (b) a

0 a ρ

af (a)

00 a ρ

Figure 1.2. illustrates the shapes of the trees spanned by {ρ0 , a, b} and by {ρ00 , f (a), f (b)}. The point f¯(a) lies somewhere on the arc [ρ00 , f (a)]. If f¯(a) ∈ [f (a) ∧00 f (b), f (a)] then we are immediately done. Otherwise, ¯ f (a) ∈ [ρ00 , f (a)] and f¯(a) is a leaf in Rη (T 00 , ρ00 ). Hence f (b) 6∈ Rη (T 00 , ρ00 ), and therefore (1.35) r00 (f¯(a), f (b)) ≤ η. Furthermore, ¡ ¢ ¡ ¢ r00 f (a) ∧00f (b), f (b) = r00 (f (a), f (b)) − r00 f (a) ∧00f (b), f (a) ¡ ¢ (1.36) ≥ r0 (a, b) − 2ε − 3ε ≥ η − 5ε. Combining (1.34), (1.35) and (1.36) finally yields that r00 (f¯(a), f (a)) ≤ 8ε and completes the proof of (1.33). It follows from (1.33) that © ª dis(f¯) = sup |r0 (a, b) − r00 (f¯(a), f¯(b))| : a, b ∈ Rη (T 0 , ρ0 ) © ª ≤ sup |r0 (a, b) − r00 (f (a), f (b))| : a, b ∈ Rη (T 0 , ρ0 ) © ª (1.37) + 2 sup r00 (f (a), f¯(a)) : a ∈ Rη (T 0 , ρ0 ) < 2ε + 2 × 8ε = 18ε. The proof of (ii) will thus be completed if we can show that f¯(Rη (T 0 , ρ0 )) 00 is a 25ε-net in Rη (T 00,ρ ). Consider a point c ∈ Rη (T 00 , ρ00 ). We need to show that there is a point b ∈ Rη (T 0 , ρ0 ) such that (1.38) r00 (f¯(b), c) < 25ε. If r00 (ρ00 , c) < 7ε, then we are done, because we can take b = ρ0 (recall that f¯(ρ0 ) = ρ00 ). Assume, therefore, that r00 (ρ00 , c) ≥ 7ε. We can then find points c− , c+ ∈ T 00 such that ρ00 ≤ c− ≤ c ≤ c+ with r00 (c− , c) = 7ε and r00 (c, c+ ) ≥ η. There are corresponding points a− , a, a+ ∈ T 0 such that r00 (f (a− ), c− ) < 2ε, r00 (f (a), c) < 2ε, and r00 (f (a+ ), c+ ) < 2ε. We claim that b := a− ∧0 a+ (the most recent common ancestor of a− and a+ in the tree T 0 ) belongs to Rη (T 0 , ρ0 ) and satisfies (1.38).

1.9. ROOTED SUBTREES AND TRIMMING

37

Note first of all that 0

r (b, a+ ) = r0 (a− ∧0 a+ , a+ ) ¢ 1¡ 0 = r (a+ , a− ) + r0 (ρ0 , a+ ) − r0 (ρ0 , a− ) 2 1¡ ≥ r00 (f (a+ ), f (a− )) − 2ε + r00 (f (ρ0 ), f (a+ )) − 2ε 2 ¢ − r00 (f (ρ0 ), f (a− )) − 2ε ¢ 1 ¡ 00 ≥ r (c+ , c− ) − 4ε + r00 (ρ00 , c+ ) − 2ε − r00 (ρ00 , c− ) − 2ε − 3ε 2 = r00 (c+ , c− ) − 7ε = η + 7ε − 7ε η, and so b ∈ Rη (T 0 , ρ0 ). Furthermore, r00 (c, f (b)) ≤ r00 (c, c− ) + r00 (c− , f (a− )) + r00 (f (a− ), f (b)) ≤ 7ε + 2ε + r0 (a− , b) + 2ε ¢ 1¡ 0 r (a+ , a− ) + r0 (ρ0 , a− ) − r0 (ρ0 , a+ ) = 11ε + 2 1¡ ≤ 11ε + r00 (f (a+ ), f (a− )) + 2ε + r00 (f (ρ0 ), f (a− )) + 2ε 2 ¢ − r0 (f (ρ0 ), f (a+ )) + 2ε ¢ 1 ¡ 00 r (c+ , c− ) + 2ε + r00 (ρ00 , c− ) + 2ε − r00 (ρ00 , c+ ) + 2ε ≤ 14ε + 2 = 17ε. Therefore, by (1.33), r(c, f¯(b)) ≤ 17ε + 8ε = 25ε. This completes the proof of (1.38), and thus the proof of part (ii). Claims (iii) and (iv) are clear.

¤

Lemma 1.9.3 (Length of trimmed tree is continuous). Let Λ : Troot → R ∪ {∞} be the map that sends a tree to its total length. For η > 0, the map Λ ◦ Rη is continuous. Proof. For all η > 0 we have by Lemma 1.9.2 that: Rη = Rη/2 ◦ Rη/2 , the map Rη is continuous, and the range of Rη consists of finite trees. It therefore suffices to show for all η > 0 that if (T, d, ρ) is a fixed finite tree and (T 0 , d0 , ρ0 ) is any another finite tree sufficiently close to T , then Λ ◦ Rη (T 0 ) is close to Λ ◦ Rη (T ).

38

1. STATE SPACES I: R-TREES

Suppose, therefore, that (T, r, ρ) is a fixed finite tree with leaves {x1 , . . . , xn } and that (T 0 , r0 , ρ0 ) is another finite tree with dGHroot ((T, r, ρ), (T 0 , r0 , ρ0 )) < δ,

(1.39)

where δ is small enough that the conclusions of Lemma 5.5.2 hold. Consider a rooted subtree (T 00 , r0 , ρ0 ) of (T 0 , r0 , ρ0 ) and a map f¯ : T → T 00 with the properties guaranteed by Lemma 5.5.2. Set x0k = f¯(xk ) for 1 ≤ k ≤ n. Fix κ > 0. For 1 ≤ k ≤ n, write x ˆk ∈ T for the point on the arc [ρ, xk ] that is at distance κ ∧ r(ρ, xk ) from xk . Set x ˆ0 := ρ. Define x ˆ00 , . . . , x ˆ0n ∈ 00 T similarly. Note that Rκ (T ) is spanned by {ˆ x0 , . . . , x ˆn } and Rκ (T 00 ) is spanned by {ˆ x00 , . . . , x ˆ0n }. By Lemma 1.7.3, Λ ◦ Rκ (T ) (1.40)

= r(ˆ x0 , x ˆ1 ) +

n X

^

k=2 0≤i 0, with Lipschitz constants uniformly bounded by some constant C such that ³ ´ Λ ◦ Rκ (T ) = Fκ (r(xi , xj ))0≤i,j≤n and Λ ◦ Rκ (T 00 ) = Fκ

³¡ ´ ¢ r0 (x0i , x0j ) 0≤i,j≤n .

By construction |r(xi , xj ) − r0 (x0i , x0j )| < 8δ, and so |Λ ◦ Rκ (T ) − Λ ◦ Rκ (T 00 )| ≤ 8δC

1.9. ROOTED SUBTREES AND TRIMMING

39

for all κ > 0. Because dH (T 0 , T 00 ) < 3δ, we have Λ ◦ Rη (T 00 ) ≤ Λ ◦ Rη (T 0 ) ≤ Λ ◦ Rη−3δ (T 00 ). Thus

Λ ◦ Rη (T ) − 8δC ≤ Λ ◦ Rη (T 0 ) ≤ Λ ◦ Rη−3δ (T ) + 8δC. Since limδ↓0 Λ◦Rη−3δ (T ) = Λ◦Rη (T ), this suffices to establish the result. ¤ Finally, we require the following result, which will be the key to showing that the “projective limit” of a consistent family of tree-valued processes can actually be thought of as a tree-valued process in its own right. Lemma 1.9.4. Consider a sequence (Tn )n∈N of representatives of isometry classes of rooted compact trees in (T, dGHroot ) with the following properties. • Each set Tn is a subset of some common set U . • Each tree Tn has the same root ρ ∈ U . • The sequence (Tn )n∈N is nondecreasing, that is, T1 ⊆ T2 ⊆ · · · ⊆ U . • Writing rn for the metric on Tn , for m < n the restriction of rn to TmScoincides with rm , so that there is a well-defined metric on T := n∈N Tn given by (1.43)

r(a, b) = rn (a, b),

a, b ∈ Tn .

• The sequence of subsets (Tn )n∈N is Cauchy in the Hausdorff distance with respect to r. Then the following hold. (i) The metric completion T¯ of T is a compact R-tree, and dH (Tn , T¯) → 0 as n → ∞, where the Hausdorff distance is computed with respect to the extension of d to T¯. In particular, (1.44) lim d root ((Tn , ρ), (T¯, ρ)) = 0. n→∞ GH

S (ii) The tree T¯ has skeleton T¯o = n∈N Tno . (iii) The measure on T¯ is the unique measure concentrated on S length o n∈N Tn that restricts to the length measure on Tn for each n ∈ N. Proof. (i) Because T¯ is a complete metric space, the collection of closed subsets of T¯ equipped with the Hausdorff distance is also complete (see, for example, Proposition 7.3.7 of [BBI01]). Therefore the Cauchy sequence (T the closure of S n )n∈N has a¯ limit that is (see Exercise 7.3.4 of [BBI01]) ¯ is totally bounded, T , i.e, T itself. It is clear that the complete space T k k∈N path-connected, and satisfies the four point condition, and so T¯ is a compact R-tree. Finally, (1.45) d root ((Tn , ρ), (T¯, ρ)) ≤ dH (Tn , T¯) ∨ d(ρ, ρ) = dH (Tn , T¯) → 0, GH

as n → ∞. Claims (ii) and (iii) are obvious.

¤

40

1. STATE SPACES I: R-TREES

1.10. Compact sets in T In this section we state a necessary and sufficient condition for a subset of T to be relatively compact. For ε > 0, T ∈ T, and ρ ∈ T , recall from (1.32) the ε-trimming Rε (T, ρ) relative to the root ρ of the compact R-tree T . Then set ½ T ρ∈T Rε (T, ρ), diam(T ) > ε, (1.46) Rε (T ) := singleton, diam(T ) ≤ ε, where by singleton we mean the trivial R-tree consisting of one point. The tree Rε (T ) is called the ε-trimming of the compact R-tree T . Proposition 1.10.1 (A characterization of pre-compactness in T). A subset Γ of (T, dGH ) is relatively compact if and only if for all ε > 0, © ª (1.47) sup µT (Rε (T )) : T ∈ Γ < ∞. The proof relies on the following estimate. T (T ) < ∞. For each ε > 0 there Lemma 1.10.2. Let T ∈ T be such that £ εµ −1 ¤£ ¤ is an ε-net for T of cardinality at most ( 2 ) µT (T ) ( 2ε )−1 µT (T ) + 1

Proof. Note that an 2ε -net for R 2ε (T ) will be an ε-net for T . The set T \ R 2ε (T ) is a collection of disjoint subtrees, one for each leaf of R 2ε (T ), and each such subtree is of diameter at least 2ε . Thus the number of leaves of R 2ε (T ) is at most ( 2ε )−1 µT (T ). Enumerate the leaves of R 2ε (T ) as x0 , x1 , . . . , xn . Each arc [x0 , xi ], 1 ≤ i ≤ n, of R 2ε (T ) has an 2ε -net of cardinality at most ( 2ε )−1 rT (x0 , xi ) + 1 ≤ ( 2ε )−1 µT (T ) + 1. Therefore, by taking the union of these nets, R 2ε (T ) has an 2ε -net of cardinality at most £ ε −1 T ¤£ ¤ ( 2 ) µ (T ) ( 2ε )−1 µT (T ) + 1 . ¤ Remark 1.10.3. The bound in Lemma 1.10.2 is far from optimal. It can be shown that T has an ε-net with a cardinality that is of order µT (T )/ε. This is clear for finite trees (that is, trees with a finite number of branch points), where we can traverse the tree with a unit speed path and hence think of the tree as an image of the interval [0, 2µT (T )] by a Lipschitz map with Lipschitz constant 1, so that a covering of the interval [0, 2µT (T )] by ε-balls gives a covering of T by ε-balls. This argument can be extended to arbitrary finite length R-trees, but the details are tedious and so we have contented ourselves with the above simpler bound. ¤ Proof of Proposition 1.10.1. The “only if” direction follows from the fact that T 7→ µT (Rε (T )) is continuous, by Lemma 1.9.2. Conversely, suppose that (1.47) holds. Given T ∈ Γ, an ε-net for Rε (T ) for T . By ¤ £ Lemma 1.10.2, Rε (T¤) has an ε-net of cardinality at most £is aε 2ε-net ( 2 )−1 µT (Rε (T )) ( 2ε )−1 µT (Rε (T )) + 1 . By assumption, the last quantity

1.11. WEIGHTED R-TREES

41

is uniformly bounded in T ∈ Γ. Thus Γ is uniformly totally bounded and hence is relatively compact by Theorem 7.4.15 of [BBI01]. ¤ 1.11. Weighted R-trees As usual, given a topological space (X, O), we denote by M1 (X) be space of all probability measures on X equipped with the Borel-σ-algebra B(X). The push forward of ν under a measurable map ϕ from X into another metric space (Z, rZ ) is the probability measure ϕ∗ ν ∈ M1 (Z) defined by ¡ ¢ (1.48) ϕ∗ ν(A) := ν ϕ−1 (A) , for all A ∈ B(Z). In the following we will be interested in compact R-trees (T, r) ∈ T equipped with a probability measure ν on the Borel σ-field B(T ). We call such objects weighted compact R-trees. Definition 1.11.1 (Weight preserving isometry). A function ξ : X1 → X2 is called a weight preserving isometry between two weighted R-trees (X1 , r1 , ν1 ) and (X2 , r2 , ν2 ) if and only if the function ξ is an isometry from X1 to X2 with ν2 = φ∗ ν1 . It is clear that the property of being weight-preserving isometric is an equivalence relation. Denote by Twt the space of weight-preserving isometry classes of weighted compact R-trees. We want to equip Twt with a Gromov-Hausdorff type of distance which incorporates the weights on the trees. For that purpose we first introduce some notions that will be used in the definition. Recall from Definition 1.8.8 and Example 1.2.4 the notion of an εisometry f between two metric spaces (X, rX ) and (Y, rY ) and its distortion dis(f ), respectively. It is easy to see ¡ that if for two¢ metric spaces (X, rX ) and (Y, rY ) and ε > 0 we have dGH (X, rX ), (Y, rY ) < ε, then there exists a (possibly nonmeasurable) 2ε-isometry from X to Y (compare Lemma 7.3.28 in [BBI01] and Lemma 1.8.9). The following Lemma states that we may choose the distorted isometry between X and Y to be measurable if we allow a slightly bigger distortion. Lemma ¡ 1.11.2. Let (X, ¢ rX ) and (Y, rY ) be two compact real trees such that dGH (X, rX ), (Y, rY ) < ε for some ε > 0. Then there exists a measurable 3ε-isometry from X to Y . ¡ ¢ Proof. If dGH (X, rX ), (Y, rY ) < ε, then there exists a correspondence R between X and Y such that dis(R) < 2ε, by Proposition 1.2.6. Since (X, rX ) is compact there exists a finite ε-net in X. We claim that for each such finite ε-net, S X,ε = {x1 , ..., xN ε } ⊆ X, any set S Y,ε = {y1 , ..., yN ε } ⊆ Y such that (xi , yi ) ∈ R for all i ∈ {1, 2, ..., N ε } is an 3ε-net in Y . To see

42

1. STATE SPACES I: R-TREES

this, fix y ∈ Y . We have to show the existence of i ∈ {1, 2, ..., N ε } with rY (yi , y) < 3ε. For that choose x ∈ X such that (x, y) ∈ R. Since S X,ε is an ε-net in X there exists an i ∈ {1, 2, ..., N ε } such that rX (xi , x) < ε. (xi , yi ) ∈ R implies therefore that |rX (xi , x) − rY (yi , y)| ≤ dis(R) < 2ε, and hence rY (yi , y) < 3ε. Furthermore we may decompose X into N ε possibly empty measurable disjoint subsets of X by letting X 1,ε := B(x1 , ε), X 2,ε := B(x2 , ε) \ X 1,ε , and so on, where B(x, r) is the open ball {x0 ∈ X : rX (x, x0 ) < r}. Then f defined by f (x) = yi for x ∈ X i,ε is obviously a measurable 3ε-isometry from X to Y . ¤ We also need to recall the definition of the Prohorov metric between two probability measures ν1 and ν2 on a common metric space (Z, rZ ) defined by n o ¢ (Z,r ) ¡ (1.49) dPr Z ν1 , ν2 := inf ε > 0 : ν1 (F ) ≤ ν2 (F ε ) + ε, ∀ F closed with (1.50)

© ª F ε := x ∈ Z : inf r(x, y) < ε . y∈F

The Prohorov distance is a metric on the collection of probability measures on X (see, for example, [EK86]). The following result shows that if we push measures forward with a map having a small distortion, then Prohorov distances can’t increase too much. Lemma 1.11.3. Suppose that (X, rX ) and (Y, rY ) are two metric spaces, f : X → Y is a measurable map with dis(f ) ≤ ε, and µ and ν are two probability measures on X. Then (1.51)

dPr (f∗ µ, f∗ ν) ≤ dPr (µ, ν) + ε.

Proof. Suppose that dPr (µ, ν) < δ. By definition, µ(F ) ≤ ν(F δ ) + δ, for all closed sets F ⊆ A. If D is a closed subset of Y , then f∗ µ(D) = µ(f −1 (D)) (1.52)

≤ µ(f −1 (D)) δ

≤ ν(f −1 (D) ) + δ = ν(f −1 (D)δ ) + δ.

Now x0 ∈ f −1 (D)δ means there is x00 ∈ X such that rX (x0 , x00 ) < δ and ∈ D. By the assumption that dis(f ) ≤ ε, we have rY (f (x0 ), f (x00 )) < δ + ε, and hence f (x0 ) ∈ Dδ+ε . Thus

f (x00 )

(1.53)

f −1 (D)δ ⊆ f −1 (Dδ+ε )

and we have (1.54)

f∗ µ(D) ≤ ν(f −1 (Dδ+ε )) + δ = f∗ ν(Dδ+ε ) + δ,

1.11. WEIGHTED R-TREES

43

so that dPr (f∗ µ, f∗ ν) ≤ δ + ε, as required.

¤

We are now in a position to define the weighted Gromov-Hausdorff distance between the two compact, weighted R-trees (X, rX , νX ) and (Y, rY , νY ). For ε > 0, set © ª ε (1.55) FX,Y := measurable ε-isometries from X to Y . Put (1.56)

∆GHwt (X, Y ) ( := inf

ε ,g ∈ Fε exist f ∈ FX,Y Y,X such that ε>0: dPr (f∗ νX , νY ) ≤ ε, dPr (νX , g∗ νY ) ≤ ε

) .

Note that the set on the right hand side is non-empty because X and Y are compact, and hence bounded. It will turn out that ∆GHwt satisfies all the properties of a metric except the triangle inequality. To rectify this, let (n−1 ) X 1 (1.57) dGHwt (X, Y ) := inf ∆GHwt (Zi , Zi+1 ) 4 , i=1

where the infimum is taken over all finite sequences of compact, weighted R-trees Z1 , . . . Zn with Z1 = X and Zn = Y . Lemma 1.11.4. The map dGHwt : Twt × Twt → R+ is a metric on Twt . Moreover, 1 1 1 ∆GHwt (X, Y ) 4 ≤ dGHwt (X, Y ) ≤ ∆GHwt (X, Y ) 4 2 for all X, Y ∈ Twt . Proof. It is immediate from (1.56) that the map ∆GHwt is symmetric. We next claim that ¡ ¢ (1.58) ∆GHwt (X, rX , νX ), (Y, rY , νY ) = 0, if and only if (X, rX , νX ) and (Y, rY , νY ) are weight-preserving isometric. The “if” direction is immediate. Note first for the converse that (1.58) implies that for all ε > 0 there exists an ε-isometry from X ¡ ¢ to Y , and therefore, by Lemma 7.3.28 in [BBI01], d (X, r ), (Y, r ) < 2ε. Thus GH X Y ¡ ¢ dGH (X, rX ), (Y, rY ) = 0, and it follows from Theorem 7.3.30 of [BBI01] that (X, rX ) and (Y, rY ) are isometric. Checking the proof of that result, we see that we can construct an isometry f : X → Y by taking any dense countable set S ⊂ X, any sequence of functions (fn ) such that fn is an εn -isometry with εn → 0 as n → ∞, and letting f be limk fnk along any subsequence such that the limit exists for all x ∈ S (such a subsequence exists by the compactness of Y ). Therefore, fix some dense subset S ⊂ X and suppose without loss of generality that we have an isometry f : X → Y given εn by f (x) = limn→∞ fn (x), x ∈ S, where fn ∈ FX,Y , dPr (fn∗ νX , νY ) ≤ εn , and

44

1. STATE SPACES I: R-TREES

limn→∞ εn = 0. We will be done if we can show that f∗ νX = νY . If µX is a discrete measure with atoms belonging to S, then h dPr (f∗ νX , νY ) ≤ lim sup dPr (fn∗ νX , νY ) + dPr (fn∗ µX , fn∗ νX ) n i (1.59) + dPr (f∗ µX , fn∗ µX ) + dPr (f∗ νX , f∗ µX ) ≤ 2dPr (µX , νX ), where we have used Lemma 1.11.3 and the fact that dPr (f∗ µX , fn∗ µX ) → 0, as n → ∞, because of the pointwise convergence of fn to f on S. Because we can choose µX so that dPr (µX , νX ) is arbitrarily small, we see that f∗ νX = νY , as required. Now consider three spaces (X, rX , νX ), (Y, ¡ rY , νY ), and (Z, rZ , ν¢Z ) in wt Twt , and constants ε, δ > 0, such that ∆ (X, rX , νX ), (Y, rY , νY ) < ε GH ¡ ¢ ε and ∆GHwt (Y, rY , νY ), (Z, rZ , νZ ) < δ. Then there exist f ∈ FX,Y and δ g ∈ FY,Z such that dPr (f∗ νX , νY ) < ε and dPr (g∗ νY , νZ ) < δ. Note that ε+δ g ◦ f ∈ FX,Z . Moreover, by Lemma 1.11.3, ¡ ¢ ¡ ¢ ¡ ¢ dPr (g ◦ f )∗ νX , νZ ≤ dPr g∗ νY , νZ + dPr g∗ f∗ νX , g∗ νY (1.60) < δ + ε + δ. This, and a similar argument with the roles of X and Z interchanged, shows that £ ¤ (1.61) ∆GHwt (X, Z) ≤ 2 ∆GHwt (X, Y ) + ∆GHwt (Y, Z) . The second inequality in the statement of the lemma is clear. In order to see the first inequality, it suffices to show that for any Z1 , . . . Zn we have 1

(1.62)

∆GHwt (Z1 , Zn ) 4 ≤ 2

n−1 X

1

∆GHwt (Zi , Zi+1 ) 4 .

i=1

We will establish (1.62) by induction. The inequality certainly holds when n = 2. Suppose it holds for 2, . . . , n − 1. Write S for the value of the sum on the right hand side of (1.62). Put (1.63)

m−1 X © ª 1 k := max 1 ≤ m ≤ n − 1 : ∆GHwt (Zi , Zi+1 ) 4 ≤ S/2 . i=1

By the inductive hypothesis and the definition of k, 1

(1.64)

∆GHwt (Z1 , Zk ) 4 ≤ 2

k−1 X

1

∆GHwt (Zi , Zi+1 ) 4

i=1

≤ 2(S/2) = S. Of course, (1.65)

1

∆GHwt (Zk , Zk+1 ) 4 ≤ S

1.11. WEIGHTED R-TREES

45

By definition of k, (1.66)

k X

1

∆GHwt (Zi , Zi+1 ) 4 > S/2,

i=1

so that once more by the inductive hypothesis, n−1 X

1

∆GHwt (Zk+1 , Zn ) 4 ≤ 2

1

∆GHwt (Zi , Zi+1 ) 4

i=k+1

(1.67) = 2S − 2

k X

1

∆GHwt (Zi , Zi+1 ) 4

i=1

≤ S. From (1.64), (1.65), (1.67) and two applications of (1.61) we have 1

(1.68)

∆GHwt (Z1 , Zn ) 4 n £ ¤o 41 ≤ 4 ∆GHwt (Z1 , Zk ) + ∆GHwt (Zk , Zk+1 ) + ∆GHwt (Zk+1 , Zn ) ¡ ¢1 ≤ 4 × 3 × S4 4 ≤ 2S,

as required. It is obvious by construction that dGHwt satisfies the triangle inequality. The other properties of a metric follow from the corresponding properties we have already established for ∆GHwt and the bounds in the statement of the lemma which we have already established. ¤ The procedure we used to construct the weighted Gromov-Hausdorff metric dGHwt from the semi-metric ∆GHwt was adapted from a proof in [Kel75] of the celebrated result of Alexandroff and Urysohn on the metrizability of uniform spaces. That proof was, in turn, adapted from earlier work of Frink and Bourbaki. The choice of the power 14 is not particularly special, any sufficiently small power would have worked. Theorem 1.11.7 below says that the metric space (Twt , dGHwt ) is complete and separable and hence is a reasonable space on which to do probability theory. In order to prove this result, we need a compactness criterion that will be useful in its own right. Proposition 1.11.5 (A characterization of pre-compactness in Twt ). A subset Γwt of (Twt , dGHwt ) is relatively compact if and only if the subset Γ := {(T, r) : (T, r, ν) ∈ Γwt } in (T, dGH ) is relatively compact. Together with Proposition 1.10.1 we immediately obtain the following.

46

1. STATE SPACES I: R-TREES

Corollary 1.11.6. A subset Γwt of (Twt , dGHwt ) is relatively compact if and only if ¡ ¢ (1.69) sup µT Rε (T ) < ∞, (T,r,µ)∈Γ

for all ε > 0. Proof. The “only if” direction is clear. Assume for the converse that Γ is relatively compact. Suppose that ((Tn , rTn , νTn ))n∈N is a sequence in Γwt . By assumption, ((Tn , rTn ))n∈N has a subsequence converging to some point (T, rT ) of (T, dGH ). For ease of notation, we will renumber and also denote this subsequence by ((Tn , rTn ))n∈N . For brevity, we will also omit specific mention of the metric on a real tree when it is clear from the context. By Proposition 7.4.12 in [BBI01], for each ε > 0 there is a finite ε-net T ε ε,#Tnε in T and for each n ∈ N a finite ε-net Tnε := {xε,1 } in Tn such that n , ..., xn ε ε ε dGH (Tn , T ) → 0 as n → ∞. Moreover, we take #Tn = #T ε = N ε , say, for n sufficiently large, and so, by passing to a further subsequence if necessary, we may assume that #Tnε = #T ε = N ε for all n ∈ N. We may then assume ε,j that Tnε and T ε have been indexed so that that limn→∞ rTn (xε,i n , xn ) = rT (xε,i , xε,j ) for 1 ≤ i, j ≤ N ε . We may begin with the balls of radius ε around each point of Tnε and decompose Tn into N ε possibly empty, disjoint, measurable sets ε {Tnε,1 , ..., Tnε,N } of radius no greater than ε. Define a measurable map ε,i ε fnε : Tn → Tnε by fnε (x) = xε,i n if x ∈ Tn and let gn be the inclusion map from ε ε Tnε to Tn .¡ By construction, ¢ fn and gn are measurable ¡ ε-isometries. ¢Moreε ε over, dPr (gn )∗ (fn )∗ νn , νn < ε and, of course, dPr (fnε )∗ νn , (fnε )∗ νn = 0. Thus, ¡ ¢ (1.70) ∆GHwt (Tnε , (fnε )∗ νn ), (Tn , νn ) ≤ ε. ε,i By similar reasoning, if we define hεn : Tnε → T ε by xε,i n 7→ x , then ¡ ε ε ¢ (1.71) lim ∆GHwt (Tn , (fn )∗ νn ), (T ε , (hεn )∗ νn ) = 0. n→∞

Since T ε is finite, by passing to a subsequence (and relabeling as before) we have ¡ ¢ (1.72) lim dPr (hεn )∗ νn , ν ε = 0 n→∞

for some probability measure ν ε on T ε , and hence ¡ ¢ (1.73) lim ∆GHwt (T ε , (hεn )∗ νn ), (T ε , ν ε ) = 0. n→∞

Therefore, by Lemma 1.11.4, ¡ ¢ 1 (1.74) lim sup dGHwt (Tn , νn ), (T ε , (hεn )∗ νn ) ≤ ε 4 . n→∞

Now, since (T, rT ) is compact, the family of measures {ν ε : ε > 0} is relatively compact, and so there is a probability measure ν on T such that ν ε converges to ν in the Prohorov distance along a subsequence ε ↓ 0

1.12. DISTRIBUTIONS OF RANDOM (WEIGHTED) REAL TREES

47

and hence, by arguments similar to the above, along the same subsequence ∆GHwt ((T ε , ν ε ), (T, ν)) converges to 0. Again applying Lemma 1.11.4, we have that dGHwt ((T ε , ν ε ), (T, ν)) converges to 0 along this subsequence. Combining the foregoing, we see that by passing to a suitable subsequence and relabeling, dGHwt ((Tn , νn ), (T, ν)) converges to 0, as required. ¤ Theorem 1.11.7. The metric space (Twt , dGHwt ) is complete and separable. Proof. Separability follows readily from separability of (T, dGH ) (see Theorem 1.5.9), and the separability with respect to the Prohorov distance of the probability measures on a fixed complete, separable metric space (see, for example, [EK86]), and Lemma 1.11.4. It remains to establish completeness. By a standard argument, it suffices to show that any Cauchy sequence in Twt has a convergent subsequence. Let (Tn , rTn , νn )n∈N be a Cauchy sequence in Twt . Then (Tn , rTn )n∈N is a Cauchy sequence in T by Lemma 1.11.4. By Theorem 1.5.9 there is a T ∈ T such that dGH (Tn , T ) → 0, as n → ∞. In particular, the sequence (Tn , rTn )n∈N is relatively compact in T, and therefore, by Proposition 1.11.5, (Tn , rTn , νn )n∈N is relatively compact in Twt . Thus (Tn , rTn , νn )n∈N has a convergent subsequence, as required. ¤ 1.12. Distributions of random (weighted) real trees In section we consider characterizations of tightness which are particularly useful in approximating random trees. The following simple result is based on the characterization of precompactness in (T, dGH ) as formulated in Proposition 1.10.1. Proposition 1.12.1 (A characterization of tightness in M1 (T)). Fix an index set I 6= ∅. A family A := {Tα ; α ∈ I} of T-valued random variables is relatively compact with respect to the Gromov-strong topology if and only if the family Bη := {µTα (Rη (Tα )); α ∈ I} of R+ -valued random variables is relatively compact, for all η > 0. Proof. For the “only if” direction assume that A is tight and fix ε, η > 0. By definition, we find a compact set Γε in (T, dGH ) such that inf α∈I P{Tα ∈ Γε } > 1 − ε. Since Γε is compact there exists, by Proposition 1.10.1, an N = N (ε, η) > 0 and such that µT (Rη (T )) ≤ N , for all T ∈ Γε . Hence, for all α ∈ I, © ª ¡ ¢ ¡ ¢ (1.75) P µTα (Rη (Tα )) > N = P {µTα (Rη (Tα )) > N } \ Γε ≤ P {Γε ≤ ε. Therefore, since ε, η > 0 were chosen arbitrarely, Bη is a tight family of R+ -valued random variables, for all η > 0.

48

1. STATE SPACES I: R-TREES

For the “if” direction assume that Bη is relatively compact and fix ε > 0. Then for all n ∈ N there exists Nn = Nn (ε) such that ε (1.76) inf P{µTα (R2−n (Tα )) > Nn } > 1 − n . α∈I 2 Let Γε,n := {Tα ; µTα (R2−n (Tα )) ≤ Nn }. By Lemma 1.9.2(ii) Γε,n is closed, for all n ∈ N. Put \ Γε,2−n . (1.77) Γε := n∈N

Then also Γε is closed and hence, by Proposition 1.10.1, compact. Since ε > 0 was chosen arbitrarily, it follows therefore together with (1.76) that the family A is relatively compact. ¤ From Proposition 1.12.1 together with Proposition 1.11.5 we immediately obtain the following. Corollary 1.12.2 (A characterization of tightness in M1 (Twt )). Fix an index set I 6= ∅. A family Awt := {(Tα , να ); α ∈ I} of Twt -valued random variables is relatively compact with respect to the weighted Gromov-strong topology if and only if the family Bη := {µTα (Rη (Tα )); α ∈ I} of R+ -valued random variables is relatively compact, for all η > 0.

CHAPTER 2

State spaces II: The space of metric measure trees In this chapter we study convergence of metric measure spaces with the main emphasis to exploit the second aspect of Aldous’s philosophy of convergence without using Aldous’s particular embedding. That is, we consider the space of separable and complete real trees which are equipped with a probability measure and give a notion of convergence such that a sequence of trees (equipped with a probability measure) converges to a limit tree (equipped with a probability measure) if and only if all randomly sampled finite subtrees converge to the corresponding limit subtrees. Since the construction of the topology works not only for tree-like metric spaces, but also for the space (of measure preserving isometry classes) of metric measure spaces we formulate everything within this framework. The resulting topology is referred to as the Gromov-weak topology. This chapter is organized as follows. In Section 2.2 we introduce the Gromov-Prohorov metric as a candidate for a complete metric which generates the Gromov-weak topology and show that the generated topology is separable. As a technical preparation we collect results on the modulus of mass distribution and the distance distribution (see Definition 2.3.1) in Section 2.3. In Sections 2.4 we give characterizations on pre-compactness for the topology generated by the Gromov-Prohorov metric. In Section 2.5 we prove that the topology generated by the Gromov-Prohorov metric coincides with the Gromov-weak topology. In Section 2.6 we discuss the closed sub-space of ultra-metric measure spaces whose elements are (often phylogenetic analysis) referred to as ultra-metric trees. In Subsection 2.7 we give a pre-compactness criterion for the sub-space of compact metric measure spaces. Section 2.8 discusses convergence in distribution and characterizes tightness. Finally, in Section 2.9 we provide several other metrics that generate the Gromov-weak topology. Remark 2.0.3 (Gromov’s Chapter 3 12 ). Even so the material presented in this chapter was developed independently of Gromov’s work, the most important ideas formulated in Sections 2.1 through 2.5 are already contained in Chapter 3 12 in [Gro99]. A more detailed look shows that the GromovProhorov metric coincides with Gromov’s ¤1 -metric, and hence parts of the pre-compactness characterizations given in Section 2.4 are - maybe not in an 49

50

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

obvious way - implicitly stated in Proposition 3 21 .D in [Gro99]. However, the characterization of tightness and the equivalent metrics are novel. ¤ 2.1. The Gromov-weak topology As before, given a topological space (X, O), we denote by M1 (X) be space of all probability measures on X equipped with the Borel-σ-algebra B(X). Recall from (1.48) the push forward of a measure under a map and that the support of µ, supp(µ), is the smallest closed set X0 ⊆ X such that µ(X \ X0 ) = 0. In the following we focus on complete and separable metric spaces. Definition 2.1.1 (Metric measure space). A metric measure space is a complete and separable metric space (X, r) which is equipped with a probability measure µ ∈ M1 (X). We write M for the space of measure-preserving isometry classes of complete and separable metric measure spaces, where we say that (X, r, µ) and (X 0 , r0 , µ0 ) are measure-preserving isometric if there exists an isometry ϕ between the supports of µ on (X, r) and of µ0 on (X 0 , r0 ) such that µ0 = ϕ∗ µ. It is clear that the property of being measure-preserving isometric is an equivalence relation. We abbreviate (X, r, µ) for a whole isometry class of metric spaces whenever no confusion seems to be possible. Remark 2.1.2. (i) Metric measure spaces, or short mm-spaces, are discussed in [Gro99] in detail. Therefore they are sometimes also referred to as Gromov metric triples (see, for example, [Ver98]). (ii) Recall from Remark 1.1.1 that we have to be careful to deal with sets in the sense of the Zermelo-Fraenkel axioms since M will turns out to be Polish by Theorem 2.1.10. Hence if P ∈ M1 (M) then the measure preserving isometry class represented by M equipped with P yields an element in M. The way out is once more to define M as the space of measure preserving isometry classes of those metric spaces equipped with a probability measure whose elements are not themselves metric spaces. ¤ We are typically only interested in functions of metric measure trees that do not describe artifacts of the chosen representation, i.e., which are invariant under measure-preserving isometries. These are of a special form which we introduce next. For a metric space (X, r) we define by ( (N2) N X (X,r) ¡ ¢ →R ¡+ , ¢ (2.1) R : (xi )i≥1 7→ r(xi , xj ) 1≤i 0. Then there exists N = Nε such that ¡ ¢ (2.33) dGPr Xn , X < ε, for all n ≥ N . By definition, for all n ≥ N , there are a metric space (Yn , rYn ) and isometric embeddings ϕ˜X and ϕ˜Xn from supp(µX ) and supp(µXn ), respectively, to (Yn , rYn ) with ¢ (Y ,r ) ¡ (2.34) dPrn Yn (ϕ˜X )∗ µX , (ϕ˜X )∗ µXn < ε. Put (2.35)

© ª Rn := (x, x0 ) ∈ X × Xn : rYn (ϕ˜X (x), ϕ˜Xn (x0 )) < ε .

Once more, since (2.34) implies the existence of a coupling µ ˜n of (ϕ˜X )∗ µX and (ϕ˜Xn )∗ µXn such that © ª (2.36) µ ˜n (x, x0 ) : rYn (y, y 0 ) < ε > 1 − ε, Rn is non-empty.

2.3. DISTANCE DISTRIBUTION AND MODULUS OF MASS DISTRIBUTION

59

Recall from Lemma 1.3.2, for two metric spaces (X1 , rX1 ) and (X2 , rX2 ) R and a relation R ∈ RX1 ,X2 , the metric rX on X1 t X2 which extends the 1 ,X2 R metrics rX1 and rX2 such that rX1 ,X2 (x1 , x2 ) = 12 dis(R), for all (x1 , x2 ) ∈ R. Then by (2.35) together with (2.36), Rn ¢ (XtXn ,rXtX )¡ n (2.37) dPr (ϕX )∗ µX , (ϕXn )∗ µXn ≤ ε where ϕX and ϕXn are the canonical isometric embeddings from X and Xn to X t Xn , respectively. Rn Using the metric spaces (X t Xn , rXtX ) we define recursively metrics n Fn rZn on Zn : X t k=1 Xn . Starting with n = 0, we set (Z0 , rZ0 ) := (X, r). Next, assume we are given a metric rZn on Zn . Consider the isometric embeddings ψkn from Xk to Zn , for k = 0, ..., n (with X0 := X) which arise from the canonical embedding of Xk in Zn . Define for all n ∈ N, © ª ˜ n := (z, x) ∈ Zn × Xn+1 : ((ψ n )−1 (z), x) ∈ Rn (2.38) R n

˜n rZRn+1

which defines metrics on Zn+1 via (1.7). By this F procedure we obtain in the limit a separable metric space (Z 0 := X t ∞ n=1 Xn , rZ 0 ). Denote its completion by (Z, rZ ) and isometric embeddings from Xn to Z which arise by the canonical embedding by ψn , n ∈ N. Observe that the restriction of rZ to X t Xn is isometric to Rn (X t Xn , rXtX ) and thus n ¢ (Z,r ) ¡ (2.39) dPr Z (ψ0 )∗ µX , (ψn )∗ µXn ≤ ε by (2.37). So the claim follows.

¤

2.3. Distance distribution and modulus of mass distribution In order to obtain later in criteria for tightness of a family of laws of random elements in M we need a characterization of the compact sets of (M, OM ). Informally, a subset of M will turn out to be pre-compact iff the corresponding sequence of probability measures put most of their mass on subspaces of a uniformly bounded diameter, and if the contribution of points which do not carry much mass in their vicinity is small. These two criteria lead to the following definitions. Definition 2.3.1 (Distance and modulus of mass distribution). Let X = (X, r, µ) ∈ M. (i) The distance distribution, which is an element in M1 ([0, ∞)), is given by wX := r∗ µ⊗2 , i.e., © ª wX (·) := µ⊗2 (x, x0 ) : r(x, x0 ) ∈ · . (2.40) (ii) For δ > 0, define the modulus of mass distribution as n o © ª (2.41) vδ (X ) := inf ε > 0 : µ x ∈ X : µ(Bε (x)) ≤ δ ≤ ε .

60

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

Remark 2.3.2. Observe that wX and vδ are well-defined because they are constant on isometry classes of a given metric measure space. ¤ In this section we provide results on the distance distribution and on the modulus of mass distribution. These will be heavily used in the following sections, where we present metrics which are equivalent to the GromovProhorov metric and which are very helpful in proving the characterizations of compactness and tightness in the Gromov-Prohorov topology. We start by introducing the random distance distribution of a given metric measure space. Definition 2.3.3 (Random distance distribution). Let X = (X, r, µ) ∈ M. For each x ∈ X, define the map rx : X → [0, ∞) by rx (x0 ) := r(x, x0 ), and put µx := (rx )∗ µ ∈ M1 ([0, ∞)), i.e., µx defines the distribution of distances to the point x ∈ X. Moreover, define the map rˆ : X → M1 ([0, ∞)) by rˆ(x) := µx , and let (2.42)

µ ˆX := rˆ∗ µ ∈ M1 (M1 ([0, ∞))

be the random distance distribution of X . Notice first that the random distance distribution does not characterizes the metric measure space uniquely. We will illustrate this with an example. Example 2.3.4. Consider the following two metric measure spaces: 1 20

•........



1 20

1 20

•........



2 20

2 20





2 20

1 20





2 20

3 20





3 20

4 20





3 20

4 20





4 20

4 20





3 20

... ... ... ... ... ... . ... . .. ... ... ... ... ... ... ... ... ....... . . .... . . . ....... .. ....... ....... ..... ... ......... ....... .. ... ........ .......... ........................................................ ... ............. ......... ... ........ ....... ... ... ........ ....... ..... ....... ... ....... ..... .. ... ....... . . ... .. . . ... .. . . . ... .. . . . ... ... ... ... ... ... ... ... ... ... .

X

... ... ... ... ... ... . ... . .. ... ... ... ... ... ... ... ... ....... . . .... . . . ....... .. ....... ....... ..... ... ......... ....... .. ... ........ .......... ........................................................ ... ............. ......... ... ........ ....... ... ... ........ ....... ..... ....... ... ....... ..... .. ... ....... . . ... .. . . ... .. . . . ... .. . . . ... ... ... ... ... ... ... ... ... ... .

Y

That is, both spaces consist of 8 points. The distance between two points equals the minimal number of edges one has to cross to come from one point to the other. The measures µX and µY are given by numbers in the figure. We find that 1 µ ˆX = µ ˆY = 10 δ1 + 51 δ 1 9 1 2 1 20 δ0 + 20 δ1 + 2 δ2 10 δ0 + 5 δ1 + 2 δ2 (2.43) 3 δ3 + 10 + 25 δ 1 7 1 3 1 . 20 δ0 + 20 δ1 + 2 δ2

5 δ0 + 10 δ1 + 2 δ2

Hence, the random distance distributions agree. But obviously, X and Y are not measure preserving isometric. ¤ Recall the distance distribution w· and the modulus of mass distribution vδ (·) from Definition 2.3.1. Both can be expressed through the random

2.3. DISTANCE DISTRIBUTION AND MODULUS OF MASS DISTRIBUTION

61

distance distribution µ ˆ(·). These facts follow directly from the definitions, so we omit the proof. Lemma 2.3.5 (Reformulation of w· and vδ (·) in terms of µ ˆ(·)). Let X ∈ M. (i) The distance distribution wX satisfies Z (2.44) wX = µ ˆX (dν) ν. M1 ([0,∞))

(ii) For all δ > 0, the modulus of mass distribution vδ (X ) satisfies © ª (2.45) vδ (X ) = inf ε > 0 : µ ˆX {ν ∈ M1 ([0, ∞)) : ν([0, ε]) ≤ δ} ≤ ε . The next result will be used frequently. Lemma 2.3.6. Let X = (X, r, µ) ∈ M and δ > 0. If vδ (X ) < ε, for some ε > 0, then © ª (2.46) µ x ∈ X : µ(Bε (x)) ≤ δ < ε. © Proof. Byª definition of vδ (·), there exists ε0 < ε for which µ x ∈ X : µ(Bε0 (x)) ≤ δ ≤ ε0 . Consequently, since {x : µ(Bε (x)) ≤ δ} ⊆ {x : µ(Bε0 (x)) ≤ δ}, (2.47)

µ{x : µ(Bε (x)) ≤ δ} ≤ µ{x : µ(Bε0 (x)) ≤ δ} ≤ ε0 < ε,

and we are done.

¤

The next result states basic properties of the map δ 7→ vδ . Lemma 2.3.7 (Properties of vδ (·)). Fix X ∈ M. The map which sends δ ≥ 0 to vδ (X ) is non-decreasing, right-continuous and bounded by 1. Moreδ→0

over, vδ (X ) −→ 0. Proof. The first three properties are trivial. For the forth, fix ε > 0, and let X = (X, r, µ) ∈ M. Since X is complete and separable there exists a compact set Kε ⊆ X with µ(Kε ) > 1 − ε (see [EK86], Lemma 3.2.1). In particular, Kε can be covered by finitely many balls A1 , ..., ANε of radius ε/2 and positive µ-mass. Choose δ such that © ª (2.48) 0 < δ < min µ(Ai ) : 1 ≤ i ≤ Nε . Then (2.49)

¢ © ª ¡ [Nε Ai µ x ∈ X : µ(Bε (x)) > δ ≥ µ i=1

≥ µ(Kε ) > 1 − ε.

Therefore, by definition and Lemma 2.3.6, vδ (X ) ≤ ε, and since ε was chosen arbitrary, the assertion follows. ¤

62

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

The following proposition states continuity properties of µ ˆ(·), w· and vδ (·). The reader should have in mind that we finally prove with Theorem 2.5.1 in Section 2.5 that the Gromov-weak and the Gromov-Prohorov topology are the same. Proposition 2.3.8 (Continuity properties of µ ˆ(·), w· and vδ (·)). (i) The map X 7→ µ ˆX is continuous with respect to the Gromov-weak topology on M and the weak topology on M1 (M1 ([0, ∞))). (ii) The map X 7→ µ ˆX is continuous with respect to the GromovProhorov topology on M and the weak topology on M1 (M1 ([0, ∞))). (iii) The map X 7→ wX is continuous with respect to both the Gromovweak and the Gromov-Prohorov topology on M and the weak topology on M1 ([0, ∞)). n→∞ (iv) Let X , X1 , X2 , ... in M such that µ ˆXn =⇒ µ ˆX and δ > 0, where here =⇒ means weak convergence on M1 (M1 ([0, ∞))). Then (2.50)

lim sup vδ (Xn ) ≤ vδ (X ). n→∞

The proof of Parts (i) and (ii) of Proposition 2.3.8 are based on the notion of moment measures. Definition 2.3.9 (Moment measures of µ ˆX ). For X = (X, r, µ) ∈ M and k ∈ N, define the k th moment measure µ ˆkX ∈ M1 ([0, ∞)k ) of µ ˆX by Z (2.51) µ ˆkX (d(r1 , ..., rk )) := µ ˆX (dν) ν ⊗k (d(r1 , ..., rk )). Remark 2.3.10 (Moment measures determine µ ˆX ). Observe that for all k ∈ N, (2.52)

µ ˆkX (A1 × ... × Ak ) © ª = µ⊗k+1 (u0 , u1 , ..., uk ) : r(u0 , u1 ) ∈ A1 , ..., r(u0 , uk ) ∈ Ak .

By Theorem 16.16 of [Kal02], the moment measures µ ˆkX , k = 1, 2, ... determine µ ˆX uniquely. Moreover, weak convergence of random measures is equivalent to convergence of all moment measures. ¤ Proof of Proposition 2.3.8. (i) Take X , X1 , X2 , ... in M such that (2.53)

n→∞

Φ(Xn ) −−−→ Φ(X ),

k+1 for all Φ ∈ Π. For k ∈ N, consider all φ ∈ Cb ([0, ∞)( 2 ) ) which depend on (rij )0≤i 0 is such that ε > vδ (X ) and © ª (2.58) µ ˆX ν ∈ M1 ([0, ∞)) : ν([0, ε]) = δ = 0. Then by Lemmata 2.3.5(ii) and 2.3.6, © ª (2.59) µ ˆX ν ∈ M1 ([0, ∞)) : ν([0, ε]) ≤ δ < ε,

64

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

and the set {ν ∈ M1 ([0, ∞)) : ν([0, ε]) ≤ δ} is a µ ˆX -zero set in M1 ([0, ∞)). Hence by the Portmanteau Theorem (see, for example, Theorem 3.3.1 in [EK86]), © ª lim µ ˆXn ν ∈ M1 ([0, ∞)) : ν([0, ε]) ≤ δ n→∞ (2.60) © ª =µ ˆX ν ∈ M1 ([0, ∞)) : ν([0, ε]) ≤ δ < ε. That is, we have vδ (Xn ) < ε, for all but finitely many n, by (2.50). Therefore we find that lim supn→∞ vδ (Xn ) < ε. This holds for every ε > vδ (X ) satisfying (2.59), and and hence for a set of ε > vδ (X ) for which vδ (X ) is a limit point, so we are done. ¤ The following estimate will be used in the proofs of the pre-compactness characterization given in Theorem 2.4.1 and of Part (i) of Lemma 2.9.3. Lemma 2.3.11. Let δ > 0, ε ≥ 0, and X = (X, r, µ) ∈ M. If vδ (X ) < ε, then there exists N < b 1δ c and points x1 , ..., xN ∈ X such that the following hold. N ¡ ¢ ¡S ¢ • For i = 1, ..., N , µ Bε (xi ) > δ, and µ B2ε (xi ) > 1 − ε. i=1 ¢ ¡ • For all i, j = 1, ..., N with i 6= j, r xi , xj > ε. Proof. Consider the set D := {x ∈ X : µ(Bε (x)) > δ}. Since vδ (X ) < ε, Lemma 2.3.6 implies that µ(D) > 1 − ε. Take a maximal 2ε separated net {x1 , ..., xN } ⊆ D, i.e., (2.61)

D⊆

N [

B2ε (xi ),

i=1

and for all i 6= j, (2.62)

r(xi , xj ) > 2ε,

while adding a further point to D would destroy (2.62). Such a net exists in every metric space (see, for example, in [BBI01], p. 278). Since N N ³[ ´ X ¡ ¢ (2.63) 1≥µ Bε (xi ) = µ Bε (xi ) > N δ, i=1

N
0 there exists Nε ∈ N such that for all X = (X, r, µ) ∈ Γ there is Xε,X ⊆ X with ¡ a subset ¢ – µ Xε,X ≥ 1 − ε, – Xε,X can be covered by at most Nε balls of radius ε, and – Xε,X has diameter at most Nε . (d) For all ε > 0 and X := (X, rX , µX ) ∈ Γ there exists a compact subset ¡Kε,X ¢⊆ X with – µ Kε,X ≥ 1 − ε, and – the family Kε := {Kε,X ; X ∈ Γ} is pre-compact in (Xc , dGH ). Remark 2.4.2. If Γ = {X1 , X2 , ...} then we can replace sup by lim sup in (2.64). ¤ Remark 2.4.3. (i) In the space of compact metric spaces equipped with a probability measure with full support, Proposition 1.11.5 states that Condition (d) is sufficient for pre-compactness. (ii) Starting with Theorem 2.4.1, (b) characterizes tightness for the stronger topology given in [Stu06] based on certain L2 -Wasserstein metrics if one requires in addition uniform integrability of sampled mutual distance. Similarly, (b) characterizes tightness in the space of measure preserving isometry classes of metric spaces equipped with a finite measure (rather than a probability measure) if one requires in addition tightness of the family of total masses. ¤ Example 2.4.4. In the following we illustrate the two requirements for a family in M to be pre-compact which are given in Theorem 2.4.1 by two counter-examples. (i) Consider the isometry classes of the metric measure spaces Xn := ({1, 2}, rn (1, 2) = n, µn {1} = µn {2} = 12 ). A potential limit object would be a metric space with masses 12 within distance infinity. This clearly does not exist.

66

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

Indeed, the family {wXn = 12 δ0 + 12 δn ; n ∈ N} is not tight, and hence {Xn ; n ∈ N} is not pre-compact in M by Condition (ii) of Theorem 2.4.1. (ii) Consider the isometry classes of the metric measure spaces Xn = (Xn , rn , µn ) given for n ∈ N by n

n

(2.65)

Xn := {1, ..., 2 },

−n

rn (x, y) := 1{x 6= y},

µn := 2

2 X

δi ,

i=1

i.e., Xn consists of 2n points of mutual distance 1 and is equipped with a uniform measure on all points. •...... 1 .... ... .. ... ... ... ... ... ... ... ... ... ... ... ...

2

4

.... ... .. ... ... .. .......................................................................................... .. ... ... ... ... ... ... ...

1• 4 1

•2 X1

(2.66)

•...... 1

• 14

1

•4 X2

1 •........ . 8 ..............

•1

•1

.. .... ..... ... ..... ..... ... ..... ..... . . . . . . ..... .. ..... .... ..... ..... . ..... ..... .... ........ ..... .. ..... ...... ..... ................................................................................................... ... ..... .. ..... ..... .. ..... ..... ... ......... ..... ..... . . . .... . ..... .. ..... ... ..... ..... ..... ... ..... ..... . . . . ..... . . . .... ..... . . . . . .. . .

8

1• 8 1 8•

8

•1

1

•8 X3

8

···

1

•8

A potential limit object would consist of infinitely many points of mutual distance 1 with a uniform measure. That means we would need the uniform distribution on a non-compact set which does not exist. Indeed, notice that for δ > 0, ( 0, δ < 2−n , vδ (Xn ) = 1, δ ≥ 2−n , so supn∈N vδ (Xn ) = 1, for all δ > 0. Hence {Xn ; n ∈ N} does not fulfil Condition (ii) of Theorem 2.4.1, and is therefore not precompact. ¤

Proof of Theorem 2.4.1. As before, we abbreviate X = (X, rX , µX ). We prove four implications giving the statement. (a) ⇒ (b). Assume that Γ ∈ M is pre-compact in the Gromov-Prohorov topology. © ª To show that w(X ); X ∈ Γ is tight, consider a sequence X1 , X2 , ... in Γ. Since Γ is relatively compact by assumption, there is a converging k→∞ subsequence, i.e., we find X ∈ M such that dGPr (Xnk , X ) −→ 0 along a k→∞

suitable subsequence (nk )k∈N . By Part (iii) of Proposition 2.3.8, wXnk =⇒ © ª wX . As the sequence was chosen arbitrary it follows that w(X ); X ∈ Γ is tight. The second part of the assertion in (b) is by contradiction. Assume that vδ (X ) does not converge to 0 uniformly in X ∈ Γ, as δ → 0. Then we find

2.4. COMPACT SETS IN M

67

an ε > 0 such that for all n ∈ N there exists sequences (δn )n∈N converging to 0 and Xn ∈ Γ with (2.67)

vδn (Xn ) ≥ ε.

By assumption, there is a subsequence {Xnk ; k ∈ N}, and a metric mea¡ k→∞ sure space X ∈ Γ such that dGPr Xnk , X ) −→ 0. By Parts (ii) and (iv) of Proposition 2.3.8, we find that lim supk→∞ vδnk (Xnk ) = 0 which contradicts (2.67). (b) ⇒ (c). By assumption, for all ε > 0 there are C(ε) with ¡ ¢ (2.68) sup wX [C(ε), ∞) < ε, X ∈Γ

and δ(ε) such that (2.69)

sup vδ(ε) (X ) < ε.

X ∈Γ

Set

© ¡ 0 Xε,X := x ∈ X : µX B

(2.70)

¢

ε2 (x) C( 4 )

ª > 1 − ε/2 .

0 ) > 1 − ε/2. If this were not the case, there We claim that µX (Xε,X would be X ∈ Γ with ¢ © ª ¡ 0 0 1 2 wX [C( 14 ε2 ); ∞) = µ⊗2 X (x, x ) ∈ X × X : rX (x, x ) ≥ C( 4 ε ) ª © 0 0 ≥ µ⊗2 / Xε,X , x0 ∈ / B ε2 (x) X (x, x ) : x ∈ C( 4 )

ε 0 ) ≥ µX ({Xε,X 2 ε2 ≥ , 4 0 which contradicts (2.68). Furthermore, the diameter of Xε,X is bounded

(2.71)

2

0 by 3C( ε4 ). Indeed, otherwise we would find points x, x0 ∈ Xε,X with 0 B ε2 (x) ∩ B ε2 (x ) = ∅, which contradicts that C( 4 )

(2.72)¡ µX B

C( 4 )

ε2 (x) C( 4 )

¢

0 ε2 (x ) C( 4 )

∩B

¡ ≥ 1 − µX {B

¢

ε2 (x) C( 4 )

¡ − µX {B

¢

0 ε2 (x ) C( 4 )

> 1 − ε. By Lemma 2.3.11, for all X = (X, rX , µX ) ∈ Γ, we can choose points 1 x1 , ..., xNεX ∈ X with NεX ≤ N (ε) := b δ(ε/2) c, rX (xi , xj ) > ε/2, 1 ≤ i < j ≤ ¢ ¡ X S N ε NεX , and with µX i=1 Bε (xi ) > 1 − ε/2. Set NεX

(2.73)

Xε,X :=

0 Xε,X



[

i=1

Bε (xi ).

68

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

Then µX (Xε,X ) > 1 − ε. In addition, Xε,X can be covered by at most 2 0 N (ε) balls of radius ε and Xε,X has diameter at most 3C( ε4 ), so the same is true for Xε,X . (c) ⇒ (d). Fix ε > 0, and set εn := ε2−(n+1) , for all n ∈ N. By assumption we may choose for each n ∈ N, Nεn ∈ N such that¡ for all¢ X ∈ Γ there is a subset Xεn ,X ⊆ X of diameter at most Nεn with µ Xεn ,X ≥ 1 − εn , and such that Xεn ,X can be covered by at most Nεn balls of radius εn . Without loss of generality we may assume that all {Xεn ,X ; n ∈ N, X ∈ Γ} are closed. Otherwise we just take their closure. For every X ∈ Γ take compact sets Kεn ,X ⊆ X with µX (Kεn ,X ) > 1 − εn . Then the set (2.74)

Kε,X :=

∞ \ ¡ ¢ Xεn ,X ∩ Kεn ,X n=1

is compact since it is the intersection of a compact set with closed sets, and (2.75)

∞ X ¡ ¢ µX (Kε,X ) ≥ 1 − µX ({Xεn ,X ) + µX ({Kεn ,X ) > 1 − ε. n=1

Consider (2.76)

© ª Kε := Kε,X ; X ∈ Γ .

To show that Kε is pre-compact we use the pre-compactness criterion given in Proposition 1.4.1, i.e., we have to show that Kε is uniformly totally bounded. This means that the elements of Kε have bounded diameter and for all ε0 > 0 there is a number Nε0 such that all elements of Kε can be covered by Nε0 balls of radius ε0 . By definition, Kε,X has diameter at most Nε1 . So, take ε0 < ε and n large enough for εn < ε0 . Then Xεn ,X as well as Kε,X can be covered by Nεn balls of radius ε0 . So Kε is pre-compact in (Xc , dGH ). (d) ⇒ (a). The proof is in two steps. Assume first that all metric spaces (X, rX ) such that there is µX ∈ M1 (X) with (X, rX , µX ) ∈ Γ are compact, and that the family {(X, rX ) : (X, rX , µX ) ∈ Γ} is pre-compact in the Gromov-Hausdorff topology. Under these assumptions we can choose for every sequence in Γ a subsequence (Xm )m∈N , Xm = (Xm , rXm , µXm ), and a metric space (X, rX ), such that (2.77)

m→∞

dGH (X, Xm ) −→ 0.

By Theorem 1.3.1, there are a compact metric space (Z, rZ ) and isometric embeddings ϕX , ϕX1 , ϕX2 , ... from supp(µX ), supp(µXm ), supp(µX2 ), ..., ¡ ¢ m→∞ respectively, to Z, such that dH ϕX (supp(µX )), ϕXm (supp(µXm )) −→ 0. Hence the set (ϕXm )∗ µXm is pre-compact in M1 (Z) equipped with the weak topology. Therefore (ϕXm )∗ µXm has a converging subsequence, and (a) follows in this case.

2.5. GROMOV-PROHOROV AND GROMOV-WEAK TOPOLOGY COINCIDE

69

In the second step we consider the general case. Let εn := 2−n , fix for every X ∈ Γ and every n ∈ N, x ∈ Kεn ,X . Put (2.78) and let

µX,n (·) := µX (· ∩ Kεn ,X ) + (1 − µX (Kεn ,X ))δx (·) Xn

(2.79)

:= (X, rX , µX ,n ). By construction, for all X ∈ Γ, ¡ ¢ dGPr X n , X ≤ εn ,

and µX,n is supported by Kεn ,X . Hence by the first step, Γn := {X n ; X ∈ Γ} is pre-compact in Xc equipped with the Gromov-Hausdorff topology, for all n ∈ N. We can therefore find a converging subsequence in Γn , for all n. By a diagonal argument we find a subsequence (Xm )m∈N with Xm = n) (Xm , rXm , µXm ) such that (Xm m∈N converges for every n ∈ N to some metric measure space Zn . Pick a subsequence such that for all n ∈ N and m ≥ n, ¡ n ¢ (2.80) dGPr Xm , Zn ≤ εm . Then (2.81)

¡ n n¢ dGPr Xm , Xm0 ≤ 2εn ,

for all m, m0 ≥Pn. We conclude that (Xn )n∈N is a Cauchy sequence in (M, dGPr ) since n≥1 εn < ∞. Indeed, ¢ ¡ dGPr Xn , Xn+1 ¢ ¡ ¢ ¡ ¢ ¡ n n (2.82) ≤ dGPr Xn , Xnn + dGPr Xnn , Xn+1 + dGPr Xn+1 , Xn+1 ≤ 4εn . Since (M, dGPr ) is complete, this sequence converges and we are done. ¤ 2.5. Gromov-Prohorov and Gromov-weak topology coincide In this section we show that the topologies induced by convergence of polynomials and convergence in the Gromov-Prohorov metric coincide. This implies that the characterizations of compact subsets of M in the Gromovweak topology are covered by the corresponding characterizations with respect to the Gromov-Prohorov topology given in Theorem 2.4.1 and Proposition 2.8.5, respectively. Recall from Definition 2.1.3 the distance matrix distribution ν X of X ∈ M. Theorem 2.5.1. Let X , X1 , X2 , ... ∈ M. The following are equivalent: (a) All polynomials Φ ∈ Π converge, i.e., (2.83)

n→∞

Φ(Xn ) −−−→ Φ(X ).

(b) The Gromov-Prohorov metric converges, i.e., ¡ ¢ n→∞ (2.84) dGPr Xn , X −−−→ 0.

70

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES N

(c) The distance matrix distributions converge weakly in M1 (R(2 ) ), i.e., (2.85)

ν Xn =⇒ ν X . n→∞

n→∞

Proof. (a) ⇒ (b). Assume that for all Φ ∈ Π, Φ(Xn ) −−−→ Φ(X ). It is enough to show that the sequence (Xn )n∈N is pre-compact with respect to the Gromov-Prohorov topology, since by Proposition 2.1.8, this would imply that all limit points coincide and equal X . We need to check the two conditions guaranteeing pre-compactness given by Part (b) of Theorem 2.4.1. By Parts (iii) of Proposition 2.3.8, the map X 7→ wX is continuous with respect to the Gromov-weak topology. Hence, the family {wXn ; n ∈ N} is tight. δ→0 In addition, lim supn→∞ vδ (Xn ) ≤ vδ (X ) −−−→ 0, by Parts (i) and (iv) of Proposition 2.3.8. By Remark 2.4.2, the latter implies (2.64), and we are done. (b) ⇒ (a). Let X = (X, rX , µX ), X1 = (X1 , r1 , µ1 ), X2 = (X2 , r2 , µ2 ), .... be in M. By Corollary 2.2.6 there are a complete and separable metric space (Z, rZ ) and isometric embeddings ϕ, ϕ1 , ϕ2 ,... from (X, rX ), (X1 , r1 ), (X2 , r2 ), ..., respectively, to (Z, rZ ) such that (ϕn )∗ µn converges weakly to ϕ∗ µX on (Z, rZ ). Assume that Φn,φ ∈ Π for some n ∈ N and n n φ ∈ Cb ([0, ∞)( 2 ) ). Define the continuous map rˆ : Z n → [0, ∞)( 2 ) by rˆ(z1 , ..., zn ) := (rZ (zi , zj ))1≤i r12 ∨ r13 , r13 > r12 ∨ r23 . By the Portmanteau Theorem (compare, for example, Theorem 3.3.1 in [EK86]), ν X (A) = limn→∞ ν U n (A) = 0. Thus, (1.14) holds for µ⊗3 -all triples (u, v, w) ∈ X 3 . In other words, X is ultra-metric. ¤

2.7. Compact metric measure spaces A metric measure space (X, r, µ) is called compact if and only if the metric space (supp(µ), r|supp(µ) ) is compact. Define © ª (2.89) Mc := X ∈ M : X is a compact measure space . Remark 2.7.1 (Mc is not closed). (i) If X = (X, r, µ) is a finite metric measure space, i.e, #supp(µ) < ∞, then X ∈ Mc . (ii) Since M is separable, every element X ∈ M can be approximated by a sequence of finite metric measure spaces. Hence the sub-space Mc is not closed. ¤ To be in a position to characterize pre-compactness in Mc , recall for X = (X, r, µ) ∈ M the distance distribution wX := r∗ µ⊗2 from Definition 2.3.1(i). Proposition 2.7.2 (Pre-compactness in Mc ). A set Γ ⊆ Mc is precompact in the Gromov-weak topology on Mc if the following two conditions are satisfied. (i) {wX : X ∈ Γ} is tight in M1 (R+ ). (ii) For all ε > 0 there exists a Nε ∈ N such that for all (X, r, µ) ∈ Γ, the metric space (supp(µ), r) can be covered by Nε balls of radius ε.

72

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

The proof is based on two Lemmata. Recall that for a metric space (X, r) an ε-separated set is a subset X 0 ⊆ X such that r(x0 , y 0 ) ≥ ε, for all x0 , y 0 ∈ X 0 with x0 6= y 0 . Of course, each non-empty metric space has at least one ε-separated set. Lemma 2.7.3 (Relation between ε-balls and ε-separated nets). Fix N ∈ N, a metric space (X, r) with #X ≥ N + 1 and ε > 0. The following hold. (a) If (X, r) can be covered by N balls of radius ε, then (X, r) has no 2ε-separated set of cardinality at least N + 1. (b) If (X, r) has no ε0 -separated set of cardinality at least N + 1, then (X, r) can be covered by N balls of radius ε0 , for all ε0 > 0. S Proof. (a) Assume that x1 , ..., xN ∈ X are such that X = N i=1 Bε (xi ). Choose (N +1) distinct points y1 , ..., yN +1 ∈ X. By the pigeonhole principle, two of the points must fall into the same ball Bε (xi ), for some i = 1, ..., N , and are therefore in distance smaller than ε. Hence {y1 , ..., yN +1 } is not 2εseparated. Since y1 , ..., yN +1 ∈ X were chosen arbitrarily, the claim follows. (b) We proceed by contradiction. Let K be the maximal possible cardinality of an ε-separated set in (X, r). By assumption, K ≤ N . Assume that SεK := {x1 , ..., xK } is an ε-separated set in (X, r). We claim that S 0 0 X= K i=1 Bε0 (xi ), for all ε > ε. Indeed, let ε > ε, and assume, to the con0 trary, that y ∈ X is such that r(y, xi ) ≥ ε , for all i = 1, ..., K, then SεK ∪{y} is an ε-set of cardinality K + 1, which clearly gives the contradiction. ¤ Lemma 2.7.4. Fix ε > 0 and N ∈ N. Let X = (X, r, µ), X 1 = (X1 , r1 , µ1 ), X 2 = (X2 , r2 , µ2 ), ... be elements of M such that X n → X in the Gromov-weak topology, as n → ∞. If (supp(µ1 ), r1 ), (supp(µ2 ), r2 ), ... can be covered by N balls of radius ε then (supp(µ), r) can be covered by N balls of radius 2ε0 , for all ε0 > ε. Proof. Fix ε > 0 and N ∈ N. Since X = (X, r, µ), X 1 = (X1 , r1 , µ1 ), = (X2 , r2 , µ2 ), ... are elements of M such that X n → X in the Gromovweak topology, as n → ∞, Assume that for each n ∈ N, the metric space (supp(µn ), rn ) can be covered by N balls of radius ε. Then by Lemma 2.7.3(a), there is no n ∈ N for which (supp(µn ), rn ) has a 2ε-separated set of cardinality N + 1. Set N +1 B := (2ε, ∞)( 2 ) . Notice that for Y = (Y, rY , µY ) ∈ M, (supp(µY ), rY ) has a 2ε-separated set of cardinality N + 1 if and only if ΦN +1,1B (Y ) 6= 0. Indeed, X2

(2.90)

0 ≤ ΦN +1,1B (X ) = lim inf ΦN +1,1B (X n ) = 0, n→∞

ΦN +1,1B (X )

by Theorem 4.1.3(d), i.e., = 0. By Lemma 2.7.3, (supp(µ), r) can be therefore covered by N balls of radius 2ε0 , for all ε0 > ε. ¤

2.8. DISTRIBUTIONS OF RANDOM METRIC MEASURE SPACES

73

Proof of Proposition 2.7.2. Let a set Γ ⊆ Mc be such that the family {wX : X ∈ Γ} is tight in M1 (R+ ), and for all ε > 0 there is an Nε ∈ N X such that for all X = (X, r, µ) ∈ Γ, there exists SεX := {xX 1 , ..., xNε } ⊆ X SNε such that supp(µ) ⊆ i=1 Bε (xi ). For ε > 0 and X = (X, r, µ) ∈ Γ, put [ (2.91) XX ,ε := Bε (xX i ). x∈SεX ; µ(Bε (x))> Nε

ε

Then µ(XX ,ε ) ≥ 1 − ε, and XX ,ε can also be covered by Nε balls of radius ε. We claim that the set {diam(XX ,ε ); X ∈ Γ} is bounded. To see this, assume to the contrary, that we find a sequence (X n )n∈N in Γ with diam(XX n ,ε ) ≥ n. Then (2.92)

sup w([n − 2ε, ∞)) ≥ X ∈Γ

ε2 > 0, Nε2

for all n ∈ N, which contradicts the tightness assumption of the distance distribution. We can therefore apply Theorem 2.4.1(c) to conclude that Γ is pre-compact in the Gromov-weak topology on M. It is therefore enough to show that all limit points are compact. To see this take X ∈ M and X 1 , X 2 , . . . ∈ Γ such that X n → X in the Gromovweak topology, as n → ∞, and let ε > 0. By Assumption(ii) together with Lemma 2.7.4, (supp(µ), r) can be covered by Nε/3 balls of radius ε. Therefore, X is totally bounded which implies X ∈ Mc , and we are done. ¤ 2.8. Distributions of random metric measure spaces From Theorem 2.1.10 and Definition 2.1.9 we immediately conclude the characterization of weak convergence for a sequence of probability measures on M. Corollary 2.8.1 (Characterization of weak convergence). A sequence (Pn )n∈N in M1 (M) converges weakly if and only if (i) the family {Pn ; n ∈ N} is relatively £ ¤ compact in M1 (M), and (ii) for all polynomials Φ ∈ Π, (Pn Φ )n∈N converges in R. Proof. The “only if” direction is clear, as polynomials are bounded and continuous functions by definition. To see the converse, recall from Lemma 3.4.3 in [EK86] that given a relative compact sequence of probability measures, each separating family of bounded continuous functions is convergence determining. ¤ While Condition (ii) of the characterization of convergence given in Corollary 2.8.1 can be checked in particular examples, we still need a manageable characterization of tightness on M1 (M) which we can conclude from Theorem 2.4.1 together with Theorem 2.5.1. It will be given in terms of the distance distribution and the modulus of mass distribution.

74

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

Theorem 2.8.2 (Characterization of tightness). A set A ⊆ M1 (M) is tight if and only if the following holds: (i) The family {P[wX ] : P ∈ A} is tight in M1 (R). (ii) For all ε > 0 there exist a δ = δ(ε) > 0 such that £ ¤ (2.93) sup P vδ (X ) < ε. P∈A

Remark 2.8.3. If A = {P1 , P2 , ...} then we can replace sup by lim sup in (2.93). ¤ We will illustrate Theorem 2.8.2 with the example of Λ-coalescent measure trees in Chapter 4.1, with examples of trees corresponding to spatially structured coalescents in Section 4.2, their scaling limit in Sections 4.3 and 4.4 and of evolving coalescents in Chapter 4. Remark 2.8.4. Starting with Theorem 2.8.2 one characterizes easily tightness for the stronger topology given in [Stu06] based on certain L2 Wasserstein metrics if one requires in addition to (i) and (ii) uniform integrability of sampled mutual distance. Similarly, with Theorem 2.8.2 one characterizes tightness in the space of measure preserving isometry classes of metric spaces equipped with a finite measure (rather than a probability measure) if one requires in addition tightness of the family of total masses (compare, also with Remark 2.4.3). ¤ In Theorem 2.4.1 we have given a characterization for relative compactness in M with respect to the Gromov-Prohorov topology. This characterization extends to the following tightness characterization in M1 (M) which is equivalent to Theorem 2.8.2, once we have shown the equivalence of the Gromov-Prohorov and the Gromov-weak topology in Theorem 2.5.1 in Section 2.9. Proposition 2.8.5 (Tightness). A set A ⊆ M1 (M) is tight with respect to the Gromov-weak topology on M if and only if for all ε > 0 there exist δ > 0 and C > 0 such that £ ¤ (2.94) sup P vδ (X ) + wX ([C; ∞)) < ε. P∈A

Proof. For the “only if” direction assume that A is tight and fix ε > 0. By definition, we find a compact set Γε in (M, dGPr ) such that inf P∈A P(Γε ) > 1 − ε/4. Since Γε is compact there are, by part (b) of Theorem 2.4.1, δ = δ(ε) > 0 and C = C(ε) > 0 such that vδ (X ) < ε/4 and wX ([C, ∞)) < ε/4, for all X ∈ Γε . Furthermore both vδ (·) and w· ([C, ∞)) are bounded above

2.9. EQUIVALENT METRICS

75

by 1. Hence for all P ∈ A, £ ¤ P vδ (X ) + wX ([C, ∞)) £ ¤ £ ¤ = P vδ (X ) + wX ([C, ∞)); Γε + P vδ (X ) + wX ([C, ∞)); {Γε (2.95) ε ε < + 2 2 = ε. Therefore (2.94) holds. For the “if” direction assume (2.94) is true and fix ε > 0. For all n ∈ N, there are δn > 0 and Cn > 0 such that £ ¤ sup P vδn (X ) + wX ([Cn , ∞)) < 2−2n ε2 . (2.96) P∈A

By Tschebychev’s inequality, we conclude that for all n ∈ N, © ª (2.97) sup P X : vδn (X ) + wX ([Cn , ∞)) > 2−n ε < 2−n ε. P∈A

By the equivalence of (a) and (b) in Theorem 2.4.1 the closure of (2.98)

∞ \ © ª Γε := X : vδn (X ) + wX ([Cn , ∞)) ≤ 2−n ε n=1

is compact. We conclude ¡ ¡ P Γε ) ≥ P Γε ) (2.99)

∞ X © ≥1− P X : vδn (X ) + wX ([Cn , ∞)) >

ε 2n

ª

n=1

> 1 − ε. Since ε was arbitrary, A is tight.

¤

2.9. Equivalent metrics In Section 2.2 we have seen that M equipped with the Gromov-Prohorov metric is separable and complete. In this section we conclude the paper by presenting further metrics (not necessarily complete) which are all equivalent to the Gromov-Prohorov metric and which may be in some situations easier to work with. The Eurandom metric.1 Recall from Definition 2.1.4 the algebra of polynomials, i.e., functions which evaluate distances of finitely many points sampled from a metric measure space. By Proposition 2.1.8, polynomials separate points in M. Consequently, two metric measure spaces are different if and only if the distributions of sampled finite subspaces are different. 1When the authors of [GPW06a] first discussed how to metrize the Gromov-weak topology the Eurandom metric came up. Since the discussion took place during a meeting at Eurandom, we decided to name the metric accordingly.

76

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

We therefore define (2.100) ¡ ¢ © dEur X , Y := inf inf ε > 0 : µ ˜⊗2 {(x, y), (x0 , y 0 ) ∈ (X × Y )2 : µ ˜

ª |rX (x, x0 ) − rY (y, y 0 )| ≥ ε} < ε ,

where the infimum is over all couplings µ ˜ of µX and µY . We will refer to dEur as the Eurandom metric. Not only is dEur a metric on M, it also generates the Gromov-Prohorov topology. Proposition 2.9.1 (Equivalent metrics). The distance dEur is a metric on M. It is equivalent to dGPr , i.e., the generated topology is the Gromovweak topology. Before we prove the proposition we give an example to show that the Eurandom metric is not complete. Example 2.9.2 (Eurandom metric is not complete). Let for all n ∈ N, Xn := (Xn , rn , µn ) as in Example 2.4.4(ii). For all n ∈ N, (2.101) ¡ ¢ dEur Xn , Xn+1 © ª ≤ inf ε > 0 : (µn ⊗ µn+1 )⊗2 {|1{x = x0 } − 1{y = y 0 }| ≥ ε} ≤ ε = 2−(n−1) , i.e., (Xn )n∈N is a Cauchy sequence for dEur which does not converge. Hence (M, dEur ) is not complete. The Gromov-Prohorov metric was shown to be complete, and hence the above sequence is not Cauchy in this metric. Indeed, n→∞ ¢ ¡ (2.102) dGPr Xn , Xn+1 = 2−1 6−→ 0. ¤ To prepare the proof of Proposition 2.9.1, we provide bounds on the introduced “distances”. Lemma 2.9.3 (Equivalence). Let X , Y ∈ M, and δ ∈ (0, 12 ). ¡ ¢ ¡ ¢ (i) If dEur X , Y < δ 4 then dGPr X , Y < 12(2vδ (X ) + δ). (ii) ¡ ¢ ¡ ¢ (2.103) dEur X , Y ≤ 2dGPr X , Y . Proof. (i) The Gromov-Prohorov metric relies on the Prohorov metric of embeddings of µX and µY in M1 (Z) in a metric space (Z, rZ ). This is in contrast to the Eurandom metric which is based on an optimal coupling of the two measures µX and µY without referring to a space of measures over a third metric space. Since we want to bound the Gromov-Prohorov metric in terms of the Eurandom metric the main goal of the proof is to construct a suitable metric space (Z, rZ ).

2.9. EQUIVALENT METRICS

77

The construction proceeds in three steps. We start in Step 1 with finding a suitable ε-net {x1 , ..., xN } in (X, rX ), and show that this net has a suitable corresponding net {y1 , ..., yN } in (Y, rY ). In Step 2 we then verify that these nets have the property that rX (xi , xj ) ≈ rY (yi , yj ) (where the ’≈’ is made precise below) and δ-balls around these nets carry almost all µX - and µY mass. Finally, in Step 3 we will use these nets to define a metric space (Z, rZ ) containing both (X, rX ) and (Y, rY ), and bound the Prohorov metric of the images of µX and µY . Step 1 (Construction of suitable ε-nets¡ in X¢ and Y ). Fix δ ∈ (0, 12 ). Assume that X , Y ∈ M are such that dEur X , Y < δ 4 . By definition, we find a coupling µ ˜ of µX and µY such that © ª ⊗2 µ ˜ (x1 , y1 ), (x2 , y2 ) : |rX (x1 , x2 ) − rY (y1 , y2 )| > 2δ < δ 4 . (2.104) Set ε := 4vδ (X ) ≥ 0. By Lemma 2.3.11, there are N ≤ b 1δ c points x1 , ..., xN ∈ X with pairwise distances at least ε, ¡ ¢ (2.105) µ Bε (xi ) > δ, for all i = 1, ..., N , and ¡ [N ¢ µ Bε (xi ) ≥ 1 − ε.

(2.106) Put D := yi ∈ Y with (2.107)

SN

i=1 Bε (xi ).

i=1

We claim that for every i = 1, ..., N there is

¡ ¢ ¡ ¢ µ ˜ Bε (xi ) × B2(ε+δ) (yi ) ≥ (1 − δ 2 )µX Bε (xi ) .

Indeed, assume the assertion is not true for some 1 ≤ i ≤ N . Then, for all y ∈ Y , ¡ ¢ ¡ ¢ (2.108) µ ˜ Bε (xi ) × {B2(ε+δ) (y) ≥ δ 2 µX Bε (xi ) . which implies that µ ˜⊗2 {(x0 , y 0 ), (x00 , y 00 ) : |rX (x0 , x00 ) − rY (y 0 , y 00 )| > 2δ} (2.109)

≥µ ˜⊗2 {(x0 , y 0 ), (x00 , y 00 ) : x0 , x00 ∈ Bε (xi ), y 00 ∈ / B2(ε+δ) (y 0 )} ≥ µX (Bε (xi ))2 δ 2 > δ4,

by (2.61) and (2.108) which contradicts (2.104). Step 2 (Distortion of {x1 , ..., xN } and {y1 , ..., yn }). Assume that {x1 , ..., xN } and {y1 , ..., yn } are such that (2.105) through (2.107) hold. We claim that then ¯ ¯ ¯rX (xi , xj ) − rY (yi , yj )¯ ≤ 6(ε + δ), (2.110) for all i, j = 1, ..., N . Assume that (2.110) is not true for some pair (i, j). Then for all x0 ∈ Bε (xi ), x00 ∈ Bε (xj ), y 0 ∈ B2(ε+δ) (yi ), and y 00 ∈ B2(ε+δ) (yj ), ¯ ¯ ¯rX (x0 , x00 ) − rY (y 0 , y 00 )¯ > 6(ε + δ) − 2ε − 4(ε + δ) = 2δ. (2.111)

78

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

Then (2.112)© ª µ ˜⊗2 (x0 , y 0 ), (x00 , y 00 ) : |rX (x0 , x00 ) − rY (y 0 , y 00 )| > 2δ © ≥µ ˜⊗2 (x0 , y 0 ), (x00 , y 00 ) : x0 ∈ Bε (xi ), x00 ∈ Bε (xj ), y 0 ∈ B2(ε+δ) (yi ), y 00 ∈ B2(ε+δ) (yj )} ¡ ¢ ¡ ¢ =µ ˜ Bε (xi ) × B2(ε+δ) (yi ) µ ˜ Bε (xj ) × B2(ε+δ) (yj ) > δ 2 (1 − δ)2 > δ4, where we used (2.107), (2.61) and δ < 21 . Since (2.112) contradicts (2.104), we are done. Step 3 (Definition of a suitable metric space (Z, rZ )). Define the relation R := {(xi , yi ) : i = 1, ..., N } between X and Y and consider the metric R space (Z, rZ ) defined by Z := X t Y and rZ := rXtY , given in Lemma 1.3.2. Choose isometric embeddings ϕX and ϕY from (X, rX ) and (Y, rY ), respectively, into (Z, rZ ). As dis(R) ≤ 6(ε + δ) (see (1.3) for definition), by Lemma 1.3.2, rZ (ϕX (xi ), ϕY (yi )) ≤ 3(ε + δ), for all i = 1, ..., N . If x ∈ X and y ∈ Y are such that rZ (ϕX (x), ϕY (y)) ≥ 6(ε + δ) and rX (x, xi ) < ε then rY (y, yi ) ≥ rZ (ϕX (x), ϕY (y)) − rX (x, xi ) − rZ (ϕX (xi ), ϕY (yi )) ≥ 6(ε + δ) − ε − 3(ε + δ)

(2.113)

≥ 2(ε + δ) and so for all x ∈ Bε (xi ), © ª (2.114) y ∈ Y : rZ (ϕX (x), ϕY (y)) ≥ 6(ε + δ) ⊆ {B2(ε+δ) (yi ). Let µ ˜ be the probability measure on Z × Z defined by µ ˆ(A × B) := −1 −1 µ ˜(ϕX (A) × ϕY (B)), for all A, B ∈ B(Z). Therefore, by (2.110), (2.114), (2.107) and as N ≤ b1/δc, (2.115) µ ˆ{(z, z 0 ) : rZ (z, z 0 ) ≥ 6(ε + δ)} N ³[ ´ ¡ ≤µ ˆ ϕX ({D) × ϕY (Y )) + µ ˆ Bε (ϕX (xi )) × {B2(ε+δ) (ϕY (yi )) i=1

≤ε+

N X

µX (Bε (xi ))δ 2

i=1

≤ ε + δ. Hence, using (2.11) and ε = 4vδ (X ), ¢ ¡ ¢ (Z,r ) ¡ (2.116) dPr Z (ϕX )∗ µX , (ϕY )∗ µY ≤ 6 4vδ (X ) + 2δ , ¡ ¢ ¡ ¢ and so dGPr X , Y ≤ 12 2vδ (X ) + δ , as claimed.

2.9. EQUIVALENT METRICS

79

¡ ¢ (ii) Assume that dGPr X , Y < δ. Then, by definition, there exists a metric space (Z, rZ ), isometric embeddings ϕX and ϕY between supp(µX ) and supp(µY ) and Z, respectively, and a coupling µ ˆ of (ϕX )∗ µX and (ϕY )∗ µY such that (2.117)

© ª µ ˆ (z, z 0 ) : rZ (z, z 0 ) ≥ δ < δ.

Hence with¡ the special choice ˜ of µX and µY defined by ¢ of a coupling µ µ ˜(A × B) = µ ˆ ϕX (A) × ϕY (B) , for all A ∈ B(X) and B ∈ B(Y ),

(2.118)

© µ ˜⊗2 (x, y), (x0 , y 0 ) ∈ X × Y : |rX (x, x0 ) − rY (y, y 0 )| ≥ 2δ} © ≤µ ˜⊗2 (x, y), (x0 , y 0 ) ∈ X × Y : ª rZ (ϕX (x), ϕY (y)) ≥ δ or rZ (ϕX (x0 ), ϕY (y 0 )) ≥ δ < 2δ.

¡ ¢ This implies that dEur X , Y < 2δ.

¤

δ→0

Proof of Proposition 2.9.1. Observe that by Lemma 2.3.7, vδ (X ) −→ 0. So Lemma 2.9.3 implies the equivalence of dGPr and dEur once we have shown that dEur is indeed a metric. The symmetry is clear. If X , Y ∈ M are such that dEur (X , Y) = 0, by equivalence, dGPr (X , Y) = 0 and hence X = Y. For the triangle inequality, let Xi = (Xi , ri , µi ) ∈ M, i = 1, 2, 3, be such that dEur (X1 , X2 ) < ε and dEur (X2 , X3 ) < δ for some ε, δ > 0. Then there exist couplings µ ˜1,2 of µ1 and µ2 and µ ˜2,3 of µ2 and µ3 with (2.119)

© ª 0 0 0 0 µ ˜⊗2 1,2 (x1 , x2 ), (x1 , x2 ) : |r1 (x1 , x1 ) − r2 (x2 , x2 )| ≥ ε < ε

and (2.120)

© ª 0 0 0 0 µ ˜⊗2 2,3 (x2 , x3 ), (x2 , x3 ) : |r2 (x2 , x2 ) − r3 (x3 , x3 )| ≥ δ < δ.

Introduce the transition kernel K2,3 from X2 to X3 defined by (2.121)

µ ˜2,3 (d(x2 , x3 )) = µ2 (dx2 )K2,3 (x2 , dx3 ).

which exists since X2 and X3 are Polish. Using this kernel, define a coupling µ ˜1,3 of µ1 and µ3 by Z (2.122)

µ ˜1,3 (d(x1 , x3 )) :=

X2

µ ˜1,2 (d(x1 , x2 ))K2,3 (x2 , dx3 ).

80

2. STATE SPACES II: THE SPACE OF METRIC MEASURE TREES

Then (2.123) © ª ⊗2 µ ˜1,3 (x1 , x3 ), (x01 , x03 ) : |r1 (x1 , x01 ) − r3 (x3 , x03 )| ≥ ε + δ Z = µ ˜1,2 (d(x1 , x2 ))˜ µ1,2 (d(x01 , x02 ))K2,3 (x2 , dx3 )K2,3 (x02 , dx03 ) X12 ×X22 ×X32

© ª 1 |r1 (x1 , x01 ) − r3 (x3 , x03 )| ≥ ε + δ

Z ≤

X12 ×X22 ×X32

µ ˜1,2 (d(x1 , x2 ))˜ µ1,2 (d(x01 , x02 ))K2,3 (x2 , dx3 )K2,3 (x02 , dx03 )

¡ ¢ 1{|r1 (x1 , x01 ) − r2 (x2 , x02 )| ≥ ε} + 1{|r2 (x2 , x02 ) − r3 (x3 , x03 )| ≥ δ} © 0 0 0 =µ ˜⊗2 1,2 (x1 , x2 ), (x1 , x2 ) : |r1 (x1 , x1 ) − r2 (x2 , x2 )| ≥ ε} ª 0 0 0 +µ ˜⊗2 2,3 {(x2 , x3 ), (x2 , x3 ) : |r2 (x2 , x2 ) − r3 (x3 , x3 )| ≥ δ 0 for 0 < t < ζ(e), .   and e(t) = 0 for t ≥ ζ(e) For ` > 0, let U ` := {e ∈ U : ζ(e) = `}. We associate each e ∈ U 1 with a metric tree as follows. Define an equivalence relation ∼e on [0, 1] by letting (3.3)

u1 ∼e u2 ,

iff

e(u1 ) =

inf

u∈[u1 ∧u2 ,u1 ∨u2 ]

e(u) = e(u2 ).

Consider the pseudo-metric on [0, 1] defined by (3.4)

rTe (u1 , u2 ) := e(u1 ) − 2

inf

u∈[u1 ∧u2 ,u1 ∨u2 ]

e(u) + e(u2 ),

3.1. EXCURSIONS

85

¯ ¯ which becomes a true metric on the quotient space Te := R+ ¯∼e = [0, 1]¯∼e . Lemma 3.1.3. For each e ∈ U 1 the metric space (Te , rTe ) is a compact R-tree.

Proof. It is straightforward to check that the quotient map from [0, 1] onto Te is continuous with respect to rTe . Thus (Te , rTe ) is path-connected and compact as the continuous image of a metric space with these properties. In particular, (Te , rTe ) is complete. To complete the proof, it therefore suffices to verify the four point condition (1.13). However, for u1 , u2 , u3 , u4 ∈ Te we have (3.5)

max{rTe (u1 , u3 ) + rTe (u2 , u4 ), rTe (u1 , u4 ) + rTe (u2 , u3 )} ≥ rTe (u1 , u2 ) + rTe (u3 , u4 ),

where strict inequality holds if and only if min

inf

i6=j u∈[ui ∧uj ,ui ∨uj ]

e(u)

½

(3.6) 6∈

inf

u∈[u1 ∧u2 ,u1 ∨u2 ]

e(u),

inf

u∈[u3 ∧u4 ,u3 ∨u4 ]

¾ e(u) . ¤

Conversely, (at least finite) rooted planar trees can be associated with root,lin an excursion. Denote by Tfin the subspace in Troot,lin of finite trees, and fix a speed σ > 0. There is a map ¡ ¢ (3.7) C ·; σ : Troot,lin → CR+ [0, ∞), fin which associates to T ∈ Troot,lin an excursion by “walking around the tree” fin (respecting the order) at speed σ and recording the height profile. (A formal description of the contour process can be found in Section 6.1 in [Pit02].) Typically, one sets the speed σ = 1. However as soon as we later want to take limits of rooted planar trees we will need to increase the speed of the traversal in order to obtain a reasonable compactly supported limit contour path. Remark 3.1.4. If (T, r, ρ, ≤lin ) is such that µT (T ) = ∞ then, of course, we can not walk “around the tree” in finite time with constant speed. However, any compact R-tree T is still isometric to Te for some e ∈ U 1 . To see this, fix a root ρ ∈ T . Recall Rε (T, ρ), the ε-trimming of T with respect to ρ defined in (1.32). Let µ ¯ be a probability measure on T that is equivalent to the length measure µT . Because µT is σ-finite, such a probability measure always exists, but one can construct µ ¯ explicitly as follows: set

86

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

(112)

(112)

¢A (111) ¢ A ¢ A ¢¢AA(11)

(122)

(111) (11)

(121)

(22)

¢

(21) (12)

¢ ¢

(2)

(1)

ρ

(122) (121)

¢ ¢A ¢ A ¢ A¢ ¢

A

A

AA¢

¢

(12)

(1)

¢

(22)

¢¢A

¢A ¢ A

A

A

A

(21)

¢

¢A ¢ A¢

A

ρA

A

A

(2)

¢

A

AA



Figure 3.2. illustrates the mapping of a finite linearly ordered tree to an excursion. H := maxu∈T r(ρ, u), and put µT (R2−1 H (T, ρ) ∩ ·) µT (R2−1 H (T, ρ)) X µT (R −i (T, ρ) \ R2−i+1 H (T, ρ) ∩ ·) + 2−i T 2 H . µ (R2−i H (T, ρ) \ R2−i+1 H (T, ρ))

µ ¯ := 2−1 (3.8)

i≥2

For all 0 < ε < H there is a continuous path fε : [0, 2µT (Rε (T, ρ))] → Rε (T, ρ) T

such that hε defined by hε (t) := r(ρ, fε (t)) belongs to U 2µ (Rε (T,ρ)) (in particular, fε (0) = fε (2µT (Rε (T, ρ))) = ρ), hε is piecewise linear with slopes ±1, and Thε is isometric to Rε (T, ρ). Moreover, these paths may be chosen consistently so that if ε0 ≤ ε00 , then ¡ ¢ (3.9) fε00 (t) = fε0 inf{s > 0 : |{0 ≤ r ≤ s : fε0 (r) ∈ Rε00 (T, ρ)}| > t} , where | · | denotes Lebesgue measure. Now define eε ∈ U µ¯(Rε (T,ρ)) to be the absolutely continuous path satisfying (3.10)

deε (t) dµT dhε (t) =2 (fε (t)) . dt d¯ µ dt

It can be shown that eε converges uniformly to some e ∈ U 1 as ε ↓ 0 and that Te is isometric to T . ¤ Each tree coming from a path in U 1 has a natural weight on it: for e ∈ U 1 , we equip (Te , rTe ) with the weight (3.11)

νTe := q∗ λ

given by the push-forward of Lebesgue measure λ on [0, 1] by the quotion map q which sends an element u ∈ [0, 1] to its equivalence class [u] ∈ Te .

3.2. THE BROWNIAN CONTINUUM RANDOM TREE

87

We finish this section with a remark about the natural length measure on a tree coming from a path. Given e ∈ U 1 and a ≥ 0, let   e(t) = a and, for some ε > 0,   (3.12) Ga := t ∈ [0, 1] : e(u) > a for all u ∈]t, t + ε[,   e(t + ε) = a. denote the countable set of starting points of excursions of the function e above the level a. Then µTe , the length measure on Te , is given by ³Z ∞ X ´ Te (3.13) µ = q∗ da δt . 0

t∈Ga

Alternatively, write (3.14)

© ª Γe := (s, a) : s ∈]0, 1[, a ∈ [0, e(s)[

for the region between the time axis and the graph of e, and for (s, a) ∈ Γe denote by © ª (3.15) s(e, s, a) := sup r < s : e(r) = a and

© ª s¯(e, s, a) := inf t > s : e(t) = a

(3.16)

the start and finish of the excursion of e above level a that straddles time s. Then ³Z ´ ds ⊗ da Te (3.17) µ = q∗ δs(e,s,a) . ¯(e, s, a) − s(e, s, a) Γe s We note that the measure µTe appears in [AS02]. 3.2. The Brownian continuum random tree In this section we will recall the definition of Aldous’s Brownian continuum random tree (Brownian CRT), which can be thought of as a uniformly chosen random weighted compact R-tree. Consider the Itˆo excursion measure for excursions of standard Brownian motion away from 0. This σ-finite measure is defined subject to a normalization of Brownian local time at 0, and we take the usual normalization of local times at each level which makes the local time process an occupation density in the spatial variable for each fixed value of the time variable. The excursion measure is the sum of two measures, one which is concentrated on non-negative excursions and one which is concentrated on non-positive excursions. Let N be the part which is concentrated on non-negative excursions. Thus, in the notation of Section 3.1, N is a σ-finite measure on U , where we equip U with the σ-field U generated by the coordinate maps. √ . Then Define a map v : U → U 1 by e 7→ e(ζ(e)·) ζ(e)

(3.18)

P(Γ) :=

N{v −1 (Γ) ∩ {e ∈ U : ζ(e) ≥ c}} , N{e ∈ U : ζ(e) ≥ c}

Γ ∈ U,

88

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

does not depend on c > 0 (see, for example, Exercise 12.2.13.2 in [RY99]). The probability measure P is called the law of normalized non-negative Brownian excursion. We have © ª dc (3.19) N e ∈ U : ζ(e) ∈ dc = √ 2 2πc3 and, defining Sc : U 1 → U c by √ ¡·¢ (3.20) Sc e := ce c we have Z Z ∞ Z ¡ ¢ dc √ (3.21) N(de) G(e) = P(de) G Sc e 2 2πc3 U 1 0 for a non-negative measurable function G : U → R. Recall from Section 3.1 how each e ∈ U 1 is associated with a weighted compact R-tree (Te , rTe , νTe ), and let the equivalence class [0] play the role of the root ρT2e . Definition 3.2.1 (The Brownian CRT). Denote for 2e ∈ U 1 , the excursion path t 7→ 2e(t). (i) The rooted Brownian CRT is the probability measure on the space (Troot , dGHroot ) that is the push-forward of the normalized excursion measure by the map e 7→ (T2e , rT2e , ρT2e ). (ii) The weighted Brownian CRT is the probability measure on the space (Twt , dGHwt ) that is the push-forward of the normalized excursion measure by the map e 7→ (T2e , rT2e , νT2e ). In the following let PCRT be the weighted Brownian CRT. If no confusion may occur, we refer to it shortly as the Brownian CRT. Remark 3.2.2. The probability measure P is the distribution of an object consisting of Aldous’s Brownian CRT along with a natural measure on this tree (see, for example, [Ald91a, Ald93]). Various combinatorial models of random trees correspond to conditioned critical Galton-Watson process with specific offspring distribution. In particular, the uniform random ordered tree Tn with a fixed number n ∈ N of vertices (other than the root) corresponds to the shifted geometric 1/2 offspring distribution. It has long been known its contour process can be constructed from simple symmetric random walk on the integers, conditioned on the first return to 0 at time 2n. This construction makes it simple to show that ¡ 1 1¢ (3.22) C √ Tn ; =⇒ B ex , n n→∞ n where here ⇒ means weak convergence on C([0, 1]) equipped with the uniform topology. The appearance of 2e rather than e in the definition of P is a consequence of this choice of scaling. The associated probability measure on each realization of the continuum random tree is the measure that arises in this limiting

3.3. ALDOUS’S LINE-BREAKING REPRESENTATION OF THE BROWNIAN CRT 89

construction by taking the uniform probability measure on realizations of the approximating finite trees. The probability measure P can therefore be viewed informally as the “uniform distribution” on (Twt , dGHwt ). We want to point out for general offspring distributions the contour process is not Markovian, and hence it is not profitable to derive (3.22) by proving convergence of this process to the Brownian excursion directly. The motivation for Aldous to develop his abstract notion on convergence of trees presented in [Ald93] was actually to derive the invariance principle for all non-trivial finite variance offspring distributions. ¤ 3.3. Aldous’s line-breaking representation of the Brownian CRT The Brownian CRT can be obtained as an almost sure limit by Aldous’s line-breaking construction. To prepare a result stated in Chapter 5 in this section we recall this representation and reformulate in terms of metric measure spaces and real trees. We mainly follow Section 4 in [Ald91a] (compare also Section in [Eva06]). Put σ0 = τ0 := 0. Let τ := (τn ; n ∈ N) the successive arrival times of an inhomogeneous Poisson process with arrival rate r(t) = t at time t ≥ 0. Let σn := Un τn , where {Un ; n ∈ N} is a family of independent identically distributed uniform random variables on [0, 1] independent of τ . In the following we will refer to τn and σn as the nth cut time and the nth cut point, for each n ∈ N. Informally, we are growing finite trees (that is, trees with finitely many leaves and finite total branch length) in continuous time as follows (at all times t ≥ 0 the procedure will produce a rooted tree Rt with total edge length t) as follows: • Start at time 0 with the 1-tree (that is a line segment with two ends), R0 , of length zero (R0 is “really” the trivial tree that consists of one point only, but thinking this way will help later in Chapter 5). Identify one end of R0 as the root. • Let this line segment grow at unit speed until the first cut time τ1 . • At time τ1 pick the first cut point σ1 uniformly on the segment that has been grown so far. • Between time τ1 and time τ2 , evolve a tree with 3 ends by letting a new branch growing away from the first cut point at unit speed. • Proceed inductively: Given the n-tree (that is, a tree with n + 1 ends), Rτn − , pick the n-th cut point σn uniformly on Rτn − to give an n + 1-tree, Rτn , with one edge of length zero, and for t ∈ [τn , τn+1 [, let Rt be the tree obtained from Rτn by letting a branch grow away from the nth cut point with unit speed. Formally, given (τ, σ), let for each t > 0, Rt = (Rt , rt , µt ) ∈ Mc be defined by letting (3.23)

Rt := [0, t],

90

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

r r R 0

Rτ1 −

r

Rτ1

Rτ2 −

Rτ2

Figure 3.3. illustrates how the real tree-valued process (Rt ; t ≥ 0) evolve. (The bold dots re-present an edge of length zero.) and with N (t) := inf{n ∈ N : t ≤ τn }, (3.24)

N (t) X 1 µt := δτi ∧t . (1 + N (t)) i=0

Furthermore, define a metric rt inductively as follows: if s ∈ [0, τ1 ], let for all x, y ∈ Rs , (3.25)

rs (x, y) := |x − y|.

Assume then that for some n ∈ N we have already defined rs0 for all s0 ∈ [0τn ]. If s ∈]τn , τn+1 ], let for all x, y ∈ Rs ,  |x − y|, if x, y ∈]τn , τn+1 ],    rτn (x, y), if x, y ∈ [0, τn ], (3.26) rs (x, y) := (y − τ ) + r (x, σ ), if x ∈ [0, τn ], y ∈]τn , τn+1 ],  n τn n   (x − τn ) + rτn (y, σn ), if y ∈ [0, τn ], x ∈]τn , τn+1 ]. Notice that this inductive definition extends to a metric on R∞ := [0, ∞). Denote by (R, r) the completion of (R∞ , r∞ ). Theorem 3.3.1 (Line-breaking construction in Xc ). The random tree (R, r) is the Brownian CRT. Recall the space of compact metric measure spaces Mc from Section 2.7. Notice that by Lemma 1.10.2 the metric measure space Rt is compact, for each t ≥ 0. To prepare the proof, we will first show that the sequence (Rt )t≥0 converges in Mc and identify its limit as the weighted Brownian CRT. Proposition 3.3.2 (Line-breaking construction in Mc ). The family of random trees {Rt ; t ≥ 0} converges weakly to the weighted Brownian CRT with respect to the Gromov-weak topology on Mc , as t → ∞. The hardest part is, or course, compactness of the limit. We will rely on the following estimate stated in Proposition 4 in [Ald91a]. Proposition 3.3.3. There exists a K > 0 such that, almost surely, (R, r) can be covered by at most −Kε−2 log ε balls of radius ε, for all ε > 0 sufficiently small.

3.4. CAMPBELL MEASURE FACTS: FUNCTIONALS OF THE BROWNIAN CRT

91

Moreover, we will rely on consistency (see, for example, the proof of Lemma 21 in [Ald93]). Proposition 3.3.4 (Consistency property). The family of random metric spaces {Rn ; n ∈ N} is consistent, i.e., the random tree span({x1 , ..., xk }) ⊗ k equals in distribution (Rτk , rτk ), for all k ∈ N and n ≥ k, µτn↓ -almost surely, where P µ − n1 δx1 µ − n1 k−1 i=1 δxi ⊗↓ k (3.27) µ := µ(dx1 ) ⊗ (dxk ). 1 (dx2 ) ⊗ ... ⊗ (k−1) 1− n 1− n

Proof of Proposition 3.3.2. For existence notice that the family (Rt , rt ) is non-decreasing in t. Hence, for all sufficiently small ε > 0 and t ≥ 0, there exists a finite constant K such that we can cover (Rt , rt ) by at most −Kε−2 log ε balls of radius ε, almost surely, by Proposition 3.3.3. In particular, the family of trees {Rt ; t ≥ 0} satisfies the assumptions of Proposition 2.7.2 and is therefore pre-compact in Mc , almost surely. Uniqueness follows from consistency. In particular, if Φ ∈ Π is a monomial of degree k ≥ 1, then £ ¤ £ ¤ (3.28) P Φ(Rτn ) = P Φ(Rτk ) , for all n ≥ k. Hence we can conclude convergence of {Rt ; t ≥ 0} from Corollary 2.8.1. Moreover, it follows from Corollary 23 in [Ald93] that if R is the Brownian CRT then ¤ £ ¤ £ (3.29) P Φ(R) = P Φ(Rτk ) , for all k ∈ N which identifies the limit as claimed.

¤

Proof of Theorem 3.3.1. If (X, r, µ) is the weighted Brownian CRT then (R, r) equals in distribution (supp(µ), r) by Theorem 3(iii) in [Ald91a], which together with Proposition 3.3.2 gives the claim. ¤ 3.4. Campbell measure facts: Functionals of the Brownian CRT In this section we will conclude from the connection between the Brownian CRT and the Brownian excursion explicit expressions of the expectations of some functionals with respect to P (the “uniform distribution” on (Twt , dGHwt ) as introduced in the end of Section 3.2). We will establish our calculations from what appears to be a novel path decomposition of the standard Brownian excursion. For T ∈ Twt , and ρ ∈ T , recall Rc (T, ρ) from (1.32), and the length measure µT from (1.20). Given (T, d) ∈ Twt and u, v ∈ T , let © ª (3.30) S T,u,v := w ∈ T : u ∈]v, w[ ,

92

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

denote the subtree of T that differs from its closure by the point u, which can be thought of as its root, and consists of points that are on the “other side” of u from v (recall ]v, w[ is the open arc in T between v and w). The main result of this section is the following. Theorem 3.4.1 (Expected height and weight moments). (i) For x > 0, £ © ª¤ P µT ⊗ νT (u, v) ∈ T × T : height(S T,u,v ) > x hZ i =P νT (dv) µT (Rx (T, v)) T

=2

∞ X

nx exp(−n2 x2 /2).

n=1

(ii) For 1 < α < ∞, ·Z ¸ Z ¡ ¢ T T,u,v α P νT (dv) µ (du) height(S ) T

T

¡α + 1¢ =2 ζ(α), αΓ 2 P where, as usual, ζ(α) := n≥1 n−α . (iii) For 0 < p ≤ 1, α+1 2

£ ¤ P µT ⊗ νT {(u, v) ∈ T × T : νT (S T,u,v ) > p} =

s 2(1 − p) . πp

(iv) For 21 < β < ∞, ¡ ¢ ·Z ¸ Z 1 ¡ ¡ T,u,v ¢¢β T − 12 Γ β − 2 P . νT (dv) µ (du) νT S =2 Γ(β) T T Remark 3.4.2. We will apply Theorem 3.4.1 in Section 6 in the proof of Lemma 6.1.2. ¤ The proof of Theorem 3.4.1 will rely on calculations with Ito’s excursion measure. Because of our observation in Section 3.1 that for an excursion e the length measure µTe on the corresponding tree is given by (3.17), we need to understand the decomposition of the excursion e into the excursion above a that straddles s and the “remaining” excursion when e is chosen according to the standard Brownian excursion distribution P and (s, a) is 1 on Γe (see chosen according to the σ-finite measure ds ⊗ da s¯(e,s,a)−s(e,s,a) Figure 3.4). Given an excursion e ∈ U and a level a ≥ 0 write: • ζ(e) := inf{t > 0 : e(t) = 0} for the “length”of e, • `at (e) for the local time of e at level a up to time t,

3.4. CAMPBELL MEASURE FACTS: FUNCTIONALS OF THE BROWNIAN CRT

93

(s,a)

Figure 3.4. The decomposition of the excursion e (top picture) into the excursion eˆs,a above level a that straddles time s (bottom left picture) and the “remaining” excursion eˇs,a (bottom right picture). Rt • e↓a for e time-changed by the inverse of t 7→ 0 ds 1{e(s) ≤ a} (that is, e↓a is e with the sub-excursions above level a excised and the gaps closed up), • `at (e↓a ) for the local time of e↓a at the level a up to time t, • U ↑a (e) for the set of sub-excursion intervals of e above a (that is, an element of U ↑a (e) is an interval I = [gI , dI ] such that e(gI ) = e(dI ) = a and e(t) > a for gI < t < dI ), • N ↑a (e) for the counting measure that puts a unit mass at each point (s0 , e0 ), where, for some I ∈ U ↑a (e), s0 := `agI (e) is the amount of local time of e at level a accumulated up to the beginning of the sub-excursion I and e0 ∈ U is given by ( e(gI + t) − a, 0 ≤ t ≤ dI − gI , (3.31) e0 (t) = 0, t > dI − gI , is the corresponding piece of the path e shifted to become an excursion above the level 0 starting at time 0, • eˆs,a ∈ U and eˇs,a ∈ U , for the subexcursion “above” (s, a) ∈ Γe , that is, ½ e(s(e, s, a) + t) − a, 0 ≤ t ≤ s¯(e, s, a) − s(e, s, a), s,a (3.32) eˆ (t) := 0, t > s¯(e, s, a) − s(e, s, a),

(3.33)

respectively “below” (s, a) ∈ Γe , that is, ½ e(t), 0 ≤ t ≤ s(e, s, a), eˇs,a (t) := e(t + s¯(e, s, a) − s(e, s, a)), t > s(e, s, a). • σsa (e) := inf{t ≥ 0 : `at (e) ≥ s} and τsa (e) := inf{t ≥ 0 : `at (e) > s},

94

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

• e˜s,a ∈ U for e with the interval ]σsa (e), τsa (e)[ containing an excursion above level a excised, that is, ( e(t), 0 ≤ t ≤ σsa (e), s,a (3.34) e˜ (t) := e(t + τsa (e) − σsa (e)), t > σsa (e). The following path decomposition result under the σ-finite measure N is preparatory to a decomposition under the probability measure P, Corollary 3.4.4, that has a simpler intuitive interpretation. Proposition 3.4.3. For non-negative measurable functions F on R+ and G, H on U , Z Z ds ⊗ da N(de) F (s(e, s, a))G(ˆ es,a )H(ˇ es,a ) s ¯ (e, s, a) − s(e, s, a) Γe Z Z ∞ Z 0 = N(de) da N ↑a (e)(d(s0 , e0 )) F (σsa0 (e))G(e0 )H(˜ es ,a ) 0

£ = N[G] N H

Z

ζ

¤ ds F (s) .

0

Proof. The first equality is just a change in the order of integration and has already been remarked upon in Section 3.1. Standard excursion theory (see, for example, [RW00, RY99, Ber96]) says that under N, the random measure e 7→ N ↑a (e) conditional on e 7→ e↓a is a Poisson random measure with intensity measure λ↓a (e)⊗N, where λ↓a (e) is Lebesgue measure restricted to the interval [0, `a∞ (e)] = [0, 2`a∞ (e↓a )]. 0 Note that e˜s ,a is constructed from e↓a and N ↑a (e) − δ(s0 ,e0 ) in the same 0 way that e is constructed from e↓a and N ↑a (e). Also, σsa0 (˜ es ,a ) = σsa0 (e). Therefore, by the Campbell-Palm formula for Poisson random measures (see, for example, Section 12.1 of [DVJ88]), Z Z ∞ Z 0 N(de) da N ↑a (e)(d(s0 , e0 )) F (σsa0 (e))G(e0 )H(˜ es ,a ) 0 Z Z ∞ ¯ i hZ ↑a 0 0 a 0 s0 ,a ¯ ↓a = N(de) da N N (e)(d(s , e )) F (σs0 (e))G(e )H(˜ e )¯e Z

0

Z

hn Z



=

`a ∞ (e)

0

o

F (σsa0 (e))

N(de) da N[G] N ds 0 Z ∞ 0 Z Z ³n o ´ = N[G] da N(de) d`as (e) F (s) H(e) Z0 Z ³n Z ∞ o ´ = N[G] N(de) da d`as (e) F (s) H(e) h Z = N[G] N H

¯ i ¯ H ¯ e↓a

0

ζ

i ds F (s) .

0

¤

3.4. CAMPBELL MEASURE FACTS: FUNCTIONALS OF THE BROWNIAN CRT

95

The next result says that if we pick an excursion e according to the standard excursion distribution P and then pick a point (s, a) ∈ Γe according to the σ-finite length measure corresponding to the length measure µTe on the associated tree Te (see the end of Section 3.1), then the following objects are independent: (a) the length of the excursion above level a that straddles time s, (b) the excursion obtained by taking the excursion above level a that straddles time s, turning it (by a shift of axes) into an excursion eˆs,a above level zero starting at time zero, and then Brownian re-scaling eˆs,a to produce an excursion of unit length, (c) the excursion obtained by taking the excursion eˇs,a that comes from excising eˆs,a and closing up the gap, and then Brownian re-scaling eˇs,a to produce an excursion of unit length, (d) the starting time s(e, s, a) of the excursion above level a that straddles time s rescaled by the length of eˇs,a to give a time in the interval [0, 1]. Moreover, the length in (a) is “distributed” according to the σ-finite measure 1 dρ √ p , 2 2π (1 − ρ)ρ3

(3.35)

0 ≤ ρ ≤ 1,

the unit length excursions in (b) and (c) are both distributed as standard Brownian excursions (that is, according to P), and the time in (d) is uniformly distributed on the interval [0, 1]. Recall from (3.20) the Brownian re-scaling map Sc : U 1 → U c . Corollary 3.4.4. For non-negative measurable functions F on R+ and K on U × U , Z Z ³ s(e, s, a) ´ ds ⊗ da F P(de) K(ˆ es,a , eˇs,a ) s,a ) s ¯ (e, s, a) − s(e, s, a) ζ(ˇ e Γe Z nZ 1 oZ ds ⊗ da = du F (u) P(de) K(ˆ es,a , eˇs,a ) s ¯ (e, s, a) − s(e, s, a) 0 Γe Z nZ 1 o 1 Z 1 dρ p = du F (u) √ P(de0 ) ⊗ P(de00 ) K(Sρ e0 , S1−ρ e00 ). 3 2 2π (1 − ρ)ρ 0 0 Proof. For a non-negative measurable function L on U × U , it follows straightforwardly from Proposition 3.4.3 that Z Z ³ s(e, s, a) ´ ds ⊗ da F L(ˆ es,a , eˇs,a ) N(de) s,a ) s ¯ (e, s, a) − s(e, s, a) ζ(ˇ e Γe (3.36) nZ 1 oZ = du F (u) N(de0 ) ⊗ N(de00 ) L(e0 , e00 )ζ(e00 ). 0

96

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

The left-hand side of equation (3.36) is, by (3.21), ³ ´ s,a s(Sc e,s,a) Z ∞ Z Z d ˇ s,a F L(S s,a c e , Sc e ) dc ζ(Sˇc e ) √ (3.37) P(de) ds ⊗ da . s¯(Sc e, s, a) − s(Sc e, s, a) 2 2πc3 ΓS c e 0 √ If we change variables to t = s/c and b = a/ c, then the integral for (s, a) over ΓSc e becomes an integral for (t, b) over Γe . Also, n √ √ ³r´ √ o s(Sc e, ct, cb) = sup r < ct : ce < cb c (3.38) = c sup {r < t : e(r) < b} = cs(e, t, b), and, by similar reasoning, √ s(e, t, b) s¯(Sc e, ct, cb) = c¯

(3.39) and

ct, ζ(Sˇc e

(3.40)

√ cb

) = cζ(ˇ et,b ).

Thus (3.37) is (3.41) √ √ ¡ s(e,t,b) ¢ ct, cb ct, cb Z ∞ Z Z d ˇ F L( S e , S e ) c c √ dc ζ(ˇ et,b ) √ dt ⊗ db . P(de) c s¯(e, t, b) − s(e, t, b) 2 2πc3 0 Γe Now suppose that L is of the form (3.42)

M (ζ(e0 ) + ζ(e00 )) L(e0 , e00 ) = K(Rζ(e0 )+ζ(e00 ) e0 , Rζ(e0 )+ζ(e00 ) e00 ) p , ζ(e0 ) + ζ(e00 )

where, for ease of notation, we put for e ∈ U , and c > 0, 1 Rc e := Sc−1 e = √ e(c ·). c

(3.43) Then (3.41) becomes Z (3.44) 0



dc √ 2 2πc3

Z

³

Z P(de)

dt ⊗ db

F

Γe

s(e,t,b) ζ(ˇ et,b )

´

K(ˆ et,b , eˇt,b ) M (c)

s¯(e, t, b) − s(e, t, b)

.

Since (3.44) was shown to be equivalent to the left hand side of (3.36), it follows from (3.21) that Z Z ¡ s(e, t, b) ¢ dt ⊗ db F K(ˆ et,b , eˇt,b ) P(de) t,b ) s ¯ (e, t, b) − s(e, t, b) ζ(ˇ e Γe (3.45) R1 Z 0 du F (u) = N(de0 ) ⊗ N(de00 ) L(e0 , e00 ) ζ(e00 ), N[M ] and the first equality of the statement follows.

3.4. CAMPBELL MEASURE FACTS: FUNCTIONALS OF THE BROWNIAN CRT

97

We have from the identity (3.45) that, for any C > 0, Z Z ds ⊗ da N{ζ(e) > C} P(de) K(ˆ es,a , eˇs,a ) s ¯ (e, s, a) − s(e, s, a) Γe Z 1{ζ(e0 ) + ζ(e00 ) > C} 00 = N(de0 ) ⊗ N(de00 ) K(Rζ(e0 )+ζ(e00 ) e0 , Rζ(e0 )+ζ(e00 ) e00 ) p ζ(e ) ζ(e0 ) + ζ(e00 ) Z ∞ Z ∞ dc0 dc00 √ √ = 03 0 2 2πc00 0 Z 2 2πc 1{c0 + c00 > C} √ P(de0 ) ⊗ P(de00 ) K(Rc0 +c00 Sc0 e0 , Rc0 +c00 Sc00 e00 ) . c0 + c00 0

c 0 00 (with correMake the change of variables ρ = c0 +c 00 and ξ = c + c sponding Jacobian factor ξ) to get Z ∞ Z ∞ dc0 dc00 √ √ 0 2 2πcZ0 3 0 2 2πc00 1{c0 + c00 > C} √ P(de0 ) ⊗ P(de00 ) K(Rc0 +c00 Sc0 e0 , Rc0 +c00 Sc00 e00 ) c0 + c00 µ ¶2 Z ∞ Z 1 1 1{ξ > C} dρ ξ √ p √ = dξ 3 4 ξ 2 2π ρ (1 − ρ)ξ 0 0 Z P(de0 ) ⊗ P(de00 ) K(Sρ e0 , S1−ρ e00 ) )Z µ ¶2 (Z ∞ 1 1 dξ dρ √ p p = 3 3 2 2π ρ (1 − ρ) ξ C 0 Z P(de0 ) ⊗ P(de00 ) K(Sρ e0 , S1−ρ e00 ),

and the corollary follows upon recalling (3.19). Corollary 3.4.5. Z Z P(de) Γe

=2

∞ X

(i) For x > 0, ds ⊗ da 1{ max eˆs,a > x} s¯(e, s, a) − s(e, s, a) 0≤t≤ζ(ˆes,a )

nx exp(−2n2 x2 )

n=1

(ii) For 0 < p ≤ 1, r Z Z ds ⊗ da 1−p s,a P(de) 1{ζ(ˆ e ) > p} = . s ¯ (e, s, a) − s(e, s, a) 2πp Γe Proof. (i) Recall first of all from Theorem 5.2.10 in [Kni81] that ½ ¾ ∞ X (4n2 x2 − 1) exp(−2n2 x2 ). (3.46) P e ∈ U 1 : max e(t) > x = 2 0≤t≤1

n=1

¤

98

3. EXAMPLES OF LIMIT TREES I: BRANCHING TREES

By Corollary 3.4.4 applied to K(e0 , e00 ) := 1{maxt∈[0,ζ(e0 )] e0 (t) ≥ x} and F ≡ 1, Z Z ds ⊗ da P(de) 1{ max eˆs,a > x} s ¯ (e, s, a) − s(e, s, a) 0≤t≤ζ(ˆ es,a ) Γe ¾ ½ Z 1 1 dρ √ p = √ P max ρe(t/ρ) > x t∈[0,ρ] 2 2π 0 ρ3 (1 − ρ) ½ ¾ Z 1 dρ x 1 p P max e(t) > √ = √ ρ t∈[0,1] 2 2π 0 ρ3 (1 − ρ) µ ¶ µ ¶ Z 1 ∞ X 1 dρ x2 x2 p = √ 2 4n2 − 1 exp −2n2 ρ ρ 2 2π 0 ρ3 (1 − ρ) n=1 =2

∞ X

nx exp(−2n2 x2 ),

n=1

as claimed. (ii) Corollary 3.4.4 applied to K(e0 , e00 ) := 1{ζ(e0 ) ≥ p} and F ≡ 1 immediately yields Z Z ds ⊗ da 1{ζ(ˆ es,a ) > p} P(de) ¯(e, s, a) − s(e, s, a) Γe s r Z 1 1 dρ 1−p p = √ = . 2πp 2 2π p ρ3 (1 − ρ) ¤ Proof of Theorem 3.4.1. (i) The first equality is clear from the definition of Rx (T, v) and Fubini’s theorem. Turning to the equality of the first and last terms, first recall from Definition 3.2.1 that P is the push-forward on (Twt , dGHwt ) of the normalized excursion measure P by the map e 7→ (T2e , dT2e , νT2e ), where 2e ∈ U 1 is just the excursion path t 7→ 2e(t). In particular, T2e is the quotient of the interval [0, 1] by the equivalence relation defined by 2e. By the invariance of the standard Brownian excursion under random re-rooting (see Section 2.7 of [Ald91b]), the point in T2e that corresponds to the equivalence class of 0 ∈ [0, 1] is distributed according to νT2e when e is chosen according to P. Moreover, recall from the end of Section 3.1 that for e ∈ U 1 , the length 1 measure µTe is the push-forward of the measure ds⊗da s¯(e,s,a)−s(e,s,a) δs(e,s,a) on the sub-graph Γe by the quotient map defined in (3.3). It follows that if we pick T according to P and then pick (u, v) ∈ T × T according to µT ⊗ νT , then the subtree S T,u,v that arises has the same σ-finite law as the tree associated with the excursion 2ˆ es,a when e is chosen according to P and (s, a) is chosen according to the measure 1 ds ⊗ da s¯(e,s,a)−s(e,s,a) δs(e,s,a) on the sub-graph Γe .

3.5. EXISTENCE OF THE REACTANT BRANCHING TREES

99

Therefore, by part (i) of Corollary 3.4.5, ¸ ·Z Z © ª T T,u,v P νT (dv) µ (du)1 height(S )>x T T ½ ¾ Z Z ds ⊗ da x s,a = 2 P(de) 1 max eˆ > (3.47) ¯(e, s, a) − s(e, s, a) 2 0≤t≤ζ(ˆ es,a ) Γe s ∞ X =2 nx exp(−n2 x2 /2). n=1

Part (ii) is a consequence of part (i) and some straightforward calculus. Part (iii) follows immediately from part(ii) of Corollary 3.4.5. Part (iv) is a consequence of part (iii) and some more straightforward calculus. ¤

3.5. Existence of the reactant branching trees In this section we introduce the family forests with their contour processes of a catalytic branching particle model. We then state that suitably rescaled versions cut off at the random height at which the catalyst mass falls below a given threshold converge. The catalytic branching model consists of two different populations of distinct individuals: the catalyst population and the reactant population. The pair of populations (η, ξ) = (ηt , ξt )t≥0 evolves as a Markov process according to the following rules. The catalyst population η is a classical continuous-time critical branching process: every individual has an exponential lifetime with parameter 1 after which it is replaced by 0 or 2 offspring with equal probability. The reactant population ξ evolves analogously except that the exponential lifetime distribution R s+tof each individual is replaced by a lifetime distribution F (t) = 1 − exp(− s b(u)du) where s is the birth time of this individual and b(u) equals the total number of catalyst individuals at time u. The total number of individuals in the catalyst-reactant population is a continuous-time Markov process with values in (mN)2 if we associate mass m with an individual. Typically one sets the mass of a single particle m = 1. However, to obtain a reasonable limit of the rescaled catalytic branching model we will need to scale down a single particle’s mass contribution. Definition 3.5.1 (Total mass processes). • The catalyst total mass process η tot = (ηttot )t≥0 is a critical binary Galton-Watson process with constant branching rate 1 ½ tot ¡ ¢ η tot ηt + m each at rate 21 t , (3.48) ηttot ≡ ηttot ; m 7→ tot ηt − m m


• given η^tot, the reactant total mass process ξ^tot = (ξ^tot_t)_{t≥0} is a critical binary Galton-Watson process with time-inhomogeneous branching rate η^tot_t,
(3.49)  ξ^tot ≡ (ξ^tot_t; m):  ξ^tot_t ↦ ξ^tot_t + m  and  ξ^tot_t ↦ ξ^tot_t − m,  each at rate (1/2) η^tot_t ξ^tot_t / m.
Remark 3.5.2. Recall that criticality of the catalyst process implies that η^tot will almost surely get absorbed at 0. Let
(3.50)  T^{0,1} := inf{ t ≥ 0 : η^tot_t = 0 }.
Notice that after the catalyst mass process is absorbed at 0, the reactant mass process gets absorbed as well. That is, in the reactant family tree the most recent common ancestor of any two points at height greater than or equal to T^{0,1} has height smaller than T^{0,1}, or equivalently, after T^{0,1} the branches simply extend to ∞. ¤
The standard way of representing genealogical relationships between individuals in a branching population is with a family forest. The family forest consists of as many trees as the initial number of individuals. Each individual in a branching process has an edge associated to it whose length equals its lifetime. When an individual gives birth the edge branches into two new edges, while its death turns the edge's end into a leaf. In particular, time in the branching process corresponds to height in the family forest, which is measured in terms of the distance to the roots. Recall from Section 1.8 the collection T^root of all root invariant isometry classes of real trees. In what follows we will think of a forest as a rooted R-tree whose root has degree 1 or larger. For all t ≥ 0, let ∂Q_t denote the set of individuals alive at time t in the branching population. Define the genealogical distance metric for (T, d, ρ) by
(3.51)  d((t_1, ι_1), (t_2, ι_2)) := t_1 + t_2 − 2τ((t_1, ι_1), (t_2, ι_2)),
where τ((t_1, ι_1), (t_2, ι_2)) denotes the death, or splitting, time of the "most recent common ancestor" of the individuals ι_1 and ι_2 alive at times t_1 and t_2, respectively. Moreover, for any two (0, ρ), (0, ρ′) ∈ ∂Q_0 we have, by our convention, d((0, ρ), (0, ρ′)) = 0. Let then
(3.52)  ξ^for := ( ∪_{t∈R_+, ι∈∂Q^ξ_t} {(t, ι)}, d^ξ, ρ^ξ )
be the rooted R-tree generated by the reactant population.
We now consider a family (η̃^n, ξ̃^n) of catalytic branching particle models indexed by n, where the parameter n means that the initial number of individuals is increased by a factor n and the branching rate of the catalyst is sped up by a factor n. With probability of order O(1/n) the height of a Galton-Watson tree is of order O(n), and given a Galton-Watson process is still alive at a time of


order O(n), its population size at that time is of order O(n). Hence, starting initially with n particles and speeding up time by a factor of n yields in the limit a Poisson number of trees, each having total population size of order O(n²). We therefore put
(3.53)  η̃^{tot,n}_0 = ξ̃^{tot,n}_0 := 1,
and let
(3.54)  η̃^{tot,n} := (η^tot_{n·}; 1/n)   and   ξ̃^{tot,n} := (ξ^tot_{n·}; 1/n).
With standard techniques one can show that there exists a Markov process (X, Y) with paths in D_{R_+×R_+}[0, ∞) such that
(3.55)  (η̃^{tot,n}, ξ̃^{tot,n}) ⟹ (X, Y),  as n → ∞

(see, for example, [Pen03]). Moreover, (X, Y) is the unique strong solution to the following system of stochastic differential equations
(3.56)  dX_t = √(X_t) dW^X_t,   dY_t = √(X_t Y_t) dW^Y_t,
with initial value (X_0, Y_0) := (1, 1), and where W^X = (W^X_t)_{t≥0} and W^Y = (W^Y_t)_{t≥0} are two independent standard Brownian motions on the real line. Let then
(3.57)  ξ̃^{for,n}
be the family forest of rooted R-trees associated with ξ̃^n. By (3.55), we can realize the rescaled catalyst total mass processes and the limiting Feller diffusion X = (X_t)_{t≥0} on a common probability space such that for all T > 0,
(3.58)  sup_{t≤T} | η̃^{tot,n}_t − X_t | → 0,  as n → ∞.
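For orientation, the pair (X, Y) from (3.56) is easy to simulate on a time grid. The following Python sketch is a plain Euler-Maruyama discretization (the step size, the horizon and the truncation at 0 are arbitrary illustrative choices of ours, and the scheme is only a crude approximation near 0 where the square-root coefficients are not Lipschitz); it plays no role in the construction itself.

```python
import math
import random

def simulate_catalytic_pair(x0=1.0, y0=1.0, horizon=5.0, dt=1e-3, seed=0):
    """Euler-Maruyama sketch for dX = sqrt(X) dW^X, dY = sqrt(X Y) dW^Y."""
    rng = random.Random(seed)
    x, y = x0, y0
    path = [(0.0, x, y)]
    for k in range(1, int(horizon / dt) + 1):
        dwx = rng.gauss(0.0, math.sqrt(dt))
        dwy = rng.gauss(0.0, math.sqrt(dt))
        x = max(x + math.sqrt(max(x, 0.0)) * dwx, 0.0)        # catalyst: Feller diffusion
        y = max(y + math.sqrt(max(x * y, 0.0)) * dwy, 0.0)    # reactant, catalysed by X
        path.append((k * dt, x, y))
    return path

print("final (t, X_t, Y_t):", simulate_catalytic_pair()[-1])
```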

Hence, the branching rates of the reactant processes ξ̃^{tot,n} will be given by a sequence of functions in Skorohod space which converge uniformly on compacta to a continuous limit function X. Given t > 0 and (T, ρ) ∈ T^{root,lin}_fin, let
(3.59)  Q_t(T, ρ) := { x ∈ T : d(x, ρ) ≤ t }
be the cut operator which takes a forest and cuts off the portion that lies above height t. Put then
(3.60)  ∂Q_t(T, ρ) := { x ∈ T : d(x, ρ) = t }.
Let
(3.61)  ξ̃^{for,n,δ} := Q_{T̃^{δ,n}}(ξ̃^{for,n})
denote the reactant forest cut off at the height T̃^{δ,n}. Throughout we will work under the assumption
(3.62)  sup_{t≤T} | η̃^{tot,n}_t − X_t | → 0,  as n → ∞, for all T > 0.


In the following we study the behavior of the rescaled reactant forests ξ̃^{for,n,δ} under Assumption (3.62). A lexicographic labeling, also called the Ulam-Harris labeling, gives each individual a label from the set N^n, where n ∈ {1, 2, ...} is the number of ancestors the individual has had since the start of the process. The initial individuals are given labels between 1 and the initial population size, distributed among them in random order. Each parent subsequently gives each of its children a label that consists of its own label followed by a number between 1 and the total number of its children, distributing these numbers randomly amongst its children. Recall also Figure 3.2. Recall then from (3.7) the map C(·; σ) which associates with a finite ordered tree the excursion obtained by "walking around the tree" (respecting the order) with speed σ and recording the height profile. Notice that since branching is sped up by a factor of n, all edges are of order O(1/n). As a consequence, in order to find a non-trivial limit contour, in the nth approximation step we need to traverse the rescaled forest ξ̃^{for,n} at speed σ = n. We therefore put
(3.63)  C̃^{δ,n} = (C̃^{δ,n}_u)_{u≥0} := C(ξ̃^{for,δ,n}; n).
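For concreteness, here is a minimal Python sketch of the contour map C(·; σ) for a finite ordered tree; the encoding of the tree by children lists and edge lengths, and the function name, are ad-hoc choices of ours for illustration, not notation from the text. The walk goes depth-first around the tree at speed σ, recording the height each time an edge is traversed downwards or upwards.

```python
def contour_breakpoints(children, edge_len, root, sigma=1.0):
    """(time, height) breakpoints of the contour of a rooted ordered tree.
    children[v]: ordered list of children of v; edge_len[v]: length of the
    edge from v to its parent (unused for the root); walking speed sigma."""
    pts = [(0.0, 0.0)]
    t, h = 0.0, 0.0

    def visit(v):
        nonlocal t, h
        for c in children.get(v, []):
            t += edge_len[c] / sigma; h += edge_len[c]   # walk down to c
            pts.append((t, h))
            visit(c)
            t += edge_len[c] / sigma; h -= edge_len[c]   # walk back up
            pts.append((t, h))

    visit(root)
    return pts

# toy example: root 0 with ordered children 1, 2; vertex 1 has one child 3
children = {0: [1, 2], 1: [3]}
edge_len = {1: 1.0, 3: 2.0, 2: 0.5}
print(contour_breakpoints(children, edge_len, root=0, sigma=1.0))
```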

The main result of this section describes the behavior of the limit of the rescaled reactant forests in a catalytic environment that is stopped at T^{δ,n}. We will give the proof via random evolutions in the next section.
Theorem 3.5.3 (Limit of the reactant contour process). Assume (3.62), and fix δ > 0. Consider the operator (A^δ, D(A^δ)) with
(3.64)  A^δ f(c) := ( (1/(2X_c)) f′(c) )′
on [0, τ^δ] with reflection on the boundary, that is, with domain
(3.65)  D(A^δ) := { f ∈ C^1_{[0,τ^δ]}[0, ∞) : (1/X_·) f′ ∈ C^1_{[0,τ^δ]}[0, ∞), f′|_{{0,τ^δ}} ≡ 0 }.
Then the following holds:
(i) The (A^δ, D(A^δ))-martingale problem is well-posed.
(ii) If ζ^δ is the solution of the (A^δ, D(A^δ))-martingale problem then
(3.66)  (C̃^{δ,n}; η̃^{tot,n}) ⟹ (ζ^δ; X),  as n → ∞,
where ⟹ here means weak convergence on C_{R_+}([0, ∞)) with respect to the topology of uniform convergence on compacta.
Recall that T : C^0_{R_+}[0, ∞) → T^root maps an excursion to a rooted linearly ordered compact R-tree. From Theorem 3.5.3 we can immediately conclude the following.
Corollary 3.5.4. For all δ > 0,
(3.67)  (ξ̃^{for,δ,n}; η̃^{tot,n}) ⟹ (T(ζ^δ); X),  as n → ∞.


3.6. Random evolutions: Proof of Theorem 3.5.3
In this section we give the proof of Theorem 3.5.3. We first show that, given a realization of the catalyst total mass process, the reactant contour process is associated with a Markov process, and we derive its generator. We also give a representation of this Markov process as a random evolution: it moves at constant velocity for a random time, then changes the sign of its velocity and proceeds at constant velocity for a random time again. We use this representation to prove that the suitably rescaled contour processes of the truncated reactant forest converge towards a limit contour process which is characterized as the solution of a well-posed martingale problem.
Throughout this section a realization η^tot := (η^tot_s)_{s≥0} ∈ D_N[0, ∞) of the catalyst path is fixed. Recall once more from (3.7) the map C(·; σ) which associates with a finite ordered tree the excursion obtained by "walking around the tree" (respecting the order) with speed σ, put
(3.68)  C = (C_u)_{u≥0} := C(ξ^for; 1),
and let its slope process V := (V_u)_{u≥0} be defined by V_u := slope(C_u) ∈ E_slope, with
E_slope := {−1, +1},   E_cont := [0, T^{0,1}],
where the slopes at the root, at branch points and at leaves are defined in such a way that (V_u)_{u≥0} has cadlag paths. The next result states that the pairing of the height of the contour with its slope is a Markov process.
Lemma 3.6.1 (Markov property of the contour process). The process (C, V) := (C_u, V_u)_{u≥0} on E_cont × E_slope is a Markov process whose generator is the closure of the operator
(3.69)  A f(c, v) = v (∂/∂c) f(c, v) + η^tot_c [ f(c, −v) − f(c, v) ],
for all f ∈ D(A), where
(3.70)  D(A) = { f ∈ C^{1,0}_{E_cont × E_slope}[0, ∞) : (∂f/∂c)|_{∂(E_cont) × E_slope} ≡ 0 }.
Proof. Recall that C(·, σ) : T^{root,lin}_fin → C_{R_+}[0, ∞) maps a rooted R-tree to an excursion as in Figure 3.2. We first show that the lengths of the line segments of the contour process are independent of each other, and then we use this to obtain the Markov property and to identify the generator.


Step 1 (Independence of the lengths of the line segments). Recall from Figure 3.2 the assignment of a contour process to the representative of a Galton-Watson family forest embedded in the plane. Each piece of the contour process with constant slope sign corresponds to a sum of a number of lifetimes, so independence of the different line segments is not obvious. We shall make use of the fact that the reactant process can be represented in two ways without changing the distribution of its total mass process or the genealogical distances between individuals, either as continuous time binary Galton-Watson process, or as a birth and death process. • In the continuous time Galton-Watson process the branch points occur at rate ηttot , and the number of offspring at each branch point is 0 or 2 with equal probability. • In the birth and death process each individual dies at rate 21 ηttot , and during its lifetime gives birth to new offspring at rate 12 ηttot . If in the Galton-Watson process at each birth time we choose to identify the life of one of the offspring as a continuation of the life of its parent, we obtain the birth-and-death process. The family forest of the Galton-Watson process has a canonical planar embedding coming from a linear order on its vertices induced by the linear order of the tree as in Figure 3.5(a), while the family forest of a birth-anddeath process with that same linear order on the vertices has two canonical planar embeddings. In Figure 3.5(b) we always choose to identify the continuation of the life of the parent with the life of the offspring of higher linear order, and the branch of the offspring is always drawn to the left of the branch of the parent. In Figure 3.5(c) we always choose to identify the continuation of the life of the parent with the life of the offspring of lower linear order, and the branch of the offspring is always drawn to the right of the branch of the parent. The key observation now is that since all three planar embeddings respect the same linear order on the vertices they also have the same contour process (compare Figure 3.2). In the birth and death process the line segments of constant slope correspond to a lifetime of exactly one individual. With the parent identification as in Figure 3.5(b) each line segments of negative slope corresponds to a lifetime of an individual, while in the parent identification as in Figure 3.5(c) each line segment of positive slope corresponds to a lifetime of an individual. Since lifetimes of individuals are independent this implies the independence of all constant slope line segments in the contour process as claimed. Step 2 (Identification of the generator). Given the branch rates (ηttot )t≥0 , the law of the length l of a lifetime of an individual is R t∗ +t tot © ª P l > t = e− t∗ ηs ds , where t∗ is the time of birth of that individual. Hence the law of the length of each segment is the law of the first point of a Poisson process with rate



(η^tot_t)_{t≥t_*} or (η^tot_t)_{t≤t_*} if the slope of the line segment is +1 or −1, respectively. The sign V_u flips at rate η^tot_{C_u}, where C_u is the current value of the contour process. In between the jumps of V_u, C_u moves with unit speed in the direction determined by V_u. Hence the contour process paired with its slope is Markovian, and furthermore its generator agrees on D(A) with the operator given in (3.69) and (3.70). Since D(A) is dense in C_{E_cont × E_slope}[0, ∞), the generator is the closure of the operator (A, D(A)). ¤
Figure 3.5 illustrates three different planar embeddings of the reactant family forest with the same linear order and a common contour process.
We next notice that (C, V) is a random evolution, i.e., a Markov process moving at constant speed in a direction which changes stochastically. Specifically, for the pair (C, V) the change of direction is driven by a counting process whose rate is governed by the catalyst mass process η^tot.
Lemma 3.6.2 (Random evolution representation). Let N := (N(u))_{u≥0} be a unit rate Poisson process, and consider the following system
(3.71)  C_u = ∫_0^u V_v dv,   V_u = (−1)^{N( ∫_0^u η^tot_{C_v} dv )},

in int(E_cont) × E_slope and with reflection on ∂(E_cont) × E_slope. There exists a unique random evolution satisfying this system, and its distribution is the same as the distribution of the contour process and its slope (C, V).
Proof. See Chapter 12 of [EK86] for the definition of a random evolution. Existence and uniqueness of a random evolution satisfying the system (3.71) follow from standard theorems on existence and uniqueness of solutions of a system of stochastic differential equations with continuous local martingales as differentials (see Theorem 3.15 in [Pro77]). Equality in distribution of this random evolution with the contour process and its slope follows by simply comparing the generator of this system to the generator obtained in the previous lemma. ¤
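A time-discretized simulation sketch of the system (3.71), purely for illustration (the constant catalyst path, the step size and the naive handling of reflection are arbitrary choices of ours):

```python
import random

def random_evolution(eta_tot, t_end, horizon, du=1e-3, seed=1):
    """Sketch of (C_u, V_u): C moves with slope V in {-1, +1}, V flips at rate
    eta_tot(C_u), and C is reflected at 0 and at t_end."""
    rng = random.Random(seed)
    c, v = 0.0, +1
    path = [(0.0, c, v)]
    for k in range(1, int(horizon / du) + 1):
        if rng.random() < eta_tot(c) * du:   # slope flip with probability ~ rate * du
            v = -v
        c += v * du
        if c < 0.0:                          # reflection at the lower boundary
            c, v = -c, +1
        elif c > t_end:                      # reflection at the upper boundary
            c, v = 2.0 * t_end - c, -1
        path.append((k * du, c, v))
    return path

print("final (u, C_u, V_u):", random_evolution(lambda c: 2.0, t_end=1.5, horizon=10.0)[-1])
```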


We will also need the following easy consequence of the above arguments. Fix δ > 0, and let T^{δ,1} be the first time that the catalyst process started from 1 individual drops below δ. Also recall that Q_{T^{δ,1}} from (3.59) is the map that takes a tree and cuts off the portion of its branches that lies above height T^{δ,1}.
Corollary 3.6.3 (Truncated process). If the contour C(ξ^for; 1) solves (3.71) then C(Q_{T^{δ,1}}(ξ^for); 1) solves (3.71) but with the state space E_cont replaced by E^δ_cont = [0, T^{δ,1}].
We next show that the rescaled random evolutions converge to the solution of the well-posed martingale problem stated in Theorem 3.5.3. We rely on averaging techniques established for random evolutions. Throughout we fix realizations of η̃^{tot,n}, n ∈ N, and of X on a single probability space such that (3.62) holds, and we choose a truncation parameter δ > 0.
Step 3 (The rescaled random evolution system). Recall the rescaled reactant contour process C̃^{δ,n} from (3.63). Let Ẽ^{δ,n}_cont := [0, T̃^{δ,n}], and define its rescaled slope process Ṽ^{δ,n} := sign(slope(C̃^{δ,n}_·)). Then Lemma 3.6.1 applied to the rescaled reactant populations implies that (C̃^{δ,n}, Ṽ^{δ,n}) is a Markov process whose generator is the closure of the operator
(3.72)  Ã^{δ,n} f(c, v) = n v (∂/∂c) f(c, v) + n² η̃^{tot,n}_c [ f(c, −v) − f(c, v) ],
acting on all f ∈ D(Ã^{δ,n}), where
D(Ã^{δ,n}) = { f ∈ C^{1,0}_{Ẽ^{δ,n}_cont × E_slope}[0, ∞) : (∂f/∂c)|_{{0,T̃^{δ,n}} × E_slope} ≡ 0 }.

Furthermore, the argument analogous to that of Lemma 3.6.2 implies that the pair (C̃^{δ,n}, Ṽ^{δ,n}) has the same distribution as the random evolution which is the unique solution to the system
(3.73)  C̃^{δ,n}_u = n ∫_0^u Ṽ^{δ,n}_v dv,   Ṽ^{δ,n}_u = (−1)^{N( n² ∫_0^u η̃^{tot,n}_{C̃^{δ,n}_v} dv )},
where N is a unit rate Poisson process. In other words, the rescaled process C̃^{δ,n} evolves deterministically with speed n, and changes sign at a rate given by n² times the rescaled catalyst mass process η̃^{tot,n}, evaluated along the contour.
Step 4 (The velocity process). The convergence result stated in Theorem 3.5.3 relies on the fact that the velocity component Ṽ^{δ,n} evolves much faster than the contour component C̃^{δ,n}, as is clear from (3.73). Hence in the limit the velocity process will average out and can be replaced with its stationary measure.


If Γ^n is the occupation time measure of Ṽ^{δ,n} on E_slope, i.e., for u ≥ 0 and v ∈ E_slope,
(3.74)  Γ^n([0, u] × {v}) := ∫_0^u 1_{{v}}(Ṽ^{δ,n}_{u′}) du′,
then it is clear from the description (3.73) of Ṽ^{δ,n} that Γ^n ⟹ Γ = λ ⊗ π, as n → ∞, where λ denotes Lebesgue measure on [0, ∞), and π({1}) = π({−1}) = 1/2. In the limit the contour component will have spent on average half the time increasing and half the time decreasing.
Step 5 (Averaging for martingale problems). The proof of Theorem 3.5.3 will rely on the following result taken from Theorem 2.1 in [Kur92] and adapted to our specific situation in which the state spaces are compact.
Proposition 3.6.4 (Stochastic averaging). Suppose there is an operator A^δ : D(A^δ) ⊆ C^0_{[0,∞)}[0, τ^δ] → C^0_{[0,∞)}([0, τ^δ] × {−1, 1}) such that
(i) for all f ∈ D(A^δ) there is a process ε^{δ,f,n} for which
(3.75)  ( f(C̃^{δ,n}_t) − ∫_0^t A^δ f(C̃^{δ,n}_s, Ṽ^{δ,n}_s) ds + ε^{δ,f,n}_t )_{t≥0}
is a martingale,
(ii) D(A^δ) is dense in C^0_{[0,∞)}[0, τ^δ] with respect to the uniform topology, and
(iii) for f ∈ D(A^δ) and T > 0 there exists p > 1 such that
P[ ∫_0^T | A^δ f(C̃^{δ,n}_s, Ṽ^{δ,n}_s) |^p ds ] < ∞,
and
(3.76)  lim_{n→∞} P[ sup_{t≤T} | ε^{δ,f,n}_t | ] = 0.
Then, for Γ^n defined as in (3.74),
{ (C̃^{δ,n}, Γ^n); n ∈ N } is relatively compact in D_{[0,∞)}[0, ∞) × M([0, ∞) × {−1, 1}),
where M([0, ∞) × {−1, 1}) is the space of all measures µ on [0, ∞) × {−1, 1} for which µ([0, u) × {−1, 1}) = u, and for any limit point (ζ^δ, π) there exists a filtration such that
( f(ζ^δ_t) − ∫_0^t Σ_{v∈{−1,1}} A^δ f(ζ^δ_s, v) π{v} ds )_{t≥0}
is a martingale with respect to this filtration, for all f ∈ D(A^δ).


Recall the operator (Aδ , D(Aδ )) from (3.64) and (3.65) which is the generator for the rescaled contour process C˜ δ,n . Our goal is to show that all three assumptions (i)-(iii) of Proposition 3.6.4 above are satisfied. (i) We first show that we can define small error functions {εδ,f ; f ∈ D(Aδ )} such that Z t ³ ¡ ´ ¡ ¢ δ,n ¢ ˜ f Ct − Aδ f C˜sδ,n , v ds + εδ,f,n t t≥0

0

D(Aδ ).

is a martingale for all f ∈ Notice that f ∈ D(Aδ ) if and only if there exists a function g ∈ 2 C[0,∞) [0, τ δ ] with g(0) = g(τ δ ) = 0 such that Z x (3.77) f (x) = f (0) + Xs g(s)ds, 0

for all x ≥ 0. Let f ∈

D(Aδ )

be of the form (3.77) and let Z c ¡ ˜δ,n ¢ n ˜ f (c) := f (0) + η˜stot,n g Tτ δ s ds. 0

(A˜δ,n , D(A˜δ,n ))

from (3.72) to functions f n given by ¡ n ¢0 v f n (c, v) := f˜n (c) + f˜ (c) 2n˜ ηctot,n

Apply the operator

to get ¡ ¢0 A˜δ,n f n (c, v) = nv f˜n (c) − n2 η˜ctot,n ³

¡ n ¢0 ´0 f˜ (c). 2˜ ηctot,n Let then for all t ≥ 0, =

εδ,f,n t

v n˜ ηctot,n

³ ¡ n ¢0 f˜ (c) + v 2

1 2˜ ηctot,n

¡ n ¢0 ´0 f˜ (c),

1

¢ ¡ ¢ ¡ := f˜n C˜tδ,n − f C˜tδ,n + Z +

V˜tδ,n ¡ ˜n ¢0 ¡ ˜ δ,n ¢ f Ct 2n˜ η tot,n ˜ δ,n Ct



¢¡ ¢ Aδ f − A˜δ,n f n C˜sδ,n , V˜sδ,n .

0 Rt δ,n ˜ δ,n n ˜ Since f (Ct , Vt ) − 0 A˜δ,n f n (C˜sδ,n , V˜sδ,n ) D(A˜δ,n ), it follows that (3.75) holds for all f

is a martingale for all f n ∈ ∈ D(Aδ ). δ (ii) We next show that the domain D(A ) is dense in the space of continuous functions on [0, τ δ ]. 0 Lemma 3.6.5 (Dense domain). Fix δ > 0 and X ∈ C[δ,∞) [0, τ δ ]. Then the set of functions F defined by Z c ¯ © ª F := f : f (c) = C + Xc0 g(c0 )dc0 ; C ∈ R, g ∈ CR2 [0, τ δ ], g ¯{0,τ δ } ≡ 0 . 0

is dense in

CR0 [0, τ δ ].


Proof. It is well-known that each continuous function on [0, τ δ ] can be approximated by piecewise linear functions. It is therefore enough to show that any piecewise linear function can be approximated by functions in F. This follows by continuity of X and the fact that Xu ≥ δ, for all u ∈ [0, τ δ ]. ¤ We finally verify the last point. It is standard to show (3.76) holds for all f ∈ D(Aδ ), T > 0 and p > 1. Moreover, since 1/˜ η tot,n is bounded by 1δ , ˜ δ,n Ct

for all t ≥ 0, f˜n → f , and

° ° ¯ ° T δ,n ¯° ° kA˜δ,n f n − Aδ f ° ≤ |τ δ − T δ,n |°g 0 ° + ¯1 − δ ¯°g 00 ° −→ 0, n→∞ τ (3.76) is satisfied as well. Altogether we can apply Proposition 3.6.4 to the effect that the family of rescaled contours {C˜ δ,n ; n ∈ N} is relatively compact in law and any limit point satisfies the (Aδ , D(Aδ ))-martingale problem. Step 6 (Uniqueness of the limit martingale problem). We next show that the (Aδ , D(Aδ ))-martingale problem has a unique solution ζ δ , for which we first need the following lemma which characterizes solutions of transforms of a reflecting Brownian motion. Recall that X is a Feller diffusion started at R τδ X0 = 1 and τ δ is the first time it falls below δ. Let s : [0, τ δ ] → [0, 0 Xu du] defined by Z x (3.79) s(x) := Xu du, (3.78)

0

denote the scale function and denote its inverse by s−1 . Consider the operator ³ ¯ ¡ δ ¢ © ª´ B , D(B δ ) := 12 Xs−1 (·) f 00 , h ∈ CR2 + [0, ∞) : h0 ¯{0,s(τ δ )} ≡ 0 . Lemma 3.6.6 (Relation to Brownian motion). Let δ > 0. If ζ δ solves the (Aδ , D(Aδ ))-martingale problem, then ¡ ¢ Bt := s ζtδ , t ∈ [0, ∞) solves the (B δ , D(B δ ))-martingale problem. ¯ 0¯ 2 Proof. Fix H ∈ C[0,s(τ δ )] [0, ∞) with H {0,s(τ δ )} ≡ 0. It is easy to check that then H ◦ s ∈ D(Aδ ). ¯ We therefore obtain that for all H ∈ CR2 + [0, ∞) such that H 0 ¯{0,s(τ δ )} ≡ 0 the process Z t ³¡ ¢¡ δ ¢ ¡ ¢¡ ¢ ´ H ◦ s ζt − Aδ H ◦ s ζuδ du t≥0 0 ³ ¡ ¢ Z t1 ´ = H Bt − Xs−1 (Bu ) h00 (Bu )du 2 t≥0 0


is a martingale. Here we have used that Aδ (H ◦ s)(ζ) = 1 00 2 Xζ· (H ◦ s).

1 2

¡ 0 ¢0 H ◦ s (ζ) = ¤

Proposition 3.6.7 (Uniqueness). The (Aδ , D(Aδ ))-martingale problem has a unique solution ζ δ . Proof of Proposition 3.6.7. Assume that ζ δ,1 and ζ δ,2 are two solutions of the (Aδ , D(Aδ ))-martingale problem. By Theorem 5.6 in [?], since (Xu )u≥0 is bounded away from zero, the (B δ , D(B δ ))-martingale problem has a unique solution. Hence Z Z d Xu du = Xu du. [0,ζ δ,1 ]

[0,ζ δ,2 ]

Since the scale function is strictly increasing on [0, τ δ ], the one-dimensional distributions of ζ δ,1 and ζ δ,2 agree. It follows from Theorem 4.4.2 in [EK86] d

that therefore ζ δ,1 = ζ δ,2 .

¤

Step 7 (Conclusion). We close the section by giving a proof of Theorem 3.5.3. Proof of Theorem 3.5.3. We have shown in Step 2 that the sequence (C˜ δ,n )n∈N is relatively compact, and that any limit point ζ δ of C˜ δ,n is a solution of the (Aδ , D(Aδ ))-martingale problem. This proves existence of a solution for all δ > 0. Furthermore, by Proposition 3.6.7, this martingale problem has only one solution. That is, the (Aδ , D(Aδ ))-martingale problem is well-posed and if ζ δ is its unique solution (3.66) holds. Finally, it follows from Theorem 4.4.2 in [EK86] that ζ δ has the Markov property. ¤

CHAPTER 4

Examples of limit trees II: Coalescent trees
In this chapter we apply the theory developed in Chapters 1 and 2 to illustrate how certain classes of Λ-coalescents can be associated with random metric measure spaces. The Λ-coalescents were introduced in [Pit99] (see also [Sag99]) and have since been the subject of many papers (see, for example, [MS01, BG05, BBC+ 05, LS06, BBS06]). They appear as the duals of population models whose evolution is based on resampling. The fact that Λ-coalescents allow for multiple collisions is reflected in a possibly infinite variance of the resampling offspring distribution. Moreover, Λ-coalescents are, up to a time change, dual to the process of relative frequencies of families of a Galton-Watson process with possibly infinite variance offspring distribution (compare [BBC+ 05]).
In Section 4.1 we start with a characterization of the class of Λ-coalescents which can be described by an ultra-metric measure space.
The spatially structured (Kingman) coalescent, or, for short, the spatial coalescent, on a class of Abelian groups was introduced in [GLW05] and extended to the spatially structured Λ-coalescents in [LS06]. Spatial(ly structured) coalescents are of interest since they appear, of course, as the duals of spatially interacting resampling models where interaction is due to additional migration. In Section 4.2 we construct the spatially structured Λ-coalescent trees spanned by partition elements initially located in a finite box.
In Section 4.3 we then restrict to spatially structured Λ-coalescents on Z^d, where d ≥ 3, which come down from infinity. As shown in [LS06] these coalescents then automatically come down in a uniform way, which implies convergence of the linearly rescaled spatially structured Λ-coalescents on large tori towards the (non-spatial) Kingman coalescent tree as the side length of the torus goes to infinity.
In Section 4.4 we focus on Z^2 and further restrict to the spatially structured Kingman coalescent. Since d = 2 is the critical dimension in the recurrence versus transience dichotomy of the interaction between non-spatial coalescents (which is migration of partition elements), we here encounter the so-called diffusive clustering regime. That is, the partition elements at time t have collected initial partition elements whose initial locations cover areas of side length t^{α/2}, for a random α ∈ (0, 1]. We will show that the spatially structured Kingman coalescent trees spanned by partition elements initially in a box of side length t^{α/2}, for a fixed α ∈ (0, 1], can be


(non-linearly) rescaled to converge to a non-spatial Kingman coalescent tree on a logarithmic scale, as t → ∞. 4.1. Λ-coalescent measure trees In this section we characterize the class of Λ-coalescents which can be described by a metric measure space. We start with a quick description of Λ-coalescents. Recall that a partition of a set S is a collection {Aλ } of pairwise disjoint subsets of S, also called blocks, such that S = ∪λ Aλ . Denote by S∞ the collection of partitions of N := {1, 2, 3, ...}, and for all n ∈ N, by Sn the collection of partitions of {1, 2, 3, ..., n}. Each partition P ∈ S∞ defines an equivalence relation ∼P by i ∼P j if and only if there exists a partition element π ∈ P with i, j ∈ π. Write ρn for the restriction map from S∞ to Sn . We say that a sequence (Pk )k∈N converges in S∞ if for all n ∈ N, the sequence (ρn Pk )k∈N converges in Sn equipped with the discrete topology. We are looking for a strong Markov process defined as follows. Definition 4.1.1 (The Λ-coalescent). The Λ-coalescent is a strong Markov process ξ starting in P0 ∈ S∞ such that for all n ∈ N, the restricted process ξn := ρn ◦ ξ is an Sn -valued Markov chain which starts in ρn P0 ∈ Sn , and given that ξn (t) has b blocks, each k-tuple of blocks of Sn is merging to form a single block at rate λb,k . Pitman [Pit99] showed that such a process exists and is unique (in law) if and only if Z 1 (4.1) λb,k := Λ(dx) xk−2 (1 − x)b−k , 0

for some non-negative and finite measure Λ on the Borel subsets of [0, 1]. Let therefore Λ be a non-negative finite measure on B([0, 1]) and P ∈ S_∞. We denote by P^{Λ,P} the probability distribution governing ξ with ξ(0) = P on the space of cadlag paths with the Skorohod topology.
Example 4.1.2. If we choose
(4.2)  P_0 := { {1}, {2}, ... },
and Λ = δ_0 or Λ(dx) = dx, then P^{Λ,P_0} is the Kingman and the Bolthausen-Sznitman coalescent, respectively. ¤
For each non-negative and finite measure Λ, all initial partitions P ∈ S_∞ and P^{Λ,P}-almost all ξ, there is a (random) metric r^ξ on N defined by
(4.3)  r^ξ(i, j) := inf{ t ≥ 0 : i ∼_{ξ(t)} j }.
That is, for a realization ξ of the Λ-coalescent, r^ξ(i, j) is the time i and j need to coalesce. Notice that r^ξ is an ultra-metric on N, almost surely.
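The restricted coalescents and the induced ultra-metric (4.3) are easy to simulate directly from the rates (4.1). The Python sketch below is an illustration of ours, not part of the construction: it runs the coalescent on {1, ..., n} for a user-supplied rate function λ_{b,k} and records all pairwise coalescence times; lambda_kingman and lambda_bs implement the two choices of Example 4.1.2, using the Beta-integral value of (4.1) for Λ(dx) = dx.

```python
import math
import random

def lambda_kingman(b, k):                       # Lambda = delta_0
    return 1.0 if k == 2 else 0.0

def lambda_bs(b, k):                            # Lambda(dx) = dx: Beta(k-1, b-k+1)
    return math.exp(math.lgamma(k - 1) + math.lgamma(b - k + 1) - math.lgamma(b))

def simulate_n_coalescent(n, lam, seed=0):
    """Lambda-coalescent restricted to {1,...,n}; returns pairwise coalescence times."""
    rng = random.Random(seed)
    blocks = [{i} for i in range(n)]
    r = [[0.0] * n for _ in range(n)]
    t = 0.0
    while len(blocks) > 1:
        b = len(blocks)
        rates = [math.comb(b, k) * lam(b, k) for k in range(2, b + 1)]
        t += rng.expovariate(sum(rates))
        k = rng.choices(range(2, b + 1), weights=rates)[0]
        chosen = rng.sample(range(b), k)
        for i in chosen:                         # record coalescence times across blocks
            for j in chosen:
                if i < j:
                    for a in blocks[i]:
                        for c in blocks[j]:
                            r[a][c] = r[c][a] = t
        merged = set().union(*(blocks[i] for i in chosen))
        blocks = [blk for i, blk in enumerate(blocks) if i not in chosen] + [merged]
    return r

r = simulate_n_coalescent(6, lambda_kingman)
print("r(1,2) =", r[0][1], "  r(1,6) =", r[0][5])
```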


For all such ξ, let (U^ξ, r^ξ) denote the completion of (N, r^ξ). Clearly, the extension of r^ξ to U^ξ is also an ultra-metric. Recall from Example 1.5.8 that ultra-metric spaces are associated with R-trees. The main goal of this section is to introduce the Λ-coalescent measure trees as metric spaces (U^ξ, r^ξ) equipped with the "uniform distribution". Notice that since the Kingman coalescent is known to "come down immediately to finitely many partition elements" the corresponding metric space is almost surely compact ([Eva00a]). Even though there is no abstract concept of the "uniform distribution" on compact spaces, the reader may find it not surprising that in particular examples one can easily make sense of this notion by approximation. We will see that, for Λ-coalescents, under an additional assumption on Λ, one can extend this strategy to locally compact metric spaces. Within this class falls, for example, the Bolthausen-Sznitman coalescent, which is known to have infinitely many partition elements for all times, and whose corresponding metric space is therefore not compact.
Define H_n to be the map which takes a realization of the S_∞-valued coalescent and maps it to (an isometry class of) a metric measure space as follows:
(4.4)  H_n : ξ ↦ ( U^ξ, r^ξ, µ^ξ_n := (1/n) Σ_{i=1}^n δ_i ).

Put then, for given P_0 ∈ S_∞,
(4.5)  Q^{Λ,n} := (H_n)_* P^{Λ,P_0}.

Next we give the characterization of existence and uniqueness of the Λ-coalescent measure tree.
Theorem 4.1.3 (The Λ-coalescent measure tree). The family {Q^{Λ,n}; n ∈ N} converges in the weak topology with respect to the Gromov-weak topology if and only if
(4.6)  ∫_0^1 Λ(dx) x^{-1} = ∞.
Remark 4.1.4 ("Dust-free" property). By exchangeability and the de Finetti Theorem, the family {f̃(π); π ∈ ξ(t)} of frequencies
(4.7)  f̃(π) := lim_{n→∞} (1/n) #{ j ∈ {1, ..., n} : j ∈ π }
exists for P^{Λ,P_0}-almost all π ∈ ξ(t) and all t > 0. Define f := (f(π); π ∈ ξ(t)) to be the ranked rearrangement of {f̃(π); π ∈ ξ(t)}, meaning that the entries of the vector f are non-increasing. Let P^{Λ,P_0} denote the probability distribution of f. Call the frequencies f proper if Σ_{i≥1} f(π_i) = 1. By Theorem 8 in [Pit99], the Λ-coalescent has in the limit n → ∞ proper frequencies if and only if Condition (4.6) holds. According to Kingman's correspondence (see, for example, Theorem 14 in [Pit99]), the distribution P^{Λ,P_0} of the coalescent and the distribution of f determine each other uniquely.


For P ∈ S_∞ and i ∈ N, let
(4.8)  P^i := { j ∈ N : i ∼_P j }
denote the partition element in P which contains i. Then the following are equivalent:
(a) Condition (4.6) holds.
(b) For all t > 0,
(4.9)  P^{Λ,P_0}{ f̃(ξ(t)^1) = 0 } = 0.
The latter is often referred to as the "dust-free" property.
(c) The total coalescence rate of a given {i} ∈ P_0 is infinite (compare with the proof of Lemma 25 in [Pit99]). ¤

© ª 1 ≤ PΛ,P0 µξn (Bt (1)) ≤ δ . t

¡ ¢ n→∞ By the de Finetti theorem, µξn (Bt (1)) −−−→ f˜ (ξ(t))1 , PΛ,P0 -almost surely. Hence, dominated convergence yields © ª © ª 1 lim lim QΛ,n vδ (Hn (ξ)) ≥ t ≤ lim PΛ,P0 f˜((ξ(t))1 ) ≤ δ δ→0 n→∞ δ→0 t (4.11) © ª 1 = PΛ,P0 f˜((ξ(t))1 ) = 0 . t We have therefore shown that Condition (4.6) implies that a limit of QΛ,n exists. Assume to the contrary, that the dust-free property does not hold. Then for all t > 0, limn→∞ µξ (U ξ ) < 1, by Remark 4.1.4(c). Uniqueness of the limit points follows from the projective property, i.e. restricting the observation to a tagged subset of initial individuals is the same as starting in this restricted initial state. ¤

4.2. SPATIALLY STRUCTURED Λ-COALESCENT TREES

115

We conclude this section by characterizing the measure Λ for which the Λ-coalescent measure tree is compact. Proposition 4.1.6 (Compact Λ-coalescent measure trees). The Λcoalescent measure tree is compact if and only if µ ¶ ´−1 X∞ ³ Xb b (4.12) (k − 1) λb,k < ∞. b=2 k=2 k Remark 4.1.7 (Coming down from infinity). Notice that Condition (4.12) is equivalent to the property that the Λ-coalescent comes down from infinity in infinitesimal small time, i.e., (4.13)

#ξt < ∞,

PΛ,P0 -almost surely,

for all t > 0 (see, Theorem 1 in [Sch00b]).

¤

Proof of Proposition 4.1.6. Since for ultra-metric spaces, balls do either agree or are disjoint, the number of partition elements #ξε at time ε > 0 equals the number of ε-balls one needs to cover the ultra-metric space representing the Λ-coalescent tree. Hence the claim is an immediate consequence of Remark 4.1.7. ¤ 4.2. Spatially structured Λ-coalescent trees The main goal of this section is to introduce the spatially structured Λ-coalescent measure tree as the random metric space (U (C,L) , r(C,L) ) associated with the Λ-coalescent and equipped with the “uniform distribution” on partition elements with a mark in a fixed finite subset. Let G be an Abelian group: The basic ingredient for our processes is a random walk (RW) on a(n at most) countable Abelian group G. Let a(·, ·) be a recurrent random walk kernel on G, i.e., (4.14) for all x, y ∈ G, and (4.15)

a(x, y) = a(0, y − x), X

a(n) (0, 0) = ∞.

n∈N

Assume in addition that a(·, ·) is aperiodic and irreducible, and denote its rate 1 continuous time transition kernel by X tn e−t (4.16) at (x, y) := a(n) (x, y) , x, y ∈ G. n! n∈N

for all x, y ∈ G where a(n) (x, y) is the n-step transition probability. Assume in addition that at (x, y) is aperiodic and irreducible. There is a standard way to construct particle systems that possibly (if G is countable infinite) start in configurations with countably many particles.

116

4. EXAMPLES OF LIMIT TREES II: COALESCENT TREES

Fix a finite measure α on G with α{x} > 0, for all x ∈ G, and such that there exists a constant Γ with X a(x, y)α{y} ≤ Γ · α{x}, (4.17) y∈G

for all x ∈ G. Denote by N (G) the set of all locally finite N-valued measures on G. Let then E = Eα be the Liggett-Spitzer space (corresponding to α), i.e., X © ª (4.18) E := η ∈ N (G) : η{x}α{x} < ∞ . x∈G

Remark 4.2.1. Let {(Xti )t≥0 : i ∈ I} be a countable P collection of independent random walks, and put for all t ≥ 0, ηt := i∈I δXti ∈ N (G). Notice that if (4.19)

η0 ∈ E,

a.s.,

then an easy calculation shows that the process (e−Γt a super-martingale. In particular, under (4.19), © ª (4.20) P ηt ∈ E, ∀t ∈ [0, ∞) = 1.

P i∈I

α({Xti }))t≥0 is

Note that this implies ηt ∈ N (G), for all t ≥ 0, almost surely.

¤

To combine migration and coalescence, fix a countable index set I. Notice then that from any P ∈ ΠI one can form a marked partition © ª (4.21) P G := (π, L(π)); π ∈ P , by assigning each partition element π ∈ P, its mark L(π) ∈ G. Denote the space of marked partitions by ΠI,G .

(4.22) For all I 0 ⊆ I, write (4.23)

0

ρI 0 : ΠI,G → ΠI ,G

for the restriction map. In this way, for all P ∈ ΠI,G and i01 , i02 ∈ I 0 , (4.24)

i01 ∼ρI 0 P i02

if and only if i01 ∼P i02 .

Definition 4.2.2 (Topology for marked partitions). We say that a sequence (Pn )n∈N converges in ΠI,G if and only if for all finite subsets I 0 ⊆ I, 0 the sequence (ρI 0 Pn )n∈N converges in ΠI ,G equipped with the discrete topology. Let Λ be a non-negative finite measure on B([0, 1]). Recall the Λcoalescent from Definition 4.1.1. We next define the spatially structured Λ-coalescent.

4.2. SPATIALLY STRUCTURED Λ-COALESCENT TREES

117

Definition 4.2.3 (The spatially structured Λ-coalescent). The spatially structured Λ-coalescent, or short the spatial Λ-coalescent, (4.25)

(C, L) := (Ct , Lt )t≥0 ,

is a strong ΠI,G -valued Markov process with c` adl` ag paths such that for all subsets I 0 ⊆ I with X (4.26) δL0 (π) ∈ E, π∈C0

0 (C, L)I

0

the restricted process = ρI 0 ◦ (C, L) is a ΠI ,G -valued strong Markov particle system which undergoes the following two independent mechanisms: • Migration The marks of the partition elements perform independently continuous time rate 1 random walks with kernel a(·, ·). • Coalescence The partition built by restricting to elements with the same mark perform a Λ-coalescent. Example 4.2.4 (Spatially structured Kingman-coalescent). If Λ = δ0 then the spatial Λ-coalescent is referred to as the spatial Kingman-coalescent, or even shorter the spatial coalescent. ¤ Remark 4.2.5 (Spatially structured Λ-coalescent is well-defined). Existence of the Λ-coalescent for finite groups G is shown in Proposition 3.4 in [GLW05] and Theorem 1 in [LS06]. One can extend the construction to infinite graphs with infinite initial configuration can be done via approximation of G by finite subgroups. This, of course, requires an extra condition that the group G can be suitably approximated by sub-groups (compare, for example, Condition 6.1 in [GLW05]), which is satisfied if G = Zd , d ∈ N, for example. ¤ In the following we are interested in the spatial Λ-coalescent which starts with locally infinitely many particles at each site in given finite subset G0 ⊆ G. We will choose indices and marks randomly as follows: • sample a Poisson field N on G × [0, ∞) with intensity n ⊗ λ, where n is counting measure on G, and λ is Lebesgue measure on [0, ∞), • put the random index set © ª (4.27) I := i ∈ [0, ∞) : N (G × {i}) > 0 , • and let for i ∈ I, xi be the unique mark in G such that N ((xi , i)) > 0.

(4.28) Let then (4.29)

© ª P0G := ({i}, xi ); i ∈ I ∈ ΠI,G . G

We denote by PΛ,P0 the probability distribution governing (C, L) with (C0 , L0 ) = P0G on the space of cadlag paths with the Skorohod topology.


For each non-negative and finite measure Λ and PΛ,P0 -almost all (C, L), there is a (random) metric r(C,L) on I defined by ¡ ¢ © ª (4.30) r(C,L) i, j := inf t ≥ 0 : i ∼Ct j . That ¡ is, ¢ for a realization (C, L) of the spatially structured Λ-coalescent, r(C,L) i, j is the time it needs i and j to coalesce. Notice that r(C,L) is an ultra-metric on I, almost surely. We next want to construct the spatially structured Λ-coalescent measure trees as the metric space (U (C,L) , r(C,L) ) equipped with the “uniform distribution” on partition elements with a mark in a fixed finite subset G0 ⊆ G. Define for each ρ ∈ (0, ∞), Hρ to be the map which takes a realization of the ΠI,G -valued coalescent and a finite subset G0 ⊆ G and maps it to (an isometry class of) a metric measure space as follows: (4.31) ´ ³ X 0 , G0 . HρG : (C, L) 7→ U (C,L) , r(C,L) , µ(C,L) := ρ1 δ i ρ 0 i∈I∩[0,ρ],L({i})∈G

Put then (4.32)

¡ 0 0¢ G QΛ,ρ,G := HρG ∗ PΛ,P0 .

The following result generalizes the characterization of existence and uniqueness of (non-spatial) Λ-coalescent trees to the spatially structured Λcoalescent measure tree spanned by the partition elements with marks in G0 . Recall the “dust-free” property from (4.6) (see also Remark 4.1.4). Theorem 4.2.6 (The spatial(ly structured) Λ-coalescent measure tree). 0 The family {QΛ,ρ,G ; ρ ∈ (0, ∞)} converges in the weak topology with respect to the Gromov-weak topology if and only if the “dust-free”-property holds. (C,L),G0

Remark 4.2.7. We want to point out that the measures µρ , for ρ ∈ (0, ∞), are not probability measures. However in analogy to the weak topology on the space of finite measures we can define the Gromov-weak topology on the space of metric (not necessarily probability) measure spaces by requiring in addition to convergence of all bounded continuous monomials of degree n ≥ 2 that also the total masses (which can be considered as the monomial of degree 1 and with the constant 1 playing the role of a test function) converge. By the law of large number µρ (I) → 1, as ρ → ∞. Hence by Remark 2.4.3(ii) pre-compactness characterizations are essentially the same as for probability measures and in the following we therefore ignore (C,L),G0 the fact that µρ is only “almost” a probability measure. ¤ Definition 4.2.8 (The Λ-coalescent measure tree). Assume that the dust-free property holds. The spatially structured Λ-coalescent measure tree 0 0 QΛ,G is the limit of the family {QΛ,ρ,G ; ρ ∈ (0, ∞)}.


Proof of Theorem 4.2.6. For existence we will once more apply the characterization of tightness as given in Theorem 2.8.2, and verify the two conditions. (i) Given i, j ∈ I and P ∈ ΠI,G , recall from (4.8) that P i denotes the partition element which contains i and write: • τ i,j := inf{s ≥ 0 : i ∼Cs j} for the time the initial partition elements {i} and {j} need to coalesce (in particular, r(C,L) (i, j) = τ i,j ), • σ i,j := inf{s ≥ 0 : Ls (Csi ) = Ls (Csj )} for the first time the partition elements containing i and j share the same mark (in particular, σ i,j ≤ τ i,j ), • σ ˜ x,y for the hitting time of two random walks started in x and y (in particular, σ i,j equals in distribution σ ˜ L0 ({i}),L0 ({j}) ), • G for a shifted geometric random variable with success probability λ2,2 i,j and σ i,j , and 2+λ2,2 independent of τ • {τk ; k ∈ N} for a family of independent random variables which are all distributed as the length of the (almost surely finite) excursion away from 0 and independent of τ i,j , σ i,j and G. i,j Then to above by PG it is clear, that τ can be stochastically bounded i,j σ + k=0 τk , which is almost surely finite. Moreover, σ i,j is stochastically bounded to above by maxx,y∈G σ ˜ x,y which again is almost surely finite. © PG ª ε cε Fix ε > ©0. Then we can choose cε such that P ≤ 2 and k=0 τk ≥ 2 ª maxx,y∈G0 P σ ˜ x,y ≥ c2ε ≤ 2ε . Therefore by definition, for all ρ ∈ (0, ∞), ¤ 0£ QΛ,ρ,G wX [cε , ∞) ¤ G£ ≤ PΛ,P0 µ(C,L) {(i, j) : τ i,j ≥ cε } ρ (4.33)

≤P

Λ,P0G

G ©X £ (C,L) cε ª cε ¤ i,j τk ≥ µρ {(i, j) : σ ≥ } + P 2 2 k=0

© x,y © cε ª ≤ sup P σ ˜ ≥ +P 2 x,y∈G0

G X k=0

τk ≥

cε ª 2

≤ ε, 0

which proves the tightness of the family {QΛ,ρ,G [wX ]; ρ ∈ (0, ∞)}. (ii) Fix ε > 0. Given a realization (C, L) denote by I ε the set of all indices of initial partition elements which have not moved during the time interval [0, − log (1 − ε)). In particular, by the strong law of large numbers, (4.34)

lim µ(C,L),G (I ε ) = ε, ρ

ρ→∞

PΛ,P0 -almost surely. Write for all x ∈ G0 , © ª (4.35) I ρ,x\ε := i ∈ supp(µ(C,L) ) \ I ε : L0 ({i}) = x . ρ


G

ε

Notice that under PΛ,P0 the process (C x\ 2 , Lx\ 2 ) obtained by restrictε ing (C, L) to I ρ,x\ 2 is a (non-spatial) Λ-coalescent during the time period (0, − log (1 − 2ε )) ⊃ (0, 2ε ), for all x ∈ G0 . Hence it follows from Theorem 4.2.6 together with Theorem 2.8.2 that the dust-free property is equivalent to the existence of a δε ∈ (0, 1) (for any ε > 0) such that h ¡ ¢i ε ε ε 0 (4.36) max sup QΛ,ρ vδε HρG ((C ρ,x\ 2 , Lρ,x\ 2 )) ≤ . x∈G ρ∈(0,∞) 2 Then for all ρ ∈ (0, ∞), (4.37)h ¡ ¢i 0 QΛ,ρ vδε HρG ((C, L)) h n oi ª 0© 0 0 ¯ 0 = QΛ,ρ inf ε0 > 0 : µρ(C,L),G u ∈ U (C,L) : µ(C,L),G ( B (x)) ≤ δ ≤ ε ε ε ρ h n ¢ ¡ ε 0 ≤ QΛ,ρ inf ε0 > 0 : µρ(C,L),G I 2 ≤ ε0 ; oi ª 0© ρ,x\ 2ε (C,L),G0 ¯ 0 0 0 µ(C,L),G u ∈ I : µ ( B (x)) ≤ δ ≤ ε , ∀x ∈ G ε ε ρ ρ ε ≤ , 2 0

by (4.35) and (4.36). Hence the sequence QΛ,ρ,G [vδ (X )] converges to zero as δ tends to zero uniformly in ρ ∈ (0, ∞) if and only if the dust-free property holds. Summarizing (i) and (ii), we have shown that sequence (QΛ,ρ )ρ∈(0,∞) has a limit if and only if the dust-free property holds. Uniqueness of the limit points follows as in the proof of Theorem 4.1.3 from the projective property, i.e. restricting the observation to a tagged subset of initial individuals is the same as starting in this restricted initial state. ¤ 4.3. Scaling limit of spatial Λ-coalescent trees on Zd , d ≥ 3 In this section we further restrict the setting in the following way: • The graph G (as well as the subgraph G0 ) is the d-dimensional torus (4.38)

GN := [−N, N ]d ∩ Zd ⊆ Zd ,

where d ≥ 3 is fixed. • The migration is the corresponding random walk on the torus with periodic boundary conditions, i.e., given the migration kernel a(x, y) from (4.14) and (4.15), we consider the random walk kernel X (4.39) aN (x, y) = a(x, z), x, y ∈ GN , z∼y

where here ∼ denotes equivalence modulo 2N +1 in each coordinate.

4.3. SCALING LIMIT OF SPATIAL Λ-COALESCENT TREES ON Zd , d ≥ 3

(4.40)

121

We restrict our attention to random walk kernels with X¡ ¢ a(0, x) + a(x, 0) k x k2+d < ∞. x∈Zd

• We assume that Λ is a finite measure on B([0, 1]) which satisfies Condition (4.12), i.e., (4.41)

the Λ-coalescent comes down from infinity. In particular, λ2,2 > 0.

Remark 4.3.1 (Coming down from infinity in the spatial setting). Recall from Remark 4.1.7 that Condition (4.12) is equivalent to the property that the corresponding non-spatial Λ-coalescent comes down from infinity. This is also true in the spatial setting. Proposition 11 in [LS06] even states that the spatially structured Λ-coalescent comes down from infinity in an uniform way, i.e., if © ª (4.42) T∞ := inf t ≥ 0 : #Ct = 1 then the following are equivalent: (a) Condition (4.12) holds. GN

(b) PΛ,P0 [T∞ ] < ∞. (c) QΛ,GN ∈ M1 (Uc ).

¤

We are concerned with the convergence of the suitably rescaled family of spatially structured Λ-coalescent trees on GN towards the (non-spatial) Kingman-coalescent measure tree. As it will turn out the interesting time scale is given by the order of time which two typical partition elements need to first hit to then be able to coalesce (with a delay). It is known that in d ≥ 3 two random walk particles on a large torus which started at randomly sampled points meet after time of the order of the volume. Therefore in the present set-up the time scale is given by the map defined by (4.43)

βN : t 7→ (2N + 1)d t,

t ∈ [0, ∞).

Recall that under the above assumptions, a random walk on Zd , d ≥ 3, with kernel a(·, ·) is transient. In particular, X (4.44) g := a(k) (0, 0) < ∞. k∈N

In more detail the mean time until two partition elements interact is typically of order κ times the volume, where ¡ g ¢−1 (4.45) κ := λ−1 . 2,2 + 2

122

4. EXAMPLES OF LIMIT TREES II: COALESCENT TREES

Remark 4.3.2 (Probabilistic interpretation of κ). Let I := {1, 2}, and (C, L) the spatially structured Kingman coalescent (with rate λ2,2 ) started in C0 = {{1}, {2}} with L0 ({1}) = L0 ({2}) = 0. Then © ª (4.46) κ = λ2,2 · Pλ2,2 δ0 ,(C0 ,L0 ) #C∞ = 1 . ¤ Recall the law Qκδ0 of the (rate κ) Kingman coalescent tree from Definition 4.1.5 and the law QΛ,GN from Definition 4.2.8. Put ¡ ¢ e Λ,N := βN QΛ,GN , (4.47) Q ∗

where, as usual, multiplication of a metric measure space (X, r, µ) by a factor c gives the metric measure space (X, cr, µ). Theorem 4.3.3 (Convergence to the Kingman measure tree). (4.48)

e Λ,N =⇒ Qκδ0 . Q N →∞

Before we can prove Theorem 4.3.3 we need two preparatory results. The first proposition states that the suitably rescaled spatially structured Λ-coalescent started with finitely many partition elements within mutual distance of the order of the side length of the torus converge to the (non-spatial) Kingman coalescent. The following result is Proposition 14 in [LS06]. The statement generalizes Lemma 7.3 in [GLW05] in two directions. First it extends the consideration of spatially structured Kingman coalescents to spatially structured Λ-coalescents which come down from infinity. Secondly, the statement in [GLW05] is phrased for marginals. Proposition 4.3.4. Fix n ∈ N and a sequence (aN )N ∈N which tends to ∞ slowly enough that aNN → 0, as N → ∞. Let (C0 , L0 ) ∈ Π{1,2,...,n},GN be such that C0 = {{1}, ..., {n}} and √ (4.49) k L0 ({i}) − L0 ({j}) k∈ [aN , dN ], for all 1 ≤ i 6= j ≤ n. Then ´ ¡ (4.50) CβN (t)

¡ ¢ =⇒ ρn Kκt t≥0 ,

t≥0 N →∞

where (Kt )t≥0 is the Kingman coalescent (with index set N). Remark 4.3.5 (Heuristics on Proposition 4.3.4). The scale factor κ can be explained as follows. Restrict on two typical initial partition elements on the (large) torus GN . As long as they don’t feel the boundary they try to to coalesce as they would do on Zd . If they do not succeed they start wrapping around the torus. This probability is given by ¢−1 © ª ¡ g (4.51) Pλ2,2 δ0 ,(C0 ,L0 ) #C∞ = 1 = λ2,2 + 1 . 2

4.3. SCALING LIMIT OF SPATIAL Λ-COALESCENT TREES ON Zd , d ≥ 3

123

Given that our two partition elements have not coalesced without wrapping, they will soon after wrapping forget all information about their initial spatial distance and become uniformly distributed on the torus. They now need time of the order of the volume to meet and possibly coalesce. The time intervals during which the particles are within distance O(1) are very short when compared to the time intervals until wrapping around. This means that the two elements under focus are now trying to coalesce on time scale βN (·) at rate λ2,2 according to the Kingman coalescent. If we restrict to initially a finite number of initial partition elements they will soon get distributed uniformly on the torus. Since on time scale βN (·) we won’t find more than two partition elements at the same site, also an ensemble of finitely many elements will perform on time scale βN (·) a rate λ2,2 Kingman coalescent. ¤ We will apply Proposition 4.3.4 to obtain “f.d.d.”-convergence in Theorem 4.3.3. Corollary 4.3.6 (“F.d.d.”-convergence). For all polynomials Φ ∈ Π, £ ¤ £ ¤ e Λ,N Φ =⇒ Qκδ0 Φ . (4.52) Q N →∞

Proof. Fix a monomial Φ with degree n ∈ N and test function φ ∈ n ¯ C((R+ )( 2 ) ). Given ρ ∈ (0, ∞) and (C, L) ∈ ΠI∩(0,ρ],GN , denote by (4.53) J ρ,(C,L) © ª := (i1 , ..., in ) ∈ (I ∩ (0, ρ])n : {L0 ({i1 }), ..., L0 ({in })} satisfies (4.49) . Then by Proposition 4.3.4, for all ρ ∈ (0, ∞), h X ¡ ³¡ r(C,L),GN (i , i ) ¢ ´i ¢ ρ l k Λ,ρ,GN (C,L),GN ⊗n Q µρ {i} φ 1≤l 2n+2 η −1 PΛ,GN [T∞ ] ≤ 2−n , 4 by the Markov inequality. Moreover, for all realization (C, L) and ε > 0 recall from (4.34) the set I ε of all indices of initial partition elements which have not moved during the time interval [0, − log (1 − ε)). In particular, since particle jump at rate 1, ¡ ¢ (4.59) µρ(C,L),GN I ε = 1 − ε. Therefore by Proposition 4.3.7, (4.60) ¤ © PΛ,GN [#ρI 2−n C− log (1−2−n ) ] ª £ PΛ,GN #ρI 2−n CβN (2−n ) ≤ cd max 1, βN (2−n ) + log (1 − 2−n ) ª © PΛ [C− log (1−2−n ) ] = cd max 1, #GN −n −n βN (2 ) + log (1 − 2 ) −→ 2n · cd · PΛ [#C− log (1−2−n ) ],

N →∞

and hence PΛ,GN [#ρI 2−n CβN (2−n ) ] ≤ 2n+1 · cd · PΛ [#C− log (1−2−n ) ].

(4.61)

for all n ∈ N and N sufficiently large. Therefore ª © PΛ,GN #ρI 2−n CβN (2−n ) > 8 · 4n · η −1 · cd · PΛ [#C− log (1−2−n ) ] (4.62)

PΛ,GN [#ρI 2−n CβN (2−n ) ] 8 · 4n · cd · PΛ [#C− log (1−2−n ) ] η ≤ 2−n . 4 ≤


Put for each n ∈ N and η > 0, © 4 · 2n Λ,G ª 8 · 4n (4.63) Kn,η := max P N [T∞ ]; · cd · PΛ [C− log (1−2−n ) ] , η η and let then \© Γη := (U, r, µ) ∈ U : ∃ Un ⊆ U such that µ(Un ) ≥ 1 − 2−n ; n∈N

(4.64)

diam(Un ) ≤ Kn,η ;

ª Un can be covered by Kn,η 2−n -balls . Notice that Γη is compact by Theorem 2.4.1(c). Moreover, by (4.58), (4.59) and (4.62), (4.65) ¡ ¢ e Λ,N Γη Q ³\© ª´ = PΛ,GN ∃I 0 ⊆ I : #ρI 0 CβN (2−n ) ≤ Kn,η ; #ρI 0 CβN (Kn,η ) = 1 n∈N

≥P

Λ,GN

³\© ª´ #ρI 2−n CβN (2−n ) ≤ Kn,η ; #ρI 2−n CβN (Kn,η ) = 1 n∈N

X³ ª © ≥1− PΛ,GN #ρI 2−n CβN (2−n ) > Kn,η + n∈N

© ª´ + PΛ,GN #ρI 2−n CβN (Kn,η ) > 1 ´ X ¡η η 2−n + 2−n ≥1− 4 4 n∈N

= 1 − η, which establishes (4.57) and therefore finishes the proof. Uniqueness of the limit follows from Corollary 4.3.6 together with Proposition 2.1.8. ¤ 4.4. Scaling limit of spatial Kingman coalescent trees on Z2 In this section we restrict the setting of Section 4.2 in the following way: • The graph G is the two-dimensional lattice Z2 . • Fix α ∈ (0, 1], and consider for t > 0, £ α α ¤2 (4.66) Gαt := − t 2 , t 2 ∩ Z2 ⊂ G. • Recall the migration kernel a(x, y) from (4.14) and (4.15) and assume that a(·, ·) has finite exponential moments, i.e., X (4.67) eλ1 z1 +λ2 z2 a(0, z) < ∞, z1 ,z2 ∈Z2

for all λ1 , λ2 ∈ R.


• Choose the coalescent to be the Kingman coalescent with rate γ, i.e., (4.68)

Λ(dx) := γδ0 .

We are concerned with the convergence of the suitably rescaled family of spatially structured Λ-coalescent trees on G^α_t towards the (non-spatial) Kingman-coalescent measure tree (on a logarithmic scale). The rescaling is motivated by a well-known result of Erdős and Taylor [ET60] for planar random walks with finite variance: if σ is the first hitting time of the origin by a two-dimensional random walk, then
(4.69)  lim_{t→∞} P_{x t^{α/2}}{ σ > t^β } = (α/β) ∧ 1,
for all α, β ∈ [0, 1], and all x ∈ R² \ {(0, 0)} (see, for example, Proposition 1 in [CG86]).
Remark 4.4.1. In particular, the right hand side of (4.69) does not depend on x ∈ R² \ {(0, 0)}. Due to this peculiar (specific to d = 2) property, the behavior of the spatial coalescent started in G^α_t and observed at time t^β, asymptotically as t → ∞, depends only on the logarithmic scales α and β, while all the finer distinctions are washed out. ¤
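The asymptotics (4.69) can be illustrated, very roughly, by a small Monte Carlo experiment for the simple random walk; this is our own illustration with arbitrary parameter choices, and since the convergence in (4.69) is only logarithmic in t the empirical value for moderate t matches the limit (α/β) ∧ 1 only crudely.

```python
import random

def hits_origin_within(start, max_steps, rng):
    """True if a simple random walk on Z^2 started at `start` hits (0, 0)
    within max_steps steps."""
    x, y = start
    for _ in range(max_steps):
        dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        x, y = x + dx, y + dy
        if x == 0 and y == 0:
            return True
    return False

def tail_estimate(t, alpha, beta, trials=300, seed=0):
    """Monte Carlo estimate of P_{x t^{alpha/2}}(sigma > t^beta)."""
    rng = random.Random(seed)
    start = (int(round(t ** (alpha / 2.0))), 0)
    horizon = int(round(t ** beta))
    survivals = sum(not hits_origin_within(start, horizon, rng) for _ in range(trials))
    return survivals / trials

t, alpha, beta = 4000, 0.5, 1.0
print("empirical:", tail_estimate(t, alpha, beta), "   limit (alpha/beta) and 1:", min(alpha / beta, 1.0))
```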

Define the scaling maps λ^α : [0, ∞) → [0, ∞) and τ^α_t : [0, ∞) → [0, ∞) by
(4.70)  λ^α : β ↦ log( (β/α) ∨ 1 ),
and
(4.71)  τ^α_t : r ↦ t^{α·e^r}.
In particular, since (λ^α)^{-1}(r) = α · e^r, for all r > 0, we have (τ^α_t)^{-1}(r) = λ^α(log_t(r)), and therefore (4.69) applied to y := (τ^α_t)^{-1}(t^β) = λ^α(β) (or, equivalently, β = (λ^α)^{-1}(y)) yields that
(4.72)  lim_{t→∞} P_{x t^{α/2}}{ (τ^α_t)^{-1}(σ) > y } = α / (λ^α)^{-1}(y) = e^{-y}.

In words, (τtα )−1 (σ) (under the laws Pxt ) converges as t → ∞ to a rate 1 exponentially distributed random variable (on a logarithmic time scale). Recall the law Qδ0 of the (rate 1) Kingman coalescent tree from Definiα tion 4.1.5 and the law Qγδ0 ,Gt of the (rate γ) spatially structured Kingman coalescent tree (spanned by the initial partition elements with marks in Gαt ) from Definition 4.2.8. Put ¡ ¢ e γδ0 ,t,α := τ α ◦ λα Qγδ0 ,Gαt , (4.73) Q t

and (4.74)



¡ ¢ e δ0 := λα Qδ0 , Q α ∗

4.4. SCALING LIMIT OF SPATIAL KINGMAN COALESCENT TREES ON Z2

127

where a map f : (0, ∞) → (0, 1] sends an ultra-metric measure space (X, r, µ) to the ultra-metric space (X, f ◦ r, µ) with ¡ ¢ ¡1 ¢ (4.75) f ◦ r u1 , u2 := 2 · f r(u1 , u2 ) 2 Remark 4.4.2. To understand the factors 2 and 21 , recall from Example 1.5.8 that ultra-metric spaces (U, r) can be embedded in to an R-tree (X, r) and think then for u1 , u2 ∈ U of u1 ∧ u2 as the unique point in X such that r(u1 , u1 ∧ u2 ) = r(u2 , u1 ∧ u2 ) = 21 r(u1 , u2 ). The rescaling (4.75) can then be read as rescaling the two distances r(u1 , u1 ∧ u2 ) and r(u2 , u1 ∧ u2 ) on the f -scale. ¤ Theorem 4.4.3 (Convergence to the Kingman measure tree). For all α ∈ (0, 1], (4.76)

e γδ0 ,t,α =⇒ Q e δ0 . Q α t→∞

Remark 4.4.4 (Open problem: Pathwise convergence). It is an open e γδ0 ,t,α )α∈(0,1] converge to a limit process problem to show that the family (Q e δ0 . We conjecture that the whose one-dimensional marginals are equal to Q α limit process is given by suitably rescaled tree-valued Fleming-Viot dynamics which we construct in Chapter 7. In forthcoming work [GLW] we provide the necessary tightness estimates. ¤ Before we can prove Theorem 4.4.3 we need two preparatory results. The first proposition states in analogy of Proposition 4.3.4 that the rescaled spatially structured Kingman coalescent started with finitely many partition elements within mutual distance of the order of the side length of the torus converge to the (non-spatial) Kingman coalescent on the logarithmic scale. The following result is taken from Proposition 5.1 in [GLW07]. Proposition 4.4.5 (Finite sparse coalescents: large time scales). Fix α n ∈ N, α ∈ (0, 1] and c > 0, and let (C0 , L0 ) ∈ Π{1,2,...,n},Gt be such that C0 = {{1}, ..., {n}}, and £ α¤ α (4.77) k L0 ({i}) − L0 ({j}) k∈ (c log t)−1 · t 2 , (c log t) · t 2 , for all 1 ≤ i 6= j ≤ n. Then ¡ ¡ ¢ ¢ (4.78) C(τtα ◦λα )(β) β∈[α,∞) =⇒ ρn Kλα (β) β∈[α,∞) , t→∞

where (Kt )t≥0 is the Kingman coalescent (with index set N). As done before we will again apply Proposition 4.4.5 to obtain “f.d.d.”convergence in Theorem 4.4.3.

128

4. EXAMPLES OF LIMIT TREES II: COALESCENT TREES

Corollary 4.4.6 (“F.d.d.”-convergence). For all polynomials Φ ∈ Π and α ∈ (0, 1], £ ¤ £ ¤ e δ0 Φ . e γδ0 ,t,α Φ =⇒ Q (4.79) Q α t→∞

Proof. The proof follows the same line of argument as the proof of Corollary 4.3.6. ¤ We will also rely on the following tightness result which is stated in Proposition 6.1 in [GLW07]. Proposition 4.4.7 (Uniformly bounded expectation on logarithmic scale). There are finite constants M and t0 such that for all t ≥ t0 , satisfying α ∈ (0, ∞), and β ∈ (α, ∞), (4.80)

γδ0 ,Gα t

P

© £ ¤ #C(τtα ◦λα )(β) ≤ M · max

α α Pγδ0 ,Gt [#C2 ] ª , ,1 . 2(β − α) #Gαt

Proof of Theorem 4.4.3. We will proceed similarly to the proof of Theorem 4.3.3. Fix α ∈ (0, 1] and η > 0. For existence of limit points we will construct a compact set Γη ⊂ M such that now (4.81)

inf

t∈(0,∞)

e γδ0 ,t,α (Γη ) ≥ 1 − η. Q

For all realization (C, L) and ε > 0 recall I ε from (4.34). Since #ρI ε < ∞, and any two random walks on Z2 with kernel a(·, ·) satisfying Assumption (4.67) meet (and coalesce) in finite time, © ª ε (4.82) T∞ := inf s ≥ 0 : #ρI ε Cs = 1 < ∞, α

for all ε > 0, Pγδ0 ,Gt almost surely. In particular, for each n ∈ N and η, we can choose Ln,η ∈ (0, ∞) such that ª η α © 2−n (4.83) Pγδ0 ,Gt T∞ > Ln,η ≤ 2−n . 4 Moreover, by Proposition 4.4.7 applied to β = 2n α, for all n ≥ 1, α

(4.84)

P

γδ0 ,Gα t

£ ¤ © Pγδ0 ,Gt [#ρI 2−n C2 ] ª ,1 #ρI 2−n C(τtα ◦λα )(2−n ) ≤ M · max #Gαt = M · Pγδ0 [#K2 ],

for all n ∈ N and t sufficiently large. Therefore ª η 4 · 2n α© M · Pγδ0 [#K2 ] ≤ 2−n . (4.85) Pγδ0 ,Gt #ρI 2−n C(τtα ◦λα )(2−n ) > η 4

4.4. SCALING LIMIT OF SPATIAL KINGMAN COALESCENT TREES ON Z2

129

Put now for each n ∈ N and η > 0, © ª 4 · 2n (4.86) Kn,η := max Ln,η , M · Pγδ0 [#K2 ] , η and let then Γη be the compact set defined in (4.64). Moreover, by (4.59), (4.83) and (4.85), (4.87) ¡ ¢ e γδ0 ,t,α Γη Q ³\© ª´ α 2−n = Pγδ0 ,Gt ∃I 0 ⊆ I : #ρI 0 C(τtα ◦λα )(2−n ) ≤ Kn,η ; T∞ ≤ Kn,η n∈N

γδ0 ,Gα t

≥P

³\©

−n

2 #ρI 2−n C(τtα ◦λα )(2−n ) ≤ Kn,η ; T∞

≤ Kn,η

ª´

n∈N

≥1−

X³ n∈N

≥1−

ª ª´ α © 2−n α© > Kn,η Pγδ0 ,Gt #ρI 2−n C(τtα ◦λα )(2−n ) > Kn,η + Pγδ0 ,Gt T∞

´ η 2−n + 2−n 4 4

X ¡η

n∈N

= 1 − η, for all sufficiently large t, which establishes (4.81) and therefore finishes the proof. Once more, uniqueness of the limit follows from uniqueness of the finite dimensional distributions given by Corollary 4.4.6 together with Proposition 2.1.8. ¤

CHAPTER 5

Root growth and Regrafting The Aldous–Broder algorithm is a Markov chain on the space of rooted combinatorial trees with N vertices that has the uniform tree as its stationary distribution. In this chapter we construct and study the so-called root growth and regrafting dynamics which is a Markov process on the space of all rooted compact real trees that has the continuum random tree as its stationary distribution and arises as the scaling limit of the Aldous–Broder chain as N → ∞. Before we outline the chapter we give a more detailed description and motivation of the dynamics. Given an irreducible Markov matrix P with state space V , there is a natural probability measure on the collection of combinatorial trees with vertices labeled by V that assigns mass Y (5.1) C −1 P(x, y) to the tree T , where C is a normalization constant and the product is over pairs of adjacent vertices (x, y) in T ordered so that y is on the path from the root to x. For example, if P(x, y) ≡ 1/|V | for all x, y ∈ V , (so that the associated Markov chain consists of successive uniform random picks from V ), then the distribution (5.1) is uniform on the set of |V ||V |−1 rooted combinatorial trees labeled by V . The Aldous-Broder algorithm [AT89, Bro89, Ald90] is a tree-valued Markov chain that has the distribution in (5.1) as its stationary distribution. The discrete time version of the algorithm has the following transition dynamics (see the left hand side of Figure 5.1 for an illustration). • Pick a vertex υ at random according to P(ρ, ·), where ρ is the current root. • If υ = ρ, do nothing. • If υ 6= ρ: – Erase the edge connecting υ to the unique vertex adjacent to υ and on the path from ρ to υ. – Insert a new edge between υ and ρ. – Designate υ as the new root. Since this Markov chain is aperiodic, irreducible on the sub-spaces of rooted trees with a fixed number of vertices and symmetric it is clear that it converges in distribution to the uniform distribution on the space of rooted trees with a given number of vertices. 131

132

5. ROOT GROWTH AND REGRAFTING

r

ρ1 rP r ¡

¡

PP

r

r P Pu a ¡ ¡ r¡

rP r P ¡ PPP ρ2 PPr ¡ au ¡ ¡ r r¡

r u r r @ @¡ r ¡ρ1 u r @r @

r

r¡ ¡

ρ1

@r ρ2 r

ρ r 3

rP P P r

rP P P rP P

r PP PPr ¡ ¡ bu¡

r PP P

Pr PP PP r Pr ρ4

u

r ¡ r ¡ ρ1

r ρ2 ¡

r ρ3 r r¡ ¡

ρ1

r ρ2 r ρ3 r @ @r ρ4

Figure 5.1. illustrates how the Aldous-Broder chain evolve. We can, of course, rephrase the Broder-Aldous chain in a way which stresses it as an rooted tree valued process (see the right hand side of Figure 5). To do so, assume we are starting with (T, r, ρ) ∈ Troot such that T has exactly n vertices (i.e., points x ∈ T such that T \ {x} does either not get disconnected or gets disconnected into more than 2 components). • Pick a vertex υ ∈ T at random according to the uniform distribution on the set of vertices. • If υ = ρ, do nothing. • If υ 6= ρ: – Prune off the sub-tree above υ, erase the edge which is adjacent to υ and lies on [ρ, υ]. – Insert a new edge adjacent to ρ and pointing away from the rest of the tree. Let its end be the new root. – Regraft the pruned sub-tree by gluing together the new root with υ.

5. ROOT GROWTH AND REGRAFTING

133

We now interested in a suitable rescaling of these dynamics. Suggested by (3.22), we know for the one-dimensional distributions to converge, we need to rescale edge lengths by a factor √1n . To find the right rescaling for the time, recall the notion of rooted subtrees and trimmings from Section 1.9. For a fixed η > 0, we want to observe the dynamics of the trimmed subtree Rbη √1 c (T ) (assuming our eyes can only focus on things which are far enough n

from the fringy margin, the rest - still there - will not be visible for us). Letting the Broder-Aldous chain run, there are two possible scenarios. • with probability √1n µT (Rbη √1 c (T )) the picked vertex belongs to n

the visible sub-tree and we observe that the (sub)subtree above this point gets pruned off (this way losing one edge of side length 1 n ) and gets regrafted to the current root via an additional (and from now on visible) adjoining edge of side length √1n ,

• while with probability 1 − √1n µT (Rbη √1 c (T )) a for us invisible subn

tree from T \ Rbη √1 c (T ) gets regrafted to the root via an additional n

(and from now on visible) adjoining edge of side length

√1 . n

This suggests that a law of large numbers holds for √ the second ingredient of the dynamics if we let run the chain at the scale n. Moreover, if we rescale time and edge length as suggested the following limit dynamics is expected. Definition 5.0.8 (Root growth with regrafting). The root growth with regrafting dynamics is a strong Markov process X with values in Troot such that if X0 = T ∈ Troot , then for all η > 0, X η := (Rη (Xt ))t≥0 starts in Rη (T ) and evolves according to the following dynamics: • Root growth. The root grows at speed 1. η • Regrafting. At rate µX· (dx) the sub-tree above a point x ∈ X·η falls off and gets regrafted with the current root. Before we can establish such an convergence result, we need to show that the root growth with regrafting dynamics makes even sense for compact real trees with infinite total length. The plan of this chapter is therefore as follows. We construct the extended root growth with re-grafting process in Sections 5.1 and 5.2 via a procedure that is roughly analogous to building a discontinuous Markov process in Euclidean space as the solution of a stochastic differential equation with respect to a sufficiently rich Poisson noise. This approach is particularly well-suited to establishing the strong Markov property. In Section 5.3 we will rely on Aldous’s line-breaking construction to show that the root growth and regrafting dynamics converges to the Brownian CRT. We prove in Section 5.4 that the extended root growth with re-grafting process is recurrent. We verify that the extended process has a Feller semigroup in Section 5.5, and show in Section 5.6 that it is a rescaling limit of the Markov chain appearing in the Aldous–Broder algorithm

134

5. ROOT GROWTH AND REGRAFTING

for simulating a uniform rooted tree on some finite number of vertices. We devote Section 5.7 to a discussion on the Rayleigh process described above. 5.1. A deterministic construction We are now ready to begin in earnest the construction of the Troot -valued Markov process, X, having the root growth with re-grafting dynamics. Fix a tree (T, r, ρ) ∈ Troot . This tree will be the initial state of X. Following the semi-formal description, the “stochastic inputs” to the construction of X will be a collection of cut times and a corresponding collection of cut points. Based on these the strategy is as follows: • Construct simultaneously for each finite rooted subtree T ∗ ¹root T ∗ ∗ a process X T with X0T = T ∗ that evolves according to the root growth with re-grafting dynamics. • Carry out this construction in such a way that if T ∗ and T ∗∗ are ∗ ∗∗ two finite subtrees with T ∗ ¹root T ∗∗ , then XtT ¹root XtT and ∗ ∗∗ the cut points for X T are those for X T that happen to fall on ∗ ∗∗ ∗∗ XτT− for a corresponding cut time τ of X T . Cut times τ for X T ∗ for which the corresponding cut point does not fall on XτT− are not ∗ cut times for X T . • The tree (T, ρ) is a rooted Gromov-Hausdorff limit of finite R-trees with root ρ (indeed, any subtree spanned by a finite ε-net and ρ is finite and has rooted Gromov-Hausdorff distance less than ε from (T, ρ)). In particular, (T, ρ) is the “smallest” rooted compact R-tree that contains all of the finite rooted subtrees of (T, ρ). • Because of the consistent projective nature of the construction, we can define Xt := XtT for t ≥ 0 as the “smallest” element of Troot ∗ that contains XtT , for all finite trees T ∗ ¹root T . It will be convenient for establishing features of the process X such as the strong Markov property to introduce randomness later and work initially in a setting where the cut times and cut points are fixed. There are two types of cut points: those that occur at points which were present in the initial tree T and those that occur at points which were added due to subsequent root growth. Accordingly, we consider two countable subsets π0 ⊂ R++ ×T o and π ⊂ {(t, x) ∈ R++ × R++ : x ≤ t} (compare Figure 5.2). (Once again we note that we are moving backwards and forwards between thinking of T as a metric space or as an equivalence class of metric spaces. As we have written things here, we are thinking of π0 being associated with a particular class representative, but of course π0 corresponds to a similar set for any representative of the same equivalence class by mapping across using the appropriate root invariant isometry.) Assumption 5.1.1 (Nice point processes). Suppose that the sets π0 and π have the following properties.

5.1. A DETERMINISTIC CONSTRUCTION

r

To

6

135

r

r r r

r

x

6

- t

¡ ¡ r

r

¡

¡

¡

¡

r

¡

- t

Figure 5.2. illustrates the cut appearing in the initial tree (above) and in everything which will be introduced due to root growth (below). (a) For all t0 > 0, each of the sets π0 ∩({t0 }×T o ) and π ∩({t0 }×]0, t0 ]) has at most one point and at least one of these sets is empty. (b) For all t0 > 0 and all finite subtrees T 0 ⊆ T , the set π0 ∩(]0, t0 ]×T 0 ) is finite. (c) For all t0 > 0, the set π ∩ {(t, x) ∈ R++ × R++ : x ≤ t ≤ t0 } is finite. Remark 5.1.2. Conditions (a)–(c) of Assumption 5.1.1 will hold almost surely if π0 and π are realizations of Poisson point processes with respective intensities λ ⊗ µ and λ ⊗ λ (where λ is Lebesgue measure), and it is this random mechanism that we will introduce later to produce a stochastic process having the root growth with re-grafting dynamics. It will be convenient to use the notations π0 and π to also refer to the integer-valued measures that are obtained by placing a unit point mass at each point of the corresponding set. Consider a finite rooted subtree T ∗ ¹root T . It will avoid annoying circumlocutions about equivalence via root invariant isometries if we work with particular class representatives for T ∗ and T , and, moreover, suppose that T ∗ is embedded in T . ∗ Put τ0∗ := 0, and let 0 < τ1∗ < τ2∗ < . . . (the cut times for X T ) be the ∗ ++ points of {t > 0 : π0 ({t} × T ) > 0} ∪ {t > 0 : π({t} × R ) > 0} (see Figure 5.1 for an illustration). ∗

An explicit construction of XtT is then given in two steps:

136

5. ROOT GROWTH AND REGRAFTING

b

To

6

b

b b r

T∗

τ2∗ x

6

¡

- t

τ3∗

¡ ¡ r

r

¡

¡

r

¡

r

¡

τ1∗

- t

τ4∗ Figure 5.3



Step 1 (Root growth) At any time t ≥ 0, XtT as a set is given by the ∗ disjoint union T ∗ t]0, t]. The root of XtT is the point ρt := t ∈]0, t]. The ∗ ∗ ∗ metric rtT on XtT is defined inductively as follows. Set r0T to be the metric ∗ ∗ ∗ on X0T = T ∗ ; that is, r0T is the restriction of r to T ∗ . Suppose that rtT ∗ ∗ has been defined for 0 ≤ t ≤ τn∗ . Define rtT for τn∗ < t < τn+1 by  ∗  rτn∗ (a, b), if a, b ∈ XτTn∗ ,  ∗ (5.2) rtT (a, b) := |b − a|, if a, b ∈]τn∗ , t],  ∗  |a − τn∗ | + rτn∗ (ρτn∗ , b), if a ∈]τn∗ , t], b ∈ XτTn∗ . ∗

Step 2 (Re-Grafting) Note that the left-limit XτT∗ − exists in the rooted n+1 Gromov-Hausdorff metric. As a set this left-limit is the disjoint union (5.3)



∗ ∗ XτTn∗ t]τn∗ , τn+1 ] = T ∗ t]0, τn+1 ],

∗ and the corresponding metric rτn+1 − is given by a prescription similar to (5.2). ∗ Define the (n + 1)st cut point for X T by ( ∗ , a)}) > 0, a ∈ T ∗, if π0 ({(τn+1 ∗ (5.4) pn+1 := ∗ ], if π({(τ ∗ , x)}) > 0. x ∈]0, τn+1 n+1 ∗

∗ Let Sn+1 be the subtree above p∗n+1 in XτT∗

n+1 −

(5.5)



, that is,

∗ ∗ ∗ Sn+1 := {b ∈ XτTn+1 ∗ − , b[ }. − : pn+1 ∈ [ρτn+1

5.1. A DETERMINISTIC CONSTRUCTION

137

∗ Define the metric rτn+1 by

(5.6) ∗ rτn+1 (a, b)  ∗ ,  ∗ if a, b ∈ Sn+1  rτn+1 − (a, b), ∗ ∗ , ∗ if a, b ∈ XτT∗ \ Sn+1 := rτn+1 − (a, b), n+1   ∗ , b ∈ S∗ . rτ ∗ − (a, ρτ ∗ ) + rτ ∗ − (p∗n+1 , b), if a ∈ X T∗∗ \ Sn+1 n+1 τ n+1 n+1 n+1 n+1

∗ XτT∗ n+1

In other words is obtained from ∗ Sn+1 and re-attaching it to the root.

∗ XτT∗ − n+1

by pruning off the subtree

Now consider two other finite, rooted subtrees (T ∗∗ , ρ) and (T ∗∗∗ , ρ) of ∗∗ ∗∗∗ T such that T ∗ ∪ T ∗∗ ⊆ T ∗∗∗ (with induced metrics). Build X T and X T ∗ from π0 and π in the same manner as X T (but starting at T ∗∗ and T ∗∗∗ ). It is clear from the construction that: ∗ ∗∗ ∗∗∗ • XtT and XtT are rooted subtrees of XtT for all t ≥ 0, ∗ ∗∗ ∗∗∗ • the Hausdorff distance between XtT and XtT as subsets of XtT does not depend on T ∗∗∗ , ∗ ∗∗ • the Hausdorff distance is constant between jumps of X T and X T (when only root growth is occurring in both processes). ∗

The following lemma shows that the Hausdorff distance between XtT and ∗∗ ∗∗∗ XtT as subsets of XtT does not increase at jump times. Lemma 5.1.3. Let T be a finite rooted tree with root ρ and metric r, and let T 0 and T 00 be two rooted subtrees of T (both with the induced metrics and root ρ). Fix p ∈ T , and let S be the subtree in T above p (recall (5.5)). Define a new metric rˆ on T by putting   if a, b ∈ S, r(a, b), (5.7) rˆ(a, b) := r(a, b), if a, b ∈ T \ S,   r(a, p) + r(ρ, b), if a ∈ S, b ∈ T \ S. Then the sets T 0 and T 00 are also subtrees of T equipped with the induced metric rˆ, and the Hausdorff distance between T 0 and T 00 with respect to rˆ is not greater than that with respect to r. Proof. Suppose that the Hausdorff distance between T 0 and T 00 under r is less than some given ε > 0. Given a ∈ T 0 , there then exists b ∈ T 00 such that r(a, b) < ε. Because r(a, a ∧ b) ≤ r(a, b) and a ∧ b ∈ T 00 , we may suppose (by replacing b by a ∧ b if necessary) that b ≤ a. We claim that rˆ(a, c) < ε for some c ∈ T 00 . This and the analogous result with the roles of T 0 and T 00 interchanged will establish the result. If a, b ∈ S or a, b ∈ T \ S, then rˆ(a, b) = r(a, b) < ε. The only other possibility is that a ∈ S and b ∈ T \ S, in which case p ∈ [b, a] (for T

138

5. ROOT GROWTH AND REGRAFTING

equipped with r). Then rˆ(a, ρ) = r(a, p) ≤ r(a, b) < ε, as required (because ρ ∈ T 00 ). ¤ Now let T1 S ⊆ T2 ⊆ · · · be an increasing sequence of finite subtrees of T such that n∈N Tn is dense in T . Thus limn→∞ rH (Tn , T ) = 0. Let X 1 , X 2 , . . . be constructed from π0 and π starting with T1 , T2 , . . .. Applying Lemma 5.1.3 yields (5.8)

lim sup dGHroot (Xtm , Xtn ) = 0.

m,n→∞ t≥0

Hence by completeness of Troot , there exists a c`adl` ag Troot -valued process X such that X0 = T and (5.9)

lim sup dGHroot (Xtm , Xt ) = 0.

m→∞ t≥0

A priori, the process X could depend on the choice of the approximating sequence of trees (Tn )n∈N . To see that this is not so, consider two approximating sequences T11 ⊆ T21 ⊆ · · · and T12 ⊆ T22 ⊆ · · · . For k ∈ N, write Tn3 for the smallest rooted subtree of T that contains both Tn1 and Tn2 . As a set, Tn3 = Tn1 ∪ Tn2 . Now let ((Xtn,i )t≥0 )n∈N for i = 1, 2, 3 be the corresponding sequences of finite tree-value processes and let (Xt∞,i )t≥0 for i = 1, 2, 3 be the corresponding limit processes. By Lemma 5.1.3, dGHroot (Xtn,1 , Xtn,2 ) ≤ dGHroot (Xtn,1 , Xtn,3 ) + dGHroot (Xtn,2 , Xtn,3 ) (5.10)

≤ dH (Xtn,1 , Xtn,3 ) + dH (Xtn,2 , Xtn,3 ) ≤ dH (Tn1 , Tn3 ) + dH (Tn2 , Tn3 ) ≤ dH (Tn1 , T ) + dH (Tn2 , T ) −→ 0. n→∞

(Xtn,1 )n∈N

(Xtn,2 )n∈N

Thus, for each t ≥ 0 the sequences and do indeed have the same rooted Gromov-Hausdorff limit and the process X does not depend on the choice of approximating sequence for the initial tree T . 5.2. Introducing randomness In Section 5.1 we constructed a Troot -valued function t 7→ Xt starting with a fixed triple (T, π0 , π), where T ∈ Troot and π0 , π satisfy the conditions of Assumption 5.1.1. We now want to think of X as a function of time and such triples. Let Ω∗ be the set of triples (T, π0 , π), where T is a rooted compact Rtree (that is, a class representative of an element of Troot ) and π0 , π satisfy Assumption 5.1.1. The root invariant isometry equivalence relation on rooted compact Rtrees extends naturally to an equivalence relation on Ω∗ by declaring that two triples (T 0 , π00 , π 0 ) and (T 00 , π000 , π 00 ), where π00 = {(σi0 , x0i ) : i ∈ N} and π000 = {(σi00 , x00i ) : i ∈ N}, are equivalent if there is a root invariant isometry 0 f mapping T 0 to T 00 and a permutation γ of N such that σi00 = σγ(i) and

5.2. INTRODUCING RANDOMNESS

139

x00i = f (x0γ(i) ) for all i ∈ N. We write Ω for the resulting quotient space of equivalence classes. In order to do probability, we require that Ω has a suitable measurable structure. We could do this by specifying a metric on Ω, but the following approach is a little less cumbersome and suffices for our needs. Let Ωfin denote the subset of Ω consisting of triples (T, π0 , π) such that T , π0 and π are finite. We are going to define a metric on Ωfin . Let (T 0 , π00 , π 0 ) and (T 00 , π000 , π 00 ) be two points in Ωfin , where π00 = {(σ10 , x01 ), . . . , (σp0 , x0p )}, π 0 = {τ10 , . . . , τr0 }, π000 = {(σ100 , x001 ), . . . , (σq00 , x00q )}, and π 00 = {τ100 , . . . , τs00 }. Assume that 0 < σ10 < · · · < σp0 , 0 < τ10 < · · · < τr0 , 0 < σ100 < · · · < σq00 , and 0 < τ100 < · · · < τs00 . The distance between (T 0 , π00 , π 0 ) and (T 00 , π000 , π 00 ) will be 1 if either p 6= q or r 6= s. Otherwise, the distance is ³1 ´ (5.11) 1∧ inf dis(Rroot,cuts ) + max |σi0 − σi00 | + max |τj0 − τj00 | , i j 2 Rroot,cuts where the infimum is over all correspondences between T 0 and T 00 that contain the pairs (ρT 0 , ρT 00 ) and (x0i , x00i ) for 1 ≤ i ≤ p. Equip Ωfin with the Borel σ-field corresponding to this metric. For t ≥ 0, let Fto be the σ-field on Ω generated by the family of maps from Ω into Ωfin given by (T, π0 , π) 7→ (Rη (T ), π0T∩ (]0, t] × (Rη (T ))o ), π ∩ {(s, x)W: x ≤ s ≤ t}) for η > 0. As usual, set Ft+ := u>t Fuo for t ≥ 0. Put F o := t≥0 Fto . It is straightforward to establish the following result from Lemma 1.9.2 and the construction of X in Section 5.1, and we omit the proof. Lemma 5.2.1. The map (t, (T, π0 , π)) 7→ Xt (T, π0 , π) from R+ × Ω into is progressively measurable with respect to the filtration (Fto )t≥0 . (Here, of course, we are equipping Troot with the Borel σ-field associated with the metric dGHroot .)

Troot

Given T ∈ Troot , let PT be the probability measure on Ω defined by the following requirements. • The measure PT assigns all of its mass to the set {(T 0 , π00 , π 0 ) ∈ Ω : T 0 = T }. • Under PT , the random variable (T 0 , π00 , π 0 ) 7→ π00 is a Poisson point process on the set R++ × T o with intensity λ ⊗ µ, where µ is the length measure on T . • Under PT , the random variable (T 0 , π00 , π 0 ) 7→ π 0 is a Poisson point process on the set {(t, x) ∈ R++ × R++ : x ≤ t} with intensity λ ⊗ λ restricted to this set. • The random variables (T 0 , π00 , π 0 ) 7→ π00 and (T 0 , π00 , π 0 ) 7→ π 0 are independent under PT . Of course, the random variable (T 0 , π00 , π 0 ) 7→ π00 takes values in a space of equivalence classes of countable sets rather than a space of sets per se, so, more formally, this random variable has the law of the image of a Poisson

140

5. ROOT GROWTH AND REGRAFTING

process on an arbitrary class representative under the appropriate quotient map. For t ≥ 0, g a bounded Borel function on Troot , and T ∈ Troot , set Pt g(T ) := PT [g(Xt )].

(5.12)

˜ η for η > 0 also denote the map from With a slight abuse of notation, let R Ω into Ω that sends (T, π0 , π) to (Rη (T ), π0 ∩ (R++ × (Rη (T ))o ), π). Our main construction result is the following. Theorem 5.2.2. (i) If T ∈ Troot is finite, then (Xt )t≥0 under PT is a Markov process that evolves via the root growth with re-grafting dynamics on finite trees. ˜ η )t≥0 under PT (ii) For all η > 0 and T ∈ Troot , the law of (Xt ◦ R coincides with the law of (Xt )t≥0 under PRη (T ) . (iii) For all T ∈ Troot , the law of (Xt )t≥0 under PRη (T ) converges as η ↓ 0 to that of (Xt )t≥0 under PT (in the sense of convergence of laws on the space of c` adl` ag Troot -valued paths equipped with the Skorohod topology). (iv) For g ∈ bB(Troot ), the map (t, T ) 7→ Pt g(T ) is B(R+ ) × B(Troot )measurable. (v) The process (Xt , PT ) is strong Markov with respect to the filtration (Ft+ )t≥0 and has transition semigroup (Pt )t≥0 . Proof. (i) This is clear from the definition of the root growth and regrafting dynamics. (ii) It is enough to check that the push-forward of the probability measure PT under the map Rη : Ω → Ω is the measure PRη (T ) . This, however, follows from the observation that the restriction of length measure on a tree to a subtree is just length measure on the subtree. (iii) This is immediate from part (ii), the limiting construction in Section 5.1, and part (iv) of Lemma 1.9.2. Indeed, we have that (5.13)

˜ η ) ≤ dH (T, Rη (T )) ≤ η. sup dGHroot (Xt , Xt ◦ R t≥0

(iv) By a monotone class argument, it is enough to consider the case where the test function g is continuous. It follows from part (iii) that Pt g(Rη (T )) converges pointwise to Pt g(T ) as η ↓ 0, and it is not difficult to show using Lemma 1.9.2 and part (i) that (t, T ) 7→ Pt g(Rη (T )) is B(R+ )×B(Troot )-measurable. We omit the details, because we will establish an even stronger result in Proposition 5.5.1. (v) By construction and part (ii) of Lemma 1.9.4, we have for t ≥ 0 and (T, π0 , π) ∈ Ω that, as a set, Xto (T, π0 , π) is the disjoint union T o t]0, t].

5.3. CONNECTION TO ALDOUS’S LINE-BREAKING CONSTRUCTION

141

Put θt (T, π0 , π) ³ © ª := Xt (T, π0 , π), (s, x) ∈ R++ × T o : (t + s, x) ∈ π0 , © ª´ ++ ++ (s, x) ∈ R × R : (t + s, t + x) ∈ π (5.14) ³ © ª = Xt (T, π0 , π), (s, x) ∈ R++ × Xto (T, π0 , π) : (t + s, x) ∈ π0 , © ª´ (s, x) ∈ R++ × R++ : (t + s, t + x) ∈ π . Thus θt maps Ω into Ω. Note that Xs ◦ θt = Xs+t and that θs ◦ θt = θs+t , that is, the family (θt )t≥0 is a semigroup. It is not hard to show that (t, (T, π0 , π)) 7→ θt (T, π0 , π) is jointly measurable, and we leave this to the reader. Fix t ≥ 0 and (T, π0 , π) ∈ Ω. Write µ0 for the measure on T o t]0, t] that restricts to length measure on T o and to Lebesgue measure on ]0, t]. Write µ00 for the length measure on Xto (T, π0 , π). The strong Markov property will follow from a standard strong Markov property for Poisson processes if we can show that µ0 = µ00 . This equality is clear from the construction if T is finite: the tree Xt (T, π0 , π) is produced from the tree T and the set ]0, t] by a finite number of dissections and rearrangements. The equality for general T follows from the construction and part (iii) of Lemma 1.9.4. ¤ 5.3. Connection to Aldous’s line-breaking construction Recall the law of the Brownian CRT from Definition 3.2.1(i). The main result of this section is to show that the root growth dynamics started in the trivial tree (consisting of one point only) converges to Aldous’s Brownian CRT. Proposition 5.3.1. If {ρ} ∈ Troot is the trivial tree then the law of Xt under P{ρ} converges weakly to the law of the Brownian CRT as t → ∞. Recall Aldous’s line-breaking process R = (R)t≥0 from (3.23), (3.26) and (3.24) in Section 3.3. By Theorem 3.3.1, Rt converges weakly to that of the Brownian CRT as t → ∞. For the proof of Proposition 5.3.1, we will show the following stronger result. Proposition 5.3.2. The random finite rooted tree Rτn − has the same distribution as the Xτn − under P{ρ} , for all n ∈ N. Proof. We will rely on a coupling of both processes. For that purpose, we let π ⊂ {(t, x) ∈ R++ × R++ : x ≤ t} be a Poisson point process on {(t, x) ∈ R++ × R++ : x ≤ t} with intensity measure λ ⊗ λ, where again λ denotes Lebesgue measure. Again using the points 0 < τ1 < τ2 < ... being the points of {t > 0 : π({t} × R++ ) > 0} as cut times {τ1 , τ2 , ...}, the process (Tt )t≥0 with root

142

5. ROOT GROWTH AND REGRAFTING

growth and regrafting dynamics starting in the trivial tree T0 := {ρ} can be described as follows (here again the tree at time t will have total edge length t): • Start with the 1-tree (with one end identified as the root and the other as a leaf,) T0 , of length zero. • Let this segment grow at unit speed on the time interval [0, τ1 [, and for t ∈ [0, τ1 [ let Tt be the rooted 1-tree that has its points labeled by the interval [0, t] in such a way that the root is t and the leaf is 0. • At time τ1 sample the first cut point uniformly along the tree Tτ1 − , prune off the piece of Tτ1 − that is above the cut point (that is, prune off the interval of points that are further away from the root t than the first cut point). • Re-graft the pruned segment such that its cut end and the root are glued together. Just as we thought of T0 as a tree with two points, (a leaf and a root) connected by an edge of length zero, we take Tτ1 to be the the rooted 2-tree obtained by “ramifying” the root Tτ1 − into two points (one of which we keep as the root) that are joined by an edge of length zero. • Proceed inductively: Given the labeled and rooted n-tree, Tτn−1 , for t ∈ [τn−1 , τn [, let Tt be obtained by letting the edge containing the root grow at unit speed so that the points in Tt correspond to the points in the interval [0, t] with t as the root. At time τn , the nth cut point is sampled randomly along the edges of the n-tree, Tτn − , and the subtree above the cut point (that is the subtree of points further away from the root than the cut point) is pruned off and re-grafted so that its cut end and the root are glued together. The root is then “ramified” as above to give an edge of length zero leading from the root to the rest of the tree. Recall from Section 3.3 the construction of R = (Rt , rt , µt )t≥0 based on the successive arrival times of a inhomogeneous Poisson process with rate r(dt) = tdt, and build now R = (Rt , rt , µt )t≥0 based on the points {(τi , xτi ); i ∈ N}, where for τ ∈ {t > 0 : π({t} × R++ ) > 0}, xτ denotes the unique element in [0, τ ] such that π({τ } × {xτ }) > 0. The link between these two dynamics for growing trees is Figure 5.4. Let Rn denote the object obtained by taking the rooted finite tree with edge-lengths Rτn − and labeling the leaves with 1, ..., n, in the order as they are added in Aldous’s construction. Let Tn be derived similarly from the rooted finite tree with edge-lengths Tτn − , by labeling the leaves with 1, ..., n in the order that they appear in the root growth with re-grafting construction. It will suffice to show that Rn and Tn have the same distribution. Note that both Rn and Tn are rooted, bifurcating trees with n labeled leaves and edge-lengths. Such a tree Sn is uniquely specified by its shape, denoted by shape(Sn ), which is a rooted, bifurcating, leaf-labeled combinatorial tree,

5.3. CONNECTION TO ALDOUS’S LINE-BREAKING CONSTRUCTION

r r R 0

Rτ1 −

r

Rτ1

Rτ2 −

Rτ2

a ¡

a r T 0

Tτ1 −

143

¡T

¡

Tτ2 −

τ1

@

Tτ2

Figure 5.4. illustrates how the coupled tree-valued processes (Rt ; t ≥ 0) and (Tt ; t ≥ 0) evolve. (The bold dots re-present an edge of length zero, while the small dots indicate the position of the cut point that is going to show up at the next moment.) and by the list of its (2n − 1) edge-lengths in a canonical order determined by its shape, say ¡ ¢ (5.15) lengths(Sn ) := length(Sn , 1), ..., length(Sn , 2n − 1) , where the edge-lengths are listed in order of traversal of edges by first working along the path from the root to leaf 1, then along the path joining that path to leaf 2, and so on. Then, by construction, the common collection of edge-lengths of Rn and of Tn is the collection of lengths of the 2n − 1 subintervals of ]0, τn ] obtained by cutting this interval at the 2n − 2 points n−1 [© ª © (n) ª Xi , 1 ≤ i ≤ 2n − 2 := xτi , τi

(5.16)

i=1 (n) Xi

(n)

where the are indexed to increase in i for each fixed n. Let X0 (n) and X2n−1 := τn . Then (5.17)

(n)

length(Rn , i) = Xi

:= 0

(n)

− Xi−1 ,

and (5.18)

length(Tn , i) = length(Rn , σn,i ),

for all 1 ≤ i ≤ 2n − 1 and for some almost surely unique random indices σn,i ∈ {1, ..., 2n − 1} such that i 7→ σn,i is almost surely a permutation of {1, ..., 2n − 1}. According to [Ald93, Lemma 21], the distribution of Rn may be equivalently characterized as follows:

144

5. ROOT GROWTH AND REGRAFTING

(a) the sequence lengths(Rn ) is exchangeable, with the same distribution as the sequence of lengths of subintervals obtained by cutting ]0, τn ] at 2n − 2 uniformly chosen points {xτn : 1 ≤ i ≤ 2n − 2}; (b) shape(Rn ) is uniformly distributed on the set of all 1 × 3 × 5 × · · · × (2n − 3) possible shapes; (c) lengths(Rn ) and shape(Rn ) are independent. In view of this characterization and (5.18), to show that Tn has the same distribution as Rn it is enough to show that (i) the random permutation {i 7→ σn,i : 1 ≤ i ≤ 2n − 1} is a function of shape(Tn ); (ii) shape(Tn ) = Ψn (shape(Rn )) for some bijective map Ψn from the set of all possible shapes to itself. This is trivial for n = 1, so we assume below that n ≥ 2. Before proving (i) and (ii), we recall that (b) above involves a natural bijection (5.19)

(I1 , . . . , In−1 ) ↔ shape(Rn )

where In−1 ∈ {1, . . . , 2n − 3} is the unique i such that xτn−1 ∈ (n−1) (n−1) (Xi−1 , Xi ). Hence In−1 is the index in the canonical ordering of edges of Rn−1 of the edge that is cut in the transformation from Rn−1 to Rn by attachment of an additional edge, of length τn − τn−1 , connecting the cut-point to leaf n. Thus (b) and (c) above correspond via (5.19) to the facts that I1 , . . . , In−1 are independent and uniformly distributed over their ranges, and independent of lengths(Rn ). These facts can be checked directly from the construction of (Rn )n∈N from (τn )n∈N and (xτn )n∈N using standard facts about uniform order statistics. Now (i) and (ii) follow from (5.19) and another bijection (5.20)

(I1 , . . . , In−1 ) ↔ shape(Tn )

where each possible value i of Im is identified with edge σm,i in the canonical ordering of edges of Tm . This is the edge of Tm whose length equals length(Rm , i). The bijection (5.20), and the fact that σn,i depends only on shape(Tn ), will now be established by induction on n ≥ 2. For n = 2 the claim is obvious. Suppose for some n ≥ 3 that the correspondence between (I1 , . . . , In−2 ) and shape(Tn−1 ) has been established, and that the length of edge σn−1,i in the canonical ordering of edges of Tn−1 equals the length of the ith edge in the canonical ordering of edges of Rn−1 , for some σn−1,i which is a function of i and shape(Tn−1 ). According to the construction of Tn , if In−1 = i then Tn is derived from Tn−1 by splitting Tn−1 into two branches at some point along edge σn−1,i in the canonical ordering of the edges of Tn−1 , and forming a new tree from the two branches and an extra segment of length τn − τn−1 . Clearly, shape(Tn ) is determined by shape(Tn−1 ) and In−1 , and in the canonical ordering of the edge-lengths of Tn the length of the ith edge equals the length of the edge σn,i of Rn , for some σn,i which is a function of shape(Tn−1 ) and In−1 , and hence a function of shape(Tn ). To

5.4. RECURRENCE, STATIONARITY AND ERGODICITY

145

complete the proof, it is enough by the inductive hypothesis to show that the map (shape(Tn−1 ), In−1 ) → shape(Tn ) just described is invertible. But shape(Tn−1 ) and In−1 can be recovered from shape(Tn ) by the following sequence of moves: • delete the edge attached to the root of shape(Tn ); • split the remaining tree into its two branches leading away from the internal node to which the deleted edge was attached; • re-attach the bottom end of the branch not containing leaf n to leaf n on the other branch, joining the two incident edges to form a single edge; • the resulting shape is shape(Tn−1 ), and In−1 is the index such that the joined edge in shape(Tn−1 ) is the edge σn−1,In−1 in the canonical ordering of edges on shape(Tn−1 ). ¤ Proof of Proposition 5.3.1. We saw in Proposition 5.3.2 that, Tτn − has the same distribution as Rτn − . Moreover, we recalled in Section 3.3 that Rt converges in distribution to the Brownian CRT, as t → ∞. Clearly, the rooted Gromov–Hausdorff distance between Tt and Tτn+1 − is at most τn+1 − τn for τn ≤ t < τn+1 . It remains to observe that τn+1 − τn → 0 in probability as n → ∞. ¤ 5.4. Recurrence, stationarity and ergodicity In this section we show that the Brownian CRT is the unique equilibrium under the root growth with regrafting dynamics is recurrent. We first state convergence to the Brownian CRT. Proposition 5.4.1 (Convergence to the Brownian CRT). For any T ∈ Troot , the law of Xt under PT converges weakly to that of the Brownian CRT as t → ∞. We prepare the proof with the following lemma. Lemma 5.4.2. For any (T, r, ρ) ∈ Troot we can build on the same probability space two Troot -valued processes X 0 and X 00 such that: • X 0 has the law of X under P{ρ} , where {ρ} is the trivial tree consisting of just the root ρ, • X 00 has the law of X under PT , • for all t ≥ 0, © ª (5.21) dGHroot (Xt0 , Xt00 ) ≤ dGHroot ({ρ}, T ) = sup r(ρ, x) : x ∈ T , • (5.22)

lim d root (Xt0 , Xt00 ) t→∞ GH

= 0,

almost surely.

146

5. ROOT GROWTH AND REGRAFTING

Proof. The proof follows almost immediately from construction of X in Section 5.1 and Lemma 5.1.3. The only point requiring some comment is (5.22). For that it will be enough to show for any ε > 0 that for PT -a.e. (T, π0 , π) ∈ Ω there exists t > 0 such that the projection of π0 ∩ (]0, t] × T o ) onto T is an ε-net for T . Note that the projection of π0 ∩ (]0, t] × T o ) onto T is a Poisson process under PT with intensity tµ, where µ is the length measure on T . Moreover, T can be covered by a finite collection of ε-balls, each with positive µmeasure. Therefore, the PT -probability of the set of (T, π0 , π) ∈ Ω such that the projection of π0 ∩(]0, t]×T o ) onto T is an ε-net for T increases as t → ∞ to 1. ¤ Proof of Proposition 5.4.1. This is an immediate consequence of Lemma 5.4.2 together with Proposition 5.3.1. ¤ The next result states recurrence of the root growth with regrafting dynamics. Proposition 5.4.3 (Recurrence). Consider a non-empty open set U ⊆ Troot . For each T ∈ Troot , © ª (5.23) PT for all s ≥ 0, there exists t > s such that Xt ∈ U = 1. Proof. It is straightforward, but notationally rather tedious, to show that if B 0 ⊆ Troot is any ball and {ρ} is the trivial tree, then © ª (5.24) P{ρ} Xt ∈ B 0 > 0 for all t sufficiently large. Thus, for any ball B 0 ⊆ Troot there is, by Lemma 5.4.2, a ball B 00 ⊆ Troot containing the trivial tree such that © ª (5.25) inf 00 PT Xt ∈ B 0 > 0 T ∈B

for each t sufficiently large. By a standard application of the Markov property, it therefore suffices to show for each T ∈ Troot and each ball B 00 around the trivial tree that © ª (5.26) PT there exists t > 0 such that Xt ∈ B 00 = 1. By another standard application of the Markov property, equation (5.26) will follow if we can show that there is a constant p > 0 depending on B 00 such that for any T ∈ Troot © ª (5.27) lim inf PT Xt ∈ B 00 > p. t→∞

This, however, follows from Proposition 5.4.1 and the observation that for any ε > 0 the law of the Brownian CRT assigns positive mass to the set of trees with height less than ε (which is just the observation that the law of the Brownian excursion assigns positive mass to the set of excursion paths with maximum less that ε/2). ¤

5.5. FELLER PROPERTY

147

Proposition 5.4.4. The law of the Brownian CRT is the unique stationary distribution CRT, then R R for X. That is, if ξ is the law of the root ξ(dT )Pt f (T ) = ξ(dT )f (T ) for all t ≥ 0 and f ∈ bB(T ), and ξ is the unique probability measure on Troot with this property. Proof. This is a standard argument given Proposition 5.4.1 and the Feller property for the semigroup (Pt )t≥0 established in Proposition 5.5.1, but we include the details for completeness. Consider a test function f : Troot → R that is continuous and bounded. By Proposition 5.5.1 below, the function Pt f is also continuous and bounded for each t ≥ 0. Therefore, by Proposition 5.4.1, Z Z Z ξ(dT ) f (T ) = lim ξ(dT ) Ps f (T ) = lim ξ(dT ) Ps+t f (T ) s→∞ s→∞ Z Z (5.28) = lim ξ(dT )Ps (Pt f )(T ) = ξ(dT ) Pt f (T ) s→∞

for each t ≥ 0, and hence ξ is stationary. Moreover, if ζ is a stationary measure, then Z Z ζ(dT )f (T ) = ζ(dT )Pt f (T ) µZ ¶ Z Z (5.29) → ζ(dT ) ξ(dT ) f (T ) = ξ(dT ) f (T ), and ζ = ξ, as claimed.

¤

5.5. Feller property In this section we show that the law of Xt under PT is weakly continuous in the initial value T for each t ≥ 0. This property is sometimes referred to as the Feller property of the semigroup (Pt )t≥0 , although this terminology is often restricted to the case of a locally compact state space and transition operators that map the space of continuous functions that vanish at infinity into itself. A standard consequence of this result is that the law of the process (Xt )t≥0 is weakly continuous in the initial value (when the space of c`adl`ag Troot -valued paths is equipped with the Skorohod topology). Proposition 5.5.1. If the function f : Troot → R is continuous and bounded, then the function Pt f is also continuous and bounded for each t ≥ 0. We will prove the proposition by a coupling argument that, inter alia, builds processes with the law of X under PT for two different finite values of T on the same probability space. The key to constructing such a coupling is the following pair of lemmas.

148

5. ROOT GROWTH AND REGRAFTING

We require the following notion. A rooted combinatorial tree is just a connected, acyclic graph with one vertex designated as the root. Equivalently, we can think of a rooted combinatorial tree as a finite rooted tree in which all edges have length one. Thus any finite rooted tree is associated with a unique rooted combinatorial tree by changing all the edge lengths to one, and any two finite rooted trees with the same topology are associated with the same rooted combinatorial tree. If U and V are two rooted combinatorial trees with leaves labeled by (x1 , . . . , xn ) and (y1 , . . . , yn ), then we say that U and V are isomorphic if there exists a graph isomorphism between U and V that maps the root of U to the root of V and xi to yi for 1 ≤ i ≤ n. Lemma 5.5.2. Let (T, ρ) be a finite rooted trees with leaves {x1 , . . . , xn } = T \ T o . (recall the definition of the skeleton T o from (1.19)). Write η for the minimum of the (strictly positive) edge lengths in T . Suppose that (T 0 , ρ0 ) is η another finite rooted tree with dGHroot ((T 0 , ρ0 ), (T, ρ)) < δ < 16 . Then there 00 0 root 0 0 00 exists a subtree (T , ρ ) ¹ (T , ρ ) and a map f¯ : T → T such that: (i) f¯(ρ) = ρ0 , (ii) T 00 is spanned by {f¯(x1 ), . . . , f¯(xn ), ρ0 }, (iii) dH (T 0 , T 00 ) < 3δ, (iv) dis(f¯) < 8δ, (v) T 00 has leaves {f¯(x1 ), . . . , f¯(xn )}, (vi) by possibly deleting some internal edges from the rooted combinatorial tree associated to T 00 with leaves labeled by (f¯(x1 ), . . . , f¯(xn )), one can obtain a leaf-labeled rooted combinatorial tree that is isomorphic to the rooted combinatorial rooted tree associated to T with leaves labeled by (x1 , . . . , xn ). Proof. We have from (1.26) that there is a correspondence Rroot containing (ρ, ρ0 ) between T and T 0 such that dis(Rroot ) < 2δ. For x ∈ T \ {ρ}, choose f (x) ∈ T 0 such that (x, f (x)) ∈ Rroot , and put f (ρ) := ρ0 . Set T 00 to be the subtree of T 0 spanned by {f (x1 ), . . . , f (xn ), ρ0 }. For x ∈ T define f¯(x) ∈ T 00 to be the point in T 00 that has minimum distance to f (x). In particular, f¯(ρ) = f (ρ) = ρ0 and f¯(xi ) = f (xi ) for all i, so that (i) and (ii) hold. For x0 ∈ T 0 \ {ρ0 } choose g(x0 ) such that (g(x0 ), x0 ) ∈ Rroot and put 0 g(ρ ) := ρ. Then f (T ) and g(T 0 ) are 2δ-nets for T 0 and T , respectively, and dis(f ) ∨ dis(g) < 2δ. For each i ∈ {1, . . . , n} and y 0 ∈ T 0 we have (ρ, ρ0 ), (xi , f (xi )), (g(y 0 ), y 0 ) ∈ Rroot . Hence rT 0 (ρ0 , y 0 ) < rT (ρ, g(y 0 )) + 2δ, and rT 0 (y 0 , f (xi )) < rT (g(y 0 ), xi ) + 2δ. Now fix y 0 ∈ T 0 , and choose i ∈ {1, . . . , n} such that g(y 0 ) ∈ [ρ, xi ]. Then ¡ ¢ rT 0 (ρ0 , f (xi )) + 2dH {y 0 }, [ρ0 , f (xi )] = rT 0 (ρ0 , y 0 ) + rT 0 (y 0 , f (xi )) (5.30)

< rT (ρ, xi ) + 4δ < rT 0 (ρ0 , f (xi )) + 2δ + 4δ,

5.5. FELLER PROPERTY

149

and hence dH ({y 0 }, T 00 ) < 3δ. Thus (iii) holds. For x, y ∈ T , (5.31) |rT (x, y) − rT 00 (f¯(x), f¯(y))| ≤ |rT (x, y) − rT 0 (f (x), f (y))| + rT 0 (f¯(x), f (x)) + rT 0 (f¯(y), f (y)) ≤ dis(f ) + 2dH (T 0 , T 00 ) < 8δ, and (iv) holds. In order to establish (v), it suffices to observe for 1 ≤ i 6= j ≤ n that, by part (iv), rT 00 (f¯(xi ), f¯(xj )) + rT 00 (f¯(xj ), ρ0 ) − rT 00 (f¯(xi ), ρ0 ) (5.32)

≥ rT (xi , xj ) + rT (xj , ρ) − rT (xi , ρ) − 3dis(f¯) > 2η − 24δ > 0.

Similarly, part (vi) follows from part (iv) and the observations in Section 1.6 about re-constructing tree shapes from distances between the points in subsets of size four drawn from the leaves and the root of T 00 once we observe the inequality 21 · 4dis(f¯) < 16δ < η. ¤ Lemma 5.5.3. Let (T, ρ) be a finite rooted tree and ε > 0. There exists δ > 0 depending on T and ε such that if (T 0 , ρ0 ) is a finite rooted tree with dGHroot ((T 0 , ρ0 ), (T, ρ)) < δ, then there exist subtrees (S, ρ) ¹root T and (S 0 , ρ0 ) ¹root T 0 for which: (i) dH (S, T ) < ε and dH (S 0 , T 0 ) < ε, (ii) S and S 0 have the same total length, (iii) there is a bijective measurable map ψ : S → S 0 that preserves length measure and has distortion at most ε, (iv) the length measure of the set of points a ∈ S such that {b0 ∈ S 0 : ψ(a) ≤ b0 } 6= ψ({b ∈ S : a ≤ b}) (that is, the set of points a such that the subtree above ψ(a) is not the image under ψ of the subtree above a) is less than ε. Proof. As in Lemma 5.5.2, denote by η the minimum of the (strictly positive) edge lengths of T . Let (T 0 , ρ0 ) be a finite rooted tree with η (5.33) dGHroot ((T 0 , ρ0 ), (T, ρ)) < δ < , 16 where δ depending on T and ε will be chosen later. Set (T 00 , ρ0 ) and f¯ to be a subtree of T 0 and a function from T to T 00 whose existence is guaranteed by Lemma 5.5.2 for this choice of δ. Let {x1 , . . . xn } denote the leaves of T and write x0i := f (xi ) = f¯(xi ) for i = 1, . . . , n. Define inductively subtrees S1 , . . . , Sn of T (all with root ρ) and S10 , . . . , Sn0 of T 00 ⊆ T 0 (all with root ρ0 ) as follows. Set S1 := [ρ, y1 ] and

150

5. ROOT GROWTH AND REGRAFTING

S10 := [ρ, y10 ], where y1 and y10 are the unique points on the arcs [ρ, x1 ] and [ρ0 , x01 ], respectively, such that rT (ρ, y1 ) = rT 0 (ρ0 , y10 ) = rT (ρ, x1 ) ∧ rT 0 (ρ0 , x01 ).

(5.34)

0 have been defined. Let z Suppose that S1 , . . . , Sm and S10 , . . . , Sm m+1 and 0 0 zm+1 be the points on Sm and Sm closest to xm+1 and x0m+1 . Put Sm+1 := 0 0 ∪]z 0 0 0 Sm ∪]zm+1 , ym+1 ] and Sm+1 := Sm m+1 , ym+1 ], where ym+1 and ym+1 are 0 0 the unique points on the arcs ]zm+1 , xm+1 ] and ]zm+1 , xm+1 ], respectively, such that 0 0 rT 0 (zm+1 , ym+1 ) = rT 0 (zm+1 , ym+1 ) (5.35) 0 = rT (zm+1 , xm+1 ) ∧ rT 0 (zm+1 , x0m+1 ).

Set S := Sn and S 0 := Sn0 . Put z1 := ρ, and z10 := ρ0 . By construction, the arcs ]zk , yk ], 1 ≤ k ≤ n, are disjoint and their union is S \ {ρ}. Similarly, the arcs ]zk0 , yk0 ] are disjoint and their union is S 0 \ {ρ0 }. Moreover, the arcs ]zk , yk ] and ]zk0 , yk0 ] have the same length (in particular, S and S 0 have the same length and part (ii) holds). We may therefore define a measure-preserving bijection ψ between S and S 0 by setting ψ(ρ) = ρ0 and letting the restriction of ψ to each arc ]zk , yk ] be the obvious length preserving bijection onto ]zk0 , yk0 ]. More precisely, if a ∈]zk , yk ], then ψ(a) is the uniquely determined point on ]zk0 , yk0 ] such that rS 0 (zk0 , ψ(a)) = rS (zk , a). We next estimate the distortion of ψ to establish part (iii). We first claim that for a, b ∈ S, (5.36)

|rS (a, b) − rS 0 (ψ(a), ψ(b))| ≤ 5γ,

where (5.37) γ :=

0 max |rS (yk , ym )−rS 0 (yk0 , ym )|∨ max |rS (yk , ρ)−rS 0 (yk0 , ρ0 )|.

1≤k,m≤n

1≤k≤n

To see (5.36), consider a, b ∈ S \ {ρ} with a ∈]zk , yk ] and b ∈]zm , ym ] where k 6= m. (The case where a = ρ or b = ρ holds “by continuity” and is left to the reader.) Without loss of generality, assume that k < m, so that 0 ≤ z 0 < ψ(b) ≤ yk ∧ ym ≤ zm < b ≤ ym in the partial order on S and yk0 ∧ ym m 0 0 ym in the partial order on S . Note that yk ∧ ym and zk are comparable in 0 and z 0 . Moreover, by part (vi) of Lemma the partial order, as are yk0 ∧ ym k 0 ≤ z 0 . We then have to consider four 5.5.2, yk ∧ ym ≤ zk if and only if yk0 ∧ ym k 0 , ψ(a). cases depending on the relative positions of yk ∧ ym , a and yk0 ∧ ym 0 < ψ(a) ≤ y 0 . Case I: yk ∧ ym < a ≤ yk and yk0 ∧ ym k We have

(5.38)

rS (yk , ym ) = rS (yk , a) + rS (a, b) + rS (b, ym )

and (5.39)

0 0 rS 0 (yk0 , ym ) = rS 0 (yk0 , ψ(a)) + rS 0 (ψ(a), ψ(b)) + rS 0 (ψ(b), ym ).

5.5. FELLER PROPERTY

151

By construction, (5.40)

rS (yk , a) = rS 0 (yk0 , ψ(a))

and (5.41)

0 rS (b, ym ) = rS 0 (ψ(b), ym ).

Hence (5.42)

0 |rS (a, b) − rS 0 (ψ(a), ψ(b))| = |rS (yk , ym ) − rS 0 (yk0 , ym )| ≤ γ.

0 < y0 . Case II: yk ∧ ym < a ≤ yk and ψ(a) ≤ yk0 ∧ ym k Note that in this case zk ≤ yk ∧ ym . We again have

(5.43)

rS (yk , ym ) = rS (yk , a) + rS (a, b) + rS (b, ym ),

but now (5.44)

0 0 rS 0 (yk0 , ym ) = rS 0 (yk0 , ψ(a)) + rS 0 (ψ(a), ψ(b)) + rS 0 (ψ(b), ym ) 0 − 2rS 0 (ψ(a), yk0 ∧ ym ).

0 . Let y` be such that zk = y` ∧ yk = y` ∧ ym and hence zk0 = y`0 ∧ yk0 = y`0 ∧ ym Observe from Section 1.6 that

(5.45) 0 rS 0 (ψ(a), yk0 ∧ ym ) 1 0 0 ) + rS 0 (yk0 , ρ0 ) − rS 0 (yk0 , ym ) − rS 0 (y`0 , ρ0 )) − rS 0 (zk0 , ψ(a)) = (rS 0 (y`0 , ym 2 1 1 ≤ (rS (y` , ym ) + rS (yk , ρ) − rS (yk , ym ) − rS (y` , ρ)) + 4γ − rS (zk , a) 2 2 = rS (zk , yk ∧ ym ) − rS (zk , a) + 2γ ≤ 2γ, and hence (5.46)

|rS (a, b) − rS 0 (ψ(a), ψ(b))| ≤ 5γ.

0 ≤ ψ(a) < y 0 . Case III: a ≤ yk ∧ ym < yk and yk0 ∧ ym k 0 0 0 Note that in this case, zk ≤ yk ∧ ym . This case is similar to Case II, but we record some of the details for use later in the proof of part (iv). Letting the index ` be as in Case II, we have

(5.47)

rS (yk , ym ) = rS (yk , a) + rS (a, b) + rS (b, ym ) − 2rS (a, yk ∧ ym )

and (5.48)

0 0 rS 0 (yk0 , ym ) = rS 0 (yk0 , ψ(a)) + rS 0 (ψ(a), ψ(b)) + rS 0 (ψ(b), ym ).

152

5. ROOT GROWTH AND REGRAFTING

We have rS (a, yk ∧ ym ) 1 = (rS (y` , ym ) + rS (yk , ρ) − rS (yk , ym ) − rS (y` , ρ)) − rS (zk , a) 2 1 1 0 0 ≤ (rS 0 (y`0 , ym ) + rS 0 (yk0 , ρ0 ) − rS 0 (yk0 , ym ) − rS 0 (y`0 , ρ0 )) + 4γ (5.49) 2 2 − rS 0 (zk0 , ψ(a)) 0 = rS 0 (zk0 , yk0 ∧ ym ) − rS 0 (zk0 , ψ(a)) + 2γ ≤ 2γ,

and hence (5.50)

|rS (a, b) − rS 0 (ψ(a), ψ(b))| ≤ 5γ.

0 < y0 . Case IV: a ≤ yk ∧ ym < yk and ψ(a) ≤ yk0 ∧ ym k Letting the index ` be as in Case II, we have

(5.51)

rS (zk , ym ) = rS (zk , a) + rS (a, b) + rS (b, ym )

and (5.52)

0 0 rS 0 (zk0 , ym ) = rS 0 (zk0 , ψ(a)) + rS 0 (ψ(a), ψ(b)) + rS 0 (ψ(b), ym ).

Hence, from Section 1.6, rS (a, b) − rS 0 (ψ(a), ψ(b)) 0 = rS (zk , ym ) − rS 0 (zk0 , ym )

(5.53)

0 0 = rS (y` ∧ ym , ym ) − rS 0 (y`0 ∧ ym , ym ) 1 = (rS (ym , ρ) + rS (y` , ym ) − rS (y` , ρ)) 2 1 0 0 , ρ0 ) + rS 0 (y`0 , ym ) − rS 0 (y`0 , ρ0 )). − (rS 0 (ym 2

Thus 3 |rS (a, b) − rS 0 (ψ(a), ψ(b))| ≤ γ. 2 Combining Cases I–IV, we see that (5.36) holds. We thus require an estimate of γ to complete the estimation of the distortion of ψ . Clearly, (5.54)

0 |rS (yk , ym ) − rS 0 (yk0 , ym )| ≤ |rT (xk , xm ) − rT 00 (x0k , x0m )|

(5.55)

+ rT (yk , xk ) + rT (ym , xm ) 0 + rT 00 (yk0 , x0k ) + rT 00 (ym , x0m ).

By (5.34), (5.56)

rT (y1 , x1 ) ∨ rT 00 (y10 , x01 ) = |rT (ρ, x1 ) − rT 00 (ρ0 , x01 )| ≤ dis(f¯) < 8δ.

5.5. FELLER PROPERTY

153

For 2 ≤ k ≤ n there exists by construction an index i ∈ {1, 2, . . . , k −1} such that zk ∈ [zi , yi ] and zk0 ∈ [zi0 , yi0 ]. Applying the observations of Section 1.6, rT (yk , xk ) ∨ rT 00 (yk0 , x0k ) = |rT (yk , xk ) − rT 00 (yk0 , x0k )|

(5.57)

= |rT (zk , xk ) − rT 00 (zk0 , x0k )| 1 ≤ {|rT (xi , xk ) − rT 00 (x0i , x0k )| 2 + |rT (ρ, xi ) − rT 00 (ρ0 , x0i )| + |rT (ρ, xk ) − rT 00 (ρ0 , x0k )|} 3 ≤ dis(f¯) 2 ≤ 12δ.

Thus, from (5.55), (5.58)

0 |rS (yk , ym ) − rS 0 (yk0 , ym )| < (8 + 4 × 12)δ = 56δ.

A similar argument shows that |rS (yk , ρ) − rS 0 (yk0 , ρ0 )| < (8 + 2 × 12)δ = 32δ, and hence γ < 56δ. Substituting into (5.36) gives (5.59)

dis(ψ) ≤ 5γ < (5 × 56)δ = 280δ.

Moving to part (i), apply (5.57) to obtain (5.60)

rH (S, T ) ≤ max r(yk , xk ) ≤ γ < 56δ 1≤i≤n

and, by similar arguments, (5.61)

rH (S 0 , T 0 ) ≤ rH (S 0 , T 00 ) + rH (T 00 , T 0 ) < 59δ.

Finally, we consider part (iv). Suppose that a ∈ S is such that the subtree of S 0 above ψ(a) is not the image under ψ of the subtree of S above a. Let k be the unique index such that a ∈]zk , yk ] (and hence ψ(a) ∈]zk0 , yk0 ]). It follows from the construction of ψ that there must exist an index ` such that either zk < a ≤ z` and zk0 < z`0 ≤ ψ(a) or zk < z` ≤ a and zk0 < ψ(a) ≤ z`0 . These two situations have already been considered in Case III and Case II 0 . above (in that order): there we represented z` as yk ∧ ym and z`0 as yk0 ∧ ym It follows from the inequality (5.49) that the mass of the set of points a that satisfy the first alternative is at most 2γn < 112δn. Similarly, from the inequality (5.45) and the fact that ψ is measure-preserving, the mass of the set of points a that satisfy the second alternative is also at most 2γn < 112δn. Thus the total mass of the set of points of interest is at most 224δn. ¤ Before completing the proof of Proposition 5.5.1, we recall the definition of the Wasserstein metric. Suppose that (E, r) is a complete, separable metric space. Write B for the set of continuous functions functions f : E → R such that |f (x)| ≤ 1 and |f (x) − f (y)| ≤ r(x, y) for x, y ∈ E. The

154

5. ROOT GROWTH AND REGRAFTING

Wasserstein (sometimes transliterated as Vasershtein) distance between two Borel probability measures α and β on E is given by ¯Z ¯ Z ¯ ¯ (5.62) dW (α, β) := sup ¯¯ f dα − f dβ ¯¯ . f ∈B

The Wasserstein distance is a genuine metric on the space of Borel probability measures and convergence with respect to this distance implies weak convergence (see, for example, Theorem 3.3.1 and Problem 3.11.2 of [EK86]). If V and W are two E-valued random variables on the same probability space (Σ, A, P) with distributions α and β, respectively, then dW (α, β) ≤ sup |P[f (V )] − P[f (W )]| (5.63)

f ∈B

≤ sup P[|f (V ) − f (W )|] ≤ P[d(V, W )]. f ∈B

Proof of Proposition 5.5.1. For (T, ρ) ∈ Troot and t ≥ 0, let (5.64)

Pt ((T, ρ), ·) := P(T,ρ) {Xt ∈ ·}.

We need to show that (T, ρ) 7→ Pt ((T, ρ), ·) is weakly continuous for each t ≥ 0. This is equivalent to showing for each (T, ρ) ∈ Troot and t ≥ 0 that ¡ ¢ (5.65) lim dW Pt ((T, ρ), ·), Pt ((T 0 , ρ0 ), ·) = 0. (T 0 ,ρ0 )→(T,ρ)

From the coupling argument in the proof of part (iii) of Theorem 5.2.2 (in particular, the inequality (5.13)), we have that dW (Pt ((T, ρ), ·), Pt ((T 0 , ρ0 ), ·)) ≤ dW (Pt ((T, ρ), ·), Pt ((Rη (T ), ρ), ·)) (5.66)

+ dW (Pt ((Rη (T ), ρ), ·), Pt ((Rη (T 0 ), ρ0 ), ·)) + dW (Pt ((Rη (T 0 ), ρ), ·), Pt ((T 0 , ρ0 ), ·)) ≤ dW (Pt ((Rη (T ), ρ), ·), Pt ((Rη (T 0 ), ρ0 ), ·)) + 2η.

By part (ii) of Lemma 1.9.2, Rη (T 0 ) converges to Rη (T ) as (T 0 , ρ0 ) converges to (T, ρ), and so it suffices to establish (5.65) when (T, ρ) and (T 0 , ρ0 ) are finite trees, and so we will suppose this for the rest of the proof. Fix (T, ρ) and ε > 0. Suppose that δ > 0 depending on (T, ρ) and ε is sufficiently small that the conclusions of Lemma 5.5.3 hold for any (T 0 , ρ0 ) within distance δ of (T, ρ). Let (S, ρ) and (S 0 , ρ0 ) be the subtrees guaranteed by Lemma 5.5.3. From the coupling argument in proof of part (iii) of Theorem 5.2.2 we have (5.67) and (5.68)

dW (Pt ((T, ρ), ·), Pt ((S, ρ), ·)) < ε ¡ ¢ dW Pt ((T 0 , ρ0 ), ·), Pt ((S 0 , ρ0 ), ·) < ε.

It therefore suffices to give a bound on dW (Pt ((S, ρ), ·), Pt ((S 0 , ρ), ·)) that only depends on ε and converges to zero as ε converges to 0.

5.5. FELLER PROPERTY

155

Construct on some probability space (Σ, A, P) a Poisson point process Π0 on the set R++ ×S o with intensity λ⊗µ, where µ is the length measure on S. Construct on the same space another independent Poisson point process on the set {(t, x) ∈ R++ × R++ : x ≤ t} with intensity λ ⊗ λ restricted to this set. If we set Π00 := {(t, ψ(x)) : (t, x) ∈ Π0 } ⊂ R++ × (S 0 )o , then Π00 is a Poisson process on the set R++ × (S 0 )o with intensity λ ⊗ µ0 , where µ0 is the length measure on S 0 (because ψ preserves length measure.) Now apply the construction of Section 5.1 to realizations of Π0 and Π (respectively, Π00 and Π) to get two Troot -valued processes that we will denote by (Yt )t≥0 and (Yt0 )t≥0 . We see from the proof of Theorem 5.2.2 that Y (respectively, Y 0 ) 0 0 has the same law as X under P(S,ρ) (respectively, P(S ,ρ ) ). Define a map ψt from Yt = St]0, t] to Yt0 = S 0 t]0, t] by setting the restriction of ψt to S be ψ and the restriction of ψt to ]0, t] be the identity map. Let rt and rt0 be the metrics on Yt and Yt0 , respectively. We will bound the rooted Gromov-Hausdorff distance between Yt and Yt0 by bounding the distortion of ψt . The cut-times for Y and Y 0 coincide. If ξ is a cut-point of Y at some cut-time τ , then the corresponding cut-point for Y 0 will be ψ(ξ). It is clear that the distortion of ψt is constant between cut-times. Write Bt for the set of points b ∈ Yt such that the subtree of Yt0 above ψt (b) is not the image under ψt of the subtree of Yt above b. The set Bt is unchanged between cut-times. Consider a cut-time τ such that the corresponding cut-point ξ is in Yτ − \ Bτ − . If x and y are in the subtree above ξ in Yτ − , then they are moved together by the re-grafting operation and their distance apart is unchanged in Yτ . Also, ψτ − (x) and ψτ − (y) are in subtree above ψτ − (ξ) in Yτ0− and these two points are also moved together. More precisely, (5.69)

rτ (x, y) = rτ − (x, y)

and (5.70)

rτ0 (ψτ (x), ψτ (y)) = rτ0 − (ψτ − (x), ψτ − (y)).

The same conclusion holds if neither x or y are in the subtree above ξ in Yτ − . If x is in the subtree above ξ in Yτ − and y is not, then (5.71)

rτ (x, y) = rτ − (x, ξ) + rτ − (τ, y)

and (5.72)

rτ0 (ψτ (x), ψτ (y)) = rτ0 − (ψτ − (x), ψτ − (ξ)) + rτ0 − (τ, ψτ − (y))

(where we recall that τ is the root in each of the trees Yτ − , Yτ , Yτ0− , Yτ0 ). Combining these cases, we see that (5.73)

dis(ψτ ) ≤ 2dis(ψτ − ).

Moreover, if ξ ∈ Yτ − \ Bτ − , then Bτ = Bτ − .

156

5. ROOT GROWTH AND REGRAFTING

Also, for any t ≥ 0 we always have the upper bound

(5.74)    dis(ψ_t) ≤ diam(Y_t) + diam(Y′_t) ≤ diam(S) + diam(S′) + 2t
                    ≤ diam(T) + diam(T′) + 2t
                    ≤ 2 diam(T) + d_GHroot((T, ρ), (T′, ρ′)) + 2t
                    ≤ 2 diam(T) + δ + 2t =: D_t.

Set N_t := |Π′ ∩ (]0, t] × S°)| + |Π ∩ {(s, x) : 0 < x ≤ s ≤ t}| and write I_t for the indicator of the event {Π′ ∩ (]0, t] × B_0) ≠ ∅}, which, by the above argument, is the event that ξ ∈ B_{τ−} for some (cut-time, cut-point) pair (τ, ξ) with 0 < τ ≤ t. We have

(5.75)    d_W(P_t((S, ρ), ·), P_t((S′, ρ′), ·))
              ≤ P[d_GHroot(Y_t, Y′_t)]
              ≤ (1/2) P[dis(ψ_t)]
              ≤ (1/2) P[ε 2^{N_t} + I_t D_t]
              ≤ (1/2) { ε exp(μ(T) t + t²/2) + [1 − exp(−ε t)] D_t },

and this suffices to complete the proof. □

5.6. Asymptotics of the Aldous-Broder algorithm

Recall the Aldous-Broder Markov chain from the introduction to this chapter. In this section we state that the suitably rescaled Aldous-Broder Markov chain does indeed converge to the root growth with re-grafting dynamics. It will be more convenient for us to work with the continuous-time version of this algorithm in which the above transitions are made at the arrival times of an independent Poisson process with rate |V|/(|V| − 1) (so that the continuous-time chain makes actual jumps at rate 1). We can associate a rooted compact real tree with a rooted labeled combinatorial tree in the obvious way by thinking of the edges as line segments with length 1. Because we don't record the labeling, the process that arises from mapping the continuous-time Aldous-Broder algorithm in this way won't be Markovian in general. However, this process will be Markovian in the case where P is the transition matrix for i.i.d. uniform sampling (that is, when P(x, y) = 1/|V| for all x, y ∈ V), and we assume this from now on. The following result says that if we rescale "space" and time appropriately, then this process converges to the root growth with re-grafting process. If T = (T, r, ρ) is a rooted compact real tree and c > 0, we write cT for the tree (T, c·r, ρ) (that is, cT = T as sets and the roots are the same, but the metric is re-scaled by c).
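Before stating the scaling result, here is a minimal Python sketch (with names of our own choosing, not code from the text) of one transition of this chain in the i.i.d. uniform case, acting on a rooted labeled tree stored via parent pointers; it is only meant to make the transition mechanism concrete.

```python
import random

def aldous_broder_step(parent, root, rng=random.Random(0)):
    """One transition of the chain on rooted labeled trees in the
    i.i.d. uniform case: a uniformly chosen vertex v becomes the new
    root, the edge leading from v towards the old root is erased, and
    a new edge joining the old root to v is inserted.

    `parent` maps each vertex to its neighbour on the path towards
    `root`; the root itself maps to None.
    """
    v = rng.choice(list(parent))
    if v == root:
        return parent, root          # sampling the current root: the chain holds
    new_parent = dict(parent)
    new_parent[v] = None             # erase the edge from v towards the old root
    new_parent[root] = v             # attach the old root to the new root v
    return new_parent, v
```

Iterating this step from an arbitrary rooted tree on a fixed vertex set, and rescaling edge lengths and time as below, is exactly what Proposition 5.6.1 asserts converges to the root growth with re-grafting process.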


Proposition 5.6.1. Let Y^n = (Y^n_t)_{t≥0} be a sequence of Markov processes that take values in the space of rooted compact real trees with integer edge lengths and evolve according to the dynamics associated with the continuous-time Aldous-Broder chain for i.i.d. uniform sampling. Suppose that each tree Y^n_0 is non-random with total branch length N_n, that N_n converges to infinity as n → ∞, and that N_n^{−1/2} Y^n_0 converges in the rooted Gromov-Hausdorff metric to some rooted compact real tree T as n → ∞. Then, in the sense of weak convergence of processes on the space of càdlàg paths equipped with the Skorohod topology, (N_n^{−1/2} Y^n(N_n^{1/2} t))_{t≥0} converges as n → ∞ to the root growth with re-grafting process X under P^T.

Proof. Define Z^n = (Z^n_t)_{t≥0} by

(5.76)    Z^n_t := N_n^{−1/2} Y^n(N_n^{1/2} t).

For η > 0, let Z^{η,n} be the T^root-valued process constructed as follows.
• Set Z^{η,n}_0 = R_{η_n}(Z^n_0), where η_n := N_n^{−1/2} ⌊N_n^{1/2} η⌋.
• The value of Z^{η,n} is unchanged between jump times of (Z^n_t)_{t≥0}.
• At a jump time τ for (Z^n_t)_{t≥0}, the tree Z^{η,n}_τ is the subtree of Z^n_τ spanned by Z^{η,n}_{τ−} and the root of Z^n_τ.

An argument similar to that in the proof of Lemma 5.1.3 shows that

(5.77)    sup_{t≥0} d_H(Z^n_t, Z^{η,n}_t) ≤ η_n,

and so it suffices to show that Z^{η,n} converges weakly as n → ∞ to X under P^{R_η(T)}. Note that Z^{η,n}_0 converges to R_η(T) as n → ∞. Moreover, if Λ is the map that sends a tree to its total length (that is, the total mass of its length measure), then lim_{n→∞} Λ(Z^{η,n}_0) = Λ ∘ R_η(T) < ∞ by Lemma 1.9.3.

The pure jump process Z^{η,n} is clearly Markovian. If it is in a state (T′, ρ′), then it jumps with the following rates.
• With rate N_n^{1/2}(N_n^{1/2} Λ(T′))/N_n = Λ(T′), one of the N_n^{1/2} Λ(T′) points in T′ that are at distance a positive integer multiple of N_n^{−1/2} from the root ρ′ is chosen uniformly at random and the subtree above this point is joined to ρ′ by an edge of length N_n^{−1/2}. The chosen point becomes the new root, and an arc of length N_n^{−1/2} that previously led from the new root toward ρ′ is erased. Such a transition results in a tree with the same total length as T′.
• With rate N_n^{1/2} − Λ(T′), a new root not present in T′ is attached to ρ′ by an edge of length N_n^{−1/2}. This results in a tree with total length Λ(T′) + N_n^{−1/2}.


It is clear that these dynamics converge to those of the root growth with re-grafting process, with the first class of transitions leading to re-graftings in the limit and the second class leading to root growth. □

An alternative algorithm for simulating from the distribution in (5.1) in the case of i.i.d. uniform sampling is the complete graph special case of Wilson's loop-erased walk algorithm for generating a uniform spanning tree of a graph [PW98, Wil96, WP96]. Asymptotics of the latter algorithm have been investigated in [Pit]. Wilson's algorithm was also used in [PR04] to show that the finite-dimensional distributions of the re-scaled uniform random spanning tree of the d-dimensional discrete torus converge to those of the Brownian CRT as the number of vertices goes to infinity when d ≥ 5.

5.7. An application: The Rayleigh process

Suppose that we take the root growth with re-grafting process (X_t)_{t≥0} under P^T for some T ∈ T^root, we fix a point x ∈ T, and we denote by R_t the distance between x and the root t of X_t (that is, R_t is the height of x in X_t). According to the root growth with re-grafting dynamics, R_t grows deterministically with unit speed between those cut-times τ for which the corresponding cut-point falls on the arc [τ, x]. Such cut-times come along at rate R_{t−} dt in time, and at τ− the position of the corresponding cut-point is uniformly distributed on the arc [τ, x] conditional on the past up to τ−, so that R_τ is uniformly distributed on [0, R_{τ−}] conditional on the past up to τ−. Consequently, the R₊-valued process (R_t)_{t≥0} is autonomously Markovian. In particular, (R_t)_{t≥0} is an example of the class of piecewise deterministic Markov processes discussed in the Introduction.

In order to describe the properties of (R_t)_{t≥0}, we need the following definitions. A non-negative random variable R is said to have the standard Rayleigh distribution if it is distributed as the length of a standard normal vector in R², that is,

(5.78)    P{R > r} = exp(−r²/2),    r ≥ 0.

If R* is distributed according to the size-biased standard Rayleigh distribution, that is,

(5.79)    P{R* ∈ dr} = r P{R ∈ dr} / P[R] = √(2/π) r² e^{−r²/2} dr,    r ≥ 0,

and if U is a uniform random variable that is independent of R*, then U R* has the inverse size-biased standard Rayleigh distribution:

(5.80)    P{U R* ∈ dr} = r^{−1} P{R ∈ dr} / P[R^{−1}] = √(2/π) e^{−r²/2} dr,    r ≥ 0.

Thus R* and U R* are distributed as the lengths of standard normal vectors in R³ and R, respectively.
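The verbal description above translates directly into a simulation. The following Python sketch (our own illustration, not code from the text) generates the jump skeleton of (R_t); by Proposition 5.7.1 below, the empirical law of R_t for large t should be close to the standard Rayleigh distribution (5.78), which gives a simple sanity check.

```python
import math
import random

def simulate_rayleigh_process(r0, t_max, rng=random.Random(0)):
    """Simulate the piecewise-deterministic process described above:
    R grows at unit speed, jumps arrive at rate R_{t-}, and a jump sends
    R_{t-} to an independent uniform point of [0, R_{t-}].

    Returns a list of (jump time, value right after the jump) pairs,
    starting with (0, r0).
    """
    t, r = 0.0, float(r0)
    path = [(t, r)]
    while True:
        e = rng.expovariate(1.0)
        # the waiting time u solves r*u + u**2/2 = e (integrated jump rate)
        u = -r + math.sqrt(r * r + 2.0 * e)
        if t + u > t_max:
            break
        t += u
        r = rng.uniform(0.0, r + u)   # r + u is the value just before the jump
        path.append((t, r))
    return path
```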


For reasons that are apparent from Proposition 5.7.1 below, we call the process (R_t)_{t≥0} the Rayleigh process. We note that there is a body of literature on stationary processes with Rayleigh one-dimensional marginal distributions that arise as the length process of a vector-valued process in R² with coordinate processes that are independent copies of some stationary centered Gaussian process (see, for example, [Has70, MBB58, BS02]).

Proposition 5.7.1. Consider the Rayleigh process (R_t)_{t≥0}. Write P^r for the law of (R_t)_{t≥0} started at r ≥ 0.
(i) The unique stationary distribution of the Rayleigh process is the standard Rayleigh distribution, and the total variation distance between P^r{R_t ∈ ·} and the standard Rayleigh distribution converges to 0 as t → ∞.
(ii) Under P^0, for each fixed t > 0, R_t has the same law as R ∧ t, where R has the standard Rayleigh distribution.
(iii) For x > 0, the mean return time to x is x^{−1} e^{x²/2}.
(iv) If τ_n denotes the nth jump time of (R_t)_{t≥0}, then as n → ∞ the triple (R_{τ_n}, R_{τ_{n+1}−}, R_{τ_{n+1}}) converges in law to the triple (U′ R*, R*, U″ R*), where U′ and U″ are independent uniform random variables on ]0, 1[ independent of R*, and R* has the size-biased Rayleigh distribution.
(v) The jump counting process N(t) := |{n ∈ N : τ_n ≤ t}| has asymptotically stationary increments under P^r for any r ≥ 0, and

(5.81)    N(t)/t → √(π/2),    P^r-a.s.,

as t → ∞.

Proof. (i) Let Π̄ be a Poisson point process in R × R₊ with Lebesgue intensity. For −∞ < t < ∞ let

(5.82)    R̄_t := inf{x + (t − s) : (s, x) ∈ Π̄, s ≤ t}.

It is clear that (R̄_t)_{t∈R} is a stationary Markov process with the transition dynamics of the Rayleigh process. Similarly, for r ∈ R₊ and t ≥ 0, set

(5.83)    R^r_t := (r + t) ∧ inf{x + (t − s) : (s, x) ∈ Π̄, 0 ≤ s ≤ t}.

Then (R^r_t)_{t≥0} has the same law as the Rayleigh process under P^r.

Note that the event {R̄_t > r} is the event that Π̄ has no points in the triangle with vertices (t − r, 0), (t, 0), (t, r), which has area r²/2. Thus P{R̄_t > r} = exp(−r²/2), and the standard Rayleigh distribution is a stationary distribution for the Rayleigh process.

Let T^r := inf{t ≥ 0 : R^r_t = R̄_t}. Note that R^r_t = R̄_t for all t ≥ T^r. Note also that T^r > t if and only if either R̄_0 > r and Π̄ puts no points into the quadrilateral with vertices (0, 0), (t, 0), (t, r + t), (0, r), or R̄_0 ≤ r and Π̄ puts no points into the quadrilateral with vertices (0, 0), (t, 0), (t, R̄_0 + t),


(0, R̄_0). Hence

(5.84)    P{T^r > t} = exp(−r²/2) exp(−(r + (r + t)) t / 2)
                        + ∫₀^r exp(−(x + (x + t)) t / 2) x exp(−x²/2) dx.

By the standard coupling inequality, the total variation distance between P^r{R_t ∈ ·} and P{R̄_t ∈ ·} is at most 2 P{T^r > t}, which converges to 0 as t → ∞. This certainly shows that the standard Rayleigh distribution is the unique stationary distribution.

(ii) Note that R^r_t > x if and only if either r + t ≥ t > x and there are no points of Π̄ in the triangle with vertices (t − x, 0), (t, 0), (t, x) of area x²/2, or r + t > x ≥ t and there are no points of Π̄ in the quadrilateral with vertices (0, 0), (t, 0), (t, x), (0, x − t) of area ((x − t) + x)t/2 = x²/2 − (x − t)²/2. In either case,

(5.85)    P^r{R_t > x} = 1{r + t > x} exp(−x²/2 + ((x − t)₊)²/2).

Taking r = 0 gives the result.

(iii) Let T_y := inf{t > 0 : R_t = y}. It is obvious from the Poisson construction that, for all x ≥ 0 and y > 0, P^x{0 < T_y < ∞} = 1 and P^x[exp(u T_y)] < ∞ for all u in some neighborhood of 0. The Laplace transforms P^x[exp(−λ T_y)] are determined by standard methods of renewal theory:

(5.86)    P^x[exp(−λ T_x)] = U_x(λ) / (1 + U_x(λ)),    λ > 0,

where, by (5.85),

(5.87)    U_x(λ) := ∫₀^∞ dt e^{−λt} (P^x{R_t ∈ dx} / dx)
                  = ∫₀^x dt e^{−λt} t e^{−xt + t²/2} + (e^{−λx}/λ) (P{R ∈ dx} / dx).

In particular, it follows easily that the mean return time to the state x,

(5.88)    P^x[T_x] = lim_{λ↓0} (1 − P^x[exp(−λ T_x)]) / λ,

is the inverse of the density of R at x, that is, x^{−1} e^{x²/2}, as claimed.

(iv) Let τ̄ := inf{t > 0 : R̄_t ≠ R̄_{t−}}. By part (i), the joint distribution of (R_{τ_n}, R_{τ_{n+1}−}, R_{τ_{n+1}}) converges to the joint distribution of (R̄_0, R̄_{τ̄−}, R̄_{τ̄}) conditional on R̄_0 ≠ R̄_{0−}. Let C denote the intensity of the stationary point


process {t ∈ R : R̄_t ≠ R̄_{t−}}. Then, for x < y and z < y,

(5.89)    P{R̄_0 ∈ dx, R̄_{τ̄−} ∈ dy, R̄_{τ̄} ∈ dz | R̄_0 ≠ R̄_{0−}}
              = C^{−1} exp(−x²/2) dx · exp(−(x + y)(y − x)/2) dy · dz
              = [dx / y] × [C^{−1} y² exp(−y²/2) dy] × [dz / y].

The result now follows from (5.79), which also identifies C = √(π/2).

(v) The stationary point process {t ∈ R : R̄_t ≠ R̄_{t−}} is clearly ergodic by construction, and it has intensity √(π/2) by the argument in part (iv). For any r > 0, it follows from the argument in part (i) that R^r_t = R̄_t for all t sufficiently large, and so the result follows from the ergodic theorem applied to {t ∈ R : R̄_t ≠ R̄_{t−}}. □

Then the following corollary is a consequence of Proposition 5.6.1. See [DGR02], where similar scaling limits are derived.

Corollary 5.7.2. For each N ∈ N, let (R̃^N_t)_{t≥0} denote a continuous-time Markov chain with state space {1, . . . , N} and infinitesimal generator matrix

(5.90)    Q̃^N(i, j) :=   1/N              if 1 ≤ j ≤ i − 1,
                          −(N − 1)/N       if j = i,
                          (N − i)/N        if j = i + 1,
                          0                otherwise.

Write P̃^{N,r}, r ∈ {1, . . . , N}, for the corresponding family of laws. If a sequence (r_N)_{N∈N}, r_N ∈ {1, . . . , N}, is such that lim_{N→∞} N^{−1/2} r_N = r_∞ exists, then the law of (N^{−1/2} R̃^N_{√N t})_{t≥0} under P̃^{N,r_N} converges to that of the Rayleigh process (R_t)_{t≥0} under P^{r_∞}, in the usual sense of convergence of càdlàg processes with the Skorohod topology.
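As an illustration of the corollary, here is a Python sketch (our own function names, not code from the text) of the chain with generator (5.90); comparing the rescaled path t ↦ N^{−1/2} R̃^N_{√N t} for large N with a direct simulation of the Rayleigh process gives a numerical check of the scaling limit.

```python
import random

def simulate_corollary_chain(N, r_start, t_max, rng=random.Random(1)):
    """Simulate the continuous-time chain with generator matrix (5.90):
    from state i it jumps to each j in {1, ..., i-1} at rate 1/N and to
    i + 1 at rate (N - i)/N.  Returns the jump skeleton as (time, state)
    pairs.
    """
    t, i = 0.0, r_start
    path = [(t, i)]
    while True:
        rate_down = (i - 1) / N        # total rate of all downward jumps
        rate_up = (N - i) / N          # rate of the single upward jump
        t += rng.expovariate(rate_down + rate_up)   # total rate is (N - 1)/N
        if t > t_max:
            break
        if rng.random() < rate_up / (rate_down + rate_up):
            i += 1
        else:
            i = rng.randrange(1, i)    # uniform on {1, ..., i-1}
        path.append((t, i))
    return path
```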

CHAPTER 6

Subtree Prune and Regraft

As mentioned in the Introduction, Markov chains that move through a space of finite trees are an important ingredient for several algorithms in phylogenetic analysis. In the present chapter we construct and investigate the asymptotics of the subtree prune and regraft (SPR) dynamics, one of the standard sets of moves that are implemented in several phylogenetic software packages. Recall from Figure 0.2 in the Introduction that in an SPR move, a binary tree T is cut "in the middle of an edge" to give two subtrees, say T′ and T″. Another edge is chosen in T′, a new vertex is created "in the middle" of that edge, and the cut edge in T″ is attached to this new vertex. Lastly, the "pendant" cut edge in T′ is removed along with the vertex it was attached to in order to produce a new binary tree that has the same number of vertices as T.

As motivated in Sections 1.7 and 1.11, any compact real tree has an analogue of the length measure on it, but in general there is no canonical analogue of the weight measure. Consequently, the process we construct has as its state space the set of weighted trees, i.e., its elements are pairs (T, ν), where T is a compact real tree and ν is a probability measure on T. Let μ be the length measure associated with T. Our candidate for the limiting subtree prune and regraft dynamics will be a pure jump Markov process with values in the space of weighted R-trees (compare Section 1.11). The process jumps away from T by first choosing a pair of points (u, v) ∈ T × T according to the rate measure μ ⊗ ν and then transforming T into a new tree by cutting off the subtree rooted at u that does not contain v and re-attaching this subtree at v. This jump kernel (which typically has infinite total mass, so that jumps occur on a dense countable set of times) is precisely what one would expect for a limit (as the number of vertices goes to infinity) of the particular SPR Markov chain on finite trees described above, in which the edges for cutting and re-attachment are chosen uniformly at each stage.

Since the process we want to construct is reversible with respect to the distribution of the weighted Brownian continuum random tree (Definition 3.2.1), the framework of Dirichlet forms allows us to translate the above description into rigorous mathematics.

This chapter is organized as follows. First we need to understand in detail the Dirichlet form arising from the combination of the jump kernel


with the distribution of the weighted Brownian CRT as a reference measure. In Section 6.1 we introduce a kernel corresponding to an SPR move and establish, based on calculations we presented in Section 3.4, that we can control the intensity of "big" and "small" jumps. We construct the Dirichlet form in Section 6.2 and the resulting process in Section 6.3. We use potential theory for Dirichlet forms to show in Section 6.4 that from almost all starting points (with respect to the continuum random tree reference measure) our process does not hit the trivial tree consisting of a single point.

6.1. A symmetric jump measure on (T^wt, d_GHwt)

Recall the space (T^wt, d_GHwt) of weighted real trees from Section 1.11. In this section we will construct and study a measure on T^wt × T^wt that is related to the decomposition discussed at the beginning of Section 3.4.

Define a map Θ from {((T, r), u, v) : T ∈ T, u ∈ T, v ∈ T} into T by setting Θ((T, r), u, v) := (T, r^{(u,v)}), where

(6.1)    r^{(u,v)}(x, y) :=   r(x, y)               if x, y ∈ S^{T,u,v},
                              r(x, y)               if x, y ∈ T \ S^{T,u,v},
                              r(x, u) + r(v, y)     if x ∈ S^{T,u,v}, y ∈ T \ S^{T,u,v},
                              r(y, u) + r(v, x)     if y ∈ S^{T,u,v}, x ∈ T \ S^{T,u,v}.

That is, Θ((T, r), u, v) is just T as a set, but the metric has been changed so that the subtree S^{T,u,v} with root u is now pruned and re-grafted so as to have root v.

If (T, r, ν) ∈ T^wt and (u, v) ∈ T × T, then we can think of ν as a weight on (T, r^{(u,v)}), because the Borel structures induced by r and r^{(u,v)} are the same. With a slight misuse of notation we will therefore write Θ((T, r, ν), u, v) for (T, r^{(u,v)}, ν) ∈ T^wt. Intuitively, the mass contained in S^{T,u,v} is transported along with the subtree.

Define a kernel κ on T^wt by

(6.2)    κ((T, r_T, ν_T), B) := μ^T ⊗ ν_T {(u, v) ∈ T × T : Θ(T, u, v) ∈ B}

for B ∈ B(T^wt). Thus κ((T, r_T, ν_T), ·) is the jump kernel described informally in the Introduction.
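To keep the continuum description grounded, here is a toy Python sketch (our own simplification, not code from the text) of the analogous move on a finite rooted combinatorial tree stored via parent pointers: prune the subtree above a uniformly chosen point and regraft it at an independently chosen point outside that subtree. The genuine SPR chain on binary phylogenetic trees cuts and re-attaches in the middle of edges, but this vertex-level caricature exhibits the same prune-and-regraft mechanism that the map Θ and the kernel κ encode.

```python
import random

def descendants(parent, u):
    """u together with all vertices whose path to the root passes through u."""
    block = {u}
    changed = True
    while changed:
        changed = False
        for w, p in parent.items():
            if p in block and w not in block:
                block.add(w)
                changed = True
    return block

def spr_move(parent, rng=random.Random(0)):
    """One prune-and-regraft move: cut the edge above a uniform non-root
    vertex u and re-attach the pruned subtree at a uniform vertex v
    outside it.  `parent` maps each vertex to its parent (root -> None).
    """
    u = rng.choice([w for w, p in parent.items() if p is not None])
    block = descendants(parent, u)
    v = rng.choice([w for w in parent if w not in block])
    new_parent = dict(parent)
    new_parent[u] = v                  # the subtree above u now hangs from v
    return new_parent
```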

Remark 6.1.1. It is clear that κ((T, r_T, ν_T), ·) is a Borel measure on T^wt for each (T, r_T, ν_T) ∈ T^wt. In order to show that κ(·, B) is a Borel function on T^wt for each B ∈ B(T^wt), so that κ is indeed a kernel, it suffices to observe for each bounded continuous function F : T^wt → R that

    ∫∫ F(Θ(T, u, v)) μ^T(du) ν_T(dv) = lim_{ε↓0} ∫∫ F(Θ(T, u, v)) μ^{R_ε(T)}(du) ν_T(dv)

and that

    (T, r_T, ν_T) ↦ ∫∫ F(Θ(T, u, v)) μ^{R_ε(T)}(du) ν_T(dv)

is continuous for all ε > 0 (the latter follows from an argument similar to that in Lemma 7.3 of [EPW06], where it is shown that (T, r_T, ν_T) ↦ μ^{R_ε(T)}(T) is continuous). We have only sketched the argument that κ is a kernel, because κ is just a device for defining the measure J on T^wt × T^wt in the next paragraph. It is actually the measure J that we use to define our Dirichlet form, and the measure J can be constructed directly as the push-forward of a measure on U¹ × U¹ – see the proof of Lemma 6.1.2. □

Figure 6.1. A subtree prune and re-graft operation on an excursion path: the excursion starting at time u in the top picture is excised and inserted at time v, and the resulting gap between the two points marked # is closed up. The two points marked # (resp. *) in the top (resp. bottom) picture correspond to a single point in the associated R-tree.

We show in part (i) of Lemma 6.1.2 below that the kernel κ is reversible with respect to the probability measure P. More precisely, we show that if we define a measure J on T^wt × T^wt by

(6.3)    J(A × B) := ∫_A P(dT) κ(T, B)

for A, B ∈ B(T^wt), then J is symmetric, where P denotes the law of the weighted Brownian CRT (compare Definition 3.2.1). Recall Δ_GHwt from (1.56).

Lemma 6.1.2. (i) The measure J is symmetric.
(ii) For each compact subset K ⊂ T^wt and open subset U such that K ⊂ U ⊆ T^wt, J(K, T^wt \ U) < ∞.
(iii) The function Δ_GHwt is square-integrable with respect to J, that is,

    ∫_{T^wt × T^wt} J(dT, dS) Δ²_GHwt(T, S) < ∞.


Proof. (i) Given e′, e″ ∈ U¹, 0 ≤ u ≤ 1, and 0 < ρ ≤ 1, define e°(·; e′, e″, u, ρ) ∈ U¹ by

(6.4)    e°(t; e′, e″, u, ρ) :=   S_{1−ρ} e″(t)                                     if 0 ≤ t ≤ (1 − ρ)u,
                                   S_{1−ρ} e″((1 − ρ)u) + S_ρ e′(t − (1 − ρ)u)       if (1 − ρ)u ≤ t ≤ (1 − ρ)u + ρ,
                                   S_{1−ρ} e″(t − ρ)                                  if (1 − ρ)u + ρ ≤ t ≤ 1.

That is, e°(·; e′, e″, u, ρ) is the excursion that arises from Brownian re-scaling of e′ and e″ to have lengths ρ and 1 − ρ, respectively, and then inserting the re-scaled version of e′ into the re-scaled version of e″ at a position that is a fraction u of the total length of the re-scaled version of e″. Define a measure J on U¹ × U¹ by

(6.5)    ∫_{U¹×U¹} J(de*, de**) K(e*, e**)
             := (1 / (2√(2π))) ∫₀¹ dρ / √((1 − ρ)ρ³) ∫_{[0,1]²} du ⊗ dv ∫ P(de′) ⊗ P(de″)
                  × K(e°(·; e′, e″, u, ρ), e°(·; e′, e″, v, ρ)).

Clearly, the measure J is symmetric. It follows from the discussion at the beginning of the proof of part (i) of Theorem 3.4.1 and Corollary 3.4.4 that the measure J is the push-forward of the symmetric measure 2J by the map

(6.6)    U¹ × U¹ ∋ (e*, e**) ↦ ((T_{2e*}, r_{T_{2e*}}, ν_{T_{2e*}}), (T_{2e**}, r_{T_{2e**}}, ν_{T_{2e**}})) ∈ T^wt × T^wt,

and hence J is also symmetric.

(ii) The result is trivial if K = ∅, so we assume that K ≠ ∅. Since T^wt \ U and K are disjoint closed sets and K is compact, we have that

(6.7)    c := inf_{T ∈ K, S ∉ U} Δ_GHwt(T, S) > 0.

Fix T ∈ K. If (u, v) ∈ T × T is such that ∆GHwt (T, Θ(T, u, v)) > c, then diam(T ) > c (so that we can think of Rc (T ), recall (1.46), as a subset of T ). Moreover, we claim that either • u ∈ Rc (T, v) (recall (1.32)), or • u∈ 6 Rc (T, v) and νT (S T,u,v ) > c (recall (3.30)). Suppose, to the contrary, that u ∈ / Rc (T, v) and that νT (S T,u,ρ ) ≤ c. Because u ∈ / Rc (T, v), the map f : T → Θ(T, u, v) given by ( u, if w ∈ S T,u,v , f (w) := w, otherwise.


is a measurable c-isometry. There is an analogous measurable c-isometry g : Θ(T, u, v) → T . Clearly, dPr (f∗ ν T , ν Θ(T,u,v) ) ≤ c and dPr (ν T , g ∗ ν Θ(T,u,v) ) ≤ c. Hence, by definition, ∆GHwt (T, Θ(T, u, v)) ≤ c. Thus we have J(K, Twt \ U) Z ≤ P{dT } κ(T, {S : ∆GHwt (T, S) > c}) K Z Z (6.8) ≤ P(dT ) νT (dv) µT (Rc (T, v)) K T Z Z P(dT ) νT (dv)µT {u ∈ T : νT (S T,u,v ) > c} + T

K

< ∞, where we have used Theorem 3.4.1. (iii) Similar reasoning yields that Z J(dT, dS) ∆2GHwt (T, S) wt wt T ×T Z Z ∞ = P{dT } dt 2t κ(T, {S : ∆GHwt (T, S) > t}) wt ZT Z 0∞ Z ≤ P(dT ) dt 2t νT (dv) µT (Rc (T, v)) wt T 0 T Z Z ∞ Z (6.9) + P(dT ) dt 2t νT (dv)µT {u ∈ T : νT {S T,u,v } > t} wt Z ∞T Z 0 Z T ≤ dt 2t P(dT ) νT (dv) µT (Rc (T, v)) 0 Twt T Z Z Z + P(dT ) νT (dv) µT (du) νT2 (S T,u,v ) Twt

T

T

< ∞, where we have applied Theorem 3.4.1 once more. □

6.2. Dirichlet forms

Consider the bilinear form

(6.10)    E(f, g) := ∫_{T^wt × T^wt} J(dT, dS) (f(S) − f(T)) (g(S) − g(T)),

for f, g in the domain

(6.11)    D*(E) := {f ∈ L²(T^wt, P) : f is measurable, and E(f, f) < ∞}



(here, as usual, L²(T^wt, P) is equipped with the inner product (f, g)_P := ∫ P(dx) f(x)g(x)). By the argument in Example 1.2.1 in [FOT94] and Lemma 6.1.2, (E, D*(E)) is well-defined, symmetric and Markovian.

Lemma 6.2.1. The form (E, D*(E)) is closed. That is, if (f_n)_{n∈N} is a sequence in D*(E) such that

    lim_{m,n→∞} (E(f_n − f_m, f_n − f_m) + (f_n − f_m, f_n − f_m)_P) = 0,

then there exists f ∈ D*(E) such that

    lim_{n→∞} (E(f_n − f, f_n − f) + (f_n − f, f_n − f)_P) = 0.

Proof. Let (f_n)_{n∈N} be a sequence such that lim_{m,n→∞} (E(f_n − f_m, f_n − f_m) + (f_n − f_m, f_n − f_m)_P) = 0 (that is, (f_n)_{n∈N} is Cauchy with respect to E(·, ·) + (·, ·)_P). There exists a subsequence (n_k)_{k∈N} and f ∈ L²(T^wt, P) such that lim_{k→∞} f_{n_k} = f, P-a.s., and lim_{k→∞} (f_{n_k} − f, f_{n_k} − f)_P = 0. By Fatou's Lemma,

(6.12)    ∫ J(dT, dS) (f(S) − f(T))² ≤ lim inf_{k→∞} E(f_{n_k}, f_{n_k}) < ∞,

and so f ∈ D*(E). Similarly,

(6.13)    E(f_n − f, f_n − f) = ∫ J(dT, dS) lim_{k→∞} ((f_n − f_{n_k})(S) − (f_n − f_{n_k})(T))²
                              ≤ lim inf_{k→∞} E(f_n − f_{n_k}, f_n − f_{n_k}) → 0

as n → ∞. Thus (f_n)_{n∈N} has a subsequence that converges to f with respect to E(·, ·) + (·, ·)_P, but, by the Cauchy property, this implies that (f_n)_{n∈N} itself converges to f. □

Let L denote the collection of functions f : T^wt → R such that

(6.14)    sup_{T ∈ T^wt} |f(T)| < ∞

and

(6.15)    sup_{S,T ∈ T^wt, S≠T} |f(S) − f(T)| / Δ_GHwt(S, T) < ∞.

Note that L consists of continuous functions and contains the constants. It follows from (1.61) that L is both a vector lattice and an algebra. By Lemma 6.2.2 below, L ⊆ D*(E). Therefore, the closure of (E, L) is a Dirichlet form that we will denote by (E, D(E)).

Lemma 6.2.2. Suppose that {f_n}_{n∈N} is a sequence of functions from T^wt into R such that

    sup_{n∈N} sup_{T ∈ T^wt} |f_n(T)| < ∞,

    sup_{n∈N} sup_{S,T ∈ T^wt, S≠T} |f_n(S) − f_n(T)| / Δ_GHwt(S, T) < ∞,

and lim fn = f,

n→∞

P-a.s.

for some f : Twt → R. Then {fn }n∈N ⊂ D∗ (E), f ∈ D∗ (E), and lim (E(fn − f, fn − f ) + (fn − f, fn − f )P ) = 0.

n→∞

Proof. By the definition of the measure J (see (6.3)) and the symmetry of J (Lemma 6.1.2(i)), we have that fn (x)−fn (y) → f (x)−f (y) for J-almost every pair (x, y). The result then follows from part (iii) of Lemma 6.1.2 and the dominated convergence theorem. ¤ 6.3. An associated Markov process In this section we associate the Dirichlet form (E, D(E)) with a nice Markov process. Before, we remark that L, and hence D(E) is quite a rich class of functions: we show in the proof of Theorem 6.3.1 below that L separates points of Twt and hence if K is any compact subset of Twt , then, by the Arzela-Ascoli theorem, the set of restrictions of functions in L to K is uniformly dense in the space of real-valued continuous functions on K. The following theorem states that there is a well-defined Markov process with the dynamics we would expect for a limit of the subtree prune and regraft chains. Theorem 6.3.1. There exists a recurrent P-symmetric Hunt process X = (Xt , PT ) on Twt whose Dirichlet form is (E, D(E)). Proof. We will check the conditions of Theorem 7.3.1 in [FOT94] to establish the existence of X. Because Twt is complete and separable (recall Theorem 1.11.7) S there is a wt sequence H1 ⊆ H2 ⊆ . . . of compact subsets of T such that P( k∈N Hk ) = 1. Given α, β > 0, write Lα,β for the subset of L consisting of functions f such that (6.16)

sup |f (T )| ≤ α T ∈Twt

and (6.17)

sup S,T ∈Twt , S6=T

|f (S) − f (T )| ≤ β. ∆GHwt (S, T )

By the separability of the continuous real-valued functions on each Hk with respect to the supremum norm, it follows that for each k ∈ N there is a countable set Lα,β,k ⊆ Lα,β such that for every f ∈ Lα,β (6.18)

inf

sup |f (T ) − g(T )| = 0.

g∈Lα,β,k T ∈Hk

170

6. SUBTREE PRUNE AND REGRAFT

S Set Lα,β := k∈N Lα,β,k . Then for any f ∈ Lα,β there S exists a sequence {fn }n∈N in Lα,β such that limn→∞ fn = f pointwise on S k∈N Hk , and hence P-almost surely. By Lemma 6.2.2, the countable set m∈N Lm,m is dense in L, and hence also dense in D(E), with respect to E(·, ·) + (·, ·)P . Now fix a countable dense subset S ⊂ Twt . Let M denote the countable set of functions of the form (6.19)

T 7→ p + q(∆GHwt (S, T ) ∧ r)

for some S ∈ S and p, q, r ∈ Q. Note that M ⊆ L, that M separates the points of Twt , and, for any T ∈ Twt , that there is certainly a function f ∈ M with f (T ) 6= 0. S Consequently, if C is the algebra generated by the countable set M ∪ m∈N Lm,m , then it is certainly the case that C is dense in D(E) with respect E(·, ·) + (·, ·)P , that C separates the points of Twt , and, for any T ∈ Twt , that there is a function f ∈ C with f (T ) 6= 0. All that remains in verifying the conditions of Theorem 7.3.1 in [FOT94] is to check the tightness condition that there exist compact subsets K1 ⊆ K2 ⊆ ... of Twt such that limn→∞ Cap(Twt \ Kn ) = 0 where Cap is the capacity associated with the Dirichlet form – see Remark 6.3.2 below for a definition. This convergence, however, is the content of Lemma 6.3.5 below. Finally, because constants belongs to D(E), it follows from Theorem 1.6.3 in [FOT94] that X is recurrent. ¤ Remark 6.3.2. In the proof of Theorem 6.3.1 we used the capacity associated with the Dirichlet form (E, D(E)). We remind the reader that for an open subset U ⊆ Twt , Cap(U) := inf {E(f, f ) + (f, f )P : f ∈ D(E), f (T ) ≥ 1, P−a.e.T ∈ U} , and for a general subset A ⊆ Twt Cap(A) := inf {Cap(U) : A ⊆ U is open} . We refer the reader to Section 2.1 of [FOT94] for details and a proof that Cap is a Choquet capacity. The following results were needed in the proof of Theorem 6.3.1. Lemma 6.3.3. For ε, a, δ > 0, put Vε,a := {T ∈ T : µT (Rε (T )) > a} δ := {T ∈ T : d and, as usual, Vε,a GH (T, Vε,a ) < δ}. Then, for fixed ε > 3δ, \ δ Vε,a = ∅. a>0 δ , then there exists T ∈ V Proof. Fix S ∈ T. If S ∈ Vε,a ε,a such that dGH (S, T ) < δ. Observe that Rε (T ) is not the trivial tree consisting of a single point because it has total length greater than a. Write {y1 , . . . , yn } for the leaves of Rε (T ). For all i = 1, ..., n, the connected component of T \Rε (T )o that contains yi contains a point zi such that rT (yi , zi ) = ε.

6.3. AN ASSOCIATED MARKOV PROCESS

171

Let R be a correspondence between S and T with dis(R) < 2δ (recall Definition 1.2.3). Pick x1 , ..., xn ∈ S such that (xi , zi ) ∈ R, and hence |rS (xi , xj ) − rT (zi , zj )| < 2δ for all i, j. By Lemma 1.7.3, (6.20) ¡ ¢ µT Rε (T ) = rT (y1 , y2 ) +

n X

^

k=3 1≤i≤j≤k−1

¢ 1¡ rT (yk , yi ) + rT (yk , yj ) − rT (yi , yj ) . 2

Now the distance in S from the point xk to the arc [xi , xj ] is

(6.21)

1 (rS (xk , xi ) + rS (xk , xj ) − rS (xi , xj )) 2 1 ≥ (rT (zk , zi ) + rT (zk , zj ) − rT (zi , zj ) − 3 × 2δ) 2 1 = (rT (yk , yi ) + 2ε + rT (yk , yj ) + 2ε − rT (yi , yj ) − 2ε − 6δ) 2 >0

by the assumption that ε > 3δ. In particular, x1 , . . . , xn are leaves of the subtree spanned by {x1 , . . . , xn }, and Rγ (S) has at least n leaves when 0 < γ < 2ε − 6δ. Fix such a γ. Now µS (Rγ (S)) ≥ rS (x1 , x2 ) − 2γ (6.22)

+

n X

·

^

k=3 1≤i≤j≤k−1

1 (rS (xk , xi ) + rS (xk , xj ) − rS (xi , xj )) − γ 2

¸

T

≥ µ (Rε (T )) + (2ε − 2δ − 2γ) + (n − 2)(ε − 3δ − γ) ≥ a + (2ε − 2δ − 2γ) + (n − 2)(ε − 3δ − γ). δ Because µS (Rγ (S)) is finite, it is apparent that S cannot belong to Vε,a when a is sufficiently large. ¤

Lemma 6.3.4. For ε, a > 0, let Vε,a be as in Lemma 6.3.3. Set Uε,a := {(T, ν) ∈ Twt : T ∈ Vε,a }. Then, for fixed ε, (6.23)

lim Cap(Uε,a ) = 0.

a→∞

Proof. Observe that (T, rT , νT ) 7→ µRε (T ) (T ) is continuous (this is essentially Lemma 1.9.3), and so Uε,a is open. Choose δ > 0 such that ε > 3δ. Suppressing the dependence on ε and δ, define ua : Twt → [0, 1] by ¡ ¢ (6.24) ua ((T, ν)) := δ −1 δ − dGH (T, Vε,a ) + .


Note that ua takes the value 1 on the open set Uε,a , and so Cap(Uε,a ) ≤ E(ua , ua ) + (ua , ua )P . Also observe that (6.25)

|ua ((T 0 , ν 0 )) − ua ((T 00 , ν 00 ))| ≤ δ −1 dGH (T 0 , T 00 ) ≤ δ −1 ∆GHwt ((T 0 , ν 0 ), (T 00 , ν 00 )).

It therefore suffices by part (iii) of Lemma 6.1.2 and the dominated convergence theorem to show for each pair ((T 0 , ν 0 ), (T 00 , ν 00 )) ∈ Twt × Twt that ua ((T 0 , ν 0 )) − ua ((T 00 , ν 00 )) is 0 for a sufficiently large and for each T ∈ Twt that ua ((T, ν)) is 0 for a sufficiently large. However, ua ((T 0 , ν 0 )) − δ , while ua ((T 00 , ν 00 )) 6= 0 implies that either T 0 or T 00 belong to Vε,a δ . The result then follows ua ((T, ν)) 6= 0 implies that T belongs to Vε,a from Lemma 6.3.3. ¤ Lemma 6.3.5. There is a sequence of compact sets K1 ⊆ K2 ⊆ . . . such that limn→∞ Cap(Twt \ Kn ) = 0. Proof. By Lemma 6.3.4, for n = 1, 2, . . . we can choose an so that Cap(U2−n ,an ) ≤ 2−n . Set (6.26)

Fn := Twt \ U2−n ,an = {(T, ν) ∈ Twt : µT (R2−n (T )) ≤ an }

and (6.27)

Kn :=

\

Fm .

m≥n

By Proposition 1.11.5 and Proposition 1.10.1, each set Kn is compact. By construction,   [ Cap(Twt \ Kn ) = Cap  U2−m ,am  m≥n (6.28) X X ≤ Cap(U2−m ,am ) ≤ 2−m = 2−(n−1) . m≥n

m≥n

¤ 6.4. The trivial tree is essentially polar From our informal picture of the process X evolving via re-arrangements of the initial tree that preserve the total branch length, one might expect that if X does not start at the trivial tree T0 consisting of a single point, then X will never hit T0 . However, an SPR move can decrease the diameter of a tree, so it is conceivable that, in passing to the limit, there is some probability that an infinite sequence of SPR moves will conspire to collapse the evolving tree down to a single point. Of course, it is hard to imagine from the approximating dynamics how X could recover from such a catastrophe


– which it would have to since it is reversible with respect to the continuum random tree distribution. In this section we will use potential theory for Dirichlet forms to show that X does not hit T0 from P-almost all starting points; that is, that the set {T0 } is essentially polar. Let r¯ be the map which sends a weighted R tree (T, d, ν) to the νaveraged distance between pairs of points in T . That is, Z Z ¡ ¢ (6.29) r¯ (T, r, ν) := ν(dx)ν(dy) r(x, y), (T, r, ν) ∈ Twt . T

T

In order to show that T0 is essentially polar, it will suffice to show that the set ¡ ¢ (6.30) {(T, r, ν) ∈ Twt : r¯ (T, r, ν) = 0} is essentially polar. Lemma 6.4.1. The function r¯ belongs to the domain D(E). ¡ ¢ R R Proof. If we let r¯n (T, r, ν) := T T ν(dx)ν(dy) [r(x, y) ∧ n], for n ∈ N, then r¯n ↑ r¯, P-a.s. By the triangle inequality, Z Z ¡ ¢2 2 (6.31) (¯ r, r¯)P ≤ P(dT ) (diam(T )) ≤ P(de) 4 sup e(t) < ∞, t∈[0,1]

and hence r¯n → r¯ as n → ∞ in L2 (Twt , P). Notice, moreover, that for (T, r, ν) ∈ Twt and u, v ∈ T , ³ ¡ ¢ ¡ ¢´2 r¯ (T, r, ν) − r¯ Θ((T, r, ν), u, v) Z Z ¡ ¢2 (6.32) =2 ν(dx)ν(dy) r(y, u) − r(y, v) =

S T,u,v T \S T,u,v 2νT (S T,u,v )ν(T \

S T,u,v ) r2 (u, v).

Hence, applying Corollary 3.4.4 and the invariance of the standard Brownian excursion under random re-rooting (see Section 2.7 of [Ald91b]), (6.33) Z ¡ ¢2 J(dT, dS) r¯(T ) − r¯(S) Twt ×Twt Z Z =2 P(dT ) νT (dv)µT (du)νT (S T,u,v )νT (T \ S T,u,v ) rT2 (u, v) Twt T ×T Z Z ds ⊗ da ζ(ˆ es,a )ζ(ˇ es,a )(2a)2 ≤ 2 P(de) 2 s ¯ (e, s, a) − s(e, s, a) Γe Z 1 Z ¡ ¢2 8 dρ p =√ P(de0 ) ⊗ P(de00 ) ρ(1 − ρ) sup S1−ρ e00 2π 0 (1 − ρ)ρ3 Ã !2 Z 1 Z 8 dρ 2 p =√ ρ(1 − ρ) P(de) sup e(t) < ∞. 2π 0 (1 − ρ)ρ3 t∈[0,1]


Consequently, by dominated convergence, E(¯ r − r¯n , r¯ − r¯n ) → 0 as n → ∞. It is therefore enough to verify that r¯n ∈ L for all n ∈ N. Obviously, (6.34)

sup r¯n (T ) ≤ n,

T ∈Twt

and so the boundedness condition (6.14) holds. To show that the “Lipschitz” wt property that ¡ (6.15) holds, ¢fix ε > 0, and let (T, νT ), (S, νSε ) ∈ T be such ε ∆GHwt (T, νT ), (S, νS ) < ε. Then there exist f ∈ FT,S and g ∈ FS,T such ε from (1.55)). Hence that dP (νT , g∗ νS ) < ε and dP (f∗ νT , νS ) < ε (recall FT,S

(6.35)

¯ ¯ ¯ ¡ ¢ ¡ ¢¯ ¯r¯n (T, νT ) − r¯n (S, νS ) ¯ ¯ ¯ ¯Z Z ¯ ≤ ¯¯ νT (dx)νT (dy) (rT (x, y) ∧ n) T T ¯ Z Z ¯ − g∗ νS (dx)g∗ νS (dy) (rT (x, y) ∧ n)¯¯ g(S) g(S) ¯Z Z ¯ ¯ +¯ g∗ νS (dx)g∗ νS (dy) (rT (x, y) ∧ n) g(S) g(S) ¯ Z Z ¯ 0 0 0 0 − νS (dx )νS (dy ) (rS (x , y ) ∧ n)¯¯. S

S

For the first term on the right hand side of (6.35) we get

(6.36)

¯Z Z ¯ ¯ νT (dx)νT (dy) (rT (x, y) ∧ n) ¯ T T ¯ Z Z ¯ − g∗ νS (dx)g∗ νS (dy) (rT (x, y) ∧ n)¯¯ g(S) g(S) ¯Z Z ¯ ≤ ¯¯ νT (dx)νT (dy) (rT (x, y) ∧ n) T T ¯ Z Z ¯ − νT (dx)g∗ νS (dy) (rT (x, y) ∧ n)¯¯ T g(S) ¯Z Z ¯ g∗ νS (dx)νT (dy) (rT (x, y) ∧ n) + ¯¯ S(g) T ¯ Z Z ¯ − g∗ νS (dx)g∗ νS (dy) (rT (x, y) ∧ n)¯¯. g(S)

g(S)

By assumption and Theorem 3.1.2 in [EK86], we can find a probability measure ν on T × T with marginals νT and g∗ νS such that (6.37)

© ª ν (x, y) : rT (x, y) ≥ ε ≤ ε.


Hence, for all x ∈ T , ¯Z ¯ Z ¯ ¯ ¯ νT (dy) (rT (x, y) ∧ n) − ¯ g ν (dy) (r (x, y) ∧ n) ∗ S T ¯ ¯ T g(S) ¯ ¯ Z ¯ ¡ ¢¯ ≤ ν d(y, y 0 ) ¯¯(rT (x, y) ∧ n) − (rT (x, y 0 ) ∧ n)¯¯ (6.38) T ×g(S) Z ¡ ¢ ≤ ν d(y, y 0 ) (rT (y, y 0 ) ∧ n) T ×g(S)

¡ ¢ ≤ 1 + (diam(T ) ∧ n) · ε. For the second term in (6.35) we use the fact that g is an ε-isometry, that is, |(rS (x0 , y 0 ) ∧ n) − (rT (g(x0 ), g(y 0 )) ∧ n)| < ε for all x0 , x00 ∈ T . A change of variables then yields that ¯Z Z ¯ ¯ g∗ νS (dx)g∗ νS (dy) (rT (x, y) ∧ n) ¯ g(S) g(S) ¯ Z Z ¯ 0 0 0 0 − νS (dx )νS (dy ) (rS (x , y ) ∧ n)¯¯ S S ¯Z Z ¯ (6.39) ≤ ε + ¯¯ g∗ νS (dx)g∗ νS (dy) (rT (x, y) ∧ n) g(S) g(S) ¯ Z Z ¯ 0 0 0 0 − νS (dx )νS (dy ) (rT (g(x ), g(y )) ∧ n)¯¯ S

S

= ε. Combining (6.35) through (6.39) yields finally that ¯ ¡ ¡ ¢¯ ¢ ¯r¯n (T, νT ) − r¯n (S, νS ) ¯ ¡ ¢ ≤ 3 + 2n. (6.40) sup (T,νT )6=(S,νS )∈Twt ∆GHwt (T, νT ), (S, νS ) ¤ Proposition 6.4.2. The set {T ∈ Twt : r¯(T ) = 0} is essentially polar. In particular, the set {T0 } consisting of the trivial tree is essentially polar. Proof. We need to show that Cap({T ∈ Twt : r¯(T ) = 0}) = 0 (see Theorem 4.2.1 of [FOT94]). For ε > 0 set (6.41)

Wε := {T ∈ Twt : r¯(T ) < ε}.

By the argument in the proof of Lemma 6.4.1, the function r¯ is continuous, and so Wε is open. It suffices to show that Cap(Wε ) ↓ 0 as ε ↓ 0. Put µ ¶ r¯(T ) (6.42) uε (T ) := 2 − , T ∈ Twt . ε +


Then u ∈ D(E) by Lemma 6.4.1 and the fact that the domain of a Dirichlet form is closed under composition with Lipschitz functions. Because uε (T ) ≥ 1 for T ∈ Wε , it thus further suffices to show (6.43)

lim (E(uε , uε ) + (uε , uε )P ) = 0. ε↓0

By elementary properties of the standard Brownian excursion, © ª (6.44) (uε , uε )P ≤ 4P T : r¯(T ) < 2ε → 0 as ε ↓ 0. Estimating E(uε , uε ) will be somewhat more involved. ˆ and E ˇ be two independent standard Brownian excursions, and let Let E ˆ U and V be two independent random variables that are independent of E ˇ and E and uniformly distributed on [0, 1]. With a slight abuse of notation, we will write P for the probability measure on the probability space where ˆ E, ˇ U and V are defined. E, Set · ¸ Z ˆ := 4 ˆs + E ˆt − 2 inf E ˆw D ds ⊗ dt E s≤w≤t

0≤s
