Markov Processes and Related Topics: A Festschrift

Institute of Mathematical Statistics COLLECTIONS Volume 4

Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz

Stewart N. Ethier, Jin Feng and Richard H. Stockbridge, Editors

Institute of Mathematical Statistics Beachwood, Ohio, USA

Institute of Mathematical Statistics Collections

Series Editor: Anirban DasGupta

The production of the Institute of Mathematical Statistics Collections is managed by the IMS Office: Rong Chen, Treasurer and Elyse Gustafson, Executive Director.

Library of Congress Control Number: 2008943066
International Standard Book Number: 978-0-940600-76-8
International Standard Serial Number: 1939-4039
DOI: 10.1214/074921708
Copyright © 2008 Institute of Mathematical Statistics. All rights reserved.
Printed in Lithuania.

Contents

PART A. MARKOV CHAINS AND BRANCHING PROCESSES

The Decomposition-Separation Theorem for Finite Nonhomogeneous Markov Chains and Related Problems
    Isaac M. Sonin . . . . . . . . . . 1

Conditional Limit Laws and Inference for Generation Sizes of Branching Processes
    P. E. Ney, A. N. Vidyashankar . . . . . . . . . . 17

Absorption Time Distribution for an Asymmetric Random Walk
    S. N. Ethier . . . . . . . . . . 31

PART B. STOCHASTIC DIFFERENTIAL EQUATIONS

Fractional Stability of Diffusion Approximation for Random Differential Equations
    Yuriy V. Kolomiets . . . . . . . . . . 41

From Particles with Random Potential to a Nonlinear Vlasov–Fokker–Planck Equation
    Jin Feng . . . . . . . . . . 63

Diffusion Processes on Manifolds
    Fabrice Debbasch, Claire Chevalier . . . . . . . . . . 85

Stochastic Equations Driven by a Cauchy Process
    Vladimir P. Kurenok . . . . . . . . . . 99

PART C. FILTERING

On Detecting Fake Coin Flip Sequences
    Michael A. Kouritzin, Fraser Newton, Sterling Orsten, Daniel C. Wilson . . . . . . . . . . 107

A Class of Multivariate Micromovement Models of Asset Price and Their Bayesian Model Selection via Filtering
    Laurie C. Scott, Yong Zeng . . . . . . . . . . 123

PART D. STOCHASTIC CONTROL

Determining the Optimal Control of Singular Stochastic Processes Using Linear Programming
    Kurt Helmes, Richard H. Stockbridge . . . . . . . . . . 137

A Degenerate Variance Control Problem with Discretionary Stopping
    Daniel Ocone, Ananda Weerasinghe . . . . . . . . . . 155

PART E. STOCHASTIC NETWORKS AND QUEUEING

Double Skorokhod Map and Reneging Real-Time Queues
    Łukasz Kruk, John Lehoczky, Kavita Ramanan, Steven Shreve . . . . . . . . . . 169

Bounding Stationary Expectations of Markov Processes
    Peter W. Glynn, Assaf Zeevi . . . . . . . . . . 195

Internet Traffic and Multiresolution Analysis
    Ying Zhang, Zihui Ge, Suhas Diggavi, Z. Morley Mao, Matthew Roughan, Vinay Vaishampayan, Walter Willinger, Yin Zhang . . . . . . . . . . 215

Maximum Queue Length of a Fluid Model with an Aggregated Fractional Brownian Input
    Tyrone E. Duncan, Yasong Jin . . . . . . . . . . 235

Fluid Model for a Data Network with α-Fair Bandwidth Sharing and General Document Size Distributions: Two Examples of Stability
    H. C. Gromoll, R. J. Williams . . . . . . . . . . 253

PART F. FINANCE

No Arbitrage and General Semimartingales
    Philip Protter, Kazuhiro Shimbo . . . . . . . . . . 267

Optimal Asset Allocation under Forward Exponential Performance Criteria
    Marek Musiela, Thaleia Zariphopoulou . . . . . . . . . . 285

Estimates of Dynamic VaR and Mean Loss Associated to Diffusion Processes
    Laurent Denis, Begoña Fernández, Ana Meda . . . . . . . . . . 301

PART G. POPULATION GENETICS

A Duality Identity between a Model of Bacterial Recombination and the Wright–Fisher Diffusion
    Xavier Didelot, Jesse E. Taylor, Joseph C. Watkins . . . . . . . . . . 315

Contributors to this volume

Chevalier, Claire, Université Pierre et Marie Curie–Paris 6
Debbasch, Fabrice, Université Pierre et Marie Curie–Paris 6
Denis, Laurent, Université d'Évry-Val-d'Essonne
Didelot, Xavier, University of Oxford
Diggavi, Suhas, EPFL
Duncan, Tyrone E., University of Kansas
Ethier, S. N., University of Utah
Feng, Jin, University of Kansas
Fernández, Begoña, Universidad Nacional Autónoma de México, UNAM
Ge, Zihui, AT&T Labs-Research
Glynn, Peter W., Stanford University
Gromoll, H. C., University of Virginia
Helmes, Kurt, Humboldt-Universität zu Berlin
Jin, Yasong, University of Kansas
Kolomiets, Yu. V., Kent State University and Institute for Applied Mathematics and Mechanics NAS of Ukraine
Kouritzin, Michael A., University of Alberta
Kruk, Łukasz, Maria Curie–Skłodowska University
Kurenok, Vladimir P., University of Wisconsin–Green Bay
Lehoczky, John, Carnegie Mellon University
Mao, Z. Morley, University of Michigan
Meda, Ana, Universidad Nacional Autónoma de México, UNAM
Musiela, Marek, BNP Paribas
Newton, Fraser, Random Knowledge Inc.
Ney, P. E., University of Wisconsin–Madison
Ocone, Daniel, Rutgers University
Orsten, Sterling, Random Knowledge Inc.
Protter, Philip, Cornell University
Ramanan, Kavita, Carnegie Mellon University
Roughan, Matthew, University of Adelaide
Scott, Laurie C., University of Missouri–Kansas City and DST Systems, Inc.
Shimbo, Kazuhiro, Mizuho Alternative Investments, LLC
Shreve, Steven, Carnegie Mellon University
Sonin, Isaac M., University of North Carolina at Charlotte
Stockbridge, Richard H., University of Wisconsin–Milwaukee
Taylor, Jesse E., University of Oxford
Vaishampayan, Vinay, AT&T Labs-Research
Vidyashankar, A. N., Cornell University
Watkins, Joseph C., University of Arizona
Weerasinghe, Ananda, Iowa State University
Williams, R. J., University of California, San Diego
Willinger, Walter, AT&T Labs-Research
Wilson, Daniel C., Invidi Technologies Corporation
Zariphopoulou, Thaleia, University of Texas at Austin
Zeevi, Assaf, Columbia University
Zeng, Yong, University of Missouri–Kansas City
Zhang, Yin, University of Texas at Austin
Zhang, Ying, University of Michigan

Preface

A four-day conference, "Markov Processes and Related Topics," was held at the University of Wisconsin–Madison July 10–13, 2006, in celebration of Tom Kurtz's 65th birthday and his many contributions to mathematics. Speakers were invited to submit a paper to this collection, and after a lengthy refereeing and editing process, the present "Festschrift" volume has emerged. Its diversity of topics is a reflection of the wide range of subjects to which Tom has contributed.

Tom Kurtz was born in Kansas City on July 14, 1941. He graduated from La Plata High School in La Plata, Missouri, in 1959, earned a B.A. degree from the University of Missouri in 1963, and completed his Ph.D. in mathematics at Stanford University in 1967 under the supervision of James McGregor. That same year he began his career at the University of Wisconsin–Madison, where he remained through his retirement in 2008. He became Professor of Mathematics in 1975, Professor of Statistics in 1985, and Paul Lévy Professor of Mathematics and Statistics in 1996. Over the course of his career, he served his profession in numerous capacities, including Director of the Center for the Mathematical Sciences (1990–1996), Editor of the Annals of Probability (2000–2002), and President of the Institute of Mathematical Statistics (2005–2006). He organized the Summer Intern Program in Probability for nearly a decade; this program had a significant impact on the next generation of probabilists.

Tom has published some 90 papers (with 46 distinct coauthors), two books, and two sets of lecture notes, and he has had 26 Ph.D. students. A complete list is provided beginning on page ix. Topics to which Tom has contributed include
• operator semigroup theory: 2, 4, 9, 12, 15, 19, 21, 22, 25, 36.
• theory of Markov processes: 13, 33, 44, 67.
• limit theorems for Markov processes: 3, 6, 7, 8, 10, 18, 20, 23, 29, 31, 32, 34, 35, 37, 45, 46, 50, 51, 60, 85.
• stochastic equations: 28, 49, 59, 73, 82, 83, 87, 91.
• filtering: 38, 39, 43, 77.
• stochastic control: 42, 47, 68, 75, 79.
• queueing theory: 58, 61, 64, 66, 70, 86.
• branching processes: 5, 14, 24, 65, 90.
• point processes: 17, 88.
• population genetics models: 30, 41, 48, 53, 55, 57, 62, 63, 69, 74, 76, 80, 81.
• other population processes: 56, 71, 72, 84, 89.
• miscellaneous probability theory: 11, 16, 26, 27.
• analysis of algorithms: 40, 52, 54, 78.
• fluid mechanics: 1.


The conference was sponsored by the National Science Foundation, the National Security Agency, the U.S. Army Research Office, the Office of Naval Research, the UW–Madison Departments of Mathematics and Statistics, the UW–Milwaukee Department of Mathematical Sciences, and the Institute of Mathematical Statistics. We are grateful for their help in making the conference, and therefore this volume, possible. July, 2008

Stewart N. Ethier Jin Feng Richard H. Stockbridge

Publications of Thomas G. Kurtz

Articles

1. On numerical calculation of transonic flow patterns (with S. Bergman and J. G. Herriot), Math. Comp. 22 (1968), 13–27. MR0224335 (36 #7379).
2. Extensions of Trotter's operator semigroup approximation theorems, J. Functional Analysis 3 (1969), 354–375. MR0242016 (39 #3351).
3. A note on sequences of continuous parameter Markov chains, Ann. Math. Statist. 40 (1969), 1078–1082. MR0246370 (39 #7674).
4. A general theorem on convergence of operator semigroups, Trans. Amer. Math. Soc. 148 (1970), 23–32. MR0256210 (41 #867).
5. Approximation of age dependent, multitype branching processes, Ann. Math. Statist. 41 (1970), 363–368. MR0258137 (41 #2784).
6. Solutions of ordinary differential equations as limits of pure jump Markov processes, J. Appl. Probab. 7 (1970), 49–58. MR0254917 (40 #8124).
7. Comparison of semi-Markov and Markov processes, Ann. Math. Statist. 42 (1971), 991–1002. MR0278382 (43 #4112).
8. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes, J. Appl. Probab. 8 (1971), 344–356. MR0287609 (44 #4812).
9. A random Trotter product formula, Proc. Amer. Math. Soc. 35 (1972), 147–154. MR0303347 (46 #2484).
10. The relationship between stochastic and deterministic models for chemical reactions, J. Chem. Phys. 57 (1972), 2976–2978.
11. Inequalities for the law of large numbers, Ann. Math. Statist. 43 (1972), 1874–1883. MR0378045 (51 #14214).
12. A limit theorem for perturbed operator semigroups with applications to random evolutions, J. Functional Analysis 12 (1973), 55–67. MR0365224 (51 #1477).
13. A generalization of Dynkin's identity and some applications (with Krishna B. Athreya), Ann. Probab. 1 (1973), 570–579. MR0348847 (50 #1342).
14. The nonexistence of the Yaglom limit for an age dependent subcritical branching process (with Stephen Wainger), Ann. Probab. 1 (1973), 857–861. MR0353473 (50 #5956).
15. Convergence of sequences of semigroups of nonlinear operators with an application to gas kinetics, Trans. Amer. Math. Soc. 186 (1973), 259–272. MR0336482 (49 #1256). Erratum 209 (1975), 442.
16. Berry–Esseen estimates in Hilbert space and an application to the law of the iterated logarithm (with James Kuelbs), Ann. Probab. 2 (1974), 387–407. MR0362427 (50 #14868).
17. Point processes and completely monotone set functions, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 31 (1974), 57–67. MR0362480 (50 #14921).
18. Semigroups of conditioned shifts and approximation of Markov processes, Ann. Probab. 3 (1975), 618–642. MR0383544 (52 #4425).
19. An abstract averaging theorem, J. Functional Analysis 23 (1976), 135–144. MR0425674 (54 #13628).

20. Limit theorems and diffusion approximations for density dependent Markov chains. Stochastic Systems: Modeling, Identification and Optimization. Ed. Roger J. B. Wets, North Holland, Ames (1976), 67–78. MR0445626 (56 #3962).
21. Applications of an abstract perturbation theorem to ordinary differential equations, Houston J. Math. 3 (1977), 67–82. MR0425288 (54 #13245).
22. Landau–Kolmogorov inequalities for semigroups and groups (with Melinda W. Certain), Proc. Amer. Math. Soc. 63 (1977), 226–230. MR0458242 (56 #16445).
23. Strong approximation theorems for density dependent Markov chains, Stochastic Process. Appl. 6 (1978), 223–240. MR0464414 (57 #4344).
24. Diffusion approximations for branching processes, Branching Processes, Advances in Probability 5, Ed. Joffe and Ney (1978), 268–292. MR0517538 (80g:60089).
25. A variational formula for the growth rate of a positive operator semigroup, SIAM J. Math. Anal. 10 (1979), 112–117. MR0516757 (80e:47034).
26. Necessary and sufficient conditions for complete convergence in the law of large numbers (with Soren Asmussen), Ann. Probab. 8 (1980), 176–182. MR0556425 (81d:60034).
27. The optional sampling theorem for martingales indexed by directed sets, Ann. Probab. 8 (1980), 675–681. MR0577309 (81f:60063).
28. Representations of Markov processes as multiparameter time changes, Ann. Probab. 8 (1980), 682–715. MR0577310 (82d:60130).
29. Relationships between stochastic and deterministic population models, Proceedings of the Heidelberg Conference on Models of Biological Growth and Spread, Lecture Notes in Biomathematics 38, Springer-Verlag, Berlin (1980), 449–467. MR0609379 (83m:60100).
30. On the infinitely-many-neutral-alleles diffusion model (with Stewart N. Ethier), Adv. in Appl. Probab. 13 (1981), 429–452. MR0615945 (82j:60143).
31. Approximation of discontinuous processes by continuous processes, Proceedings, Bielefeld Conf. on Stochastic Nonlinear Systems in Physics, Chemistry and Biology, Springer-Verlag, Berlin (1981), 22–35.
32. The central limit theorem for Markov chains, Ann. Probab. 9 (1981), 557–560. MR0624682 (83e:60069). Acknowledgement of priority, 12, 282.
33. Applications of duality to measure-valued processes (with Donald A. Dawson). Advances in Filtering and Optimal Stochastic Control. Lecture Notes in Control and Information Sci. 42, Springer-Verlag, Berlin (1982), 91–105. MR0794506 (86j:60175).
34. Representation and approximation of counting processes. Advances in Filtering and Optimal Control. Lecture Notes in Control and Information Sci. 42, Springer-Verlag, Berlin (1982), 177–191. MR0794515 (86m:60128).
35. Gaussian approximation for Markov chains and counting processes. Bull. Inst. Internat. Statist., Proceedings of the 44th Session, Invited Paper Vol. 1 (1983), 361–376. MR0820729 (87c:60054).
36. A counter example for the Trotter product formula (with Michel Pierre), J. Differential Equations 52 (1984), 407–414. MR0744304 (85i:47072).
37. Approaches to weak convergence. Multifunctions and Integrands, Lecture Notes in Math 1091, Springer-Verlag, Berlin (1984), 173–183. MR0785584 (86k:60004).
38. Unique characterization of conditional distributions in nonlinear filtering (with Daniel L. Ocone). Proceedings of IEEE Conference on Decision and Control (1984).
39. A martingale problem for conditional distributions and uniqueness for the nonlinear filtering equations (with Daniel Ocone). Stochastic Differential Systems, Filtering and Control. Lecture Notes in Control and Information Sci. 69, Springer-Verlag, Berlin (1985), 224–234. MR0798326 (86m:60116).
40. A probabilistic distributed algorithm for set intersection and its analysis (with Udi Manber). Theoret. Comput. Sci. 49 (1987), 267–282. MR0909334 (89h:68066).
41. The infinitely-many-alleles model with selection as a measure-valued diffusion (with Stewart N. Ethier). Stochastic Methods in Biology. Lecture Notes in Biomath. 70, Springer-Verlag, Berlin (1987), 72–86. MR0893637 (89c:92037).
42. Martingale problems for controlled processes. Stochastic Modelling and Filtering. Lecture Notes in Control and Information Sci. 91, Springer-Verlag, Berlin (1987), 75–90. MR0894107 (88i:93062).
43. Unique characterization of conditional distributions in nonlinear filtering (with Daniel L. Ocone). Ann. Probab. 16 (1988), 80–107. MR0920257 (88m:93146).
44. Martingale problems for constrained Markov problems. Recent Advances in Stochastic Calculus, J. S. Baras and V. Mirelli, eds. Springer-Verlag, New York (1990), 151–168. MR1255166 (95d:60081).
45. Random time changes and convergence in distribution under the Meyer–Zheng conditions. Ann. Probab. 19 (1991), 1010–1034. MR1112405 (92m:60033).
46. Weak limit theorems for stochastic integrals and stochastic differential equations (with Philip Protter). Ann. Probab. 19 (1991), 1035–1070. MR1112406 (92k:60130).
47. A control formulation for constrained Markov processes. Mathematics of Random Media. Lectures in Applied Mathematics 27, AMS, Providence (1991), 139–150. MR1117242 (92h:60122).
48. On the functional central limit theorem for the Ewens sampling formula (with Peter Donnelly and Simon Tavare). Ann. Appl. Probab. 1 (1991), 539–545. MR1129773 (93e:60064).
49. Wong–Zakai corrections, random evolutions, and simulation schemes for SDEs (with Philip Protter). Stochastic Analysis: Liber Amicorum for Moshe Zakai. Academic Press, Boston (1991), 331–346. MR1119837 (92k:60131).
50. Characterizing the weak convergence of stochastic integrals (with Philip Protter). Stochastic Analysis, M. T. Barlow and N. H. Bingham, eds. Cambridge University Press, Cambridge (1991), 255–259. MR1166413 (93e:60104).
51. Averaging for martingale problems and stochastic approximation. Applied Stochastic Analysis. Proceedings of the US-French Workshop. Lecture Notes in Control and Information Sci. 177 (1992), 186–209. MR1169928 (93h:60070).
52. Results on local stability of fixed step size recursive algorithms. Proceedings of the International Conference on Acoustics, Speech and Signal Processing. San Francisco, March 1992.
53. On the stationary distribution of the neutral diffusion model in population genetics (with Stewart N. Ethier). Ann. Appl. Probab. 2 (1992), 24–35. MR1143391 (93j:60107).
54. Weak convergence and local stability properties of fixed step size recursive algorithms (with James Bucklew and William Sethares). IEEE Trans. Information Theory 39 (1993), 966–978. MR1237722 (94g:94001).
55. Fleming–Viot processes in population genetics (with Stewart N. Ethier). SIAM J. Control Optim. 31 (1993), 345–386. MR1205982 (94d:60131).


56. Correlation and variability in birth processes (with Peter Donnelly and Paul Marjoram). J. Appl. Probab. 30 (1993), 275–284. MR1212661 (94c:60135).
57. Convergence of Fleming–Viot processes in the weak atomic topology (with Stewart N. Ethier). Stochastic Process. Appl. 54 (1994), 1–27. MR1302692 (95m:60075).
58. Large loss networks (with Philip J. Hunt). Stochastic Process. Appl. 53 (1994), 363–378. MR1302919 (95k:60246).
59. Stratonovich stochastic differential equations driven by general semimartingales (with Étienne Pardoux and Philip Protter). Ann. Inst. H. Poincaré Probab. Statist. 31 (1995), 351–377. MR1324812 (96d:60085).
60. Averaging stochastically perturbed Hamiltonian systems (with Federico Marchetti). Stochastic Analysis. AMS Proceedings of Symposia in Pure Mathematics 57 (1995), 93–114. MR1335465 (96f:60098).
61. A multiclass station with Markovian feedback in heavy traffic (with J. G. Dai). Math. Oper. Res. 20 (1995), 721–742. MR1354779 (96m:60209).
62. A countable representation of the Fleming–Viot measure-valued diffusion (with Peter Donnelly). Ann. Probab. 24 (1996), 698–742. MR1404525 (98f:60162).
63. The asymptotic behavior of an urn model arising in population genetics (with Peter Donnelly). Stochastic Process. Appl. 64 (1996), 1–16. MR1419488 (98g:60152).
64. Limit theorems for workload input models. Stochastic Networks: Theory and Applications, F. P. Kelly, S. Zachary, I. Ziedins, eds. Royal Statistical Society Lecture Note Series 4. Oxford Science Publications, Oxford (1996), 119–140.
65. A conceptual proof of the Kesten–Stigum theorem for multi-type branching processes (with Russell Lyons, Robin Pemantle, and Yuval Peres). Classical and Modern Branching Processes, K. B. Athreya and P. Jagers, eds. IMA Volumes in Mathematics and its Applications 84. Springer, New York (1997), 181–185. MR1601737 (98j:60122).
66. Looking behind and beyond self-similarity: Scaling phenomena in measured WAN traffic (with A. Feldmann, A. C. Gilbert, and W. Willinger). Proceedings of the 35th Annual Allerton Conference on Communication, Control and Computing, University of Illinois at Urbana-Champaign, Allerton House, Monticello, IL, September 29 – October 1, 1997 (1997), 269–280.
67. Martingale problems for conditional distributions of Markov processes. Electron. J. Probab. 3 (1998), no. 9, 29 pp. MR1637085 (99k:60186).
68. Existence of Markov controls and characterization of optimal Markov controls (with Richard H. Stockbridge). SIAM J. Control Optim. 36 (1998), 609–653. MR1616514 (99b:93051). Erratum 37 (1999), 1310–1311. MR1691943 (2000e:93077).
69. Coupling and ergodic theorems for Fleming–Viot processes (with Stewart N. Ethier). Ann. Probab. 26 (1998), 533–561. MR1626158 (99f:60074).
70. The changing nature of network traffic: Scaling phenomena (with A. Feldmann, A. C. Gilbert, and W. Willinger). ACM SIGCOMM Computer Communication Review, April 1998.
71. Particle representations for measure-valued population processes with spatially varying birth and death rates. Proceedings of International Conference on Stochastic Models (Ottawa, June 1998). CMS Conference Proceedings 26, 299–317. AMS, Providence. MR1765017 (2002b:60082).
72. Particle representations for measure-valued population models (with Peter Donnelly). Ann. Probab. 27 (1999), 166–205. MR1681126 (2000f:60108).


73. Particle representations for a class of nonlinear SPDEs (with Jie Xiong). Stochastic Process. Appl. 83 (1999), 103–126. MR1705602 (2000g:60108).
74. Genealogical processes for Fleming–Viot models with selection and recombination (with Peter Donnelly). Ann. Appl. Probab. 9 (1999), 1091–1148. MR1728556 (2001h:92029).
75. Martingale problems and linear programs for singular control (with Richard H. Stockbridge). Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (1999), 11–20.
76. Continuum-sites stepping-stone models, coalescing exchangeable partitions and random trees (with Peter Donnelly, Steven N. Evans, Klaus Fleischmann, and Xiaowen Zhou). Ann. Probab. 28 (2000), 1063–1110. MR1797304 (2001j:60183).
77. Numerical solutions for a class of SPDEs with application to filtering (with Jie Xiong). Stochastics in Finite and Infinite Dimensions (In honor of Gopinath Kallianpur) (2001), 233–258. Birkhäuser, Boston. MR1797090 (2001h:65013).
78. Mixed time scale recursive algorithms (with James A. Bucklew). IEEE Trans. Signal Process. 49 (2001), 1824–1830. MR1850132 (2002i:94006).
79. Stationary solutions and forward equations for controlled and singular martingale problems (with Richard H. Stockbridge). Electron. J. Probab. 6 (2001), no. 17, 52 pp. MR1873294 (2002j:60128).
80. Gaussian limits associated with the Poisson–Dirichlet distribution and the Ewens sampling formula (with Paul Joyce and Stephen M. Krone). Ann. Appl. Probab. 12 (2002), 101–124. MR1890058 (2002m:62019).
81. When can one detect overdominant selection in the infinite-alleles model? (with Paul Joyce and Stephen M. Krone). Ann. Appl. Probab. 13 (2003), 181–212. MR1951997 (2004c:60059).
82. A stochastic evolution equation arising from the fluctuation of a class of interacting particle systems (with Jie Xiong). Commun. Math. Sci. 2 (2004), 325–358. MR2118848 (2005m:60136).
83. The approximate Euler method for Lévy driven stochastic differential equations (with Jean Jacod, Sylvie Méléard, and Philip Protter). Ann. Inst. H. Poincaré Probab. Statist. 41 (2005), 523–558. MR2139032 (2005m:60149).
84. Spatial birth and death processes as solutions of stochastic equations (with Nancy Lopes Garcia). ALEA Lat. Am. J. Probab. Math. Stat. 1 (2006), 281–303. MR2249658 (2007k:60326).
85. Diffusion approximation of transport processes with general reflecting boundary conditions (with Cristina Costantini). Math. Models Methods Appl. Sci. 16 (2006), 717–762. MR2226124 (2008b:82081).
86. Asymptotic analysis of multiscale approximations to reaction networks (with Karen Ball, Lea Popovic, and Greg Rempala). Ann. Appl. Probab. 16 (2006), 1925–1961. MR2288709 (2007i:60089).
87. The Yamada–Watanabe–Engelbert theorem for general stochastic equations and inequalities. Electron. J. Probab. 12 (2007), 951–965. MR2336594 (2008h:60291).
88. Spatial point processes and the projection method (with Nancy Lopes Garcia). In and Out of Equilibrium 2, Vladas Sidoravicius and Maria Eulalia Vares, eds. Progress in Probability 60. Birkhäuser Boston, Inc., Boston, MA (2008), 271–298.
89. Limit theorems for an epidemic model on the complete graph (with Elcio Lebensztayn, Alexandre R. Leichsenring, and Fábio P. Machado). ALEA Lat. Am. J. Probab. Math. Stat. 4 (2008), 45–55. MR2399589.


90. Poisson representations of branching Markov and measure-valued branching processes (with Eliane Rodrigues). Ann. Probab. (to appear).
91. Macroscopic limits for stochastic partial differential equations of McKean–Vlasov type (with Peter Kotelenez). Probab. Theory Related Fields (to appear).

Books/notes

1. Approximation of Population Processes. CBMS–NSF Regional Conference Series in Applied Mathematics 36. SIAM, Philadelphia (1981). MR0610982 (82j:60160).
2. Markov Processes: Characterization and Convergence (with Stewart N. Ethier). Wiley, New York (1986). MR0838085 (88a:60130).
3. Weak convergence of stochastic integrals and differential equations I, II (with Philip E. Protter). Probabilistic Models for Nonlinear Partial Differential Equations. D. Talay and L. Tubaro, eds. Lecture Notes in Math. 1627. Springer-Verlag, Berlin (1996), 1–38, 197–279. MR1431298 (98h:60073), MR1431303 (98h:60074).
4. Large Deviations for Stochastic Processes (with Jin Feng). AMS Mathematical Surveys and Monographs 131. American Mathematical Society, Providence, RI (2006). MR2260560.

Ph.D. students

1. Frank J. S. Wang (1973), Limit theorems for age and density dependent stochastic population models.
2. Stewart N. Ethier (1975), An error estimate for the diffusion approximation in population genetics.
3. Harry M. Pierson (1976), Random linear transformations of point processes.
4. Joseph C. Watkins (1982), A central limit problem in random evolutions.
5. Daniel Johnson (1983), Diffusion approximation for optimal filtering of jump processes and for queuing networks.
6. Wiremu Solomon (1984), Limit theorems for random measures with applications.
7. Douglas J. Blount (1987), Comparison of a stochastic model of a chemical reaction with diffusion and the deterministic model.
8. Cristina Costantini (1987), Obliquely reflecting Brownian motion and diffusion approximations for physically reflecting processes.
9. Richard H. Stockbridge (1987), Time-average control of martingale problems.
10. Eimear Mary Goggin (1988), Weak convergence of conditional probabilities.
11. Gary Shon Katzenberger (1990), Solutions of a stochastic differential equation forced onto a manifold by a large drift.
12. Nancy Garcia (1993), Birth and death processes as projections of higher dimensional Poisson processes.
13. Mark Ingenoso (1993), Stability analysis for certain queueing systems and multi-access communication channels.
14. Nahnsook Cho (1994), Weak convergence of stochastic integrals and stochastic differential equations driven by martingale measure and its applications.
15. Anne Dougherty (1994), Averaging and diffusion approximations for stochastic network models.


16. Jin Feng (1996), Martingale problems for large deviations for Markov processes.
17. Bradbury Franklin (1999), The limit of the normalized error in the approximation of stochastic differential equations.
18. Shun-Hwa Li (1999), Time invariance estimation and consistency results for spatial point processes.
19. Yong Zeng (1999), A class of partially observed models with discrete clustering and non-clustering noises: Application to micro-movement of stock prices.
20. Kevin Buhr (2002), Spatial Moran models with local interactions.
21. Jorge Garcia (2002), Large deviation principle for stochastic integrals.
22. Yoonjung Lee (2004), Two essays on modeling financial markets as complex, interactive systems.
23. Kathryn Temple (2005), Particle representations for measure-valued processes with interactions and exit measures.
24. Xin Qi (2007), The central limit theorems for space-time point processes.
25. Zhengxiao Wu (2007), A filtering approach to abnormal cluster identification.
26. Hye-Won Kang (2008), Multiple scaling methods in chemical reaction networks.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 1–15
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000264

The Decomposition-Separation Theorem for Finite Nonhomogeneous Markov Chains and Related Problems

Isaac M. Sonin(1)
University of North Carolina at Charlotte

Abstract: Let M be a finite set, P be a stochastic matrix, and U = {(Zn)} be the family of all finite Markov chains (MC) (Zn) defined by M, P, and all possible initial distributions. The behavior of such a MC is described by a classical result of probability theory derived in the 1930s by A. N. Kolmogorov and W. Doeblin. If the stochastic matrix P is replaced by a sequence of stochastic matrices (Pn), with transitions at moment n defined by Pn, then U becomes a family of nonhomogeneous MCs. There are numerous results concerning the behavior of such MCs given some specific properties of the sequence (Pn). But what if there are no assumptions about the sequence (Pn)? Is it possible to say something about the behavior of the family U? The surprising answer to this question is yes. Such behavior is described by a theorem which we call a decomposition-separation (DS) theorem, which was initiated by a small paper of A. N. Kolmogorov (1936) and formulated and proved in stages in a series of papers including D. Blackwell (1945), H. Cohn (1971, 1989), and I. Sonin (1987, 1991, 1996).

1. Introduction

The notion of a (homogeneous) Markov chain (MC) is one of the key notions in the theory of stochastic processes and in probability theory. Its simplest version, the case of discrete time and finite state space, is specified by a pair (M, P), where M is a state space and P = {p(i, j)} is a transition matrix indexed by the elements of M. We denote by Z = (Zn), n ∈ N = {0, 1, ...}, a MC from the family U0 of all MCs defined by M, P, and all initial distributions on M. The classical Kolmogorov–Doeblin results describing the decomposition of a state space M into essential and nonessential (transient) states, into ergodic classes and cyclic subclasses, and the asymptotic behavior of MCs from U0, can be found in most advanced books on probability theory as well as in monographs on MCs (see for example Shiryayev [33], Kemeny and Snell [16], Isaacson and Madsen [15]). If a MC is irreducible and aperiodic then an ergodic property holds, i.e., there exists a limit (invariant) distribution π such that

(1)    lim_{n→∞} P(Zn = j | Z0 = i) = π(j) > 0,

which does not depend on the initial state i. If the number of cyclic subclasses exceeds one, then the MC is aperiodic when considered only at the times of visiting the given subclass, and (1) is true for the corresponding n.

(1) Department of Mathematics, UNC at Charlotte, 9201 University Bld, Charlotte, NC 28223, e-mail: [email protected]
AMS 2000 subject classifications: Primary 60J10; secondary 60K99; 82C35
Keywords and phrases: nonhomogeneous Markov chain, decomposition-separation theorem, simulated annealing


These results, of course, represent only the basic facts about the structure of MCs, and many more detailed and subtle properties of MCs are contained in the rich and extensive theory of this subject. A natural extension of this theory is the theory of nonhomogeneous MCs, where the transitions at moment n are defined by a stochastic matrix Pn from a sequence of stochastic matrices (Pn). We denote by U the family of all nonhomogeneous MCs specified by M, (Pn), and all initial distributions on M, now specified not only for the initial moment 0 but for all initial moments k = 0, 1, 2, . . .. We again denote by Z = (Zn), n ≥ k, a MC from this family. There is a substantial body of literature on nonhomogeneous MCs, though it is still a small fraction of the literature on homogeneous MCs. See, e.g., the classical works of R. Dobrushin [8], D. Griffeath [10], J. Hajnal [11], D. Isaacson and R. Madsen [15], M. Iosifescu [14], J. Kingman [17], V. Maksimov [20], A. Mukherjea [22], E. Seneta [25], and others. A survey of results about products of stochastic matrices, together with his own contributions, can be found in D. Hartfiel [12]. More recent publications are, e.g., [7], [19], and [32]. In the last ten years interest in this area has surged because an important class of computational algorithms, so-called simulated annealing, is based on nonhomogeneous MCs with very specific transition probabilities. Namely, the transition probabilities have the form pn(i, j) = c(i, j) exp{−q(i, j)/Tn} for i ≠ j, where {q(i, j)} is a nonnegative matrix defined by an optimization problem and Tn is a "temperature" that tends to zero. From the vast literature on this topic we mention only two papers, W. Niemiro and P. Pokarowski [23] and H. Cohn and M. Fielding [5], which are more closely related to our paper.
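As an illustration, transition matrices of this annealing form are easy to generate; the following sketch uses a made-up cost matrix q, base weights c, and temperature schedule, none of which come from the cited papers:

```python
import numpy as np

def annealing_matrix(q, c, T):
    # One-step transition matrix with p_n(i, j) = c(i, j) exp{-q(i, j)/T_n}
    # for i != j; the diagonal absorbs the remaining mass of each row.
    P = c * np.exp(-q / T)
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

q = np.array([[0.0, 1.0], [2.0, 0.0]])   # hypothetical cost matrix
c = np.array([[0.0, 0.3], [0.3, 0.0]])   # hypothetical base weights
for T in (1.0, 0.5, 0.1):                # temperature tending to zero
    P = annealing_matrix(q, c, T)
    assert np.allclose(P.sum(axis=1), 1.0)   # each P_n is stochastic
```

As Tn tends to zero the off-diagonal entries vanish, so the matrices Pn degenerate toward the identity; the DS theorem discussed below applies to any such sequence without further assumptions.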
Almost all of the literature mentioned above follows a rather natural format: given some assumptions about the structure of the sequence (Pn), some results about the behavior of the corresponding family (Zn) are obtained. A natural question, which was not asked for a long time, is the following: is it possible to say something about the behavior of the family U if there are no assumptions about the sequence (Pn)? At first sight the answer seems to be negative, especially if we take into account that this question is equivalent to the question of how the products of matrices P1 P2 · · · Pn behave as n tends to infinity when the only information available about these matrices is that they are stochastic. Nevertheless, surprisingly, the answer to this question is affirmative: there is a fundamental theorem which describes such behavior. In particular this theorem generalizes the above-mentioned results of Kolmogorov and Doeblin about homogeneous MCs. We call this theorem the decomposition-separation (DS) theorem. Briefly, it states that a decomposition with properties similar to those of homogeneous MCs does exist, but it is not a decomposition of the state space M; rather, it is a decomposition of the space-time representation of M, i.e., of the sequence (Mn) = M × N. The only assumption is that the set M is finite, |M| = N < ∞. The DS theorem was initiated by a small paper of A. N. Kolmogorov [18], who analyzed the situation when a sequence of stochastic matrices (Pn) is given in inverse time, i.e., for n = 0, −1, −2, . . .. This paper is known mainly for the reversibility criterion introduced there, but besides this result Kolmogorov asked two questions. First, does there exist a MC (Zn) governed by this sequence, i.e., satisfying the equalities P(Zn+1 = j | Zn = i) = pn(i, j) for all n, i, j? In a few lines Kolmogorov answered this question positively. The second question was: when is such a MC unique?
Kolmogorov proved that a necessary and sufficient condition for the uniqueness is that the limits

(2)    lim_{n→−∞} P(Zm = j | Zn = i) = πm(j)

exist for all m, j and do not depend on the initial point i as n tends to minus infinity. A breakthrough step toward the description of all MCs defined by a sequence (Pn) was made in 1945 by David Blackwell [2]. In our terms his description can be explained as follows. Let us introduce a sequence (Mn) of disjoint copies of the state space M, e.g., Mn = (M, n), n ≤ 0. Without loss of generality we can assume that the stochastic matrices (Pn) are indexed by the elements of these sets, i.e., Pn = {pn(i, j), i ∈ Mn, j ∈ Mn+1}. A sequence J = (Jn), Jn ⊂ Mn, n ≤ 0, is called for brevity a jet. A tuple of jets (J^1, . . . , J^c) is called a partition of (Mn) if correspondingly (Jn^1, . . . , Jn^c) form a partition of Mn for every n. Blackwell proved that there is a partition (T^0, T^1, . . . , T^c) of (Mn) such that for any MC Z ∈ U the trajectories of this MC with probability one eventually reach and stay in one of the jets T^i, i = 1, . . . , c, i.e., P(lim sup{Zn ∈ Tn^i}) = P(lim inf{Zn ∈ Tn^i}), and for MCs "inside" one of these jets there are limits similar to (2). In T^0 such limits may not exist, but P(lim sup{Zn ∈ Tn^0}) = 0. The decisive point of his proof was the use of the existence of limits for almost all trajectories of bounded (sub)martingales, then a relatively new result of his Ph.D. advisor J. Doob. As Kolmogorov did, Blackwell considered MCs in reverse time. The next step was made in the works of Harry Cohn (see [3], [4], and the expository paper [6]). Cohn considered forward time and proved that the tail σ-algebra of any nonhomogeneous MC consists of a finite number c ≤ N of atomic (indecomposable) sets, each of them related to an element T^k of Blackwell's decomposition, k = 1, . . . , c. He also simplified Blackwell's proof, though it remained very complicated.
Note that the jets (Tn^i) in the Blackwell–Cohn decomposition are defined up to jets (Jn) such that P(lim sup{Zn ∈ Jn}) = P(lim inf{Zn ∈ Jn}) = 0, so generally there is a continuum of such partitions. The last step in the proof of the DS theorem was made by the author in a series of papers, Sonin [26], [27], [28], [29], [30], where it was proved that among the Blackwell–Cohn partitions there are partitions into jets having the additional property that the expected number of transitions of trajectories of any MC (Zn) between jets is finite on the infinite time interval. This additional separation property was not obvious, and its existence had not been noted or mentioned before. At the same time it played a crucial role in the initial problem that led the author to its formulation, the problem of the sufficiency of Markov strategies for the Dubins–Savage functional. An example of such a functional is the probability of visiting a given subset of the state space infinitely often. The study of the problem of sufficiency of Markov strategies led to the study of equivalent random sequences and to the proof of the so-called Feinberg inequality (see [29]). In that paper the initial proof of sufficiency, given for the finite case by T. Hill [13], was substantially simplified. Note also that the problem of sufficiency of Markov strategies is still open for the countable case. The DS theorem also has a simple deterministic interpretation in terms of the behavior of the simplest model of an irreversible process: a system of N cups filled with a liquid, with some initial concentration of a second substance, mixed at discrete moments of time in proportions defined at each moment n by a stochastic matrix Pn. The irreversibility of this process manifests itself in the martingale property of some bounded random sequences defined by the family of MCs U.
Since the state space M is finite, |M| = N < ∞, these (sub)(super)martingales take no more than N values at each moment of time. Such martingales have extra properties that do not follow from Doob's well-known upcrossing lemma. The generalization of Doob's lemma for these martingales, the theorem about the existence of "barriers" published in [26], played a crucial role in the proof of the separation property. A survey of the corresponding results and their interrelationship was given in [30]. The main goal of this expository paper is to complement that survey by presenting a more refined version of the DS theorem, to give an answer to an open problem, and to give a sketch of the proofs of the DS theorem and the "barriers" theorem. The plan of the paper is as follows. In Section 2 we present the deterministic version of the DS theorem and its probabilistic counterpart; in Section 3 we discuss the third key ingredient, the martingale-type (i.e., martingale or (sub)(super)martingale) random sequences. In Section 4 we outline a sketch of the proof of the DS theorem and show why the separation part needs a strengthening of Doob's upcrossing lemma, the theorem about the existence of barriers. Section 5 outlines the construction of barriers. Section 6 is about open problems related to the DS theorem.

2. A simple model of an irreversible process, two formulations of the DS theorem

We assume that two sequences (Mn) and (Pn) are given, where the set Mn represents the state space at moment n. The stochastic matrices (Pn) are indexed by the elements of these sets, i.e., Pn = {pn(i, j), i ∈ Mn, j ∈ Mn+1}. At this stage it does not matter whether the sets Mn are countable or finite, though the DS theorem holds only under the assumption

(3)    |Mn| ≤ N < ∞,    n ∈ N.

The following simple physical model and physical interpretation of the DS theorem for a particular Markov chain was introduced in [26]. Given a sequence (Mn), let Mn represent a set of "cups" containing a "liquid" (tea, schnapps, vodka, etc.). A cup i ∈ Mn is characterized at moment n by the volume of liquid in this cup, mn(i). The matrix Pn describes the redistribution of liquid from the cups Mn to the (initially empty) cups Mn+1 at the time of the nth transition, i.e., pn(i, j) is the proportion of liquid transferred from cup i to cup j. The sequence (mn), mn = (mn(i), i ∈ Mn), n ∈ N, called for brevity the (discrete) flow, satisfies the relations

(4)    mn+1(j) = Σi mn(i) pn(i, j).

We assume that for some k ∈ N an initial condition mk(i), i ∈ Mk, is given, and without loss of generality Σi mk(i) = 1; hence a similar equality holds for any n ≥ k. In Section 1 we introduced U, the family of all (nonhomogeneous) Markov chains (MCs) Z = (Zn), n ∈ N, specified by (Mn) and (Pn) and all possible initial distributions μ on Mk, k = 0, 1, . . .. Given a MC (Zn) we can define a flow (mn) with mn(i) = P(Zn = i), i ∈ Mn, n ≥ k. Vice versa, a flow (mn), mn = (mn(i), i ∈ Mn), satisfying (4) defines a MC Z ∈ U. Thus we have

Proposition 1. There is a one-to-one correspondence between MCs (Zn) ∈ U and flows (mn).

Let us assume additionally that each cup contains some material (substance, color), and let us denote by αn(i), 0 ≤ αn(i) ≤ 1, the "concentration" of this material in cup i at moment n. The sequence (mn, αn) = ((mn(i), αn(i)), i ∈ Mn), n ∈ N, for the sake of brevity is called a colored (discrete) flow. Concentrations obviously satisfy the relations

(5)    αn+1(j) = Σi mn(i) αn(i) pn(i, j) / mn+1(j).

Note that we can replace the notion of concentration by temperature, since it follows the same formula (5). One more interpretation is obtained if we consider mn(i) as masses and αn(i) as their positions on a horizontal axis; mixing is then replaced by taking the center of gravity of the corresponding subsystem. We can also use not one but many colors, and so on. The initial conditions mk(i), αk(i), i ∈ Mk, for some k ∈ N are assumed given, and the sequences mn(i), αn(i), i ∈ Mn, evolve in time according to (4) and (5) for n ≥ k. If we introduce sn(i) = mn(i)αn(i), the amount of "substance" contained in cup i at moment n, and denote by mn and sn the corresponding row vectors, then the equations (4) and (5) can be presented in the symmetrical form

(6)    mn+1 = mn Pn,    sn+1 = sn Pn,    n = 0, 1, . . . .

The colored flow described above is probably the simplest example of an irreversible process, i.e., a process such that no state can be repeated after a few steps, or in other words a process for which any sequence of states in reversed time is not an admissible sequence in forward time. Such a property holds, of course, except in the trivial cases when all concentrations become equal after some moment, or when the redistribution avoids any mixing. Intuitively the property of irreversibility seems obvious. The formal proof is based on the consideration, e.g., of the following function describing the state of the system at moment n:

(7)    Fn(mn, αn) = Σi mn(i) αn²(i).

It is easy to prove that for any colored flow the function Fn is nonincreasing, and Fn = Fn+1 only if there is no mixing at moment n. As we will see later, this function is equivalent to the variance of some random sequence. Colored flows also have a simple probabilistic interpretation. Let (Zn) ∈ U, n ≥ k, be a Markov chain and let Dk ⊂ Mk be a set. Let us denote (8)

αn (i) = P (Zk ∈ Dk | Zn = i).

It is easy to check that the sequence (mn(i), αn(i)), n ≥ k, specifies a colored flow with initial values αk(i) = 1 for i ∈ Dk and αk(i) = 0 otherwise. Vice versa, for every colored flow (mn, αn) whose concentrations equal zero or one at the initial moment k, there is a pair ((Zn), Dk), where (Zn) ∈ U, n ≥ k, Dk ⊆ Mk, such that αn(i) coincides with the values given by (8). We will also consider slightly more general colored flows that allow a jet (On), On ⊂ Mn, n ∈ N, called an "ocean," where by definition, for i ∈ Dn = Mn \ On, (9)

αn (i) = P (Zs ∈ Ds , s = k, . . . , n | Zn = i),

and for i ∈ On, n ∈ N, we define αn(i) ≡ 0. If Z = (Zn), n ≥ k, is a MC and (Dn) is a sequence of sets, Dn ⊆ Mn, n ≥ k, we call (Zn, Dn) a Markov pair.

Proposition 2. There is a one-to-one correspondence between Markov pairs (Zn, Dn) and colored flows with ocean (mn, αn, On), On = Mn \ Dn, with initial values of concentrations equal to 0 or 1.
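The update rules above are easy to experiment with numerically. The following sketch (with randomly generated stochastic matrices Pn; the example data are ours, not from the paper) iterates (4) and (5) through the substance amounts of (6) and checks that the function Fn of (7) is indeed nonincreasing:

```python
import numpy as np

rng = np.random.default_rng(0)

def mix(m, alpha, P):
    # (4): m_{n+1}(j) = sum_i m_n(i) p_n(i, j)
    m_next = m @ P
    # (6): s_{n+1} = s_n P_n with s_n(i) = m_n(i) alpha_n(i); dividing by the
    # new volumes recovers the concentrations of (5)
    s_next = (m * alpha) @ P
    alpha_next = np.divide(s_next, m_next,
                           out=np.zeros_like(s_next), where=m_next > 0)
    return m_next, alpha_next

def F(m, alpha):
    # (7): F_n(m_n, alpha_n) = sum_i m_n(i) alpha_n(i)^2
    return float(np.sum(m * alpha ** 2))

N = 4
m = np.full(N, 1.0 / N)
alpha = np.array([1.0, 1.0, 0.0, 0.0])      # 0/1 initial concentrations, as in (8)
values = [F(m, alpha)]
for _ in range(50):
    P = rng.dirichlet(np.ones(N), size=N)   # a random stochastic matrix P_n
    m, alpha = mix(m, alpha, P)
    values.append(F(m, alpha))

assert abs(m.sum() - 1.0) < 1e-9            # total volume is preserved
assert all(b <= a + 1e-12 for a, b in zip(values, values[1:]))  # F_n nonincreasing
```

Each step averages concentrations with the weights given by the redistributed volumes, so the decrease of Fn is exactly the Jensen-type loss mentioned in the text.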


First we formulate the DS theorem as a theorem about the asymptotic behavior of colored flows. Denote rn(i, j) = P(Zn = i, Zn+1 = j) = mn(i)pn(i, j). In terms of flows, this is the amount of liquid transferred from cup i to cup j at moment n. To prepare the reader for the general case we first consider the cases of two and three cups. Given a colored flow (mn, αn) we can relabel the cups at each moment n ≥ k in such a way that αn(1) ≤ αn(2) ≤ · · · ≤ αn(N). Then if N = 2 there are only two possibilities: limn→∞ αn(1) = limn→∞ αn(2) or limn→∞ αn(1) < limn→∞ αn(2). In the first case there is complete mixing, i.e., the concentrations in both cups have the same limit, and then the sequences of volumes mn(i), i = 1, 2, may have no limits. It is easy to prove that in the second case limn→∞ mn(i) always exists. But this is a simple statement. A much less trivial fact (though still relatively simple) is the following statement.

Proposition 3. For the case N = 2, if limn→∞ αn(1) < limn→∞ αn(2), then the total amount of liquid transferred between cups 1 and 2 is finite, i.e., Σ_{n=0}^∞ [rn(1, 2) + rn(2, 1)] < ∞.

More than that, if this is true for one colored flow then it is true for any colored flow, i.e., convergence of the sum in Proposition 3 is a property of the sequence (Pn). Starting from three cups the situation becomes absolutely nontrivial. Let us again relabel the cups at each moment n so that αn(1) ≤ αn(2) ≤ αn(3). If limn→∞ αn(1) = limn→∞ αn(3) then again there is complete mixing. If limn→∞ αn(1) = α∗(1) < limn→∞ αn(3) = α∗(3), it can happen that the concentration in the middle cup has no limit at all, i.e., αn(2) oscillates between α∗(1) and α∗(3). Such a situation is possible only if the volume in this cup tends to zero, limn→∞ mn(2) = 0. The direct total exchange between cups 1 and 3 will be finite, as in Proposition 3, but cup 2 can actively participate in the exchange between cups 1 and 3.
Though its volume tends to zero, the series Σn rn(2, 1) and Σn rn(2, 3) can be infinite. Thus the true analog of Proposition 3 is a statement about the existence of two jets (Jn^1) and (Jn^2) such that at each moment n = 0, 1, 2, . . . they form a partition of Mn = ({1, 2, 3}, n), 1 ∈ Jn^1, 3 ∈ Jn^2, and the total exchange of liquid between cups from these two jets is finite. Note that the set of all such partitions has the power of the continuum, and the existence of a partition with the finite exchange property cannot be obtained from the stabilization statement. But such a decomposition does take place, and it is universal with respect to the initial conditions of the colored flow. This is one of the main points of the DS theorem. The exact formulation is as follows. Given a flow m = (mn), or equivalently a MC Z = (Zn), denote by V(J^k, J^s | m) the total amount of liquid transferred between jets J^k and J^s,

(10)    V(J^s, J^k | m) = Σ_{n=0}^∞ [ Σ_{i∈Jn^k, j∈J_{n+1}^s} rn(i, j) + Σ_{i∈Jn^s, j∈J_{n+1}^k} rn(i, j) ].

Note that if one of these sums is finite then the other is finite as well. The equality rn(i, j) = P(Zn = i, Zn+1 = j) also implies that V(J^s, J^k | m) is the expected number of transitions of trajectories of (Zn) between these two jets.

Theorem 1 (DS theorem, elementary (deterministic) formulation). Let a sequence of disjoint sets (Mn) satisfying condition (3) and a sequence of stochastic matrices (Pn) be given. Then there exists an integer c, 1 ≤ c ≤ N, and a decomposition of the sequence (Mn) into disjoint jets J^0, J^1, . . . , J^c, J^k = (Jn^k), such that for any colored flow (mn, αn, On):
(a) stabilization of volume and concentration takes place inside any jet J^k, k = 1, . . . , c, i.e., lim_{n→∞} Σ_{i∈Jn^k} mn(i) = m∗^k and lim_{n→∞} αn(in) = α∗^k for in ∈ Jn^k; the concentration in the jet J^0 may oscillate, but the total volume in this jet tends to zero, i.e., lim_{n→∞} Σ_{i∈Jn^0} mn(i) = 0;
(b) the total amount of liquid transferred between any two different jets is finite on the infinite time interval, i.e., V(J^k, J^s | m) < ∞, s ≠ k;
(c) this decomposition is unique up to jets (Jn) such that for any flow (mn) the relation limn mn(Jn) = 0 holds and the total amount of liquid transferred between (Jn) and (Mn \ Jn) is finite.

The correspondence between (colored) flows and MCs (Markov pairs) allows us to reformulate the DS theorem as a statement about the behavior of MCs as follows:

Theorem 1 (probabilistic formulation). Let a sequence of disjoint sets (Mn) satisfying condition (3) and a sequence of stochastic matrices (Pn) be given. Then there exists an integer c, 1 ≤ c ≤ N, and a decomposition of the sequence (Mn) into disjoint jets J^0, J^1, . . . , J^c, J^k = (Jn^k), such that:
(a1) for any Markov chain Z ∈ U, with probability one its trajectory after a finite number of steps enters one of the jets J^k, k = 1, . . . , c, and stays there forever;
(a2) each jet J^k, k = 1, . . . , c, is mixing, i.e., for any two Markov chains Z^1, Z^2 ∈ U such that limn P(Zn^i ∈ Jn^k) > 0, i = 1, 2, and any sequence of states in ∈ Jn^k, n ∈ N,

(11)    lim_n P(Zn^1 = in | Zn^1 ∈ Jn^k) / P(Zn^2 = in | Zn^2 ∈ Jn^k) = 1;

(b) for any Markov chain Z ∈ U the expected number of transitions of its trajectories between two different jets is finite on the infinite time interval, i.e.,

(12)    Σ_{n=0}^∞ [P(Zn ∈ Jn^k, Zn+1 ∉ J_{n+1}^k) + P(Zn ∉ Jn^k, Zn+1 ∈ J_{n+1}^k)] < ∞;

(c) this decomposition is unique up to jets (Jn) such that for any Markov chain Z ∈ U the expected number of transitions of Z between (Jn) and (Mn \ Jn) is finite and limn P(Zn ∈ Jn) = 0.

Property (b) combined with limn P(Zn ∈ Jn^0) = 0 implies (a1), but we prefer to formulate (a1) and (c) separately. We refer to points (a1) and (a2) as the decomposition part and to (b) as the separation part. It was proved in [29] that in the homogeneous case, when all stochastic matrices Pn, n ∈ N, are copies of the same matrix P, the above decomposition is nothing else than the space-time representation of the decomposition of M into ergodic classes and cyclic subclasses, where each subclass is represented by a sequence J^k, k ≠ 0. Thus the DS theorem is a direct generalization of the Kolmogorov–Doeblin results.

3. Key elements of the proof of the DS theorem

Given a MC Z = (Zn), a jet J = (Jn) is called a trap if the event "Z visits J infinitely often" coincides almost surely with the event "Z stays in J forever," i.e., if P(lim sup{Zn ∈ Jn}) = P(lim inf{Zn ∈ Jn}). If J = (Jn) is a trap then it is easy to check that limn P(Zn ∈ Jn) exists and coincides with these limits. We denote this limit by v(Z, J), the "volume" of J for Z. Given a MC Z = (Zn), a jet J = (Jn) is called a strap (strong trap) if the expected number of transitions of Z between (Jn) and its complement (Mn \ Jn) is finite, i.e., if a sum similar to (12) is finite. Obviously, each strap is a trap, but not vice versa, because it is possible that the (random) number of exits from a given jet is finite with probability one while its expected value is infinite. In the language of flows, a jet (Jn) is a strap for a flow m = (mn) if the total "overflow" from the jet J to all other jets is finite, i.e., if a sum similar to (10) is finite. Given a MC Z = (Zn), a strap (Jn) is called indecomposable if it cannot be partitioned into two straps (Sn) and (Kn) of positive volume. The decomposition into straps J^0, J^1, . . . , J^c described in the DS theorem has two key features: first, it is universal, i.e., it is the same decomposition for all MCs in U; second, for any MC (any colored flow) there is mixing inside every jet J^k, k > 0. The first feature can be obtained if we consider a "universal" MC Z∗, i.e., a MC which coincides with positive probability with any MC from U, and prove the existence of the decomposition for this MC. The construction of Z∗ can be done easily; see the details in [29]. The decomposition into indecomposable straps for this MC exists almost by definition. If (Mn) is indecomposable then c = 1 and there is only one jet. If (Mn) is decomposable then there are two straps of positive volume, and if each of them is indecomposable then c = 2 and we have obtained a decomposition. Otherwise we can continue this process, and since every jet with positive volume contains at least one point for large n, in no more than N steps we obtain a decomposition into indecomposable jets. The only remaining question is: why, for any MC (colored flow), is there mixing inside an indecomposable jet?
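The quantity separating a strap from a mere trap, the expected number of transitions in (12) between a jet and its complement, can be estimated by straightforward simulation. A hypothetical sketch (the helper name and the example data are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_exchanges(P_seq, mu, jets, n_paths=1000):
    # Monte Carlo estimate of the sum in (12): the expected number of
    # transitions of the chain between the jet (J_n) and its complement.
    total = 0
    states = np.arange(len(mu))
    for _ in range(n_paths):
        z = rng.choice(states, p=mu)
        for n, P in enumerate(P_seq):
            z_next = rng.choice(len(P[z]), p=P[z])
            total += (z in jets[n]) != (z_next in jets[n + 1])
            z = z_next
    return total / n_paths

# If every P_n is the identity, trajectories never move, so no jet is ever left.
P_seq = [np.eye(2)] * 10
jets = [{0}] * 11
assert mean_exchanges(P_seq, np.array([0.5, 0.5]), jets) == 0.0
```

For the identity matrices the estimate is exactly zero; for a genuinely nonhomogeneous (Pn) the interesting question is whether the analogous sum stays bounded as the time horizon grows, which is precisely the strap property.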
To answer this question we need to relate some martingales to a given colored flow (Markov pair (Z, D)) and to show that, due to condition (3), they have very specific properties. Given two sequences (an) and (bn) we say that they intersect at moment k if ak ≤ bk, ak+1 > bk+1, or ak > bk, ak+1 ≤ bk+1. Given a real-valued r.s. X = (Xn) and a nonrandom sequence d = (dn), we denote by RT(X | d) the expected number of intersections of trajectories of X with (dn) on the interval (0, T),

(13)    RT(X | d) = Σ_{n=0}^{T−1} [P(Xn ≤ dn, Xn+1 > dn+1) + P(Xn > dn, Xn+1 ≤ dn+1)].

A nonrandom sequence (dn) is called a barrier for the r.s. X = (Xn) if the expected number of intersections of (dn) by the trajectories of X on the infinite time interval is finite, i.e., limT RT(X | d) < ∞. If additionally dn = d for all n, we call (dn) a level barrier. To prove the existence of barriers and relate them to the separation part of the DS theorem we introduce a r.s. (Yn) as follows. Suppose a colored flow (mn, αn), or equivalently a Markov pair (Zn, Dn), is given, where the αn(i) are as in (5). Then define

(14)    Yn = αn(Zn),    n ∈ N.

Lemma 1. The random sequence (Yn) specified by (14) is a submartingale in reverse time.

To see why this lemma is true it is sufficient to notice that if we denote by qn(j, i) = mn(i)pn(i, j)/mn+1(j) the transition probabilities of the MC (Zn) in inverse time, then formula (5) represents precisely the condition for the r.s. (Yn) to be a martingale in reverse time. Since earlier we introduced colored flows with an "ocean," where the αn(i) are not calculated by formula (5) but are defined as αn(i) = 0 for i from the "ocean," sometimes the equality in (5) is replaced by an inequality representing the submartingale property. This simple lemma is a bridge between the DS theorem and Theorem 2 about the existence of barriers. That theorem implies that the r.s. (Yn) has barriers inside any interval (a, b). We will discuss Theorem 2 in the next section. Note also that reversed martingales are used extensively, e.g., in [34] and [35].

Proposition 4. Let (Z, D) be a Markov pair and let (Yn) be the corresponding submartingale, i.e., Yn = αn(Zn). Then a sequence (dn) is a barrier for (Yn) iff the jet (Jn), Jn = {i ∈ Mn : αn(i) ≤ dn}, is a strap for (Zn).

The validity of Proposition 4 follows from the definitions of barriers and straps. Now we are able to explain heuristically why the existence of barriers is equivalent to the mixing property inside any indecomposable strap J = (Jn). Suppose that there is a colored flow with two disjoint jets (Sn) and (Kn) such that lim inf_n αn(in) ≥ b > a ≥ lim sup_n αn(jn) for in ∈ Sn and jn ∈ Kn, and limn Σ_{i∈Sn} mn(i) > 0, limn Σ_{j∈Kn} mn(j) > 0. Then the r.s. (Yn) takes values above b and below a with positive probability. By Theorem 2 from the next section there is a barrier for (Yn) inside the interval (a, b), and therefore the indecomposable strap (Jn) can be decomposed into two straps of positive volume, a contradiction.

4. Doob's lemma and existence of barriers

Generally, barriers or level barriers may not exist for a given r.s. (Xn). The closest statement about intersections of a level or an interval is the well-known upcrossing lemma of Doob (Doob's inequality).
This lemma implies one of the central results in the theory of stochastic processes, Doob's theorem about the existence of the limits of trajectories of a (sub)(super)martingale as time tends to infinity. We recall that, given two numbers a and b with a < b and a sequence (xn), the number of upcrossings of the interval (a, b) by (xn) on the interval (0, T) is the maximal number of disjoint intervals (ni, ni+1) ⊂ (0, T) such that x_{ni} ≤ a and x_{ni+1} ≥ b. Given a random sequence (r.s.) X = (Xn), we denote the expected number of upcrossings of (a, b) by trajectories of X by UT(X, (a, b)). Note that when the interval (a, b) is replaced by a sequence (dn), the notion of an (up)(down)crossing transforms naturally into the notion of intersection. We obviously have UT(X, (a, b)) ≤ RT(X, d) for any (dn) ⊂ (a, b).

Doob's lemma. Let X = (Xn) be a (sub)martingale. Then, for every T,

(15)    UT(X, (a, b)) ≤ E(XT − a)+ / (b − a).
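The two counting notions used here, upcrossings of an interval and intersections with a sequence as in (13), can be computed pathwise (an illustrative sketch with an invented path):

```python
def upcrossings(x, a, b):
    # Maximal number of disjoint moves of the path x from a value <= a
    # to a later value >= b, as in the definition of U_T(X, (a, b)).
    count, below = 0, False
    for v in x:
        if v <= a:
            below = True
        elif v >= b and below:
            count += 1
            below = False
    return count

def intersections(x, d):
    # Number of intersections of the path x with the sequence d,
    # counted pathwise as in R_T(X | d) of (13).
    return sum((x[n] <= d[n]) != (x[n + 1] <= d[n + 1])
               for n in range(len(x) - 1))

x = [0.9, 0.1, 0.8, 0.2, 0.95, 0.5]
assert upcrossings(x, 0.25, 0.75) == 2
assert intersections(x, [0.5] * len(x)) == 5
# Every upcrossing of (a, b) forces an intersection with any level inside,
# so U_T <= R_T, as noted in the text.
assert upcrossings(x, 0.25, 0.75) <= intersections(x, [0.5] * len(x))
```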

In particular, if supn EXn+ < ∞, then Doob's lemma implies that the expected number of upcrossings of every fixed interval (a, b) on the infinite time interval, limT UT(X, (a, b)), is finite. The previously mentioned theorem of Doob follows immediately. Inequalities similar to (15) hold for downcrossings and crossings, for supermartingales, and for (sub)(super)martingales in reversed time. We call all such r.s. martingale-type r.s., and we denote the class of all bounded (sub)(super)martingales in forward or inverse time by M. For simplicity we will further consider only bounded r.s., 0 ≤ Xn ≤ 1 for all n. The width (b − a) of the interval is in the denominator of the estimate (15), so Doob's lemma does not imply that: 1) inside the interval (a, b) there exists a level c such that the expected number of intersections of this level is finite, or, a weaker statement, that 2) there exists a nonrandom sequence d = (dn) with a similar property. An example in [31] shows that not only level barriers but even barriers may not exist inside a given interval if a bounded martingale (Xn) takes a countable number of values. At the same time, if (Xn) at each moment n takes only a bounded number of values, i.e., if there exists a sequence of finite sets (Gn) such that P(Xn ∈ Gn) = 1 for each n and |Gn| ≤ N < ∞, n ∈ N, then Doob's lemma can be substantially strengthened. Denote the class of all such r.s. by G^N. The following theorem holds.

Theorem 2. Let a and b be two numbers, a < b, and X = (Xn) ∈ M ∩ G^N. Then inside the interval (a, b) there exists a barrier d = (dn).

This theorem follows from a more general theorem in [26] about the existence of barriers for processes with finite variation and a bounded number of values. Let ϕ(s), s ≥ 0, be a nondecreasing function with ϕ(0) = 0. Denote by V_T^ϕ(X) the ϕ-variation of X on the time interval (0, T), defined as (16)

V_T^ϕ(X) = sup Σ_{i=1}^{k−1} Eϕ(|X_{n_{i+1}} − X_{n_i}|),

where the supremum is taken over all finite collections of moments 0 ≤ n1 < n2 < · · · < nk ≤ T. A barrier inside (a, b) is constructed from a sequence of intervals (Δn), Δn ⊂ (a, b), satisfying the conditions
(A) |Δn| ≥ h and P(Xn ∈ Δn) = 0 for all n;
(B) |Δn ∩ Δn+1| ≥ h > 0, n = 1, 2, . . . ,
which lead to an estimate of the form

(18)    RT(X | d) ≤ c V_T^ϕ(X),    dn ∈ Δn,

with a constant c depending only on ϕ and h.
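With ϕ(s) = s the ϕ-variation is the ordinary expected total variation, and the mechanism behind the crossing estimate is transparent already for a single path: if the path never lands inside an interval of length h, then every crossing of a level in that interval costs a jump of size at least h, so the number of crossings is at most the variation divided by h. A toy check (invented data):

```python
def crossings(x, level):
    # Pathwise count of intersections with a constant level
    return sum((x[n] <= level) != (x[n + 1] <= level)
               for n in range(len(x) - 1))

def variation(x):
    # phi-variation with phi(s) = s, computed along a single path
    return sum(abs(x[n + 1] - x[n]) for n in range(len(x) - 1))

x = [0.0, 1.0, 0.2, 0.9, 0.1]      # this path never enters (0.4, 0.6)
h = 0.2                            # length of the avoided interval
assert all(not (0.4 < v < 0.6) for v in x)
assert crossings(x, 0.5) <= variation(x) / h
```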

The following elementary lemma holds. Lemma 3. Let Δ1 and Δ2 be intervals, X1 , X2 be random variables such that |Δ1 ∩ Δ2 | ≥ h > 0, and P (Xi ∈ Δi ) = 0, i = 1, 2. Then for any numbers d1 ∈ Δ1 , d2 ∈ Δ2 , and nondecreasing function ϕ, (20)

P (X1 ≤ d1 , X2 ≥ d2 ) ≤ Eϕ(|X1 − X2 |)/ϕ(h).

The assertion of Lemma 3 follows immediately from the inclusion (X1 ≤ d1, X2 ≥ d2) ⊂ (|X1 − X2| ≥ h) and Chebyshev's inequality, P(|Y| ≥ h) ≤ Eϕ(|Y|)/ϕ(h) for any r.v. Y. The existence of a sequence (Δn) satisfying conditions (A) and (B) implies the estimate (18). Indeed, in this case by Lemma 3, for any sequence (dn), dn ∈ Δn, we have P(Xn ≤ dn, Xn+1 > dn+1) ≤ Eϕ(|Xn+1 − Xn|)/ϕ(h), and therefore RT(X, d) ≤ Σ_{n=0}^{T−1} Eϕ(|Xn+1 − Xn|)/ϕ(h) ≤ cV_T^ϕ(X) with c = 1/ϕ(h). Since |Gn ∩ (a, b)| ≤ N, there are of course sequences (Δn) satisfying (A) for every h, 0 < h ≤ 1/(N + 1), but generally we cannot expect that for such (Gn) there is a sequence of intervals satisfying both (A) and (B). We will show that the estimate (18) still holds if condition (B) is replaced by a weaker condition (C):
(C) (a) for any n there is r(n), 1 ≤ r(n) ≤ n, such that |Δ_{r(n)} ∩ Δn| ≥ h > 0 and |Δ_{r(n)} ∩ Δn+1| ≥ h;

(b) every n is covered by no more than M intervals of the form [r(k), k].

First we formulate Lemma 4, which is an analog of Lemma 3 with condition (B) replaced by condition (C).

Lemma 4. Let Δ1, Δ2, and Δ3 be intervals and Y1, Y2, and Y3 random variables such that |Δ1 ∩ Δ2| ≥ h > 0, |Δ1 ∩ Δ3| ≥ h, and P(Yi ∈ Δi) = 0, i = 1, 2, 3. Then for any numbers d2 ∈ Δ2, d3 ∈ Δ3, and any nondecreasing function ϕ, (21)

P (Y2 ≤ d2 , Y3 ≥ d3 ) ≤ E[ϕ(|Y1 − Y2 |) + ϕ(|Y1 − Y3 |)]/ϕ(h).

Note that for any d1 ∈ Δ1 we have the trivial inequality P(Y2 ≤ d2, Y3 ≥ d3) ≤ P(Y1 ≥ d1, Y2 ≤ d2) + P(Y1 ≤ d1, Y3 ≥ d3). Now the assertion of Lemma 4 is easily obtained by applying Lemma 3 to each of the terms on the right side of the last inequality, using the pairs (Y2, Y1) and (Y1, Y3). Then, using (21) for Y1 = X_{r(n)}, Y2 = Xn, Y3 = Xn+1, we obtain P(Xn ≤ dn, Xn+1 ≥ dn+1) ≤ E[ϕ(|X_{r(n)} − Xn|) + ϕ(|X_{r(n)} − Xn+1|)]/ϕ(h). Summing this inequality over all n, and taking into account point (b) of condition (C), we obtain (18) with c = M/ϕ(h). Thus to prove Theorem 2 it remains to prove the purely combinatorial

Lemma 5. Let (a, b) be an interval and G = (Gn) a sequence of sets, |Gn ∩ (a, b)| ≤ N. Then there is a sequence of intervals (Δn), calculated by a recursive formula Δn+1 = f(Δn, Gn+1), satisfying conditions (A) and (C) for some h > 0.

The formal proof is rather complicated, but the idea of the construction can be explained using the cases N = 1, 2. Note that the statement of Lemma 5 is not quite trivial even in the case N = 1. Without loss of generality we can assume that |Gn| = N, n = 1, 2, . . ., and (a, b) = (0, 1). Initially we construct a sequence of intervals (sn) which will serve as a "frame" for the intervals (Δn), sn ⊂ Δn, n = 1, 2, . . ..

The case N = 1. Let us divide the interval (0, 1) into three equal intervals and denote (0, 1/3) = (0) and (2/3, 1) = (1). To explain our construction we can use the following informal interpretation. There is a "hunter" and a "game" which tries to avoid the hunter by hiding in one of the intervals (sn) ∈ {(0), (1)}. The position of the hunter at moment n is the one-element set Gn. The game knows the position of the hunter, so it can always avoid the hunter, but its goal is to spend the minimal amount of "energy" switching from one of the possible hiding locations to the other.
We define the sequences of intervals (sn) and (Δn) as follows: sn+1 = sn if sn ∩ Gn+1 = ∅; otherwise sn+1 = (0) if sn = (1) and sn+1 = (1) if sn = (0). The intervals Δn are defined by Δn+1 = sn+1 if sn+1 = sn, and Δn+1 = (0, 1) \ sn otherwise, n = 1, 2, . . .. Thus Δn+1 = f(sn, Gn+1), where f(s, G) = s if s ∩ G = ∅, and f(s, G) = (0, 1) \ s otherwise.

For N = 2 we divide the interval (0, 1) into three equal intervals, and each of the intervals (0, 1/3) = (0) and (2/3, 1) = (1) we again divide into three equal intervals, denoting the lower and upper of the corresponding intervals by (00), (01), (10), and (11). Now there are two hunters, with positions at the two points of Gn, and the game can use any of these four intervals for hiding. The energy-minimizing strategy of the game is now as follows. First, sn+1 = sn if sn ∩ Gn+1 = ∅. Otherwise, if sn is, e.g., in (0) and only one hunter is in (0), then the game avoids the second hunter by changing sn inside of (0), i.e., from (00) to (01) and back. Only when the second hunter is also in (0) does the hiding position move to (1), i.e., sn = (10) or (11). The sequence (Δn) is defined by Δn+1 = sn+1 if sn+1 = sn, and otherwise as follows: Δn+1 = (0) \ sn if a


switching occurs inside of (1), and, e.g., Δn+1 = (0, 1) \ (0) if a switching occurs from an interval in (0) to an interval in (1), n > 1.

For general N ≥ 2 the construction proceeds as follows. The interval (0, 1) is divided as in the first N steps of Cantor's construction of a perfect set, i.e., at each step the middle interval of three equal intervals is eliminated. The process of switching from one hiding interval to another depends on how many hunters are close to the game.
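The N = 1 switching rule above is purely mechanical and can be sketched in a few lines of code. The sketch below is our own illustration, not from the paper: the interval labels and the function `f` stand in for the recursion Δn+1 = f(sn, Gn+1), and the hunter's position is drawn uniformly at random.

```python
import random

# Sketch (ours, not from the paper) of the N = 1 "hunter and game" strategy:
# the game hides in s_n in {(0, 1/3), (2/3, 1)} and switches only when the
# hunter's position G_{n+1} lands inside its current interval.
LOWER, UPPER = (0.0, 1.0 / 3.0), (2.0 / 3.0, 1.0)

def contains(interval, point):
    a, b = interval
    return a < point < b

def f(s, g):
    """One step of the recursion: returns (s_{n+1}, Delta_{n+1})."""
    if not contains(s, g):
        return s, s                          # no switch: Delta_{n+1} = s_n
    new_s = UPPER if s == LOWER else LOWER   # switch hiding place
    delta = (s[1], 1.0) if s == LOWER else (0.0, s[0])  # Delta = (0,1) \ s_n
    return new_s, delta

random.seed(0)
s, switches = LOWER, 0
for _ in range(1000):
    g = random.random()                      # hunter position G_{n+1}
    new_s, _ = f(s, g)
    switches += new_s != s
    s = new_s
# Each hiding interval has length 1/3, so a uniform hunter forces a switch
# on roughly a third of the steps.
print(switches)
```

The point of the construction is that every switch is forced by a "hit," which is what keeps the total switching "energy" controlled.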

6. Open Problems

1. The DS theorem is an existence theorem. The value c and the structure of the decomposition depend naturally on the structure of, and assumptions about, the sequence (Pn). Most of the literature on nonhomogeneous MCs in general, and on simulated annealing in particular, is in fact a study of such decompositions, without paying any attention to the difference between traps and straps (see, as an example, an interesting paper on simulated annealing [5]). At the same time, this distinction plays a very important practical role. The statement that some algorithm or computational procedure converges with probability one leaves open the question of whether such convergence on average requires a finite or an infinite time. Thus a general open problem is to describe necessary and sufficient conditions for a given structure of decomposition. In fact, in many papers, starting from the pioneering paper of Kolmogorov [18], the conditions for complete mixing, i.e., for c = 1, are well established.

2. The idea of using stochastic, and especially doubly stochastic, matrices for the description of ordering in the space of finite-dimensional vectors is the key idea of the so-called theory of majorization. We refer the reader to the monograph of Marshall and Olkin [21] for the theory of majorization, and to Sonin [27], where the relation between the DS theorem and majorization theory is briefly described. There is an unpublished (in English) paper of the author about the economic interpretation of the DS theorem. The idea of using the theory of majorization in the description of irreversible physical processes was elaborated in papers following the pioneering work of Ruch ([24]); see also [1]. A possible analog of the DS theorem for general irreversible processes should replace formula (6) by a general transformation of a system.

3. The analog of the DS theorem for the countable case. The main result of [31] is the following:

Theorem 5.
There exist a sequence of finite sets (Mn) with |Mn| → ∞, a sequence of stochastic matrices (Pn) indexed by (Mn), a Markov chain (Zn), and a sequence of sets (Dn), Dn ⊆ Mn, n ∈ N, such that the submartingale (in reversed time) (Yn) specified by (14) has no barriers inside some interval (a, b).

Note that while the above statement shows that the DS theorem is not true in the form presented in Section 2, it is nevertheless possible that an analog exists in the countable case if the expected number of intersections is replaced by other characteristics of the transitions of trajectories. An analog of the DS theorem for the countable case would make it possible to treat continuous state space. Even the finite case can serve as a basis for the generalization of the DS theorem to continuous time.


Acknowledgments. The author would like to thank Robert Anderson and Joseph Quinn, who read the first version of this paper and made valuable comments, and a referee for helpful suggestions.

References

[1] Alberti, P. and Uhlmann, A. (1982). Stochasticity and Partial Order: Doubly Stochastic Maps and Unitary Mixing. Mathematics and Its Applications 9. D. Reidel Publishing Co., Dordrecht, Boston.
[2] Blackwell, D. (1945). Finite nonhomogeneous Markov chains. Ann. Math. 46 594–599.
[3] Cohn, H. (1970). On the tail σ-algebra of the finite Markov chains. Ann. Math. Statist. 41 2175–2176.
[4] Cohn, H. (1976). Finite nonhomogeneous Markov chains: Asymptotic behavior. Adv. Appl. Prob. 8 502–516.
[5] Cohn, H. and Fielding, M. (1999). Simulated annealing: Searching for an optimal temperature schedule. SIAM J. Optim. 9 (3) 779–802.
[6] Cohn, H. (1989). Products of stochastic matrices and applications. Int. J. Math. Sci. 12 209–333.
[7] Dietz, Z. and Sethuraman, S. (2005). Large deviations for a class of nonhomogeneous Markov chains. Ann. Appl. Probab. 15 (1A) 421–486.
[8] Dobrushin, R. (1956). Central limit theorem for non-stationary Markov chains. I. Theory Probab. Appl. 1 (1) 65–80.
[9] Ethier, S. and Kurtz, T. (1986). Markov Processes: Characterization and Convergence. Wiley, New York.
[10] Griffeath, D. (1975). Uniform coupling of non-homogeneous Markov chains. J. Appl. Probability 12 (4) 753–762.
[11] Hajnal, J. (1956). The ergodic properties of non-homogeneous finite Markov chains. Proc. Cambridge Philos. Soc. 52 67–77.
[12] Hartfiel, D. (2002). Nonhomogeneous Matrix Products. World Scientific Publishing Co., Inc., River Edge, NJ.
[13] Hill, T. (1979). On the existence of good Markov strategies. Trans. Am. Math. Soc. 247 157–176.
[14] Iosifescu, M. (1966). On the uniform ergodicity of a class of nonhomogeneous random systems with complete connections. Rev. Roumaine Math. Pures Appl. 11 763–772.
[15] Isaacson, D. and Madsen, R. (1976). Markov Chains: Theory and Applications. Wiley, New York.
[16] Kemeny, J. and Snell, J. (1976). Finite Markov Chains. Reprinting of the 1960 original. Springer-Verlag, New York, Heidelberg.
[17] Kingman, J. F. C. (1975). Geometrical aspects of the theory of nonhomogeneous Markov chains. Math. Proc. Cambridge Philos. Soc. 77 171–183.
[18] Kolmogoroff, A. N. (1936). Zur Theorie der Markoffschen Ketten. Math. Ann. 112 155–160. Reprinted in Selected Works of A. N. Kolmogorov 2 (A. N. Shiryaev, ed.), Probability Theory and Math. Statistics, Kluwer Acad. Publ.
[19] Liu, W. and Yang, W. (1996). An extension of the Shannon–McMillan theorem and some limit properties for nonhomogeneous Markov chains. Stoch. Process. Appl. 61 (1) 129–145.


[20] Maksimov, V. M. (1970). The convergence of nonhomogeneous doubly stochastic Markov chains. Theory Probab. Appl. 15 622–636.
[21] Marshall, A. and Olkin, I. (1979). Inequalities: Theory of Majorization and its Applications. Academic Press, New York.
[22] Mukherjea, A. (1983). Nonhomogeneous Markov chains: Tail idempotents, tail sigma-fields and basis. Math. Z. 183 (3) 293–309.
[23] Niemiro, W. and Pokarowski, P. (1995). Tail events of some nonhomogeneous Markov chains. Ann. Appl. Probab. 5 (1) 261–293.
[24] Ruch, E. and Mead, A. (1976). The principle of increasing mixing character and some of its consequences. Journ. Theor. Chem. A. 41 (2) 95–117.
[25] Seneta, E. (1973). On the historical development of the theory of finite inhomogeneous Markov chains. Proc. Cambridge Philos. Soc. 74 507–513.
[26] Sonin, I. (1987). Theorem on separation of jets and some properties of random sequences. Stochastics 21 231–250.
[27] Sonin, I. M. (1988). The separation of jets and some asymptotic properties of random sequences. Discrete Event Systems: Models and Applications. Lecture Notes in Contr. Inf. 103 275–282. Springer.
[28] Sonin, I. M. (1991a). On an extremal property of Markov chains and sufficiency of Markov strategies in Markov decision processes with the Dubins–Savage criterion. Ann. of Oper. Res. 29 417–426.
[29] Sonin, I. (1991b). An arbitrary nonhomogeneous Markov chain with bounded number of states may be decomposed into asymptotically noncommunicating components having the mixing property. Theory Probab. Appl. 36 74–85.
[30] Sonin, I. (1996). The asymptotic behaviour of a general finite nonhomogeneous Markov chain (the decomposition-separation theorem). In Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (T. S. Ferguson, L. S. Shapley and J. B. MacQueen, eds.), Lecture Notes–Monograph Series 30 337–346. Inst. of Math. Stat.
[31] Sonin, I. (1997). On some asymptotic properties of nonhomogeneous Markov chains and random sequences with countable number of values. In Statistics and Control of Stochastic Processes: The Liptser Festschrift, Proceedings of Steklov Math. Inst. Seminar (Y. Kabanov, B. Rozovskii, A. Shiryaev, eds.) 297–313. World Sci. Publ., River Edge, NJ.
[32] Sethuraman, S. and Varadhan, S. R. S. (2005). A martingale proof of Dobrushin's theorem for non-homogeneous Markov chains. Electron. J. Probab. 10 (36) 1221–1235.
[33] Shiryaev, A. N. (1996). Probability. Springer-Verlag, New York.
[34] Thorisson, H. (2000). Coupling, Stationarity, and Regeneration. Probability and its Applications. Springer-Verlag, New York.
[35] Vershik, A. and Kachurovski, A. (1999). Rates of convergence in ergodic theorems for locally finite groups, and reversed martingales. Differ. Equat. and Contr. Proc. (1) 19–26.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 17–30
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000273

Conditional Limit Laws and Inference for Generation Sizes of Branching Processes

P. E. Ney1 and A. N. Vidyashankar2,∗
University of Wisconsin–Madison and Cornell University

Abstract: Let {Zn : n ≥ 0} denote a single type supercritical branching process initiated by a single ancestor. This paper studies the asymptotic behavior of the history of generation sizes conditioned on different notions of information about the "current" population size. A "suppression property" under the large deviation conditioning, namely that Rn ≡ Zn+1/Zn > a, is observed. Furthermore, under a more refined conditioning, the asymptotic a posteriori distribution of the original offspring distribution is developed. Implications of our results for the conditional consistency property of age are discussed.

1. Introduction

The purpose of this note is to provide information on the history of the generation sizes given some "present" information concerning the branching process. We begin with a description of the process. Let {Zn : n ≥ 0} denote a single type branching process initiated by a single ancestor. Let {pj : j ≥ 0} denote the offspring distribution, that is, P(Z1 = j) = pj. For 0 ≤ s ≤ 1, let f(s) = E(s^{Z1} | Z0 = 1) denote the probability generating function. Let m = E(Z1) = f′(1−), where f′(·) denotes the derivative of f(·). We denote by q the probability of extinction; it is well known that q satisfies the fixed point equation f(s) = s. It is also well known that the process {Zn : n ≥ 0} can be defined recursively, using a collection {ξk,j, k ≥ 0, j ≥ 1} of independent and identically distributed (i.i.d.) non-negative integer valued random variables defined on a probability space (Ω, F, P), as follows: Z0 = 1 and, for n ≥ 0,

(1.1)    Z_{n+1} = Σ_{j=1}^{Zn} ξ_{n,j},
where ξn,j is interpreted as the number of children produced by the jth parent in the nth generation, and P(ξ0,1 = j) = pj. This implies that the generating function of the nth generation population size is given by the n-fold iteration of f(·); i.e., E(s^{Zn}) = fn(s) = f(f(· · · f(s) · · ·)), 0 ≤ s ≤ 1.

Let S denote the survival set of the process; i.e., S = {ω : Zn(ω) → ∞}. Then P(S) = 1 − q. We will assume in this paper that the process is supercritical, that is, m > 1, and, for the sake of exposition, that p0 = 0. This implies that P(S) = 1. Let Wn = Zn/m^n. Let Gn denote the sigma field generated by the first n generation sizes, namely {Z0, Z1, . . . , Zn}. Then it is well known that {(Wn, Gn) : n ≥ 1}

∗ Research supported in part by NSF grant DMS 000-03-07057 and also by grants from the NDCHealth Corporation.
1 Department of Mathematics, University of Wisconsin, Madison, WI 53706-1388.
2 Department of Statistical Science, Cornell University, Ithaca, NY 14853-4201.
AMS 2000 subject classifications: Primary 60J80; secondary 60F10.
Keywords and phrases: branching processes, functional equation, large deviations, uniform local limit theorem, conditional limit laws, age, conditional consistency, a posteriori distribution.


is a non-negative martingale sequence and hence converges with probability one to a random variable W. By the Kesten–Stigum theorem (see [3]), a necessary and sufficient condition for W to be non-trivial is that E(Z1 log Z1) < ∞. Furthermore, W has a density w(·), and w(x) > 0 for x > 0.

Let Rn = Zn+1/Zn. The quantity Rn is called the Nagaev estimator of the mean of the branching process and is its maximum likelihood estimator when (Zn, Zn+1) are observed. Large deviations of Rn (which will be relevant) have been studied in [1], [4], [12], [13], [9]. It is known from these papers that the large deviation behavior of Rn differs depending on whether p1 + p0 > 0 or p1 + p0 = 0. The case p1 + p0 > 0 is called the Schröder case, while p1 + p0 = 0 is called the Böttcher case.

Recent work in evolutionary biology is concerned with statistically estimating the age of the last common ancestor using the fossil record ([11] and [15]). Such data are modeled using either discrete or continuous time branching processes or variants thereof. In these problems, an important difference between the age and the divergence time (to be defined below) has been observed. Furthermore, in the context of branching processes, an interesting recent work [10] attempted to recreate the past based on the "present" observed generation size in order to determine the age of a population. One of the motivations for our study was to understand both these phenomena from the perspective of conditional limit distributions. It turns out that, when viewed from the viewpoint of conditional limits, the difference between the age and the divergence time occurs if the population size is "smaller than expected" (see Remark 2.5 in Section 2). Now, "smaller than expected" growth is caused by small values of Zk for various values of k. This phenomenon is peculiar to the Schröder case.
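The recursion (1.1) and the normalization Wn = Zn/m^n are easy to simulate. The following sketch is our own illustration, not from the paper; the offspring law p1 = p2 = 1/2 (so p0 = 0 and m = 1.5) is an arbitrary choice satisfying the standing assumptions.

```python
import random

# Simulate the branching recursion (1.1): Z_0 = 1 and Z_{n+1} is a sum of Z_n
# i.i.d. offspring counts.  The law p_1 = p_2 = 1/2 (m = 1.5, p_0 = 0) is an
# illustrative choice, not one used in the paper.
def generation_sizes(n, rng):
    z, history = 1, [1]
    for _ in range(n):
        z = sum(rng.choice((1, 2)) for _ in range(z))  # xi_{n,j} in {1, 2}
        history.append(z)
    return history

rng = random.Random(42)
hist = generation_sizes(20, rng)
m = 1.5
w = [z / m ** k for k, z in enumerate(hist)]  # W_n = Z_n / m^n
# W_n is the nonnegative martingale discussed above; it stabilizes as n grows.
print(hist[:6], round(w[-1], 3))
```

Because p0 = 0 here, every path survives, matching the assumption P(S) = 1.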
For this reason, we deal with the Schröder case in this paper and treat the Böttcher case in a different publication.

The Gibbs conditioning principle in the context of i.i.d. random variables {Xn : n ≥ 1} defined on R is concerned with the asymptotic behavior of

(1.2)    P( X1 ∈ · | Sn/n ∈ A ),    E X1 ∉ A,

or, more generally, of

(1.3)    P( (X1, X2, . . . , X_{kn}) ∈ · | Sn/n ∈ A ),

where A is a Borel subset of R, Sn = Σ_{i=1}^{n} Xi, and kn → ∞. In the context of branching processes, one approach is to replace Sn/n by Rn, or by the joint event {Rn ∈ (·), Zn ∈ (·)}. Now, unlike the i.i.d. case, two situations arise, namely the large n behavior of P(Z1 ∈ (·) | Rn > a > m) and that of P(ξn,1 ∈ (·) | Rn > a > m). We call the former a "global" conditional limit law and the latter a "local" conditional limit law. This paper is concerned with the global conditional limit laws.

The main technical tools needed in this paper are a uniform local limit theorem in the range Zn ∼ xm^n, where x belongs to a bounded interval, and rates of convergence of generating functions. To facilitate the discussion in the next sections, we introduce more notation concerning the rate of decay of generating functions. Let, for 0 ≤ s < 1,

(1.4)    Qn(s) = (fn(s) − q)/γ^n,


where γ = f′(q). It is known (see [3]) that lim_{n→∞} Qn(s) = Q(s) exists, with Q(1) = ∞. Furthermore, Q(·) admits a power series representation; that is,

(1.5)    Q(s) = Σ_{k≥1} νk s^k.

When p0 = 0, γ reduces to p1. It follows from (1.5) that (see [3])

(1.6)    lim_{n→∞} P(Zn = k)/γ^n = νk.

The quantities νk will show up in several places in later sections. The rest of the paper is organized as follows: Section 2 contains statements and discussion of the main results, Section 3 contains the proofs, and Section 4 deals with limit laws concerning the age of a branching process.

2. Statement and Discussion of Results

We begin with the uniform local limit theorem, which will be needed in the proof of Theorem 2 below. This is a uniform version of Theorem 4.1, Chapter II of [3]. Before we state the theorem, we need a definition. A sequence yn of real numbers is said to be regular if yn m^n is an integer for all n ≥ 1. In the following, let 0 < c < d < ∞ and Pk(·) = P(· | Z0 = k).

Theorem 2.1. Assume that E(Z1 log Z1) < ∞ and that yn → Δ0 is a regular sequence. Assume further that for every n ≥ 1 there exists an xn ∈ [c, d] such that xn yn is an integer. Let Cn = {x ∈ [c, d] : x yn is an integer}. Then the following hold:

1. (2.1)    lim_{n→∞} m^n Pk( Zn = m^n yn (1 + xn/m^n) ) = wk(Δ0);
2. (2.2)    lim_{n→∞} m^n sup_{x∈Cn} Pk( Zn = m^n yn (1 + x/m^n) ) = wk(Δ0);
3. (2.3)    lim_{n→∞} m^n inf_{x∈Cn} Pk( Zn = m^n yn (1 + x/m^n) ) = wk(Δ0).

Turning to conditional limits, we have

Proposition 2.1. Assume that E(exp(θZ1)) < ∞ for some θ > 0 and that p1 > 0. Then,

(2.4)    lim_{n→∞} P(Zn = k | Rn > a > m) = γ(k) ≥ 0,

where Σ_{k≥1} γ(k) = 1.

This suggests that the main contribution to P(Rn > a) comes from "small" values of Zn, which implies that the usual large deviation estimates and Cramér-type rate functions do not enter into the calculation of (2.4). We refer to this as the suppression property; it will manifest itself more subtly in later results. This leads at once to a "degeneracy" property of the early history, namely

Proposition 2.2. Assume that E(exp(θZ1)) < ∞ for some θ > 0 and that p1 > 0. Then,

(2.5)    lim_{n→∞} P(Z1 = k | Rn > a > m) = δ1(k).


Furthermore, for any kn → ∞ with kn = o(n),

(2.6)    lim_{n→∞} P(Z1 = 1, . . . , Z_{kn} = 1 | Rn > a) = 1.
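The suppression property of Propositions 2.1–2.2 can be checked numerically by rejection sampling. The sketch below is our own Monte Carlo illustration, not from the paper; the Schröder-case offspring law p1 = 0.6, p3 = 0.4 (so m = 1.8) is a hypothetical choice.

```python
import random

# Monte Carlo sketch (ours, not from the paper) of the suppression property:
# conditionally on R_n = Z_{n+1}/Z_n > a > m, early generation sizes collapse
# to 1.  Offspring law: p_1 = 0.6, p_3 = 0.4, so m = 1.8 (Schroeder case).
def step(z, rng):
    return sum(1 if rng.random() < 0.6 else 3 for _ in range(z))

def condition_on_rn(n, a, trials, rng):
    hits = ones = 0
    for _ in range(trials):
        path = [1]
        for _ in range(n + 1):
            path.append(step(path[-1], rng))
        if path[n + 1] / path[n] > a:        # the event {R_n > a}
            hits += 1
            ones += path[1] == 1             # did the path start with Z_1 = 1?
    return hits, ones

rng = random.Random(1)
hits, ones = condition_on_rn(n=8, a=2.2, trials=5000, rng=rng)
# Among the accepted paths, almost all satisfy Z_1 = 1: the cheapest way to
# achieve a late large-deviation ratio is to stay small early, as in (2.5).
print(hits, ones)
```

The acceptance rate itself is of order p1^n, matching the role of p1^{−n} P(Rn > a) in the proofs of Section 3.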

Remark 2.1. The small values of Zn are caused by p0 + p1 being positive. Thus, the suppression property is inherent in the Schröder case.

Our next proposition concerns the behavior of the distribution of Zk when conditioned on Zn ∈ vn[c, d], c > 0. In this note we consider the case where vn ∼ m^n.

Proposition 2.3. Assume that E(Z1 log+ Z1) < ∞. Let, for c > 0,

(2.7)    πl(c, d) = ( ∫_{cm^k}^{dm^k} wl(x) dx ) / ( ∫_c^d w(x) dx ).

Then,

(2.8)    lim_{n→∞} P(Zk = l | Zn ∈ m^n[c, d]) = πl(c, d) P(Zk = l),

where

(2.9)    Σ_{l≥1} πl(c, d) P(Zk = l) = 1.

Remark 2.2. Specializing to k = 1, we get from the above proposition that, for any c > 0,

(2.10)    lim_{n→∞} P(Z1 = l | Zn ∈ m^n[c, d]) = πl(c, d) pl,

and

(2.11)    lim_{n→∞} P(Z1 = l | Zn = cm^n) = (wl(mc)/w(c)) pl m.

Remark 2.3. Note that the conditional limit mentioned above can be viewed as a change of measure of P(Zk = l), which is reminiscent of the change of measure in the classical Gibbs conditioning principle.

The more subtle and interesting result comes from the combined conditioning, namely Rn > a, Zn ∈ m^n[c, d].

Theorem 2.2. Assume that E(exp(θZ1)) < ∞ for some θ > 0. Then, for any c > 0,

(2.12)    lim_{n→∞} P(Z1 = l | Rn > a, Zn ∈ m^n[c, d]) = (wl(mc)/w(c)) pl m.

Remark 2.4. Here the interaction between the events Rn > a and Zn ∈ m^n[c, d] requires estimates of Rn in the large deviation range, and uniform estimates of Zn as in Theorem 1. Note that the limits in (2.11) and (2.12) are the same even though the conditioning sets are different. This follows from the fact that adding Rn > a to the conditioning Zn ∈ m^n[c, d], together with the previously mentioned suppression property, forces the limit to be "as small as possible," i.e., πl(cm, dm) is replaced in (2.10) by

(2.13)    lim_{d→c} πl(cm, dm) = (wl(mc)/w(c)) pl m.


Remark 2.5. The case when one conditions on Zn ∈ vn[c, d] with vn = o(m^n) leads to a different behavior. It turns out that the conditioned limit forces Z1 = . . . = Z_{kn} = 1 up to kn < n − log_m vn, and then Zk starts to increase for k > kn. One refers to this time as the divergence time (it is not a random time). Thus, under the conditioning in Propositions 1 and 2, the divergence time is close to the "present," i.e., there is no growth until the last few generations. The age of the branching process is defined to be the number of generations of the process at the time of observation. In Proposition 3, when the conditioning is Zn ∈ vn[c, d] with vn ∼ m^n, the process starts to grow immediately, so the age and the divergence time agree, but the growth distribution changes according to the distribution in (2.8). In Theorem 2, when the conditioning includes Rn > a, the age and the divergence time again agree. But if vn = o(m^n), the divergence time is of order n − log_m vn. This result is treated elsewhere. Divergence time is important in several biological and population models, as mentioned in Section 1.

Remark 2.6. If instead of assuming Z0 = 1 we take P(Z0 = k) = π(k), where Σ_{k≥1} π(k) = 1, then under the conditioning carried out above, the initial distribution π(·) will undergo a change of measure along lines similar to Propositions 2 and 3 and Theorem 2.

3. Proofs

In this section we provide the proofs of the results in Section 2.

Proof of Proposition 1. Let X̄k = k^{−1} Σ_{j=1}^{k} Xj, where the Xj's are i.i.d. with the distribution of Z1. Then, using Theorem 1 of [4] and (1.5), it follows that

P(Zn = k | Rn > a) = P(Rn > a | Zn = k) P(Zn = k) / P(Rn > a)
= P(X̄k > a) [P(Zn = k) p1^{−n}] / [P(Rn > a) p1^{−n}]
→ P(X̄k > a) νk / La = ak,

where νk is as in (1.6), and

lim_{n→∞} p1^{−n} P(Rn > a) = La = Σ_{k≥1} νk P(X̄k > a).

Thus, Σ_{k≥1} ak = 1.

Remark 3.1. The exponential moment hypothesis in Proposition 1 (or Proposition 2 below) is not necessary. If E(Z1^r) < ∞ and p1 m^r > 1, then the above argument also goes through.

Proof of Proposition 2. Let k(n) = o(n). Then, using Theorem 1 of [4], it follows that

P(Z_{k(n)} = 1 | Rn > a) = P(Rn > a | Z_{k(n)} = 1) P(Z_{k(n)} = 1) / P(Rn > a)
= P(R_{n−k(n)} > a) p1^{k(n)} / P(Rn > a)
= [p1^{−(n−k(n))} P(R_{n−k(n)} > a)] / [p1^{−n} P(Rn > a)]
→ 1,


implying (2.5) and (2.6).

Proof of Proposition 3. Let Ak = m^k[c, d]. Then, by Theorem II.4.1 in [3],

P(Zk = l | Zn ∈ An) = P(Zn ∈ An | Zk = l) P(Zk = l) / P(Zn ∈ An)
= [Pl(Z_{n−k} ∈ m^k A_{n−k}) / P(Zn ∈ An)] P(Zk = l)
→ [ ∫_{cm^k}^{dm^k} wl(x) dx / ∫_c^d w(x) dx ] P(Zk = l)
= πl,k(c, d) P(Zk = l).

To complete the proof of Proposition 3, we need to show that Σ_{l≥1} πl(c, d) P(Zk = l) = 1. This follows from Lemma 1 below.

Remark 3.2. One could take c = 0 in (2.10) in Proposition 3. In this case, the proof follows directly from the convergence in distribution of Wn to W and does not use Theorem II.4.1 from [3].

Lemma 3.1. Σ_{l≥1} P(Zk = l) ∫_{am^k}^{bm^k} wl(x) dx = ∫_a^b w(x) dx.

Proof. Let φ(θ) = E(e^{iθW}). Then, by the inversion theorem ([7]),

(3.1)    wl(x) = (1/2π) ∫_R e^{−iθx} (φ(θ))^l dθ.

Now, integrating the LHS of (3.1) between am^k and bm^k, we get

∫_{am^k}^{bm^k} wl(x) dx = m^k ∫_a^b wl(ym^k) dy,

where the RHS follows from the substitution x = ym^k. Now,

m^k ∫_a^b wl(ym^k) dy = (m^k/2π) ∫_a^b ∫_R e^{−iθym^k} (φ(θ))^l dθ dy = (1/2π) ∫_a^b ∫_R e^{−iηy} (φ(η/m^k))^l dη dy,

where the last identity follows upon setting θm^k = η. Thus,

Σ_{l≥1} ∫_{am^k}^{bm^k} wl(x) dx P(Zk = l) = (1/2π) ∫_a^b ∫_R e^{−iηx} φ(η) dη dx = ∫_a^b w(x) dx,

where we used the identity (a consequence of the branching property)

Σ_{l≥1} (φ(η/m^k))^l P(Zk = l) = φ(η).

This completes the proof of the lemma.


Proof of Theorem 1. By Theorem 4.2 in Chapter II of [3], it is sufficient to establish (2) and (3). We will establish (2), as the proof of (3) is similar. Set jn(x) = m^n yn(1 + xm^{−n}) and recall that Cn = {x ∈ [c, d] : x yn is an integer}. Then it follows from the assumptions of the theorem that

(3.2)    lim_{n→∞} sup_{x∈Cn} |m^{−n} jn(x) − Δ0| = 0.

Since jn(x) is an integer for all n and some x ∈ [c, d], Pk(Zn = jn(x)) is not identically zero for x ∈ [c, d]. Now, by the inversion theorem ([7]),

(3.3)    Pk(Zn = jn(x)) = (1/2π) ∫_{−π}^{π} (fn(e^{iθ}))^k e^{−i jn(x) θ} dθ.

Now, integrating the RHS of (3.3) by parts and using fn^k(e^{iπ}) e^{−ilπ} = fn^k(e^{−iπ}) e^{ilπ} for all integers k and l, it follows that

(3.4)    Pk(Zn = jn(x)) = (1/2π) (k/jn(x)) I(n, k, x),

where

(3.5)    I(n, k, x) = ∫_{−π}^{π} (fn(e^{iθ}))^{k−1} f′n(e^{iθ}) e^{−i(jn(x)−1)θ} dθ.

Next, making the change of variable θ = tm^{−n} and setting ψn(t) = E(e^{itWn}), (3.5) reduces to

(3.6)    I(n, k, x) = ∫_{−πm^n}^{πm^n} (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) e^{−itm^{−n}(jn(x)−1)} dt.

Thus,

(3.7)    m^n Pk(Zn = jn(x)) − wk(Δ0) = Tn(1, x) + Tn(2, x) + Tn(3),

where

(3.8)    Tn(1, x) = (k/2π) ((m^{−n} jn(x))^{−1} − Δ0^{−1}) I(n, k, x),

(3.9)    Tn(2, x) = (k/2πΔ0) (I(n, k, x) − I(n, k, 0)),

and

(3.10)    Tn(3) = (1/2πΔ0) (k I(n, k, 0) − 2πΔ0 wk(Δ0)).

We will now show that sup_{x∈Cn} |Tn(2, x)| → 0 as n → ∞. To this end, note that

(3.11)    I(n, k, x) − I(n, k, 0)
= ∫_{−πm^n}^{0} (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) B(n, x, t) dt + ∫_{0}^{πm^n} (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) B(n, x, t) dt
= J(n, 1)(x) + J(n, 2)(x),


where

(3.12)    B(n, x, t) = e^{−itm^{−n}(jn(x)−1)} − e^{−itm^{−n}(jn(0)−1)}.

Notice that |B(n, x, t)| ≤ 2 and that

(3.13)    lim_{n→∞} sup_{x∈Cn} |B(n, x, t)| = 0.

We now establish that J(n, 2)(x) converges uniformly to 0. Similar arguments yield that J(n, 1)(x) converges uniformly to 0, thus establishing that sup_{x∈Cn} |Tn(2, x)| converges to 0. Returning to J(n, 2)(x), we express it as

(3.14)    J(n, 2)(x) = ∫_0^π (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) B(n, x, t) dt + Σ_{r=1}^{n} ∫_{πm^{r−1}}^{πm^r} (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) B(n, x, t) dt = Σ_{r=0}^{n} J(n, 2, r)(x),

where

(3.15)    J(n, 2, 0)(x) = ∫_0^π (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) B(n, x, t) dt,

and, for 1 ≤ r ≤ n,

(3.16)    J(n, 2, r)(x) = ∫_{πm^{r−1}}^{πm^r} (ψn(t))^{k−1} (m^{−n} f′n(e^{itm^{−n}})) B(n, x, t) dt.

Next, we observe that |m^{−n} f′n(e^{itm^{−n}})| ≤ 1. Hence, using the bounded convergence theorem, it follows that

(3.17)    lim_{n→∞} sup_{x∈Cn} |J(n, 2, 0)(x)| = 0.

Now, let 1 ≤ r ≤ n. Then for t ∈ (πm^{r−1}, πm^r),

(3.18)    |m^{−n} f′n(e^{itm^{−n}})| = |m^{−r} f′r(f_{n−r}(e^{itm^{−n}}))| |m^{−(n−r)} f′_{n−r}(e^{itm^{−n}})|
(3.19)    ≤ |m^{−r} f′r(f_{n−r}(e^{itm^{−n}}))|.

Now, since t ∈ (πm^{r−1}, πm^r), it follows that tm^{−n} ∈ (πm^{−(n−r+1)}, πm^{−(n−r)}), which implies that f_{n−r}(e^{itm^{−n}}) ∈ S, where

(3.20)    S = { fj(e^{ium^{−j}}) : π/m ≤ u ≤ π, j ≥ 0 }.

Define

(3.21)    μr = sup_{s∈S} f′r(s).

Then, for t ∈ (πm^{r−1}, πm^r), it follows from (3.18) that

(3.22)    |m^{−n} f′n(e^{itm^{−n}})| ≤ |m^{−r} f′r(f_{n−r}(e^{itm^{−n}}))| ≤ m^{−r} μr,


where Σ_{r≥1} μr < ∞ by Dubuc's lemma (see Lemma 1, page 80 of [3]). Also observe that

(3.23)    |J(n, 2, r)(x)| ≤ A(n, r),

where

(3.24)    A(n, r) = ∫_{πm^{r−1}}^{πm^r} |m^{−n} f′n(e^{itm^{−n}})| sup_{x∈Cn} |B(n, x, t)| dt ≤ 2μr,

where the last inequality follows from (3.18)–(3.22). Since Σ_{r≥1} μr < ∞, it follows by the dominated convergence theorem that

(3.25)    lim_{n→∞} Σ_{r≥1} |J(n, 2, r)(x)| I_{[0,n]}(r) = Σ_{r≥1} lim_{n→∞} |J(n, 2, r)(x)|.

Now, to evaluate lim_{n→∞} |J(n, 2, r)(x)| we again apply the dominated convergence theorem. To this end, we first use (3.22) and then use (3.23) to take the limit inside the integral, getting

(3.26)    0 ≤ lim_{n→∞} sup_{x∈Cn} |J(n, 2, r)(x)|
(3.27)    ≤ lim_{n→∞} ∫_{πm^{r−1}}^{πm^r} |m^{−n} f′n(e^{itm^{−n}})| sup_{x∈Cn} |B(n, x, t)| dt
(3.28)    = ∫_{πm^{r−1}}^{πm^r} lim_{n→∞} |m^{−n} f′n(e^{itm^{−n}})| sup_{x∈Cn} |B(n, x, t)| dt = 0.

This proves the uniform convergence of |J(n, 2)(x)| to 0 as n → ∞. Similar arguments yield uniform convergence of |J(n, 1)(x)| to 0 as n → ∞. Combining these two, we get sup_{x∈Cn} |Tn(2, x)| → 0 as n → ∞.

To complete the proof of the theorem, we need to establish the uniform convergence of |Tn(1, x)| to 0 and the convergence of |Tn(3)| to 0 as n → ∞. It also follows from the calculations (3.5)–(3.24) that

(3.29)    sup_{x∈Cn} |I(n, k, x)| ≤ C < ∞,

where C is a positive constant. Thus, it follows from (3.2) that sup_{x∈Cn} |Tn(1, x)| → 0 as n → ∞. Finally, convergence of |Tn(3)| to zero follows from Theorem 2 on page 81 of [3]. This completes the proof of (2). In fact, we have proved that sup_{x∈Cn} |m^n Pk(Zn = jn(x)) − wk(Δ0)| → 0. This then also implies, with some further analysis, that inf_{x∈Cn} [m^n Pk(Zn = jn(x)) − wk(Δ0)] → 0, which is (3).

Proof of Theorem 2. Let X̄n = (1/n) Σ_{k=1}^{n} Z_{1,k}, where {Z_{1,k}, k ≥ 1} are i.i.d. with the same distribution as Z1. Let Λ(θ) = log E(exp(θZ1)) denote the cumulant generating function and Λ*(a) = sup_θ [θa − Λ(θ)] denote the Legendre–Fenchel transform of Λ(θ). By the Bahadur–Rao theorem (see [5]),

(3.30)    lim_{l→∞} √l e^{lΛ*(a)} P(X̄l > a) = ca.

Let us set

(3.31)    B(l, a) = √l e^{lΛ*(a)} P(X̄l > a) − ca.


Since a is fixed, we will suppress the dependence on a and write B(l) for B(l, a). Now, by the definition of conditional probability,

(3.32)    P(Z1 = k | Rn > a, Zn ∈ m^n[c, d]) = pk [ Σ_{l=ln,1}^{ln,2} P(X̄l > a) Pk(Z_{n−1} = l) ] / [ Σ_{l=ln,1}^{ln,2} P(X̄l > a) P(Zn = l) ] ≡ In/Jn,

where ln,1 = ⌊cm^n⌋ + 1 and ln,2 = ⌊dm^n⌋. Let us set η(n, k, l) = Pk(Zn = l), h(l) = (1/√l) exp(−lΛ*(a)), and dn = ln,2 − ln,1. Hence, we can express In = In,1 + In,2, where

(3.33)    In,1 = pk h(ln,1) Σ_{t=0}^{dn} B(t + ln,1) θ(n, t) η(n − 1, k, t + ln,1)

and

(3.34)    In,2 = pk ca h(ln,1) Σ_{t=0}^{dn} θ(n, t) η(n − 1, k, t + ln,1),

where θ(n, t) = (1 + t/ln,1)^{−1/2} e^{−tΛ*(a)}, l = t + ln,1, and B is as in (3.31). Similarly, one can express Jn as the sum of Jn,1 and Jn,2, where

(3.35)    Jn,1 = h(ln,1) Σ_{t=0}^{dn} B(t + ln,1) θ(n, t) η(n, 1, t + ln,1)

and

(3.36)    Jn,2 = ca h(ln,1) Σ_{t=0}^{dn} θ(n, t) η(n, 1, t + ln,1).

Thus the conditional probability in question becomes

P(Z1 = k | Rn > a, Zn ∈ m^n[c, d]) = (In,1/Jn,1)(1 + Jn,2/Jn,1)^{−1} + (In,2/Jn,2)(1 + Jn,1/Jn,2)^{−1}.

We will now establish the following:

1. lim_{n→∞} Jn,1/Jn,2 = 0,
2. lim sup_{n→∞} In,1/Jn,1 ≤ C < ∞,
3. lim_{n→∞} In,2/Jn,2 = m pk wk(mc)/w(c).

These facts will imply the theorem. We start with the proof of (3). Consider

Ĩn,2 = m^{n−1} (pk ca h(ln,1))^{−1} In,2 = m^{n−1} Σ_{t=0}^{dn} θ(n, t) η(n − 1, k, t + ln,1).

By the local limit theorem (see Chapter 2, Section 4.1 in [3]), if vn → Δ then

lim_{n→∞} m^n Pk(Zn = m^n vn) = wk(Δ).


Using this we get that

lim_{n→∞} m^{n−1} η(n − 1, k, t + ln,1) = lim_{n→∞} m^{n−1} Pk( Z_{n−1} = m^{n−1} (ln,1/m^{n−1})(1 + t/ln,1) ) = wk(cm),

since ln,1/m^{n−1} → cm. Now applying the uniform bound in Theorem 1 to m^{n−1} η(n − 1, k, t + ln,1), and noting that θ(n, t) ≤ exp(−tΛ*(a)), it follows from the dominated convergence theorem that

lim_{n→∞} Ĩn,2 = wk(cm) Γ,

where Γ = exp(Λ*(a))/(exp(Λ*(a)) − 1). In a similar manner, one can show that

lim_{n→∞} J̃n,2 ≡ lim_{n→∞} m^n (ca h(ln,1))^{−1} Jn,2 = lim_{n→∞} m^n Σ_{t=0}^{dn} θ(n, t) η(n, 1, t + ln,1) = w(c) Γ.

Finally,

lim_{n→∞} In,2/Jn,2 = lim_{n→∞} [ m^{−(n−1)} pk ca h(ln,1) Ĩn,2 ] / [ m^{−n} ca h(ln,1) J̃n,2 ] = m pk wk(cm) Γ / (w(c) Γ),

yielding (3). Turning to the proof of (1), note that

Jn,1/Jn,2 = [ Σ_{t=0}^{dn} B(t + ln,1) θ(n, t) η(n, 1, t + ln,1) ] / [ ca Σ_{t=0}^{dn} θ(n, t) η(n, 1, t + ln,1) ].

Now, using sup_{1≤t≤dn} |B(t + ln,1)| → 0 as n → ∞, we have that

lim_{n→∞} Jn,1/Jn,2 = 0.

Finally, turning to (2), by Theorem 1,

In,1/Jn,1 = m pk [ Σ_{t=0}^{dn} B(t + ln,1) θ(n, t) m^{n−1} η(n − 1, k, t + ln,1) ] / [ Σ_{t=0}^{dn} B(t + ln,1) θ(n, t) m^n η(n, 1, t + ln,1) ]
≤ m pk [ max_{0≤t≤dn} m^{n−1} η(n − 1, k, t + ln,1) ] / [ min_{0≤t≤dn} m^n η(n, 1, t + ln,1) ] ≤ C < ∞,

where C is a constant. This completes the proof of Theorem 2.

28

P. E. Ney and A. N. Vidyashankar

4. Age of a Branching Process

As explained in the introduction, statistical estimation of the age of a simple branching process is an important problem arising in several scientific contexts. It was first studied by Stigler ([14]), who estimated the age using maximum likelihood methods, i.e., by maximizing $P(Z_t = N(t) \mid Z_t > 0)$ with respect to $t$. In this context, the population age $t$ is treated as an unknown parameter and is estimated using the current population size $N(t)$. Stigler derived the estimator $T_1(N)$ in (4.1) for offspring distributions with fractional linear generating functions, and suggested this as an estimator of the age for general offspring distributions. Stigler's estimate is given by

(4.1) $$T_1(N) = \frac{\log N(t)}{\log m}.$$

Stigler established that $T_1(N(t))$ is $\beta$-consistent for $t$ in the sense that $(T_1(N(t)) - t)/t^\beta \to 0$ a.s. for every $\beta > 0$ as $t \to \infty$. More recently, [10] studied the age by constructing a backward process $X_j$ and defined the estimate of the age as

(4.2) $$T_2(N) = \inf\{r : X_r = 1\}.$$

In this formulation, if the offspring distribution is geometric, then the reverse process is a Galton–Watson process with immigration starting with $N$ ancestors. Using the duality between the forward and the backward process, [10] obtained detailed results concerning $T_2(N) - T_1(N)$ as $N \to \infty$. The bias in the estimate of the age is given by

(4.3) $$B(t) = t - T_1(N(t)).$$

Our next result, a corollary to Proposition 1, shows that the bias $B(t)$, conditioned on $R_t > a > m$, diverges to infinity.

Corollary 4.1. Let $k(t)$ be a sequence of constants converging to infinity such that $k(t) = o(t)$ as $t \to \infty$. Then

(4.4) $$\lim_{t\to\infty} P(B(t) \ge k(t) \mid R_t > a > m) = 1.$$

Proof. Let $\varepsilon > 0$; then by Proposition 1, there exists $k_0(\varepsilon)$ such that $\sum_{j\le k_0} \gamma(j) > 1 - \varepsilon$. Now we observe, by simplifying, that $[B(t) \ge k(t)] = [W_t \le m^{-k(t)}]$. Since $k(t) = o(t)$, it follows that $m^t m^{-k(t)}$ diverges to $\infty$. Thus,

(4.5)–(4.8) $$P(B(t) \ge k(t) \mid R_t > a) = P(W_t \le m^{-k(t)} \mid R_t > a) = P(Z_t \le m^{t-k(t)} \mid R_t > a) \ge P(Z_t \le k_0 \mid R_t > a) \to \sum_{j=1}^{k_0} \gamma(j) > 1 - \varepsilon,$$

by the choice of $k_0$. Thus $\liminf_{t\to\infty} P(B(t) \ge k(t) \mid R_t > a) \ge 1 - \varepsilon$. Since $\varepsilon$ is arbitrary, the corollary follows.
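On the event of survival, the unconditional bias $B(t) \to -\log W/\log m$ a.s. and hence stays stochastically bounded; Corollary 4.1 says that conditioning on the large-deviation event destroys this. The following simulation sketch illustrates the unconditional behavior of Stigler's estimator (the geometric offspring law, its parameters, and the seed are our illustrative choices, not from the paper):

```python
import math
import random

def gw_population(t, z0, offspring):
    """Total population Z_t of a Galton-Watson process after t generations."""
    z = z0
    for _ in range(t):
        if z == 0:
            break
        z = sum(offspring() for _ in range(z))
    return z

# Illustrative geometric offspring law on {0, 1, 2, ...} with success
# probability 1/3, so the mean is m = 2 and the process is supercritical.
random.seed(2)
m = 2.0

def offspring():
    u = 1.0 - random.random()                 # uniform in (0, 1]
    return int(math.log(u) / math.log(2.0 / 3.0))

t = 16
for _ in range(100):                          # retry until a surviving line is found
    z = gw_population(t, 1, offspring)
    if z > 0:
        break

T1 = math.log(z) / math.log(m)                # Stigler's estimator (4.1)
bias = t - T1                                 # the bias B(t) of (4.3)
print(z, round(T1, 2), round(bias, 2))
```

With a surviving line, `T1` tracks the true age `t` up to the bounded fluctuation `-log W / log m`, which is the content of Stigler's consistency result restated above.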


5. Concluding Remarks In this paper, we studied the evolutionary structure of a branching process through the behavior of conditional limits under various notions of “information” about the current population size. We observed a “suppression property” which is a consequence of the assumption p0 + p1 > 0. This implies that conditionally on the large deviation type information, the bias in the estimate of the age diverges to infinity; or in other words, the estimator is conditionally inconsistent. A natural next question concerns the conditional consistency of the estimator of age under other notions of “information.” These and other related issues are studied in a subsequent paper.

Acknowledgements

The authors would like to thank the anonymous referee for a careful reading of the manuscript and several useful suggestions.

References

[1] Athreya, K. B. (1994). Large deviation rates for branching processes—I, the single type case. Annals of Applied Probability 4 779–790.
[2] Athreya, K. B. and Ney, P. E. (1970). The local limit theorem and some related aspects of supercritical branching processes. Transactions of the American Mathematical Society 2 233–251.
[3] Athreya, K. B. and Ney, P. E. (1972). Branching Processes. Springer-Verlag, Berlin.
[4] Athreya, K. B. and Vidyashankar, A. N. (1993). Large deviation results for branching processes. In Stochastic Processes. A Festschrift in Honour of Gopinath Kallianpur (S. Cambanis, J. K. Ghosh, R. L. Karandikar and P. K. Sen, eds.) 7–12. Springer, New York.
[5] Bahadur, R. R. and Ranga Rao, R. (1960). On deviations of the sample mean. Annals of Mathematical Statistics 31 1015–1027.
[6] Bingham, N. H. (1988). On the limit of a supercritical branching process. Journal of Applied Probability 25A 215–228.
[7] Chow, Y. S. and Teicher, H. (1988). Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, Berlin.
[8] Dubuc, S. and Seneta, E. (1976). The local limit theorem for the Galton–Watson process. Annals of Probability 3 490–496.
[9] Fleischmann, K. and Wachtel, V. (2005). Lower deviation probabilities for supercritical Galton–Watson processes. Preprint.
[10] Klebaner, F. C. and Sagitov, S. (2002). The age of a Galton–Watson population with a geometric offspring distribution. Journal of Applied Probability 4 816–828.
[11] Leman, C. S., Chen, Y., Stajich, J. E., Noor, M. A. F. and Uyenoyama, M. K. (2005). Likelihoods from summary statistics: Recent divergence between species. Genetics 171 1419–1436.
[12] Ney, P. E. and Vidyashankar, A. N. (2003). Harmonic moments and large deviation rates for supercritical branching processes. Annals of Applied Probability 13 475–489.


[13] Ney, P. E. and Vidyashankar, A. N. (2004). Local limit theory and large deviation rates for supercritical branching processes. Annals of Applied Probability 14 1135–1166.
[14] Stigler, S. M. (1970). Estimating the age of a Galton–Watson branching process. Biometrika 57 507–512.
[15] Tavaré, S., Marshall, C. R., Will, O., Soligo, C. and Martin, R. (2002). Using the fossil record to estimate the age of the last common ancestor. Nature 416 726–729.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 31–40
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000282

Absorption Time Distribution for an Asymmetric Random Walk S. N. Ethier1 University of Utah Abstract: Consider the random walk on the set of nonnegative integers that takes two steps to the left (just one step from state 1) with probability p ∈ [1/3, 1) and one step to the right with probability 1 − p. State 0 is absorbing and the initial state is a fixed positive integer j0 . Here we find the distribution of the absorption time. The absorption time is the duration of (or the number of coups in) the well-known Labouchere betting system. As a consequence of this, we obtain in the fair case (p = 1/2) the asymptotic behavior of the Labouchere bettor’s conditional expected deficit after n coups, given that the system has not yet been completed.

1. Introduction

Fix a positive integer $j_0$, and let $\{X_n\}_{n\ge 0}$ be the random walk in $\mathbf{Z}_+$ with initial state $X_0 = j_0$, one-step transition probabilities

(1) $$P(j,k) := \begin{cases} p & \text{if } k = (j-2)^+,\\ q & \text{if } k = j+1,\\ 0 & \text{otherwise}, \end{cases} \qquad j \ge 1,$$

where $1/3 \le p < 1$ and $q := 1-p$, and absorption at state 0. We are interested in the distribution of the absorption time

(2) $$N := \min\{n \ge 1 : X_n = 0\}.$$

Since $-2p + q \le 0$, $N$ is finite with probability 1. Probabilities and expectations involving $N$ will be subscripted to indicate their dependence on $j_0$.

The random walk $\{X_n\}_{n\ge0}$ arises in connection with the Labouchere system (also known as the cancellation system), one of the two or three best-known betting systems. It was popularized by British journalist and Member of Parliament Henry Du Pré Labouchere (1831–1912), who attributed it to French mathematician and philosopher Marie Jean Antoine Nicolas de Caritat, Marquis de Condorcet (1743–1794) (Thorold 1913, p. 66). The system is applied to games of repeated coups that pay even money. The gambler's bet size at each coup is determined by an ordered list of positive integers kept on his score sheet and updated after each coup. Given such a list, the gambler's bet size at the next coup is the sum of the extreme terms on the list. (This is the same as the sum of the first and last terms on the list, except when there is only one term.) Following the resolution of this bet, the list is updated as follows: After a win, the extreme terms are cancelled. After a loss, the amount just lost is appended to the list as a new last term. The system is begun with an initial list, the most popular choice for which is 1, 2, 3, 4; however, Labouchere himself used 3, 4, 5, 6, 7 (Thorold 1913, p. 66). The initial list, together with the sequence of wins and losses, determines all bet sizes.

Notice that the sum of the terms on the list plus the gambler's cumulative profit remains constant in time (see Section 4 for details). Therefore, once the list becomes empty, betting is stopped and the gambler's cumulative profit is the sum of the terms on the initial list. Notice also that $X_n$ represents the length of the list after $n$ independent coups, each with win probability $p$, so $N$ is the duration of the system, that is, the number of coups required to complete it. Of course, we are making the unrealistic assumptions that the gambler has unlimited resources and that there is no maximum betting limit.

Downton (1980) found a recursive formula for the distribution of $N$ in the case $j_0 = 4$. It is easy to generalize his result to arbitrary $j_0$.

Theorem 1 (Downton). With $l_n := \lceil (2n+1-j_0)/3\rceil^+$ for all $n \ge 0$, define a modified Pascal triangle (depending on $j_0$) recursively by $c(0,l) := \delta_{0,l}$, where $\delta_{0,l}$ is the Kronecker delta, and¹

(3) $$c(n,l) := \begin{cases} c(n-1,l-1)+c(n-1,l) & \text{if } l_n \le l \le n,\\ 0 & \text{otherwise}, \end{cases}$$

for all $n \ge 1$. Then

(4) $$P_{j_0}(N=n) = c(n-1, l_{n-1})\, p^{\,n-l_{n-1}}\, q^{\,l_{n-1}}$$

if $n \ge \lfloor (j_0+1)/2\rfloor$ and $(n+j_0-1)/3 \notin \mathbf{Z}$, and $P_{j_0}(N=n)=0$ otherwise.

Proof. Let us redefine $c(n,l)$ to be the number of the $\binom{n}{l}$ permutations of $l$ losses and $n-l$ wins for which the Labouchere bettor has not yet completed the system. For $c(n,l)$ to be positive, we must have $0\le l\le n$ and $j_0+l-2(n-l)\ge1$, hence $l_n\le l\le n$. To establish (3) it suffices to consider whether the $n$th coup results in a loss or a win. Then (4) follows by noting that, if $N=n$, then the first $n-1$ coups must have $l_{n-1}$ losses (the minimal number) and $n-1-l_{n-1}$ wins, and the $n$th coup must result in a win. Finally, one can check that

(5) $$j_0 + l_{n-1} - 2(n-1-l_{n-1}) = \begin{cases} 3 & \text{if } n+j_0-1 \equiv 0 \pmod 3,\\ 1 & \text{if } n+j_0-1 \equiv 1 \pmod 3,\\ 2 & \text{if } n+j_0-1 \equiv 2 \pmod 3, \end{cases}$$

so the system can be completed at the $n$th coup if and only if $(n+j_0-1)/3 \notin \mathbf{Z}$.

Downton's theorem is useful for numerical computation. It also gives the upper bound

(6) $$P_{j_0}(N=n) \le \binom{n-1}{l_{n-1}}\, p^{\,n-l_{n-1}}\, q^{\,l_{n-1}}$$

under the conditions of (4). For example, taking $j_0=1$ for convenience, Downton's bound gives

(7) $$P_1(N=3m+1) \le \binom{3m}{m}\, p^{m+1} q^{2m}$$

¹There is a minor error in Downton's formulation: He replaced $n$ by $n+1$ in the equation of our (3) without doing so in the inequalities of our (3). His tables are nevertheless correct.

¹(author's address) Department of Mathematics, University of Utah, 155 S. 1400 E., Salt Lake City, UT 84112, e-mail: [email protected]
AMS 2000 subject classifications: Primary 60G50; secondary 60C05, 60G40
Keywords and phrases: random walk, absorption time, gambling system, martingale

and

(8) $$P_1(N=3m+2) \le \binom{3m+1}{m}\, p^{m+1} q^{2m+1}$$

for all $m \ge 0$. As we will see in Section 2, the actual state of affairs is

(9) $$P_1(N=3m+1) = \frac{1}{2m+1}\binom{3m}{m}\, p^{m+1} q^{2m}$$

and

(10) $$P_1(N=3m+2) = \frac{1}{m+1}\binom{3m+1}{m}\, p^{m+1} q^{2m+1}$$

for all $m \ge 0$. Equations (9) and (10) allow us to determine (in Section 3) the asymptotic behavior of $P_1(N \ge n+1)$ as $n \to \infty$, and this leads (in Section 4) to the asymptotic behavior of the Labouchere bettor's conditional expected deficit after $n$ coups, given that the system has not yet been completed, at least if the game is fair ($p=1/2$). Of course, we do not restrict our attention to the case $j_0=1$.

Downton (1980) observed that "no probability analysis specific to the [Labouchere] system appears to have been made." He also remarked that "the probability structure of the size of the bets in the system remains an unsolved problem." More than 25 years later, the problem is still open. The present note is a start, however, which we hope will encourage further investigation.

2. The absorption time distribution

A direct derivation of the distribution of $N$ is not entirely straightforward. Let $Y_1, Y_2, \ldots$ be i.i.d. with $P(Y_1=-2)=p$ and $P(Y_1=1)=q$. Define $N_1 := \min\{n\ge1 : 1+Y_1+\cdots+Y_n \in \{-1,0\}\}$, and note that $N_1$ is distributed as the $P_1$-distribution of $N$. With $g(u) := E[u^{Y_1}] = pu^{-2}+qu$ ($u\ne0$), apply the optional stopping theorem to the martingale $M_n := u^{1+Y_1+\cdots+Y_n} g(u)^{-n}$ ($n\ge0$) stopped at time $N_1$. Using Cardano's formula to solve the cubic equation $v = 1/g(u)$, or $qvu^3 - u^2 + pv = 0$, one can show that the $P_1$-probability generating function of $N$ is

(11) $$E_1[v^N] = \frac{2\,[1-\cos\phi(v)]}{3qv} - \frac{[1-\cos\phi(v)]^2}{(3qv)^2} + \frac{3\sin^2\phi(v)}{(3qv)^2}, \qquad \phi(v) := \tfrac{1}{3}\cos^{-1}\!\big(1-\tfrac{27}{2}pq^2v^3\big),$$

for $0 < v < 1$. With some difficulty this can then be written as a power series in $v$ (the singularity at $v=0$ is removable). Fortunately, the hard work has already been done in the combinatorics literature. The results we need follow easily from a generalization of the ballot theorem stated by Barbier (1887) and proved by Aeppli (1924) in his Ph.D. thesis under G. Pólya. The ballot theorem itself is due to Bertrand (1887). See Takács (1997) for a survey of this topic. Let $a_m$ be the number of paths, with steps $(1,0)$ and $(0,1)$ (i.e., east and north), from $(0,0)$ to $(2m,m)$ that never rise above (but may touch) the line

$y = x/2$. Let $b_m$ be the number of paths, with steps $(1,0)$ and $(0,1)$, from $(0,0)$ to $(2m,m)$ that never rise above (but may touch) the line $y=(x+1)/2$. Then

(12) $$a_m = \frac{1}{2m+1}\binom{3m}{m} = \frac{1}{3m+1}\binom{3m+1}{m}$$

and

(13) $$b_m = \frac{1}{m+1}\binom{3m+1}{m}.$$

To find the $P_1$-probability of the event $\{N=3m+1\}$, notice that, for this event to occur, the first $3m$ coups must result in exactly $2m$ losses and $m$ wins with the cumulative number of wins never being more than half of the cumulative number of losses, for otherwise absorption would have occurred earlier. Finally, coup $3m+1$ must result in a win. We conclude from (12) that (9) holds for all $m\ge0$.

To find the $P_1$-probability of the event $\{N=3m+2\}$, notice that, for this event to occur, the first coup must result in a loss, and the next $3m$ coups must result in exactly $2m$ losses and $m$ wins with the cumulative number of wins never being more than half of the cumulative number of losses (including the first one), for otherwise absorption would have occurred earlier. Finally, coup $3m+2$ must result in a win. We conclude from (13) that (10) holds for all $m\ge0$. Notice also that $P_1(N=3m+3)=0$ for all $m\ge0$ because of the periodicity of the random walk.

The distribution of $N$ for arbitrary $j_0$ can be found recursively using the Markov identity

(14) $$P_{j_0}(N=n+1) = p\,P_{j_0-2}(N=n) + q\,P_{j_0+1}(N=n),$$

or equivalently

(15) $$P_{j_0+1}(N=n) = q^{-1} P_{j_0}(N=n+1) - pq^{-1} P_{j_0-2}(N=n),$$

for all $n\ge1$, where $P_0(N=n) := 0$ and $P_{-1}(N=n) := 0$. The results for $j_0 = 1, 2, \ldots, 9$ are

(16) $$\begin{aligned}
P_1(N=3m+1) &= a_m\, p^{m+1} q^{2m}\\
P_2(N=3m+3) &= a_{m+1}\, p^{m+2} q^{2m+1}\\
P_3(N=3m+2) &= a_{m+1}\, p^{m+2} q^{2m}\\
P_4(N=3m+4) &= (a_{m+2}-a_{m+1})\, p^{m+3} q^{2m+1}\\
P_5(N=3m+3) &= (a_{m+2}-2a_{m+1})\, p^{m+3} q^{2m}\\
P_6(N=3m+5) &= (a_{m+3}-3a_{m+2})\, p^{m+4} q^{2m+1}\\
P_7(N=3m+4) &= (a_{m+3}-4a_{m+2}+a_{m+1})\, p^{m+4} q^{2m}\\
P_8(N=3m+6) &= (a_{m+4}-5a_{m+3}+3a_{m+2})\, p^{m+5} q^{2m+1}\\
P_9(N=3m+5) &= (a_{m+4}-6a_{m+3}+6a_{m+2})\, p^{m+5} q^{2m}
\end{aligned}$$

(17)

P1 (N = 3m + 2) = bm pm+1 q 2m+1 P2 (N = 3m + 1) = bm pm+1 q 2m P3 (N = 3m + 3) = bm+1 pm+2 q 2m+1 P4 (N = 3m + 2) = (bm+1 − bm )pm+2 q 2m P5 (N = 3m + 4) = (bm+2 − 2bm+1 )pm+3 q 2m+1

Absorption Time Distribution for an Asymmetric Random Walk

P6 (N P7 (N P8 (N P9 (N

35

= 3m + 3) = (bm+2 − 3bm+1 )pm+3 q 2m = 3m + 5) = (bm+3 − 4bm+2 + bm+1 )pm+4 q 2m+1 = 3m + 4) = (bm+3 − 5bm+2 + 3bm+1 )pm+4 q 2m = 3m + 6) = (bm+4 − 6bm+3 + 6bm+2 )pm+5 q 2m+1

for all m ≥ 0, and P1 (N P2 (N P3 (N P4 (N P5 (N P6 (N P7 (N P8 (N P9 (N

(18)

= 3m + 3) = 0 = 3m + 2) = 0 = 3m + 4) = 0 = 3m + 3) = 0 = 3m + 5) = 0 = 3m + 4) = 0 = 3m + 6) = 0 = 3m + 5) = 0 = 3m + 7) = 0

for all m ≥ 0. From these special cases we can easily conjecture and prove the general result. Theorem 2. For each m ≥ 0, (19)

(20)

Pj0 (N = 3m + 3j0 /2 − j0 + 2)

 j0 /3 −1  j0 − 1 − 2i am+ j0 /2 −i = (−1)i i i=0 · pm+ j0 /2 +1 q 2m+1−(j0 −2 j0 /2 ) , Pj0 (N = 3m + 3(j0 − 1)/2 − (j0 − 1) + 2)

 j0 /3 −1  i j0 − 1 − 2i bm+ (j0 −1)/2 −i = (−1) i i=0 · pm+ (j0 −1)/2 +1 q 2m+1−{(j0 −1)−2 (j0 −1)/2 } ,

and (21)

Pj0 (N = 3m + 3(j0 − 1)/2 − (j0 − 1) + 3) = 0.

Of course, Pj0 (N ≥ (j0 + 1)/2) = 1. Remark. The theorem can be derived directly from the combinatorics literature without reference to probability. The result needed is a formula of Niederhausen (2002, middle of p. 9)2 ; he attributed the formula to Koroljuk (1955). Proof. The proof of each of the equations, (19)–(21), proceeds by complete induction on j0 using (14), the case j0 = 1 having been already established (not to mention the cases j0 = 2, 3, . . . , 9). To avoid the awkward floor and ceiling functions, one can consider six cases, j0 = 3i0 + 1, j0 = 3i0 + 2, or j0 = 3i0 + 3, each with i0 an even or odd nonnegative integer. The details are straightforward but tedious. 2 His

d should be d − 1.

36

S. N. Ethier

3. Asymptotic tail behavior First we notice that (22)

am 27 3m(3m − 1)(3m − 2) < , = am−1 m(2m)(2m + 1) 4

(23)

bm 27 (3m + 1)(3m)(3m − 1) < , = bm−1 (m + 1)(2m)(2m + 1) 4

and (24)

lim

m→∞

m ≥ 1, m ≥ 1,

am bm 27 . = lim = m→∞ bm−1 am−1 4

Since a0 = b0 = 1, we obtain am < (27/4)m and bm < (27/4)m for each m ≥ 1. More precisely, using Stirling’s formula, √  m 3 −3/2 27 m+1 am = bm ∼ √ m (25) . 3m + 1 4 4 π Since p ∈ [1/3, 1), we have pq 2 ≤ 4/27 with strict inequality if p > 1/3, hence ρ :=

(26)

27 2 pq ≤ 1 4

(ρ < 1 if p > 1/3).

Consequently, (27)

P1 (N = 3m + 1) = am p

m+1 2m

q

√ 3p ∼ √ m−3/2 ρm 4 π

and P1 (N = 3m + 2) = bm pm+1 q 2m+1 ∼

(28)

√ 3 3 pq −3/2 m √ m ρ . 4 π

Now assume that p > 1/3. It follows that, with the convention that empty products are 1, (29)

P1 (N ≥ 3m + 1) = P1 (N = 3m + 1)

∞ 

m+n

n=0 l=m+1 ∞ 

P1 (N = 3l + 1) P1 (N = 3l − 2) m+n

P1 (N = 3l + 2) P (N = 3l − 1) n=0 l=m+1 1 " ∞ ! m+n  al = P1 (N = 3m + 1) (pq 2 )n a l−1 n=0 l=m+1 " ∞ ! m+n  bl + P1 (N = 3m + 2) (pq 2 )n b l−1 n=0 + P1 (N = 3m + 2)

l=m+1

∼ P1 (N = 3m + 1)

∞ 

ρn + P1 (N = 3m + 2)

n=0

where C1,1 (30)

∼ C1,1 m−3/2 ρm , √ √ := 3 p(1 + 3q)/[4 π(1 − ρ)]. Similarly, P1 (N ≥ 3m + 2) ∼ C1,2 m−3/2 ρm ,

∞  n=0

ρn

Absorption Time Distribution for an Asymmetric Random Walk

where C1,2 :=



37

√ 3 p(ρ + 3q)/[4 π(1 − ρ)], and

P1 (N ≥ 3m + 3) = P1 (N ≥ 3m + 4) ∼ C1,3 m−3/2 ρm ,

(31)

where C1,3 := ρC1,1 . We claim that the same type of asymptotic decay holds for the tail of the distribution of N for arbitrary j0 . Only the multiplicative constants differ. Theorem 3. Assume that p > 1/3. As m → ∞, (32)

Pj0 (N ≥ 3m + 1) ∼ Dj0 ,1 m−3/2 ρm ,

(33)

Pj0 (N ≥ 3m + 2) ∼ Dj0 ,2 m−3/2 ρm ,

(34)

Pj0 (N ≥ 3m + 3) ∼ Dj0 ,3 m−3/2 ρm ,

for suitable constants Dj0 ,1 , Dj0 ,2 , Dj0 ,3 to be defined below. Proof. Using Theorem 2 and arguing as above, we obtain (35)

Pj0 (N = 3m + 3j0 /2 − j0 + 2) ∼ Aj0 m−3/2 ρm ,

(36)

Pj0 (N = 3m + 3(j0 − 1)/2 − (j0 − 1) + 2) ∼ Bj0 m−3/2 ρm ,

where (37)

(38)

√ 3 Aj0 := √ 4 π √ 3 3 Bj0 := √ 4 π

j0 /3 −1

 i=0

(−1)i

 j0 /2 −i  27 j0 − 1 − 2i i 4

· p j0 /2 +1 q 1−(j0 −2 j0 /2 ) ,

 (j0 −1)/2 −i  j0 /3 −1  27 j0 − 1 − 2i (−1)i i 4 i=0 · p (j0 −1)/2 +1 q 1−{(j0 −1)−2 (j0 −1)/2 } .

It follows that (39)

Pj0 (N ≥ 3m + (j0 + 1)/2) ∼ Cj0 ,1 m−3/2 ρm ,

(40)

Pj0 (N ≥ 3m + (j0 + 1)/2 + 1) ∼ Cj0 ,2 m−3/2 ρm ,

(41)

Pj0 (N ≥ 3m + (j0 + 1)/2 + 2) ∼ Cj0 ,3 m−3/2 ρm ,

where (42) (43) (44)

Cj0 ,1 := (1 − ρ)−1 (Aj0 + Bj0 ), (1 − ρ)−1 (ρAj0 + Bj0 ) if j0 is odd, Cj0 ,2 := (1 − ρ)−1 (Aj0 + ρBj0 ) if j0 is even, (1 − ρ)−1 (ρAj0 + ρBj0 ) if j0 is odd, Cj0 ,3 := (1 − ρ)−1 (Aj0 + ρBj0 ) if j0 is even,

Finally, if we define (45)

Dj0 ,1 := Cj0 ,1 ,

Dj0 ,2 := Cj0 ,2 ,

Dj0 ,3 := Cj0 ,3

for j0 = 1, 2, (46)

Dj0 ,1 := ρ−1 Cj0 ,3 ,

Dj0 ,2 := Cj0 ,1 ,

Dj0 ,3 := Cj0 ,2

38

S. N. Ethier

for j0 = 3, 4, (47)

Dj0 ,1 := ρ−1 Cj0 ,2 ,

for j0 = 5, 6, and (48)

Dj0 ,2 := ρ−1 Cj0 ,3 ,

Dj0 ,3 := Cj0 ,1

Dj0 ,i := ρ− (j0 −1)/6 Dj0 −6 (j0 −1)/6 ,i

for all j0 ≥ 7 and i = 1, 2, 3, then (32)–(34) hold. It will be convenient to restate Theorem 3 in a condensed form. Let us define ⎧ ⎨ 33/2 Dj0 ,1 if n ≡ 0 (mod 3), Dj0 (n) := 33/2 ρ−1/3 Dj0 ,2 if n ≡ 1 (mod 3), (49) ⎩ 3/2 −2/3 3 ρ Dj0 ,3 if n ≡ 2 (mod 3), so that {Dj0 (n)}n≥0 is a sequence that repeatedly cycles through three specific constants. Corollary 4. Assume that p > 1/3. As n → ∞, (50)

Pj0 (N ≥ n + 1) ∼ Dj0 (n)n−3/2 (ρ1/3 )n .

Proof. By Theorem 3, (51)

Pj0 (N ≥ n + 1) ⎧ ⎨ Dj0 ,1 (n/3)−3/2 ρn/3 ∼ Dj0 ,2 ((n − 1)/3)−3/2 ρ(n−1)/3 ⎩ Dj0 ,3 ((n − 2)/3)−3/2 ρ(n−2)/3

if n ≡ 0 (mod 3), if n ≡ 1 (mod 3), if n ≡ 2 (mod 3),

and the conclusion follows from (49). 4. Application to gambling Recall the Labouchere system as explained in Section 1. Let ξ1 , ξ2 , . . . be i.i.d. with common distribution (52)

P (ξ1 = 1) = p

and P (ξ1 = −1) = q,

with ξn representing the profit per unit bet at the nth coup. Let Bn be the amount bet at the nth coup, let Fn be the gambler’s cumulative profit after the nth coup, and let Sn be the sum of the terms on the gambler’s list after the nth coup. We have already noted that Fn + Sn does not depend on n. Indeed, if 1 ≤ n ≤ N , then (53)

Fn − Fn−1 = Bn ξn

and Sn − Sn−1 = −Bn ξn ,

implying that Fn + Sn = Fn−1 + Sn−1 . Since F0 = 0, it follows that (54)

Fn = S0 − Sn ,

1 ≤ n ≤ N.

In particular, since p ≥ 1/3 by assumption, we have N < ∞ a.s., hence SN = 0 a.s., that is, P (FN = S0 ) = 1. (55) This equation says that, with probability 1, the gambler wins an amount equal to the sum of the terms on the initial list. Under our unrealistic assumptions that the

Absorption Time Distribution for an Asymmetric Random Walk

39

gambler has unlimited resources and that there is no maximum betting limit, the Labouchere system is an infallible one. Let us assume for now that 1/3 ≤ p ≤ 1/2, so that the game is fair or subfair. It follows that {Fn∧N }n≥0 is a supermartingale. By (55), it is not the case that E[FN ] ≤ 0 = E[F0 ], that is, the conclusion of the optional stopping theorem fails. More specifically, {Fn∧N }n≥0 must fail to be uniformly integrable. But (56)

|Fn∧N | ≤

N 

|Fl − Fl−1 | =

l=1

and in fact (57)

N 

Bl ,

l=1

|Fn∧N | ≤ max |Fl | ≤ 2S0 + max (−Fl ), 0≤l≤N

0≤l≤N

= 2Fl+ + (Fl− − Fl+ ) ≤ 2S0 + (−Fl ). where the last inequality uses |Fl | = Thus, the quantities on the right sides of (56) and (57) must fail to be integrable. In particular, the total amount bet by the Labouchere bettor has infinite expectation, as does his maximum deficit. This surprising result is due to Grimmett and Stirzaker (2001, Problem 12.9.15 and Solution). They also raised the question of whether the maximum bet size has infinite expectation as well, but this question is currently unresolved. We return to the original assumption that 1/3 ≤ p < 1. Fl+

+ Fl−

Theorem 5. If p = 1/2, then (58)

−E[Fn | N ≥ n + 1] = S0 {Pj0 (N ≥ n + 1)−1 − 1}

for all n ≥ 1. If 1/3 ≤ p < 1/2, then (58) holds with the = sign replaced by ≥. If 1/2 < p < 1, then (58) holds with the = sign replaced by ≤. In any case in which p > 1/3, as n → ∞, (59)

S0 {Pj0 (N ≥ n + 1)−1 − 1} ∼ S0 Dj0 (n)−1 n3/2 (ρ−1/3 )n ,

where Dj0 (n) is as in (49). Remark. In words, the Labouchere bettor’s conditional expected deficit after n coups at a fair game (p = 1/2), given that the system has not yet been completed, grows like a constant times n3/2 (ρ−1/3 )n , where ρ−1/3 = 25/3 /3 ≈ 1.058267. This geometric rate may be smaller than expected, but the factor n3/2 should not be overlooked. Indeed, it dominates the factor (25/3 /3)n for 2 ≤ n ≤ 128. The right side of (58), as well as the multiplicative constant in (59), depends on the initial list only through the sum (S0 ) and number (j0 ) of its terms. Proof. If p = 1/2, then {Fn∧N }n≥0 is a martingale, and therefore (60) hence (61)

0 = E[F0 ] = E[Fn∧N ] = E[FN 1{N ≤n} ] + E[Fn 1{N ≥n+1} ], −E[Fn 1{N ≥n+1} ] = S0 {1 − Pj0 (N ≥ n + 1)},

and the first conclusion follows from this. If p < 1/2 (resp., p > 1/2), then {Fn∧N }n≥0 is a supermartingale (resp., submartingale), and the second = sign in (60) is replaced by ≥ (resp., ≤). The asymptotic result is a consequence of Corollary 4.

40

S. N. Ethier

For example, assume an initial list of 1, 2, 3, 4. Given that the system is still incomplete after 128 coups, the Labouchere bettor’s conditional expected deficit is 461,933.96 units if p = 1/2. It is at least 142,204.88 units if p = 18/37. These figures were calculated from (58) using Theorem 2. If we apply the asymptotic formula (59), the corresponding figures are 360,566.66 and 108,272.40, respectively. The reason for the larger numbers in the fair case than in the subfair one is that we are conditioning on a less likely event. References [1] Aeppli, A. (1924). Zur Theorie verketteter Wahrscheinlichkeiten. Ph.D. thesis, University of Zurich. ´ (1887). G´en´eralisation du probl`eme r´esolu par M. J. Bertrand. [2] Barbier, E. Comptes Rendus des S´eances de l’Acad´emie des Sciences, Paris 105 407. [3] Bertrand, J. (1887). Solution d’un probl`eme. Comptes Rendus des S´eances de l’Acad´emie des Sciences, Paris 105 369. [4] Downton, F. (1980). A note on Labouchere sequences. Journal of the Royal Statistical Society, Series A 143 363–366. [5] Grimmett, G. R. and Stirzaker, D. R. (2001). One Thousand Exercises in Probability. Oxford University Press, Oxford. [6] Koroljuk, V. S. (1955). On the discrepancy of empiric distributions for the case of two independent samples. Izvestiya Acad. Nauk SSSR. Ser. Mat. 19 81–96. Translated in IMS & AMS 4 105–122, 1963. [7] Niederhausen, H. (2002). Catalan traffic at the beach. The Electronic Journal of Combinatorics 9 #R33. ´cs, L. (1962). On the ballot theorems. In Advances in Combinatorial Meth[8] Taka ods and Applications to Probability and Statistics (N. Balakrishnan, ed.) 97–114. Birkh¨ auser, Boston. [9] Thorold, A. L. (1913). The Life of Henry Labouchere. G. P. Putnam’s Sons, New York.

IMS Collections Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz Vol. 4 (2008) 41–61 c Institute of Mathematical Statistics, 2008  DOI: 10.1214/074921708000000291

Fractional Stability of Diffusion Approximation for Random Differential Equations Yuriy V. Kolomiets1,∗ Kent State University and Institute for Applied Mathematics and Mechanics NAS of Ukraine Abstract: We consider the systems of random differential equations. The coefficients of the equations depend on a small parameter. The first equation, “slow” component, Ordinary Differential Equation (ODE), has unbounded highly oscillating in space variable coefficients and random perturbations, which are described by the second equation, “fast” component, Stochastic Differential Equation (SDE) with periodic coefficients. Sufficient conditions for weak convergence as small parameter goes to zero of the solutions of the “slow” components to the certain stochastic process are given.

1. Introduction In the paper, we consider systems of random equations with a small parameter ε. The first equation, the “slow” component, is an Ordinary Differential Equations (ODE) with unbounded highly oscillating coefficients which depend on the Markov diffusion processes with periodic coefficients, which are the “fast” component of the systems. We will study the weak convergence of probability measures, induced by the solutions of the “slow” equations to the diffusion process. It is well known that, in the case of the Diffusion Approximation (DA), a drift coefficient of the approximating Stochastic Differential Equation (SDE), includes a derivative with respect to a space variable of the unbounded coefficients of the approximated random differential equation (see Ch. 2.2). That means, we cannot apply the DA results because of the highly oscillating character of dependency on the ε of the unbounded coefficient of the “slow” component. On the other hand, we cannot apply the limit theorem for SDEs because the “slow” component is an ODE, and consequently has no nonzero diffusion coefficient (the presence of strongly positive diffusion coefficient is a necessary condition for such kind of theorems). The method is a combination of the results of these two directions. We choose the order of oscillation (parameter δ) in such a way that the conditions, from the DA‘s theorem ((A) and (AB)), allow us to get the nonnegative “candidate” to be the diffusion coefficient, and then to use the second part of the conditions (from the Limit Theorem for SDE: (B) and (C)) to obtain the limit process (see Chapters 2.2, 2.4). The aim of this paper is to find the answers to the following questions: ∗ The author is grateful to Dr. V.V. Andrievskii, Dr. J. Diestel, and Dr. R.S. Varga for useful remarks and hospitality. 1 Department of Mathematical Sciences, P.O. Box 5190, Kent, OH, 44242-0001, and IAMM, 74 R. 
Luxemburgh St., Donetsk 83114, Ukraine, e-mail: [email protected] AMS 2000 subject classifications: Primary 60F17; secondary 60F99 Keywords and phrases: random dynamical systems, diffusion approximation, oscillation, fractional stability

41

42

Yuriy V. Kolomiets

1. Is it possible to extend DAs results to be true in the presence of high oscillation in space variable of the coefficients of the random processes? If the answer is yes, what kind of the conditions need to be added to usual conditions of DA? 2. How does the presence of oscillation in the coefficients of the random processes influence the order of convergence in DA? What is the precise order of convergence in DA and how does it depend on the order of oscillation? What is the critical case? For the second question, we get results that depend on an apparently critical number (order of oscillation), equals to 1/2 (Theorem 3.1.1 below). Asymptotic behavior of the solutions of the unperturbed stochastic equations with unbounded drift seems to be considered for the first time in the papers [1], [10], and [13]. For SDEs, with coefficients depending on a small parameter by irregular way without random perturbations, necessary and sufficient conditions of the weak convergence of solutions in more general situations are obtained in [11]. A different approach to the investigation of weak convergence of one-dimensional Markov processes was demonstrated in [3]. The Averaging Principle for SDEs with random perturbations and highly oscillating coefficients was considered in [7]. In that paper, the first component of the system is an SDE with a strongly positive diffusion coefficient. The asymptotic behavior of the first components, on the time intervals of the order O(ε−1 ), was studied. Sufficient conditions for weak convergence of the measures, induced by the first components, were stated and the apparently critical number (order of oscillation), equals to 1/3, was obtained. In the present work, the first component is a random ODE and does not contain a nonzero diffusion coefficient. This makes the investigation of the limit behavior of the first component more difficult but, instead, we consider the DAs scheme (on the time intervals of the order O(ε−2 )). 
The result here is Fractional Stability of the DA (note that the DA is a result of the type of the functional Central Limit Theorem). The study of the DA was initiated by Khasminskii R. Z. [6], and developed by many authors (see, e.g. monographs [2],[15] and bibliography, and [12]). We note that the case, when the coefficients of the first equations have no high oscillations with respect to space variable, the problems of the weak convergence of solutions under the various conditions on the coefficients and random perturbations have been studied. A generalization of Khasminskii’s result [6], for a mean-zero fluctuation, stationary field, was considered in [8]. 2. Conditions and preliminary results Let (Ω, F, P ) denote some probability space with filtration Ft , t ∈ [0, T ]. Let En be a n-dimensional Euclidean space, E+ = [0, +∞), symbol E denotes the mathematical expectation, f˙(x) be a derivative of the function f (x), and ∇y is the symbol of the gradient with respect to y ∈ En . We denote different positive constants by C, with indexes if need. Let us consider a system of random equations, t ∈ [0, T ], (2.1)
(2.1)    ξε(t) = ξ0 + (1/ε) ∫₀ᵗ g(ξε(s)/ε^δ, ηε(s)) ds + (1/ε^δ) ∫₀ᵗ c(ξε(s)/ε^δ, ηε(s)) ds + ∫₀ᵗ m(ξε(s)/ε^δ, ηε(s)) ds,
Fractional Stability of Diffusion Approximation for Random Differential Equations
(2.2)    ηε(t) = η0 + (1/ε²) ∫₀ᵗ b(ηε(s)) ds + (1/ε) ∫₀ᵗ σ(ηε(s)) dw1(s).
Here {w1(t), Ft} is an n-dimensional standard Wiener process. The processes ξε(t) ∈ E1 and ηε(t) ∈ En; the constants ξ0, η0 are nonrandom; g(x, y), c(x, y), and m(x, y) are functions from E1 × En to E1; b(y) and σ(y) are functions from En to En and to L(En), respectively; ε > 0 is a small parameter, and δ is a fixed number from ]0, 1/2[. If equation (2.2) has a unique (in the sense of law) solution, then the distribution of ηε(tε²) coincides with the distribution of the process η(t), the solution of the Ito stochastic equation dη(t) = b(η(t))dt + σ(η(t))dw(t), and does not depend on ε.

Let us denote by C^{k,l}_{x,y}(E1, En) the class of functions f(x, y) that are k and l times continuously differentiable with respect to x ∈ E1 and y ∈ En, respectively. The symbol "b" in the notation of this class (C^{k,l}_{x,y,b}(E1, En)) indicates that these functions and their derivatives of the stipulated order with respect to x ∈ E1 are bounded. Let aij(y) be the components of the n × n matrix a(y) = σ(y)σ*(y), and let bi(y) be the components of the vector b(y).

2.1. Conditions A and AB

We next introduce condition (A).

Condition (A)
A1. The functions aij(y), bi(y) ∈ C²y(En) and are periodic of period 1;
A2. There exists a constant λ0 > 0 such that for every y, ζ ∈ En, Σ_{i,j=1}^n aij(y)ζiζj ≥ λ0|ζ|²;
A3. The functions g(x, y), c(x, y), and m(x, y) belong to C^{2,2}_{x,y,b}(E1, En) and are periodic of period 1 in y.

Remark 2.1. Under condition (A) the system (2.1), (2.2) has a unique strong solution. Let us denote by L* the operator formally adjoint to the generator L of ηt:

    L = (1/2) Σ_{i,j=1}^n aij(y) ∂²/∂yi∂yj + Σ_{i=1}^n bi(y) ∂/∂yi.

We shall denote by Y the unit torus in En. As is well known (see, for example, [1]), the problem

    L*p(y) = 0,    ∫_Y p(y) dy = 1,

has a unique positive solution p(y), periodic of period 1; and for a function h(y), periodic of period 1 and such that ∫_Y h(y)p(y) dy = 0, the problem

(2.3)    Ld(y) = h(y),    ∫_Y d(y) dy = 0

has a unique solution d(y) ∈ C²(En), periodic of period 1. Lemma 4.1 ([14]) gives an estimate of the solution of problem (2.3) in terms of the right-hand side h(y).


Yuriy V. Kolomiets

Estimates of the solution of the problem

(2.4)    Ld(x, y) = h(x, y),    ∫_Y d(x, y) dy = 0,

and of its first two derivatives with respect to the parameter x can easily be derived from Corollary 4.2. If ∫_Y h(x, y)p(y) dy = 0 and h(x, y) belongs to the class C^{2,2}_{x,y,b}(E1, En) and is periodic of period 1 with respect to y ∈ En, then, by Corollary 4.2, the solution d(x, y) of (2.4) is an element of C^{2,2}_{x,y,b}(E1, En).

We now introduce the "balance condition."

Condition (AB) For every x ∈ E1,

    ḡ(x) := ∫_Y g(x, y)p(y) dy = 0.

Let us consider the Poisson problem

(2.5)    Lψ(x, y) = −g(x, y),    ∫_Y ψ(x, y) dy = 0.

Remark 2.2. Under Conditions (A) and (AB), problem (2.5) has a unique solution ψ(x, y) ∈ C^{2,2}_{x,y,b}(E1, En).

Let us set

(2.6)    α(x, y) := (σ(y)∇yψ(x, y))²,
(2.7)    β(x, y) := c(x, y) + g(x, y) ∂ψ(x, y)/∂x,

and pε(x, y) := p(x/ε^δ, y), lε(x) := l(x/ε^δ).

2.2. Model example

Let us fix n = 1 and consider the system (2.1), (2.2) under the previous assumptions, t ∈ [0, T]:

    ξε(t) = ξ0 + (1/ε) ∫₀ᵗ D(ξε(s)/ε^δ) cos(2πηε(s)) ds,
    ηε(t) = η0 + (1/ε) w1(t).

We will investigate the limit behavior of ξε(t) as ε goes to 0. Let Condition (A) be satisfied. Then a(y) = 1, L = (1/2) d²/dy², and the problem

    L*p(y) = 0,    ∫₀¹ p(y) dy = 1

has the unique solution p(y) = 1. Condition (AB) gives ∫₀¹ cos(2πy) dy = 0.
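As a side illustration (not part of the original analysis), this model example can be simulated directly with an Euler scheme; the coefficient D(x) = 2 + sin(2πx) and all numerical parameters below are our own assumptions:

```python
import math, random

def simulate_xi(eps, delta=0.25, T=1.0, dt=1e-4, xi0=0.0, eta0=0.0, seed=1,
                D=lambda x: 2.0 + math.sin(2 * math.pi * x)):
    """Euler scheme for the model example:
    d(xi) = (1/eps) * D(xi/eps^delta) * cos(2*pi*eta) dt,  eta(t) = eta0 + w1(t)/eps."""
    rng = random.Random(seed)
    xi, eta = xi0, eta0
    for _ in range(int(T / dt)):
        xi += (1.0 / eps) * D(xi / eps ** delta) * math.cos(2 * math.pi * eta) * dt
        eta += rng.gauss(0.0, math.sqrt(dt)) / eps  # d(eta) = dw1 / eps
    return xi

# despite the 1/eps drift, xi_eps(T) stays on an O(1) scale (diffusive behavior)
sample = [simulate_xi(0.05, seed=s) for s in range(5)]
print(sample)
```

The averaging of the fast phase 2πηε(s) is what keeps the 1/ε drift from blowing up the first component.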

In this case ψ(x, y) = D(x)Ψ1(y), where Ψ1(y) = cos(2πy)/(2π²) is the unique solution of the problem

    LΨ1(y) = −cos(2πy),    ∫₀¹ Ψ1(y) dy = 0.

So ψ(x, y) = D(x) cos(2πy)/(2π²). Then for the process
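That Ψ1(y) = cos(2πy)/(2π²) solves the cell problem with L = (1/2)d²/dy² can be checked numerically; the following stdlib-only sketch (our own illustration) verifies the equation by central differences, together with the zero-mean side condition:

```python
import math

def Psi1(y):
    # candidate cell-problem solution from the text
    return math.cos(2 * math.pi * y) / (2 * math.pi ** 2)

h = 1e-5
for y in [0.1, 0.37, 0.5, 0.83]:
    # central second difference approximates Psi1''(y)
    d2 = (Psi1(y + h) - 2 * Psi1(y) + Psi1(y - h)) / h ** 2
    residual = 0.5 * d2 + math.cos(2 * math.pi * y)  # L Psi1 + cos(2*pi*y), should be ~0
    assert abs(residual) < 1e-4, residual

# zero-mean condition over one period (midpoint rule)
mean = sum(Psi1((i + 0.5) / 1000) for i in range(1000)) / 1000
assert abs(mean) < 1e-9
print("L Psi1 = -cos(2*pi*y) and mean(Psi1) = 0 verified numerically")
```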

    θε(ξε(t), ηε(t)) := ξε(t) + εψ(ξε(t)/ε^δ, ηε(t)),

using Ito’s formula, we get

    θε(ξε(t), ηε(t)) = θε(ξ0, η0) + (1/(2π²ε^δ)) ∫₀ᵗ Ḋ(ξε(s)/ε^δ) D(ξε(s)/ε^δ) cos²(2πηε(s)) ds − (1/π) ∫₀ᵗ D(ξε(s)/ε^δ) sin(2πηε(s)) dw1(s).

We will use the notations (2.6) and (2.7). Then

    ᾱε(x) = D²(x/ε^δ)/(2π)³    and    β̄ε(x) = D(x/ε^δ)Ḋ(x/ε^δ)/(4π³).

The process ξε(t) is asymptotically "close" to the process

    θ̂ε(ξε(t), ηε(t)) = θε(ξ0, η0) + (1/ε^δ) ∫₀ᵗ β̄ε(ξε(s)) ds + ∫₀ᵗ √(ᾱε(ξε(s))) dw1(s)

(Lemma 4.6). So, using the conditions of the classical DA, we cannot pass to the limit: the coefficients of the process θ̂ε(ξε(t), ηε(t)) still depend on the small parameter ε in an irregular way. Obviously, to answer the question posed at the beginning of the example, we need some additional conditions. We will return to this example after introducing the needed conditions.

2.3. Conditions B and C

We keep in mind the process θ̂ε(ξε(t), ηε(t)) from the previous example. The following conditions are natural for the investigation of the limit behavior of the SDE (compare, for example, with [11]). Here and below a bar denotes the average with respect to p(y)dy over Y, as in Condition (AB): ᾱ(x) := ∫_Y α(x, y)p(y) dy, β̄(x) := ∫_Y β(x, y)p(y) dy, m̄(x) := ∫_Y m(x, y)p(y) dy.

Condition (B)
B1. There exists a constant λ1 > 0 such that for every x ∈ E1, ᾱ(x) ≥ λ1;
B2. There exists a constant λ2 > 0 such that for every x ∈ E1,

    | ∫₀ˣ β̄(z)/ᾱ(z) dz | ≤ λ2.


Let us set

    F(x) := exp{−2 ∫₀ˣ β̄(z)/ᾱ(z) dz}    and    h(x) := ∫₀ˣ F(z) dz.

Condition (C) There exist constants κ0, κ1, and κ2 such that, for z ∈ E1,
C0. lim_{|z|→∞} (1/z) ∫₀ᶻ F(x) dx = κ0;
C1. lim_{|z|→∞} (1/z) ∫₀ᶻ dx/(ᾱ(x)F(x)) = κ1;
C2. lim_{|z|→∞} (1/z) ∫₀ᶻ m̄(x)/ᾱ(x) dx = κ2.

Remark 2.3. It follows from Lemma 4.3 and our conditions above that there exist positive constants C1, C2 such that

    0 < C1 < κ0 < C2;    0 < C1 < κ1 < C2;    |κ2| < C2.

2.4. Model example (continuation)

Condition (B) gives: there exist constants λ1 > 0 and λ2 > 0 such that for every x ∈ E1,

    D²(x) ≥ λ1    and    D(x) ≤ λ2 D(0).

The function F(x) has the form F(x) = D(0)/D(x). Condition (C) implies: there exists a constant κ0 such that, for z ∈ E1,

    lim_{|z|→∞} (1/z) ∫₀ᶻ dx/D(x) = κ0/D(0).

Remark 2.4.1. Under these conditions the constant from Condition (C1) is defined by κ1 = κ0(2π)³/D²(0). Using Theorem 3.1.1, for the process ξε(t) we obtain the limit process

    ξ(t) = ξ0 + (D(0)/(κ0(2π)^{3/2})) w(t).
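For a concrete (hypothetical) coefficient D, the constants of this limit process can be computed numerically from the printed formulas F(x) = D(0)/D(x), κ0 = lim (1/z)∫₀ᶻ F(x) dx, and σ0 = D(0)/(κ0(2π)^{3/2}); the choice D(x) = 2 + sin(2πx) and the quadrature below are our own illustration, not from the text:

```python
import math

D0 = 2.0
def D(x):
    # hypothetical periodic coefficient satisfying Condition (B)
    return 2.0 + math.sin(2 * math.pi * x)

# for a 1-periodic D the Cesaro limit reduces to an average over one period
n = 100000
kappa0 = sum(D0 / D((i + 0.5) / n) for i in range(n)) / n
# for 2 + sin(2*pi*x) the mean of 1/D over a period is 1/sqrt(2**2 - 1)
assert abs(kappa0 - D0 / math.sqrt(3.0)) < 1e-6

sigma0 = D0 / (kappa0 * (2 * math.pi) ** 1.5)  # diffusion coefficient of the limit xi(t)
print(kappa0, sigma0)
```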

Let us introduce

(2.8)    f(x) := h(x)/κ0 − x,    x ∈ E1.

Obviously,

(2.9)    Lx f(x) = β̄(x)ḟ(x) + (1/2)ᾱ(x)f̈(x) = −β̄(x).

Let β⁰(x, y) denote the function defined by

(2.10)    β⁰(x, y) := β(x, y)ḟ(x) + (1/2)α(x, y)f̈(x).

Remark 2.4. From (2.9) and (2.10), we have β̄⁰(x) = −β̄(x).


3. Main result

In this section we formulate and prove our main theorem.

3.1. Fractional stability of diffusion approximation

Let (C[0, T], C) be the space of all continuous functions on [0, T] with the family of σ-algebras C = {Ct}_{0≤t≤T}. The space C0∞(E1) is the space of all infinitely differentiable functions with compact support on E1. For a fixed number δ ∈ ]0, 1/2[, let us denote by {μεδ, ε > 0} the family of probability measures induced by the random processes {ξε(·), ε > 0} on C([0, T]), and by "⇒" the weak convergence of measures. We will prove the weak convergence μεδ ⇒ μ, as ε tends to 0, for each δ, where μ is the measure corresponding to the random process ξ(t) = ξ0 + β0t + σ0w(t). Here ξ0 is the initial condition from (2.1), β0 and σ0 are certain constant coefficients, and w(t), t ∈ [0, T], is a standard one-dimensional Wiener process.

Theorem 3.1.1. Let conditions (A), (AB), (B), and (C) be fulfilled. Then, for every δ ∈ ]0, 1/2[, the measures μεδ ⇒ μ as ε tends to 0. The random process ξ(t), which corresponds to μ, is defined by ξ(t) = ξ0 + β0t + σ0w(t), where

    β0 = κ2/(κ0κ1),    σ0 = 1/√(κ0κ1).

Remark 3.1.1. The case δ = 0 corresponds to the classical DA scheme. The sufficient conditions for the weak convergence of με0 as ε tends to 0 can be simplified in this case.

3.2. Proof of Theorem 3.1.1

First (I), we prove that, for each δ ∈ ]0, 1/2[, the family of measures {μεδ, ε > 0} is weakly compact on C[0, T]. Second (II), we pass to the limit as ε tends to 0, allowing the coefficients of equation (2.1) to take their averaged form with respect to the random perturbations (at this stage, for new processes "close" to the initial one, we obtain equations with positive diffusion coefficients: conditions (A), (AB), (B)); after that we use condition (C) to identify the limit process.

I. Fix an arbitrary δ ∈ ]0, 1/2[. By Ito's formula applied to εψ(ξε(t)/ε^δ, ηε(t)) ((2.5) and Remark 2.2), for

    ζε(t) := ζε(ξε(t), ηε(t)) = ξε(t) + εψε(ξε(t), ηε(t)),

using the notations (2.6) and (2.7), we obtain

    ζε(ξε(t), ηε(t)) − ζε(ξ0, η0)

(3.1)        = ε^{1−2δ} ∫₀ᵗ (∂ψε/∂x)(ξε(s), ηε(s)) cε(ξε(s), ηε(s)) ds + ε^{1−δ} ∫₀ᵗ (∂ψε/∂x)(ξε(s), ηε(s)) mε(ξε(s), ηε(s)) ds + ∫₀ᵗ mε(ξε(s), ηε(s)) ds + (1/ε^δ) ∫₀ᵗ βε(ξε(s), ηε(s)) ds + ∫₀ᵗ √(αε(ξε(s), ηε(s))) dw1(s).

We will use the notations (2.9) and (2.10). By Ito's formula applied to the process ε^δ f(ζε(t)/ε^δ) ((2.8)), and setting B(x, y) := β(x, y) + β⁰(x, y), for ζε(t) + ε^δ fε(ζε(t)) we obtain

    ζε(ξε(t), ηε(t)) + ε^δ fε(ζε(t)) = ζε(ξ0, η0) + ε^δ fε(ζε(0))
        + (1/κ0) ∫₀ᵗ Fε(ζε(s)) mε(ξε(s), ηε(s)) ds + (1/ε^δ) ∫₀ᵗ Bε(ξε(s), ηε(s)) ds
        + (1/ε^δ) ∫₀ᵗ [ βε(ξε(s), ηε(s))(ḟε(ζε(s)) − ḟε(ξε(s))) + (1/2)αε(ξε(s), ηε(s))(f̈ε(ζε(s)) − f̈ε(ξε(s))) ] ds
        + (ε^{1−2δ}/κ0) ∫₀ᵗ Fε(ζε(s)) (∂ψε/∂x)(ξε(s), ηε(s)) cε(ξε(s), ηε(s)) ds
        + (ε^{1−δ}/κ0) ∫₀ᵗ Fε(ζε(s)) (∂ψε/∂x)(ξε(s), ηε(s)) mε(ξε(s), ηε(s)) ds
        + (1/κ0) ∫₀ᵗ Fε(ζε(s)) √(αε(ξε(s), ηε(s))) dw1(s).

Taking into account the equality B̄(x) = 0 (Remark 2.4), let us consider the function n(x, y), periodic of period 1 with respect to y, the unique solution of

    Ln(x, y) = −B(x, y),    ∫_Y n(x, y) dy = 0,

for every value of the parameter x ∈ E1. Applying Ito's formula to the function ε^{2−δ} nε(ξε(t), ηε(t)), for

    λε(t) := λε(ξε(t), ηε(t)) = ζε(t) + ε^δ fε(ζε(t)) + ε^{2−δ} nε(ξε(t), ηε(t))

we get

(3.2)    λε(ξε(t), ηε(t)) = λε(ξ0, η0) + (1/κ0) ∫₀ᵗ Fε(ζε(s)) mε(ξε(s), ηε(s)) ds + (1/κ0) ∫₀ᵗ Fε(ζε(s)) σ(ηε(s)) ∇yψε(ξε(s), ηε(s)) dw1(s) + Σ_{i=0}^4 ∫₀ᵗ Aεi(ξε, ηε, s) ds + ∫₀ᵗ Aε5(ξε, ηε, s) dw1(s),

where

    Aε0(ξε, ηε, t) = (1/ε^δ)[ βε(ξε(t), ηε(t))(ḟε(ζε(t)) − ḟε(ξε(t))) + (1/2)αε(ξε(t), ηε(t))(f̈ε(ζε(t)) − f̈ε(ξε(t))) ];
    Aε1(ξε, ηε, t) = ε^{1−2δ}[ (1/κ0)Fε(ζε(t)) (∂ψε/∂x)(ξε(t), ηε(t)) cε(ξε(t), ηε(t)) + (∂nε/∂x)(ξε(t), ηε(t)) gε(ξε(t), ηε(t)) ];
    Aε2(ξε, ηε, t) = ε^{2(1−δ)} (∂nε/∂x)(ξε(t), ηε(t)) mε(ξε(t), ηε(t));
    Aε3(ξε, ηε, t) = (ε^{1−δ}/κ0) Fε(ζε(t)) (∂ψε/∂x)(ξε(t), ηε(t)) mε(ξε(t), ηε(t));
    Aε4(ξε, ηε, t) = ε^{2−3δ} (∂nε/∂x)(ξε(t), ηε(t)) cε(ξε(t), ηε(t));
    Aε5(ξε, ηε, t) = ε^{1−δ} σ(ηε(t)) ∇y nε(ξε(t), ηε(t)).

Now, for every t ∈ [0, T], our conditions and the estimates of Lemma 4.3, taking into account the statement of Lemma 4.5, imply the existence of a constant C(T) =: C such that

    Σ_{i=1}^5 E sup_{t∈[0,T]} |Aεi(ξε, ηε, t)| ≤ ε^{1−2δ} C.

Also, the integrands of the first two integrals in (3.2) are bounded by the constant C(T) := C. Hence, for any fixed ε0 such that 0 < ε ≤ ε0,

    E sup_{s∈[0,T]} [ (1/κ0)|Fε(ζε(s)) mε(ξε(s), ηε(s))| + (1/κ0)|Fε(ζε(s)) √(αε(ξε(s), ηε(s)))| + Σ_{i=0}^5 |Aεi(ξε, ηε, s)| ] ≤ C(1 + Cε),

where lim_{ε→0} Cε = 0. From this, by standard arguments ([9]), we obtain that there exists a constant Cε0(η0) such that for every 0 < ε < ε0,

    E sup_{t∈[0,T]} |λε(t)|² ≤ Cε0(η0)(1 + |ξ0|²),

and for every s, t with 0 ≤ s ≤ t ≤ T,

    E|λε(t) − λε(s)|⁴ ≤ Cε0(η0)|t − s|².

Using (3.2) and Lemmas 4.6 and 4.9, we can check the conditions of weak compactness ([4], Lemma 2, p. 355) for the family of measures {μεδ, 0 < ε < ε0}. Thus the set of measures corresponding to the processes

    ξε(t) = λε(ξε(t), ηε(t)) − εψε(ξε(t), ηε(t)) − ε^δ fε(ζε(t)) − ε^{2−δ} nε(ξε(t), ηε(t))

on C[0, T] is weakly compact.

II. We begin with relation (3.2). Let φ(x) ∈ C0∞(E1) and let Φs(x) be a continuous bounded Cs-measurable functional. Applying Ito's formula to φ(λε(t)), we obtain

    EΦr(ξε)[φ(λε(t)) − φ(λε(r))]

(3.3)        = EΦr(ξε)[ ∫_r^t φ̇(λε(s)){ (1/κ0)Fε(ζε(s)) mε(ξε(s), ηε(s)) + Σ_{i=0}^4 Aεi(ξε, ηε, s) } ds + (1/2) ∫_r^t φ̈(λε(s)){ (1/κ0)Fε(ζε(s)) √(αε(ξε(s), ηε(s))) + Aε5(ξε, ηε, s) }² ds ].

Now we denote

    Dε(r, t) = ∫_r^t φ̇(λε(s)) Σ_{i=0}^4 Aεi(ξε, ηε, s) ds + ∫_r^t φ̈(λε(s)) × [ (1/κ0)Fε(ζε(s)) √(αε(ξε(s), ηε(s))) Aε5(ξε, ηε, s) + (1/2)(Aε5(ξε, ηε, s))² ] ds.

# 1 1 Fε (ζε (s)) αε (ξε (s), ηε (s))Aε5 (ξε , ηε , s) + (Aε5 (ξε , ηε , s))2 ds. κ0 2

Applying Lemma 4.10 to k(x, y) = m(x, y) − m(x, · ), H(x) = F (x), and P (x) = ˙ φ(x), we arrive at 

t

˙ ε (s))Fε (ζε (s))mε (ξε (s), ηε (s))ds φ(λ  t

˙ = EΦr (ξε ) φ(λε (s))Fε (ζε (s))mε (ξε (s))ds + G(ε, mε − mε , r, t) .

EΦr (ξε )

r

r

In a similar way, applying Lemma 4.10 to 2

2

k(x, y) = (σ (y) ∇y ψ (x, y)) − (σ ( · ) ∇· ψ (x, · )) , ¨ Hε (x) = Fε2 (x), and P (x) = φ(x), we obtain 

t

¨ ε (s))F 2 (ζε (s))αε (ξε (s), ηε (s))ds φ(λ ε r  t

¨ ε (s))F 2 (ζε (s))αε (ξε (s))ds + G(ε, αε − αε , r, t) . = EΦr (ξε ) φ(λ

EΦr (ξε )

ε

r

Rewriting (3.3), we arrive at ! (3.4)



EΦr (ξε ) φ(ξε (t)) − φ(ξε (r)) − r

t

˙ ε (s))β0 + 1 φ(ξ ¨ ε (s))σ 2 }ds {φ(ξ 0 2

"

= Iε0 + Iε1 + Iε2 + Iε3 + Iε4 + Iε5 , where  Iε0 = EΦr (ξε )

1 1 G(ε, mε − mε  , r, t) + 2 G(ε, αε − αε  , r, t) + Dε (r, t) ; κ0 2α0

Iε1 = EΦr (ξε )(φ(ξε (t)) − φ(λε (t)) − φ(ξε (r)) + φ(λε (r)));  t !

  " ˙ ε (s)) β0 + 1 φ(λ ¨ ε (s)) σ 2 ds; ¨ ε (s)) − φ(ξ ˙ ε (s)) − φ(ξ Iε2 = EΦr (ξε ) φ(λ 0 2 r  t! ˙ ε (s)) 1 Fε (ζε (s)) (mε (ξε (s), ·) − mε (ζε (s), ·)) Iε3 = EΦr (ξε ) φ(λ κ0 r " 1¨ −2 2 + φ(λε (s))κ0 Fε (ζε (s)) (αε (ξε (s), ·) − αε (ζε (s), ·)) ds; 2

    Iε4 = EΦr(ξε) ∫_r^t φ̇(λε(s)) [ (1/κ0) Fε(ζε(s)) m̄ε(ζε(s)) − β0 ] ds;
    Iε5 = (1/2) EΦr(ξε) ∫_r^t φ̈(λε(s)) [ κ0^{−2} Fε²(ζε(s)) ᾱε(ζε(s)) − σ0² ] ds.

We shall prove that the right-hand side of (3.4) tends to zero as ε → 0. For small ε we can estimate Dε(r, t), using Condition (A1) and Lemma 4.3, as

    E sup_{t∈[0,T]} |Dε(r, t)| ≤ ε^{1−2δ}(1 + Cε)CT,

where lim_{ε→0} Cε = 0. From this inequality and Lemma 4.10 we have lim_{ε→0} Iε0 = 0. According to Lemmas 4.6 and 4.9, we get lim_{ε→0} Iε1 = lim_{ε→0} Iε2 = 0. Using Lemmas 4.3 and 4.12, we obtain lim_{ε→0} Iε3 = 0. Lemma 4.15 implies lim_{ε→0} Iε4 = lim_{ε→0} Iε5 = 0. Consequently, lim_{ε→0} Σ_{i=0}^5 Iεi = 0.

Let μδ denote some limit point of the family {μεδ, 0 < ε < ε0} and let E^{μδ} be the expectation with respect to this measure. Passing to the limit in (3.4) along a subsequence {εk} such that μ^{εk}_δ ⇒ μδ as εk → 0, we get

    E^{μδ} Φr(ξ)[ φ(ξ(t)) − φ(ξ(r)) − ∫_r^t { φ̇(ξ(s))β0 + (1/2)φ̈(ξ(s))σ0² } ds ] = 0.

The coefficients do not depend on δ; hence μδ = μ. Consequently, μεδ ⇒ μ as ε → 0, and the limit measure coincides with the measure corresponding to the process

    ξ(t) = ξ0 + (κ2/(κ0κ1)) t + (1/√(κ0κ1)) w(t).

4. Needed preliminary results

In this section we prove the results used in the proof of the theorem above. (Below we denote by ∂Y the boundary of the unit cube Y in En.)

Lemma 4.1. Let d = d(y) be a periodic function satisfying

    Ld(y) = h(y),    ∫_Y d(y) dy = 0,

and let Condition (A) hold. Then

    sup_Y |d| ≤ C max_Y |h|,

with the constant C depending only on the prescribed quantities, such as the dimension n, the ellipticity constant λ0, etc.


Proof. Let BR denote a set in En which contains the cube Y. By changing d(y) to d(y) − inf_{BR} d, we may suppose that inf_{BR} d = 0. Theorems 9.20 and 9.22 of [5] imply

    sup_{En} d = sup_{BR} d ≤ C(inf_{BR} d + max_{BR} |h|).

That means sup_Y d = osc(d) ≤ C max_{En} |h|.

Corollary 4.2. Under the previous assumptions, let d(x, y) and h(x, y) depend on a parameter x. Then the derivatives of d and h with respect to x satisfy

    sup_Y |∂d/∂x| ≤ C sup_{En} |∂h/∂x|.

Proof. The proof follows immediately from the linearity of equation (2.4).

The function h(x) is one-to-one and, consequently, has an inverse function, denoted by h⁻¹(x).

Lemma 4.3. Let Conditions (A) and (B) be satisfied. Then there exists a positive constant C such that
1. exp{−2λ2} ≤ F(x) ≤ exp{2λ2}; |Ḟ(x)| ≤ C; |F̈(x)| ≤ C; |F⃛(x)| ≤ C;
2. |h(x)| ≤ exp{2λ2}|x|; |ḣ(x)| ≤ C;
3. |f(x)| ≤ C(1 + |x|); |ḟ(x)| ≤ C;
4. sup_{z∈E1} |β̄(z)/ᾱ(z)| ≤ C/λ1 =: C;
5. sup_{x∈E1, y∈En} |β(x, y)| ≤ C; sup_{x∈E1, y∈En} |α(x, y)| ≤ C;
6. |h⁻¹(x)| ≤ exp{2λ2}|x|; exp{−2λ2} ≤ ḣ⁻¹(x) ≤ exp{2λ2}.

Proof. Assertions 1–5 of the lemma follow from our assumptions. Next, let us consider 6. The equality

    ḣ⁻¹(x) = 1/ḣ(h⁻¹(x)) = 1/F(h⁻¹(x))

and assertion 1 imply the second part of 6. Then, using h⁻¹(0) = 0, from the previous equality we have |h⁻¹(x)| = |h⁻¹(x) − h⁻¹(0)| ≤ exp{2λ2}|x|.

Lemma 4.4. Let Conditions (A) and (B) be satisfied; the processes ξε(t), ηε(t) are the solutions of (2.1), (2.2), respectively. For every δ ∈ ]0, 1/2[,

    lim_{ε→0} E sup_{t∈[0,T]} (1/ε^δ) | ∫_{ξε(t)/ε^δ}^{ξε(t)/ε^δ + ε^{1−δ}ψε(ξε(t),ηε(t))} (β̄(z)/ᾱ(z)) dz | = 0.

Proof. By Remark 2.2 and Lemma 4.3 (part 4), we have

    lim_{ε→0} E sup_{t∈[0,T]} (1/ε^δ) | ∫_{ξε(t)/ε^δ}^{ξε(t)/ε^δ + ε^{1−δ}ψε(ξε(t),ηε(t))} (β̄(z)/ᾱ(z)) dz | ≤ C lim_{ε→0} ε^{1−2δ} E sup_{t∈[0,T]} |ψε(ξε(t), ηε(t))| = 0.


Lemma 4.5. Let Conditions (A) and (B) be satisfied; the processes ξε(t), ηε(t) are the solutions of (2.1), (2.2), respectively, and ζε(t) is defined by (3.1). For every δ ∈ ]0, 1/2[ and every t ∈ [0, T],

    lim_{ε→0} E sup_{t∈[0,T]} |Aε0(ξε, ηε, t)| = 0.

Proof. Part A. From the definition of the function f(x) ((2.8)), using Lemma 4.3 (part 2), we have

    |Aε01(ξε(t), ζε(t))| := |ḟε(ζε(t)) − ḟε(ξε(t))| ≤ (C/κ0) | exp{ −2 ∫_{ξε(t)/ε^δ}^{ξε(t)/ε^δ + ε^{1−δ}ψε(ξε(t),ηε(t))} (β̄(z)/ᾱ(z)) dz } − 1 |.

Now, using Lemma 4.4, we get

    lim_{ε→0} E sup_{s∈[0,t]} (1/ε^δ) |Aε01(ξε(s), ζε(s))| ≤ C lim_{ε→0} ε^{1−2δ} = 0.

Part B. Similarly, by Lemmas 4.3 and 4.4, using the technique of Part A and condition B2, we can derive

    lim_{ε→0} E sup_{s∈[0,t]} (1/ε^δ) |Aε02(ξε(s), ζε(s))| ≤ C lim_{ε→0} ε^{1−2δ} = 0,

where |Aε02(ξε(t), ζε(t))| := |f̈ε(ζε(t)) − f̈ε(ξε(t))|.

Part C. From Parts A and B and by the estimates of Lemma 4.3, we arrive at

    lim_{ε→0} E sup_{t∈[0,T]} |Aε0(ξε, ηε, t)| ≤ C lim_{ε→0} (1/ε^δ) E sup_{t∈[0,T]} |Aε01(ξε(s), ζε(s)) + Aε02(ξε(s), ζε(s))| ≤ C lim_{ε→0} ε^{1−2δ} = 0.

Lemma 4.6. Let Conditions (A), (B), and (C0) hold; the processes ξε(t), ηε(t) are the solutions of (2.1), (2.2), respectively. Then, for every δ ∈ ]0, 1/2[,

    lim_{ε→0} E sup_{t∈[0,T]} |εψε(ξε(t), ηε(t)) + ε^{2−δ} nε(ξε(t), ηε(t))|² = 0.

Proof. From the definitions and properties of the functions ψ(x, y) ((2.5)) and n(x, y),

    lim_{ε→0} E sup_{t∈[0,T]} |εψε(ξε(t), ηε(t)) + ε^{2−δ} nε(ξε(t), ηε(t))|² ≤ lim_{ε→0} ε^{2(1−δ)} C = 0.

Lemma 4.7. Let Conditions (A) and (B) be satisfied; the processes ξε(t), ηε(t) are the solutions of (2.1), (2.2), respectively, and ζε(t) is defined by (3.1). For every positive integer m, every δ ∈ ]0, 1/2[, and every ε0 > 0, there exist constants Cm(ε0) such that for every ε < ε0,

    E sup_{t∈[0,T]} |ξε(t)|^m ≤ Cm(ε0, λ2)(1 + Cm(ε0, λ2, ξ0, η0)).


Proof. We fix an arbitrary ε0 > 0 and, for ε < ε0, apply Ito's formula to the function ε^δ hε(ζε(t)) + κ0 ε^{2−δ} nε(ξε(t), ηε(t)). Using Lemmas 4.3 and 4.5, under Condition (B), in a standard way, as in ([9], Ch. II, Sec. 5, Corollary 12, p. 86), we get the estimates

    E sup_{t∈[0,T]} |ε^δ hε(ζε(t)) + κ0 ε^{2−δ} nε(ξε(t), ηε(t))|^m ≤ Cm(1 + Cε)(1 + |ε^δ hε(ζ0) + κ0 ε^{2−δ} nε(ξ0, η0)|^m) ≤ Cm(ε0)(1 + |ε^δ hε(ζ0) + κ0 ε^{2−δ} nε(ξ0, η0)|^m);

here lim_{ε→0} Cε = 0 and ζ0 = ξ0 + εψε(ξ0, η0). It follows from part 6 of Lemma 4.3 that |ε^δ h⁻¹ε(x)| ≤ exp{2λ2}|x|. Let us estimate E sup_{t∈[0,T]} |ξε(t)|^m. We obtain

    E sup_{t∈[0,T]} |ξε(t)|^m = E sup_{t∈[0,T]} |ξε(t) + εψε(ξε(t), ηε(t)) − εψε(ξε(t), ηε(t))|^m
        = E sup_{t∈[0,T]} |ε^δ h⁻¹ε(ε^δ hε(ζε(t))) + κ0 ε^{2−δ} nε(ξε(t), ηε(t)) − εψε(ξε(t), ηε(t)) − κ0 ε^{2−δ} nε(ξε(t), ηε(t))|^m.

Combining the results of Lemma 4.3 (part 6) and Lemma 4.6, there is an ε0 such that, for every ε ≤ ε0, we can find constants Cm(ε0) and Cm(ε0, ξ0, η0) so that

    E sup_{t∈[0,T]} |ξε(t)|^m ≤ exp{2λ2}(Cm(ε0) + Cm(ε0, ξ0, η0)).

From the last inequality the statement of the lemma follows.

Corollary 4.8. Under the assumptions of the previous lemma, there exist constants C′m(ε0, λ2) and C′′m(ε0, ξ0, η0) such that for every ε < ε0,

    E sup_{t∈[0,T]} |ζε(t)|^m ≤ C′m(ε0, λ2)(1 + C′′m(ε0, λ2, ξ0, η0)).

Proof. This follows immediately from the proof of the previous lemma.

Lemma 4.9. Let Conditions (A), (B), and (C0) hold; the process ζε(t) is defined by (3.1). Then for every δ ∈ ]0, 1/2[,

    lim_{ε→0} E sup_{t∈[0,T]} |ε^δ fε(ζε(t))|² = 0.

Proof. For every ε > 0 and N such that ε^{δ/2} < N < ∞,

    E sup_{t∈[0,T]} |ε^δ fε(ζε(t))|² = E sup_{t∈[0,T]} |ε^δ fε(ζε(t))|² χ{|ζε(t)| < ε^{δ/2}}
        + E sup_{t∈[0,T]} |ε^δ fε(ζε(t))|² χ{ε^{δ/2} ≤ |ζε(t)| ≤ N}
        + E sup_{t∈[0,T]} |ε^δ fε(ζε(t))|² χ{|ζε(t)| > N} =: Dε11 + Dε12 + Dε13,

55

respectively. By part 3. of Lemma 4.3 we obtain ε = 0. lim D11

ε→0

Using the definition of the function f (x) and the condition (C0), we have lim

ε→0

ε D12

2 δ  ζε (t) ε εδ δ = lim E sup F (z)dz − ζε (t) χ{ε 2 ≤ |ζε (t)| ≤ N } = 0. ε→0 t∈[0,T ] κ0 0

Now,  ε D13

≤ε C 2δ

E supt∈[0,T ] |ζε (t)|4 1+ ε4δ

 12

E supt∈[0,T ] |ζε (t)|2 . N2

At first approaching the limit as ε → 0, then letting N → ∞ and using the estimation of Corollary 4.8, the statement of lemma follows. 2 Lemma 4.10. Let the functions H(x) ∈ Cx,b (E1 ) and the function k(x, y) ∈ 2,2 ¯ Cx,y,b (E1 , En ) satisfied the condition k(x) = 0. The processes ξε (t), ηε (t) are the solutions of (2.1), (2.2) respectively, ζε (t) is defined by (3.1), λε (t) is defined by (3.2). Then, for a function P (x) ∈ C0∞ (E1 ) and a continuous bounded Cs — measurable functional Φs (x), we have  t

P (λε (s))Hε (ζε (s))kε (ξε (s), ηε (s))ds = 0. lim EΦr (ξε ) ε→0

r

Proof. Let the function l(x, y) be the unique solution of the problem  l(x, y)dy = 0 Ll(x, y) = k(x, y), Y 2,2 for any x ∈ E1 (x play the role of parameter). Then l(x, y) ∈ Cx,y,b (E1 , En ). Applying Ito’s formula to the function 

ξε (s) 2 ε P (λε (s))Hε (ζε (s))l , ηε (s) , εδ

we get 

t

P (λε (s))Hε (ζε (s))kε (ξε (s), ηε (s))ds = G(ε, k, r, t), r

where the function G(ε, k, r, t) depends on the processes ξε (t), ζε (t), λε (t) and ηε (t) Using the estimates of Lemma 4.7, under our conditions, by the standard arguments we arrive at 

3  √ |ξε (t)|k = 0. lim EΦr (ξε )G(ε, k, r, t) = 0 ≤ C lim εEΦr (ξε ) 1 + sup ε→0

ε→0

t∈[0;T ] k=1

Corollary 4.11. The statement of the previous Lemma is also true for H(x) ∈ ˙ ¨ + |H(x)| ≤ C(1 + |x|). Cx2 (E1 ) such that |H(x)| + |H(x)|

56

Yuriy V. Kolomiets

Proof. The proof follows immediately from the Corollary 4.8 and the proof of the previous Lemma 4.10. Lemma 4.12. Let the Conditions (A) and (B) be satisfied. The processes ξε (t), ηε (t) are the solutions of (2.1), (2.2) respectively. Then, for every δ∈]0, 12 [, lim E sup | mε (ξε (t), ·) − mε (ζε (t), ·) |2 = 0.

ε→0

t∈[0,T ]

and lim E sup | αε (ξε (t), ·) − αε (ζε (t), ·) |2 = 0.

(4.1)

ε→0

t∈[0,T ]

Proof. According to the conditions, we have lim E sup | mε (ξε (t), ·) − mε (ζε (t), ·) |2

ε→0

t∈[0,T ]

% 

& 2 ∂ ξε (t) |ψε (ξε (t), ηε (t))|2 = 0. m ≤ lim ε2(1−2δ) E sup sup , · δ ε→0 ∂x ε E t∈[0,T ] 1 Similarly, we can prove (4.1). Lemma 4.13. Let the Conditions (A) and (B) hold. The processes ζε (t) is defined by (3.1), λε (t) is defined by (3.2), and let lε (x) be such a function that E sup |lε (ζε (t))|2 ≤ C,

(4.2)

t∈[0,T ]

and for every r, t : 0 ≤ r ≤ t ≤ T  t lim E lε (ζε (s))ds = 0.

(4.3)

ε→0

r

Then, for every φ(x) ∈ C0∞ (E1 ) and 0 ≤ r ≤ t ≤ T ,  t lim E φ(λε (s))lε (ζε (s))ds = 0. ε→0

r

Proof. Let {ti } be some partition of interval [r, t] : r ≤ t1 ≤ t2 ≤ ... ≤ tn = t such that |ti+1 − ti | ≤ ηi . Then  t n−1   φ(λε (s))lε (ζε (s))ds ≤ E E r

i=1

(4.4)

+

ti

n−1  i=1

 φ(λε (s)) − φ(λε (ti )) lε (ζε (s))ds

ti+1 

 E φ(λε (ti ))

ti+1

ti

lε (ζε (s))ds .

Let us denote first term in the right hand side (4.4) by L(ε, n), then, using (4.2) and the estimation of E|λε (t) − λε (s)|4 , there exist a constant C0 such that for every 0 < ε < ε0 L(ε, n) ≤

n−1   ti+1 i=1

ti

4 E φ(λε (s)) − φ(λε (ti )) ds

14 

ti+1

ti

4 E lε (ζε (s)) 3 ds

34

Fractional Stability of Diffusion Approximation for Random Differential Equations

≤ C0

n−1   ti+1

(s − ti )2 ds

14

3

(ti+1 − ti ) 4



√ ≤ C0 (t − r) max ηi .

ti

i=1

57

i

Consequently, L(ε, n) can be made arbitrarily small by making the partition of the interval [r, t] fine enough. By (4.3), the second term in the right hand side of (4.4) tends to 0 as ε → 0 . The lemma is proved. Lemma 4.14. Let us define two functions  x  z −1 κ0 F (y)m(y) − β0 γ(x) = 2 dydz, F (z) F (y)α(y) 0 0  x  z −2 2 κ0 F (y)α(y) − σ02 γ 1 (x) = 2 dydz. F (z) F (y)α(y) 0 0 Let the conditions (A), (B), and (C) be fulfilled. The process ζε (t) is defined by (3.1). Then, for every δ ∈ ]0; 12 [, lim E sup |ε2δ γε (ζε (t))| = lim E sup |εδ γ˙ ε (ζε (t))|2 = 0,

ε→0

(4.5)

lim E sup

ε→0

ε→0

t∈[0,T ]

|ε2δ γε1 (ζε (t))|

t∈[0,T ]

= lim E sup |εδ γ˙ ε1 (ζε (t))|2 = 0. ε→0

t∈[0,T ]

t∈[0,T ]

δ

Proof. For every N : ε 2 ≤ N < ∞,

δ E sup |ε2δ γε (ζε (t))| = E sup |ε2δ γε (ζε (t))| χ{|ζε (t)| < ε 2 } t∈[0,T ]

t∈[0,T ]



δ

+ χ{ε 2 ≤ |ζε (t)| ≤ N } + χ{|ζε (t)| > N } = γε(1) + γε(2) + γε(3) respectively. According to Lemma 4.3, under the Conditions (A), (B), we have 4 i  d ≤ C(1 + |x|). γ(x) |γ(x)| ≤ C(1 + |x|2 ), dxi i=1 Under the Condition (C) and by part 1. of Lemma 4.3, we obtain lim

sup

ε→0 0 0 and every f in the domain of A, φt is the unique measure-valued solution of the unnormalized filtering equation:  (3.3)

φ(f, t) = φ(f, 0) +

t



nj m   φ Af − f (λj,k − 1), s ds

0

+

nj  t m   j=1 k=1

0

j=1 k=1

φ((λj,k − 1)f, s−)dYj,k (s)


Laurie C. Scott and Yong Zeng

and πt is the unique measure-valued solution of the normalized filtering equation:

(3.4)    π(f, t) = π(f, 0) + ∫₀ᵗ π(Af, s) ds − Σ_{j=1}^m ∫₀ᵗ ( π(f aj, s) − π(f, s)π(aj, s) ) ds + Σ_{j=1}^m Σ_{k=1}^{nj} ∫₀ᵗ [ π(f gj,k, s−)/π(gj,k, s−) − π(f, s−) ] dYj,k(s),

where aj = aj(θ(t), X̃(t), t) is the trading intensity and gj,k = gj,k(yj,k | xj) is the transition probability from xj = Xj(t) to yj,k, the kth price level of the jth stock. In the special case that aj(θ(t), X̃(t), t) = aj(t), the filtering equation for πt reduces to

(3.5)    π(f, t) = π(f, 0) + ∫₀ᵗ π(Af, s) ds + Σ_{j=1}^m Σ_{k=1}^{nj} ∫₀ᵗ [ π(f gj,k, s−)/π(gj,k, s−) − π(f, s−) ] dYj,k(s).
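Equation (3.5) implies a simple recursion: propagate πt by the generator between trades, and at each trade apply a Bayes-type update π(f) ← π(f gj,k)/π(gj,k). A minimal finite-state sketch of that recursion for a single asset (the generator Q and the observation kernel g below are illustrative assumptions, not values from the model):

```python
def propagate(p, Q, dt):
    # between trades: d pi(f)/dt = pi(Af), i.e. p <- p + Q^T p dt (Euler step)
    n = len(p)
    return [p[i] + dt * sum(Q[j][i] * p[j] for j in range(n)) for i in range(n)]

def update(p, g_col):
    # at a trade observed at price level k: pi(f) <- pi(f g_k) / pi(g_k)
    w = [pi * gk for pi, gk in zip(p, g_col)]
    s = sum(w)
    return [x / s for x in w]

# two hidden states; illustrative generator and kernel g[k][i] = g(y_k | x_i)
Q = [[-1.0, 1.0], [1.0, -1.0]]
g = {0: [0.8, 0.3], 1: [0.2, 0.7]}

p = [0.5, 0.5]
for _ in range(100):          # propagate over an inter-trade gap
    p = propagate(p, Q, 0.001)
p = update(p, g[1])           # trade observed at price level 1
print(p)
```

Note the propagation step conserves total mass because the rows of the generator Q sum to zero.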

4. Bayesian Model Selection via Filtering

With irregular likelihood functions, traditional parameter estimation and model selection techniques can be prohibitively difficult to use. However, as Kass and Raftery [8] demonstrated, critical evaluation is still possible through Bayesian inference: while the overall "fit" of a model to the data may be difficult to calculate, it can be determined which of two models is the better "fit." The calculated Bayes factors allow the researcher to estimate the pairwise relative fit of models for the same data and to select the best of the available models.

Bayes factors are used to select the better of two models using the integrated (or marginal) likelihood functions of each model given the observed data. The Bayes factor for Model 1 over Model 2 is the ratio of the two integrated likelihood functions. Kass and Raftery [8] suggested the following rules for the interpretation of a Bayes factor. A Bayes factor with a value between 1 and 3 is considered insignificant, i.e., Model 1 is not clearly "better" than Model 2. A value between 3 and 12 indicates that Model 1 is a somewhat better fit for the data than Model 2. From a value between 12 and 150, it can be inferred that Model 1 is an overall better fit. A Bayes factor with a value greater than 150 implies that Model 1 is decisively better. Naturally, if the Bayes factor is less than 1, we simply consider the Bayes factor of Model 2 over Model 1 and interpret the result accordingly.

4.1. Evolution of the Bayes Factors

In our efforts to develop the Bayes factors for the multivariate FM models, we begin with notational adjustments for simplicity in working with two models. Since the Bayes factors are used to compare the fit of two models given the observed prices, we will let c = 1, 2 throughout this section. We denote Model-c as (θ(c), X̃(c), Y) and its joint likelihood, given as before by (3.1), by L(c)(t), and make the following definitions.

Definition 4.1. Let φc(fc, t) = E^Q[fc(θ(c)(t), X̃(c)(t)) L(c)(t) | F_t^Y].
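The Kass–Raftery interpretation scale quoted above can be encoded in a small helper; the function and its return strings are our own sketch, not from [8]:

```python
def interpret_bayes_factor(b12):
    """Kass-Raftery style reading of B_{1,2}; values below 1 mean
    'invert and compare Model 2 over Model 1'."""
    if b12 < 1:
        return interpret_bayes_factor(1.0 / b12).replace("Model 1", "Model 2")
    if b12 < 3:
        return "insignificant: Model 1 not clearly better"
    if b12 < 12:
        return "Model 1 somewhat better"
    if b12 < 150:
        return "Model 1 an overall better fit"
    return "Model 1 decisively better"

print(interpret_bayes_factor(5.0))    # Model 1 somewhat better
print(interpret_bayes_factor(0.001))  # Model 2 decisively better
```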

Multivariate Micromovement Models and Model Selection via Filtering


Then the integrated likelihood of Y is φc(1, t) for Model c with proper priors.

Definition 4.2. Denote by π_t^{(c)} the conditional distribution of (θ(c), X̃(c)) given F_t^Y. Let πc(fc, t) = E^P[fc(θ(c)(t), X̃(c)(t)) | F_t^Y].

Next, we define the filter ratio processes.

Definition 4.3. Let the filter ratio processes q1 and q2 be

    q1(f1, t) = φ1(f1, t)/φ2(1, t),    q2(f2, t) = φ2(f2, t)/φ1(1, t).

Remark 4.1. The Bayes factors can then be written as B1,2(t) = q1(1, t) and B2,1(t) = q2(1, t).

The evolution of the filter ratio processes is characterized by Theorem 4.1.

Theorem 4.1. Suppose Model-c (c = 1, 2) has generator A^{(c)} for (θ(c), X̃(c)), trading intensity aj^{(c)} = aj^{(c)}(θ(c)(t), X̃(c)(t), t) for the jth asset, and transition probability gj,k^{(c)} = gj,k^{(c)}(yj,k | xj) from xj to yj,k for the random transformation F^{(c)}. If Model-c satisfies Assumptions 2.1 to 2.5, then (q_t^{(1)}, q_t^{(2)}) is the unique measure-valued pair solution of the system of SDEs

    q1(f1, t) = q1(f1, 0)

"  t! (2) m   q1 (f1 , s)q2 (aj , s) (1) (1) + (4.1) ds q1 (A f1 , s) − q1 (aj f1 , s) − q2 (1, s) 0 j=1 +

nj  t ! (1) (1) m   q1 (aj f1 gj,k , s−) j=1 k=1

(2) (2)

q2 (aj gj,k , s−)

0

" q2 (1, s−) − q1 (f1 , s−) dYj,k (s)

for all t > 0 and f1 ∈ D(A^{(1)}), and for all t > 0 and f2 ∈ D(A^{(2)}),

    q2(f2, t) = q2(f2, 0)

"  t! (1) m   q2 (f2 , s)q1 (aj , s) (2) (2) + (4.2) ds q2 (A f2 , s) − q2 (aj f2 , s) − q1 (1, s) 0 j=1 +

nj  t ! (2) (2) m   q2 (aj f2 gj,k , s−) j=1 k=1

(1) (1)

q1 (aj gj,k , s−)

0

" q1 (1, s−) − q2 (f2 , s−) dYj,k (s).

In the special case that the intensity of the trading times of the jth asset depends only on t, namely aj^{(c)}(θ(c)(t), X̃(c)(t), t) = a(t), the preceding equations reduce to

(4.3)    q1(f1, t) = q1(f1, 0) + ∫₀ᵗ q1(A^{(1)}f1, s) ds + Σ_{j=1}^m Σ_{k=1}^{nj} ∫₀ᵗ [ (q1(f1 gj,k^{(1)}, s−)/q2(gj,k^{(2)}, s−)) q2(1, s−) − q1(f1, s−) ] dYj,k(s)


and

(4.4)    q2(f2, t) = q2(f2, 0) + ∫₀ᵗ q2(A^{(2)}f2, s) ds + Σ_{j=1}^m Σ_{k=1}^{nj} ∫₀ᵗ [ (q2(f2 gj,k^{(2)}, s−)/q1(gj,k^{(1)}, s−)) q1(1, s−) − q2(f2, s−) ] dYj,k(s).

Proof. There are two steps.

Step 1: Derivation of the evolution equations for q1(f1, t) and q2(f2, t). We will show that q1(f1, t) satisfies (4.1) and that, when aj^{(c)}(θ(c)(t), X̃(c)(t), t) = aj(t), (4.1) reduces to (4.3). Then, by symmetry, q2(f2, t) satisfies (4.2) and, in the special case, (4.4). Recall that φc(fc, t) satisfies (3.3). Applying Ito's formula for semimartingales ([12]) and simplifying gives

(4.5)    φ1(f1, t)/φ2(1, t) = φ1(f1, 0)/φ2(1, 0) + ∫₀ᵗ (φ1(A^{(1)}f1, s)/φ2(1, s)) ds − Σ_{j=1}^m ∫₀ᵗ [ φ1(aj^{(1)}f1, s)/φ2(1, s) − φ1(f1, s)φ2(aj^{(2)}, s)/φ2²(1, s) ] ds + Σ_{j=1}^m Σ_{k=1}^{nj} ∫₀ᵗ [ φ1(f1, s)/φ2(1, s) − φ1(f1, s−)/φ2(1, s−) ] dYj,k(s).

To transform this equation to the desired form, we make two observations. First, (2)

(2)

φ1 (f1 , s)φ2 (aj , s) = φ22 (1, s)

φ1 (f1 ,s) φ2 (aj ,s) φ2 (1,s) φ1 (1,s) φ2 (1,s) φ1 (1,s)

(2)

q1 (f1 , s)q2 (aj , s) . = q2 (1, s)

Next, again using (3.3) and assuming that the kth trade of the jth stock occurs at time s (1) (1)

(4.6)

φ1 (f1 , s−) + φ1 ((aj gj,k − 1)f1 , s−) φ1 (f1 , s) = (2) (2) φ2 (1, s) φ2 (1, s−) + φ2 (aj gj,k − 1, s−) (1) (1)

=

φ1 (aj gj,k f1 , s−) (2) (2)

(1)

=

φ2 (aj gj,k , s−)

(1)

q1 (aj f1 gj,k , s−) (2) (2)

q2 (aj gj,k , s−)

q2 (1, s−).

Substituting these results into the definition of the filter ratio process gives (4.1). If $a_j^{(c)}(\theta^{(c)}(t), \vec X^{(c)}(t), t) = a_j(t)$ for $c = 1,2$, then
$$
\frac{q_1(f_1,s)\, q_2(a_j,s)}{q_2(1,s)} = \frac{q_1(f_1,s)\, a_j\, q_2(1,s)}{q_2(1,s)} = q_1(a_j f_1, s)
$$
and
$$
\frac{q_1(a_j f_1\, g_{j,k}^{(1)}, s-)}{q_2(a_j\, g_{j,k}^{(2)}, s-)}\, q_2(1,s-) = \frac{q_1(f_1\, g_{j,k}^{(1)}, s-)}{q_2(g_{j,k}^{(2)}, s-)}\, q_2(1,s-),
$$
giving (4.3).

Multivariate Micromovement Models and Model Selection via Filtering

Step 2: Uniqueness of $q_1(f_1,t)$ and $q_2(f_2,t)$.

It remains only to show the uniqueness of the filter ratio processes, which follows closely the perturbation arguments for SPDEs such as those given in Kouritzin and Zeng [9]. Toward this end, let $T^{(1)}, T^{(2)}$ be the semigroups with weak generators $A^{(1)}, A^{(2)}$. Define $\{\tau_i\}_{i=0}^{\infty}$ as the jump times associated with $Y$, where $\tau_0 = 0$; each $\tau_i$ represents a trading time for one of the $j = 1,\dots,m$ assets. For $c = 1,2$ and $l = 3-c$, let $(q_1,q_2)$ be a finite-measure-valued process satisfying
$$
q_c(f_c,t) = q_c(f_c,\tau_i) + \int_{\tau_i}^t q_c(A^{(c)}f_c,s)\,ds - \sum_{j=1}^m \int_{\tau_i}^t \Big[ q_c(a_j^{(c)}f_c,s) - \frac{q_c(f_c,s)\, q_l(a_j^{(l)},s)}{q_l(1,s)} \Big]\,ds
\tag{4.7}
$$
for all $t \in [\tau_i, \tau_{i+1})$ and $f_c \in D(A^{(c)})$. Hence, by Assumption 2.3 and using (4.7) for $t \in [\tau_i, \tau_{i+1})$, we have
$$
\exp(-C(t-\tau_i))\, q_c(1,\tau_i) \le q_c(1,t) \le \exp(C(t-\tau_i))\, q_c(1,\tau_i).
\tag{4.8}
$$

Now, we apply a standard SPDE technique and define a convolution form $\chi_c$, for $c = 1,2$ and $l = 3-c$, by
$$
\chi_c(t,u,f_c) = q_c(T^{(c)}_{t-u}f_c,\, u) + \sum_{j=1}^m \int_u^t \Big[ \frac{q_c(T^{(c)}_{t-s}f_c,s)\, q_l(a_j^{(l)},s)}{q_l(1,s)} - q_c(a_j^{(c)} T^{(c)}_{t-s}f_c,\, s) \Big]\, ds
$$
for all $u \le t \in [\tau_i, \tau_{i+1})$ and $f_c \in D(A^{(c)})$. Then, using the fact that $T^{(c)}_s f_c \in D(A^{(c)})$ for all $s \ge 0$ and applying Leibniz's rule,
$$
\frac{d}{du}\chi_c(t,u,f_c) = \frac{d}{du}\, q_c(T^{(c)}_{t-u}f_c,\, u) - \sum_{j=1}^m \Big[ \frac{q_c(T^{(c)}_{t-u}f_c,u)\, q_l(a_j^{(l)},u)}{q_l(1,u)} - q_c(a_j^{(c)} T^{(c)}_{t-u}f_c,\, u) \Big].
$$
Observe that
$$
\frac{d}{du}\, q_c(T^{(c)}_{t-u}f_c,\, u) = -q_c(A^{(c)} T^{(c)}_{t-u}f_c,\, u) + q_c(A^{(c)} T^{(c)}_{t-u}f_c,\, u) + \sum_{j=1}^m \Big[ \frac{q_c(T^{(c)}_{t-u}f_c,u)\, q_l(a_j^{(l)},u)}{q_l(1,u)} - q_c(a_j^{(c)} T^{(c)}_{t-u}f_c,\, u) \Big].
\tag{4.9}
$$
Therefore $\frac{d}{du}\chi_c(t,u,f_c) = 0$ for $u \in [\tau_i, t]$. This implies $\chi_c(t,t,f_c) = \chi_c(t,\tau_i,f_c)$, which produces
$$
q_c(f_c,t) = q_c(T^{(c)}_{t-\tau_i}f_c,\, \tau_i) + \sum_{j=1}^m \int_{\tau_i}^t \Big[ \frac{q_c(T^{(c)}_{t-s}f_c,s)\, q_l(a_j^{(l)},s)}{q_l(1,s)} - q_c(a_j^{(c)} T^{(c)}_{t-s}f_c,\, s) \Big]\, ds.
\tag{4.10}
$$

Now suppose that $(r_1,r_2)$ is a second process satisfying (4.7) such that $(q_1(\cdot,\tau_i), q_2(\cdot,\tau_i)) = (r_1(\cdot,\tau_i), r_2(\cdot,\tau_i))$, i.e., the processes agree at the trading times. Then, using (4.10) for both pairs, we have for all $t \in [\tau_i, \tau_{i+1})$ and $f_c \in D(A^{(c)})$, $c = 1,2$,
$$
\begin{aligned}
&|r_1(f_1,t)-q_1(f_1,t)| + |r_2(f_2,t)-q_2(f_2,t)| \\
&\quad\le \int_{\tau_i}^t \sum_{j=1}^m \bigg| \frac{r_1(T^{(1)}_{t-s}f_1,s)\, r_2(a_j^{(2)},s)}{r_2(1,s)} - \frac{q_1(T^{(1)}_{t-s}f_1,s)\, q_2(a_j^{(2)},s)}{q_2(1,s)} \bigg|\, ds \\
&\qquad+ \int_{\tau_i}^t \sum_{j=1}^m \bigg| \frac{r_2(T^{(2)}_{t-s}f_2,s)\, r_1(a_j^{(1)},s)}{r_1(1,s)} - \frac{q_2(T^{(2)}_{t-s}f_2,s)\, q_1(a_j^{(1)},s)}{q_1(1,s)} \bigg|\, ds \\
&\qquad+ \int_{\tau_i}^t \sum_{j=1}^m \big| r_1(a_j^{(1)} T^{(1)}_{t-s}f_1,s) - q_1(a_j^{(1)} T^{(1)}_{t-s}f_1,s) \big|\, ds
+ \int_{\tau_i}^t \sum_{j=1}^m \big| r_2(a_j^{(2)} T^{(2)}_{t-s}f_2,s) - q_2(a_j^{(2)} T^{(2)}_{t-s}f_2,s) \big|\, ds.
\end{aligned}
$$

However, Assumption 2.3 and equation (4.8) together imply
$$
\begin{aligned}
&\bigg| \frac{r_1(T^{(1)}_{t-s}f_1,s)\, r_2(a_j^{(2)},s)}{r_2(1,s)} - \frac{q_1(T^{(1)}_{t-s}f_1,s)\, q_2(a_j^{(2)},s)}{q_2(1,s)} \bigg|
+ \bigg| \frac{r_2(T^{(2)}_{t-s}f_2,s)\, r_1(a_j^{(1)},s)}{r_1(1,s)} - \frac{q_2(T^{(2)}_{t-s}f_2,s)\, q_1(a_j^{(1)},s)}{q_1(1,s)} \bigg| \\
&\qquad+ \big| r_1(a_j^{(1)} T^{(1)}_{t-s}f_1,s) - q_1(a_j^{(1)} T^{(1)}_{t-s}f_1,s) \big|
+ \big| r_2(a_j^{(2)} T^{(2)}_{t-s}f_2,s) - q_2(a_j^{(2)} T^{(2)}_{t-s}f_2,s) \big| \\
&\quad\le 2C \sup_{f_1\in C(\bar E),\, \|f_1\|_\infty\le 1} |r_1(f_1,s)-q_1(f_1,s)| + 2C \sup_{f_2\in C(\bar E),\, \|f_2\|_\infty\le 1} |r_2(f_2,s)-q_2(f_2,s)|,
\end{aligned}
\tag{4.11}
$$
where $s \in [\tau_i, \tau_{i+1})$ and $f_c \in D(A^{(c)})$ with $\|f_c\|_\infty \le 1$. Now, using (4.8) and the compact containment condition, there exist increasing compact sets $K_n^{(c)}$, $c = 1,2$, such that $r_c(K_n^{(c)},t) \wedge q_c(K_n^{(c)},t) \ge 1 - \frac1n$ for all $t \in [\tau_i, \tau_{i+1})$. Then Assumption 2.3, (4.11), and the Stone–Weierstrass theorem imply
$$
\begin{aligned}
&\sup_{f_1\in C(\bar E),\, \|f_1\|_\infty\le 1} \bigg| \int_{K_n^{(1)}} f_1\, [dr_t^{(1)} - dq_t^{(1)}] \bigg| + \sup_{f_2\in C(\bar E),\, \|f_2\|_\infty\le 1} \bigg| \int_{K_n^{(2)}} f_2\, [dr_t^{(2)} - dq_t^{(2)}] \bigg| \\
&\quad\le \frac{2}{n} + \sup_{f_1\in D(A^{(1)}),\, \|f_1\|_\infty\le 1} |r_1(f_1,t)-q_1(f_1,t)| + \sup_{f_2\in D(A^{(2)}),\, \|f_2\|_\infty\le 1} |r_2(f_2,t)-q_2(f_2,t)| \\
&\quad\le \frac{2}{n} + \frac{8C}{n}(t-\tau_i) + 4C \int_{\tau_i}^t \bigg[\, \sup_{f_1\in C(\bar E),\, \|f_1\|_\infty\le 1} \bigg| \int_{K_n^{(1)}} f_1\, [dr_s^{(1)} - dq_s^{(1)}] \bigg| + \sup_{f_2\in C(\bar E),\, \|f_2\|_\infty\le 1} \bigg| \int_{K_n^{(2)}} f_2\, [dr_s^{(2)} - dq_s^{(2)}] \bigg| \bigg]\, ds.
\end{aligned}
$$
Finally, we apply Gronwall's inequality ([5]), let $n \to \infty$, and obtain
$$
\sup_{f_1\in C(\bar E),\, \|f_1\|_\infty\le 1} \bigg| \int_E f_1\, [dr_t^{(1)} - dq_t^{(1)}] \bigg| + \sup_{f_2\in C(\bar E),\, \|f_2\|_\infty\le 1} \bigg| \int_E f_2\, [dr_t^{(2)} - dq_t^{(2)}] \bigg| = 0.
$$

Hence we have uniqueness for $t \in [\tau_0, \tau_1)$, and the updating equation implies the same holds at $t = \tau_1$. By induction, we obtain uniqueness on $[0,\infty)$, and the theorem follows.

5. A Convergence Theorem and a Numerical Scheme

Theorem 4.1 characterizes the evolution of the Bayes factors. To compute the Bayes factors, one constructs an algorithm to approximate $q_c(f_c,t)$, where $q_1(1,t) = B_{12}(t)$. The algorithm, based on the evolution SDEs, is naturally recursive, handling one datum at a time. Thus, the algorithm makes real-time updates and can handle large data sets. One basic requirement for the recursive algorithm is consistency: the approximate $q_c$, computed by the recursive algorithm, must converge to the true one. The following theorem summarizes the related convergence results and provides the theoretical foundation for consistency. Furthermore, the theorem furnishes a recipe for constructing consistent recursive algorithms.

Let $c = 1,2$ throughout this section. First, denote by $(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x})$ an approximation of $(\theta^{(c)}, \vec X^{(c)})$. Further, let $(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x}) \Rightarrow (\theta^{(c)}, \vec X^{(c)})$ denote weak convergence in the Skorohod topology as $(\epsilon, \epsilon_x) \to 0$. Define for each $j = 1,2,\dots,m$
$$
\vec Y_j^{\,\varepsilon,c}(t) = \begin{pmatrix}
N^{(c)}_{j,1}\big( \int_0^t \lambda^{(c)}_{j,1}(\theta^{(c)}_{\epsilon}(s), \vec X^{(c)}_{\epsilon_x}(s), s)\, ds \big) \\
N^{(c)}_{j,2}\big( \int_0^t \lambda^{(c)}_{j,2}(\theta^{(c)}_{\epsilon}(s), \vec X^{(c)}_{\epsilon_x}(s), s)\, ds \big) \\
\vdots \\
N^{(c)}_{j,n_j}\big( \int_0^t \lambda^{(c)}_{j,n_j}(\theta^{(c)}_{\epsilon}(s), \vec X^{(c)}_{\epsilon_x}(s), s)\, ds \big)
\end{pmatrix},
\tag{5.1}
$$
where $\varepsilon = \max(|\epsilon_x|, |\epsilon|)$ and $|\cdot|$ is the Euclidean norm of a vector. Analogously to the continuous case, we define the collection of counting processes for approximate model $c$ as
$$
\vec Y_\varepsilon^{(c)}(t) = (\vec Y_1^{\,\varepsilon,c}(t), \vec Y_2^{\,\varepsilon,c}(t), \dots, \vec Y_m^{\,\varepsilon,c}(t)).
\tag{5.2}
$$
Let $\mathcal F_t^{Y_\varepsilon^{(c)}} = \sigma(\vec Y_\varepsilon^{(c)}(s),\, 0 \le s \le t)$.

Now, let $(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x}, \vec Y_\varepsilon^{(c)})$ be defined on $(\Omega_\varepsilon^{(c)}, \mathcal F_\varepsilon^{(c)}, P_\varepsilon^{(c)})$ with Assumptions 2.1 to 2.5. Assumptions 2.1 to 2.3 imply the existence of a reference measure $Q_\varepsilon^{(c)}$ having similar properties. The corresponding Radon–Nikodym derivative $dP_\varepsilon/dQ_\varepsilon$ is
$$
L_\varepsilon(t) = \exp\bigg( \sum_{j=1}^m \sum_{k=1}^{n_j} \bigg[ \int_0^t \log\big( \lambda_{j,k}(\theta_{\epsilon}(s-), \vec X_{\epsilon_x}(s-), s-) \big)\, dY_{j,k}(s) - \int_0^t \big[ \lambda_{j,k}(\theta_{\epsilon}(s-), \vec X_{\epsilon_x}(s-), s-) - 1 \big]\, ds \bigg] \bigg).
\tag{5.3}
$$

Given this reference measure and the joint likelihood function, we can similarly define, for $c = 1,2$ and $l = 3-c$,
$$
\phi_{\varepsilon,c}(f_c,t) = E^{Q_\varepsilon^{(c)}}\Big[ f_c(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x})\, L_\varepsilon^{(c)}(t) \,\Big|\, \mathcal F_t^{Y_\varepsilon^{(c)}} \Big],
\qquad
\pi_{\varepsilon,c}(f_c,t) = E^{P_\varepsilon^{(c)}}\Big[ f_c(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x}) \,\Big|\, \mathcal F_t^{Y_\varepsilon^{(c)}} \Big],
$$
and
$$
q_{\varepsilon,c}(f_c,t) = \frac{\phi_{\varepsilon,c}(f_c,t)}{\phi_{\varepsilon,l}(1,t)}.
$$
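The log of the likelihood (5.3) can be evaluated by quadrature once the intensities are computable along the path. A minimal sketch for a single counting process with a deterministic intensity (a toy stand-in for the model's state-dependent $\lambda_{j,k}$), checked against the closed form $N\log\lambda - (\lambda-1)T$ available when the intensity is constant:

```python
import numpy as np

def log_likelihood(event_times, intensity, T, n_grid=100_001):
    """log of (5.3) for one counting process: sum of log-intensities at the
    event times minus the integral of (intensity - 1) over [0, T]."""
    jump_part = np.sum(np.log(intensity(np.asarray(event_times, dtype=float))))
    s = np.linspace(0.0, T, n_grid)
    vals = intensity(s) - 1.0
    dt = s[1] - s[0]
    integral_part = dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
    return jump_part - integral_part

lam = 2.5
events = [0.4, 1.1, 3.7, 4.2]
T = 5.0
val = log_likelihood(events, lambda t: np.full_like(t, lam), T)
closed_form = len(events) * np.log(lam) - (lam - 1.0) * T
print(val, closed_form)  # agree: the trapezoid rule is exact for constants
```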

We have the following theorem.

Theorem 5.1. Let $c = 1,2$. Suppose that Assumptions 2.1 to 2.5 hold for the model $(\theta^{(c)}, \vec X^{(c)}, \vec Y^{(c)})$ and for the approximate model $(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x}, \vec Y_\varepsilon^{(c)})$. Suppose $(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x}) \Rightarrow (\theta^{(c)}, \vec X^{(c)})$ as $(\epsilon, \epsilon_x) \to 0$. Then, as $\varepsilon \to 0$, for bounded and continuous $f_1$ and $f_2$,
(i) $\vec Y_\varepsilon \Rightarrow \vec Y$;
(ii) $\phi_{\varepsilon,c}(f_c,t) \Rightarrow \phi_c(f_c,t)$;
(iii) $\pi_{\varepsilon,c}(f_c,t) \Rightarrow \pi_c(f_c,t)$;
(iv) $q_{\varepsilon,1}(f_1,t) \Rightarrow q_1(f_1,t)$ and $q_{\varepsilon,2}(f_2,t) \Rightarrow q_2(f_2,t)$ simultaneously.

Remark 5.1. Part (i) implies the convergence of the observations in the approximate model to those in the true one. Part (ii) implies the consistency of the approximate (integrated) likelihood, while Part (iii) shows the consistency of the approximate posterior. Lastly, Part (iv) implies the consistency of the approximate Bayes factors. Taken as a whole, Theorem 5.1 shows that there are discrete, computationally feasible, and consistent versions of the likelihoods, posteriors, and Bayes factors for this class of models.

Proof. Parts (i), (ii) and (iii) are proven in [14]. The proof relies on several theorems: Kurtz and Protter's theorem on the convergence of stochastic integrals (see Theorem 2.2 of [11]), two theorems on the convergence of conditional expectations (see [6] and [10]), and the continuous mapping theorem (see [5]). Part (iv) follows directly from Part (ii) since, again by the continuous mapping theorem, $q_{\varepsilon,c}(f_c,t)$ is consistent if $\phi_{\varepsilon,c}(f_c,t)$ and $\phi_{\varepsilon,l}(1,t)$ are, for $c = 1,2$ and $l = 3-c$.

5.1. Overview of the Recursive Algorithm

The advantage of having a consistent discrete approximation of the model and Bayes factors is that it makes model estimation and evaluation computationally feasible. To implement the Bayesian model selection via filtering described here, we construct recursive algorithms to calculate the approximate Bayes factors $q_{\varepsilon,c}(f_c,t)$, $c = 1,2$. For simplicity of this discussion we assume that $a_j^{(c)}(\theta^{(c)}(t), \vec X^{(c)}(t), t) = a_j(t)$ for the $j$th asset.

The first step is to construct, for $c = 1,2$, $(\theta^{(c)}_{\epsilon}, \vec X^{(c)}_{\epsilon_x})$ as a Markov chain approximation to $(\theta^{(c)}, \vec X^{(c)})$ with generator $A_\varepsilon$, and to obtain $g_{j,k}^{\varepsilon}(y_j^{(c)} \mid X_{j,\epsilon_x}^{(c)}(t))$ as an approximation to $g_{j,k}(y_j \mid X_j(t))$. We restrict the state space to the lattice points corresponding to the assumed prior distribution.

In the second step, we obtain the approximate Bayes factors from (4.3) and (4.4), broken into the propagation equation
$$
q_\varepsilon^{(c)}(f_c, t_{i+1}-) = q_\varepsilon^{(c)}(f_c, t_i) + \int_{t_i}^{t_{i+1}} q_\varepsilon^{(c)}(A_\varepsilon f_c, s)\, ds,
\tag{5.4}
$$

and the updating equation
$$
q_\varepsilon^{(c)}(f_c, t_{i+1}) = \frac{q_\varepsilon^{(c)}(f_c\, g_{j,k}^{(c)},\, t_{i+1}-)}{q_\varepsilon^{(3-c)}(g_{j,k}^{(3-c)},\, t_{i+1}-)}\, q_\varepsilon^{(3-c)}(1,\, t_{i+1}-).
\tag{5.5}
$$
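As a toy illustration of this propagation/update cycle, the sketch below iterates (5.4) (Euler, generator term only, i.e., the special case $a_j^{(c)} = a_j(t)$) and (5.5) for two competing models represented as mass vectors on a small lattice. The lattice, generators, observation densities, and synthetic trades are all invented stand-ins, not the paper's specification. One built-in check: since $q_1(1,t)\,q_2(1,t) = B_{12}(t)B_{21}(t) = 1$, the product of the total masses should stay equal to $1$.

```python
import numpy as np

def propagate(q, A, dt, n_steps):
    # Euler scheme for the propagation equation (5.4): dq/dt = A^T q.
    for _ in range(n_steps):
        q = q + dt * (A.T @ q)
    return q

def update(q1, q2, g1, g2):
    # Updating equation (5.5): reweight by the model's own observation
    # density and renormalize by the competing model's filter.
    new_q1 = q1 * g1 * (q2.sum() / (q2 * g2).sum())
    new_q2 = q2 * g2 * (q1.sum() / (q1 * g1).sum())
    return new_q1, new_q2

n = 11
x = np.linspace(0.0, 1.0, n)          # toy price lattice
def birth_death(rate):                # conservative toy generator (rows sum to 0)
    A = np.zeros((n, n))
    for i in range(n):
        if i > 0: A[i, i - 1] = rate
        if i < n - 1: A[i, i + 1] = rate
        A[i, i] = -A[i].sum()
    return A
A1, A2 = birth_death(0.5), birth_death(2.0)

q1 = np.full(n, 1.0 / n)              # flat priors; q(1, 0) = 1 for both models
q2 = np.full(n, 1.0 / n)
rng = np.random.default_rng(1)
for _ in range(20):                   # 20 synthetic "trades"
    q1 = propagate(q1, A1, dt=0.01, n_steps=10)
    q2 = propagate(q2, A2, dt=0.01, n_steps=10)
    y = rng.integers(n)               # observed lattice price (synthetic)
    g1 = np.exp(-np.abs(x - x[y]) / 0.1)   # toy observation densities g_{j,k}
    g2 = np.exp(-np.abs(x - x[y]) / 0.3)
    q1, q2 = update(q1, q2, g1, g2)

bayes_factor = q1.sum()               # B_12(t) = q_1(1, t)
print(bayes_factor)
```

The invariant holds because the toy generators are conservative (propagation preserves total mass) and the two updates in (5.5) rescale reciprocally.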

In the final step, we convert (5.4) and (5.5) into recursive algorithms by setting $f_c$ equal to lattice-point indicators, with two sub-steps: (a) represent $q_\varepsilon(\cdot,t)$ as a finite array whose components are the $q_\varepsilon(f,t)$, and (b) approximate the time integral in (5.4) with an Euler scheme.

6. Conclusions and Future Work

In this paper, we investigate model selection problems for a general class of multivariate FM models of asset price and develop Bayesian model selection via filtering in two steps. We first derive the evolution system of SPDEs for the Bayes factors and prove its uniqueness. Then we prove a limit theorem which provides a recipe for developing consistent recursive algorithms for computing the Bayes factors.

Bayesian model selection via filtering is computationally intensive, and even more so for the multivariate FM models. To improve efficiency, we will extend recent developments in particle filtering to the filtering problem with counting process observations. See [16] for recent development in this direction in the univariate case.

With efficient algorithms for implementation, the developed Bayesian model selection via filtering offers a powerful tool for testing related market microstructure theories, represented by the micromovement models. For example, we may test whether NASDAQ has less trading noise after a market reform as argued in [1], test whether information affects trading intensity as argued by [3] and tested by [4], test whether the inventory position of a market maker has an impact on price as suggested in [7], test whether there is a relationship between transaction times and limit order arrival times as in [13], and test whether there is a structural break in transaction periods as in [20].

The multivariate FM models can be further generalized to filtering models with marked point process observations, and likewise the related Bayesian inference via filtering. See [19] for further development.
References

[1] Barclay, M., Christie, W., Harris, J., Kandel, E. and Schultz, P. H. (1999). The effects of market reform on the trading costs and depths of NASDAQ stocks. Journal of Finance 54 (1) 1–34.
[2] Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, New York.
[3] Easley, D. and O'Hara, M. (1992). Time and the process of security price adjustment. Journal of Finance 47 577–605.
[4] Engle, R. (2000). The econometrics of ultra-high-frequency data. Econometrica 68 1–22.
[5] Ethier, S. and Kurtz, T. (1986). Markov Processes: Characterization and Convergence. Wiley, New York.
[6] Goggin, E. (1994). Convergence in distribution of conditional expectations. Annals of Probability 22 1097–1114.
[7] Hasbrouck, J. (1988). Trades, quotes, inventories and information. Journal of Financial Economics 42 229–252.
[8] Kass, R. E. and Raftery, A. E. (1995). Bayes factors and model uncertainty. Journal of the American Statistical Association 90 773–795.
[9] Kouritzin, M. and Zeng, Y. (2005). Bayesian model selection via filtering for a class of micro-movement models of asset price. International Journal of Theoretical and Applied Finance 8 97–121.
[10] Kouritzin, M. and Zeng, Y. (2005). Weak convergence for a type of conditional expectation: Application to the inference for a class of asset price models. Nonlinear Analysis: Theory, Methods and Applications 60 231–239.
[11] Kurtz, T. and Protter, P. (1991). Weak limit theorems for stochastic integrals and stochastic differential equations. Annals of Probability 19 1035–1070.
[12] Protter, P. (2003). Stochastic Integration and Differential Equations, 2nd ed. Springer-Verlag, New York.
[13] Russell, J. (1999). Econometric modeling of multivariate irregularly-spaced high-frequency data. Working Paper, University of Chicago.
[14] Scott, L. C. and Zeng, Y. (2006). Bayes estimation for a class of multivariate filtering micromovement models of asset price. Working Paper, University of Missouri at Kansas City.
[15] Spalding, R., Tsui, K. W. and Zeng, Y. (2005). A micro-movement model with Bayes estimation via filtering: Applications to measuring trading noises and trading cost. Nonlinear Analysis: Theory, Methods and Applications 64 295–309.
[16] Xiong, J. and Zeng, Y. (2006). A branching particle approximation to the filtering problem with counting process observations. Working Paper, University of Tennessee at Knoxville.
[17] Zeng, Y. (2003). A partially-observed model for micro-movement of asset prices with Bayes estimation via filtering. Mathematical Finance 13 411–444.
[18] Zeng, Y. (2004). Estimating stochastic volatility via filtering for the micromovement of asset prices. IEEE Transactions on Automatic Control 49 338–348.
[19] Zeng, Y. (2006). Statistical analysis of the filtering models with marked point process observations: Applications to ultra-high frequency data. Working Paper, University of Missouri at Kansas City.
[20] Zhang, M. Y., Russell, J. R. and Tsay, R. S. (2001). A nonlinear autoregressive conditional duration model with applications to financial transaction data. Journal of Econometrics 104 179–207.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 137–153
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000354

Determining the Optimal Control of Singular Stochastic Processes Using Linear Programming

Kurt Helmes¹ and Richard H. Stockbridge²,*
Humboldt-Universität zu Berlin and University of Wisconsin–Milwaukee

Abstract: This paper examines the numerical implementation of a linear programming (LP) formulation of stochastic control problems involving singular stochastic processes. The decision maker has the ability to influence a diffusion process through the selection of its drift rate (a control that acts absolutely continuously in time) and may also decide to instantaneously move the process to some other level (a singular control). The first goal of the paper is to show that linear programming provides a viable approach to solving singular control problems. A second goal is the determination of the absolutely continuous control from the LP results, and this is intimately tied to the particular numerical implementation. The original stochastic control problem is equivalent to an infinite-dimensional linear program in which the variables are measures on appropriate bounded regions. The implementation method replaces the LP formulation involving measures by one involving the moments of the measures. This moment approach does not directly provide the optimal control in feedback form as a function of the current state; the second goal of this paper is to show that the feedback form of the optimal control can be obtained using sensitivity analysis.

* This research has been supported in part by the U.S. National Security Agency under Grant Agreement Number H98230-05-1-0062. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notation herein.
¹ Institut für Operations Research, Spandauer Str. 1, D-10178 Berlin, Germany, e-mail: [email protected]; url: www.wiwi.hu-berlin.de/or
² Department of Mathematical Sciences, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin 53201-0413, e-mail: [email protected]; url: www.uwm.edu/~stockbri

AMS 2000 subject classifications: Primary 93E20, 90C05; secondary 60J60, 60J75, 93E04.
Keywords and phrases: singular processes, singular and absolutely continuous control, linear programming, regime switching, repair model, stationary distributions.


The LP approach has provided the foundation for analysis of uncontrolled stochastic processes by taking the control space to consist of a single value. Numerical implementation relies on a finite-dimensional approximation of the LP and has been shown to be effective in [7], [9] for exit time problems and [10] for steady-state analysis. Optimal stopping problems have also been solved using the LP methodology (see e.g. [3], [6], [11], [19]). The papers [6]–[11] reformulate the LP in terms of the moments of the measures rather than in terms of the measures themselves. This reformulation must also include Hausdorff moment conditions, that is, a set of linear conditions which are necessary and sufficient for the infinite collection of variables to be the moments of some measure or measures on bounded regions. The finite-dimensional approximation truncates the number of moments and the Hausdorff conditions, which thus allows points to be feasible that are not the initial terms of a moment sequence. The feasible set is therefore enlarged, implying that the optimal value of the approximating LPs provides an upper or lower bound (depending on the type of optimization) for the true optimal value. The LP method has had only limited success so far in identifying optimal controls. Theoretically, an optimal control is obtained in relaxed feedback form from an optimal measure by taking the conditional distribution on the control space, given the state of the process. In practice, the selection of controls typically involves discretizing the control space (see e.g., [3], [18], [13]). This affects the reformulation by replacing measures on the product of the state space and control space by a finite collection of measures (one for each possible control value) on the state space alone.
One difficulty with this discretization when using the moment reformulation is that the solution gives (pseudo-)moments of the measure corresponding to a value of the control, and it is not transparent for which state values that control is active. The first goal of this paper is to demonstrate that an analysis of the reduced cost coefficients associated with the non-basic variables in the LP determines an approximate optimal control directly from the LP solution. This method is especially effective when the optimal control is of bang-bang type. The second goal is to show that singular control problems can be solved using the LP methodology. We consider three examples of increasing levels of complexity to illustrate the methodology. These examples are presented in the following sections.

For a measurable space $(S,\Sigma)$, $\mathcal M(S)$ denotes the collection of finite measures on $(S,\Sigma)$ and $\mathcal P(S)$ is the subcollection of probability measures on $(S,\Sigma)$.

2. Modified Bounded Follower Problem

The bounded follower problem of [1] considers a controlled process $X$ which satisfies the stochastic differential equation
$$
dX(t) = u(t)\,dt + \sigma\,dW(t),
\tag{2.1}
$$
in which $W$ is a standard Brownian motion process, $\sigma > 0$ is constant, and $u(t)$ is a non-anticipative process required to satisfy the hard constraints $u(t) \in [-1,1]$ for all $t$. The objective of [1] is to minimize the long-term average second moment of $X$.

The paper [8] modifies this problem by constraining $X$ to remain in the interval $[0,1]$. The constraints involve reflection at $\{0\}$ and a jump mechanism at $\{1\}$. Specifically, $X$ is modelled as a solution of the patchwork martingale problem [14] in which the diffusion specified in (2.1) is active in the open interval $(0,1)$, $X$ sticks at $\{1\}$ for an exponential length of time (parameter $\lambda$) at which point it jumps to $0$, and reflection occurs at $\{0\}$ by restricting the domain of the generator $Af(x,u) = uf'(x) + (\sigma^2/2)f''(x)$ to functions $f \in C^2[0,1]$ satisfying $f'(0) = 0$. The paper [8] demonstrates how to compare controls using a linear programming formulation of the problem and indicates numerical evidence of optimality.

The current paper extends the analysis of this model in two ways. The first is to allow instantaneous jumps when $X(t-) = 1$ along with the reflection at $\{0\}$. We initially formulate the processes to be considered as a quadruplet $(X, \Lambda, L_0, N_1)$ which satisfies, for each $f \in C^2[0,1]$, that
$$
f(X(t)) - \int_0^t \int_{[-1,1]} Af(X(s),u)\, \Lambda_s(du)\, ds - \int_0^t B_0 f(X(s))\, dL_0(s) - \int_0^t B_1 f(X(s-))\, dN_1(s)
\tag{2.2}
$$
is a martingale, in which $A$ is the generator above, $\Lambda$ denotes a relaxed control process (for each $s$, $\Lambda_s$ is a distribution on $[-1,1]$), $B_0 f(x) = f'(x)$, $L_0$ denotes the local time of $X$ at $\{0\}$, $B_1 f(x) = f(0) - f(x)$, and $N_1$ denotes the process which counts the number of visits of $X$ to $\{1\}$. Note, in particular, that the reflection of $X$ at $\{0\}$ is captured through the integral term involving $B_0$, so $f$ is not required to satisfy the boundary condition $f'(0) = 0$. Also observe that the local time process $L_0$ and the counting process $N_1$ increase on sets of times which are singular with respect to Lebesgue measure of time.

The objective of the decision maker is to minimize the long-term average second moment
$$
\limsup_{t\to\infty}\; t^{-1} E\bigg[ \int_0^t X^2(s)\, ds \bigg].
\tag{2.3}
$$
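For intuition about these dynamics and the criterion (2.3), the long-run average second moment under a fixed threshold (bang-bang) policy can be estimated by a crude Euler scheme. The threshold value is taken from the numerical results reported later in this section ($x \approx 0.708$), while $\sigma$, the horizon, and the step size are arbitrary illustrative choices.

```python
import numpy as np

def simulate_avg_second_moment(b, sigma=0.5, T=200.0, dt=1e-3, seed=0):
    """Crude Euler scheme for (2.1) confined to [0, 1]: bang-bang drift
    u(x) = -1 for x < b and +1 for x >= b, reflection at {0}, and an
    instantaneous jump to 0 upon reaching {1}. Returns a Monte Carlo
    estimate of the long-term average of X^2, cf. (2.3)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
    x, acc = 0.5, 0.0
    for i in range(n):
        u = -1.0 if x < b else 1.0
        x += u * dt + noise[i]
        if x < 0.0:
            x = -x        # reflection at {0}
        if x >= 1.0:
            x = 0.0       # instantaneous jump from {1} to {0}
        acc += x * x * dt
    return acc / T

est = simulate_avg_second_moment(b=0.708)
print(est)  # estimated long-run average second moment under this policy
```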

This criterion does not include any cost for using the control $u$, so one would anticipate $u(t)$ taking only the extreme values, $u(t) \in \{-1,1\}$. This insight, however, is not assumed in determining the solution.

2.1. LP Formulation based on Measures

Let $(X, \Lambda, L_0, N_1)$ satisfy (2.2). Then for each $t > 0$,
$$
E[f(X(0))] = E\bigg[ f(X(t)) - \int_0^t \int_{[-1,1]} Af(X(s),u)\, \Lambda_s(du)\, ds - \int_0^t B_0 f(X(s))\, dL_0(s) - \int_0^t B_1 f(X(s-))\, dN_1(s) \bigg].
\tag{2.4}
$$
For $t > 0$ define the expected occupation measures (up to time $t$) $\mu_t$ on $[0,1]$, $\nu_0^t$ on $\{0\}$ and $\nu_1^t$ on $\{1\}$ by, for every $G \in \mathcal B([0,1]\times[-1,1])$,
$$
\mu_t(G) = t^{-1} E\bigg[ \int_0^t \int_{[-1,1]} I_G(X(s),u)\, \Lambda_s(du)\, ds \bigg],
\qquad
\nu_0^t(\{0\}) = t^{-1} E[L_0(t)],
\qquad
\nu_1^t(\{1\}) = t^{-1} E[N_1(t)].
$$
Since $[0,1]\times[-1,1]$ is compact, the collection $\{\mu_t : t > 0\}$ is relatively compact and hence there exist limits as $t \to \infty$. As a result, there will be corresponding limits of $\{\nu_0^t\}$ and $\{\nu_1^t\}$. Dividing by $t$ and passing to the limit in (2.4) demonstrates that each weak limit $(\mu, \nu_0, \nu_1)$ satisfies, for every $f \in C^2[0,1]$,
$$
\int Af(x,u)\, \mu(dx\times du) + \int B_0 f(x)\, \nu_0(dx) + \int B_1 f(x)\, \nu_1(dx) = 0.
\tag{2.5}
$$
The measure $\mu$ denotes the stationary distribution of $(X,\Lambda)$ on $[0,1]\times[-1,1]$, $\nu_0$ gives the expected long-term average amount of local time per unit of time, and $\nu_1$ is the expected long-term average number of jumps per unit of time.

Theorem 1.7 in [16] shows that for each $(\mu, \nu_0, \nu_1)$ satisfying (2.5) there exists a stationary solution of (2.2) whose stationary distribution is given by $\mu$, and hence its objective function value (2.3) is given by $\int x^2\, \mu(dx\times du)$. The relaxed control is given in feedback form as $\eta(X(s), \cdot)$, where $\eta$ is a regular conditional distribution on $[-1,1]$ satisfying $\mu(dx\times du) = \eta(x, du)\, \mu(dx\times[-1,1])$. As a result, to any limiting $(\mu, \nu_0, \nu_1)$ arising from any control process $\Lambda$ there corresponds a stationary solution having the same value. This observation indicates that optimizing over stationary processes is equivalent to optimizing over all solutions. Hence the control problem can be reformulated as an infinite-dimensional LP.

To simplify the expressions, for a measurable function $g$ and a measure $\nu$ defined on a space $(S,\Sigma)$, let $\langle g, \nu\rangle$ denote $\int g\, d\nu$. Then the linear programming formulation is
$$
\text{LP1}\qquad
\begin{cases}
\text{Minimize} & \langle x^2, \mu\rangle \\
\text{Subject to} & \langle Af, \mu\rangle + \langle B_0 f, \nu_0\rangle + \langle B_1 f, \nu_1\rangle = 0 \quad \forall f \in C^2[0,1], \\
& \mu \in \mathcal P([0,1]\times[-1,1]),\quad \nu_0 \in \mathcal M(\{0\}),\quad \nu_1 \in \mathcal M(\{1\}).
\end{cases}
$$

Remark. An alternate approach to the solution of this stochastic control problem is to capture the singular behavior of the processes through restrictions on the domain of the generator $A$. Specifically, taking $f \in C^2[0,1]$ with $f'(0) = 0$ and $f(0) = f(1)$, the singular terms drop out of (2.2): reflection of the process at $\{0\}$ is obtained by the first condition and the instantaneous jump by the second. One is then able to solve the stochastic control problem by solving the Bellman equation in this restricted domain. The dynamic programming approach, however, becomes more complex for the other examples in this paper, where the objective criterion includes a cost for the jumps which depends on a control variable.

2.2. LP Formulation based on Moments and Control Discretization

LP1 is the basis for the numerical solution of the control problem. Instead of allowing $\mu$ to be a probability measure on $[0,1]\times[-1,1]$, however, we discretize the set of controls to $U_k = \{u_j = j/k : j = -k,\dots,k\}$ and require $\mu \in \mathcal P([0,1]\times U_k)$. This restriction reduces the number of feasible measures and thus provides an upper bound on LP1's optimal value. The discretization is naturally an approximation of the given problem. In all cases where the optimal control is of bang-bang type, the resulting error can, in principle, be made as small as desired by a proper choice of the discrete subset $U_k$. In this example (and also the next), the restricted LP includes an optimal measure $\mu$ for LP1, and therefore solving the restricted LP provides a solution to LP1.


Define $\mu_j(\cdot) = \mu(\cdot \times \{j/k\})$ for $j = -k,\dots,k$, realizing that each $\mu_j$ is a sub-probability measure on $[0,1]$, with $\mu = \sum_j \mu_j$ being a probability measure. Also note that the "measures" $\nu_0$ and $\nu_1$ are actually point masses at $\{0\}$ and $\{1\}$, respectively. Rather than work with the measures $\{\mu_j\}$ in the LP, we reformulate the problem again in terms of the moments, which completely determine the measures since they have support in the compact interval $[0,1]$. For each $j = -k,\dots,k$ and each $n \in \mathbb N$, define
$$
m_j(n) = \int x^n\, \mu_j(dx).
\tag{2.6}
$$
Take $f(x) = x^n$ in (2.5) and abuse notation slightly by letting $\nu_0$ and $\nu_1$ denote the masses of the measures at the endpoints. Then LP1 takes the form
$$
\text{LP2}\qquad
\begin{cases}
\text{Minimize} & \sum_j m_j(2) \\
\text{Subject to} & \sum_j \Big[ (n u_j)\, m_j(n-1) + \dfrac{n(n-1)\sigma^2}{2}\, m_j(n-2) \Big] + n\, 0^{\,n-1}\, \nu_0 + (0^n - 1)\, \nu_1 = 0 \quad \forall n \in \mathbb N, \\
& \sum_j m_j(0) = 1, \\
& m_j(n),\ \nu_0,\ \nu_1 \ge 0 \quad \forall n \in \mathbb N.
\end{cases}
$$
In LP2, whenever the expression $0^0$ appears, it is understood to equal $1$. The variables in LP2 are supposed to be the moments of measures defined on $[0,1]$; that is, we desire to have $m_j(n) = \langle x^n, \mu_j\rangle$ for some measure $\mu_j$ on $[0,1]$. The constraints in LP2, however, are not sufficient for $\{m_j(n) : n \in \mathbb N\}$ to be moments. Hausdorff [5] showed that necessary and sufficient conditions are provided by the set of linear inequalities obtained from the observation that, for each $m, n \in \mathbb N$,
$$
\sum_{j=0}^n (-1)^j \binom{n}{j} \int_{[0,1]} x^{j+m}\, \nu(dx) = \int_{[0,1]} x^m (1-x)^n\, \nu(dx) \ge 0.
\tag{2.7}
$$

Adding (2.7) with $\nu = \mu_j$, $j = -k,\dots,k$, to the constraint requirements of LP2 provides an equivalent LP formulation for the restricted LP1.

2.3. Finite-dimensional LPs and Numerical Results

The difficulty with this modified version of LP2 is that there are an infinite number of variables and a corresponding infinite number of constraints. To be computable, LP2 must be approximated by a finite-dimensional linear program. One such approximation is obtained by restricting the number of moments to a finite collection, say $n = 0,1,\dots,M$, and limiting the constraints to those involving only the selected moments. A consequence of this approximation, however, is that the variables $\{m_j(n) : n = 0,\dots,M\}$ are no longer guaranteed to correspond to the moments of a measure $\mu_j$ on $[0,1]$. The constraint requirements are relaxed and hence the set of feasible "pseudo-moments" is larger; that is, the feasible set of the approximating LP contains the zeroth through $M$th moments of the feasible measures of the amended LP2, but it also contains points which are not the initial terms of a moment sequence of any measure.

Now consider more carefully the constraints (2.7) when restricted to $j + m \le M$. Each constraint defines a half-space, so the set of feasible finite sequences lies in a convex set defined by these half-spaces, called the Hausdorff polytope. Helmes and Röhl [6] determine explicit formulas for the corner points of the Hausdorff polytope. A final modification to LP2 is therefore possible: instead of imposing the finite Hausdorff conditions, characterize the Hausdorff polytope using convex combinations of the corner points. Thus the computable version of LP2 limits the number of variables to $M+1$ for each measure, imposes only those constraints which involve these variables, and then rewrites the variables as convex combinations of the corner points. The variables in this computable version are the convex coefficients $\{\lambda_j(n) : n = 0,\dots,M;\ j = -k,\dots,k\}$. In addition to giving an explicit formula for the corner points, the paper [6] proves convergence of the approximating optimal solutions to an optimal solution of LP2 and, moreover, shows that each corner point can be identified with a measure that is a single point mass.

Table 1 displays a selection of values of the optimal convex coefficients $\lambda_j(n)$ corresponding to the extreme points of the Hausdorff polytope when $M = 60$. Notice that the solution only has positive weights on the corner points corresponding to the drift rates $\{\pm 1\}$, and that the weights correspond to $u = -1$ for the lower indices of the extreme points, whereas the higher indices have positive weights for $u = 1$. According to the results of [6], the extreme point having index $n$ corresponds (asymptotically) to a point mass at $x = n/M$. Thus Table 1 indicates that the control $u = -1$ is used for smaller values of $x$ and at some point (between $40/60$ and $45/60$) the control switches to $u = 1$. The $\lambda_j(n)$ values do not provide a very accurate indication of the value of $x$ where the switch occurs; sensitivity analysis of the LP can be utilized to obtain better accuracy for the switch point.
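A minimal sketch of a finite-dimensional version of LP2, imposing the truncated Hausdorff inequalities (2.7) directly rather than through the corner-point parametrization of [6]; $M$, $k$ and $\sigma$ are small illustrative choices, and the solver is scipy's HiGHS interface:

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

M, k, sigma = 16, 1, 0.5
U = [j / k for j in range(-k, k + 1)]          # discretized controls U_k
nU = len(U)
nvar = nU * (M + 1) + 2                        # m_j(0..M), then nu_0, nu_1
idx = lambda j, n: j * (M + 1) + n             # position of m_j(n), j = 0..nU-1
i0, i1 = nU * (M + 1), nU * (M + 1) + 1        # positions of nu_0, nu_1

# Equality constraints of LP2 from f(x) = x^n, n = 1..M, plus normalization.
A_eq, b_eq = [], []
for n in range(1, M + 1):
    row = np.zeros(nvar)
    for j, u in enumerate(U):
        row[idx(j, n - 1)] += n * u                        # n u_j m_j(n-1)
        if n >= 2:
            row[idx(j, n - 2)] += n * (n - 1) * sigma**2 / 2
    if n == 1:
        row[i0] = 1.0                                      # n 0^{n-1} nu_0
    row[i1] = -1.0                                         # (0^n - 1) nu_1
    A_eq.append(row); b_eq.append(0.0)
row = np.zeros(nvar)
for j in range(nU):
    row[idx(j, 0)] = 1.0                                   # sum_j m_j(0) = 1
A_eq.append(row); b_eq.append(1.0)

# Truncated Hausdorff conditions (2.7) for each mu_j, written as <= 0 rows.
A_ub, b_ub = [], []
for j in range(nU):
    for n in range(1, M + 1):
        for m in range(0, M - n + 1):
            row = np.zeros(nvar)
            for i in range(n + 1):
                row[idx(j, m + i)] = -((-1) ** i) * comb(n, i)
            A_ub.append(row); b_ub.append(0.0)

c = np.zeros(nvar)
for j in range(nU):
    c[idx(j, 2)] = 1.0                                     # minimize sum_j m_j(2)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None), method="highs")
print(res.status, res.fun)   # 0 (optimal) and the approximate optimal value
```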
The "reduced costs" are the amounts by which the cost coefficients of each λj(n) variable must change in order for the variable to become a basic variable; that is, should the cost coefficient change by the amount of the reduced cost for a variable λj(n), then λj(n) would be positive and be part of the basis for the solution. Table 2 displays the reduced costs for some of the values of n. First, notice that values of order 10^−14 or 10^−15 occur for those λj(n) which have positive weights. These values should be understood to be numerically equivalent to 0, since the weights currently in the basis do not need any change in their cost coefficients in order to be basic variables. To better distinguish the information contained in Table 2, it is helpful to scale the values by a factor of 100 and then round the values to the nearest integer. This scaling is displayed in Table 3. In contrast to the weights given in Table 1, a consistent pattern emerges in the scaled reduced costs that indicates switching occurs close to index 43. Thus the control changes value from u = −1 to u = 1 when x is approximately 43/60 ≈ 0.71667. The numerical results depend, of course, on the choice of the highest moment. Table 4 displays the values of the optimal second moment, along with the values of the point masses p0 and p1 at {0} and {1}, respectively, for a selection of values of M. The exact values can be obtained (see [12]), in which case the switching location is the solution of a transcendental equation that is then used to determine the stationary density for the optimal process and hence the exact optimal value via integration. Numerical evaluation of the switch location yields x = 0.70846; the resulting objective function and masses are also provided in Table 4 for comparison purposes. Table 5 displays some significant scaled reduced costs for the case M = 1024, using a scale factor of 1000. These results indicate that the switch location lies between x = 725/1024 ≈ 0.70801 and x = 726/1024 ≈ 0.70898.
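The way the switch index is read off the scaled reduced costs can be illustrated in a few lines of Python (the data are the u = ±1 columns for n = 30, . . . , 50 from Table 3; the crossing criterion used below is our illustrative choice, not a step of the authors' procedure):

```python
# Scaled reduced costs for M = 60 and n = 30, ..., 50 (Table 3).
ns = list(range(30, 51))
rc_down = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 6, 8, 11, 14, 18, 21]    # u = -1
rc_up = [17, 17, 17, 16, 15, 14, 13, 12, 10, 9, 7, 5, 3, 2, 1, 0, 0, 0, 0, 0, 0]  # u = +1

# The switching index is (approximately) where the two cost profiles cross.
switch = min(ns, key=lambda n: abs(rc_down[n - 30] - rc_up[n - 30]))
print(switch, round(switch / 60, 5))  # 43, i.e. x = 43/60
```

At index 43 the two profiles take the same value, which is the "consistent pattern" the text describes.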

Optimal Control of Singular Processes

143

Table 1
Values of the weight variables λj(n), j = −3, . . . , +3; σ = 1, M = 60. Rows are indexed by the extreme point n, columns by the control index j, where j corresponds to u = j/3. For n = 20, . . . , 55 the only nonzero weights occur in the columns j = −3 and j = +3:

n     j = −3      j = +3
20    0.175441    0
29    0.033439    0
30    0.086116    0
38    0.018578    0
39    0.036428    0
46    0           0.022700
50    0           0.001842
51    0           0.009488
54    0           0.004646
55    0           0.000624

All entries in the columns j = −2, −1, 0, 1, 2 are 0 over this range, as are the remaining entries in the columns j = ±3.

3. Regime Switching Model with Jumping Costs

The second model of this paper allows for changes in the regime of the diffusion along with control decisions to be made at the time the process hits {1}. The model contains two coordinate processes X and Y. The process Y, which tracks the regime, is a finite-state Markov chain having states Y = {y0, . . . , yl} and transition rates given by a matrix Q = (qyz). As in the modified bounded follower problem, the process X is a diffusion on the interior of (0, 1), is reflected at {0} and jumps instantaneously when X(t−) = 1. However, the coefficients of the diffusion now depend on the regime Y and, in addition to selecting the drift rate, the decision maker also selects between several possible control actions when X hits {1}. Let 0 = x1 < · · · < xk1−1 < xk1 = 1 be points in the unit interval and let V = {v1, . . . , vk1} denote the possible singular controls. For i < k1, selecting control vi imposes an instantaneous jump to the target {xi} when the process hits {1}. The control vk1 imposes a reflection on the process X at {1}. The absolutely continuous

Table 2
Reduced cost coefficients for n = 31, . . . , 49; M = 60. Rows are indexed by the extreme point n, columns by the control index j, where j corresponds to u = j/3.

n     j = −3            j = −2     j = −1     j = 0      j = 1      j = 2
31    5.432 × 10^−05    0.02854    0.05703    0.08552    0.1140     0.1425
32    0.0001720         0.02782    0.05547    0.08311    0.1108     0.1384
33    0.0003364         0.02691    0.05348    0.08006    0.1066     0.1332
34    0.0004988         0.02575    0.05100    0.07626    0.1015     0.1268
35    0.0005846         0.02425    0.04792    0.07159    0.09526    0.1189
36    0.0005203         0.02233    0.04413    0.06594    0.08775    0.1096
37    0.0002872         0.01993    0.03958    0.05923    0.07887    0.09852
38    −5.551 × 10^−15   0.01717    0.03434    0.05150    0.06867    0.08584
39    −1.250 × 10^−14   0.01436    0.02871    0.04307    0.05742    0.07178
40    0.0009360         0.01213    0.02333    0.03453    0.04573    0.05693
41    0.003792          0.01149    0.01918    0.02688    0.03457    0.04227
42    0.009811          0.01367    0.01753    0.02139    0.02525    0.02910
43    0.02028           0.02000    0.01972    0.01944    0.01916    0.01888
44    0.03618           0.03151    0.02683    0.02215    0.01747    0.01279
45    0.05777           0.04848    0.03920    0.02992    0.02064    0.01136
46    0.08425           0.07020    0.05616    0.04212    0.02808    0.01404
47    0.1139            0.09497    0.07604    0.05711    0.03819    0.01926
48    0.1447            0.1207     0.09679    0.07286    0.04893    0.02500
49    0.1753            0.1462     0.1171     0.08807    0.05900    0.02993

and singular generators of the pair process (X, Y) are

Af(x, y, u) = ub(y)fx(x, y) + (1/2)σ^2(y)fxx(x, y) + Σ_{z∈Y} f(x, z)q_{yz},

B0 f(x, y) = fx(x, y),

B1 f(x, y, v) = −fx(x, y)I_{vk1}(v) + Σ_{i=1,...,k1−1} [f(xi, y) − f(x, y)]I_{vi}(v).

As in the previous example, u is again restricted to [−1, 1], which means, in light of the term b(y), that the decision maker is allowed to select different drift rates for the different regimes. The model includes the jump generator Σ_z f(x, z)q_{yz}, which implies that the regimes switch according to a Markov chain. The processes under consideration form a sextuplet (X, Y, Λ, Ψ, L0, N1), in which Ψ denotes a relaxed singular control process that chooses the values of v according to some probability measure, and satisfy the requirement that for each f ∈ C^2([0, 1] × Y)

(3.1)  f(X(t), Y(t)) − ∫_0^t ∫_{[−1,1]} Af(X(s), Y(s), u) Λs(du) ds
              − ∫_0^t B0 f(X(s), Y(s)) dL0(s)
              − ∫_0^t ∫_V B1 f(X(s−), Y(s−), v) Ψs(dv) dN1(s)

is a martingale, in which Λ, L0 and N1 are the relaxed control process, the local time process at x = 0 and the counting process of visits to x = 1.
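A direct way to see these dynamics is to simulate them. The sketch below is a rough Euler scheme with assumed parameter values in the spirit of Table 6; it illustrates the model itself, not the LP method. X moves with regime-dependent drift and diffusion, reflects at {0}, jumps to a regime-dependent target on reaching {1}, and the regime Y switches at exponential rates (the relaxed controls Λ and Ψ are replaced here by a fixed drift control and a fixed jump target per regime):

```python
import random

def simulate(T=10.0, dt=1e-3, seed=0):
    """Euler sketch of the regime-switching model on [0, 1]."""
    rng = random.Random(seed)
    b = {0: 1.5, 1: 2.0}        # drift scale per regime (assumed values)
    sigma = {0: 0.44, 1: 0.63}  # diffusion coefficient per regime (assumed)
    rate = {0: 6.04, 1: 8.90}   # rate of leaving each regime (assumed)
    target = {0: 0.0, 1: 0.5}   # singular action chosen when X hits {1}
    u = 1.0                     # a fixed admissible drift control in [-1, 1]
    x, y, jumps = 0.5, 0, 0
    for _ in range(int(T / dt)):
        if rng.random() < rate[y] * dt:  # Markov-chain regime switch
            y = 1 - y
        x += u * b[y] * dt + sigma[y] * dt ** 0.5 * rng.gauss(0.0, 1.0)
        x = abs(x)                       # reflection at {0}
        if x >= 1.0:                     # jump at {1} to the selected target
            x = target[y]
            jumps += 1
    return x, y, jumps

x_T, y_T, n_jumps = simulate()
```

The counting of jumps corresponds to the process N1 above; the local time L0 would similarly be accumulated at the reflection step.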


Table 3
Scaled reduced cost coefficients for n = 30, . . . , 50; M = 60. Rows are indexed by the extreme point n, columns by the control index j, where j corresponds to u = j/3.

n     j = −3   j = −2   j = −1   j = 0   j = 1   j = 2   j = 3
30    0        3        6        9       12      15      17
31    0        3        6        9       11      14      17
32    0        3        5        8       11      14      17
33    0        3        5        8       11      13      16
34    0        3        5        8       10      13      15
35    0        2        5        7       10      12      14
36    0        2        4        7       9       11      13
37    0        2        4        6       8       10      12
38    0        2        3        5       7       9       10
39    0        1        2        4       6       7       9
40    0        1        2        3       5       6       7
41    0        1        2        3       3       4       5
42    1        1        2        2       3       3       3
43    2        2        2        2       2       2       2
44    4        3        3        2       2       1       1
45    6        5        4        3       2       1       0
46    8        7        7        4       3       1       0
47    11       9        8        6       4       2       0
48    14       12       10       7       5       3       0
49    18       15       12       9       6       3       0
50    21       17       14       10      7       3       0

The objective of the decision maker is to minimize the expected long-term average cost

(3.2)  lim sup_{t→∞} t^{−1} E[ ∫_0^t cabs(X(s), Y(s)) ds + ∫_0^t ∫_V csing(Y(s−), v) Ψs(dv) dN1(s) ],

in which, for illustrative purposes, cabs(x, y) = c(y)x^2, where c(y) and csing(y, v) denote the regime-dependent and/or decision-dependent coefficients for the cost rates. We point out that the cost structure has different costs for the different possible singular actions. The cost is higher for larger control actions. In our numerical examples, there is no cost for reflection at {1} and the cost for jumping increases as the jump distance increases.

3.1. LP Formulation

As in Section 2.1, the stochastic control problem can be equivalently written in terms of the stationary distribution and the expected long-term average occupation measures at {0} and on {1} × V. The infinite-dimensional LP is

LP3:   Min.  ⟨cabs, μ⟩ + ⟨csing, ν1⟩
       S.t.  ⟨Af, μ⟩ + ⟨B0 f, ν0⟩ + ⟨B1 f, ν1⟩ = 0, ∀f ∈ C^2([0, 1] × Y),
             μ ∈ P([0, 1] × Y × [−1, 1]),
             ν0 ∈ M({0}),
             ν1 ∈ M({1} × V).

Table 4
Objective function values and point masses as functions of M

M       objective value   p0       p1
16      0.10958           1.3796   0.6133
32      0.11117           1.4572   0.6377
64      0.11177           1.4640   0.6250
128     0.11193           1.5040   0.6317
256     0.11211           1.5337   0.6201
512     0.11218           1.5361   0.6276
1024    0.11225           1.5363   0.6287
exact   0.11260           1.5319   0.6194

The finite-dimensional approximation uses f(x, y) = x^n I_{yi}(y) in LP3, restricts n to the set {0, . . . , M}, and employs the convex combination of the corner points to characterize the feasible points in the Hausdorff polytope.

3.2. Numerical Results

To illustrate the success of the LP method for solving the stochastic control problem having both absolutely continuous and singular controls, we consider a particular set of parameters. In this example, there are two regimes (Y = {0, 1}) and the decision maker can select from three singular control actions, so k1 = 3 and V = {1, 2, 3}. Control v = 1 requires the process X to jump to x = 0 when it hits {1}. Under v = 2, the process jumps to x2 = 0.5, and the choice of v = 3 causes X to be reflected at {1} so as to stay in the interval [0, 1]. The model parameters are given in Table 6. Notice, in particular, that when y = 0 the jumping costs are approximately the same, whereas the jumping cost to {0} in state 1 is an order of magnitude larger than the cost for the process to be reset at x = 0.5. There is no cost for reflecting the process in either state. The selected diffusion coefficients and the switching rates are motivated by other studies ([21]). We also comment that since the optimal absolutely continuous control only takes values in {±1} (as evidenced in the previous example), we have limited our discrete choice of controls u to the set {−1, 0, 1}. The scaled reduced cost coefficients for M = 256 are presented in Table 7. These numerical results indicate that the switch points for the absolutely continuous control should be located around x = 218/256 ≈ 0.852 when y = 0 and near x = 237/256 ≈ 0.926 for y = 1. Figure 1 displays, when y = 0, both the optimal u in feedback form as a function of the value of the driving force X and the optimal choice of singular control when X hits {1}. Similarly, Figure 2 displays the optimal values of u and v when y = 1. Since the cost for resetting to {0} is low when y = 0, the decision maker makes this choice, but when y = 1 the cost for such a resetting is prohibitively expensive and the controller opts to reset X to x = 0.5.
For these cost parameters, the cost for jumping is not large enough to make the decision maker pick the reflection option; reflection is chosen when the costs for jumping are larger.


Table 5
Scaled reduced cost coefficients when M = 1024. Rows are indexed by the extreme point n, columns by the control index j, where j corresponds to u = j/3.

n     j = −3   j = −2   j = −1   j = 0   j = 1   j = 2   j = 3
555   0        26       52       78      104     129     155
717   14       16       19       21      23      25      27
718   15       17       19       21      22      24      26
719   16       17       19       20      22      24      25
720   16       17       19       20      22      23      24
721   17       18       19       20      21      23      24
722   17       18       19       20      21      22      23
723   18       19       19       20      21      21      22
724   19       19       20       20      20      21      21
725   19       20       20       20      20      20      21
726   20       20       20       20      20      20      20
727   21       20       20       20      20      19      19
728   21       21       20       20      19      19      18
729   22       21       21       20      19      18      18
730   23       22       21       20      19      18      17
731   24       22       21       20      19      17      16
732   25       23       22       20      19      17      16
733   25       24       22       20      18      17      15
734   26       24       22       20      18      16      14
867   239      199      160      120     80      40      0

4. Repair Model

In our final example, the regime process Y represents the state of wear of a machine and X is a driving force for the level of deterioration. The levels of wear can be interpreted as moving from "new" (x, y) = (0, 0) to "broken" (x, y) = (1, 1), with several intermediate levels as well. In this framework, X can represent the fraction of deterioration at the current level. When X reaches {1}, Y instantaneously jumps up to the next level and X is instantaneously reset to {0}. In addition, a switching mechanism like the one in the previous example randomly makes the system jump from "newer" states to "older" states, with the implication that the rate matrix Q = (qyz) is upper-triangular. The decision maker influences the evolution of the paired process (X, Y) by choosing when to repair the machine. Thus when Y(t−) = i for 0 ≤ i < l and X(t−) ∈ V, V a finite set of points in the open interval (0, 1), the repair policy resets the driving process X(t) to 0 at a cost which depends on the value at which the resetting is initiated. So the machine will be "better" after the repair but does not become "younger". Under this formulation, for levels of deterioration X(t) < 1 we only allow repair to the same level of wear. Should X(t−) = 1 when Y(t−) = i < l, the machine will be fixed, with X(t) = 0, but declared to have become "older," so Y(t) = i + 1. If Y(t) = l we assume that repairs are no longer possible; should X(t−) = 1 when Y(t−) = l, the machine is declared to be "broken" and it is instantaneously replaced by a new machine, implying that the process has value (X(t), Y(t)) = (0, 0).

Table 6
Model Parameters

Level   Drift   Diff.   cabs(x, y)   csing(y, v)                Q (columns z = 0, 1)
y       b(y)    σ(y)    c(y)         v = 1    v = 2    v = 3
0       1.5     0.44    1.5          0.05     0.06     0         −6.04    6.04
1       20.0    0.63    2.0          0.29     0.02     0          8.90   −8.90

Table 7
Scaled Reduced Cost Coefficients. Rows are indexed by the extreme point n, columns by the control index u = j.

Regime y = 0                        Regime y = 1
n     u = −1   u = 0   u = 1        n     u = −1   u = 0   u = 1
215   3        3       4            233   16       20      24
216   3        3       4            234   17       20      23
217   3        3       3            235   18       20      22
218   3        3       3            236   19       20      21
219   4        3       3            237   19       20      20
220   4        3       3            238   20       20      19
221   4        3       3            239   21       20      18
222   4        3       2            240   22       20      17

The driving force X satisfies the stochastic differential equation

dX(t) = b(Y (t)) dt + σ(Y (t)) dW (t)

with X(0) = 0 and Y (0) = 0. Notice that this model does not include any explicit control on X and that the coefficients depend on the level of wear y. From a modelling perspective, we briefly remark that since X is a diffusion process, the interpretation of this process as a “fraction of wear at the current level” has the implication that the deterioration of the machine can improve. One could replace the diffusion process by its running maximum so that the level of wear is monotone, as long as X is included in the model as a driving force so that the process is Markovian. Observe that the diffusion and its running maximum both hit a level within V at the same time so the repair mechanism would remain the same. The running maximum process increases singularly in time so would involve an additional singular generator along with an extra component (see e.g. [11] for a running maximum model). The model used in this section has the advantage of simplicity.
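The repair mechanism can likewise be sketched in a few lines of Python. This is an illustration only: the parameters are taken from Table 8, the repair locations anticipate Table 9, and the discrete-time shock mechanism and repair-at-threshold policy are simplifications, not the authors' computational method.

```python
import random

def repair_cycle(T=50.0, dt=1e-3, seed=1):
    """Sketch of the repair model: X reflects at 0 and drifts upward; hitting
    the chosen repair point resets X to 0 at cost c2; random shocks move the
    wear level up; at the top level, hitting 1 replaces the machine (cost c1)."""
    rng = random.Random(seed)
    b = {0: 0.7, 1: 0.8, 2: 0.9}
    sigma = 0.44
    repair_at = {0: 0.28, 1: 0.44}  # repair locations (cf. Table 9)
    c1 = {0: 40.0, 1: 50.0, 2: 100.0}  # level-up / replacement costs (Table 8)
    c2 = {0: 10.0, 1: 20.0}            # repair costs (no repair at level 2)
    shock = {0: 3.0, 1: 1.0}           # total rate of leaving levels 0 and 1 (Q of Table 8)
    x, y, cost, replaced = 0.0, 0, 0.0, 0
    for _ in range(int(T / dt)):
        if y < 2 and rng.random() < shock[y] * dt:  # random jump to an older state
            y = 2 if (y == 0 and rng.random() < 1.0 / 3.0) else y + 1
        x = abs(x + b[y] * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0))
        if y < 2 and x >= repair_at[y]:  # repair: reset X, keep the wear level
            x, cost = 0.0, cost + c2[y]
        elif x >= 1.0:                   # "broken" at the top level: replace
            x, y, cost, replaced = 0.0, 0, cost + c1[2], replaced + 1
    return cost, replaced

total_cost, n_replaced = repair_cycle()
```

Note that under this threshold policy the level-up costs c1[0] and c1[1] are never incurred, since the repair point is hit before {1}; levels advance only through the random shocks, which is the "cascading" behavior discussed with Table 9.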


Fig 1. Optimal Absolutely Continuous and Singular Control Policies in Regime y = 0.

Fig 2. Optimal Absolutely Continuous and Singular Control Policies in Regime y = 1.

The generators for this repair model are

Af(x, y) = b(y)fx(x, y) + (1/2)σ^2(y)fxx(x, y) + Σ_{z∈Y} f(x, z)q_{yz},

B0 f(x, y) = fx(x, y),

B1 f(x, y) = Σ_{i=0}^{l−1} [f(0, y + 1) − f(x, y)]I_{(1,i)}(x, y) + [f(0, 0) − f(x, y)]I_{(1,l)}(x, y),

B2 f(x, y, v) = Σ_{i=0}^{l−1} [f(0, y) − f(x, y)]I_{(v,i)}(x, y).

A is the jump-diffusion operator for the driving force, B0 captures the reflection of X at {0}, B1 indicates that Y increases one level when X hits {1}, but resets when Y is at its maximum, and B2 incorporates the control decisions. For each level i < l, the decision maker selects a position v at which repair occurs. Note that for this example, the singular controls are choices of v ∈ V; in the most general case V could be the whole X-state space [0, 1]. The processes under consideration make

(4.1)  f(X(t), Y(t)) − ∫_0^t Af(X(s), Y(s)) ds
              − ∫_0^t B0 f(X(s), Y(s)) dL0(s)
              − ∫_0^t B1 f(X(s−), Y(s−)) dN1(s)
              − ∫_0^t ∫_V B2 f(X(s−), Y(s−), v) Ψs(dv) dN2(s)


a martingale for every f ∈ C^2([0, 1] × Y), in which N2 is the counting process which counts the number of repairs. The cost criterion in which we are interested includes the cost of repairing or replacing the system and a cost associated with the second moment of the driving force, though with different coefficients for the different regimes so that higher levels of wear typically have higher costs. Let cabs(x, y) = c(y)x^2 denote the running cost related to the position x, in which c(y) allows for different cost rate factors for the different states of wear. Also let c1(x, y) denote the cost for replacement when the wear level is y; from the modelling, x = 1 when replacements occur. Finally, let c2(x, y, v) denote the cost for repairs. The objective is to minimize the long-term average cost given by

(4.2)  lim sup_{t→∞} t^{−1} E[ ∫_0^t cabs(X(s), Y(s)) ds
              + ∫_0^t c1(X(s−), Y(s−)) dN1(s)
              + ∫_0^t ∫_V c2(X(s−), Y(s−), v) Ψs(dv) dN2(s) ].

For an example of a specific cost structure see Section 4.2.

4.1. LP Formulation

It is helpful to carefully define the occupation measures before displaying the LP formulation. For each t > 0, define the measures (on the appropriate Borel sets)

μ^t(G) = t^{−1} E[ ∫_0^t I_G(X(s), Y(s)) ds ],
ν0^t({(0, i)}) = t^{−1} E[ I_{i}(Y(t)) L0(t) ],
ν1^t({(1, i)}) = t^{−1} E[ I_{i}(Y(t−)) N1(t) ],
ν2^t(G) = t^{−1} E[ ∫_0^t ∫_V I_G(X(s−), Y(s−), v) Ψs(dv) dN2(s) ].

It is important to notice that though ν2^t appears to be a measure on [0, 1] × Y × V, N2 only increases at times t such that X(t−) ∈ V. As a result, ν2^t only charges points (x, v) on the diagonal of V × V. We can therefore simplify notation by taking ν2^t to be a measure on Y × V. A tightness argument similar to that in Section 2.1 implies existence of weak limits (μ, ν0, ν1, ν2) of {(μ^t, ν0^t, ν1^t, ν2^t) : t > 0} as t → ∞. As a result, (4.1) being a martingale implies

(4.3)  ⟨Af, μ⟩ + ⟨B0 f, ν0⟩ + ⟨B1 f, ν1⟩ + ⟨B2 f, ν2⟩ = 0,  ∀f ∈ C^2([0, 1] × Y).

Thus the equivalent infinite-dimensional LP formulation for this repair model is

LP4:   Min.  ⟨cabs, μ⟩ + ⟨c1, ν1⟩ + ⟨c2, ν2⟩
       S.t.  ⟨Af, μ⟩ + ⟨B0 f, ν0⟩ + ⟨B1 f, ν1⟩ + ⟨B2 f, ν2⟩ = 0, ∀f ∈ C^2([0, 1] × Y),
             μ ∈ P([0, 1] × Y),
             ν0 ∈ M({0} × Y),
             ν1 ∈ M({1} × Y),
             ν2 ∈ M(Y × V).


Table 8
Parameters for the repair model

Level   Drift   Diff.   cabs(x, y)   c1(x, y)   c2(y, v)       Q (columns z = 0, 1, 2)
y       b(y)    σ(y)    c(y)         x = 1      1 ≤ v ≤ 99
0       0.7     0.44    1            40         10             −3    2    1
1       0.8     0.44    2            50         20              0   −1    1
2       0.9     0.44    3            100        30              0    0    0

The finite-dimensional approximation to LP4 is obtained using the test functions of the form f(x, y) = x^n I_{i}(y) in (4.3), with n = 0, . . . , M and i = 0, . . . , l. As before, this choice of functions results in conditions on the pseudo-moments associated with each measure (restricted to [0, 1] × {i} for each i), and the Hausdorff polytope associated with each measure is characterized through convex coefficient weights λi(n) on the corner points of the polytope.

4.2. Numerical Results

The numerical illustration in this section has three wear levels (y = 0, 1, 2) and allows the possibility of repair and/or replacement from the 99 values x ∈ V = {n/100 : n = 1, 2, . . . , 99}. The other parameters are listed in Table 8. Since there is no absolutely continuous control for this example, it is only necessary to look at which locations x in each of the states y = 0 and y = 1 repair occurs; recall that no repair is possible in state y = 2, so the only singular action is replacement. The masses of the measure ν1 on {1} × Y and the measure ν2 on Y × V are displayed in Table 9.

Table 9
Masses of the singular measures ν1 at (1, y) and ν2 at (y, v), where v = n/100 ∈ V ∪ {1}. Only the nonzero masses are listed; all other entries are 0.

n     state y = 0   state y = 1   state y = 2
28    0.647063      0             0
44    0             0.810149      0
100   0             0             0.526191


Notice, in particular, that repair occurs when x = 0.28 and the machine is "new" (y = 0), and at x = 0.44 when y = 1, and that the only mass when y = 2 is at x = 1, since repair is not allowed. The solution has a nice "cascading" structure in that the repair location for the higher level of wear is to the right of the repair location for the lower level, with the replacement being at the endpoint of the highest level of wear. Thus the random shocks increase the level so that the process is in a new position to the left of any place where singular control occurs. It should be noted that this structure is an artifact of the particular choice of parameters in the model; different choices of parameters lead to more complex repair policies.

References

[1] Beneš, V. E., Shepp, L. A. and Witsenhausen, H. S. (1980). Some solvable stochastic control problems. Stochastics 4 39–83.
[2] Bhatt, A. G. and Borkar, V. S. (1996). Occupation measures for controlled Markov processes: Characterization and optimality. Annals of Probability 24 1531–1562.
[3] Cho, M. J. and Stockbridge, R. H. (2002). Linear programming formulation for optimal stopping problems. SIAM Journal on Control and Optimization 40 1965–1982.
[4] Decker, T. (2006). Die Charakterisierung des verallgemeinerten Dale-Polytops und ihre Verwendung in linearen Programmen zur Lösung von Austrittszeit-, Stopp- und anderen Optimierungsproblemen. Dissertation, Humboldt-Universität, Berlin.
[5] Hausdorff, F. (1923). Momentprobleme für ein endliches Intervall. Mathematische Zeitschrift 16 220–248.
[6] Helmes, K. and Röhl, S. (2008). A geometrical characterization of multidimensional Hausdorff polytopes with applications to exit time problems. Math. Oper. Res. 33 315–326.
[7] Helmes, K., Röhl, S. and Stockbridge, R. H. (2001). Computing moments of the exit time distribution for Markov processes by linear programming. Operations Research 49 516–530.
[8] Helmes, K. and Stockbridge, R. H. (2000). Numerical comparison of controls and verification of optimality for stochastic control problems. Journal of Optimization Theory and Applications 106 107–127.
[9] Helmes, K. and Stockbridge, R. H. (2001). Numerical evaluation of resolvents and Laplace transforms of Markov processes. Mathematical Methods of Operations Research 53 309–331.
[10] Helmes, K. and Stockbridge, R. H. (2003). Extension of Dale's moment conditions with application to the Wright–Fisher model. Stochastic Models 19 255–267.
[11] Helmes, K. and Stockbridge, R. H. (2007). Linear programming approach to the optimal stopping of stochastic processes. Stochastics 79 309–335.
[12] Kaczmarek, P. (2006). Numerical analysis of a long-term average control problem. M.S. Thesis, University of Wisconsin–Milwaukee.
[13] Kaczmarek, P., Kent, S. T., Rus, G. A., Stockbridge, R. H. and Wade, B. A. (2007). Numerical solution of a long-term average control problem for singular stochastic processes. Math. Oper. Res. 66 451–473.
[14] Kurtz, T. G. (1991). A control formulation for constrained Markov processes. Mathematics of Random Media. Lectures in Applied Mathematics 27.


[15] Kurtz, T. G. and Stockbridge, R. H. (1998). Existence of Markov controls and characterization of optimal Markov controls. SIAM Journal on Control and Optimization 36 609–653.
[16] Kurtz, T. G. and Stockbridge, R. H. (2001). Stationary solutions and forward equations for controlled and singular martingale problems. Electronic Journal of Probability 6 paper 14, 1–52.
[17] Manne, A. S. (1960). Linear programming and sequential decisions. Management Science 6 259–267.
[18] Mendiondo, M. S. and Stockbridge, R. H. (1998). Approximation of infinite-dimensional linear programming problems which arise in stochastic control. SIAM Journal on Control and Optimization 36 1448–1472.
[19] Röhl, S. (2001). Ein linearer Programmierungsansatz zur Lösung von Stopp- und Steuerungsproblemen. Dissertation, Humboldt-Universität, Berlin.
[20] Stockbridge, R. H. (1990). Time-average control of martingale problems: A linear programming formulation. Annals of Probability 18 206–217.
[21] Zhang, Q. (2001). Stock trading: An optimal selling rule. SIAM Journal on Control and Optimization 40 64–87.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 155–167
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000363

A Degenerate Variance Control Problem with Discretionary Stopping Daniel Ocone1 and Ananda Weerasinghe2,∗ Rutgers University and Iowa State University Abstract: We consider an infinite horizon stochastic control problem with discretionary stopping. The state process is given by a one dimensional stochastic differential equation. The diffusion coefficient is chosen by an adaptive choice of the controller and it is allowed to take the value zero. The controller also chooses the quitting time to stop the system. Here we develop a martingale characterization of the value function and use it and the principle of smooth fit to derive an explicit optimal strategy when the drift coefficient of the state process is of the form b(x) = −θx where θ > 0 is a constant.

1. Introduction. Degenerate variance control problems are those in which the controller has access to the diffusion coefficient in the state dynamics and may even set it to zero. A simple model is the one-dimensional equation

(1.1)  X_x^u(t) = x + ∫_0^t b(X_x^u(s)) ds + ∫_0^t u(s) dW(s),

where x is a real number, {W(t) : t ≥ 0} is a standard one-dimensional Brownian motion and u(·) is a suitably adapted control process subject to the constraint

(1.2)  0 ≤ u(t) ≤ σ0 for all t ≥ 0.

Here σ0 is a given positive constant. Several control problems based on (1.1)–(1.2) are considered in the literature. Assaf [1] studies minimizing a combination of location and control cost when σ0 ↑ ∞ for a specific control problem generated by a model for dynamic sampling. Papers [10] and [11] generalize Assaf's control structure, but for σ0 < ∞. In these papers the cost is a discounted, infinite horizon integral of location cost, which increases as the state approaches the origin, plus a control cost, which increases in the control effort u. The deterministic solutions to ẋ = b(x) associated with fully degenerate control (u ≡ 0) evolve toward the origin in the direction of higher cost. In [10] and [11], it is shown how to construct cost minimizing controls of bang-bang type that use maximum variance control (u = σ0) to move the state to lower cost regions.

∗This research is partially supported by Army Research Office Grant W911NF0510032.
1 Department of Mathematics, Rutgers University, New Brunswick, NJ 08854, e-mail: [email protected]
2 Department of Mathematics, Iowa State University, Ames, IA 50011, e-mail: [email protected]
AMS 2000 subject classifications: Primary 93E20; secondary 60H30.
Keywords and phrases: degenerate variance control, diffusion processes, discretionary stopping.

Daniel Ocone and Ananda Weerasinghe

This article presents an example of control of (1.1) combined with discretionary stopping. The object is now to choose a control u(·) and a stopping time τ to maximize the reward functional

(1.3)  J(x, u, τ) = E ∫_0^τ e^{−αt} C(X_x^u(t)) dt.

Here, the discount rate α is a positive constant. The value function is given by

(1.4)  V(x) = sup_U J(x, u, τ),

where U is a collection of all policies (u(·), τ) to be described precisely below. We have in mind here a situation in which: (i) the origin is a unique asymptotically stable equilibrium point for ẋ = b(x); and (ii) C(·) is a unimodal function with a unique positive maximum at the origin and lim_{x→−∞} C(x) = lim_{x→∞} C(x) = −∞. We will place more restrictive hypotheses on b(·) and C(·) for the statements and proofs of the results, but the assumptions (i) and (ii) will serve for motivation. In this case, the solution to ẋ = b(x) obtained for the zero variance control u ≡ 0 does evolve in a favorable direction, in contrast to the problems in [1], [10] and [11]. It is of interest to ask whether and, if so, where, positive variance control should be employed to boost the expected reward. Consider first the deterministic stopping problem when u ≡ 0 is imposed. The discounted reward if τ = ∞ is given by V0(x) = E[∫_0^∞ e^{−αt} C(X_x^0(t)) dt]. If (i) and (ii) hold and, say, b grows linearly, V0(x) decreases with increasing |x|, and there will be constants −∞ < a0 < 0 < b0 < ∞ such that V0(x) > 0 if and only if a0 < x < b0. The optimal choice of τ is then easy; the controller, having the option of stopping, will not accept a negative reward. Hence τ = ∞ if a0 < x < b0, and τ = 0 otherwise. Consider next adding the possibility of positive variance control. Intuitively, if the state is close to the origin, positive variance control ought never to be applied, as diffusive behavior of the state would lessen the reward. However, let x be a point larger than b0 but close to it. Then, if u(·) is positive, some sample paths of X_x^u(·) will move more quickly toward the origin than the solution of ẋ = b(x), and doing so will enable an overall positive reward; at the same time, the option to stop allows bailing out along sample paths that move the wrong way. Therefore, one should have positive expected reward even for some x > b0.
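As a concrete instance of this deterministic computation (with the stand-in choices θ = 1, α = 1 and C(x) = 1 − x², which are our illustrative assumptions, not the paper's general data): X_x^0(t) = x e^{−t}, so V0(x) = ∫_0^∞ e^{−t}(1 − x² e^{−2t}) dt = 1 − x²/3, giving b0 = −a0 = √3. A short numeric check of this closed form:

```python
import math

# Stand-in data: b(x) = -x, alpha = 1, C(x) = 1 - x**2, so X_x^0(t) = x e^{-t}.
def V0(x, T=40.0, n=4000):
    """Trapezoidal approximation of int_0^T e^(-t) * C(x e^(-t)) dt."""
    h = T / n
    f = lambda t: math.exp(-t) * (1.0 - (x * math.exp(-t)) ** 2)
    return h * (0.5 * f(0.0) + sum(f(k * h) for k in range(1, n)) + 0.5 * f(T))

# b0 is the positive root of V0; bisect on [0, 3] (V0 decreases in |x|).
lo, hi = 0.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if V0(mid) > 0.0 else (lo, mid)

b0 = 0.5 * (lo + hi)
print(b0)  # close to sqrt(3)
```

The stopping region of the deterministic problem is then {x : x ≤ −b0 or x ≥ b0}, matching the discussion above.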
Assuming concavity of C and linearity of b, the main result of this paper verifies this scenario and shows how to construct an optimal feedback control and stopping rule. For precise results, the following conditions will often be assumed throughout this article.

(1.5) (i) The function b is continuously differentiable on R and b(0) = 0.

(1.6) (ii) C(·) is a twice continuously differentiable, strictly concave function which attains its unique maximum at x = 0 and C(0) = 1.

The continuous differentiability of b(·) assumed in (1.5) guarantees local existence and uniqueness of solutions to (1.1). The requirement that C(0) = 1 is just a normalization convention of no consequence to the results.

Discretionary Stopping

157

Admissible controls are defined precisely as follows: An admissible control system is a quintuple ((Ω, F, P), {Ft}, W(·), u(·), τ) such that (Ω, F, P) is a complete probability space, {Ft} is a right-continuous, complete filtration, W(·) is a one-dimensional Brownian motion adapted to {Ft} such that W(t + s) − W(t) is independent of Ft for all t > 0 and s > 0, u(·) is an {Ft}-progressively measurable process satisfying (1.2), and τ is an {Ft}-stopping time less than or equal to the explosion time of the solution to (1.1). (In the situation of interest in this paper, xb(x) ≤ 0 for all x and the explosion time is infinite almost surely.) With a slight abuse of notation, henceforth we denote an admissible policy by the pair (u, τ). The class of admissible policies is denoted by U, and this is the class that should be used in (1.4) in the definition of the value function.

Theorem 1.1. Let the drift coefficient in (1.1) be given by b(x) = −θx for all x, where θ > 0 is a constant. Assume C(·) satisfies (1.6). Then an explicit representation of the value function V(·) defined in (1.4) is given in (3.11), and the value function is continuously differentiable everywhere. Furthermore, there exist four points c∗ < p∗ < 0 < q∗ < d∗ so that the following admissible control strategy (u∗, τ∗), with the corresponding state process X_x^{u∗}(·), is an optimal strategy.

1. If x ≤ c∗ or x ≥ d∗, then choose τ∗ = 0 and stop.
2. If p∗ ≤ x ≤ q∗, then choose τ∗ = ∞ and u∗(t) = 0 for all t, and follow the deterministic motion.
3. If q∗ < x < d∗, then choose u∗(t) = σ0 and let τ̂ be the first exit time of the process X_x^{u∗}(·) from the interval (q∗, d∗). Thereafter, follow step 1 or 2 as appropriate. In this case, τ∗ = τ̂ I[X_x^{u∗}(τ̂) = d∗] + ∞ · I[X_x^{u∗}(τ̂) = q∗].
4. If c∗ < x < p∗, then choose u∗(t) = σ0 and let τ̂ be the first exit time of the process X_x^{u∗}(·) from the interval (c∗, p∗). Thereafter, follow step 1 or 2 as appropriate.
In this case, τ ∗ = τˆI[Xxu∗ (ˆτ )=c∗ ] + ∞ · I[Xxu∗ (ˆτ )=p∗ ] . That allowing positive variance control boosts the expected reward in a way similar to Theorem 1.1 should be a general fact. The linearity of b(·) and the concavity of C(·) are used to derive the particularly simple optimal policy of Theorem 1.1 by smooth fit. The value function V (·) is continuously differentiable and thus “the principle of smooth fit” holds for the first derivative of V (·) and its second derivative has jump discontinuities only at the points c∗ , p∗ , q ∗ and d∗ . Some of the results preliminary to the proof of Theorem 1.1 are proved under more general assumptions. Solvable stochastic control problems with discretionary stopping have received attention recently(see [2],[4],[6],[7],[9] and [13]). Variational inequalities related to higher dimensional problems are developed in [9]. A discretionary stopping problem arising in mathematical finance is addressed in [6]. The articles [2] and [4] treat singular stochastic control problems with discretionary stopping, while [13] studies a two player stochastic differential game with degenerate variance control. The existence and characterization of optimal Markov controls for several types of stochastic control problems are developed in [8]. They use a martingale problem approach. To obtain their results, they show that the original stochastic control problem is equivalent to a linear programming problem over a space of measures.
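The four-region policy of Theorem 1.1 can be simulated directly. The sketch below applies Euler–Maruyama to (1.1) with b(x) = −θx; the threshold values c*, p*, q*, d* and the parameters θ, σ_0 are illustrative placeholders (the true thresholds come from the construction in Section 3), so this is a demonstration of the policy's structure, not of its optimality.

```python
import numpy as np

# Illustrative parameters; the thresholds below stand in for the points
# c* < p* < 0 < q* < d* constructed in Section 3 (they are NOT computed
# from a particular reward C).
theta, sigma0 = 1.0, 0.5
c_star, p_star, q_star, d_star = -2.0, -0.5, 0.5, 2.0

def u_star(x):
    """Feedback variance control of Theorem 1.1: full variance sigma0 on
    (q*, d*) and (c*, p*), zero variance on [p*, q*] (deterministic decay)."""
    if q_star < x < d_star or c_star < x < p_star:
        return sigma0
    return 0.0

def simulate(x, dt=1e-3, n_max=100_000, rng=np.random.default_rng(0)):
    """Euler-Maruyama for dX = -theta*X dt + u*(X) dW.  The stopping rule
    tau* is realized as: stop immediately outside (c*, d*) (step 1),
    otherwise continue; with u* = 0 on [p*, q*] the path decays toward 0
    and is never stopped (step 2)."""
    for _ in range(n_max):
        if x <= c_star or x >= d_star:
            return x, "stopped"
        x += -theta * x * dt + u_star(x) * np.sqrt(dt) * rng.standard_normal()
    return x, "running"
```

Started in [p*, q*] the variance control is off, so the path is the deterministic decay x e^{−θt} of step 2; started in (q*, d*) the process diffuses at full variance until it exits that interval, realizing step 3.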


Daniel Ocone and Ananda Weerasinghe

In [7], the authors address a finite time horizon problem with combined control and discretionary stopping. Their control process affects only the drift coefficient. Motivated by their martingale characterization of the optimal strategy, we also formulate a martingale characterization for the value function in Section 2. We use it in Section 3 to construct the optimal state process of Theorem 1.1. Our optimal control is of feedback type, and hence the optimal state process is a Markov process. As noted in [7], this martingale condition is analogous to the "equalization" condition developed by Dubins and Savage [3] in a discrete time context.

2. A Martingale Formulation

The first result in this section is a martingale characterization of the optimal value function. Then, we derive simple bounds and monotonicity properties of the value function. Since b(0) = 0 and C(x) ≤ C(0), we are also able to show that the origin can be considered as an absorption point without any change in the value function. This enables us to solve the control problem in each of the regions (−∞, 0) and (0, +∞) separately and then to paste the two solutions together.

For the results in this section, we do not need the full power of assumptions (1.5) and (1.6). The martingale characterization theorem remains valid under quite general assumptions, as listed below. All the other results in this section remain valid if the drift coefficient b(·) is a continuously differentiable function which satisfies b(0) = 0 and C(·) is a continuous function which is strictly increasing on (−∞, 0) and strictly decreasing on (0, ∞). We take C(0) = 1 for simplicity.

The following theorem requires only that the drift coefficient b(·) in (1.1) be continuous and that the reward function C(·) in (1.3) be continuous and bounded above by a constant.

Theorem 2.1. Let Q(·) be a non-negative, bounded continuous function defined on R and let the initial point x be fixed.

(i) If Q(X_x^u(t ∧ τ)) e^{−α(t∧τ)} + ∫_0^{t∧τ} e^{−αs} C(X_x^u(s)) ds is a super-martingale for the state process X_x^u(·) corresponding to each admissible control policy (u, τ) in U, then Q(x) ≥ V(x).

(ii) If Q(·) satisfies condition (i) above, and if there is a state process Z_x^{u*}(·) corresponding to an admissible control policy (u*, τ*) so that Q(Z_x^{u*}(t ∧ τ*)) e^{−α(t∧τ*)} + ∫_0^{t∧τ*} e^{−αs} C(Z_x^{u*}(s)) ds is a martingale and Q(Z_x^{u*}(τ*)) = 0 on the set [τ* < ∞], then Q(x) = V(x), Z_x^{u*}(·) is an optimal state process, and (u*, τ*) is the corresponding optimal control policy.

Proof. Let X_x^u(·) be a state process corresponding to an admissible control policy (u, τ). Then, using the super-martingale property in condition (i) and the non-negativity of the function Q(·), we obtain

(2.1)    Q(x) ≥ E[ ∫_0^{t∧τ} e^{−αs} C(X_x^u(s)) ds ]    for all t ≥ 0.

Since C(·) is bounded above, we have

(2.2)    lim_{t→∞} E[ ∫_0^{t∧τ} e^{−αs} C(X_x^u(s)) ds ] = E[ ∫_0^τ e^{−αt} C(X_x^u(t)) dt ].

Therefore, by (2.1) and (2.2) we obtain

Q(x) ≥ E[ ∫_0^τ e^{−αt} C(X_x^u(t)) dt ],

and consequently Q(x) ≥ V(x). The proof of part (i) is complete.

Now let (u*, τ*) be an admissible control policy with associated state process Z_x^{u*}(·) which satisfies the assumptions in part (ii). Using the martingale condition, we have

(2.3)    Q(x) = E[ Q(Z_x^{u*}(t ∧ τ*)) e^{−α(t∧τ*)} + ∫_0^{t∧τ*} e^{−αs} C(Z_x^{u*}(s)) ds ].

Using the fact that Q(Z_x^{u*}(τ*)) = 0 on the set [τ* < ∞], we obtain

E[ Q(Z_x^{u*}(t ∧ τ*)) e^{−α(t∧τ*)} ] = E[ Q(Z_x^{u*}(t)) I_{[t<τ*]} e^{−αt} ],

and the right-hand side tends to zero as t → ∞, since Q(·) is bounded. Letting t → ∞ in (2.3), we conclude that Q(x) = J(x, u*, τ*) ≤ V(x). Together with part (i), this yields Q(x) = V(x), so that Z_x^{u*}(·) is an optimal state process and (u*, τ*) is the corresponding optimal control policy.

Next, let X_x^u(·) be the state process corresponding to an admissible control policy (u, τ), and introduce the hitting time

(2.4)    τ_0^x = inf{ t ≥ 0 : X_x^u(t) = 0 }

together with the modified control process

(2.5)    ũ(t) = u(t) I_{[0, τ_0^x]}(t),

so that ũ(t) = u(t) for t ≤ τ_0^x and ũ(t) = 0 for t > τ_0^x.

Since the drift term b(·) is continuously differentiable, we can consider the associated state process X_x^ũ(·) on the same probability space, using the equation (1.1). The condition b(0) = 0 implies that X_x^ũ(t) = X_x^u(t ∧ τ_0^x) for all t ≥ 0. Hence, we have the following proposition.

Proposition 2.2. Assume that the drift b(·) is a continuously differentiable function which satisfies b(0) = 0, and that the reward function C(·) is a continuous function which is strictly increasing on (−∞, 0), strictly decreasing on (0, ∞) and satisfies C(0) = 1. Let the state processes X_x^u(·) and X_x^ũ(·) be defined as above. Then the following results hold.


(i) J(x, u, τ) ≤ J(x, ũ, τ) for each stopping time τ. Furthermore, if we let D be the sub-collection of admissible control policies (ũ, τ) in U for which the corresponding state process X_x^ũ(·) is stopped at the origin, then

(2.6)    V(x) = sup_{(ũ,τ)∈D} J(x, ũ, τ).

(ii) V(x) ≤ 1/α for all x, and V(0) = 1/α.

Proof. Since X_x^ũ(t) = X_x^u(t ∧ τ_0^x) for all t ≥ 0 and the reward function C(·) has a unique maximum at the origin, it follows that C(X_x^ũ(t)) ≥ C(X_x^u(t)) for all t ≥ 0. Therefore, J(x, u, τ) ≤ J(x, ũ, τ) for any stopping time τ. As an immediate consequence, V(x) = sup_D J(x, ũ, τ) follows.

To prove part (ii), observe that C(X_x^u(t)) ≤ C(0) for all t, and consequently J(x, u, τ) ≤ C(0)/α for each admissible policy (u, τ). Hence V(x) ≤ 1/α. If the initial point is at the origin, one can choose u_0(t) ≡ 0 and τ_∞ = ∞ to obtain J(0, u_0, τ_∞) = 1/α. Hence V(0) = 1/α. This completes the proof.

Remark. Notice that Proposition 2.2 implies that if the assumption in part (i) of Theorem 2.1 holds for the admissible control policies in the sub-collection D, then the conclusion there still remains valid.

The next lemma establishes monotonicity of the value function.

Lemma 2.3. Under the assumptions of Proposition 2.2, the value function V(·) defined in (1.4) is non-negative, monotone increasing on (−∞, 0) and monotone decreasing on (0, ∞).

Proof. If we choose the zero stopping time, J(x, u, 0) = 0, and hence V(x) ≥ 0 for all x. Here, we show that V(·) is decreasing on (0, ∞); a similar argument works on (−∞, 0). Let x > y > 0 and let (u, τ) be any admissible control policy. Because of the assumed continuous differentiability of b(·), the solutions to (1.1) are pathwise unique, and so X_y^u(t) ≤ X_x^u(t) for all t ≥ 0. Now introduce τ_0^y as in (2.4) and the admissible control process ũ(t) = u(t) I_{[0, τ_0^y]}(t) as in (2.5). The state process X_y^ũ(·) is given by X_y^ũ(t) = X_y^u(t ∧ τ_0^y). Then, by the proof of Proposition 2.2, it follows that

(2.7)    J(y, ũ, τ) = E[ ∫_0^{τ∧τ_0^y} e^{−αt} C(X_y^u(t)) dt + ∫_{τ∧τ_0^y}^{τ} e^{−αt} C(0) dt ] > J(x, u, τ).

Therefore, V(y) ≥ V(x) when x > y > 0. This completes the proof.

Next, we show that any smooth solution to the corresponding Hamilton–Jacobi–Bellman (HJB) equation of the discretionary stopping problem is an upper bound for the value function.

Proposition 2.4. Make the same assumptions as in Proposition 2.2. Let Q(·) be a non-negative, bounded and continuously differentiable function which satisfies the following:


(i) Q''(·) is continuous everywhere except on a finite set. Furthermore, the one-sided derivatives Q''(x−) and Q''(x+) exist and are finite for all x.

(ii) There is a positive constant M so that |Q''(x)| < M for all x.

(iii) The function Q(·) satisfies the HJB equation

max{ sup_{0≤u≤σ_0} [ (u²/2) Q''(x) + b(x) Q'(x) − αQ(x) + C(x) ], −Q(x) } = 0

for almost every x in R.

Then Q(x) ≥ V(x) for all x.

Proof. We use Proposition 2.2 and consider an initial point x > 0. We intend to verify condition (i) of Theorem 2.1 for all the admissible control policies in D (see also the remark below the proof of Proposition 2.2). Let X_x^u(·) be the state process which satisfies (1.1) corresponding to an admissible control policy (u, τ) in D. Using a mollification of the function Q(·) to smooth it, together with Theorem 7.1 on page 218 and Exercise 7.10 on page 225 of [5] (see also Appendix D, page 301 of [12]), we can extend Itô's lemma to the function Q(·) to obtain that

Q(X_x^u(t ∧ τ)) e^{−α(t∧τ)} − ∫_0^{t∧τ} [ (u(s)²/2) Q''(X_x^u(s)) + b(X_x^u(s)) Q'(X_x^u(s)) − αQ(X_x^u(s)) ] e^{−αs} ds

is a martingale. Therefore, using assumption (iii), we observe that Q(X_x^u(t ∧ τ)) e^{−α(t∧τ)} + ∫_0^{t∧τ} C(X_x^u(s)) e^{−αs} ds is a super-martingale. Now the conclusion follows from part (i) of Theorem 2.1.

Remark. Let Q satisfy the conditions of Proposition 2.4. Any function u* satisfying

(2.8)    ((u*(x))²/2) Q''(x) + b(x) Q'(x) − αQ(x) + C(x) = sup_{0≤u≤σ_0} [ (u²/2) Q''(x) + b(x) Q'(x) − αQ(x) + C(x) ] = 0    a.e. where Q(x) > 0,

is a natural candidate for an optimal control; it is not necessary to define u*(x) on the set where Q(x) = 0, since it is optimal to stop on this set. Solutions u* to (2.8) are easy to come by; for example, u* = σ_0 1_G(x), where G = {x : Q''(x) > 0}, will work. But (2.8) does not uniquely specify u*, because it does not prescribe its values at points x such that Q''(x) = 0 or such that Q''(x) is not defined. Not all choices of u* will necessarily work. First, u* must be chosen so that there is at least a weak solution to (1.1) using u* as a feedback control. A discussion in [10] shows that this will be the case for a model like (1.1) when u* is the indicator of an open set, but that other choices of u* off the set G = {x : Q''(x) > 0} may not work. Second, one must verify that for the solution X^{u*} corresponding to the feedback control u*, the process Q(X_x^{u*}(t ∧ τ)) e^{−α(t∧τ)} + ∫_0^{t∧τ} C(X_x^{u*}(s)) e^{−αs} ds is a martingale, and this requires us to pay some attention to how X^{u*} behaves in the regions where the feedback variance control degenerates. We do not attempt to frame a general theorem on synthesis of an optimal control under the hypotheses of Proposition 2.4. In the problem that we analyze in the next section, our candidate for an optimal strategy exercises maximum variance control in a disjoint union G of two open intervals, and only until the corresponding state process X hits a boundary point of G. After the state process X hits a boundary point of G, it never returns to G. Therefore the construction of the candidate process and the proof of optimality present no difficulties.


3. Linear Drift

In this section, we use the assumption (1.6) and the drift term b(x) = −θx for all x, where θ > 0 is a constant. Therefore, for a given control process u(·), the corresponding state process in (1.1) takes the form

(3.1)    X_x^u(t) = x − θ ∫_0^t X_x^u(s) ds + ∫_0^t u(s) dW(s).

First we derive the properties of the pay-off function with zero control and infinite stopping time. Notice that when u(t) = 0 for all t, the state process is given by X_x^0(t) = x e^{−θt}. Therefore, the pay-off function from the zero control and the infinite stopping time is given by J(x, 0, +∞); for convenience, we label it V_0(x). Hence,

(3.2)    V_0(x) = ∫_0^∞ e^{−αt} C(x e^{−θt}) dt.

Notice that the above integral is uniformly convergent on compact sets.

Lemma 3.1. Let V_0(·) be as in (3.2). Then the following results hold:

(i) V_0(·) satisfies the differential equation θx V_0'(x) + αV_0(x) = C(x) for all x, and V_0(0) = 1/α.

(ii) For x ≠ 0, V_0(·) is given by V_0(x) = (1/(θ x^{α/θ})) ∫_0^x C(r) r^{α/θ − 1} dr, and lim_{x→0} V_0(x) = 1/α.

(iii) V_0(·) is strictly increasing on (−∞, 0) and strictly decreasing on (0, ∞). Furthermore, V_0(·) is a strictly concave function which has a unique maximum at x = 0.

Proof. The proofs of parts (i) and (ii) are straightforward. The limit lim_{x→0} V_0(x) can be computed using L'Hôpital's rule. To prove part (iii), notice that the following formulas also follow from (3.2):

(3.3)    V_0'(x) = ∫_0^∞ e^{−(θ+α)t} C'(x e^{−θt}) dt    for all x,

and

(3.4)    V_0''(x) = ∫_0^∞ e^{−(2θ+α)t} C''(x e^{−θt}) dt    for all x.

Now, using the assumption (1.6) for C(·), part (ii) of Proposition 2.2 and the above formulas, the conclusions of part (iii) hold.

Since V_0(·) is a concave function with a unique global maximum at x = 0 and V_0(0) = 1/α > 0, there exist two points a_0 and b_0 so that a_0 < 0 < b_0 and V_0(a_0) = V_0(b_0) = 0. Furthermore, the set [V_0 > 0] is equal to the open interval (a_0, b_0).

Let us introduce the infinitesimal generator G related to the Ornstein–Uhlenbeck process corresponding to the constant control u(t) ≡ σ_0 for all t ≥ 0 in (3.1):

(3.5)    G = (σ_0²/2) d²/dx² − θx d/dx.
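The closed-form expression in part (ii) of Lemma 3.1 can be checked numerically against the defining integral (3.2). The sketch below uses the illustrative choice C(x) = 1 − x² (so C(0) = 1, C is strictly concave, increasing on (−∞, 0) and decreasing on (0, ∞)); this C and the parameter values are assumptions for the demonstration only. For this C, solving θx V_0' + αV_0 = C gives V_0(x) = 1/α − x²/(α + 2θ).

```python
import numpy as np

# Illustrative reward (an assumption, not from the paper): C(x) = 1 - x^2.
alpha, theta = 0.5, 1.0
C = lambda x: 1.0 - x**2

def V0_integral(x, T=60.0, n=60001):
    """Evaluate V0(x) = int_0^inf e^{-alpha t} C(x e^{-theta t}) dt from (3.2)
    by truncating at T and applying the trapezoid rule."""
    t = np.linspace(0.0, T, n)
    y = np.exp(-alpha * t) * C(x * np.exp(-theta * t))
    h = t[1] - t[0]
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def V0_closed_form(x):
    """For C(x) = 1 - x^2, the ODE theta*x*V0' + alpha*V0 = C of
    Lemma 3.1(i) is solved by V0(x) = 1/alpha - x^2/(alpha + 2*theta)."""
    return 1.0 / alpha - x**2 / (alpha + 2.0 * theta)
```

The two agree to quadrature accuracy at any initial point, and V_0(0) = 1/α as part (i) asserts.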


For a constant α > 0, we also write G − α for

(3.6)    G − α = (σ_0²/2) d²/dx² − θx d/dx − α.

Consider next the family of solutions Q_d(·), for d ≥ b_0, of

(3.7)    (G − α) Q_d(x) + C(x) = 0 for all x > 0,    Q_d(d) = Q_d'(d) = 0.

Our aim is to build the value function on (0, ∞) from V_0(·) and Q_{d*}(·), where the point d* > b_0 is chosen so that Q_{d*}(·) meets V_0(·) tangentially.

Lemma 3.2. Let V_0(·) be as in Lemma 3.1 and the family of functions Q_d(·) be as described above. Then the following hold:

(i) There is a δ_1 > 0 so that for each d in (b_0, b_0 + δ_1), Q_d(·) meets V_0(·) at some point in the interval (0, b_0).

(ii) There is a point l_0 > b_0 so that for every d > l_0, Q_d(x) > V_0(x) for all x > 0.

Proof. By part (iii) of Lemma 3.1, we have V_0'(x) < 0 for all x > 0. By evaluating the differential equation for V_0(·) in part (i) of Lemma 3.1 at the point b_0 and using V_0'(b_0) < 0, we conclude C(b_0) < 0. Now consider the function Q_{b_0}(·) which satisfies (3.7) with d = b_0. Then Q_{b_0}(b_0) = Q_{b_0}'(b_0) = V_0(b_0) = 0, and evaluating (3.7) for the function Q_{b_0}(·) at the point b_0, we obtain Q_{b_0}''(b_0) = −(2/σ_0²) C(b_0) > 0. Therefore, the function Q_{b_0}(·) is strictly convex in an interval (b_0 − ε, b_0 + ε), while V_0(·) is strictly concave everywhere. Hence, there is a δ_0 > 0 so that Q_{b_0}(x) < V_0(x) for all x in (b_0 − δ_0, b_0). The solutions Q_d(x) of (3.7) are jointly continuous in (d, x), and therefore we can find a δ_1 > 0 so that

Q_d(b_0 − δ_0/2) < V_0(b_0 − δ_0/2)

for all d in [b_0, b_0 + δ_1). By (3.7), Q_d''(d) = −(2/σ_0²) C(d) > 0, and hence the function Q_d(·) is strictly convex in a neighborhood of the point x = d. Consequently, Q_d'(x) < 0 in an interval (d − ε_d, d) for some ε_d > 0. For each d in (b_0, b_0 + δ_1), we intend to show that Q_d(x) > 0 for all x in (b_0, d). For this, it suffices to prove Q_d'(·) < 0 on the interval (b_0, d). We let η = inf{x : Q_d'(y) < 0 for all y in (x, d)}. The above set is non-empty, since Q_d'(·) < 0 on the interval (d − ε_d, d). Notice that we attain our conclusion if we can show η ≤ b_0. Suppose that η > b_0. Then, clearly, Q_d(η) > 0, Q_d'(η) = 0 and, by (3.7), Q_d''(η) = (2/σ_0²)[αQ_d(η) − C(η)] > 0. Hence, Q_d'(x) > 0 for all x in an interval (η, η + ε') for some ε' > 0. This contradicts the definition of η, and hence we conclude that η ≤ b_0. From this, it follows that Q_d'(·) < 0 on (b_0, d). Therefore, Q_d(x) > 0 > V_0(x) on (b_0, d). We have already shown that Q_d(b_0 − δ_0/2) < V_0(b_0 − δ_0/2) for each d in (b_0, b_0 + δ_1). Therefore, Q_d(·) intersects V_0(·) at some point in the interval (0, b_0) for each d in (b_0, b_0 + δ_1).

Next, we intend to show that for large values of d, Q_d(·) does not intersect V_0(·) at all. First we prove that for each d > b_0, Q_d(·) is strictly decreasing on the interval (b_0, d). By evaluating (3.7) at the point d, we know that Q_d''(d) > 0, and hence Q_d(·) is strictly decreasing in an interval (d − ε, d) for some ε > 0. If Q_d'(ζ) = 0 and


Q_d(ζ) > 0 for some ζ in the interval (b_0, d), then by (3.7), and by the fact that C(x) < 0 for all x > b_0, we obtain Q_d''(ζ) > 0. Hence, x = ζ is necessarily a local minimum for Q_d(·). Therefore, Q_d(·) cannot have any local maxima on the interval (b_0, d), and consequently it is strictly positive and strictly decreasing on (b_0, d).

Next, we show that lim_{d→∞} Q_d(b_0) = ∞. By (3.7), we obtain

(σ_0²/2) Q_d''(x) + C(x) > θx Q_d'(x)

for all x in (b_0, d). By integrating this, using integration by parts on the right-hand side and using the boundary conditions in (3.7), we obtain

−(σ_0²/2) Q_d'(x) + ∫_x^d C(u) du > −θx Q_d(x) − θ ∫_x^d Q_d(u) du

for all x in (b_0, d). Next, integrating the above inequality again and using the fact that Q_d(·) is decreasing on (b_0, d), we derive

(σ_0²/2) Q_d(b_0) + ∫_{b_0}^d ∫_x^d C(u) du dx > −(θ/2) [ (d − b_0)² + (d² − b_0²) ] Q_d(b_0).

But Q_d(b_0) > 0, and therefore we obtain

(3.8)    ( σ_0²/2 + θd² ) Q_d(b_0) + ∫_{b_0}^d ∫_x^d C(u) du dx > 0.

Since C(·) is a strictly concave, strictly decreasing function and C(b_0) < 0, there are two constants k_0 and k_1 so that k_1 > 0 and C(x) < k_0 − k_1 x for all x > b_0. Therefore, we obtain the estimate

∫_{b_0}^d ∫_x^d C(u) du dx < −(k_1/3) d³ + (k_0/2)(d − b_0)² + (k_1 b_0/2) d².

Consequently, lim_{d→∞} (1/d²) ∫_{b_0}^d ∫_x^d C(u) du dx = −∞. This, together with (3.8), implies that lim_{d→∞} Q_d(b_0) = ∞. Therefore, we can conclude that there is a point l_0 > b_0 so that for every d > l_0, Q_d(b_0) > 1/α.

Now let d > l_0 and suppose that Q_d(·) intersects V_0(·) at some point in (0, b_0). Then Q_d(·) attains a positive local maximum at some point ζ in (0, b_0), and Q_d(·) is decreasing on (ζ, b_0). Then, by (3.7) we obtain (σ_0²/2) Q_d''(ζ) + C(ζ) = αQ_d(ζ). But αQ_d(ζ) ≥ αQ_d(b_0) > C(0) > C(ζ), and hence Q_d''(ζ) > 0, so Q_d(·) cannot have a local maximum at x = ζ. Consequently, Q_d(·) cannot intersect V_0(·) at any point in (0, b_0), Q_d(·) is strictly decreasing on (0, ∞), and Q_d(x) > V_0(x) for all x ≥ 0. This proves part (ii) of the lemma.

Now consider

(3.9)    d* = sup{ d > b_0 : Q_d(·) satisfies (3.7) and intersects V_0(·) }.

By part (i) of the above lemma, this set is non-empty and d* is well defined. By part (ii) of the lemma, d* is finite and d* < l_0. Next, we consider the function Q_{d*}(·) and show that its graph intersects the graph of V_0(·) tangentially.

Lemma 3.3. Let d* be as in (3.9) and consider the function Q_{d*}(·) which satisfies (3.7) with d = d*. Then the following results hold:

(i) There is a point q* in (0, b_0) so that Q_{d*}(·) intersects V_0(·) at the point q*, and Q_{d*}(·) is a strictly decreasing convex function on the interval (q*, d*).


(ii) Q_{d*}'(q*) = V_0'(q*).

Proof. If Q_{d*}(·) does not intersect V_0(·) in the interval [0, b_0], then by the joint continuity of Q_d(·) in the variables (d, x), there is an ε > 0 so that Q_d(·) does not intersect V_0(·) on [0, b_0] for each d in (d* − ε, d*), and this contradicts the definition of d* in (3.9). Hence, Q_{d*}(·) intersects V_0(·) at least once in the interval [0, b_0].

Now let q* = sup{z in [0, b_0] : Q_{d*}(z) = V_0(z)}. Then Q_{d*}(q*) = V_0(q*), Q_{d*}(x) > V_0(x) on (q*, d*), and Q_{d*}'(q*) ≥ V_0'(q*). Let us introduce the function P(x) = Q_{d*}''(x) on [q*, d*]. Notice that

(σ_0²/2) P(q*) = θq* Q_{d*}'(q*) + αQ_{d*}(q*) − C(q*) ≥ θq* V_0'(q*) + αV_0(q*) − C(q*) = 0.

By (3.7), and since d* > b_0, we have P(d*) > 0. We intend to show that the function P(·) is increasing on (q*, d*). Differentiating (3.7), we derive

(3.10)    (σ_0²/2) P'(q*) = θq* P(q*) + (θ + α) Q_{d*}'(q*) − C'(q*) ≥ (θ + α) V_0'(q*) − C'(q*) = −θq* V_0''(q*) > 0.

Here, we have differentiated the differential equation for V_0(·) in Lemma 3.1 and used it in the last equality of (3.10). Hence, P(·) is strictly increasing on an interval (q*, q* + ε) for some ε > 0. Now suppose P(·) has a positive local maximum at some point ζ > q* and P(·) is increasing on (q*, ζ). Then P(ζ) > 0 and P'(ζ) = 0. Furthermore, using parts (ii) and (iii) of Lemma 3.1, we have V_0''(ζ) ≤ 0 and Q_{d*}'(ζ) > Q_{d*}'(q*) ≥ V_0'(q*) > V_0'(ζ). Therefore,

(σ_0²/2) P'(ζ) = θζ P(ζ) + (θ + α) Q_{d*}'(ζ) − C'(ζ) > θζ V_0''(ζ) + (θ + α) V_0'(ζ) − C'(ζ) = 0.

This is a contradiction, and hence we can conclude that P(·) is increasing and P(x) > 0 on (q*, d*). Consequently, Q_{d*}(·) is a strictly convex function on (q*, d*). Since Q_{d*}'(d*) = 0, it is also strictly decreasing on (q*, d*).

Since V_0(·) is a strictly concave function, we can rule out the case q* = 0. Otherwise, there is an ε > 0 so that Q_{d*}(·) < V_0(·) on the interval (0, ε), and then, using the joint continuity of Q_d(x) in the variables d and x, we could find d > d* for which Q_d(·) intersects V_0(·). This is a contradiction, and hence q* > 0. Also, since Q_{d*}(b_0) > 0, it is clear that q* < b_0. This completes the proof of part (i).

Now, if Q_{d*}'(q*) > V_0'(q*), then since Q_{d*}(q*) = V_0(q*), we can find an ε > 0 so that Q_{d*}(x) < V_0(x) for all x in (q* − ε, q*) with q* − ε > 0. Again using the joint continuity of Q_d(x) in (d, x), we can find d > d* so that Q_d(·) intersects V_0(·). Hence, we conclude that Q_{d*}'(q*) = V_0'(q*). This completes the proof of the lemma.

By a similar argument, there exist points c* < p* < 0 and a function Q_{c*}(·) on (−∞, 0) satisfying the following: Q_{c*}(·) is a solution to (3.7) with boundary conditions Q_{c*}(c*) = Q_{c*}'(c*) = 0; Q_{c*}(p*) = V_0(p*); Q_{c*}'(p*) = V_0'(p*); and Q_{c*}(·)


is a strictly increasing convex function on (c*, p*). Theorem 1.1 will follow immediately if we show that the value function for the control problem is given by

(3.11)    V*(x) = { V_0(x) for p* ≤ x ≤ q*;  Q_{c*}(x) for c* ≤ x ≤ p*;  Q_{d*}(x) for q* ≤ x ≤ d*;  0 otherwise }.

Proof of Theorem 1.1. The function V*(·) is continuously differentiable by Lemma 3.3 and by the discussion above (3.11). Furthermore, V*''(·) is continuous everywhere except at the points c*, p*, q* and d*. Also, it is easy to check that the one-sided derivatives V*''(x+) and V*''(x−) exist everywhere. Since Q_{d*}(·) and Q_{c*}(·) are convex functions which satisfy the differential equation in (3.7), and since V_0(·) is a concave function which satisfies the differential equation in Lemma 3.1, it is a straightforward computation to check that V*(·) satisfies all the assumptions of Proposition 2.4. Therefore, we can conclude that V*(x) ≥ V(x) for all x. We can apply Itô's lemma to verify that the pay-off function from the admissible control strategy (u*, τ*) is indeed V*(·), and hence V*(x) = V(x) for all x. This completes the proof.

References

[1] Assaf, D. (1997). Estimating the state of a noisy continuous time Markov chain when dynamic sampling is feasible. Ann. Appl. Probab. 3 822–836.
[2] Davis, M. H. A. and Zervos, M. (1994). A problem of singular stochastic control with discretionary stopping. Ann. Appl. Probab. 4 226–240.
[3] Dubins, L. E. and Savage, L. J. (1965). How To Gamble If You Must: Inequalities for Stochastic Processes. McGraw-Hill, New York.
[4] Karatzas, I., Ocone, D., Wang, H. and Zervos, M. (2000). Finite fuel singular control with discretionary stopping. Stochastics 71 1–50.
[5] Karatzas, I. and Shreve, S. E. (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
[6] Karatzas, I. and Wang, H. (2000). Utility maximization with discretionary stopping. SIAM J. Control Optim. 39 306–329.
[7] Karatzas, I. and Zamfirescu, I.-M. (2006). Martingale approach to stochastic control with discretionary stopping. Appl. Math. Optim. 53 163–184.
[8] Kurtz, T. G. and Stockbridge, R. H. (1998). Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36 609–653.
[9] Morimoto, H. (2003). Variational inequalities for combined control and stopping. SIAM J. Control Optim. 42 686–708.
[10] Ocone, D. and Weerasinghe, A. (2000). Degenerate variance control of a one dimensional diffusion process. SIAM J. Control Optim. 39 1–24.
[11] Ocone, D. and Weerasinghe, A. (2003). Degenerate variance control in the one dimensional stationary case. Electron. J. Probab. 8 1–27.
[12] Øksendal, B. (2000). Stochastic Differential Equations, fifth edition. Springer-Verlag, New York.


[13] Weerasinghe, A. (2006). A controller and a stopper game with degenerate variance control. Electron. Comm. Probab. 11 89–99.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 169–193
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000372

Double Skorokhod Map and Reneging Real-Time Queues

Łukasz Kruk,¹ John Lehoczky,²,∗ Kavita Ramanan,³,† and Steven Shreve³,‡

Maria Curie–Skłodowska University and Carnegie Mellon University

Abstract: An explicit formula for the Skorokhod map Γ_{0,a} on [0, a] for a > 0 is provided and related to similar formulas in the literature. Specifically, it is shown that on the space D[0, ∞) of right-continuous functions with left limits taking values in R,

Γ_{0,a}(ψ)(t) = ψ(t) − [ (ψ(0) − a)⁺ ∧ inf_{u∈[0,t]} ψ(u) ] ∨ sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ]

is the unique function taking values in [0, a] that is obtained from ψ by minimal "pushing" at the endpoints 0 and a. An application of this result to real-time queues with reneging is outlined.

1. Introduction

In 1961 A. V. Skorokhod [12] considered the problem of constructing solutions to stochastic differential equations on the half-line R⁺ with a reflecting boundary condition at 0. His construction implicitly used properties of a deterministic mapping on the space C[0, ∞) of continuous functions on [0, ∞). This mapping was used more explicitly by Anderson and Orey in their study of large deviations properties of reflected diffusions on a half-space in R^N (see p. 194 of [1]). These authors exploited the fact that the mapping, which is now called the Skorokhod map and is denoted here by Γ_0, has the explicit representation

Γ_0(ψ)(t) ≜ ψ(t) + max_{s∈[0,t]} [−ψ(s)]⁺,

and is consequently Lipschitz continuous (with constant 2). In fact, this formula easily extends to a mapping on D[0, ∞), the space of right-continuous functions with left limits mapping [0, ∞) into R. Given ψ ∈ D[0, ∞), define

(1.1)    η(t) = sup_{s∈[0,t]} [−ψ(s)]⁺ = −( inf_{s∈[0,t]} ψ(s) ∧ 0 )

∗Partially supported by ONR/DARPA MURI under contract N000140-01-1-0576.
†Partially supported by the National Science Foundation under Grants No. 0728064 and 0406191.
‡Partially supported by the National Science Foundation under Grant No. DMS-0404682.
¹Department of Mathematics, Maria Curie–Skłodowska University, Lublin, Poland, and Institute of Mathematics, Polish Academy of Sciences, Warsaw, Poland. e-mail: [email protected]
²Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213. e-mail: [email protected]
³Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213. e-mail: [email protected], [email protected]

AMS 2000 subject classifications: Primary 60G07, 60G17; secondary 90B05, 90B22.
Keywords and phrases: Skorokhod map, reflection map, double-sided reflection map.


and

(1.2)    Γ_0(ψ) = ψ + η    for all ψ ∈ D[0, ∞).

(Here, and in what follows, the operations ∨ and ∧ denote maximum and minimum, respectively, and x⁺ represents x ∨ 0.) From (1.1) and (1.2) it is clear that Γ_0(ψ) is in D[0, ∞) and takes values in R⁺, η is in D[0, ∞) and is nondecreasing, and the pair of functions (Γ_0(ψ), η) satisfies the complementarity condition

(1.3)    ∫_0^∞ I_{{Γ_0(ψ)(s) > 0}} dη(s) = 0,

which says that η "pushes" only when Γ_0(ψ) is zero. These properties uniquely characterize the pair of functions (Γ_0(ψ), η), and this pair is said to solve the Skorokhod problem for ψ on [0, ∞).

Let z < a be a real number. The double Skorokhod map Γ_{z,a} is the mapping from D[0, ∞) into itself such that for ψ ∈ D[0, ∞), Γ_{z,a}(ψ) takes values in [z, a] and has the decomposition

(1.4)    Γ_{z,a}(ψ) = ψ + η_ℓ − η_u,

where η_ℓ and η_u are nondecreasing functions in D[0, ∞) so that the triple (Γ_{z,a}(ψ), η_ℓ, η_u) satisfies the complementarity conditions

(1.5)    ∫_0^∞ I_{{Γ_{z,a}(ψ)(s) > z}} dη_ℓ(s) = 0,    ∫_0^∞ I_{{Γ_{z,a}(ψ)(s) < a}} dη_u(s) = 0.

Let t > 0 be fixed. Let us define φ = ψ + η, where η is given by (1.1); in other words, φ = Γ_0(ψ). Let us next define φ̄ = Λ_a(φ). We must show that φ̄(t) = Ξ_a(ψ)(t).

Case I: η(t) = 0. Because η is nondecreasing, in this case we have η(s) = 0 and φ(s) = ψ(s) for all s ∈ [0, t]. In particular, ψ is nonnegative on [0, t]. Therefore, for every s ∈ [0, t],

0 ≤ (ψ(s) − a)⁺ ∧ inf_{u∈[s,t]} ψ(u).

Since the latter expression can be rewritten as 0 ∨ [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ], it follows that

Ξ_a(ψ)(t) = ψ(t) − 0 ∨ sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ]
         = ψ(t) − sup_{s∈[0,t]} [ (ψ(s) − a)⁺ ∧ inf_{u∈[s,t]} ψ(u) ]
         = Λ_a(ψ)(t) = Λ_a(φ)(t) = φ̄(t).

Case II: η(t) > 0. In this case, (1.1) becomes η(t) = − inf_{u∈[0,t]} ψ(u). Using ψ = φ − η, we write (2.1) as

Ξ_a(ψ)(t) = φ(t) − η(t) − [ ( (φ(0) − a − η(0))⁺ ∧ (−η(t)) ) ∨ sup_{s∈[0,t]} ( (φ(s) − a − η(s)) ∧ inf_{u∈[s,t]} (φ(u) − η(u)) ) ]
         = φ(t) − [ ( (φ(0) − a − η(0))⁺ + η(t) ) ∧ 0 ] ∨ sup_{s∈[0,t]} [ ( φ(s) − a + η(t) − η(s) ) ∧ inf_{u∈[s,t]} ( φ(u) + η(t) − η(u) ) ].

The term (φ(0) − a − η(0))⁺ + η(t) is nonnegative, and so [ (φ(0) − a − η(0))⁺ + η(t) ] ∧ 0 = 0. Therefore,

Ξ_a(ψ)(t) = φ(t) − sup_{s∈[0,t]} [ ( φ(s) − a + η(t) − η(s) )⁺ ∧ inf_{u∈[s,t]} ( φ(u) + η(t) − η(u) ) ].

We conclude the proof that this last expression is φ̄(t) ≜ Λ_a(φ)(t) by showing that

(2.2)    ( φ(s) − a + η(t) − η(s) )⁺ ∧ inf_{u∈[s,t]} ( φ(u) + η(t) − η(u) ) = (φ(s) − a)⁺ ∧ inf_{u∈[s,t]} φ(u).

There are two possibilities. The first is that φ(u) = Γ_0(ψ)(u) > 0 for every u ∈ [s, t]. According to the complementarity condition (1.3), η is then constant on [s, t], and the left-hand side of (2.2) becomes (φ(s) − a)⁺ ∧ inf_{u∈[s,t]} φ(u), which agrees with the right-hand side. The other possibility is that φ(u) = 0 for some u ∈ [s, t]. Define u* = sup{u ∈ [s, t] : φ(u) = 0}. According to the complementarity condition (1.3), either φ(u*) =


0 and η is constant on [u*, t], or else u* > s, φ(u*−) = 0, φ(u*) > 0, η is constant on [u*, t], and η is continuous at u*. In either case, inf_{u∈[s,t]} (φ(u) + η(t) − η(u)) = 0 and inf_{u∈[s,t]} φ(u) = 0, and hence (2.2) holds with both sides equal to zero. □

Remark 2.2. If ψ(0) ≤ 0, then

(ψ(0) − a)⁺ ∧ inf_{u∈[0,t]} ψ(u) = inf_{u∈[0,t]} ψ(u),

and

(2.3)    Ξ_a(ψ)(t) = ψ(t) − inf_{u∈[0,t]} ψ(u) ∨ sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ]
                  = ψ(t) − sup_{s∈[0,t]} [ ( (ψ(s) − a) ∨ inf_{u∈[0,t]} ψ(u) ) ∧ ( inf_{u∈[s,t]} ψ(u) ∨ inf_{u∈[0,t]} ψ(u) ) ]
                  = ψ(t) − sup_{s∈[0,t]} [ ( (ψ(s) − a) ∨ inf_{u∈[0,t]} ψ(u) ) ∧ inf_{u∈[s,t]} ψ(u) ].

Example 2.3. We provide here an example illustrating the fact that Ξ_a = Λ_a ∘ Γ_0, and demonstrating in addition that Ξ_a ≠ Λ_a. Let a = 1 and

(2.4)    ψ(t) = { −2 + t, 0 ≤ t ≤ 4;  6 − t, 4 ≤ t ≤ 6 }.

For 0 ≤ t ≤ 6, we have inf_{u∈[0,t]} ψ(u) = ψ(0) = −2. It is straightforward to compute

sup_{s∈[0,t]} [ ( (ψ(s) − 1) ∨ (−2) ) ∧ inf_{u∈[s,t]} ψ(u) ] = { −2, 0 ≤ t ≤ 1;  −3 + t, 1 ≤ t ≤ 4;  1, 4 ≤ t ≤ 5;  6 − t, 5 ≤ t ≤ 6 }.

According to (2.3),

Ξ_a(ψ)(t) = ψ(t) − sup_{s∈[0,t]} [ ( (ψ(s) − 1) ∨ (−2) ) ∧ inf_{u∈[s,t]} ψ(u) ] = { t, 0 ≤ t ≤ 1;  1, 1 ≤ t ≤ 4;  5 − t, 4 ≤ t ≤ 5;  0, 5 ≤ t ≤ 6 }.

We see that Ξ_a(ψ) ≠ Λ_a(ψ) because

sup_{s∈[0,t]} [ (ψ(s) − 1)⁺ ∧ inf_{u∈[s,t]} ψ(u) ] = { −2 + t, 0 ≤ t ≤ 2;  0, 2 ≤ t ≤ 3;  −3 + t, 3 ≤ t ≤ 4;  1, 4 ≤ t ≤ 5;  6 − t, 5 ≤ t ≤ 6 },

and so

Λ_a(ψ)(t) = ψ(t) − sup_{s∈[0,t]} [ (ψ(s) − 1)⁺ ∧ inf_{u∈[s,t]} ψ(u) ] = { 0, 0 ≤ t ≤ 2;  −2 + t, 2 ≤ t ≤ 3;  1, 3 ≤ t ≤ 4;  5 − t, 4 ≤ t ≤ 5;  0, 5 ≤ t ≤ 6 }.

174

L  ukasz Kruk, John Lehoczky, Kavita Ramanan, and Steven Shreve

The discrepancy between Ξa (ψ) and Λa (ψ) is due to the fact that ψ can take negative values. If ψ is nonnegative, then Γ0 (ψ) = ψ, and Theorem 2.1 implies Ξa (ψ) = Λa ◦ Γ0 (ψ) = Λa (ψ). For ψ given by (2.4), t, 0 ≤ t ≤ 4, Δ φ(t) = Γ0 (ψ)(t) = ψ(t) + 2 = 8 − t, 4 ≤ t ≤ 6, and  sup s∈[0,t]

⎧ 0, 0 ≤ t ≤ 1, ⎪  ⎪ ⎨  + −1 + t, 1 ≤ t ≤ 4, φ(s) − 1 ∧ inf φ(u) = 3, 4 ≤ t ≤ 5, ⎪ u∈[s,t] ⎪ ⎩ 8 − t, 5 ≤ t ≤ 6.

Therefore,  Λa (φ)(t) = φ(t) − sup s∈[0,t]

⎧ t, 0 ≤ t ≤ 1, ⎪  ⎪ ⎨  + 1, 1 ≤ t ≤ 4, φ(s) − 1 ∧ inf φ(u) = 5 − t, 4 ≤ t ≤ 5, ⎪ u∈[s,t] ⎪ ⎩ 0, 5 ≤ t ≤ 6.

This illustrates the result Ξa (ψ) = Λa ◦ Γ0 (ψ) = Λa (φ) of Theorem 2.1.

˜
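The identity Ξa = Λa ◦ Γ0 of Theorem 2.1 and the computations of Example 2.3 are easy to check numerically. The following sketch is not part of the paper and all function names are our own; it evaluates the three maps by brute force on a time grid. For the piecewise-linear path of Example 2.3 the grid computation is exact, because every infimum and supremum involved is attained at a grid point.

```python
# Brute-force evaluation of Gamma_0, Lambda_a and Xi_a on a sampled path.
# Hypothetical helper names; psi is the path (2.4) of Example 2.3, a = 1.

def gamma0(psi):
    # Gamma_0(psi)(t) = psi(t) - (inf_{[0,t]} psi) ∧ 0  (Skorokhod map at 0)
    out, run_inf = [], psi[0]
    for x in psi:
        run_inf = min(run_inf, x)
        out.append(x - min(run_inf, 0.0))
    return out

def lambda_a(phi, a):
    # Lambda_a(phi)(t) = phi(t) - sup_{s<=t}[ (phi(s)-a)^+ ∧ inf_{[s,t]} phi ]
    return [phi[t] - max(min(max(phi[s] - a, 0.0), min(phi[s:t + 1]))
                         for s in range(t + 1))
            for t in range(len(phi))]

def xi_a(psi, a):
    # Xi_a(psi)(t) = psi(t) - [ (psi(0)-a)^+ ∧ inf_{[0,t]} psi ]
    #                         ∨ sup_{s<=t}[ (psi(s)-a) ∧ inf_{[s,t]} psi ]
    out = []
    for t in range(len(psi)):
        first = min(max(psi[0] - a, 0.0), min(psi[:t + 1]))
        best = max(min(psi[s] - a, min(psi[s:t + 1])) for s in range(t + 1))
        out.append(psi[t] - max(first, best))
    return out

grid = [k / 10 for k in range(61)]                   # t = 0.0, 0.1, ..., 6.0
psi = [-2 + t if t <= 4 else 6 - t for t in grid]    # the path (2.4)
xi = xi_a(psi, 1.0)
# Theorem 2.1: Xi_a(psi) = Lambda_a(Gamma_0(psi)) pointwise
assert all(abs(x - y) < 1e-12 for x, y in zip(xi, lambda_a(gamma0(psi), 1.0)))
# Spot checks against the piecewise answer of Example 2.3:
assert abs(xi[grid.index(3.0)] - 1.0) < 1e-12        # Xi = 1 on [1, 4]
assert abs(xi[grid.index(4.5)] - 0.5) < 1e-12        # Xi = 5 - t on [4, 5]
```

The brute-force double loops are O(n²) in the grid size, which is ample for an illustration; the point is only that the two sides of Theorem 2.1 agree path by path.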

3. Comparison with Other Formulas

3.1. The formula of Cooper, Schmidt and Serfozo [5]

Following Cooper, Schmidt and Serfozo [5], we let H be a signed measure on the Borel subsets of R+ whose total variation on each compact interval is finite. The function t ↦ H(0, t] is right-continuous with left limits and is of bounded variation. We denote this function by H(0, · ]. Let a be a positive number, and let x ∈ [−a, 0] be given. Cooper et al. [5] (equation (15)) define

(3.1)  X(t) ≜ sup_{s∈[0,t]} inf_{u∈[s,t]} [ x I{s=u=0} + H(u, t] − a I{s=u>0} ]

and show that X = Γ_{−a,0}(x + H(0, · ]). In particular, X(0) = x. Negating (3.1), we obtain

(3.2)  −X(t) = −sup_{s∈[0,t]} inf_{u∈[s,t]} [ x I{s=u=0} + H(u, t] − a I{s=u>0} ],

and the result in [5] implies that

(3.3)  −X = Γ_{0,a}(−x − H(0, · ]).

In particular, −X(0) = −x.

To relate (3.2) to Ξa, we let ψ be a bounded variation function in D[0, ∞). We can then define the signed measure H by

(3.4)  H(u, t] = ψ(u) − ψ(t),  0 ≤ u ≤ t.

The number −x in (3.3) must be taken to be in the interval [0, a]. We define −x in terms of ψ by

(3.5)  −x = Γ_{0,a}(ψ)(0) = ψ(0)^+ ∧ a.

It is then easily verified that

(3.6)  x + ψ(0) = (ψ(0) − a)^+ ∧ ψ(0).

With the choices of H and −x given by (3.4) and (3.5), (3.2) becomes

(3.7)  −X(t) = −sup_{s∈[0,t]} inf_{u∈[s,t]} [ x I{s=u=0} + ψ(u) − ψ(t) − a I{s=u>0} ]
       = ψ(t) − [ inf_{u∈[0,t]} ( x I{u=0} + ψ(u) ) ] ∨ sup_{s∈(0,t]} inf_{u∈[s,t]} ( ψ(u) − a I{s=u} )
       = ψ(t) − [ (x + ψ(0)) ∧ inf_{u∈(0,t]} ψ(u) ] ∨ sup_{s∈(0,t]} [ (ψ(s) − a) ∧ inf_{u∈(s,t]} ψ(u) ]
       = ψ(t) − [ (ψ(0) − a)^+ ∧ ψ(0) ∧ inf_{u∈(0,t]} ψ(u) ] ∨ sup_{s∈(0,t]} [ (ψ(s) − a) ∧ inf_{u∈(s,t]} ψ(u) ]
       = ψ(t) − [ (ψ(0) − a)^+ ∧ inf_{u∈[0,t]} ψ(u) ] ∨ sup_{s∈(0,t]} [ (ψ(s) − a) ∧ inf_{u∈(s,t]} ψ(u) ].

Because ψ is right continuous, for 0 ≤ s ≤ t,

(ψ(s) − a) ∧ inf_{u∈(s,t]} ψ(u) = (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u),

where we adopt the convention that inf_{u∈(t,t]} ψ(u) = ∞ to handle the case s = t. Furthermore, this expression is right-continuous in s. Therefore, for t > 0,

sup_{s∈(0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ] = sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ].

We thus conclude from (3.7) that

(3.8)  −X(t) = ψ(t) − [ (ψ(0) − a)^+ ∧ inf_{u∈[0,t]} ψ(u) ] ∨ sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ] = Ξa(ψ)(t).

To deal with the case that t = 0, it is easily verified that

(3.9)  Ξa(ψ)(0) = ψ(0) − (ψ(0) − a)^+ ∧ ψ(0) = ψ(0)^+ ∧ a = Γ_{0,a}(ψ)(0) = Λa ◦ Γ0(ψ)(0) = −X(0).
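The representation (3.8) can also be cross-checked numerically. The sketch below is our own (not from [5]): it evaluates the sup-inf expression (3.2), with H and −x chosen as in (3.4)-(3.5), directly on a sampled path and compares the result with Ξa from (2.1). On a discrete grid the two agree exactly (up to rounding), and both stay in [0, a], as a path reflected into [0, a] must.

```python
import random

def neg_X(psi, a):
    # -X(t) from (3.2), with H(u,t] = psi(u) - psi(t) as in (3.4) and
    # -x = psi(0)^+ ∧ a as in (3.5).  Hypothetical helper name.
    x = -min(max(psi[0], 0.0), a)
    out = []
    for t in range(len(psi)):
        sup = max(
            min((x if s == u == 0 else 0.0) + psi[u] - psi[t]
                - (a if s == u > 0 else 0.0) for u in range(s, t + 1))
            for s in range(t + 1))
        out.append(-sup)
    return out

def xi_a(psi, a):
    # Xi_a(psi)(t) from (2.1), by brute force over the grid
    out = []
    for t in range(len(psi)):
        first = min(max(psi[0] - a, 0.0), min(psi[:t + 1]))
        best = max(min(psi[s] - a, min(psi[s:t + 1])) for s in range(t + 1))
        out.append(psi[t] - max(first, best))
    return out

random.seed(7)
a = 1.5
psi = [random.uniform(-2.0, 4.0) for _ in range(40)]  # an arbitrary step path
negx = neg_X(psi, a)
assert all(abs(p - q) < 1e-12 for p, q in zip(negx, xi_a(psi, a)))
# -X = Gamma_{0,a}(-x - H(0, .]) takes values in [0, a]:
assert all(-1e-12 <= v <= a + 1e-12 for v in negx)
```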

3.2. The formula of Chitashvili and Lazrieva [4]

Lemma 2 of [4] considers the process

(3.10)  ξ(t) = ξ(0) + ∫₀ᵗ a(s) ds + ∫₀ᵗ b(s) dW(s),

where W is a Brownian motion, a(·) and b(·) are random processes adapted to the filtration generated by W, and ξ(0) ∈ [γ1, γ2]. Here γ1 < γ2 are real numbers. In Lemma 2 of [4], the doubly reflected version ξ∗ of ξ on [γ1, γ2] is shown to be

(3.11)  ξ∗(t) = sup_{s∈[0,t]} inf_{u∈[0,t]} [ (ξ(t) − ξ(s) + γ1) I{s>u} + (ξ(t) − ξ(u) + γ2) I{s≤u, u>0} + ξ(t) I{s=u=0} ].

The proof of Lemma 2 in [4] is probabilistic, using the specific form (3.10) of ξ. Because the three indicators I{s>u}, I{s≤u, u>0} and I{s=u=0} appearing on the right-hand side of (3.11) sum to 1, the addition of the same constant to ξ, γ1 and γ2 results in the addition of this constant to the right-hand side of (3.11), and hence to ξ∗. We may therefore assume without loss of generality that γ1 = −a < 0, γ2 = 0 and ξ(0) ∈ [−a, 0]. With this simplification, we show below that −ξ∗ = Ξa(−ξ). Actually, all we need to show is that

(3.12)  ξ∗(t) = sup_{s∈[0,t]} inf_{u∈[s,t]} [ ξ(0) I{s=u=0} + ξ(t) − ξ(u) − a I{s=u>0} ],

because we can then set ψ = −ξ and proceed as in the previous section.

We thus begin with (3.11), where γ1 and γ2 have been set to −a and 0 respectively, and ξ(0) ∈ [−a, 0]. We have

(3.13)  ξ∗(t) = sup_{s∈[0,t]} inf_{u∈[0,t]} [ (ξ(t) − ξ(s) − a) I{s>u} + (ξ(t) − ξ(u)) I{s≤u, u>0} + ξ(t) I{s=u=0} ]
        = sup_{s∈[0,t]} [ inf_{u∈[0,s)} (ξ(t) − ξ(s) − a) ∧ inf_{u∈[s,t]} ( ξ(t) − ξ(u) + ξ(0) I{s=u=0} ) ]
        = sup_{s∈[0,t]} [ ( (ξ(t) − ξ(s) − a) I{s>0} + ∞ I{s=0} ) ∧ inf_{u∈[s,t]} ( ξ(t) − ξ(u) + ξ(0) I{s=u=0} ) ].

To evaluate the right-hand side of (3.13), we consider separately the cases s = 0 and 0 < s ≤ t. If s = 0, then

(3.14)  ( (ξ(t) − ξ(s) − a) I{s>0} + ∞ I{s=0} ) ∧ inf_{u∈[s,t]} ( ξ(t) − ξ(u) + ξ(0) I{s=u=0} )
        = inf_{u∈[s,t]} ( ξ(t) − ξ(u) + ξ(0) I{s=u=0} )
        = inf_{u∈[s,t]} ( ξ(0) I{s=u=0} + ξ(t) − ξ(u) − a I{s=u>0} ).

On the other hand, if s > 0, then

(3.15)  ( (ξ(t) − ξ(s) − a) I{s>0} + ∞ I{s=0} ) ∧ inf_{u∈[s,t]} ( ξ(t) − ξ(u) + ξ(0) I{s=u=0} )
        = inf_{u∈[s,t]} [ (ξ(t) − ξ(s) − a) ∧ (ξ(t) − ξ(u)) ]
        = inf_{u∈[s,t]} ( ξ(0) I{s=u=0} + ξ(t) − ξ(u) − a I{s=u>0} ).

Substituting (3.14) and (3.15) into (3.13), we obtain (3.12).

3.3. The formula of Ganesh, O'Connell and Wischik [7]

Section 5.7 of [7] records the size of a finite-buffer fluid queue at time zero under the assumption that the queue was empty at time −t, where t > 0. The buffer size of the queue is a, a positive number. We adjust the formula in [7] by relabeling time; our queue is empty at time zero and we record its size at time t. We call the netput process ψ. The queue length is then Γ_{0,a}(ψ).

We thus begin with a bounded-variation function ψ ∈ D[0, ∞) satisfying ψ(0) = 0. Following [7], we define

M(s, t) ≜ inf_{u∈[s,t]} (ψ(t) − ψ(u)) = ψ(t) − sup_{u∈[s,t]} ψ(u),
N(s, t) ≜ sup_{u∈[s,t]} (ψ(t) − ψ(u)) = ψ(t) − inf_{u∈[s,t]} ψ(u).

We note that, like ψ itself, M(s, t) and N(s, t) are right-continuous with left-hand limits in s. Here and elsewhere, we adopt the notational conventions

(3.16)  sup_{u∈[s−,t]} ψ(u) ≜ lim_{v↑s} sup_{u∈[v,t]} ψ(u),  inf_{u∈[s−,t]} ψ(u) ≜ lim_{v↑s} inf_{u∈[v,t]} ψ(u).

We shall also use the notation (s−, t] ≜ [s, t] and 0− ≜ 0.

Lemma 3.1 For all t ≥ 0,

(3.17)  sup_{s∈[0,t]} [ N(s, t) ∧ (M(s, t) + a) ] ≤ inf_{s∈[0,t]} [ N(s, t) ∨ (M(s, t) + a) ].

Proof: For t ≥ 0,

(3.18)  sup_{s∈[0,t]} [ N(s, t) ∧ (M(s, t) + a) ]
        = ψ(t) + sup_{s∈[0,t]} [ ( −inf_{u∈[s,t]} ψ(u) ) ∧ ( a − sup_{u∈[s,t]} ψ(u) ) ]
        = ψ(t) − inf_{s∈[0,t]} [ inf_{u∈[s,t]} ψ(u) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ].

The infimum over s ∈ [0, t] in the last line of (3.18) is either attained by some s1 ∈ [0, t], or else there is an s1 in (0, t] for which the infimum is (in the notation (3.16))

inf_{u∈[s1−,t]} ψ(u) ∨ ( sup_{u∈[s1−,t]} ψ(u) − a ).

In the former case, we let s̄1 denote s1; in the latter case, s̄1 denotes s1−. Capturing both cases, we say that s̄1 ∈ [0, t] satisfies

inf_{u∈[s̄1,t]} ψ(u) ∨ ( sup_{u∈[s̄1,t]} ψ(u) − a ) = inf_{s∈[0,t]} [ inf_{u∈[s,t]} ψ(u) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ].

Continuing in this way, we observe that if s̄1 = s1, then either there is an s∗ ∈ [s1, t] that attains inf_{u∈[s1,t]} ψ(u), or else there is an s∗ ∈ (s1, t] for which the infimum is ψ(s∗−). If s̄1 = s1−, then either there is an s∗ ∈ [s1, t] that attains inf_{u∈[s1−,t]} ψ(u), or else there is an s∗ ∈ [s1, t] for which the infimum is ψ(s∗−). If ψ(s∗) = inf_{u∈[s̄1,t]} ψ(u), we set s̄∗ = s∗; if ψ(s∗−) = inf_{u∈[s̄1,t]} ψ(u), s̄∗ denotes s∗−. Capturing all these cases, we say that s̄∗ ∈ [s̄1, t] satisfies ψ(s̄∗) = inf_{u∈[s̄1,t]} ψ(u). With these conventions, we have

inf_{s∈[0,t]} [ inf_{u∈[s,t]} ψ(u) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ] = inf_{u∈[s̄1,t]} ψ(u) ∨ ( sup_{u∈[s̄1,t]} ψ(u) − a )
 ≥ ψ(s̄∗) ∨ ( sup_{u∈[s̄∗,t]} ψ(u) − a )
 ≥ inf_{s∈[0,t]} [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ].

The reverse inequality

inf_{s∈[0,t]} [ inf_{u∈[s,t]} ψ(u) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ] ≤ inf_{s∈[0,t]} [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ]

obviously holds. Returning to (3.18), we see that

(3.19)  sup_{s∈[0,t]} [ N(s, t) ∧ (M(s, t) + a) ] = ψ(t) − inf_{s∈[0,t]} [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ].

An analogous argument shows that

(3.20)  inf_{s∈[0,t]} [ N(s, t) ∨ (M(s, t) + a) ] = ψ(t) − sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ].

Now choose s3 ∈ [0, t], where s3 attains the infimum on the right-hand side of (3.19) (in which case we write s̄3 = s3), or if no such s3 exists, then choose s3 ∈ (0, t], where s3− attains the infimum on the right-hand side of (3.19) (in which case we write s̄3 = s3−). Let s̄4 be defined analogously in connection with the supremum on the right-hand side of (3.20). If s̄3 ≤ s̄4 (this means either that s3 < s4, or else that s3 = s4 and it is not the case that s̄3 = s3, s̄4 = s4−), we have sup_{u∈[s̄3,t]} ψ(u) − a ≥ ψ(s̄4) − a, and so

(3.21)  inf_{s∈[0,t]} [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ] = ψ(s̄3) ∨ ( sup_{u∈[s̄3,t]} ψ(u) − a )
        ≥ (ψ(s̄4) − a) ∧ inf_{u∈[s̄4,t]} ψ(u)
        = sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ].

Relation (3.17) follows from (3.19) and (3.20). On the other hand, if s̄3 ≥ s̄4, then ψ(s̄3) ≥ inf_{u∈[s̄4,t]} ψ(u), and again relation (3.21) and hence relation (3.17) hold. □

For x ∈ R and α ≤ β, define [x]_α^β = (x ∨ α) ∧ β. On a subset of bounded-variation functions in D[0, ∞) whose initial condition is zero, a mapping Φa is defined in [7] by the formula

(3.22)  Φa(ψ)(t) ≜ [ψ(t)]_{sup_{s∈[0,t]} [N(s,t) ∧ (M(s,t)+a)]}^{inf_{s∈[0,t]} [N(s,t) ∨ (M(s,t)+a)]}.

According to this definition and relations (3.19) and (3.20),

(3.23)  Φa(ψ)(t) = ( ψ(t) ∨ sup_{s∈[0,t]} [ N(s, t) ∧ (M(s, t) + a) ] ) ∧ inf_{s∈[0,t]} [ N(s, t) ∨ (M(s, t) + a) ]
        = ( ψ(t) ∨ ( ψ(t) − inf_{s∈[0,t]} [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ] ) ) ∧ ( ψ(t) − sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ] )
        = ψ(t) − [ 0 ∧ inf_{s∈[0,t]} ( ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ) ] ∨ sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ].
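The equality of the truncation formula (3.22) with Ξa, proved below, lends itself to a numerical check. The sketch that follows is our own code (not from [7]); it computes M and N by brute force on a sampled path with ψ(0) = 0, forms the truncation band of (3.22), verifies the ordering of the two levels asserted by Lemma 3.1, and compares the result with Ξa.

```python
import random

def phi_a(psi, a):
    # Phi_a(psi)(t) from (3.22): psi(t) truncated to the band
    # [ sup_s N ∧ (M+a), inf_s N ∨ (M+a) ], where
    # M(s,t) = psi(t) - sup_{[s,t]} psi and N(s,t) = psi(t) - inf_{[s,t]} psi.
    out = []
    for t in range(len(psi)):
        N = [psi[t] - min(psi[s:t + 1]) for s in range(t + 1)]
        Ma = [psi[t] - max(psi[s:t + 1]) + a for s in range(t + 1)]
        lo = max(min(n, m) for n, m in zip(N, Ma))
        hi = min(max(n, m) for n, m in zip(N, Ma))
        assert lo <= hi + 1e-12          # the ordering of Lemma 3.1
        out.append(min(max(psi[t], lo), hi))
    return out

def xi_a(psi, a):
    # Xi_a(psi)(t) from (2.1)
    out = []
    for t in range(len(psi)):
        first = min(max(psi[0] - a, 0.0), min(psi[:t + 1]))
        best = max(min(psi[s] - a, min(psi[s:t + 1])) for s in range(t + 1))
        out.append(psi[t] - max(first, best))
    return out

random.seed(3)
a = 1.0
psi = [0.0] + [random.uniform(-2.0, 3.0) for _ in range(39)]  # psi(0) = 0
assert all(abs(p - q) < 1e-12 for p, q in zip(phi_a(psi, a), xi_a(psi, a)))
```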

Theorem 3.2 Let ψ ∈ D[0, ∞) satisfy ψ(0) = 0. Then Φa(ψ) = Ξa(ψ), where Ξa(ψ) is given by (2.1).

Proof: According to (3.23), ψ(t) − Φa(ψ)(t) = (0 ∧ A(t)) ∨ B(t), where

A(t) ≜ inf_{s∈[0,t]} [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ],  B(t) ≜ sup_{s∈[0,t]} [ (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ].

Using ψ(0) = 0, we obtain from equation (2.1) that ψ(t) − Ξa(ψ)(t) = C(t) ∨ B(t), where

C(t) ≜ inf_{s∈[0,t]} ψ(s).

To prove the theorem, we must show that

(3.24)  (0 ∧ A(t)) ∨ B(t) = C(t) ∨ B(t).

Clearly, A(t) ≥ C(t), and because ψ(0) = 0, we have also 0 ≥ C(t). It follows that 0 ∧ A(t) ≥ C(t), and thus

(3.25)  (0 ∧ A(t)) ∨ B(t) ≥ C(t) ∨ B(t).

If A(t) = C(t), then 0 ∧ A(t) = C(t), and so equality holds in (3.25). To complete the proof, we need only establish the implication

(3.26)  A(t) > C(t) ⟹ A(t) ≤ B(t).

Indeed, if A(t) > C(t), (3.26) will imply (0 ∧ A(t)) ∨ B(t) ≤ B(t) ≤ C(t) ∨ B(t), and we have the reverse of (3.25). Assume

(3.27)  A(t) > C(t).

Using the notation developed for Lemma 3.1, we choose s1 ∈ [0, t] so that ψ(s1) = C(t), or s1 ∈ (0, t] so that ψ(s1−) = C(t). We use s̄1 to denote s1 in the former case and s1− in the latter case. We capture both cases by the relation

(3.28)  ψ(s̄1) = C(t) ≤ ψ(s)  ∀s ∈ [0, t].

We next define

s2 ≜ sup{ s ∈ [0, t] : [ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ] ∧ [ ψ(s−) ∨ ( sup_{u∈[s−,t]} ψ(u) − a ) ] = A(t) }.

Either s2 ∈ [0, t] and ψ(s2) ∨ ( sup_{u∈[s2,t]} ψ(u) − a ) = A(t), in which case we denote s2 by s̄2, or else s2 ∈ (0, t], ψ(s2) ∨ ( sup_{u∈[s2,t]} ψ(u) − a ) > A(t), and ψ(s2−) ∨ ( sup_{u∈[s2−,t]} ψ(u) − a ) = A(t), in which case we denote s2− by s̄2. We capture both cases by the equation

(3.29)  ψ(s̄2) ∨ ( sup_{u∈[s̄2,t]} ψ(u) − a ) = A(t).

We cannot have s̄2 < s̄1 (which means s2 < s1, or s2 = s1, s̄2 = s2−, s̄1 = s1), for then we would have, using (3.28) and the definition of A(t),

A(t) ≤ ψ(s̄1) ∨ ( sup_{u∈[s̄1,t]} ψ(u) − a ) ≤ ψ(s̄2) ∨ ( sup_{u∈[s̄2,t]} ψ(u) − a ) = A(t),

a contradiction to the maximality property of s2. Therefore, s̄2 ≥ s̄1. We must have

(3.30)  ψ(s) ≥ ψ(s̄2)  ∀s ∈ [s̄2, t].

If this were not the case, then we would have ψ(s) < ψ(s̄2) for some s ∈ (s̄2, t], and then

A(t) ≤ ψ(s) ∨ ( sup_{u∈[s,t]} ψ(u) − a ) ≤ ψ(s̄2) ∨ ( sup_{u∈[s̄2,t]} ψ(u) − a ) = A(t),

which also contradicts the maximality property of s2.

Case I: A(t) = sup_{u∈[s̄2,t]} ψ(u) − a ≥ ψ(s̄2). Define

u2 ≜ sup{ u ∈ [s̄2, t] : (ψ(u) − a) ∨ (ψ(u−) − a) = A(t) }.

Then either ψ(u2) − a = A(t), or else u2 > 0, ψ(u2) − a < A(t), ψ(u2−) − a = A(t). Let us consider first the case that ψ(u2) − a = A(t), in which case we denote u2 by ū2. There cannot exist u3 ∈ (u2, t] for which ψ(u3) < A(t), for if such a u3 were to exist, we would have ψ(u3) ∨ ( sup_{u∈[u3,t]} ψ(u) − a ) < A(t). Therefore,

(3.31)  ψ(u) ≥ A(t)  ∀u ∈ [ū2, t].

Let us next consider the case that u2 > 0, ψ(u2) − a < A(t) and ψ(u2−) − a = A(t), in which case we denote u2− by ū2. There cannot exist u3 ∈ [u2, t] such that ψ(u3) < A(t), for if such a u3 were to exist, we would again have ψ(u3) ∨ ( sup_{u∈[u3,t]} ψ(u) − a ) < A(t). Once again, (3.31) holds. From (3.31) and the fact that ψ(ū2) − a = A(t), we have immediately

B(t) ≥ (ψ(ū2) − a) ∧ inf_{u∈[ū2,t]} ψ(u) = A(t).

This completes the proof of (3.26) in Case I.

Case II: A(t) = ψ(s̄2) > sup_{u∈[s̄2,t]} ψ(u) − a. Define

u1 ≜ sup{ u ∈ [s̄1, s̄2] : (ψ(u) − a) ∨ (ψ(u−) − a) ≥ ψ(s̄2) }.

If no such u1 were to exist, then we would have (ψ(u) − a) ∨ (ψ(u−) − a) < ψ(s̄2) for all u ∈ [s̄1, s̄2], in which case we would have from (3.27), (3.28), and the case assumption that

ψ(s̄1) ∨ ( sup_{u∈[s̄1,t]} ψ(u) − a ) = ψ(s̄1) ∨ ( sup_{u∈[s̄1,s̄2]} ψ(u) − a ) ∨ ( sup_{u∈[s̄2,t]} ψ(u) − a ) < ψ(s̄2).

But according to the definition of A(t), it is dominated by the left-hand side of this expression. We have a contradiction to the case assumption, which shows that u1 ∈ [s̄1, s̄2] is well defined. If ψ(u1) − a ≥ ψ(s̄2), we denote u1 by ū1. If this is not the case, then u1 > 0, ψ(u1) − a < ψ(s̄2), ψ(u1−) − a = ψ(s̄2), and we denote u1− by ū1. We capture both cases by the relation

(3.32)  ψ(ū1) − a ≥ ψ(s̄2) = A(t).

The maximality property of u1 implies that

sup_{u∈(ū1,s̄2]} ψ(u) − a < ψ(s̄2).

If ψ(u3) < ψ(s̄2) for some u3 ∈ (ū1, s̄2], then we would have

A(t) ≤ ψ(u3) ∨ ( sup_{u∈[u3,t]} ψ(u) − a ) < ψ(s̄2),

a contradiction to the case assumption. Therefore,

(3.33)  ψ(u) ≥ ψ(s̄2)  ∀u ∈ (ū1, s̄2].

It follows that

B(t) ≥ (ψ(ū1) − a) ∧ inf_{u∈[ū1,t]} ψ(u)
     ≥ (ψ(ū1) − a) ∧ ψ(ū1) ∧ inf_{u∈(ū1,s̄2]} ψ(u) ∧ inf_{u∈[s̄2,t]} ψ(u),

and each of these terms dominates ψ(s̄2) = A(t) by (3.32), (3.33), and (3.30). This completes the proof of (3.26) in Case II, and thus completes the proof of Theorem 3.2. □

3.4. The formula of Toomey [14]

Toomey [14] records the size of a finite-buffer queue at time −k under the assumption that the queue was of size qm at time −m < −k. The buffer size of the queue is a, a positive number, and qm is assumed to be in [0, a]. There are two formulas, (4) and (5), in [14], each obtained from the other by reversing the spatial axis. We deal with (5), mapping −m into time zero and mapping −k into time t > 0

and writing the formula for piecewise constant functions in D[0, ∞) rather than functions defined on the integers. The netput process, cumulative arrivals minus offered service, over the time interval −k to −m is denoted U_k^m by [14] and by ψ(t) − ψ(0) here. We take ψ(0) = qm ∈ [0, a], the initial queue length. In our notation, formula (5) in [14] is

inf_{s∈(0,t]} sup_{u∈(s,t]} [ (a + ψ(t) − ψ(s)) ∨ (ψ(t) − ψ(u)) ] ∧ sup_{u∈(0,t]} [ (qm + ψ(t) − ψ(0)) ∨ (ψ(t) − ψ(u)) ]
 = ψ(t) − [ sup_{s∈(0,t]} inf_{u∈(s,t]} ( (ψ(s) − a) ∧ ψ(u) ) ] ∨ inf_{u∈(0,t]} [ 0 ∧ ψ(u) ].

Because ψ is right-continuous and (ψ(0) − a)^+ = 0, this expression can be rewritten as

ψ(t) − [ sup_{s∈[0,t]} ( (ψ(s) − a) ∧ inf_{u∈[s,t]} ψ(u) ) ] ∨ [ (ψ(0) − a)^+ ∧ inf_{u∈[0,t]} ψ(u) ],

which is Ξa(ψ)(t) given by (2.1).

4. Application to real-time queues with reneging

4.1. Heavy-traffic convergence

Consider a sequence of single station queueing systems indexed by the positive integers. In the n-th system, the interarrival times are a sequence of positive, independent, identically distributed random variables u1^(n), u2^(n), ..., and the service times are likewise a sequence of positive, independent, identically distributed random variables v1^(n), v2^(n), ... . The arrival rate in the n-th system is λ^(n) ≜ 1/E u_i^(n), the service rate is μ^(n) ≜ 1/E v_i^(n), and the traffic intensity is ρ^(n) ≜ λ^(n)/μ^(n). We assume that λ^(n) has a positive limit λ as n → ∞, μ^(n) also has a positive limit μ as n → ∞, u_i^(n) has a limiting positive variance α² as n → ∞, and v_i^(n) has a limiting variance β² as n → ∞. We make the heavy traffic assumption

(4.1)  ρ^(n) = 1 − γ/√n

for some nonzero constant γ. This implies λ = μ. For the n-th system, the customer arrival times are

S_k^(n) ≜ Σ_{i=1}^{k} u_i^(n)

and the customer arrival process is

A^(n)(t) ≜ max{ k : S_k^(n) ≤ t }.

The work arrival process is

V^(n)(k) ≜ Σ_{j=1}^{k} v_j^(n).

The netput process

N^(n)(t) ≜ V^(n)(A^(n)(t)) − t

represents the work that would be present in queue at time t if the server were never idle between times 0 and t. We are taking the queue to be empty at time zero. However, the queue may be idle prior to time t, and thus the work that is actually present at time t is given by the workload process

(4.2)  W^(n) ≜ Γ0(N^(n)),

where Γ0 is defined by (1.2). The idleness process

I^(n)(t) ≜ −inf_{s∈[0,t]} N^(n)(s)

plays the role of η of (1.1). The scaled workload process is

(4.3)  Ŵ^(n)(t) ≜ (1/√n) W^(n)(nt),  t ≥ 0.

It is well known that the following heavy traffic convergence result holds under assumption (4.1) and the Lindeberg condition

lim_{n→∞} E[ (u_j^(n) − (λ^(n))⁻¹)² I{|u_j^(n) − (λ^(n))⁻¹| > c√n} ] = lim_{n→∞} E[ (v_j^(n) − (μ^(n))⁻¹)² I{|v_j^(n) − (μ^(n))⁻¹| > c√n} ] = 0  ∀c > 0.

Theorem 4.1 (Kingman [9], Iglehart and Whitt [8]) As n → ∞,

(4.4)  Ŵ^(n) ⇒ W∗,

where W∗ is a Brownian motion with drift −γ and variance per unit time λ(α² + β²), reflected at the origin so as to always be nonnegative. More precisely, define N∗(t) ≜ −γt + √(λ(α² + β²)) B(t), where B is a standard Brownian motion. Then

(4.5)  W∗ = Γ0(N∗).
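The reflection in (4.2) and (4.5) can be illustrated with a small simulation. The sketch below is our own and is not the paper's construction: for a netput built from i.i.d. increments (parameters chosen only for illustration, in the spirit of (4.1)), the map Γ0 applied to the partial-sum path coincides with the classical Lindley recursion for the workload of a single-server queue.

```python
import random

random.seed(1)
n = 2000
rho = 1 - 0.5 / n ** 0.5          # heavy-traffic-style intensity, cf. (4.1)
# increments of a discrete netput: work brought minus interarrival time
inc = [random.expovariate(1.0) - random.expovariate(rho) for _ in range(n)]

# netput partial sums S_k and their reflection at the origin:
# Gamma_0(S)(k) = S_k - min_{j<=k} S_j
S, s = [0.0], 0.0
for x in inc:
    s += x
    S.append(s)
reflected, run_min = [], 0.0
for sk in S:
    run_min = min(run_min, sk)
    reflected.append(sk - run_min)

# Lindley recursion for the workload: W_k = (W_{k-1} + X_k)^+
lindley, w = [0.0], 0.0
for x in inc:
    w = max(w + x, 0.0)
    lindley.append(w)

assert all(abs(p - q) < 1e-9 for p, q in zip(reflected, lindley))
assert min(reflected) >= 0.0      # the reflected path is nonnegative
```

The agreement of the two constructions is the discrete analogue of the statement that W^(n) = Γ0(N^(n)) records the workload of a queue that idles exactly when it is empty.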

4.2. Lead times

In real-time queues, customer deadlines are taken into account. We introduce a sequence of lead times, which are positive, independent and identically distributed random variables L1^(n), L2^(n), ... . Customer k arrives in system n at time S_k^(n) with deadline S_k^(n) + L_k^(n). The lead time of customer k, which is the time until the customer's deadline elapses, is L_k^(n) upon arrival of customer k and then decreases at rate one thereafter, becoming negative when the customer becomes late. Under the heavy traffic assumption (4.1), delay in the n-th system will be of order √n, so the lead times must also be of order √n to avoid trivialities. We assume therefore that there is a cumulative distribution function G independent of n such that

(4.6)  P( L_j^(n)/√n ≤ y ) = G(y).

For technical reasons, we also assume that there exists a finite y^∗ for which G(y^∗) = 1 and G(y) < 1 for y < y^∗, i.e., we cannot have a lead time in the n-th system larger than √n y^∗, but we can have lead times equal to, or at least arbitrarily close to, √n y^∗. We also assume that y_∗ ≜ inf{y ∈ R : G(y) > 0} is strictly positive; this seems to be essential to the results we obtain.

We serve the customers using the earliest deadline first (EDF) protocol. The customer in service may be preempted by the arrival of a more urgent customer. When service eventually resumes on the preempted customer, the service begins where it left off, i.e., the work already done on that customer is not lost. We wish to determine the heavy traffic limit of the distribution of the lead times of customers in queue. We define two measure-valued processes, W^(n) and V^(n), by specifying for every Borel subset B of R that

(4.7)  W^(n)(t)(B) ≜ { work associated with customers in queue at time t with lead times in B },
(4.8)  V^(n)(t)(B) ≜ { work associated with customers arrived by time t with lead times in B, whether or not still present at time t }.

We define the frontier to be

(4.9)  F^(n)(t) ≜ { the largest lead time of any customer who has ever been in service, whether or not that customer is still present, or √n y^∗ − t if this quantity is larger than the former one }.

We also define the scaled measure-valued workload process, scaled measure-valued work arrival process, and scaled frontier, respectively:

(4.10)  Ŵ^(n)(t)(B) ≜ (1/√n) W^(n)(nt)(√n B),
(4.11)  V̂^(n)(t)(B) ≜ (1/√n) V^(n)(nt)(√n B),
(4.12)  F̂^(n)(t) ≜ (1/√n) F^(n)(nt).

At time t > 0, work whose lead time is less than or equal to F^(n)(t) receives priority. However, the arrival rate of work in this category is less than the full arrival rate, since F^(n)(t) < √n y^∗. Therefore, this work is not in heavy traffic, which leads to the following result, proved in [6].

Lemma 4.2 (Crushing) As n → ∞, Ŵ^(n)(−∞, F̂^(n)] ⇒ 0.

Lemma 4.2 says that in order to understand the limiting distribution of lead times, it is enough to consider only work whose lead time exceeds F^(n). However, since this work has never been in service, we can restrict attention to the measure-valued work arrival process V^(n) rather than the more complicated measure-valued workload process W^(n). The following limit for the scaled version of this process is obtained in [6].

Theorem 4.3 For all y ∈ R,

(4.13)  V̂^(n)(·)(y, ∞) ⇒ H(y) ≜ ∫_y^∞ (1 − G(x)) dx.

In fact, the convergence in (4.13) is weak convergence of a sequence of measure-valued processes to a measure on R, not just weak convergence of a real-valued process for each fixed y. Theorem 4.3 can be explained by the following heuristic. To have scaled lead time x at scaled time t, a customer must have entered the system s scaled time units earlier with scaled lead time x + s. Given that a customer arrives at scaled time t − s, the density at time t for the lead time at x of this customer is G′(x + s). We must integrate this density over all possible values of s ≥ 0 and multiply by the limiting arrival rate λ of customers to obtain the density of customers with scaled lead time x at scaled time t, which is therefore

λ ∫₀^∞ G′(x + s) ds = λ (G(∞) − G(x)) = λ (1 − G(x)).

The limiting work brought by each customer is 1/μ = 1/λ. Therefore, to find the density of work (as opposed to customers) with scaled lead time x at scaled time t, we divide the expression above by λ. Finally, to obtain the amount of work in (y, ∞) at time t, we integrate the resulting expression from y to ∞ and obtain H(y) defined in (4.13).

The limiting scaled work in the system is W∗ defined by (4.5), and according to Lemma 4.2, as n → ∞, this work is increasingly concentrated to the right of the frontier F̂^(n). One can use these observations to show that F̂^(n) has a limit, and this limit must be F∗ = H⁻¹(W∗), so that

lim_{n→∞} V̂^(n)(t)(F̂^(n)(t), ∞) = H(F∗(t)) = W∗(t).

In the limit, arrived work to the right of the frontier accounts for all work in the system. We summarize with the principal conclusions from [6].

Theorem 4.4 As n → ∞,

(4.14)  F̂^(n) ⇒ F∗ ≜ H⁻¹(W∗),
(4.15)  Ŵ^(n)(·)(y, ∞) ⇒ H(y ∨ F∗)  ∀y ∈ R.

The convergence in (4.15) is weak convergence of a sequence of measure-valued processes to a measure-valued process. In other words, the density of the limit of the measure-valued workload processes Ŵ^(n)(t) is (1 − G(x)) I{x ≥ F∗(t)}, which is the density of the limit of V̂^(n)(t) truncated at the random process F∗(t).

4.3. Reneging

We modify the real-time queueing system of the previous subsection by assuming that customers renege when they become late, i.e., a customer whose lead time reaches zero disappears from the queue, never to return. This system has the same customer arrival process A^(n), work arrival process V^(n), and netput process N^(n) as the system without reneging. However, in contrast to the EDF queueing system in Section 4.2, whose (scalar) workload process is Markov, the (scalar) workload process in the system with reneging is not Markov. This is because different customers reduce the workload by different amounts when they renege, and so it is necessary to keep track of the lead times and remaining service requirements of all

customers. In the presence of reneging, we need the full measure-valued workload process in order to have a Markov system.

The workload process in the reneging system, denoted W_R^(n), is less than or equal to the workload process W^(n) of (4.2). For the reneging system, we scale the workload process to obtain (cf. (4.3))

Ŵ_R^(n)(t) = (1/√n) W_R^(n)(nt),  t ≥ 0.

For the reneging system, we define the measure-valued workload process (cf. (4.7))

W_R^(n)(t)(B) = { work associated with customers in the reneging system at time t with lead times in B }

and the scaled measure-valued workload process (cf. (4.10))

Ŵ_R^(n)(t)(B) = (1/√n) W_R^(n)(nt)(√n B).

Here B is an arbitrary Borel subset of R. The measure-valued work arrival process V^(n) and scaled measure-valued work arrival process V̂^(n) for the reneging system are the same as for the non-reneging system; these are given by (4.8) and (4.11). For the reneging system, the frontier is (cf. (4.9))

F_R^(n)(t) ≜ { the largest lead time of any customer who has ever been in service in the reneging system, whether or not that customer is still present, or √n y^∗ − t if this quantity is larger than the former one }.

Because the reneging system serves customers that have not yet been in service in the non-reneging system, we have F^(n) ≤ F_R^(n). The scaled frontier for the reneging system is (cf. (4.12))

F̂_R^(n)(t) = (1/√n) F_R^(n)(nt).

Recall the definition (1.6) of Λa and the reflected Brownian motion W∗ of (4.5). The principal result of [11] is the following.

Theorem 4.5 As n → ∞,

(4.16)  Ŵ_R^(n) ⇒ W_R∗ ≜ Λ_{H(0)}(W∗),

which is a Brownian motion with drift −γ and variance per unit time λ(α² + β²), doubly reflected to stay in the interval [0, H(0)]. As n → ∞,

(4.17)  F̂_R^(n) ⇒ F_R∗ ≜ H⁻¹(W_R∗),
(4.18)  Ŵ_R^(n)(·)(y, ∞) ⇒ H(y ∨ F_R∗)  ∀y ∈ R.

We sketch the proof of Theorem 4.5. For this we introduce M, the set of finite measures on the Borel subsets of R. We endow M with the topology of weak convergence. We denote by D_M[0, ∞) the set of functions from [0, ∞) to M that are right-continuous and have left limits. We further define a mapping Λ : D_M[0, ∞) → D_M[0, ∞) by

(4.19)  Λ(μ)(t)(−∞, y] ≜ [ μ(t)(−∞, y] − sup_{s∈[0,t]} ( μ(s)(−∞, 0] ∧ inf_{u∈[s,t]} μ(u)(R) ) ]^+.

Consideration of (4.19) reveals that Λ(μ)(t) is the measure on R that agrees with μ(t) except that it has all mass removed to the left of some point, no mass removed to the right of that point, and perhaps some of the mass removed at that point if there is a point mass there. The point in question is the supremum of those y for which

μ(t)(−∞, y] ≤ sup_{s∈[0,t]} ( μ(s)(−∞, 0] ∧ inf_{u∈[s,t]} μ(u)(R) ).

The total amount of mass removed is almost the largest amount of "lateness" prior to time t, by which we mean sup_{s∈[0,t]} μ(s)(−∞, 0], but this is tempered by the fact that at some time between t and the prior time s when this maximal lateness was obtained, the system may have become empty. For example, if there is an s1 ∈ [0, t] and a u1 ∈ [s1, t] such that

sup_{s∈[0,t]} ( μ(s)(−∞, 0] ∧ inf_{u∈[s,t]} μ(u)(R) ) = μ(s1)(−∞, 0] ∧ inf_{u∈[s1,t]} μ(u)(R) = μ(u1)(R),

then

Λ(μ)(u1)(R) = [ μ(u1)(R) − sup_{s∈[0,u1]} ( μ(s)(−∞, 0] ∧ inf_{u∈[s,u1]} μ(u)(R) ) ]^+ = 0;

the system is empty at time u1, and rather than subtracting mass μ(s1)(−∞, 0] from μ(t) to obtain Λ(μ)(t), we subtract only μ(u1)(R), the amount removed at time u1 in order to create the empty system. We conclude the paper with the detailed Example 4.7 of the operation of Λ on a path of W^(n).

Unlike Λa of (1.6), which maps real-valued functions to real-valued functions, Λ maps measure-valued functions to measure-valued functions. To obtain a real-valued process, we define

U^(n)(t) ≜ Λ(W^(n))(t)

and its total mass, which satisfies

U^(n)(t)(R) = [ W^(n)(t)(R) − sup_{s∈[0,t]} ( W^(n)(s)(−∞, 0] ∧ inf_{u∈[s,t]} W^(n)(u)(R) ) ]^+.

Scaling these relations, we obtain

(4.20)  Û^(n)(t) ≜ (1/√n) U^(n)(nt)(R) = [ Ŵ^(n)(t) − sup_{s∈[0,t]} ( Ŵ^(n)(s)(−∞, 0] ∧ inf_{u∈[s,t]} Ŵ^(n)(u) ) ]^+.

Note that the processes Ŵ^(n)(·) = Ŵ^(n)(·)(R) and W^(n) appearing in (4.20) are for the non-reneging system. We take the limit as n → ∞. Although Λ is not continuous on the set of all measure-valued processes, it is continuous on the set of processes that can result as the limit of W^(n). Therefore, we can use Theorems 4.1 and 4.4 and the continuous mapping theorem to obtain

(4.21)  Û^(n)(t) ⇒ [ W∗(t) − sup_{s∈[0,t]} ( ( W∗(s) − H(0 ∨ F∗(s)) ) ∧ inf_{u∈[s,t]} W∗(u) ) ]^+,

where we have used the fact that

Ŵ^(n)(s)(−∞, 0] = Ŵ^(n)(s)(R) − Ŵ^(n)(s)(0, ∞) ⇒ W∗(s) − H(0 ∨ F∗(s)).

Because H is nonincreasing,

H(0 ∨ F∗(s)) = H(0) ∧ H(F∗(s)) = H(0) ∧ W∗(s).

Therefore,

W∗(s) − H(0 ∨ F∗(s)) = W∗(s) − H(0) ∧ W∗(s) = (W∗(s) − H(0))^+.

Making this substitution in (4.21), we see that

(4.22)  Û^(n) ⇒ Λ_{H(0)}(W∗).

In conclusion, we have defined

(4.23)  Û^(n)(t) = Λ(Ŵ^(n))(t)(R)  ∀t ≥ 0,

taken the limit as n → ∞, and obtained (4.22). The following lemma implies that the processes Û^(n) and Ŵ_R^(n) have the same limit. In particular, (4.22) yields (4.16).

Lemma 4.6 Let D^(n)(t) denote the work that arrives to the reneging system that has lead time upon arrival less than or equal to the frontier at the time of arrival and that ultimately reneges. Define D̂^(n)(t) = (1/√n) D^(n)(nt). Then

(4.24)  0 ≤ Û^(n) − Ŵ_R^(n) ≤ D̂^(n) ⇒ 0,

u1

(n) u2 (n) u3 (n) u4 (n) u5

(n)

= 1, v1 = 1, = 3, = 2, = 2,

(n) v2 (n) v3 (n) v4 (n) v5

(n)

= 4, L1 = 4, = 2, = 1, = 1,

(n) L2 (n) L3 (n) L4 (n) L5

(n)

= 3, S1 = 5, = 1, = 4, = 1,

(n) S2 (n) S3 (n) S4 (n) S5

= 1, = 2, = 5, = 7, = 9.


Then using δs to denote a unit of mass at the point s, we have ⎧ 0, 0 ≤ t < 1, ⎪ ⎪ ⎪ ⎪ (5 − t)δ , 1 ≤ t < 2, ⎪ 4−t ⎪ ⎨ (5 − t)δ + 4δ , 4−t 7−t 2 ≤ t < 5, W (n) (t) = (7 − t)δ + 4δ ⎪ 6−t 7−t , 5 ≤ t < 7, ⎪ ⎪ ⎪ (11 − t)δ + δ ⎪ 7−t 11−t , 7 ≤ t < 9, ⎪ ⎩ 2δ−2 + δ1 + δ2 , t = 9. The measure W (n) (t) is shown for integer values of t ranging between 1 and 9 in Figure 1.

Fig 1. Evolution of W^{(n)}.

We have W^{(n)}(u) ≥ 4 for all u ∈ [2, 8], and hence

K^{(n)}(t) ≜ sup_{s∈[0,t]} [ W^{(n)}(s)(−∞, 0] ∧ inf_{u∈[s,t]} W^{(n)}(u) ]
          = sup_{s∈[0,t]} W^{(n)}(s)(−∞, 0]
          = { 0, 0 ≤ t < 4;  1, 4 ≤ t < 7;  4, 7 ≤ t ≤ 8. }

However, for 8 ≤ t < 9, we have W^{(n)}(t) = 12 − t ≤ 4. For t in this range, the supremum in the definition of K^{(n)}(t) is attained at s = 7, and

K^{(n)}(t) = W^{(n)}(7)(−∞, 0] ∧ inf_{u∈[7,t]} W^{(n)}(u) = 4 ∧ (12 − t) = 12 − t.


Łukasz Kruk, John Lehoczky, Kavita Ramanan, and Steven Shreve

For t = 9, we have W^{(n)}(9) = 4 ≠ 12 − t. Nonetheless, the supremum in the definition of K^{(n)}(9) is still attained at s = 7. Indeed,

K^{(n)}(9) = W^{(n)}(7)(−∞, 0] ∧ inf_{u∈[7,9]} W^{(n)}(u) = 4 ∧ [ ( inf_{u∈[7,9)} (12 − u) ) ∧ 4 ] = 3.

In summary,

K^{(n)}(t) =
  0,       0 ≤ t < 4,
  1,       4 ≤ t < 7,
  4,       7 ≤ t ≤ 8,
  12 − t,  8 ≤ t ≤ 9.
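As a sanity check, the piecewise formula for K^{(n)} can be reproduced numerically from the measure-valued path of Example 4.7. The following sketch (our illustration, not part of the paper) evaluates the sup/inf definition of K^{(n)} on a discrete time grid:

```python
import numpy as np

def W(t):
    """Point masses (mass, location) of W^(n)(t) from Example 4.7."""
    if t < 1:
        return []
    if t < 2:
        return [(5 - t, 4 - t)]
    if t < 5:
        return [(5 - t, 4 - t), (4, 7 - t)]
    if t < 7:
        return [(7 - t, 6 - t), (4, 7 - t)]
    if t < 9:
        return [(11 - t, 7 - t), (1, 11 - t)]
    return [(2, -2), (1, 1), (1, 2)]

grid = np.round(np.arange(0, 9.0001, 0.01), 10)

def K(t):
    ss = grid[grid <= t]
    total = np.array([sum(m for m, _ in W(u)) for u in ss])
    # inf of the total mass over [s, t], for every s, via a reversed running min
    run_inf = np.minimum.accumulate(total[::-1])[::-1]
    neg = [sum(m for m, loc in W(s) if loc <= 0) for s in ss]
    return max(min(a, b) for a, b in zip(neg, run_inf))

# matches the piecewise values derived above (up to grid resolution)
for t, val in [(3, 0), (5, 1), (8, 4), (8.5, 3.5), (9, 3)]:
    assert abs(K(t) - val) < 0.05
```

The grid only approximates the infimum over [7, 9), so the checks use a small tolerance.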

The measure U^{(n)}(t) is obtained by subtracting mass K^{(n)}(t) from the measure W^{(n)}(t), working from left to right. This results in the formula

U^{(n)}(t) =
  0,                              0 ≤ t < 1,
  (5 − t)δ_{4−t},                 1 ≤ t < 2,
  (5 − t)δ_{4−t} + 4δ_{7−t},      2 ≤ t < 4,
  (8 − t)δ_{7−t},                 4 ≤ t < 5,
  (6 − t)δ_{6−t} + 4δ_{7−t},      5 ≤ t < 6,
  (10 − t)δ_{7−t},                6 ≤ t < 7,
  (8 − t)δ_{11−t},                7 ≤ t < 8,
  0,                              8 ≤ t < 9,
  δ_2,                            t = 9.

The measure U^{(n)}(t) is shown for integer values of t ranging between 1 and 9 in Figure 2. The total mass in the U^{(n)} system is

U^{(n)}(t) =
  0,       0 ≤ t < 1,
  5 − t,   1 ≤ t < 2,
  9 − t,   2 ≤ t < 4,
  8 − t,   4 ≤ t < 5,
  10 − t,  5 ≤ t < 7,
  8 − t,   7 ≤ t < 8,
  0,       8 ≤ t < 9,
  1,       t = 9.

This total mass path has jumps ΔU^{(n)}(1) = 4, ΔU^{(n)}(2) = 4, ΔU^{(n)}(4) = −1, ΔU^{(n)}(5) = 2, ΔU^{(n)}(7) = −2 (the result of an arrival of mass 1 and the deletion of mass 3), and ΔU^{(n)}(9) = 1.

We see that arriving mass to U^{(n)} is not always placed at the lead time of the arriving customer. In particular, U^{(n)}(5−) = 3δ_2, but U^{(n)}(5) = δ_1 + 4δ_2. The mass v_3^{(n)} = 2 arriving at time 5 is distributed with one unit at L_3^{(n)} = 1 and one unit at 2. Furthermore, the mass v_5^{(n)} = 1 arriving at time t = 9, which begins a new busy period for U^{(n)}, is placed at 2 rather than at L_5^{(n)} = 1.

Because of the failure of U^{(n)} to place all arriving masses at their lead times, the reneging system measure W_R^{(n)}(t) is not U^{(n)}(t) for 5 ≤ t < 7 and t = 9. The

Fig 2. Evolution of U^{(n)}.

full formula for the reneging system is

W_R^{(n)}(t) =
  0,                              0 ≤ t < 1,
  (5 − t)δ_{4−t},                 1 ≤ t < 2,
  (5 − t)δ_{4−t} + 4δ_{7−t},      2 ≤ t < 4,
  (8 − t)δ_{7−t},                 4 ≤ t < 5,
  (7 − t)δ_{6−t} + 3δ_{7−t},      5 ≤ t < 6,
  (9 − t)δ_{7−t},                 6 ≤ t < 7,
  (8 − t)δ_{11−t},                7 ≤ t < 8,
  0,                              8 ≤ t < 9,
  δ_1,                            t = 9.

The measure W_R^{(n)}(t) is shown for integer values of t ranging between 1 and 9 in Figure 3. Beginning at time t = 4, the reneging system begins serving the customer with lead time 3, and thus by time t = 5, this customer, whose lead time is now 2, requires only three remaining units of service. The customer arriving at time t = 5 with lead time 1 brings an additional two units of work. At time t = 5, the reneging system thus has five units of work, which agrees with U^{(n)}(5) = 5, but the mass in the reneging system is not distributed according to the measure U^{(n)}(5). At time t = 6, an additional unit of work is deleted from the reneging system but not from the U^{(n)} system, and so W_R^{(n)}(6) = 3, whereas U^{(n)}(6) = 4. This discrepancy is due to the deletion in the reneging system at time 6 of the customer who arrived at time t = 5, a customer who upon arrival was more urgent than the customer in service in the reneging system. The work associated with this customer upon arrival is counted in the process D^{(n)} in Lemma 4.6.
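The "subtract mass, working from left to right" operation used to build U^{(n)} from W^{(n)} can be sketched in a few lines (our illustration; atoms are represented as (mass, location) pairs):

```python
def subtract_left(atoms, k):
    """Remove k units of mass from a point-mass list, leftmost locations first."""
    out = []
    for mass, loc in sorted(atoms, key=lambda a: a[1]):
        take = min(mass, k)
        k -= take
        if mass - take > 1e-12:
            out.append((mass - take, loc))
    return out

# From the example: W^(n)(5) = 2*delta_1 + 4*delta_2 and K^(n)(5) = 1,
# so U^(n)(5) = delta_1 + 4*delta_2.
assert subtract_left([(2.0, 1), (4.0, 2)], 1.0) == [(1.0, 1), (4.0, 2)]
```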

Lemma 4.6 asserts that we always have W_R^{(n)}(t) ≤ U^{(n)}(t), and the inequality can be strict due to work that preempts the customer in service in the reneging system, but the difference between W_R^{(n)}(t) and U^{(n)}(t) is never more than the amount of such work deleted by the reneging system up to time t.

Fig 3. Evolution of the reneging system W_R^{(n)}.

Acknowledgment. We are grateful to Leszek Słomiński for pointing out [4], to Søren Asmussen for pointing out [5], and to Neil O'Connell for telling us about [7] and [14].

References

[1] Anderson, R. and Orey, S. (1976). Small random perturbations of dynamical systems with reflecting boundary. Nagoya Math. J. 60 189–216.
[2] Anulova, S. V. and Liptser, R. Sh. (1990). Diffusion approximation for processes with normal reflection. Theory Probab. Appl. 35 (3) 411–423.
[3] Burdzy, K., Kang, W. and Ramanan, K. (2008). The Skorokhod problem in a time-dependent interval. Stoch. Proc. Appl., to appear.
[4] Chitashvili, R. J. and Lazrieva, N. L. (1981). Strong solutions of stochastic differential equations with boundary conditions. Stochastics 5 255–309.
[5] Cooper, W., Schmidt, V. and Serfozo, R. (2001). Skorohod–Loynes characterizations of queueing, fluid, and inventory processes. Queueing Systems 37 233–257.


[6] Doytchinov, B., Lehoczky, J. and Shreve, S. (2001). Real-time queues in heavy traffic with earliest-deadline-first queue discipline. Annals of Applied Probability 11 332–378.
[7] Ganesh, A., O'Connell, N. and Wischik, D. (2004). Big Queues. Lecture Notes in Mathematics 1838. Springer, New York.
[8] Iglehart, D. and Whitt, W. (1970). Multiple channel queues in heavy traffic I. Adv. Appl. Probab. 2 150–177.
[9] Kingman, J. F. C. (1961). A single server queue in heavy traffic. Proc. Cambridge Phil. Soc. 48 277–289.
[10] Kruk, Ł., Lehoczky, J., Ramanan, K. and Shreve, S. (2007). An explicit formula for the Skorokhod map on [0, a]. Ann. Probab. 35 1740–1768.
[11] Kruk, Ł., Lehoczky, J., Ramanan, K. and Shreve, S. (2007). Heavy traffic analysis for EDF queues with reneging. Preprint.
[12] Skorokhod, A. V. (1961). Stochastic equations for diffusions in a bounded region. Theor. of Prob. and Its Appl. 6 264–274.
[13] Tanaka, H. (1979). Stochastic differential equations with reflecting boundary conditions in convex regions. Hiroshima Math. J. 9 163–177.
[14] Toomey, T. (1998). Bursty traffic and finite capacity queues. Ann. Oper. Research 79 45–62.
[15] Whitt, W. (2002). Stochastic-Process Limits. Springer.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 195–214
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000381

Bounding Stationary Expectations of Markov Processes

Peter W. Glynn and Assaf Zeevi
Stanford University and Columbia University

Abstract: This paper develops a simple and systematic approach for obtaining bounds on stationary expectations of Markov processes. Given a function f which one is interested in evaluating, the main idea is to find a function g that satisfies a certain "mean drift" inequality with respect to f, which in turn leads to bounds on the stationary expectation of the latter. The approach developed in the paper is broadly applicable and can be used to bound steady-state expectations in general state space Markov chains, continuous time chains, and diffusion processes (with, or without, reflecting boundaries).

1. Introduction

Consider an irreducible non-explosive Markov jump process X = (X(t) : t ≥ 0) on a discrete state space S (otherwise known as a continuous-time Markov chain on S). Let f : S → R₊ be a cost function on S, in which f(x) represents the instantaneous rate at which cost accrues when X is in state x ∈ S. (Here and in what follows, all functions are assumed to be finite valued.) Then,

C(t) = \int_0^t f(X(s)) ds

is the total cost of running X over the time horizon [0, t]. Computing the exact distribution of C(t) (or even its expectation) is difficult. However, when X is positive recurrent, it is well known that there exists a distribution π = (π(x) : x ∈ S) for which

(1/t) C(t) → \sum_{x∈S} π(x) f(x)  a.s.

as t → ∞; see, for example, Asmussen (2003). This justifies the approximation

C(t) ≈ tα := t · \sum_{x∈S} π(x) f(x)

for t large. Of course, for this approximation to be practically useful, we need to be able to compute α or (at least) bound it. The distribution π is the unique stationary distribution of X, so that π satisfies

(1)  \sum_{x∈S} π(x) Q(x, y) = 0,  y ∈ S,
     s.t.  \sum_{x∈S} π(x) = 1;  π(y) ≥ 0, y ∈ S,

* Research partially supported by NSF grant DMI-0447562.
1 Management Science and Engineering, Stanford University.
2 Graduate School of Business, Columbia University.
AMS 2000 subject classifications: Primary 68M20, 60J10, 60J15.
Keywords and phrases: Markov processes, diffusions, stationary, bounds, Poisson's equation, queueing.


where Q = (Q(x, y) : x, y ∈ S) is the rate matrix of X. If |S| is finite and small, π can be computed numerically. If S is large, π can typically not be computed numerically, and in this setting one may need to be satisfied with computing bounds on α. Assuming that π is encoded as a row vector, the linear system (1) can be rewritten in matrix/vector notation as

(2)  πQ = 0,

subject to π being a probability distribution on S. To obtain a bound on α, note that when |S| < ∞, it follows from (2) that

(3)  πQg = 0

for any column vector g. Hence, if we can find a vector g and a constant c for which

(4)  Qg ≤ −f + ce

(where the function f = (f(x) : x ∈ S) is now encoded as a column vector and e = (1, . . . , 1) is the column vector in which all the entries are 1s), it is evident that we arrive at the upper bound:

(5)  πf ≤ c.

Similarly, if we can find a g̃ and c̃ for which

(6)  Qg̃ ≥ −f + c̃e,

we arrive at the lower bound:

(7)  πf ≥ c̃.
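For a finite-state chain, the chain of implications (2)–(7) is easy to verify numerically. The following sketch (ours; the generator and the functions f and g are arbitrary illustrations) checks that any non-negative test vector g yields valid upper and lower bounds via c = max(Qg + f) and c̃ = min(Qg + f):

```python
import numpy as np

# Illustrative 3-state Markov jump process; rates are made up for this sketch.
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])

# Stationary distribution: solve pi Q = 0 together with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

f = np.array([1.0, 4.0, 2.0])   # non-negative cost function (arbitrary)
g = np.array([0.0, 1.0, 0.5])   # arbitrary non-negative test function

c = np.max(Q @ g + f)           # smallest c with  Qg <= -f + c e   (bound (5))
c_tilde = np.min(Q @ g + f)     # largest c~ with  Qg >= -f + c~ e  (bound (7))

alpha = pi @ f
assert c_tilde - 1e-9 <= alpha <= c + 1e-9   # c~ <= pi f <= c
```

Because πQg = 0 exactly in finite state space, the two bounds sandwich α for every choice of g; a well-chosen g makes them tight.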

While the bounds (5) and (7) are trivial to derive, we are unaware of any specific literature that presents these bounds (although it seems like such bounds have appeared previously); see further comments in the literature review at the end of this section. Our objective, in this paper, is to extend the above bounds to infinite state spaces, as well as in the direction of more general Markov processes. To offer a hint of the difficulties that can arise, suppose that X is an irreducible non-explosive birth-death process on Z₊ = {0, 1, . . .}. For this class of jump processes, the so-called Poisson's equation

(8)  Qg = −k

has a solution for all right-hand sides k. (This solution can be computed by setting g(0) = 0, and then using the tri-diagonal structure of Q to recursively solve for the g(k)s.) Since πk is typically non-zero, it is evident that (3) fails badly for arbitrary g, even in the setting of simple birth-death processes. (For some discussion of sufficient conditions on the functions g and k that ensure that (3) holds see, e.g., Kumar and Meyn (1996).) Note that when |S| < ∞, Poisson's equation (8) is solvable for g only when πk = 0 (as can be seen by pre-multiplying both sides of (8) by the row vector π), so that the above difficulty disappears. Thus, to some degree, the complications associated with the validity of the bounds (5) and (7) have to do with issues of non-uniqueness and solvability of Poisson's equation when S is infinite, and with related potential-theoretic issues.
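The recursive computation just mentioned is immediate to implement. The sketch below (ours; the rates and right-hand side are arbitrary) solves Qg = −k for a birth–death chain by forward substitution using the tridiagonal structure, then checks the result against the generator:

```python
def solve_poisson_bd(lam, mu, k, n):
    """Solve (Qg)(x) = -k(x) for a birth-death chain, with g(0) = 0.

    lam(x): birth rate in state x; mu(x): death rate in state x (x >= 1).
    The x-th equation is tridiagonal in g and determines g(x+1).
    """
    g = [0.0, -k(0) / lam(0)]          # from lam(0)*(g(1) - g(0)) = -k(0)
    for x in range(1, n):
        g.append(g[x] + (-k(x) - mu(x) * (g[x - 1] - g[x])) / lam(x))
    return g

lam = lambda x: 1.0
mu = lambda x: 2.0
k = lambda x: -0.7                     # arbitrary right-hand side; here Qg = 0.7*e
g = solve_poisson_bd(lam, mu, k, 20)

# verify the generator identity on interior states
for x in range(1, 19):
    Qg_x = lam(x) * (g[x + 1] - g[x]) + mu(x) * (g[x - 1] - g[x])
    assert abs(Qg_x + k(x)) < 1e-6
```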

Bounds for Markov Processes


Since our interest is in obtaining computable bounds for α = πf, our focus here is on deriving sufficient conditions under which the bounds (5) and (7) are valid. In Section 2, we explore such conditions that support the upper bound (5), and in Section 3 we develop conditions that support the derivation of the lower bound (7). Section 4 deals with linear programming formulations and their connections with the basic inequalities described above. Section 5 provides several applications of the method to queueing-related processes.

Related literature: The references mentioned below are not meant to be an exhaustive survey, but rather touch upon strands of work that are connected directly with the main theme of our paper; for further reading and connections to bounds related to the ones mentioned above, the reader is referred to Meyn and Tweedie (1993) and Borovkov (2000). In the former, the function g appearing above is referred to as a Lyapunov function, and similar inequalities to (4), otherwise known as drift conditions, are used as sufficient conditions to establish f-regularity of the chain; see also the "comparison theorem" in Meyn and Tweedie (Theorem 14.2.2, 1993). The primary thrust of Meyn and Tweedie (1993) is the use of such drift conditions for purposes of establishing stochastic stability and recurrence properties of Markov chains [a similar treatment can be found in Borovkov (2000), and for stochastic differential equations in Hazminskii (1979)]. We note that passing from inequality (4) to (5) bears some similarities to the analysis of Poisson's equation in Glynn and Meyn (1996), and some connections will be made in this paper as well. By contrast to much of the above work, this paper is concerned with the use of the aforementioned drift conditions to develop computable bounds on various expectations of Markov processes. A particular focus is on clarifying the role that non-negativity plays in the application of such bounds.
We further provide easily applied concrete hypotheses under which our bounds apply to discrete time Markov chains, Markov jump processes, and diffusion processes. In addition, we also illustrate these ideas on some queueing-related examples, and indicate how one may tighten such bounds via linear programming formulations.

A related analysis that focuses on bounding the tails of the stationary distribution can be found in Hajek (1982), Lasserre (2002), Bertsimas, Gamarnik and Tsitsiklis (2001), and Gamarnik and Zeevi (2006); see also further references in the latter two papers. One important application area that has historically driven the need for such bounds is the queueing network context. A significant number of papers have focused on deriving performance bounds for such networks. In that setting, the goal is typically to bound the steady-state queue lengths or workload [see, e.g., Bertsimas, Paschalidis and Tsitsiklis (1994), Kumar and Kumar (1994), Sigman and Yao (1997), Bertsimas, Gamarnik and Tsitsiklis (2001) and Gamarnik and Zeevi (2006), as well as references therein]. We provide some examples in this paper that are related to the derivation of such bounds.

2. The Upper Bound

Our goal is to exploit the inequality (4) so as to arrive at the bound (5). An equivalent perspective is to seek conditions on g under which πQg ≥ 0. In this case, if f is a function for which there exists c such that Qg ≤ −f + ce,

198

Peter W. Glynn and Assaf Zeevi

then we arrive at the bound πf ≤ c. Here is our main result for Markov jump processes. Proposition 1. Let X = (X(t) : t ≥ 0) be a non-explosive Markov jump process with rate matrix Q. If g : S → R is non-negative and sup(Qg)(x) < ∞,

(9)

x∈S

then the inequality (10)

πQg ≥ 0

holds for any stationary distribution π of X. Remark 1. Note that in the presence of (9), we can write Qg = −h + ce, where c = sup{(Qg)(x) : x ∈ S} and h is a non-negative function. The inequality (10) then asserts that πh ≤ c. Proposition 1 is actually a special case of a result that holds for much more general Markov processes. To state this result, assume that X = (X(t) : t ≥ 0) is a strong Markov process taking values in a Polish space S and having c` adl` ag (i.e., right-continuous with left-limits) paths. We say that g belongs to the domain of the (extended) generator A of the process X and write g ∈ D(A) if there exists a function k for which the process  t k(X(s))ds (11) M (t) = g(X(t)) + 0

is a local martingale (adapted to the filtration of X) with respect to Px (·) := P(· | X(0) = x) for each x ∈ S. Furthermore, we then write Ag = −k, where k is any given member selected from the class of functions satisfying (11). For the Markov processes that arise in typical applications, it is straightforward to offer conditions guaranteeing that g ∈ D(A) and to compute explicitly Ag. Markov jump processes: Suppose that X = (X(t) : t ≥ 0) is a non-explosive Markov jump process living on a discrete state S, with associated  rate matrix Q = (Q(x, y) : x, y ∈ S). Then, any function g : S → R for which y∈S |Q(x, y)g(y)| < ∞ for each x ∈ S lies in D(A). Furthermore, for such a function g, (Ag)(x) = (Qg)(x) for x ∈ S. To see this, let {Kn : n ≥ 1} be a sequence of subsets of S with Kn % S as n → ∞ and |Kn | < ∞ for all n ≥ 1. We then define Tn = inf{t ≥ 0 : X(t) ∈ Knc }. Using the Kolmogorov forward equations, it follows that  min{t,Tn } Mn (t) = g(X(min{t, Tn }))− 0 (Qg)(X(s)ds is a Px -martingale for all x ∈ S; see also further discussion in Karlin and Taylor (1981). Stochastic differential equations (SDEs): Let B = (B(t) : t ≥ 0) denote standard Brownian motion in Rd . Let μ : Rd → Rd and σ : Rd → Rd×r be functions that are assumed to satisfy the “usual” Lipschitz and linear growth conditions. In particular, we require the existence of constants c1 and c2 such that

(12)

μ(x) − μ(y) + σ(x) − σ(y) ≤ c1 x − y, μ(x)2 + σ(x)2 ≤ c2 (1 + x2 )

for all x, y ∈ Rd ; the (vector) coefficient function μ constitutes the drift of the process, and σ(x) is known as the volatility matrix. Let X = (X(t) : t ≥ 0) denote

Bounds for Markov Processes

199

the unique Rd -valued strong solution of the following stochastic differential equation (SDE) (13)

dX(t) = μ(X(t))dt + σ(X(t))dB(t)

,

where X(0) = x ∈ Rd . If g is a twice continuously differentiable, then g ∈ D(A) and Ag = Lg, where L is a the second order differential operator (14)

L :=

d  i=1

μi (x)

d ∂ 1  ∂2 + bij (x) , ∂xi 2 i,j=1 ∂xi ∂xj

where b(x) = σ(x)σ(x) is the diffusion matrix. The localizing sequence of stopping times {Tn : n ≥ 1} can be taken to be  t 2 Tn = inf t ≥ 0 : X(t) ≥ n, or σi,j (X(s))ds ≥ n for some 0 i = 1, . . . , d, j = 1, . . . , r . see pp. 312–313 of Karatzas and Shreve (1991). Jump diffusion processes: For simplicity, consider the one-dimensional case, i.e., S = R. Let μ and σ be such that they satisfy (12), and consider the following process: ⎛ ⎞  t  t N (t)  (15) X(t) = μ(X(s))ds + σ(X(s))dB(s) + ⎝ Yi ⎠ 0

0

i=1

where B = (B(t) : t ≥ 0) is a standard Brownian motion, N = (N (t) : t ≥ 0) is a Poisson process with constant rate λ > 0, and {Yi } are iid random variables with common distribution function F and finite second moment. The above represents one of the more standard formulations of a jump-diffusion process, where the jump component is given by a compound Poisson process. It is assumed that N and the sequence {Yi } are mutually independent, as well as independent of X(0) and the Brownian motion B. A sufficient  condition that g ∈ D(A) is that it be twice continuously differentiable and that |g(x + y)|dF (y) be bounded on compact sets. For such functions g 

(Ag)(x) := (Lg)(x) + λ g(x + y)dF (y) − g(x) , R

where L is given in (14). In this jump diffusion setting, the localizing sequence of stopping times {Tn : n ≥ 1} can be taken to be  t   2 2 Tn = inf t ≥ 0 : (g (X(s))) σ (X(s))ds ≥ n, or |g(X(t−) + y)|F (dy) ≥ n . R

0

Discrete-time Markov chains (DTMCs): Suppose that X = (X(t) : t ≥ 0) is an S-valued Markov chain with one-step transition kernel P = (P (x, y) : x, y ∈ S), so that P (x, dy) = P(X1 ∈ dy | X0 = x). If g : S → R is such that P (x, dy)|g(y)| < ∞ for each x ∈ S, then S Mn = g(Xn ) +

n−1  j=0

k(Xj )

200

Peter W. Glynn and Assaf Zeevi

 is a Px -local martingale for each x ∈ S, where k(x) = g(x) − S P (x, dy)g(y) for x ∈ S, thus, if we set A = P − I, then k = −Ag; for a localizing sequence, set Tn = inf{m ≥ 1 : (P |g|)(Xm ) > n, or k(Xm ) > n}. Recall that π is a stationary distribution for X if, given that X(0) has distribution π, then X(t) has distribution π for each t ≥ 0. Let Ex [·] := E[· | X(0) = x]. Here is our main upper bound result. Theorem 1. Suppose that g ∈ D(A) is a non-negative function for which sup(Ag)(x) < ∞.

(16)

x∈S

Then: i.) For each x ∈ S and t ≥ 0,  −Ex

t

(Ag)(X(s))ds ≤ g(x). 0

ii.) For each stationary distribution π of X,  π(dx)(Ag)(x) ≥ 0. S

Proof. First note that by definition of g,  t (Ag)(X(s))ds g(X(t)) − 0

is a Px -local martingale for each x ∈ S. Let {Tn : n = 1, 2, . . .} be the localizing sequence of stopping times under Px . Then,  min{t,Tn } (Ag)(X(s))ds = g(x) − Ex g(X(min{t, Tn }). −Ex 0

Since g is by assumption non-negative, we obtain the inequality:  min{t,Tn } −Ex (Ag)(X(s))ds ≤ g(x), 0

which holds for each x ∈ S. Put C := supx∈S (Ag)(x) < ∞, then C − Ag(x) ≥ 0. Rewriting the above inequality we have  min{t,Tn } Ex (C − (Ag)(X(s))) ds ≤ g(x) + CEx min{t, Tn }. 0

Since, by definition, Tn ↑ ∞, letting n → ∞ and applying monotone convergence to each side of the inequality above we get that  t (Ag)(X(s))ds ≤ g(x) −Ex 0

which proves i.). To prove ii.) we proceed as follows. Let π be any stationary distribution of X. Adding and subtracting Ct to each side of the inequality in i.) we have  t (C − (Ag)(X(s))) ds ≤ g(x) + Ct. Ex 0

Bounds for Markov Processes

201

Now, dividing both sides of the inequality by t, sending t → ∞ and using Fatou’s lemma we have that  1 t Ex lim inf (C − Ag)(X(s))ds ≤ C. t→∞ t 0 Because C − (Ag)(x) is non-negative for all x ∈ S, we may integrate the left-hand side against π. It now follows from Birkhoff’s ergodic theorem that the left-hand side above is given by C − Eπ E[(Ag)(X(0)) | I], where I denotes the invariant sigma-field of X. Hence, we have that πAg ≥ 0. This concludes the proof. Remark 2. The finite-time bound i.) is well known, see for example Meyn and Tweedie (p. 337,1993) for the discrete time result, and a version of the continuous time bound is implicit in Meyn and Tweedie (1993a). Note that Proposition 1 is a direct consequence of the above theorem. Remark 3. A simpler way to obtain ii.) from i.) would be to require that g be πintegrable and to directly integrate both sides of the finite time bound i.) against π. Dividing by t and sending t → ∞ then yields the inequality ii.). This approach has two disadvantages. Firstly, it requires an additional step from an applied standpoint, as one must now check π-integrability of g. Secondly, such a hypothesis would weaken the result, as Example 1 below shows that the functions g satisfying the hypothesis of Theorem 1 need not be π-integrable. Example 1. Let X be the number-in-system process corresponding to the M/M/1 queue, so that X is a birth-death process in Z+ with birth rates λ(x) = λ for x ≥ 0 and death rates μ(x) = μ for x ≥ 1. If λ < μ then X has a unique stationary distribution π(x) = (1 − λ/μ)(λ/μ)x for x ≥ 0 with ρ := λ/μ. Given θ > 0, the function (17)

g(x) =

θ(μ/λ)x+1 θx − 2 λ(μ/λ − 1) λ(μ/λ − 1)

satisfies Qg = θe, and is non-negative. The function g therefore satisfies the hypothesis of Theorem 1. On the other hand, g is not π-integrable. Remark 4. Note that Example 1 implies that the conclusions of Theorem 1 cannot be strengthened to  π(dx)(Ag)(x) = 0 S

under the hypothesis stated in Theorem 1. In other words, the inequality statement in ii.) is the best possible under the assumptions of the theorem. Theorem 1 leads immediately to the following corollaries. Corollary 1. Let X = (X(t) : t ≥ 0) be a non-explosive Markov jump process and suppose that f : S → R is non-negative. If there exists a non-negative function g and a constant c for which Qg ≤ −f + ce, then πf ≤ c for any stationary distribution π of X. Corollary 2. Let X = (X(t) : t ≥ 0) be a solution of the SDE (12), and suppose that f : S → R is non-negative. If there exists a non-negative twice continuously differentiable function g and a constant c for which (Lg)(x) ≤ −f (x) + c,

202

Peter W. Glynn and Assaf Zeevi

for x ∈ S, then

 π(dx)f (x) ≤ c S

for any stationary distribution π of X. Corollary 3. Let X = (X(t) : t ≥ 0) be a jump-diffusion process as in (15), and suppose that f : S → R is non-negative.  If there exists a non-negative twice continuously differentiable function g with |g(x + y)|dF (y) < ∞ for all x ∈ S, and a constant c for which (Ag)(x) ≤ −f (x) + c, for x ∈ S, then

 π(dx)f (x) ≤ c S

for any stationary distribution π of X. Corollary 4. Let X = (Xn : n ≥ 0) be a discrete-time S-valued Markov chain with transition kernel P , and suppose f : S → R is non-negative. If there exists a non-negative function g : S → R and a constant c for which  P (x, dy)g(y) ≤ g(x) − f (x) + c, S

for x ∈ S, then

 π(dx)f (x) ≤ c S

for any stationary distribution π of X. Another important applications domain is that of diffusions with boundaries. Very similar results to Theorem 1 hold in such settings. To illustrate this point, assume that the real-valued process X = (X(t) : t ≥ 0) satisfies the stochastic differential equation dX(t) = a(X(t))dt + b(X(t))dB(t) + dΓ(t), where B = (B(t) : t ≥ 0) is a one-dimensional standard Brownian motion, and Γ(·) is the minimal non-decreasing process that increases only when X is at the origin and is such that the solution X is non-negative. If g is twice continuously differentiable, then  t M (t) = g(X(t)) − (Lg)(X(s))ds − g  (0)Γ(t) 0

is a local martingale with respect to Px for each x ≥ 0, where L is the differential operator defined in (14). Note that if g  (0) ≤ 0, then  t ˜ (t) = g(X(t)) − (Lg)(X(s))ds M 0

is a local supermartingale. The proof of Theorem 1 goes through without change in the local supermartingale setting. It follows that if f and g are non-negative functions with (Lg)(x) ≤ −f (x) + c

Bounds for Markov Processes

203

for x ≥ 0 and with g  (0) ≤ 0, then we may conclude that  π(dx)f (x) ≤ c S

for any stationary distribution π of X. This argument easily extends to other types of boundary behavior, as well as to higher-dimensional diffusions. 3. The Lower Bound In this section, we turn to the question of when the lower bound (7) and its extensions to general Markov processes is valid. Such lower bounds would follow naturally from an inequality of the form  π(dx)(Ag)(x) ≤ 0, (18) S

just as the upper bounds of Section 2 follow directly from Theorem 1. Given our interest in obtaining bounds on the π-expectation of a non-negative function f and Section 2’s discussion of the solution to Poisson’s equation for such functions f , it is natural to restrict our attention to non-negative functions g for which sup(Ag)(x) < ∞. s∈S

In view of Theorem 1, it is evident that (18) can hold only if we establish equality, namely, determining additional conditions on g ensuring that  π(dx)(Ag)(x) = 0. (19) S

In the setting of a discrete-time Markov chain, it is easily seen that the requirement that g be π-integrable suffices to guarantee (19). Proposition 2. Let π be a stationary distribution of the Markov chain X = (Xn : n ≥ 0) and suppose that π|g| < ∞. If (Ag)(x) = Ex g(X1 ) − g(x), then πAg = 0. Proof. Note that if g = g + − g − where g + = max{g(x), 0} and g − = (−g)+ , then πP g + = πg + < ∞, so πAg + = 0. Similarly, πAg − = 0, yielding the result. Corollary 5. Suppose that f is non-negative. If there exist non-negative functions g1 and g2 and constants c1 and c2 for which (P g1 )(x) ≥ g1 (x) − f (x) + c1 , (P g2 )(x) ≤ g2 (x) − g1 (x) + c2 , for all x ∈ S, then πf ≥ c1 . As our next example illustrates, π-integrability of g does not suffice to guarantee that πAg = 0 in the setting of a continuous time Markov chain.

204

Peter W. Glynn and Assaf Zeevi

Example 2. Our counterexample is framed in the setting of a continuous time birth-death process X on Z+ = {0, 1, . . .}. Suppose that λ(x) = λrx for x ≥ 0 and μ(x) = μrx for x ≥ 1. Assume that μ > λ > 0, and r > 1. Note that the embedded discrete-time Markov chain is a positive recurrent process (since λ > μ). It follows that the jump process X is non-explosive. Let Q be the rate matrix of X and consider, as for Example 1, the solution g to the equation Qg = θe for some θ > 0. Similar to Example 1, it is not difficult to verify that the solution g to the above is non-negative and satisfies g(x) =

 θ 1 − (μr2 /λ − r(1 + μ/λ) + 1)−1 (λ/(μ − λ)(μ/λ)x λ θr + (μr2 /λ − r(1 + μ/λ) + 1)−1 xr−x . λ

For this example, the stationary distribution π of X is given by π(x) = (1 − λ/(rμ))(λ/(rμ))x . Note that if r > 1 + λ/μ, then g is non-negative and π-integrable. However, πQg = θ > 0, thereby providing the required example. Remark 5. Note that in the above example, both g and Qg are π-integrable, so evidently πQg can be positive, even if integrability of both g and Qg is imposed. For Markov jump processes, X our next proposition provides a sufficient condition under which πQg = 0. Proposition 3. Let X be a Markov jump process on discrete state space S with rate matrix Q and possessing a stationary distribution π. Suppose that g satisfies  π(x)|Q(x, x)||g(x)| < ∞. x∈S

Then, πQg = 0. Proof. Note that     π(x)|Q(x, y)||g(y)| = π(x) Q(x, y)|g(y)| + π(x)|Q(x, x)||g(x)| x,y

x

=



|g(y)|

y

=2

x

y=x





π(x)Q(x, y) +



π(x)|Q(x, x)||g(x)|

x

x=y

|g(y)||Q(y, y)|π(y) < ∞.

y

It follows that 

π(x)(Qg)(x) =

x



=

 y

This concludes the proof.

π(x)Q(x, y)g(y)

x,y

g(y)

 x

π(x)Q(x, y) = 0.

Bounds for Markov Processes

205

Corollary 6. Suppose that f is non-negative. If there exist non-negative functions g1 and g2 , and constants c1 and c2 for which (Qg1 )(x) ≥ −f (x) + c1 , (Qg2 )(x) ≤ −g1 (x)|Q(x, x)| + c2 , for all x ∈ S, then, πf ≥ c1 . To obtain lower bounds on SDEs and jump-diffusions, we offer the following result. Theorem 2. Suppose that g ∈ D(A). Assume that the local martingale  t g(X(t)) − (Ag)(X(s))ds 0

is a martingale (adapted to the filtration of X) with respect to Px for each x ∈ S. If X has a stationary distribution π for which g is π-integrable and supx∈S (Ag)(x) < ∞, then  π(dx)(Ag)(x) = 0. S

Proof. By virtue of the martingale property, " !  t Ex g(X(t)) − (Ag)(X(s))ds = g(x) 0

for each x ∈ S. Note that g(X(0)) and g(X(t)) are both Pπ -integrable (since X is stationary under Pπ by definition). It follows that  t (Ag)(X(s))ds = 0. (20) Eπ 0

Because supx∈S (Ag)(x) < ∞, either Eπ (Ag)(X(s)) = ∞, or Eπ |(Ag)(X(s))| < ∞ for each s ≥ 0. In view of (20) we may conclude that Eπ |(Ag)(X(s))| < ∞ and hence (20) implies that t Eπ (Ag)(X(0)) = 0, which proves the result. The above result provides a mechanism for establishing lower bounds on stationary expectations for general Markov processes. 4. A Connection with Linear Programming In this section, we explore connections between the bound (1) and linear programming characterizations of the stationary expectation α = πf . We start by observing that when X is an irreducible finite-state discrete-time Markov chain, there always exists a solution g ∗ to Poisson’s equation g − P g = f − α. Furthermore, because all functions are automatically π-integrable in this context, α can be characterized as the minimum of the following linear program (LP):

(21)

min c s.t. P g ≤ g − f + ce,

where e = (1, . . . , 1)t is the column vector consisting entirely of 1s. A couple of observations are in order:

206

Peter W. Glynn and Assaf Zeevi

1. Note that if g is a solution of the inequality (21), we may always take the solution to be non-negative (without loss of generality). To see this, observe that g + βe is then also a solution of (21) for any constant β. So, if g has a negative component, just choose β = − min{g(x) : x ∈ S}. Hence, in finite state space, requiring g to be non-negative (as in Theorem 1) is no restriction on the class of “test functions” g. 2. It can be easily verified that the dual LP is

(22)

max νf s.t. νP = ν νe = 1.

Hence, the dual LP corresponds precisely to the standard equations that uniquely characterize the stationary distribution. Of course, in infinite state space, a solution g to the linear inequalities system (21) may not be bounded from below, so that the non-negativity constraint on g in Theorem 1 could, in principle, limit the applicability of Theorem 1’s bound. In view of this, we offer the following result. Theorem 3. Suppose that f is a bounded non-negative function and that X = (Xn : n ≥ 0) is a uniformly ergodic Markov chain. Then, there exists a finitevalued non-negative function g and non-negative constant c that solve the linear inequality system (P g)(x) ≤ g(x) − f (x) + c, x ∈ S. Proof. We show that there exists a solution to Poisson’s equation P g = g − f + α (where α = πf ) that is bounded below. It is shown in Glynn and Meyn (1996) that one solution g to Poisson’s equation is τ −1  (f (Xj ) − α), g ∗ (x) = Ex j=0

where τ is the regeneration time for the chain. In the uniformly ergodic case, the regeneration time τ has the property that Px(τ > n) = O(γⁿ) for some γ ∈ (0, 1) that is uniform in x; see Meyn and Tweedie (1993). Because f is non-negative, g∗(x) ≥ −αExτ, and it follows that g∗ is bounded below.

The following result suggests that for the types of functions f that arise in most applications, the non-negativity constraint on g is not a serious restriction.

Proposition 4. Let X = (X(t) : t ≥ 0) be a positive recurrent irreducible Markov jump process on discrete state space S, with stationary distribution π. Suppose that f : S → R+ is π-integrable and has the property that, for each c > 0, {x : f(x) ≤ c} has finite cardinality. Then, there exists a finite-valued non-negative function g and a non-negative constant c that solve the linear inequality system:

(Qg)(x) ≤ −f(x) + c,   x ∈ S.

Bounds for Markov Processes


Proof. As in Theorem 3, we show that there exists an equality solution of the linear inequality system. In this setting, we choose z so that f(z) is the minimum of f. A simple continuous-time adaptation of the reasoning of Glynn and Meyn (1996) establishes that one solution g to Poisson's equation is

g∗(x) = Ex ∫_0^{τ(z)} (f(X(s)) − α) ds,

where α = πf. Let K = {x : f(x) ≤ α}, and note that if TK = inf{t ≥ 0 : X(t) ∈ K}, it is evident that

g∗(x) = Ex ∫_0^{TK} (f(X(s)) − α) ds + Ex g∗(X(TK)).

Since f(x) ≥ α for x ∈ Kᶜ, it follows that

(23)    Ex ∫_0^{TK} (f(X(s)) − α) ds ≥ 0

for all x ∈ S (because (23) holds trivially for x ∈ K). The set K has finite cardinality, so β := inf{g∗(x) : x ∈ K} > −∞. So, g∗ is bounded below over S by β, and hence there exists a non-negative solution to Poisson's equation (namely g∗ − β).

We conclude this section by showing how LP methods can be used to tighten the bound on α = πf relative to the constant c as determined by (6), where f is non-negative. In particular, suppose that one has found a non-negative Lyapunov function g̃ satisfying (5) on the complement of some subset K. In order to obtain a finite-dimensional LP, suppose that K is a finite set for which

Σ_{y∈Kᶜ} P(x, y) g̃(y)

can be computed for each x ∈ K. To tighten the bound on α relative to (6), consider the LP:

(24)    min c
        s.t.   Σ_{y∈K} P(x, y)g(y) ≤ g(x) − Σ_{y∈Kᶜ} P(x, y)g̃(y) − f(x) + c   for all x ∈ K,
               0 ≤ g(x) ≤ g̃(x)   for all x ∈ K.

By setting ĝ = g in K and ĝ = g̃ in Kᶜ, we see that (ĝ, c∗) satisfies the hypotheses of Theorem 1, where c∗ is the minimum of the LP (24). Hence, α ≤ c∗. Thus, the bounding method developed in this paper can be used in conjunction with LP ideas to create a numerical scheme for computing tight bounds on α = πf.

5. Applications

This section presents various applications of the above results; the applications are grouped into two categories. The first set deals with applications in discrete time, the focus being on reflected random walks. The second set of examples deals with applications in continuous time: examples include analysis of a Markov jump process and a diffusion process with reflecting boundaries, both motivated by applications in queueing theory.
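The finite-state LP characterization is easy to exercise numerically. The sketch below, with a made-up irreducible 3-state transition matrix and cost vector, and assuming SciPy is available, solves the primal (21) with linprog and checks that its minimum coincides with πf computed directly from the stationary equations of the dual (22):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical irreducible 3-state chain and cost vector (illustration only).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, 2.0, 5.0])
n = len(f)

# Primal LP (21): minimize c over (g, c) subject to Pg <= g - f + ce, g >= 0.
A_ub = np.hstack([P - np.eye(n), -np.ones((n, 1))])   # encodes (P - I)g - ce <= -f
res = linprog(c=np.r_[np.zeros(n), 1.0], A_ub=A_ub, b_ub=-f,
              bounds=[(0, None)] * n + [(None, None)])

# Dual LP (22) in closed form: the stationary distribution pi solves pi P = pi.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

print(res.fun, pi @ f)   # the two values agree: the LP minimum is alpha = pi f
```

Multiplying any feasible constraint row by π and summing shows c ≥ πf, while a shifted solution of Poisson's equation attains it, which is why the two printed values match.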



5.1. The single-server queue and random-walk processes

The single-server queue. Consider a queue that is fed by an arrival stream of jobs with i.i.d. processing requirements V = (Vn : n ≥ 0), and i.i.d. inter-arrival times U = (Un : n ≥ 1) that are also independent of the processing requirements. We assume that EV0 < EU1, guaranteeing the stability of the system (i.e., ρ := EV0/EU1 < 1). We also assume that at time t = 0 the first job arrives at the queue and finds the system empty. Let X = (Xn : n ≥ 0) denote the waiting-time process, where Xn is the time that the nth job spends in the system before receiving service. Taking Zn := Vn − Un+1, we can express the dynamics of the waiting time via the recursion Xn+1 = [Xn + Zn+1]+, where [x]+ := max{x, 0}. By construction, X is a discrete-time Markov chain taking values in S = R+. Given the negative drift condition EZ1 < 0, basic stability theory for the G/G/1 queue ensures the existence (and uniqueness) of a stationary distribution π. Suppose we are interested in bounds on the first moment of this distribution, that is, suppose that f(x) = x. Put g(x) = x². Then, for any x ∈ S we have

Ex g(X1) = E([x + Z1]+)² = E(x + Z1)² − E[(x + Z1)²; x + Z1 < 0].

So, Ex g(X1) ≤ E(x + Z1)² = x² + 2xEZ1 + EZ1², and hence (Ag)(x) ≤ 2xEZ1 + EZ1². Then, if Z1 has a finite second moment we can apply Theorem 1 (and Corollary 4), yielding the upper bound

Eπ X1 ≤ EZ1² / (2|EZ1|),

which is nothing but Kingman's bound; see Kingman (1962). If one considers the performance of this bound in heavy traffic, i.e., along a sequence of systems n = 1, 2, . . . in which ρn → 1 in such a way that √n(1 − ρn) → μ for some μ > 0, then, denoting the corresponding sequence of waiting times by {Xn : n ≥ 1}, we have that n^{−1/2} Eπ Xn → σ²/(2μ), where σ² := EZ1². Hence the above inequality holds with equality in the aforementioned limiting sense.

To derive a lower bound on the mean, note that

−E[(x + Z1)²; x + Z1 < 0] = E[∫_{x+Z1}^0 2u du; x + Z1 < 0]
  = 2E[∫_{−∞}^0 u I{x + Z1 ≤ u, x + Z1 ≤ 0} du]
  = 2∫_{−∞}^0 u P(Z1 ≤ u − x) du
  ≥ 2∫_{−∞}^0 u P(Z1 ≤ u) du
  = −E[Z1²; Z1 < 0].



If E|Z1|³ < ∞ then πg < ∞, and hence Theorem 2 yields

Eπ X1 ≥ E[Z1²; Z1 ≥ 0] / (2|EZ1|).

Note that the upper bound only requires EZ1² < ∞.

5.2. Applications in continuous time

We consider two applications. The first derives bounds on mean queue lengths in a multi-class single-server queue operating under the longest-queue-first (LQF) scheduling policy. We then derive bounds on moments of semimartingale reflecting Brownian motion in the orthant.

Performance bounds for scheduling control in a single-server multiclass queue. Customer arrivals are modeled as m mutually independent Poisson processes with rates λ1, . . . , λm. The processing requirements of customers in each class follow an exponential distribution with mean 1/μi, i = 1, . . . , m, and are independent of each other and of the arrival processes. There is a single server which can serve customers at unit rate. Upon arrival, customers either get served immediately or are put into infinite-capacity buffers, according to their class. Upon completion of service, a customer leaves the system. In any given class, at most one customer can be serviced, and the sequencing within a class is according to a First-In-First-Out (FIFO) discipline. Customers that are not in service are said to be in the queue. We will assume in what follows that

ρ := Σ_{i=1}^m ρi < 1,

where ρi := λi/μi for i = 1, . . . , m. (The quantity ρ is referred to as the traffic intensity in the system.) It is well known that under the above condition every Markovian work-conserving policy is stable, in the sense that the associated CTMC is positive recurrent. (By work-conserving we mean that the server does not idle whenever there is work to be done.) Conversely, if ρ > 1 then any scheduling policy is unstable (i.e., there is no steady state). Denote the queue-length vector at time t ≥ 0 by X(t) = (X1(t), . . . , Xm(t)), and let X = (X(t) : t ≥ 0) denote the queue-length process.

To illustrate the application of our Lyapunov inequality, we consider a simple state-dependent scheduling policy known as "serve the longest queue first," denoted LQF for brevity. As the name suggests, this policy assigns the server to serve the class in which the queue length is the longest, and if no customers are in the system the server idles. We allow for service to be preempted if at any time instant the queue in one of the classes that is not being served increases beyond the length of the queue in the currently served class. To formalize this verbal description of the scheduling policy, define a mapping a : Z_+^m → {e0, e1, . . . , em}, where ei is the ith unit vector in R^m and e0 is an m-dimensional zero vector. The action a(x) specifies which customer class receives service when the system is in state x, and ai(x) denotes the ith component of a(x). For the LQF discipline we have

a(x) = ei  if xi > max{xj : j ≠ i},   and   a(x) = e0  otherwise,



and ties are broken arbitrarily, say by giving priority to the class with the larger index. Let us denote by i∗(x) the class that is granted priority under this scheduling rule. With this notation, the infinitesimal generator of the controlled CTMC is

A(x, y) = λi  if y = x + ei,   and   A(x, y) = μi ai(x)  if y = x − ei,

for any two states x, y ∈ Z_+^m such that x ≠ y (that is, the two vectors differ in at least one coordinate) and for i = 1, . . . , m. The diagonal entries of this matrix are defined by A(x, x) = −Σ_{y≠x} A(x, y). Our objective is to obtain upper bounds on the steady-state queue lengths under the aforementioned LQF policy. Given our chosen scheduling rule, we are particularly interested in the behavior of the longest queue. Using previously established notation, put f(x) = ‖x‖∞ := max{x1, . . . , xm}, and take the test function g to be

g(x) = Σ_{i=1}^m xi²/μi + Σ_{i=1}^m xi/μi.

Using the definition of the infinitesimal generator, straightforward algebra yields that

(Ag)(x) = 2 Σ_{i=1}^m ρi xi − 2x_{i∗} + 2ρ,

where ρi = λi/μi, ρ = Σi ρi, and i∗ is the index of the largest coordinate of x. Hence, we have that

(Ag)(x) ≤ −2f(x)(1 − ρ) + 2ρ,

which serves as the basic inequality for the purposes of bounding the maximal queue length. In particular, by Theorem 1 and Corollary 1 we have

Eπ ‖X(t)‖∞ ≤ ρ / (1 − ρ).
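As a numerical sanity check of this bound, the following Gillespie-style simulation (rates invented for illustration; Xi taken to be the total number of class-i customers present, consistent with the generator above) estimates the stationary mean of the longest queue for a two-class LQF system:

```python
import random

# Two-class preemptive LQF queue; check E_pi ||X||_inf against rho/(1 - rho).
lam = [0.3, 0.2]     # Poisson arrival rates (hypothetical)
mu = [1.0, 0.8]      # exponential service rates (hypothetical)
rho = sum(l / m for l, m in zip(lam, mu))     # traffic intensity = 0.55

random.seed(1)
x, t, area = [0, 0], 0.0, 0.0
for _ in range(200_000):
    # LQF: serve the longest queue, ties broken toward the larger index
    served = max(range(2), key=lambda i: (x[i], i)) if max(x) > 0 else None
    rates = lam[:]                        # the two arrival streams
    if served is not None:
        rates.append(mu[served])          # plus one service-completion stream
    total = sum(rates)
    dt = random.expovariate(total)
    area += max(x) * dt                   # time-weighted longest queue
    t += dt
    u = random.random() * total
    if u < rates[0]:
        x[0] += 1
    elif u < rates[0] + rates[1]:
        x[1] += 1
    else:
        x[served] -= 1

print(area / t, rho / (1 - rho))   # estimated E_pi ||X||_inf vs. the bound
```

With these rates the bound evaluates to 0.55/0.45 ≈ 1.22, and the simulated time average should fall below it, up to Monte Carlo error.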

A simple manipulation of the above bound gives us the following bound on the total workload in the system in steady state:

Eπ [Σ_{i=1}^m Xi(t)/μi] ≤ ρ² / (λmin(1 − ρ)),

where λmin = min{λ1, . . . , λm}. The work reported in Bertsimas et al. (2001) can be used to contrast this with a lower bound that holds for all stable Markovian policies, and is derived by other methods. In particular, Theorem 2 in Bertsimas et al. (2001) asserts that

Eπ [Σ_{i=1}^m Xi(t)/μi] ≥ ρ² / (4λmax(1 − ρ)),

where λmax = max{λi : i = 1, . . . , m}. Hence our argument recovers the correct order of this lower bound. In particular, this implies that the performance of LQF scheduling is within a constant factor of the best possible scheduling rule, for all problem instances in which the ratio λmin/λmax is held constant.

Reflected Brownian motion. A class of diffusion processes that play a central role in queueing theory are of the type most often referred to as reflected Brownian



motions (or RBMs); see, e.g., Harrison and Reiman (1981), Harrison and Williams (1987), Dupuis and Williams (1994) and references therein. (There are many more recent references on the topic, but the latter are the most relevant to the examples presented below.) Specific instances of these processes have been shown to arise as diffusion limits of certain queueing networks that operate under so-called heavy-traffic conditions. We next proceed with two concrete examples of such RBMs and illustrate how our basic inequalities can be used to obtain bounds on the tail of their stationary distribution.

Example 1: One-dimensional RBM. The simplest instance of RBM arises as a diffusion limit of the single-server queue in heavy traffic. The process can be defined as the unique strong solution of the following stochastic differential equation, which is a particular instance of the class of stochastic differential equations discussed in Section 3.2:

dX(t) = −μ dt + σ dB(t) + dΓ(t),   X(0) = x0,

where μ, σ > 0 are positive constants, x0 ≥ 0, and Γ = (Γ(t) : t ≥ 0) is the "pushing process" that keeps X = (X(t) : t ≥ 0) from going negative. It is well known that the stationary distribution of this process is exponential with mean σ²/(2μ); that is, when X(0) is drawn from this distribution the process X is stationary. Establishing this result is not difficult, but does require some modest amount of work and familiarity with properties of Brownian motion [see, e.g., Karatzas and Shreve (1991)]. We next illustrate how our basic inequality can be used to obtain rough bounds on the tail of the stationary distribution.

Let g(x) = exp{ax} − ax for some positive constant a to be specified shortly. (It is possible to allow for two different constants to parameterize g, but this does not improve upon the bounds derived below.) With this definition we have g′(0) = 0, and

(Lg)(x) = −μa g(x) + μa + (σ²/2)a² g(x).

Fix ε > 0 and set a := 2μ(1 − ε)/σ². From this we get that (Lg)(x) ≤ −ε(1 − ε)μa g(x) + μa. Hence, setting f(x) = exp{ax} and c = (ε(1 − ε))⁻¹, with the above choice of the constant a we have by Corollary 2 (in particular, the discussion following Corollary 4) that πf ≤ c. Now, using Markov's inequality, we have that

Pπ(X(0) ≥ x) ≤ (ε(1 − ε))⁻¹ exp{−2μ(1 − ε)x/σ²}

for any ε > 0.

Example 2: RBM in the orthant. Let S = R_+^d (the positive d-dimensional orthant). Let μ be a constant vector in R^d, σ a d × d non-degenerate covariance matrix (symmetric and strictly positive definite), and R a d × d matrix. For each x ∈ S, a semimartingale reflecting Brownian motion (abbreviated as SRBM) associated with the data (S, μ, σ, R, x) is an Ft-adapted, d-dimensional process X = (X(t) : t ≥ 0) defined on some filtered probability space (Ω, F, Ft, P) such that:

(i) X = W + RΓ, Px-a.s.;
(ii) Px-a.s., X has continuous paths and X(t) ∈ S for all t ≥ 0;
(iii) W is a d-dimensional Brownian motion with drift vector μ, covariance matrix σ and W(0) = x; in addition, M(t) = W(t) − μt is an Ft-martingale;
(iv) Γ is an Ft-adapted d-dimensional process such that under P it satisfies, for each j = 1, . . . , d:
  a.) Γj(0) = 0
  b.) (Γj(t) : t ≥ 0) is continuous and non-decreasing



  c.) Γj(t) can increase only when X hits the face Fj = {x ∈ R_+^d : xj = 0}.

Loosely speaking, SRBM behaves like Brownian motion in the interior of the orthant, and is confined to the orthant by instantaneous "reflection" at the boundary faces, where the direction of reflection is dictated by the matrix R. The most general condition currently known to ensure existence and uniqueness (in law) of SRBM in the orthant is that the matrix R is completely-S. (That is, the construction of the SRBM satisfying the properties detailed above is not done via a mapping from the initial condition and the Brownian motion; rather, the process is obtained as a weak solution.) The completely-S condition is in fact necessary and sufficient; see Taylor and Williams (1993). [This property requires that for every principal sub-matrix R̃ of R there exists a vector v with strictly positive entries such that R̃v is strictly positive.] For this class of SRBMs it is more challenging to characterize the existence of a stationary distribution. Dupuis and Williams (1994) prove that a sufficient condition for the existence and uniqueness of a stationary distribution is that all solutions of an associated deterministic Skorohod problem are attracted to the origin in finite time. (Their proof relies on the construction of a somewhat complicated piecewise linear Lyapunov function and uses the martingale structure of SRBM.) We next illustrate how a variation on that idea, using the basic inequalities developed in this paper, can be used to establish integrability of moments of an SRBM.

For the SRBM X, we require the following conditions to hold: (i) R is symmetric and positive definite; and (ii) −γ := R⁻¹μ < 0 componentwise. Condition (ii) is necessary for the existence of a stationary distribution (see Dai and Harrison (2008)). As for the symmetry assumption, this is imposed primarily to facilitate the explicit construction of a simple test function g. Fix a > 0 and let

g(x) = exp{a √(1 + x′R⁻¹x)}.

Straightforward algebra yields that

(25)    ∇g(x) · μ = −a (x · γ) (1 + x′R⁻¹x)^{−1/2} g(x),
        Σ_{i,j=1,...,d} σij ∂²g(x)/∂xi∂xj ≤ a²c1 g(x) + ac2 (1 + x′R⁻¹x)^{−1/2} g(x),
        (∇g(x)′R)i = a xi (1 + x′R⁻¹x)^{−1/2} g(x),

where c1, c2 are finite constants that depend only on the matrices R and σ, and can be computed explicitly in a straightforward manner (we omit such calculations for space considerations). Examining the third equality in (25), we may conclude that ∫ ∇g(X(t))′R dΓ(t) = 0 for all t, since X(t) · dΓ(t) = 0 by definition of the SRBM. Examining the first and second relations in (25), it follows that for a suitable choice of r > 0, depending on γ and c2, we can ensure that ∇g(x) · μ ≤ −2ac2 g(x) for x ∉ {x : 0 ≤ xi ≤ r}. It then follows that by taking a < c2/(2c1) we have that

(Lg)(x) ≤ −(c2²/(2c1)) g(x) + c,

where c := max{|(Lg)(x)| : 0 ≤ xj ≤ r, j = 1, . . . , d}. Put Kn := {x ∈ S : xj ≤ n for all j = 1, . . . , d}, and let Tn = inf{t ≥ 0 : X(t) ∉ Kn}. By continuity of the paths of SRBM, we have that Tn → ∞ a.s. as n → ∞. We can now apply Itô's differential rule to X, "localize" the martingale term using Tn, and hence apply the same logic used in the proof of Theorem 1 and Corollary 2 (and the sketch provided for diffusion with reflecting boundaries) to arrive at the conclusion that πf ≤ c.


This yields

Eπ exp{a √(1 + X(0)′R⁻¹X(0))} ≤ 2cc1/c2².

Thus, for suitably small a we have exponential moments for the particular class of SRBMs satisfying assumptions (i) and (ii) above. Corresponding bounds on the tail of the stationary distribution follow immediately by using Markov's inequality. We should note that a more precise characterization of the tail of SRBMs (without the need for condition (ii)) was recently derived by Budhiraja and Lee (2007).

Acknowledgements

We wish to thank an anonymous referee, whose comments on an earlier version of the paper led to an improved exposition and highlighted the importance of Theorem 1. Many thanks are due to Stewart Ethier for giving us the opportunity to contribute this paper honoring Tom Kurtz on the occasion of his 65th birthday.

References

[1] Asmussen, S. (2003). Applied Probability and Queues. Springer, New York.
[2] Bertsimas, D., Gamarnik, D. and Tsitsiklis, J. (2001). Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. 11 1384–1428.
[3] Bertsimas, D., Paschalidis, I. and Tsitsiklis, J. (1994). Optimization of multiclass queueing networks: Polyhedral and nonlinear characterizations of achievable performance. Ann. Appl. Probab. 4 43–75.
[4] Borovkov, A. A. (2000). Ergodicity and Stability of Stochastic Processes. John Wiley & Sons, New York.
[5] Budhiraja, A. and Lee, C. (2007). Long time asymptotics for constrained diffusions in polyhedral domains. Stochastic Process. Appl. 117 1014–1036.
[6] Dai, J. and Harrison, J. M. (2008). Reflecting Brownian motion in the orthant: An illuminating example of instability. Math. Oper. Res., to appear.
[7] Dupuis, P. and Williams, R. J. (1994). Lyapunov functions for semimartingale reflecting Brownian motions. Ann. Probab. 22 680–702.
[8] Fayolle, G. (1989). On random walks arising in queueing systems: Ergodicity and transience via quadratic forms as Lyapunov functions. Part I. Queueing Systems 5 167–184.
[9] Gamarnik, D. and Zeevi, A.
(2006). Validity of heavy-traffic steady-state approximations in generalized Jackson networks. Ann. Appl. Probab. 16 56–90.
[10] Glynn, P. W. and Meyn, S. P. (1996). A Liapounov bound for solutions of the Poisson equation. Ann. Probab. 24 916–931.
[11] Hajek, B. (1982). Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14 502–525.
[12] Has'minskii, R. Z. (1980). Stochastic Stability of Differential Equations. Monographs and Textbooks on Mechanics of Solids and Fluids: Mechanics and Analysis 7. Sijthoff & Noordhoff, Germantown, Md.
[13] Karlin, S. and Taylor, H. (1981). A Second Course in Stochastic Processes. Academic Press, San Diego.



[14] Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus. Springer, New York.
[15] Kingman, J. F. C. (1962). Some inequalities for the queue GI/G/1. Biometrika 49 315–324.
[16] Kumar, P. R. and Meyn, S. P. (1996). Duality and linear programs for stability and performance analysis of queueing networks and scheduling policies. IEEE Trans. Automat. Control 41 4–17.
[17] Kumar, S. and Kumar, P. R. (1994). Performance bounds for queueing networks and scheduling policies. IEEE Trans. Automat. Control 39 1600–1611.
[18] Lasserre, J. (2002). Bounds on measures satisfying moment conditions. Ann. Appl. Probab. 12 1114–1137.
[19] Lions, P.-L. and Sznitman, A.-S. (1984). Stochastic differential equations with reflecting boundary conditions. Comm. Pure Appl. Math. 37 511–537.
[20] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer-Verlag, London.
[21] Meyn, S. P. and Tweedie, R. L. (1993). Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. 25 518–548.
[22] Sigman, K. and Yao, D. (1993). Finite moments for inventory processes. Ann. Appl. Probab. 3 765–778.
[23] Taylor, L. M. and Williams, R. J. (1993). Existence and uniqueness of semimartingale reflecting Brownian motion in the orthant. Probab. Theory Related Fields 96 283–317.
[24] Tweedie, R. (1983). The existence of moments for stationary Markov chains. J. Appl. Probab. 20 191–196.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 215–234
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000390

Internet Traffic and Multiresolution Analysis Ying Zhang,1 Zihui Ge,2 Suhas Diggavi,3 Z. Morley Mao,1 Matthew Roughan,4 Vinay Vaishampayan,2 Walter Willinger,2 and Yin Zhang5 Abstract: Traditional Internet traffic studies have primarily focused on the temporal characteristics of packet traces as observed on a single link within an ISP’s network. They have contributed to advances in the areas of self-similar stochastic processes, long-range dependence, and heavy-tailed distributions and have demonstrated the benefits of applying a wavelet-based multiresolution analysis (MRA) approach when analyzing these traces. However, an ISP’s physical infrastructure typically consists of 100s or 1000s of such links which are connected by routers or switches, and the Internet as a whole is made up of about 20,000 such ISPs. When viewed within this bigger context, the importance of the traffic’s spatial characteristics becomes evident, and traffic matrices—compact and succinct descriptions of the traffic exchanges between nodes in a given network structure—are used in practice to capture and explore critical aspects of this spatial component of Internet traffic. In this paper, we first review some of the known results about the observed multifaceted scaling behavior of Internet traffic as seen on a single link. Next, we give a detailed account of how the architectural design of the Internet gives rise to natural representation of traffic matrices at different scales or levels of resolution. Moreover, we discuss the development of a MRA-like framework of traffic matrices that respects the different physically or logically meaningful Internet connectivity structures and provides new insights into Internet traffic as a spatio-temporal object.

1. Introduction

Internet traffic is a multi-faceted object, and depending on one's vantage point, can either be viewed as a purely temporal, a purely spatial, or a combined temporal-spatial process. Traditional Internet traffic studies have focused mainly on the traffic that traverses a given link between two routers in the network. Their primary objective has been to describe the pertinent statistical characteristics of the temporal behavior of the measured traffic rate process (i.e., the number of bytes or packets seen on the given link in successive time intervals). However, Internet traffic arises in a very structured manner that reflects the architectural design of the Internet, with the vertical separation into layers and the horizontal decentralization

1 Ying Zhang and Z. M. Mao are with the EECS Department, University of Michigan, Ann Arbor, MI 48109, USA, e-mail: [email protected]; e-mail: [email protected]
2 Z. Ge, V. Vaishampayan, and W. Willinger are with AT&T Labs-Research, Florham Park, NJ 07932, USA, e-mail: [email protected]; e-mail: [email protected]; e-mail: [email protected]
3 S. Diggavi is with the School of Computer and Communication Sciences, EPFL, Lausanne, SWITZERLAND, e-mail: [email protected]
4 M. Roughan is with the School of Mathematical Sciences, University of Adelaide, Adelaide 5005, AUSTRALIA, e-mail: [email protected]
5 Yin Zhang is with the Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712, USA, e-mail: [email protected]
AMS 2000 subject classifications: Primary 60K30, 90B18; secondary 60G18, 60G57
Keywords and phrases: self-similar processes, heavy-tailed distributions, traffic matrices, wavelet-based multi-resolution analysis




across network components representing two of its most prominent and influential features. As a result, the focus of subsequent Internet traffic studies has turned from being largely descriptive to being fully explanatory in the sense that observed characteristics of traffic rate processes are traced to particular aspects or mechanisms that determine how traffic is generated and handled within the confines of the architectural framework provided by today's Internet. As an example, one of the most visible manifestations of the Internet's vertical decomposition is the 5-layer TCP/IP protocol stack, consisting of (from the bottom up) the physical layer (e.g., optical fiber, copper), the data link or network access layer (e.g., Ethernet, frame relay), the network or internet layer (e.g., Internet Protocol, or IP), the host-to-host or transport layer (e.g., Transmission Control Protocol, or TCP), and the application layer (e.g., HyperText Transfer Protocol, or HTTP, for the World Wide Web, or WWW). In turn, on a given link, every bit recorded at the physical layer can in general be uniquely associated with higher-layer entities such as IP packets, IP flows, TCP connections, or application-layer sessions. As discussed in Section 2, the challenge of explanatory Internet traffic modeling has been to relate observed features of measured traffic rate processes to properties of these higher-layer traffic entities, and to explain the former in terms of the latter.
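The packet-to-flow association just described can be sketched in a few lines: given packet records (all addresses, ports, and sizes below are invented for illustration), grouping by the usual 5-tuple recovers the IP-flow entities whose sizes and durations drive the higher-layer analysis:

```python
from collections import defaultdict

# Toy illustration of the layered association: packet records are grouped
# into IP flows by their 5-tuple (src, dst, sport, dport, proto).
packets = [
    # (src, dst, sport, dport, proto, bytes) -- hypothetical values
    ("10.0.0.1", "10.0.0.9", 40000, 80, "TCP", 1500),
    ("10.0.0.1", "10.0.0.9", 40000, 80, "TCP", 1500),
    ("10.0.0.2", "10.0.0.9", 40001, 80, "TCP", 400),
    ("10.0.0.1", "10.0.0.9", 40000, 80, "TCP", 900),
]

flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for *five_tuple, size in packets:
    key = tuple(five_tuple)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in sorted(flows.items()):
    print(key, stats)
```

In measured traces it is the heavy-tailed distribution of such per-flow sizes that, as discussed next, connects to the observed temporal scaling of the aggregate rate process.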
Key to these efforts has been Kurtz’s construction [31], a flexible framework for exploring Internet traffic that (i) is mathematically rigorous, (ii) accounts for the different layers in the TCP/IP protocol stack, (iii) is consistent with measured Internet traffic across the different layers, and (iv) highlights the intimate connection between the observed temporal scaling properties of various traffic rate processes and the ubiquitous highvariability or heavy-tailed properties of the different higher-layer entities (e.g., IP flows, TCP connections, sessions). These efforts have been aided by the development and application of a 1D wavelet-based multi-resolution analysis (MRA) that has enabled an in-depth examination of the temporal dynamics of measured Internet traffic across a wide range of time scales of interest [2]. We summarize the main findings and implications from these efforts in Section 2. The main objective of this paper is to outline a similar wavelet-based MRA (in 2D instead of 1D) for studying the spatial features of a network’s measured traffic matrices rather than the temporal aspects of the measured traffic rate processes on a link. Here, a traffic matrix describes the amount of traffic (in bytes or packets) that is sent from one point or node in a network to another during some time interval (e.g., 5–30 minutes). The networks of interest are manifestations of the Internet’s horizontal decentralization and reflect physically, logically, or managerially meaningful ways of organizing the Internet-wide physical infrastructure into smaller entities or subnets. In turn, nodes can represent physical links, routers, Points-of-Presence (PoPs), autonomous systems (ASs) or domains, or entire ISPs, and the resulting traffic matrices provide compact descriptions of network-wide Internet traffic across a wide range of spatial scales of interest. 
In Section 3, we discuss traffic matrices at different levels of scale or resolution, outline a rudimentary 2D wavelet-based MRA of traffic matrices, and illustrate it with some examples of actual traffic matrices. In particular, we use the Abilene network [1] since Abilene makes all of the information about its network publicly available. However, we hasten to point out that the development of a MRA of traffic matrices will benefit from future studies that examine a wider range of networks, especially from commercial ISPs. In addition, many technical and practical problems remain, and we conclude in Section 4 with a list of the most pressing open problems and a discussion of promising applications of the envisioned MRA of traffic matrices.
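The notion of traffic matrices at different levels of resolution can be made concrete with a toy computation: given a router-level matrix and a router-to-PoP assignment (both invented here), the coarser PoP-level matrix is obtained by summing over the corresponding router pairs:

```python
import numpy as np

# Hypothetical 4-router traffic matrix: entry (i, j) is the traffic volume
# sent from node i to node j over some measurement interval.
tm_router = np.array([[0, 5, 2, 1],
                      [4, 0, 3, 2],
                      [1, 2, 0, 6],
                      [2, 1, 5, 0]], dtype=float)

pop_of = [0, 0, 1, 1]          # routers 0,1 in PoP 0; routers 2,3 in PoP 1
M = np.zeros((2, 4))           # 0/1 membership matrix
for r, p in enumerate(pop_of):
    M[p, r] = 1.0

tm_pop = M @ tm_router @ M.T   # aggregate traffic over router pairs per PoP pair
print(tm_pop)
```

Total traffic is preserved under the aggregation, which is what makes it natural to speak of one spatio-temporal object viewed at several spatial scales.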



2. Single-link traffic: Self-similarity and Kurtz’s construction When focusing on a single link within an ISP’s network, Internet traffic data typically consists of high time-resolution measurements recorded on that physical link over which the bits are sent. Each transmitted bit seen on this link can in general be associated with higher-layer entities such as an IP packet. Along with every packet header that is captured and stored, additional information is usually saved, notably an accurate time stamp (packet arrival time), packet size (number of bits or bytes), and other status and possibly even some payload information. Empirical studies of these high-quality and high-volume data sets have generally focused on identifying and describing pertinent statistical characteristics of the temporal dynamics of the measured packet or bit rate processes (i.e., the time series representing the number of packets or bits per time unit, over a certain time interval). They have provided ample evidence that measured Internet traffic exhibits extended temporal correlations (i.e., long-range dependence), and that when viewed within some range of moderately small to moderately large time scales, the traffic appears to be fractal-like or self-similar, in the sense that a segment of the traffic measured at some time scale looks or behaves just like an appropriately scaled version of the traffic measured over a different time scale. The original finding of self-similar scaling behavior in measured network traffic was reported in [24, 32] and was based on an extensive statistical analysis of traffic measurements from Ethernet local-area networks over a four-year period from 1989-1993 [32, 33]. 
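The kind of evidence reported in these studies can be illustrated with the classical variance-time method: aggregate the rate process over non-overlapping blocks of length m and examine how the variance of the block means decays in m. For an exactly self-similar process the decay is of order m^(2H-2). The toy sketch below uses an i.i.d. (hence short-range dependent) series, for which the fitted slope comes out near -1 (H = 1/2), whereas the Ethernet traces cited above exhibited much shallower slopes, i.e., H well above 1/2:

```python
import random
import math

random.seed(0)
# An i.i.d. "rate process" stands in for a short-range dependent baseline.
x = [random.expovariate(1.0) for _ in range(2 ** 16)]

def block_mean_var(series, m):
    # variance of the means of non-overlapping blocks of length m
    means = [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]
    mu = sum(means) / len(means)
    return sum((v - mu) ** 2 for v in means) / len(means)

m1, m2 = 16, 256
slope = (math.log(block_mean_var(x, m2)) - math.log(block_mean_var(x, m1))) \
        / (math.log(m2) - math.log(m1))
print(slope)   # near -1 for short-range dependence (H = 1/2)
```

Running the same diagnostic on long-range dependent traffic (real or synthetic, e.g., superposed heavy-tailed on/off sources in the spirit of Kurtz's construction below) produces a markedly flatter variance-time curve.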
A number of key follow-up studies have provided further evidence of the prevalence of self-similar traffic patterns in measured pre-Web Internet traffic [42, 43] and post-1995, Web-dominated Internet traffic [11, 12] (see also [54, 41, 15] and references therein) and have contributed to a general acceptance of self-similarity as an invariant [23] of Internet traffic—a traffic characteristic that has been largely insensitive to the sometimes drastic changes the network and its traffic have undergone during the past 10 or so years. Subsequent empirical studies have refined this picture by focusing on measured network traffic over very large as well as over very small time scales. Over large time scales (e.g., hours or days), traffic has been found to be largely dominated by pronounced time-of-day and day-of-week effects (e.g., see [46, 48]), and traffic models used for network engineering purposes such as link dimensioning and capacity planning need to account for this property of network traffic. With respect to the dynamic nature of network traffic over small time scales (i.e., below the typical round-trip time (RTT) of a packet), recent work has demonstrated that it also deviates from the self-similar scaling behavior that has been observed over larger than RTT time scales. However, in contrast to incorrect claims about an apparent Poisson-like dynamics of Internet traffic on sub-RTT time scales (e.g., see [9, 30]), there generally exists significant burstiness even on very small time scales, mainly due to the closed-loop feedback dynamics inherent in TCP-dominated Internet traffic (e.g., [19, 21, 22, 17]) and the intricate traffic interactions that can occur within a router and across the network.
On the one hand, this understanding has provided new insight into when the use of self-similar traffic models such as fractional Brownian motion (or equivalently, its increment process, fractional Gaussian noise) is justified and can be exploited for network engineering purposes [18]. On the other hand, it has also highlighted the need for a paradigm shift in network traffic modeling, whereby the currently employed strictly open-loop traffic models need to be replaced by constructs that can account for the critical features of TCP-type feedback regulation. The importance of considering relevant closed-loop traffic models becomes apparent when studying

218

Zhang, Ge, Diggavi, Mao, Roughan, Vaishampayan, Willinger, and Zhang

networking problems such as the adequate sizing of router buffers in today’s networks (e.g., see [27, 7, 17]) or helping a service provider offer and guarantee competitive service-level agreements (SLAs) to its customers [50]. In effect, these empirically based efforts toward describing actual Internet traffic have demonstrated that self-similar processes, despite their limitations on both sides of the spectrum of relevant time scales, define an elegant family of compact mathematical models for capturing the essence behind the wide range of “burstiness” or scale-invariance encountered in measured traffic traces. In turn, the self-similarity discovery has invigorated research in the area of statistics for long-range dependent and self-similar stochastic processes (e.g., see [8, 49]). More importantly, however, it has motivated the construction of new mathematical models that provide a physical (i.e., networking-based) explanation of the observed self-similar scaling behavior of Internet traffic that is intuitively appealing, conceptually simple, mathematically rigorous, and verifiable. Recognizing that it is difficult to think of many other areas in the sciences where the available data provide such detailed information about so many different facets of the system under study, these models have by and large succeeded in demystifying self-similarity as an Internet traffic characteristic by explicitly accounting for key aspects of the design and architectural principles of today’s Internet and by enabling direct model validation that relies on and exploits the high semantic context contained in the measured data.
They stand in stark contrast to the traditional traffic models that are “black boxes” in the sense that they ignore nearly all of this rich semantic context (they tend to use only packet arrival time and packet size information), describe the traffic traces at hand well in a statistical sense, but typically contribute little or nothing to our understanding of data networks and the traffic they carry. To illustrate, given accurate packet header information, measured Internet traffic can be sliced and diced in many different ways, resulting in a number of different representations of network traffic as seen on a single link. For example, by extracting packet header-specific information such as source and/or destination IP address or prefix, source and/or destination port numbers, or protocol- or application-specific attributes, it is possible to uniquely associate each packet with the IP flow, TCP connection, and sometimes even application-layer entity that it belongs to. Such mappings decompose traffic naturally into individual constituents that have network-specific meaning and explicitly reflect the layered design of the Internet architecture—IP flows allow for an IP layer view of traffic, TCP connections for a TCP layer perspective, and higher-layer entities provide a glimpse into network traffic at the application layer. In turn, such decompositions support model formulations that treat traffic on a link as an aggregate of many such constituents and invite model constructions that don’t view packets as black boxes but are given in terms of these constituents. An elegant mathematical framework for generating and analyzing such physical or “structural” models is due to Kurtz [31].
Kurtz’s construction considers traffic models that are integral representations with respect to certain Poisson random measures and, in its general form, includes well-known earlier approaches such as Cox’s construction [10] (also known as the immigration death process or M/G/∞ queuing model) and Mandelbrot’s construction [37] (also known as the renewal-reward process). In its basic form, Kurtz’s construction accounts for the layered architecture of the Internet by assuming, for example, that at the application layer, sources or sessions (e.g., ftp, http, telnet) arrive at random (i.e., according to some stochastic process) on the link and have a “lifetime” or session length during which they exchange information. At the IP layer, this information exchange manifests itself as

Multiresolution Analysis and Internet Traffic

219

a flow of IP packets that are transmitted at, say, some constant rate from the start until the end of a session. Thus, at the IP layer, the aggregate link traffic measured over some time period is made up of the contributions of all the sources that—during the period of interest—actively transmitted packets. More formally, one representation of this aggregate link traffic or workload process is as follows. Let the source activation process be N = (N(t) : t ≥ 0), denoting the number of source activations up to time t; for the i-th activation, let X_i(s) denote the total traffic generated by source i during the first s units of time. We model the length of time τ_i that source i remains active separately from X_i and assume that the pairs (X_i, τ_i) are i.i.d. The total link traffic or workload generated up to time t can then be written as

(2.1)   U(t) = \int_0^t X_{N(s)}\bigl(\tau_{N(s)} \wedge (t - s)\bigr)\, dN(s),

and if L = (L(t) : t ≥ 0) denotes the number of sources that are active at time t, we have

(2.2)   L(t) = \int_0^t I_{[0, \tau_{N(s)})}(t - s)\, dN(s).

Assuming that N is a counting process with intensity λ(N, U, L, ·), that is, that

(2.3)   N(t) - \int_0^t \lambda(N, U, L, s)\, ds

is a martingale with respect to the filtration generated by the random variables {N(s), U(s), L(s) : s ≤ t}, the process (N, U, L) can be represented as the solution of a system of stochastic equations involving a Poisson random measure (see [31] for details). This representation provides a convenient mathematical framework for studying scaling limits of the process (N, U, L) that yield deterministic “fluid approximations” or corresponding central limit theorems. For example, by appropriately scaling session intensity and time, the workload process U can be shown to converge to a self-similar limiting process, namely fractional Brownian motion (or its increment process, fractional Gaussian noise), provided the session arrivals follow a Poisson process, the sessions share the bandwidth in a “fair” (i.e., TCP-like) manner, and, more importantly, the session durations or lifetimes are i.i.d. and have a distribution that is heavy-tailed with infinite variance [5, 45]. Intuitively, the latter condition implies that there is no “typical” session size; instead, the session sizes are “highly variable” (i.e., exhibit infinite variance) and fluctuate over a wide range of scales, from kilobytes to megabytes and gigabytes and beyond. It is this basic characteristic at the higher layers in the TCP/IP protocol stack that causes the aggregate traffic at the IP layer to exhibit self-similar scaling. By relaxing the fair bandwidth sharing assumption, allowing for more realistic within-session traffic rates, or manipulating the relative speed with which the number of sessions and the time scale increase, other types of (self-similar and non-self-similar) limiting workload processes are possible, including Lévy-stable motion or its increment process, fractional Lévy-stable noise (for more details, see [31, 34, 52, 39, 44, 28]).
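As a rough numerical illustration of this construction (a sketch, not code from [31]; the session intensity, within-session rate, and Pareto duration parameter below are illustrative choices), the following simulates the workload U and session count L of (2.1)–(2.2) under Poisson session arrivals, a constant within-session transmission rate (so X_i(s) = rate · s), and i.i.d. heavy-tailed session durations:

```python
import random

def simulate_workload(t_end, lam=5.0, alpha=1.5, rate=1.0, seed=7):
    """Sketch of (2.1)-(2.2): Poisson(lam) session arrivals, Pareto(alpha)
    durations (infinite variance for alpha <= 2), and constant rate, so that
    X_i(s) = rate * s.  Returns (U(t_end), L(t_end))."""
    rng = random.Random(seed)
    U, L, s = 0.0, 0, 0.0
    while True:
        s += rng.expovariate(lam)        # next jump of the counting process N
        if s > t_end:
            break
        tau = rng.paretovariate(alpha)   # heavy-tailed session duration tau_i
        U += rate * min(tau, t_end - s)  # X_{N(s)}(tau_i ∧ (t_end - s)), as in (2.1)
        L += tau > t_end - s             # session still active at t_end, as in (2.2)
    return U, L

U, L = simulate_workload(t_end=100.0)
print(round(U, 2), L)                    # total workload and number of active sessions
```

Rescaling the session intensity together with time and space is then the regime in which the suitably normalized workload approaches a limit such as fractional Brownian motion.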
The beauty of structural models such as Kurtz’s construction is that, in stark contrast to the conventional black box models, they not only explain the self-similarity phenomenon in simple terms (i.e., heavy-tailed connections), but they


also clearly identify the data sets that need to be either obtained from new measurements or extracted from the available IP packet-header traces to validate the proposed explanation. This “closes the loop” between the empirical discovery of the self-similar scaling behavior of aggregate Internet traffic on the one hand, and its mathematical explanation in terms of infinite variance phenomena associated with meaningful quantities at the higher layers in the TCP/IP protocol stack on the other. For example, because of the way many applications are structured, determining session-related entities such as arrival times and sizes or durations from packet-level measurements is straightforward. For ftp and telnet, these entities have been shown to be consistent with Kurtz’s construction in [43]. For http (i.e., Web sessions), obtaining session information is generally more involved [29], but the empirical evidence for the heavy-tailed characteristic of Web-related entities (e.g., http request sizes and durations) has been well-established to date (see for example [56, 12, 14, 57]). In fact, heavy-tailed characteristics of higher-layer entities such as IP flows, TCP connections, or sessions constitute yet another set of Internet traffic invariants. While the self-similar scaling behavior across a range of intermediate time scales of Internet traffic at the IP layer (i.e., the time series representing the number of packets or bytes per time unit) is well documented, an equally intriguing scaling property of Internet traffic across the higher layers in the TCP/IP protocol stack has received comparatively little attention. 
For example, instead of viewing Internet traffic at the IP layer in terms of a time series representing the number of IP packets per time unit, we can consider physically meaningful “coarsened” versions by, for example, defining Internet traffic at the IP layer as given by the time series representing the number of IP flow arrivals per time unit or, for that matter, Internet traffic at the TCP layer as given by the time series representing the number of TCP connection arrivals per time unit. As originally pointed out in [20], the latter also exhibit self-similar scaling characteristics, which have in fact become more pronounced as the traffic mix at the application layer has changed from mostly telnet and simple use of email and ftp during the pre-Web period to predominantly Web-based after about 1995 [20, 21]. Note that Kurtz’s construction applies equally well for explaining the self-similar scaling behavior of these coarsened versions of Internet traffic as observed at the IP and TCP layers, respectively, and simply requires the distribution of the number of IP flows (or TCP connections) per session to be heavy-tailed with infinite variance [21]. Together, these observations suggest that the different self-similar scaling phenomena observed in measured Internet traffic are mainly caused by user/application characteristics, have little to do with the network (except that it imposes some fair sharing of bandwidth), and are likely to remain with us for the foreseeable future. Note that to arrive at this basic understanding of the temporal dynamics of Internet traffic as seen on a single link within the network, the development and application of a 1D wavelet-based MRA in support of a detailed examination of measured Internet traffic over a wide range of time scales of interest has been of critical importance [2, 3, 4].
Acting as an analytic telescope, this wavelet-based MRA has been ideal for the study of scaling properties and as such, has enabled a data analysis that matches well with the properties encountered in measured Internet traffic. Moreover, this technique has been instrumental in demonstrating that alternative models that are capable of reproducing long-range dependencies or self-similar scaling behavior (e.g., conventional stochastic processes with built-in non-stationarities such as deterministic monotonic trends or level shifts of the mean) are by and large inconsistent with measured network traffic (e.g., see for example [55, 2]). This


development has been accompanied by equally important advances in the area of inference for heavy-tailed phenomena (e.g., [45, 5]). In particular, the high volume of the available data sets has motivated a pragmatic approach to dealing with high variability in network measurements that has its roots in Mandelbrot’s early work [36]. This approach is described in [58] and clarifies in which sense higher-layer traffic quantities such as sizes or durations of IP flows, TCP connections, or sessions are fully consistent with proper infinite variance distributions, but are by and large inconsistent with conventional, finite variance distributions such as the lognormal or Weibull distributions. Thus, as far as the self-similar scaling behavior of Internet traffic is concerned, the explanation in terms of high-variability phenomena (i.e., infinite variance distributions) at the higher layers in the protocol stack remains to date the only model that is mathematically rigorous and consistent with measured network traffic as it manifests itself (in different incarnations) across the different layers of the protocol stack.

3. Network-wide traffic: Traffic matrices

The traffic observed on a single link within an ISP’s network arrives at that link coming from possibly many different sources and leaves the link destined to possibly many different destinations. While our main focus in Section 2 was on the temporal dynamics of Internet traffic as observed on a single link, here we are less concerned with its temporal aspects and are mainly interested in its spatial properties. To this end, the objects of interest are traffic matrices [38, 6], and to simplify the presentation, we discuss traffic matrices in the context of a single network or domain (i.e., a single network that connects end-systems, but does not connect to other networks).

3.1. Traffic matrices at different levels of resolution

Traffic matrices describe the amount of traffic from one point in a network to another during some time interval, and are thus naturally represented by a three-dimensional data structure T_t(i, j), which represents the traffic volume (in bytes or packets) from i to j during the time interval [t, t + Δt). The locations i and j are generally considered to be discrete in nature, i.e., they are drawn from some set of possible locations. We may consider these locations to be physical geographic locations, thereby making i and j spatial variables. However, in the Internet, it is common to associate i and j with logical structures related to the address structure of the Internet, i.e., IP addresses, or natural groupings of such addresses by a common prefix corresponding to a subnet. In the case that locations are modeled spatially, we call the set of possible locations L, while the set of address-based locations will be denoted I. In general, there is some correspondence between the two, but the mapping is not one-to-one, and so we cannot in general map a location to an IP address, or vice versa. Given some set of locations I, we can easily aggregate a traffic matrix across sets S, D ⊂ I to obtain

T_t(S, D) = \sum_{i \in S} \sum_{j \in D} T_t(i, j).

As with any time series, we can also perform standard aggregation operations in time, making it relatively straightforward to create multiple approximate representations of the original traffic matrix at different levels of resolution.
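Both aggregation operations are elementary; a minimal sketch (the array layout and function names below are our own, not from the paper):

```python
import numpy as np

def aggregate_space(T, S, D):
    """T_t(S, D) = sum of T_t(i, j) over i in S and j in D; T is an n x n array."""
    return T[np.ix_(sorted(S), sorted(D))].sum()

def aggregate_time(Ts, k):
    """Merge k consecutive intervals; Ts has shape (m, n, n) with m divisible by k."""
    m, n, _ = Ts.shape
    return Ts.reshape(m // k, k, n, n).sum(axis=1)

T = np.arange(16.0).reshape(4, 4)           # toy 4-node traffic matrix, one interval
print(aggregate_space(T, {0, 1}, {2, 3}))   # → 18.0 (traffic from {0,1} to {2,3})
Ts = np.stack([T, T, 2 * T, 2 * T])         # four consecutive intervals
print(aggregate_time(Ts, 2).shape)          # → (2, 4, 4)
```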


Fig 1. Example network.

However, such an approach, while potentially useful, might provide only limited additional insight into the nature of traffic matrices. The key to interesting approximation lies in the choice of the sets S, D used at each step in the aggregation of the matrices. The reason for this lies in the designed structure of a network. For instance, consider the network in Figure 1. The figure shows a toy network comprising two regional networks, where each subnetwork contains several Points of Presence (PoPs), each of which in turn contains a number of routers, which connect to multiple end-systems. It seems obvious that this purposefully engineered hierarchy should be related to the manner in which we perform the “coarsening” of traffic matrices. For instance, in Figure 1, we might naturally consider the end systems to be the locations of interest, i.e., A = {a, b, c, d, e, f, g, h, i, j, k, l, m, n}, and then aggregate first by router, so that we take sets

S_1^{(1)} = {l, k},    S_2^{(1)} = {m, n},    S_3^{(1)} = {},
S_4^{(1)} = {i, j},    S_5^{(1)} = {},        S_6^{(1)} = {},
S_7^{(1)} = {},        S_8^{(1)} = {a, b},    S_9^{(1)} = {c, d},
S_{10}^{(1)} = {},     S_{11}^{(1)} = {e, f}, S_{12}^{(1)} = {g, h}.

Note that, in reality, there would likely be many more end-systems, and hence the sets would be rather larger. We could then aggregate into PoPs, such that

S_X^{(2)} = S_1^{(1)} ∪ S_2^{(1)} ∪ S_3^{(1)},

and regions such that

S_A^{(3)} = S_X^{(2)} ∪ S_Y^{(2)},

and so on for the other regions and PoPs.


Note that the superscript in the sets above is used to denote the level (scale) of resolution (approximation) that would be involved in calculating T_t(S_a^{(i)}, S_b^{(i)}). We will typically denote a traffic matrix aggregated across sets S_a^{(i)} by T^{(i)}, but we will also retain this notation for any approximation at level i. In the network above, the topological hierarchy is defined by the administrator and has some meaning, either geographically or managerially. However, it may not be obvious in some networks what the natural groupings are. For instance, regions may not be well defined in many networks. In this case, it may make sense to group the end hosts using a clustering algorithm based on network distances: many networks use shortest-path routing where the link weights (distances) are administratively defined, and these distances between end-points define a natural clustering, or hierarchy, on the network. Similarly, there may be circumstances where the logical hierarchy is more important than the physical topology when creating approximate representations of the network. For example, IP addresses have a natural hierarchy, which does not necessarily mesh with geography. Another example occurs when end-points connect to multiple points in the network (for redundancy), and it might make sense to aggregate over these logically related end-points. In particular cases, we may be able to define a natural logical hierarchy suitable for generating appropriate approximations. However, in many cases there may be no obvious logical hierarchy, in which case part of our goal may be to search for the “best” hierarchical decomposition.

3.2. Towards an MRA of traffic matrices

Multi-Resolution Analysis (MRA) (or, alternatively, Multi-Resolution Approximation) refers to the process of creating multiple approximate representations of an object (e.g., a traffic matrix), such that these representations have different resolutions.
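As a small concrete preview of such an analysis/synthesis pair, the following computes a one-level separable Haar decomposition of a toy traffic matrix and verifies that the coarse block is an aggregated traffic matrix and that the transform is exactly invertible (an illustrative sketch, not the authors’ code; the Haar analysis matrix G and its block structure are developed formally in (3.1)–(3.4) below, and we assume the matrix dimension is even):

```python
import numpy as np

def haar_analysis_matrix(n):
    """Orthogonal n x n Haar analysis matrix G of (3.1); n must be even."""
    G = np.zeros((n, n))
    for k in range(n // 2):
        G[k, 2 * k:2 * k + 2] = 1, 1            # coarsening (averaging) rows, G1
        G[n // 2 + k, 2 * k:2 * k + 2] = 1, -1  # detail rows, G2
    return G / np.sqrt(2)

n = 4
G = haar_analysis_matrix(n)
T = np.random.default_rng(1).uniform(0, 10, (n, n))  # toy traffic matrix
A = G @ T @ G.T                                      # separable analysis, A = G T G^t
A11 = A[:n // 2, :n // 2]                            # coarse (aggregated) traffic matrix
# A11 equals the 2x2-block aggregate of T, up to the Haar normalization:
assert np.allclose(A11, T.reshape(2, 2, 2, 2).sum(axis=(1, 3)) / 2)
# G is orthogonal, so synthesis with H = G^{-1} = G^t reconstructs T exactly:
assert np.allclose(G.T @ A @ G, T)
print("one-level Haar analysis/synthesis round-trip OK")
```

The remaining blocks of A carry the detail information discarded by A11; thresholding them before synthesis yields reconstructions such as the one in (3.6).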
In the well-known context of wavelets, fast algorithms exist to calculate these approximations at a countable number of resolutions (for a different approach using wavelets for spatial traffic analysis, see [13]). MRA can be useful for a number of problems, including denoising, compression, and anomaly detection. Here we wish to extend these ideas to Internet traffic matrices so that we may be able to use these types of applications in practice, but more importantly, to gain fundamental insight into the nature of actual Internet traffic matrices. However, there is more to MRA than simple aggregation/approximation, and one of the main objectives is to be able to decompose a given traffic matrix in such a way that when we form our successive approximate representations (using a “decomposition algorithm”), we also retain enough information to reverse the approximation process—in wavelet parlance, we wish to retain the details necessary for obtaining high-fidelity reconstructions (using a “reconstruction algorithm”) [16, 53]. In essence, the approach is intended to find sparse representations of traffic matrices such that one can represent their important features with a small set of numbers. Our objective is to understand traffic matrices at a level which will aid in synthesis (artificial generation of traffic matrices for the purpose of simulations [47]) or inference (statistical estimation of a traffic matrix from link load data [59]). In this context, sparse representations of traffic matrices are of special interest. For example, for synthesis, they reduce the number of parameters that must be tuned or estimated. For inference, the key problem is the massively under-constrained nature of the linear inverse problem that must be solved—if we can reduce the problem to inference of a smaller number of parameters, then it will no longer be


under-constrained. To illustrate some of the features of the envisioned MRA of traffic matrices, we are motivated by existing work in image compression. To this end, consider an n × n traffic matrix T and let G represent the analysis matrix of a wavelet transform. For example, when using the Haar transform, the analysis matrix is given by

(3.1)   G = \frac{1}{\sqrt{2}} \begin{pmatrix}
1 & 1 & 0 & \cdots & & & 0 \\
0 & 0 & 1 & 1 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & \cdots & & & 0 & 1 & 1 \\
1 & -1 & 0 & \cdots & & & 0 \\
0 & 0 & 1 & -1 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & \cdots & & & 0 & 1 & -1
\end{pmatrix}.

While generalizations of the wavelet transform to multiple dimensions are known, one of the simplest methods for applying such a transform in higher dimensions is in a separable fashion. That is, given a traffic matrix T, a separable wavelet transform is computed by applying the analysis matrix G to the rows and columns of T separately. This results in the matrix A given by

(3.2)   A = G T G^t,

where G^t is the transpose of G. One of the reasons why wavelet transforms are widely used is that the resulting wavelet representations tend to concentrate energy in a few of the coefficients, thus resulting in a sparse representation. To explain this in more detail, consider a partition of the matrix G into two block submatrices of size n/2 × n; that is,

(3.3)   G = \begin{pmatrix} G_1 \\ G_2 \end{pmatrix}.

This in turn allows us to partition the wavelet transformed matrix A as follows:

(3.4)   A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},

where A_{ij} = G_i T G_j^t. The wavelet coefficients in each of the four submatrices allow for different interpretations, exhibit different behavior, and can be quantized differently in order to gain a compression advantage. For example, just as using A_{11} alone to reconstruct an image produces a smoothed version of the original image in image compression, in the case of traffic matrices A_{11} defines an approximate “coarsened” version of the original traffic matrix that is obtained by aggregating over appropriate rows and columns of T. On the other hand, the details are contained in the other submatrices, and their contributions to the reconstructed traffic matrix can be controlled by thresholding them and by tuning the value of the threshold. In the area of image compression, much is known about the subtle correlations between the wavelet coefficients, especially when the wavelet transform is applied repeatedly. However, in the context of traffic matrices, such an understanding is still missing. Note that separability in the context of traffic matrices has a clear interpretation. Aggregating rows corresponds to aggregating source nodes, and aggregating columns corresponds to aggregating destination nodes. In effect, separability allows for a 2D transform that reduces to two 1D transforms, one for source nodes, the other for destination nodes. Since the rows and columns selected for aggregation by the analysis matrix G are fixed and may not correspond to any physically, managerially, or logically defined hierarchy, we can provide some extra flexibility by permuting the rows and columns using permutation matrices Π_r and Π_c, respectively. This results in more flexible wavelet transformed matrices of the form

(3.5)   A = G Π_r T Π_c^t G^t.

This of course leads to the problem of permutation matrix selection, a hard combinatorial problem by itself, unless the permutations to use are obvious and arise naturally within the hierarchical network structure of interest. As far as reconstruction is concerned, the wavelet transformed matrix A may be inverted through the use of the synthesis matrix H = G^{-1}, via the equation T = H A H^t. The wavelet transform is information preserving and thus T can be recovered exactly (assuming infinite-precision arithmetic if the wavelet analysis matrix contains irrational numbers; the desire for exact reconstruction is one of the motivations for considering lifting schemes). However, from a modeling perspective, we would like to study the quality of the resulting traffic matrix estimate when the wavelet coefficients are modified in some way, e.g., by setting some to zero through an appropriately designed thresholding operation. This requires that adequate distance measures be used (e.g., Kullback–Leibler divergence, norm-based metrics such as the l2-norm or the Frobenius norm), but a detailed study of appropriate evaluation metrics is beyond the scope of this paper and will appear elsewhere. As an example, in Section 3.3 below, we will consider a reconstruction of the form

(3.6)   \hat{T} = H \begin{pmatrix} A_{11} & 0 \\ 0 & 0 \end{pmatrix} H^t.

Concerning the structure of the wavelet transformed traffic matrix itself, note that in the case of the Haar analysis matrix, A_{11} is guaranteed to have non-negative entries (since the entries in G_1 are non-negative) and can therefore be thought of as a genuine traffic matrix. The other submatrices carry detail information that is lost in the aggregation and will in general have non-negative as well as negative entries. A traffic matrix T is called a gravity matrix or gravity model if T = u v^t for some vectors u and v. Gravity models have been used successfully as models for traffic matrices in inference and synthesis [47], though they have limitations [6]. Note that the defining property of a gravity matrix corresponds to separability of the traffic matrix. Moreover, if T is a gravity matrix, we have

(3.7)   A = (Gu)(Gv)^t,

that is, A is also a gravity matrix. Since this is independent of the precise type of analysis matrix G, the property of being a gravity matrix is preserved not only under aggregation, but under other forms of filtering as well. Similarly, it also follows that each of the submatrices A_{ij} is a gravity matrix. In terms of reconstructed traffic matrices, it is not hard to see that the reconstruction (3.6) will result in a gravity matrix provided the submatrix A_{11} is a gravity matrix. On the other hand, the reconstruction

(3.8)   \hat{T} = H \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix} H^t


will not be a gravity matrix if A_{22} is nonzero. Thus, if we wish the approximate traffic matrix

(3.9)   \hat{T} = H \begin{pmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{pmatrix} H^t

to be a gravity matrix, we must be careful to ensure that \tilde{A} (with submatrices \tilde{A}_{ij}) is also a gravity matrix. Gravity models are of particular interest in the context of MRA of traffic matrices because the underlying assumption of the gravity model (i.e., traffic homogeneity) is expected to improve with aggregation. Larger aggregates of traffic should behave more and more like a gravity model, until the top-level approximation (just the total traffic in the network) is exactly represented by such a model. Note, however, that systematic biases away from a gravity model may be regional, so aggregating topologically may actually delay convergence to the gravity model, whereas randomized aggregation may converge quite quickly to a gravity model. Other approaches to aggregation that are less oblivious to the actual routing of the traffic through the network may offer both quick convergence and a lack of systematic bias.

3.3. A look at real traffic matrices

Fig 2. The Abilene network.

To illustrate various features of actual traffic matrices, we first consider the Abilene network shown in Figure 2. Abilene [1] is the U.S. Internet backbone for higher education. It comprises high-speed connections between core routers (Juniper T640) located in 11 U.S. cities, with the Atlanta node consisting of two core routers (i.e., “Atlanta” and “Atlanta-M5”). The Abilene backbone is shown in Figure 2(a) and is a sparsely connected mesh; connectivity to regional and local customers is not shown but is provided with some minimal amount of redundancy. Abilene maintains peering connections with other higher educational networks (domestic and international) but does not connect directly to the commercial Internet. This feature is shown in Figure 2(b), which depicts the connectivity of Abilene’s core router in Washington, D.C. at the level of populated router interfaces (numbered 1–16). For example, interfaces 3 and 5 connect to the core router in New York and


one of the core routers in Atlanta, respectively; interfaces 4, 6, 11, and 12 connect to Internet exchange points; and the other interfaces shown reflect peering connections to customers such as AS81, which belongs to the North Carolina Research and Education Network.

Fig 2, panels: (a) Abilene backbone network; (b) Local router (Washington, D.C.).

Fig 3. Measured traffic matrices.

Fig 4. Measured traffic matrices over time: (a) Abilene traffic matrix elements; (b) Local router traffic matrix elements.

A snapshot of Abilene’s traffic matrix is shown in Figure 3(a) and depicts the amount of traffic carried between each pair of Abilene nodes on 09/01/2006. For that same day, the local router traffic matrix for the Washington, D.C. node is shown in Figure 3(b). Note that the large diagonal elements in Figure 3(a) reflect a pronounced locality property of Abilene traffic, while the local router traffic matrix in (b) is largely determined by the configuration of this core router (i.e., which interface carries which in- and out-going traffic). Plotting in Figure 4(a) the values of the 12 largest elements of the traffic matrix in Figure 3(a) for successive 1-hour intervals over the 6-day period from 09/01/2006 to 09/06/2006 shows the presence of a dominant diurnal cycle that has been well documented in past studies of single-link traffic dynamics over large time scales [48]. Very similar behavior can be observed for the entries of the local router traffic matrix in Figure 4(b). Considering the static Abilene traffic matrix T in Figure 3(a), two simple approximations are obtained by computing the corresponding gravity model T_G and deriving the wavelet transformed model T_W of the form given by (3.6). Note that for the gravity model, we have T_G = u v^t with u_i = (1/\sqrt{S}) \sum_j T_{i,j} and


Fig 5. Approximate traffic matrices: (a) Gravity model T_G; (b) Wavelet model T_W.

Fig 6. Quality of traffic matrix approximations: (a) Difference between T and T_G; (b) Difference between T and T_W. Red bars (pointing up) are positive, blue bars (pointing down) are negative differences.


v_j = (1/\sqrt{S}) \sum_i T_{i,j}, where S = \sum_{i,j} T_{i,j}. To derive a simple, yet meaningful wavelet transformed traffic matrix, we aggregate the Abilene nodes geographically in pairs as shown in Figure 2(a) by using appropriate permutation matrices Π_r = Π_c, compute the wavelet transformed matrix A via (3.5), and set T_W = \hat{T}, where \hat{T} is given by equation (3.6). While Figure 5 shows the two approximate traffic matrices, Figure 6 depicts the differences between T and T_G and between T and T_W, respectively. Note that while neither approximation can account for the large diagonal elements of T, the wavelet transformed traffic matrix T_W yields a qualitatively better approximation of T than the gravity model T_G. At the same time, Figure 6(b) also shows the effects of relying on the simple dyadic structure associated with the Haar transform when choosing the matrix G given by (3.1) as our analysis matrix. When comparing Figures 3(a) and 5(b), this aggregation into groups of two appears as the most significant difference between the original and the wavelet transformed traffic matrices and suggests alternative and more flexible choices of wavelet transforms and corresponding analysis matrices. However, not every choice that is meaningful from a networking perspective is feasible from an MRA perspective (i.e., the resulting analysis matrix may not be invertible, causing problems for the reconstruction), and herein lies much of the tension between developing an MRA that is, on the one hand, suitable for the Internet context and, on the other hand, amenable to a rigorous mathematical treatment.

4. Summary and Outlook

By combining the analysis of single-link traffic rate processes with the more recent studies of network-wide traffic matrices, a detailed exploration of Internet traffic as a spatial-temporal object across the different layers of the TCP/IP protocol stack looms as a real possibility.
However, to study Internet traffic over a wide range of scales in space and time and across different layers will require a dramatic widening of MRA technology as it is known and used today. In Section 3, we discussed some basic features of such an MRA for the case of static traffic matrices, but much work remains even in this case, where temporal and layer-specific aspects are largely suppressed and the focus is on the spatial characteristics of the total traffic volumes exchanged between pairs of nodes in the network. In particular, we would like to know how to coarsify traffic matrices in such a way that the reconstructed approximations automatically satisfy the non-negativity constraints and can therefore be interpreted as genuine traffic matrices. In the case of wavelet transformed matrices, we are especially interested in thresholding techniques that ensure non-negativity of the reconstruction, especially when the transform is applied iteratively. Other open issues concern the choice of appropriate metrics for comparing different traffic matrices across scales and within a given scale; the development of flexible “zoom-in” capabilities for exploring Internet traffic localized in time, space, or layer; and the use of non-separable wavelet-transform matrices to develop truly 2D wavelet-based MRA schemes.

In its full-blown version, the envisioned MRA framework promises to significantly advance Internet theory and practice. For example, in terms of its ability to impact a more theoretical study of the Internet, it would provide a framework for unifying various Internet congestion control modeling and analysis approaches found in the current literature. On one end of the spectrum, by concentrating on the transport layer and accounting for very fine scales in space (e.g., link-to-link, host-to-host), but considering a largely trivial temporal dynamic (e.g., infinite source models),


Zhang, Ge, Diggavi, Mao, Roughan, Vaishampayan, Willinger, and Zhang

the proposed framework incorporates the scenarios treated in recent work by Low et al. [35, 40, 51] on the existence, uniqueness, and stability of equilibria of heterogeneous congestion control in general networks. On the other end of the spectrum, when focusing on the same transport layer and allowing for very fine scales in time (e.g., flow-level source models), but requiring an essentially trivial spatial structure (e.g., linear networks), it also captures the setup considered in recent work by Gromoll and Williams [25, 26] who study stability and heavy traffic behavior of a general stochastic flow model of congestion control for two very specific types of networks. The challenge will be to bridge the gap between these two extremes and establish similar existence, uniqueness, and stability results for models of Internet congestion control that allow for very fine scales in time and space. This is closely related to the problem of generalizing Kurtz’s construction to network-wide traffic matrices by (i) accounting for the spatial aspect of Internet traffic, (ii) incorporating those mechanisms of Internet congestion control that shape the behavior of network-wide traffic at the transport layer over sufficiently large time scales, and (iii) explaining features of an overall traffic matrix in terms of application-specific traffic matrices (e.g., Web traffic only, Peer-to-Peer traffic only). From a more practical perspective, the envisioned MRA technology can also be expected to aid the development of novel and powerful tools for root-cause analyses of network failures or detection of different types of unwanted traffic (e.g., spam, botnets, worms, viruses). 
The ability to examine network traffic measurements in a systematic manner across many different time scales, over a variety of different spatial scales (e.g., IP address, prefix, autonomous domains), and at the different layers in the TCP/IP protocol stack suggests a holistic approach to exploiting Internet-related measurements that has been largely absent to date. In particular, it argues for tools and techniques with “drill-down” or “zoom-in” capabilities that are informed by coarse-scale representations of the data and are guided by a detailed understanding of the correlations that might exist at the different scales in time, over space, and across layers. While multi-scale approaches to, for example, network intrusion detection have been popular in the recent past, the main challenge here will be to fully exploit the multi-dimensional aspect of scale and not treat it one dimension at a time.

Acknowledgments

Matthew Roughan’s participation in this work was supported by the Australian Research Council Grant DP0665427.

References

[1] Abilene Network. http://www.internet2.edu/abilene.
[2] Abry, P. and Veitch, D. (1998). Wavelet analysis of long-range dependent traffic. IEEE Transactions on Information Theory 44 (1) 2–15.
[3] Abry, P., Taqqu, M. S., Flandrin, P. and Veitch, D. (2000). Wavelets for the analysis, estimation, and synthesis of scaling data. In Self-Similar Network Traffic and Performance Evaluation (K. Park and W. Willinger, eds.) 39–88. Wiley, New York.
[4] Abry, P., Flandrin, P., Taqqu, M. S. and Veitch, D. (2003). Self-similarity and long-range dependence through the wavelet lens. In Long-range

Dependence: Theory and Applications (P. Doukhan, G. Oppenheim and M. S. Taqqu, eds.) 527–556. Birkhäuser, Boston.
[5] Adler, R. J., Feldman, R. E. and Taqqu, M. S. (1998). A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston.
[6] Alderson, D., Chang, H., Roughan, M., Uhlig, S. and Willinger, W. (2006). The many facets of Internet topology and traffic. Networks and Heterogeneous Media 1 (4) 569–600.
[7] Appenzeller, G., Keslassy, I. and McKeown, N. (2004). Sizing router buffers. Computer Communication Review (Proc. of ACM/Sigcomm’04, Portland, OR) 34 (4) 281–292.
[8] Beran, J. (1994). Statistics for Long-Memory Processes. Chapman & Hall, New York.
[9] Cao, J., Cleveland, W. S. and Sun, D. X. (2002). Internet traffic tends toward Poisson and independent as the load increases. In Nonlinear Estimation and Classification (C. Holmes, D. Dennison, M. Hansen, B. Yu and B. Mallick, eds.) 83–109. Springer-Verlag, New York.
[10] Cox, D. R. (1984). Long-range dependence: A review. In Statistics: An Appraisal (H. A. David and H. T. David, eds.) 55–74. Iowa State University Press, Ames, Iowa.
[11] Crovella, M. E. and Bestavros, A. (1996). Self-similarity in World Wide Web traffic—evidence and possible causes. Proc. ACM/Sigmetrics’96 160–169. Philadelphia, PA.
[12] Crovella, M. E. and Bestavros, A. (1997). Self-similarity in World Wide Web traffic—evidence and possible causes. IEEE/ACM Transactions on Networking 5 835–846.
[13] Crovella, M. E. and Kolaczyk, E. (2003). Graph wavelets for spatial traffic analysis. Proc. IEEE Infocom.
[14] Crovella, M. E., Taqqu, M. S. and Bestavros, A. (1998). Heavy-tailed probability distributions in the World Wide Web. In A Practical Guide to Heavy Tails: Statistical Techniques and Applications (R. Adler, R. Feldman and M. S. Taqqu, eds.) 27–53. Birkhäuser, Boston.
[15] Crovella, M. E. and Krishnamurthy, B. (2006). Internet Measurements: Infrastructure, Traffic, and Applications. J. Wiley & Sons, New York.
[16] Daubechies, I. (1992). Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics 61. SIAM.
[17] Jiang, H. and Dovrolis, C. (2005). Why is Internet traffic bursty in short (sub-RTT) time scales? Proc. ACM/Sigmetrics’05. Banff, Canada.
[18] Erramilli, A., Narayan, O. and Willinger, W. (1996). Experimental queueing analysis with long-range dependent packet traffic. IEEE/ACM Transactions on Networking 4 (2) 209–223.
[19] Erramilli, A., Roughan, M., Veitch, D. and Willinger, W. (2002). Self-similar traffic and network dynamics. Proceedings of the IEEE 90 (5) 800–819.
[20] Feldmann, A. (2000). Characteristics of TCP connection arrivals. In Self-Similar Network Traffic and Performance Evaluation (K. Park and W. Willinger, eds.) 367–399. Wiley, New York.
[21] Feldmann, A., Gilbert, A. C., Willinger, W. and Kurtz, T. G. (1998). The changing nature of network traffic: Scaling phenomena. Computer Communication Review 28 5–29.
[22] Feldmann, A., Gilbert, A. C., Huang, P. and Willinger, W. (1999). Dynamics of IP traffic: A study of the role of variability and the impact of control. Proc. ACM/Sigcomm’99 301–313. Cambridge, MA.
[23] Floyd, S. and Paxson, V. (2001). Difficulties in simulating the Internet. IEEE/ACM Transactions on Networking 9 (4) 392–403.
[24] Fowler, H. J. and Leland, W. E. (1991). Local area network traffic characteristics, with implications for broadband network congestion management. IEEE Journal on Selected Areas in Communication 9 1139–1149.
[25] Gromoll, H. C. and Williams, R. J. (2006). Fluid limit of a network with fair bandwidth sharing and general document size distributions. Preprint.
[26] Gromoll, H. C. and Williams, R. J. (2008). Fluid model for a data network with α-fair bandwidth sharing and general document size distributions: Two examples of stability. In Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz (S. N. Ethier, J. Feng and R. H. Stockbridge, eds.) 255–267. Institute of Mathematical Statistics, Beachwood, OH.
[27] Joo, Y., Ribeiro, V., Feldmann, A., Gilbert, A. C. and Willinger, W. (2001). TCP/IP traffic dynamics and network performance: A lesson in workload modeling, flow control, and trace-driven simulations. Computer Communication Review 31 (2) 25–37.
[28] Kaj, I. and Taqqu, M. S. (2005). Convergence to fractional Brownian motion and to the Telecom process: The integral representation approach. Preprint.
[29] Kannan, J., Jung, J., Paxson, V. and Koksal, C. E. (2006). Semi-automated discovery of application session structure. ACM/Sigcomm Internet Measurement Conference IMC’06. Rio de Janeiro, Brazil, to appear.
[30] Karagiannis, T., Molle, M. and Faloutsos, M. Long-range dependence: Ten years of Internet traffic modeling. IEEE Internet Computing 8 (5) 57–64.
[31] Kurtz, T. G. (1996). Limit theorems for workload input models. In Stochastic Networks: Theory and Applications (F. P. Kelly, S. Zachary and I. Ziedins, eds.) 119–139. Oxford University Press, Oxford, UK.
[32] Leland, W. E., Taqqu, M. S., Willinger, W. and Wilson, D. V. (1993). On the self-similar nature of Ethernet traffic. Proc. of ACM/Sigcomm’93 183–193. San Francisco, CA.
[33] Leland, W. E., Taqqu, M. S., Willinger, W. and Wilson, D. V. (1994). On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking 2 1–15.
[34] Levy, J. B. and Taqqu, M. S. (2000). Renewal reward processes with heavy-tailed interrenewal times and heavy-tailed rewards. Bernoulli 6 (1) 23–44.
[35] Low, S. H., Paganini, F. and Doyle, J. C. (2002). Internet congestion control. IEEE Control Systems Magazine (Feb.) 28–43.
[36] Mandelbrot, B. B. (1963). New methods in statistical economics. Journal of Political Economics 71 421–440.
[37] Mandelbrot, B. B. (1969). Long-run linearity, locally Gaussian processes, H-spectra and infinite variances. International Economic Review 10 82–113.
[38] Medina, A., Fraleigh, C., Taft, N., Bhattacharyya, S. and Diot, C. (2002). A taxonomy of IP traffic matrices. Proc. SPIE ITCOM 2002. Boston, MA.
[39] Mikosch, T., Resnick, S., Rootzen, H. and Stegeman, A. (2002). Is network traffic approximated by stable Lévy motion or fractional Brownian motion? Annals of Applied Probability 12 (1) 23–68.
[40] Paganini, F., Wang, Z., Doyle, J. C. and Low, S. H. (2005). Congestion control for high performance, stability, and fairness in general networks. IEEE/ACM Transactions on Networking 13 (1) 43–56.
[41] Park, K. and Willinger, W. (2000). Self-Similar Network Traffic and Performance Evaluation. J. Wiley & Sons, New York.
[42] Paxson, V. and Floyd, S. (1994). Wide-area traffic: The failure of Poisson modeling. Computer Communication Review (Proc. of ACM/Sigcomm’94, London, UK) 24 (4) 257–268.
[43] Paxson, V. and Floyd, S. (1995). Wide area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking 3 226–244.
[44] Pipiras, V., Taqqu, M. S. and Levy, J. B. (2004). Slow, fast, and arbitrary growth conditions for renewal reward processes when the renewals and the rewards are heavy-tailed. Bernoulli 10 121–163.
[45] Resnick, S. I. (1997). Heavy tail modeling and teletraffic data. The Annals of Statistics 25 1805–1869.
[46] Roughan, M. and Kalmanek, C. R. (2003). Pragmatic modeling of broadband access traffic. Computer Communications 26 (8) 804–816.
[47] Roughan, M. (2005). Simplifying the synthesis of Internet traffic matrices. Computer Communication Review 35 93–96.
[48] Roughan, M., Greenberg, A., Kalmanek, C., Rumsewicz, M., Yates, J. and Zhang, Y. (2003). Experience in measuring Internet backbone traffic variability: Models, metrics, measurements, and meaning. Proc. ITC 18 379–388. Berlin, Germany.
[49] Samorodnitsky, G. and Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, London.
[50] Sommers, J., Barford, P., Duffield, N. and Ron, A. (2007). Efficient network-wide SLA compliance monitoring. Computer Communication Review (Proc. of ACM/Sigcomm’07, Kyoto, Japan) 39 (4), to appear.
[51] Tang, A., Wang, J., Low, S. H. and Chiang, M. (2005). Equilibrium of heterogeneous congestion control: Existence and uniqueness. Proc. IEEE Infocom 2005.
[52] Taqqu, M. S., Willinger, W. and Sherman, R. (1997). Proof of a fundamental result in self-similar traffic modeling. Computer Communication Review 27 5–23.
[53] Vetterli, M. and Kovacevic, J. (1995). Wavelets and Subband Coding. Prentice Hall, Englewood Cliffs, NJ.
[54] Willinger, W., Taqqu, M. S. and Erramilli, A. (1996). A bibliographical guide to self-similar traffic and performance modeling for modern high-speed networks. In Stochastic Networks: Theory and Applications (F. P. Kelly, S. Zachary and I. Ziedins, eds.) 339–366. Oxford University Press, Oxford, UK.
[55] Willinger, W., Taqqu, M. S., Leland, W. E. and Wilson, D. V. (1995). Self-similarity in high-speed packet traffic: Analysis and modeling of Ethernet traffic measurements. Statistical Science 10 (1) 67–85.
[56] Willinger, W., Taqqu, M. S., Sherman, R. and Wilson, D. V. (1997). Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level. IEEE/ACM Transactions on Networking 5 (1) 71–86.
[57] Willinger, W., Paxson, V. and Taqqu, M. S. (1998). Self-similarity and heavy tails: Structural modeling of network traffic. In A Practical Guide to Heavy Tails: Statistical Techniques and Applications (R. Adler, R. Feldman and M. S. Taqqu, eds.) 27–53. Birkhäuser, Boston.
[58] Willinger, W., Alderson, D. and Li, L. (2004). A pragmatic approach to dealing with high-variability in network measurements. Proc. 2004 ACM/Sigcomm Internet Measurement Conference (IMC’04) 88–100.
[59] Zhang, Y., Roughan, M., Lund, C. and Donoho, D. (2005). Estimating point-to-point and point-to-multipoint traffic matrices: An information-theoretic approach. IEEE/ACM Transactions on Networking 13 947–960.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 235–251
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000408

Maximum Queue Length of a Fluid Model with an Aggregated Fractional Brownian Input

Tyrone E. Duncan1,∗ and Yasong Jin2,∗

University of Kansas

Abstract: A fractional Brownian queueing model, that is, a fluid queue with an input of a fractional Brownian motion, has been applied in network modeling since self-similarity and long-range dependence were observed in Internet traffic. In this paper, a fluid queue with an aggregated fractional Brownian input, which is a generalization of a fractional Brownian queueing model, is considered, and the maximum queue length over a time interval [0, t] is studied. The impact of an aggregated fractional Brownian input on the queue length process is analyzed, and the main results on the maximum queue length are compared with some related known results in the literature.

∗ Research supported in part by NSF grants DMS 0505706 and ANI 0125410.
1 512 Snow Hall, 1460 Jayhawk Blvd., Lawrence, KS 66045, e-mail: [email protected]
2 553 Snow Hall, 1460 Jayhawk Blvd., Lawrence, KS 66045, e-mail: [email protected]
AMS 2000 subject classifications: Primary 60K25; Secondary 60G70.
Keywords and phrases: fractional Brownian motion, maximum queue length, queueing model.

1. Introduction

In the 1990s, researchers observed the properties of self-similarity and long-range dependence in Internet traffic. Since then, various models have been proposed to capture these complex features. In [17], Norros proposed a fluid queueing model with an input of a fractional Brownian motion. Unlike traditional queueing models, a fluid model has an input process with continuous sample paths. Since a fractional Brownian motion with Hurst parameter $H \in (1/2, 1)$ has the properties of self-similarity and long-range dependence, e.g., [7], [15], it is used to capture the complex features of Internet traffic. A fractional Brownian queueing model is a useful model for analyzing the impact of self-similarity and long-range dependence on queueing performance; however, there are some generic shortcomings in this model. Firstly, since the input process is Gaussian, negative increments, which are not meaningful for a queueing model, can be observed at small time scales. Secondly, actual Internet traffic is regulated by the TCP/IP protocol, which is a closed-loop congestion control mechanism. A fractional Brownian queueing model, which is open-loop as are many queueing models, cannot capture the dynamics of Internet traffic over small time scales, i.e., less than the typical round trip packet time. Although the model has some shortcomings, it can be used to approximate other aspects of Internet traffic under certain circumstances. It has been empirically demonstrated in [8] that a fractional Brownian queueing model is appropriate for backbone traffic, in which millions of independent flows are highly aggregated, traffic control on a single flow does not dominate the whole traffic, and the time scale is larger than the typical round trip time. In recent network measurements [12], it was observed that for small time scales, less


than a millisecond, the traffic in the Internet backbone is memoryless or of short memory, while for larger time scales, in milliseconds, long-range dependence characterizes the backbone traffic. From a practical point of view, see [18], [21], a fractional Brownian queueing model is an approximation of Internet traffic and can produce meaningful results for queueing performance, such as inter-congestion event times and congestion durations, which are on a time scale larger than the typical round trip time. In practice, it has been observed that the Hurst parameter estimated in networks does not remain constant. For this reason, besides a fractional Brownian motion, other Gaussian processes have been proposed to model network traffic, such as an aggregation of independent fractional Brownian motions, [5], [19], [22, p. 335], and an integrated Ornstein-Uhlenbeck process [3], [4], [5]. Here a queue with an aggregated fractional Brownian input is studied and the maximum queue length over a time interval $[0, t]$ is analyzed. For a queue with a single fractional Brownian input, i.e., a fractional Brownian queueing model, the maximum queue length was considered in [23]. In this paper, the results of [23] are extended, and the impact of an aggregated fractional Brownian input on the queueing behavior is analyzed.

The structure of the paper is as follows: In Section 2, some preliminaries on a queueing model with an aggregated fractional Brownian input are given, the maximum queue length is defined, and some results in the literature are reviewed. In Section 3, the main results are presented and compared with some known related results. Section 4 is devoted to the proofs of the main results, Theorems 3.1 and 3.2.

2. Preliminary

The definition of a fractional Brownian motion is as follows.

Definition 2.1. A standard
fractional Brownian motion with Hurst parameter $H \in (0,1)$, $\{B^H(t), t \in [0,\infty)\}$, on the complete probability space $(\Omega, \mathcal{F}, P)$ is a real-valued Gaussian process with continuous sample paths such that for $s, t \in [0,\infty)$, $E[B^H(t)] = 0$ and
\[ E[B^H(s)B^H(t)] = \tfrac{1}{2}\big[s^{2H} + t^{2H} - |s-t|^{2H}\big]. \]
More properties of a fractional Brownian motion can be found in [7], [15] and the references therein.

The queueing model is a single fluid queue with an infinite buffer size and a fixed service rate. Let $A(t) = mt + Y(t)$ be the cumulated arrivals to the queue up to time $t$, where $m$ is a mean input rate and $Y = \{Y(t), t \ge 0\}$ is a continuous Gaussian process with stationary increments. Let $\mu$ denote a service rate and $c = \mu - m$ be the surplus rate. For the stability of the queue, it is assumed that $c > 0$. In the case that $Y$ is a fractional Brownian motion, this model is called a fractional Brownian queueing model, which was proposed by Norros [17] to capture the self-similarity of Internet traffic. Here a more general Gaussian process is considered initially. It is assumed that the input process $Y$ is an aggregation of independent standard fractional Brownian motions, that is,

(1) \[ Y(t) = \sum_{i=1}^{N} \sigma_i B^{H_i}(t), \]

where for $i = 1, \ldots, N$, the $\sigma_i$'s are real-valued coefficients and $\{B^{H_i}(t), t \ge 0\}$ are independent fractional Brownian motions with Hurst parameters $H_i \in (0,1)$. Let


$J \subset \{1, \ldots, N\}$ be the set of all indices $j$ such that $H_j = \max_{1\le i\le N} H_i$, and

(2) \[ \sigma = \Big( \sum_{i \in J} \sigma_i^2 \Big)^{1/2}. \]
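An aggregated input of the form (1) is straightforward to simulate. The following sketch is not from the paper: it generates each fractional Brownian motion exactly via a Cholesky factor of the fractional Gaussian noise covariance (an $O(n^3)$ method, chosen for clarity, not speed), then sums the scaled paths; all function names are illustrative.

```python
import numpy as np

def fbm(n, H, T=1.0, rng=None):
    """Exact fBm on an n-point grid over [0, T]: Cholesky factor of
    the fractional Gaussian noise covariance, then cumulative sums.
    O(n^3); for illustration only."""
    rng = np.random.default_rng() if rng is None else rng
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
    # Autocovariance of unit-step fGn: r(k) = (|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H}) / 2.
    cov = 0.5 * (np.abs(lags - 1) ** (2 * H) - 2 * lags ** (2 * H) + (lags + 1) ** (2 * H))
    incr = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    # Self-similarity: increments over a step of size T/n scale by (T/n)^H.
    return np.concatenate([[0.0], np.cumsum(incr)]) * (T / n) ** H

def aggregated_input(n, sigmas, hursts, T=1.0, seed=0):
    """Y(t) = sum_i sigma_i B^{H_i}(t), as in equation (1)."""
    rng = np.random.default_rng(seed)
    return sum(s * fbm(n, H, T, rng) for s, H in zip(sigmas, hursts))

Y = aggregated_input(256, sigmas=[0.99, 0.01], hursts=[0.55, 0.95])
print(Y.shape, Y[0])  # (257,) 0.0
```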

Note that for $N = 1$, the model is a fractional Brownian queueing model. Let $G = \sum_{i=1}^{N} \sigma_i^2$ and $\gamma = 2\min_{1\le i\le N} H_i$; then for $t \in [0,1]$, $E[Y^2(t)] \le G t^{\gamma}$. Based on

[14, Lemma 12.2.1], there exists a constant CG,γ > 0, which only depends on G and γ, such that for all x,

(3) \[ P\Big( \sup_{0\le s\le 1} Y(s) > x \Big) \le 4\exp\big(-C_{G,\gamma}\, x^2\big). \]

Let $Q = \{Q(t), t \ge 0\}$ denote the queue length process. In the literature, the process $Q$ is also called a workload process or a storage process. Suppose $Q(0) = 0$; then for each time $t \ge 0$, the queue length $Q(t)$ can be written as

(4) \[ Q(t) = Y(t) - ct + \sup_{0\le s\le t}\big(-Y(s) + cs\big). \]

In general, for $0 \le s \le t$, $Q(t)$ can be written in terms of $Q(s)$ as

(5) \[ Q(t) = Y(t) - ct + \max\Big\{ \sup_{s\le r\le t}\big(-Y(r) + cr\big),\ Q(s) - \big(Y(s) - cs\big) \Big\}. \]

Let $Q(\infty) \stackrel{d}{=} \lim_{t\to\infty} Q(t)$ be the steady state queue length, where the limit denotes convergence in law. Let $M(t)$ denote the maximum of the queue length in $[0, t]$, that is,

(6) \[ M(t) = \max_{0\le s\le t} Q(s). \]
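The representations (4) and (6) translate directly into a running-maximum computation on a time grid. The following is not part of the paper; it is a small sketch, with illustrative names, of how the queue length and its running maximum can be evaluated from samples of the input process.

```python
import numpy as np

def queue_and_max(Y, c, dt):
    """Discretized versions of (4) and (6) on a grid with spacing dt,
    driven by samples Y of the input process (Y[0] should be 0)."""
    t = dt * np.arange(len(Y))
    net = Y - c * t                            # Y(t) - c t
    running_sup = np.maximum.accumulate(-net)  # sup_{0<=s<=t} (-Y(s) + c s)
    Q = net + running_sup                      # equation (4)
    M = np.maximum.accumulate(Q)               # equation (6)
    return Q, M

# Deterministic sanity check: with zero input the queue stays empty.
Q, M = queue_and_max(np.zeros(100), c=1.0, dt=0.01)
print(Q.max(), M[-1])  # 0.0 0.0
```

Note the discretization only lower-bounds the continuous-time supremum; for rough inputs (small Hurst parameters) a fine grid is needed.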

The properties of $M(t)$ have been analyzed for different queueing models, see e.g. [1], [2], [11], and were applied in network systems to estimate certain traffic parameters. In the context of renewal processes, that is, when the queue length process is renewal, some asymptotic properties of the maximum queue length were analyzed in [9]. To discuss asymptotic properties of $M(t)$, it is convenient to introduce a stationary version of the queue length process. It follows from [13], also see [9], [23], that one can construct a probability space supporting both the process $\{Y(t), t \ge 0\}$ and a stationary process $Q^* = \{Q^*(t), t \ge 0\}$ such that

(i) $Q^*(t) \stackrel{d}{=} Q(\infty)$ for all $t \ge 0$;
(ii) for $t \ge 0$,

(7) \[ Q^*(t) = Y(t) - ct + \max\Big\{ Q^*(0),\ \sup_{0\le s\le t}\big(-Y(s) + cs\big) \Big\}. \]

Remark 2.1. Recall that $Q(t) = Y(t) - ct + \sup_{0\le s\le t}(-Y(s) + cs)$, so it follows from (7) that $Q^*(t) \ge Q(t)$ for all $t \ge 0$. Let $M^*(t)$ be the maximum of the queue length process $Q^*$ over an interval $[0, t]$, that is,

(8) \[ M^*(t) = \max_{0\le s\le t} Q^*(s). \]

The following proposition is used in the proofs of the main results. It shows that the logarithmic overflow probability, i.e., $\log P(Q(\infty) > b)$, is asymptotically determined by the largest Hurst parameter $H$.


Proposition 2.1. Let $Y(t) = \sum_{i=1}^{N} \sigma_i B^{H_i}(t)$ be defined as in (1), $H = \max_{1\le i\le N} H_i$ and $\sigma$ be as in (2). Let $Q(\infty)$ be the steady state queue length. Then

(9) \[ \lim_{b\to\infty} \frac{\log P(Q(\infty) > b)}{b^{2-2H}} = -\theta, \]

where

(10) \[ \theta = \frac{c^{2H}}{2\sigma^2 H^{2H} (1-H)^{2-2H}}. \]

The proof of this proposition is given in Section 4.1. Recently, the asymptotic overflow probability, i.e., $\lim_{b\to\infty} P(Q(\infty) > b)$, was obtained using a double sum method in [5]. The main results on the maximum queue length of a queue with an input of an aggregation of fractional Brownian motions are given in Theorems 3.1 and 3.2. Before the main results are presented, some results in the literature are reviewed. The maximum queue length of a fractional Brownian queueing model was discussed in [9], [23]. For $H = 1/2$, that is, when the input traffic is a Brownian motion, the property of $M(t)$ was discussed in [9] using renewal theory. With some different approaches, the maximum queue length of a fractional Brownian queueing model with $H \in (1/2, 1)$ is analyzed in [23]. The results from [9] and [23] are summarized as follows. For brevity, let

(11) \[ \beta = \frac{1}{2-2H}. \]

Note that $\beta > 1/2$ since $H \in (0,1)$.

Theorem 2.1. Let $Y(t) = \sigma B^H(t)$, where $\sigma$ is a real-valued coefficient and $\{B^H(t), t \ge 0\}$ is a fractional Brownian motion with Hurst parameter $H \in [1/2, 1)$. Let $M(t)$ be defined in (6). Then:

(i)

(12) \[ \lim_{t\to\infty} \frac{M(t)}{(\log t)^{\beta}} = \Big(\frac{1}{\theta}\Big)^{\beta} \]

in $L^p$ for each $p \in [1,\infty)$, where $\theta$ and $\beta$ are given in (10) and (11), respectively.

(ii) For $H = 1/2$, the convergence in (12) also holds almost surely.

The above theorem shows that for a fractional Brownian queueing model, the maximum queue length $M(t)$ grows like $\big(\theta^{-1}\log t\big)^{\beta}$ for large $t$.

3. Main Results

The following two theorems are the main results of this paper.

Theorem 3.1. Let $Y(t) = \sum_{i=1}^{N} \sigma_i B^{H_i}(t)$ be defined as in (1), $H = \max_{1\le i\le N} H_i$ and $\sigma$ be defined in (2). Let $M(t)$ be defined in (6). Then

(13) \[ \lim_{t\to\infty} \frac{M(t)}{(\log t)^{\beta}} = \Big(\frac{1}{\theta}\Big)^{\beta} \]

in $L^p$ for each $p \in [1,\infty)$, where $\theta$ is given in (10) and $\beta$ is given in (11).


The proof of this theorem is given in Section 4.1. According to Theorem 3.1, for a queue with an aggregated fractional Brownian input, the asymptotic behavior of $M(t)$ depends only on the largest Hurst parameter. For example, suppose that $Y(t) = 0.99 B^{H_1}(t) + 0.01 B^{H_2}(t)$, where $\{B^{H_1}(t), t \ge 0\}$ and $\{B^{H_2}(t), t \ge 0\}$ are independent fractional Brownian motions with $H_1 = 0.55$ and $H_2 = 0.95$, respectively. Since the coefficient of $B^{H_1}$ is relatively large, when the transient behavior is considered, the component of $B^{H_1}$ dominates the queueing performance; that is, the component of $B^{H_2}$ can be ignored. However, when the asymptotic behavior is discussed, by Theorem 3.1 the maximum queue length will be dominated by the component of $B^{H_2}$. Therefore, even though the coefficient of $B^{H_2}$ is relatively small, when large time periods are considered, the component of $B^{H_2}$ is not negligible. In this example, since the coefficient of $B^{H_2}$ is small, the convergence of the maximum queue length is slow and may be difficult to observe from simulations. It can be observed that the first part of Theorem 2.1 is a special case of Theorem 3.1. The result of Theorem 3.1 can be further extended to a general Gaussian queueing model, that is, where $Y$ is a general Gaussian process. Under some mild assumptions, it can be shown that asymptotically a suitably normalized maximum queue length is determined by a suitable function of the asymptotic variance of the input $Y$. The assumptions on $Y$ are satisfied for most Gaussian processes that are applied to model network traffic in the literature, such as a fractional Brownian motion and an integrated Ornstein-Uhlenbeck process. The general result and the assumptions will be presented in a future paper. In the following, the maximum queue length of a fractional Brownian queueing model is revisited and a stronger result is obtained.

Theorem 3.2.
Let $Y(t) = \sigma B^H(t)$, where $\sigma$ is a real-valued coefficient and $\{B^H(t), t \ge 0\}$ is a standard fractional Brownian motion with Hurst parameter $H \in (0,1)$. Let $M(t)$ be defined in (6). Then

(14) \[ \lim_{t\to\infty} \frac{M(t)}{(\log t)^{\beta}} = \Big(\frac{1}{\theta}\Big)^{\beta} \quad \text{a.s.} \]

and in $L^p$ for each $p \in [1,\infty)$, where $\theta$ is given in (10) and $\beta$ is given in (11). The proof of this theorem is given in Section 4.2. Theorem 3.2 extends the result of Theorem 2.1 in two directions: (i) the convergence result of (12) holds almost surely for any $H \in (0,1)$; (ii) the $L^p$ convergence is true for $H \in (0,1/2)$. As discussed in [9], [23], [24], the maximum queue length $M(t)$ can be applied to estimate the overflow probability $P(Q(\infty) > b)$, which is important for the admission control in network systems. From Proposition 2.1, it is known that asymptotically the logarithmic overflow probability is essentially determined by $H$ and $\theta$. Assume that the Hurst parameter $H$ of the input traffic is known; following Theorem 3.2, the value $\theta$ can be strongly consistently estimated by using the maximum queue length $M(t)$, i.e., $\lim_{t\to\infty} \frac{M(t)^{2-2H}}{\log t} = \frac{1}{\theta}$ a.s.
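The estimation idea above has a one-line implementation. The following sketch is not from the paper; the function name is illustrative, and the "data" are placed exactly on the theoretical limit $(\theta^{-1}\log t)^{\beta}$ only to check that the estimator inverts the relationship correctly.

```python
from math import log

def theta_hat(M_t, t, H):
    """Estimator suggested by Theorem 3.2: since
    M(t)^(2-2H) / log t -> 1/theta a.s., take
    theta_hat = log t / M(t)^(2-2H)."""
    return log(t) / M_t ** (2 - 2 * H)

# Sanity check: if M(t) sits exactly on its limit (theta^{-1} log t)^beta,
# the estimator recovers theta.
H, th = 0.75, 2.0
beta = 1 / (2 - 2 * H)
t = 1e6
M_limit = (log(t) / th) ** beta
est = theta_hat(M_limit, t, H)
print(est)  # ≈ 2.0
```

In practice $M(t)$ would come from an observed (or simulated) queue length path, and the convergence can be slow, as the discussion of the example above warns.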

4. Proofs of the main results

The proofs of Theorems 3.1 and 3.2 are given in Sections 4.1 and 4.2, respectively.


4.1. A queue with an aggregated fractional Brownian input

Proof of Proposition 2.1. It is sufficient to show that $Y$ satisfies Hypotheses 2.1 and 2.3 in [6]. Applying the same notation as [6], let $v(t) = t^{2-2H}$ and $a(t) = t$. Note that $\sum_{i=1}^{N}\sigma_i^2 t^{2H_i} \sim \sigma^2 t^{2H}$; then
\[
\lambda(\xi) = \lim_{t\to\infty} \frac{\log E\Big[\exp\Big(\xi\,\frac{v(t)}{a(t)}\big(Y(a(t)) - c\,a(t)\big)\Big)\Big]}{v(t)}
= \lim_{t\to\infty} \frac{-\xi c\, t^{2-2H} + \frac{1}{2}\xi^2 t^{2-4H}\sum_{i=1}^{N}\sigma_i^2 t^{2H_i}}{t^{2-2H}}
= \frac{1}{2}\xi^2\sigma^2 - c\xi.
\]
So (i) and (ii) of Hypothesis 2.1 are satisfied. Let $h(t) = t^{2-2H}$ and $a^{-1}(t) = \sup\{s \in [0,\infty);\ a(s) \le t\}$. It can be verified that for $\xi > 0$, $g(\xi) = \lim_{t\to\infty} v\big(a^{-1}(t)/\xi\big)/h(t) = \xi^{2H-2}$, which satisfies Hypothesis 2.1(iii). Let $W_t = Y(t) - ct$. For $n \in \mathbb{Z}_+$, let $W_n^* = \sup_{0\le r\le 1} W_{n+r}$. Then for $\xi > 0$,
\begin{align*}
&\limsup_{n\to\infty} \frac{\log E\big[\exp\big(\xi n^{1-2H}(W_n^* - W_n)\big)\big]}{n^{2-2H}} \\
&\quad= \limsup_{n\to\infty} \frac{\log E\big[\exp\big(\xi n^{1-2H}\big(\sup_{0\le r\le 1}\big(Y(n+r) - c(n+r)\big) - Y(n) + cn\big)\big)\big]}{n^{2-2H}} \\
&\quad\le \limsup_{n\to\infty} \frac{\log E\big[\exp\big(\xi n^{1-2H}\sup_{0\le r\le 1} Y(r)\big)\big]}{n^{2-2H}} \\
&\quad\le \sum_{i=1}^{N} \limsup_{n\to\infty} \frac{\log E\big[\exp\big(\xi n^{1-2H}\sup_{0\le r\le 1} B^{H_i}(r)\big)\big]}{n^{2-2H}}.
\end{align*}
From [16], it is obtained that for $i = 1, \ldots, N$,
\[
E\Big[\exp\Big(\xi n^{1-2H}\sup_{0\le r\le 1} B^{H_i}(r)\Big)\Big]
= \int_0^{\infty} P\Big( \sup_{0\le r\le 1} B^{H_i}(r) \ge \frac{\log x}{\xi n^{1-2H}} \Big)\, dx
\le 2\int_0^{\infty} P\Big( B^{H_i}(1) \ge \frac{\log x}{\xi n^{1-2H}} \Big)\, dx.
\]
Simple calculations lead to $\frac{\log E[\exp(\xi n^{1-2H}\sup_{0\le r\le 1} B^{H_i}(r))]}{n^{2-2H}} \to 0$ as $n \to \infty$, which implies $\frac{\log E[\exp(\xi n^{1-2H}(W_n^* - W_n))]}{n^{2-2H}} \to 0$ as well. Therefore Hypothesis 2.3 is satisfied and the proposition follows from [6, Corollary 2.3].

According to Proposition 2.1, there exists a finite constant $K_0$ given by

(15) \[ K_0 = \inf\Big\{ u : \frac{\log P(Q^*(0) > u)}{u^{2-2H}} \le -\frac{\theta}{2} \Big\}. \]

To prove Theorem 3.1, the following two technical lemmas are needed.

Lemma 4.1. Let $\beta$ be defined in (11). Then for all $t \ge e$, $p \in (1,\infty)$ and $K = \max\{K_0, (4/\theta)^{\beta}\}$, where $K_0$ is defined in (15),

(16) \[ \int_{3K}^{\infty} t\, y^{p-1}\, P\Big( Q^*(0) > \frac{y}{3}(\log t)^{\beta} \Big)\, dy < \infty. \]


Proof. Rewrite $P\big(Q^*(0) > \frac{y}{3}(\log t)^{\beta}\big)$ as
\[
P\Big(Q^*(0) > \frac{y}{3}(\log t)^{\beta}\Big)
= \exp\left( \frac{\log P\big(Q^*(0) > (y/3)(\log t)^{\beta}\big)}{(y/3)^{1/\beta}\log t}\,\Big(\frac{y}{3}\Big)^{1/\beta}\log t \right).
\]
Since $y/3 \ge K \ge K_0$ and $\log t \ge 1$, from (15) it follows that
\[
P\Big(Q^*(0) > \frac{y}{3}(\log t)^{\beta}\Big) \le \exp\Big( -\frac{\theta}{2}\Big(\frac{y}{3}\Big)^{1/\beta}\log t \Big).
\]
Then
\[
\int_{3K}^{\infty} t\, y^{p-1}\, P\Big(Q^*(0) > \frac{y}{3}(\log t)^{\beta}\Big)\, dy
\le \int_{3K}^{\infty} y^{p-1} \exp\Big( -\frac{\theta}{2}\Big(\frac{y}{3}\Big)^{1/\beta}\log t + \log t \Big)\, dy.
\]
From $K \ge (4/\theta)^{\beta}$ and $y \ge 3K$, it follows that $-\frac{\theta}{2}\big(\frac{y}{3}\big)^{1/\beta} + 1 \le -\frac{\theta}{4}\big(\frac{y}{3}\big)^{1/\beta}$. By substitution, it follows that
\[
\int_{3K}^{\infty} y^{p-1} \exp\Big( -\frac{\theta}{2}\Big(\frac{y}{3}\Big)^{1/\beta}\log t + \log t \Big)\, dy
\le \frac{3^p \cdot 4^{\beta p}\,\beta}{(\theta\log t)^{\beta p}}\,\Gamma(\beta p) < \infty.
\]

Lemma 4.2. Let $\beta$ be the constant defined in (11). Then for $t \ge e$, $p \in (1, \infty)$ and $K = \max\{K_0, \sqrt{8/C_{G,\gamma}}\}$, where $C_{G,\gamma}$ is the constant in (3),
$$\int_{3K}^{\infty} t\,y^{p-1}\,P\Big(\sup_{0\le s\le 1} Y(s) \ge \frac{y}{6}(\log t)^\beta\Big)\,dy < \infty.$$

Proof. From (3), it can be obtained that
$$P\Big(\sup_{0\le s\le 1} Y(s) \ge \frac{y}{6}(\log t)^\beta\Big) \le 4\exp\Big\{-\frac{C_{G,\gamma}\,y^2}{36}(\log t)^{2\beta}\Big\}.$$
So

(17) $$\int_{3K}^{\infty} t\,y^{p-1}\,P\Big(\sup_{0\le s\le 1} Y(s) \ge \frac{y}{6}(\log t)^\beta\Big)\,dy \le \int_{3K}^{\infty} 4y^{p-1}\exp\Big\{-\frac{C_{G,\gamma}\,y^2}{36}(\log t)^{2\beta} + \log t\Big\}\,dy.$$

Since $\beta > 1/2$ and $t \ge e$, from (17) it follows that
$$\int_{3K}^{\infty} t\,y^{p-1}\,P\Big(\sup_{0\le s\le 1} Y(s) \ge \frac{y}{6}(\log t)^\beta\Big)\,dy \le \int_{3K}^{\infty} 4y^{p-1}\exp\Big\{-(\log t)\Big(\frac{C_{G,\gamma}\,y^2}{36} - 1\Big)\Big\}\,dy.$$
Since $y \ge 3K \ge 3\sqrt{8/C_{G,\gamma}}$, then $\frac{C_{G,\gamma}y^2}{36} - 1 \ge \frac{C_{G,\gamma}y^2}{72}$. Let $z = y^2$; by substitution, it can be verified that
$$\int_{3K}^{\infty} 4y^{p-1}\exp\Big\{-\frac{C_{G,\gamma}\,y^2}{72}\log t\Big\}\,dy \le 2\Big(\frac{C_{G,\gamma}(\log t)}{72}\Big)^{-p/2}\Gamma\Big(\frac{p}{2}\Big) < \infty. \qquad\square$$


Tyrone E. Duncan and Yasong Jin

Proof of Theorem 3.1. The proof mainly follows the arguments of [23], in which the self-similarity of fractional Brownian motion is used implicitly. Here the input process $Y(t)$, which is an aggregation of independent fractional Brownian motions, is not self-similar, so the Slepian inequality [20, Theorem C.1] is applied to resolve the problem. It is first shown that $\lim_{t\to\infty} M^*(t)/(\log t)^\beta = (1/\theta)^\beta$ in $L^p$ for each $p \in [1, \infty)$; the result is then extended to $M(t)$ naturally. The proof consists of three steps. The following expressions, (18) and (19), are proved in Steps I and II, respectively. For a fixed $\delta \in (0, 1)$,

(18) $$\lim_{t\to\infty} P\Big(M^*(t) \ge \Big(\frac{1-\delta}{\theta}\log t\Big)^\beta\Big) = 1,$$

(19) $$\lim_{t\to\infty} P\Big(M^*(t) \ge \Big(\frac{1+\delta}{\theta}\log t\Big)^\beta\Big) = 0.$$

From (18) and (19), it follows that $\lim_{t\to\infty} M^*(t)/(\log t)^\beta = (1/\theta)^\beta$ in probability. In Step III, the uniform integrability of the random variables $\big(M^*(t)/(\log t)^\beta\big)^p$ is proved for $t \ge e$ and $p \in [1, \infty)$.

Step I. Let $\delta \in (0, 1)$ be fixed. For brevity, let $\alpha(t) = \big(\frac{1-\delta}{\theta}\log t\big)^\beta$. Fix $\Delta \in (0, t)$; from the definition of $Q^*$, it follows that

(20) $$Q^*(t) \ge Y(t) - ct - \inf_{0\le s\le t}(Y(s) - cs) \ge Y(t) - ct - Y(t-\Delta) + c(t-\Delta) = Y(t) - Y(t-\Delta) - c\Delta.$$

Consequently,

(21) $$P(M^*(t) \ge \alpha(t)) = P\Big(\sup_{0\le s\le t} Q^*(s) \ge \alpha(t)\Big) \ge P\Big(\sup_{1\le k\le \lfloor t/\Delta\rfloor}\big(Y(k\Delta) - Y(k\Delta - \Delta)\big) \ge \alpha(t) + c\Delta\Big).$$
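The inequality (20) rests on the reflection-map representation $Q(t) = Y(t) - ct - \inf_{0\le s\le t}(Y(s) - cs)$ of the queue started empty (a form used explicitly later in the proof). The sketch below simulates it on a grid with hypothetical parameters (not from the paper), generating the aggregated input $Y$ as a sum of independent fractional Brownian motions via Cholesky factorization of the fBm covariance, and checks that the reflected path agrees with the equivalent Lindley recursion $Q_k = \max(Q_{k-1} + W_k - W_{k-1},\,0)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt, c = 300, 0.1, 1.0
sigmas, hursts = [1.0, 0.5], [0.75, 0.6]   # hypothetical aggregation of two fBms
t = dt * np.arange(1, n + 1)

def fbm_path(H):
    # Exact simulation on the grid via Cholesky of cov(B^H(s), B^H(u)).
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    return np.linalg.cholesky(cov + 1e-12 * np.eye(n)) @ rng.standard_normal(n)

Y = sum(sig * fbm_path(H) for sig, H in zip(sigmas, hursts))
W = np.concatenate([[0.0], Y - c * t])      # netput W(t) = Y(t) - ct, with W(0) = 0
Q = W - np.minimum.accumulate(W)            # reflection map: Q(t) = W(t) - inf_{s<=t} W(s)

# The Lindley recursion gives the same workload path.
Q_rec = np.zeros_like(W)
for k in range(1, len(W)):
    Q_rec[k] = max(Q_rec[k - 1] + W[k] - W[k - 1], 0.0)

print(np.allclose(Q, Q_rec), Q.min() >= 0.0)   # True True
```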

Let $v^2(t) = E[Y^2(t)] = \sum_{i=1}^{N}\sigma_i^2 t^{2H_i}$ be the variance function of the process $Y$. For $j = 1, 2, \ldots$, let

(22) $$Z_j^{\Delta} = \frac{Y(j\Delta) - Y(j\Delta - \Delta)}{v(\Delta)}.$$

Since $Y$ has stationary increments, $\{Z_j^{\Delta}\}$ is a sequence of stationary standard normal random variables. From (21), it can be obtained that

(23) $$P(M^*(t) \ge \alpha(t)) \ge P\Big(\sup_{1\le j\le \lfloor t/\Delta\rfloor} Z_j^{\Delta} \ge \frac{\alpha(t) + c\Delta}{v(\Delta)}\Big).$$

Choose $\varepsilon \in (0, \delta]$ and let $\Delta$ depend on $t$ such that

(24) $$\Delta_t = \Big(\frac{2\sigma^2(1-\varepsilon)H^2}{c^2}\log t\Big)^{\beta},$$


where $H$ is the largest Hurst parameter and $\sigma$ is as in (2). Then (23) can be written as

(25) $$P(M^*(t) \ge \alpha(t)) \ge P\Big(\sup_{1\le j\le \lfloor t/\Delta_t\rfloor} Z_j^{\Delta_t} \ge \frac{\alpha(t) + c\Delta_t}{v(\Delta_t)}\Big).$$

Consider the covariance of $\{Z_j^{\Delta_t}\}$: for all $t \ge 0$, $j = 1, 2, \ldots$ and $k = 0, 1, \ldots$,
$$\mathrm{cov}\big(Z_j^{\Delta_t}, Z_{j+k}^{\Delta_t}\big) = \frac{\sum_{i=1}^{N}\sigma_i^2\Delta_t^{2H_i}\big[(k+1)^{2H_i} - 2k^{2H_i} + (k-1)^{2H_i}\big]}{2\sum_{i=1}^{N}\sigma_i^2\Delta_t^{2H_i}}.$$

For $H \ge 1/2$, since for $1 \le i \le N$, $(k+1)^{2H_i} - 2k^{2H_i} + (k-1)^{2H_i} \le (k+1)^{2H} - 2k^{2H} + (k-1)^{2H}$, it is obtained that $\mathrm{cov}(Z_j^{\Delta_t}, Z_{j+k}^{\Delta_t}) \le f(k)$, where $f(k) = \frac{1}{2}\big[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\big]$. For $H < 1/2$, let $f(0) = 1$ and $f(k) = 0$ for $k = 1, 2, 3, \ldots$; then it can be verified that $\mathrm{cov}(Z_j^{\Delta_t}, Z_{j+k}^{\Delta_t}) \le f(k)$. Let $\{\tilde Z_j,\ j = 1, 2, \ldots\}$ be a sequence of stationary standard normal random variables whose covariance is determined by $f(k)$. By the Slepian inequality, e.g. [20, Theorem C.1], it follows from (25) that

(26) $$P\Big(\sup_{1\le j\le \lfloor t/\Delta_t\rfloor} Z_j^{\Delta_t} \ge \frac{\alpha(t) + c\Delta_t}{v(\Delta_t)}\Big) \ge P\Big(\sup_{1\le j\le \lfloor t/\Delta_t\rfloor} \tilde Z_j \ge \frac{\alpha(t) + c\Delta_t}{v(\Delta_t)}\Big).$$
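The domination by $f(k)$ can be checked numerically. In the sketch below (arbitrary illustrative $\sigma_i$, $H_i$ and $\Delta$, none taken from the paper), the normalized covariance of the aggregated increments is computed from the second-difference formula above and compared with $f(k)$ built from the largest Hurst parameter $H \ge 1/2$:

```python
import numpy as np

sigmas = np.array([1.0, 2.0, 0.5])      # hypothetical weights
hursts = np.array([0.55, 0.8, 0.65])    # largest Hurst parameter H = 0.8 >= 1/2
delta = 10.0
k = np.arange(0, 51)

def second_diff(h):
    # lag-k autocovariance of unit-lag fBm increments: ((k+1)^{2h} - 2k^{2h} + |k-1|^{2h}) / 2
    return 0.5 * ((k + 1.0)**(2 * h) - 2.0 * k**(2.0 * h) + np.abs(k - 1.0)**(2 * h))

weights = sigmas**2 * delta**(2 * hursts)
cov_Z = (weights[:, None] * np.array([second_diff(h) for h in hursts])).sum(axis=0) / weights.sum()
f = second_diff(hursts.max())

print(cov_Z[0], f[0])                     # both equal 1 at lag 0
print(bool((cov_Z <= f + 1e-12).all()))   # True
```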

From the definitions of $v^2(\cdot)$ and $\Delta_t$, it can be verified that
$$\frac{\alpha(t) + c\Delta_t}{v(\Delta_t)} = \frac{\alpha(t) + c\Delta_t}{\sqrt{\sum_{i=1}^{N}\sigma_i^2\Delta_t^{2H_i}}} \le \frac{\alpha(t) + c\Delta_t}{\sigma\Delta_t^{H}} \le \sqrt{2(1-\varepsilon)\log t}.$$
Let $n = \lfloor t/\Delta_t\rfloor$ and $t_n = \inf\{t : \lfloor t/\Delta_t\rfloor = n\}$. Note that $\lfloor t/\Delta_t\rfloor = n$ if and only if $t_n \le t < t_{n+1}$. Then for sufficiently large $t \in [t_n, t_{n+1})$, the following inequalities are obtained:
$$\frac{\alpha(t) + c\Delta_t}{v(\Delta_t)} \le \sqrt{2(1-\varepsilon)\log t} \le \sqrt{2(1-\varepsilon)\log t_{n+1}}.$$
Let $u_n$ be defined as

(27) $$u_n = \sqrt{2(1-\varepsilon)\log t_{n+1}}.$$

So from (26) and (27), it follows that

(28) $$P\Big(\sup_{1\le j\le \lfloor t/\Delta_t\rfloor} \tilde Z_j \ge \frac{\alpha(t) + c\Delta_t}{v(\Delta_t)}\Big) \ge P\Big(\sup_{1\le j\le n} \tilde Z_j \ge u_n\Big).$$

α(t) + cΔ t P sup Z˜j ≥ ≥P sup Z˜j ≥ un . (28) v(Δt ) 1≤j≤n 1≤j≤ Δt t

Following [14, Theorem 4.3.3], it is sufficient to prove that $n(1 - \Phi(u_n)) \to \infty$ as $n \to \infty$. Recall that for $x \ge 0$, $1 - \Phi(x) \ge \frac{x}{\sqrt{2\pi}(1+x^2)}\,e^{-x^2/2}$, so
$$n(1 - \Phi(u_n)) \ge n\,\frac{u_n}{\sqrt{2\pi}(1+u_n^2)}\exp\Big(-\frac{u_n^2}{2}\Big).$$
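The standard Gaussian tail bound invoked here, $1 - \Phi(x) \ge \frac{x}{\sqrt{2\pi}(1+x^2)}e^{-x^2/2}$ for $x \ge 0$, is easy to confirm numerically with the Python standard library (a quick check, not part of the proof):

```python
import math

def normal_tail(x):
    # 1 - Phi(x), computed via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def lower_bound(x):
    return x * math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * (1.0 + x * x))

xs = [k / 10.0 for k in range(0, 101)]    # grid over x in [0, 10]
print(all(normal_tail(x) >= lower_bound(x) for x in xs))   # True
```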


From (27), it follows that $e^{-u_n^2/2} = t_{n+1}^{-1+\varepsilon}$. Since $u_n \to \infty$ as $n \to \infty$, there exists an $n_0$ such that for all $n > n_0$, $u_n \ge 1$. Then for $n > n_0$, $n(1 - \Phi(u_n)) \ge \frac{n}{2\sqrt{2\pi}\,u_n}\exp\big(-\frac{u_n^2}{2}\big)$. Thus,
$$n(1 - \Phi(u_n)) \ge \frac{n}{2\sqrt{2\pi}\,u_n}\,t_{n+1}^{-1+\varepsilon} = \frac{n\,t_{n+1}^{\varepsilon}}{2\sqrt{2\pi}\,(n+1)\,\Delta_{t_{n+1}}u_n}.$$
From (24) and (27), it can be observed that $\Delta_{t_{n+1}} = C_1(\log t_{n+1})^\beta$ and $u_n = C_2(\log t_{n+1})^{1/2}$ for some positive constants $C_1$ and $C_2$, respectively. Then as $n \to \infty$, $\frac{n\,t_{n+1}^{\varepsilon}}{2\sqrt{2\pi}(n+1)\Delta_{t_{n+1}}u_n} \to \infty$. Thus the expression (18) is verified.

Step II. Let $V_i = \sup_{i-1\le s\le i} Q^*(s)$ for $i = 1, 2, \ldots$, so that, by the stationarity of $Q^*$, $P(M^*(t) \ge x) \le \lceil t\rceil\,P(V_1 \ge x)$ for all $x > 0$. Since $\beta > 1/2$ and $0 < \delta < 1$, then
$$P\Big(V_1 \ge \Big(\frac{1+\delta}{\theta}\log t\Big)^\beta\Big) \le P\Big(Q^*(0) \ge \Big(\frac{1+\delta/2}{\theta}\log t\Big)^\beta\Big) + P\Big(\sup_{0\le s\le 1}(Y(s) - cs) \ge \frac{\delta}{20}\Big(\frac{\log t}{\theta}\Big)^\beta\Big) + P\Big(-\inf_{0\le s\le 1}(Y(s) - cs) \ge \frac{\delta}{20}\Big(\frac{\log t}{\theta}\Big)^\beta\Big) \le P\Big(Q^*(0) \ge \Big(\frac{1+\delta/2}{\theta}\log t\Big)^\beta\Big) + 2P\Big(\sup_{0\le s\le 1} Y(s) \ge \frac{\delta}{20}\Big(\frac{\log t}{\theta}\Big)^\beta - c\Big).$$
It can be observed that for a fixed $\delta$, there exists $t_0$ such that for $t \ge t_0$,

$\frac{\delta}{20}\big(\frac{\log t}{\theta}\big)^\beta - c \ge \frac{\delta}{40}\big(\frac{\log t}{\theta}\big)^\beta$. So
$$P\Big(V_1 \ge \Big(\frac{1+\delta}{\theta}\log t\Big)^\beta\Big) \le \underbrace{P\Big(Q^*(0) \ge \Big(\frac{1+\delta/2}{\theta}\log t\Big)^\beta\Big)}_{L_1} + 2\,\underbrace{P\Big(\sup_{0\le s\le 1} Y(s) \ge \frac{\delta}{40}\Big(\frac{\log t}{\theta}\Big)^\beta\Big)}_{L_2}.$$






It is necessary to show that $\lim_{t\to\infty} tL_i = 0$ for $i = 1, 2$. For $i = 1$, it is equivalent to show that $\log t + \log P\big(Q^*(0) \ge \big(\frac{1+\delta/2}{\theta}\log t\big)^\beta\big) \to -\infty$. Following Proposition 2.1, it is obtained that as $t \to \infty$,
$$\log t\,\Bigg[1 + \frac{\log P\big(Q^*(0) \ge \big(\frac{1+\delta/2}{\theta}\log t\big)^\beta\big)}{\log t}\Bigg] \sim \log t\Big(1 + \Big(-1 - \frac{\delta}{2}\Big)\Big) \to -\infty.$$

For $i = 2$, from (3), it follows that $\lim_{t\to\infty} tL_2 = 0$.

Step III. In this step, the uniform integrability of the random variables $M^*(t)/(\log t)^\beta$ for $t \ge e$ is proved. It is sufficient to show that for each $p \in (1, \infty)$,
$$\sup_{t\ge e} E\Big[\Big(\frac{M^*(t)}{(\log t)^\beta}\Big)^p\Big] < \infty.$$

Let $y = x^{1/p}$; for $t \ge e$,
$$E\Big[\Big(\frac{M^*(t)}{(\log t)^\beta}\Big)^p\Big] = \int_0^\infty P\Big(\Big(\frac{M^*(t)}{(\log t)^\beta}\Big)^p > x\Big)\,dx = \int_0^\infty P\big(M^*(t) > y(\log t)^\beta\big)\,p\,y^{p-1}\,dy.$$

Let $K = \max\{K_0, (4/\theta)^\beta, \sqrt{8/C_{G,\gamma}}\}$; then $K < \infty$ and
$$E\Big[\Big(\frac{M^*(t)}{(\log t)^\beta}\Big)^p\Big] = \int_0^{3K} P\big(M^*(t) > y(\log t)^\beta\big)\,p\,y^{p-1}\,dy + \int_{3K}^{\infty} P\big(M^*(t) > y(\log t)^\beta\big)\,p\,y^{p-1}\,dy \le (3K)^p + \underbrace{\int_{3K}^{\infty} P\big(M^*(t) > y(\log t)^\beta\big)\,p\,y^{p-1}\,dy}_{L_3}.$$

Similar to the arguments in Step II, it can be verified that for all $x > 0$,
$$P(M^*(t) > x) \le t\,P\Big(Q^*(0) + \max_{0\le s\le 1}(Y(s) - cs) - \min_{0\le s\le 1}(Y(s) - cs) > x\Big).$$
Then
$$P(M^*(t) > x) \le t\,P\Big(Q^*(0) > \frac{x}{3}\Big) + t\,P\Big(\max_{0\le s\le 1}(Y(s) - cs) > \frac{x}{3}\Big) + t\,P\Big(-\min_{0\le s\le 1}(Y(s) - cs) > \frac{x}{3}\Big) \le t\,P\Big(Q^*(0) > \frac{x}{3}\Big) + 2t\,P\Big(\max_{0\le s\le 1} Y(s) > \frac{x}{3} - c\Big).$$


It is obtained that
$$L_3 \le t\int_{3K}^{\infty} P\Big(Q^*(0) > \frac{y(\log t)^\beta}{3}\Big)\,p\,y^{p-1}\,dy + 2t\int_{3K}^{\infty} P\Big(\max_{0\le s\le 1} Y(s) > \frac{y(\log t)^\beta}{3} - c\Big)\,p\,y^{p-1}\,dy.$$

From the choice of $K$, it follows that $\frac{y(\log t)^\beta}{3} - c \ge \frac{y(\log t)^\beta}{6}$. So
$$L_3 \le \underbrace{t\int_{3K}^{\infty} P\Big(Q^*(0) > \frac{y(\log t)^\beta}{3}\Big)\,p\,y^{p-1}\,dy}_{L_{3,1}} + 2\,\underbrace{t\int_{3K}^{\infty} P\Big(\max_{0\le s\le 1} Y(s) > \frac{y(\log t)^\beta}{6}\Big)\,p\,y^{p-1}\,dy}_{L_{3,2}}.$$

It is shown in Lemmas 4.1 and 4.2 that $L_{3,1} < \infty$ and $L_{3,2} < \infty$, respectively, with this choice of $K$. Therefore $\sup_{t\ge e} E\big[\big(M^*(t)/(\log t)^\beta\big)^p\big] < \infty$. Combining Steps I, II and III, it is obtained that $\lim_{t\to\infty} M^*(t)/(\log t)^\beta = (1/\theta)^\beta$ in $L^p$ for each $p \in [1, \infty)$.

In the following, the result is extended to $M(t)$. Recall that $Q(t) \le Q^*(t)$ for all $t \ge 0$; consequently, $M(t) \le M^*(t)$ for all $t \ge 0$. In Step I, replacing (20) with $Q(t) = Y(t) - ct - \inf_{0\le s\le t}(Y(s) - cs)$ and changing $M^*(t)$ and $Q^*(t)$ to $M(t)$ and $Q(t)$, respectively, the rest remains unchanged. For Steps II and III, since $M(t) \le M^*(t)$ for all $t \ge 0$, it is obtained that $\lim_{t\to\infty} P\big(M(t) \ge \big(\frac{1+\delta}{\theta}\log t\big)^\beta\big) = 0$ and $\sup_{t\ge e} E\big[\big(M(t)/(\log t)^\beta\big)^p\big] < \infty$. Thus the proof is complete. $\square$

4.2. Fractional Brownian queueing model

The $L^p$ convergence stated in Theorem 3.2 has been shown in the proof of Theorem 3.1. In the following, the almost sure convergence stated in Theorem 3.2 is proved. An upper bound and a lower bound are derived in Propositions 4.1 and 4.2, respectively. From these two propositions, $\lim_{t\to\infty} M^*(t)/(\log t)^\beta = (1/\theta)^\beta$ a.s. is concluded; the proof is then extended to $M(t)$. The following two lemmas are needed to prove Proposition 4.1.

Lemma 4.3. Let $\theta$ and $\beta$ be given in (10) and (11), respectively. Let $\delta \in (0, 1)$ be fixed. Then for almost every $\omega \in \Omega$, there exists a $K(\omega) < \infty$ such that for $n \ge K(\omega)$,

(29) $$Q^*(n) < \Big(\frac{1+\delta}{\theta}\Big)^\beta(\log n)^\beta.$$

Proof. Recall that $Q^*(n) \stackrel{d}{=} Q(\infty)$ for all $n$. By the Borel–Cantelli lemma, it is sufficient to prove that $\sum_{n=1}^{\infty} P\big(Q^*(n) \ge \big(\frac{1+\delta}{\theta}\log n\big)^\beta\big) < \infty$. Choose $\varepsilon \in \big(0, \frac{\delta}{2(1+\delta)}\theta\big)$; by Proposition 2.1, there exists $N < \infty$ such that
$$\frac{\log P\big(Q^*(n) \ge \big(\frac{1+\delta}{\theta}\log n\big)^\beta\big)}{\frac{1+\delta}{\theta}\log n} < -\theta + \varepsilon$$
for $n \ge N$. Since $\varepsilon < \frac{\delta}{2(1+\delta)}\theta$, it can be verified that
$$\sum_{n=1}^{\infty} P\Big(Q^*(n) \ge \Big(\frac{1+\delta}{\theta}\log n\Big)^\beta\Big) \le N + \sum_{n=N+1}^{\infty} e^{-(1+\frac{\delta}{2})\log n} < \infty. \qquad\square$$

Fix an $\omega \in \Omega$ for which (29) holds; then there exists a $K(\omega)$ such that
$$\frac{\max_{0\le n\le \lfloor t\rfloor} Q^*(n,\omega)}{(\log t)^\beta} \le \frac{\max_{0\le n\le K(\omega)} Q^*(n,\omega)}{(\log t)^\beta} + \max_{K(\omega)\le n\le \lfloor t\rfloor}\frac{Q^*(n,\omega)}{(\log n)^\beta} \le \frac{\max_{0\le n\le K(\omega)} Q^*(n,\omega)}{(\log t)^\beta} + \Big(\frac{1+\delta}{\theta}\Big)^\beta.$$
Let $t \to \infty$ and $\delta$ be arbitrarily small; then

(30) $$\limsup_{t\to\infty}\frac{\max_{0\le n\le \lfloor t\rfloor} Q^*(n)}{(\log t)^\beta} \le \Big(\frac{1}{\theta}\Big)^\beta \quad \text{a.s.}$$

Lemma 4.4. Suppose that $\sigma > 0$ and $\{B^H(t), t \ge 0\}$ is a standard fractional Brownian motion with $H \in (0, 1)$. Let $\beta$ be defined as in (11). Then
$$\lim_{t\to\infty}\frac{\max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}\big(\sigma B^H(s) - \sigma B^H(n)\big)}{(\log\lfloor t\rfloor)^\beta} = 0 \quad \text{a.s.}$$

Proof. It is sufficient to show that for any $\varepsilon > 0$,
$$\sum_{\lfloor t\rfloor=0}^{\infty} P\Big(\frac{\max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}(\sigma B^H(s) - \sigma B^H(n))}{(\log\lfloor t\rfloor)^\beta} > \varepsilon\Big) < \infty.$$
Since $\sup_{n\le s\le n+1}\big(\sigma B^H(s) - \sigma B^H(n)\big) \stackrel{d}{=} \sup_{0\le s\le 1}\sigma B^H(s)$, it follows that

(31) $$\sum_{\lfloor t\rfloor=0}^{\infty} P\Big(\frac{\max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}(\sigma B^H(s) - \sigma B^H(n))}{(\log\lfloor t\rfloor)^\beta} > \varepsilon\Big) \le \sum_{\lfloor t\rfloor=0}^{\infty}(\lfloor t\rfloor + 1)\,P\Big(\sup_{0\le s\le 1}\sigma B^H(s) > \varepsilon(\log\lfloor t\rfloor)^\beta\Big).$$

From (3), it follows that

(32) $$\sum_{\lfloor t\rfloor=0}^{\infty}(\lfloor t\rfloor + 1)\,P\Big(\sup_{0\le s\le 1}\sigma B^H(s) > \varepsilon(\log\lfloor t\rfloor)^\beta\Big) \le \sum_{\lfloor t\rfloor=0}^{\infty} 4\exp\Big\{-\frac{C_{G,\gamma}\,\varepsilon^2}{\sigma^2}(\log\lfloor t\rfloor)^{2\beta} + \log(\lfloor t\rfloor + 1)\Big\}.$$

Since $\beta > 1/2$, there exists $M$ such that for all $\lfloor t\rfloor > M$,
$$-\frac{C_{G,\gamma}\,\varepsilon^2}{\sigma^2}(\log\lfloor t\rfloor)^{2\beta} + \log(\lfloor t\rfloor + 1) \le -2\beta\log\lfloor t\rfloor.$$


From (31) and (32), it can be obtained that
$$\sum_{\lfloor t\rfloor=0}^{\infty} P\Big(\frac{\max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}(\sigma B^H(s) - \sigma B^H(n))}{(\log\lfloor t\rfloor)^\beta} > \varepsilon\Big) \le \sum_{\lfloor t\rfloor=0}^{M} 4\exp\Big\{-\frac{C_{G,\gamma}\,\varepsilon^2}{\sigma^2}(\log\lfloor t\rfloor)^{2\beta} + \log(\lfloor t\rfloor + 1)\Big\} + \sum_{\lfloor t\rfloor=M}^{\infty} 4\exp(-2\beta\log\lfloor t\rfloor) = \sum_{\lfloor t\rfloor=0}^{M} 4\exp\Big\{-\frac{C_{G,\gamma}\,\varepsilon^2}{\sigma^2}(\log\lfloor t\rfloor)^{2\beta} + \log(\lfloor t\rfloor + 1)\Big\} + \sum_{\lfloor t\rfloor=M}^{\infty} 4\lfloor t\rfloor^{-2\beta} < \infty. \qquad\square$$

Proposition 4.1. Let $M^*(t)$ be defined in (8). Let $\theta$ and $\beta$ be given in (10) and (11), respectively. Then
$$\limsup_{t\to\infty}\frac{M^*(t)}{(\log t)^\beta} \le \Big(\frac{1}{\theta}\Big)^\beta \quad \text{a.s.}$$

Proof. Since $M^*(t) = \sup_{0\le s\le t} Q^*(s)$, then $M^*(t) \le \max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1} Q^*(s)$. From (5), it follows that for $s \in [n, n+1]$,
$$Q^*(s) = \sigma B^H(s) - cs + \max\Big\{\sup_{n\le r\le s}\big(-\sigma B^H(r) + cr\big),\ Q^*(n) - \big(\sigma B^H(n) - cn\big)\Big\}.$$

Then it can be obtained that
$$Q^*(s) \le Q^*(n) + \sup_{n\le s\le n+1}\big(\sigma B^H(s) - cs\big) + \sup_{n\le s\le n+1}\big(-\sigma B^H(s) + cs\big).$$
So
$$M^*(t) \le \max_{0\le n\le \lfloor t\rfloor} Q^*(n) + \max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}\big(\sigma B^H(s) - \sigma B^H(n)\big) + \max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}\big(-\sigma B^H(s) + \sigma B^H(n)\big) + c.$$
Thus
$$\frac{M^*(t)}{(\log t)^\beta} \le \frac{\max_{0\le n\le \lfloor t\rfloor} Q^*(n)}{(\log t)^\beta} + \frac{\max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}(\sigma B^H(s) - \sigma B^H(n))}{(\log t)^\beta} + \frac{\max_{0\le n\le \lfloor t\rfloor}\sup_{n\le s\le n+1}(-\sigma B^H(s) + \sigma B^H(n))}{(\log t)^\beta} + \frac{c}{(\log t)^\beta};$$

as $t \to \infty$, from (30) and Lemma 4.4, the proposition follows. $\square$

The following lemma is needed to prove Proposition 4.2.

Lemma 4.5. Let $M^*(t)$ be defined in (8) and $\theta$, $\beta$ be given in (10) and (11), respectively. Let $\delta \in (0, 1)$ be fixed. Then for almost all $\omega \in \Omega$, there exists an $n_0(\omega)$ such that for $n \ge n_0(\omega)$, $\dfrac{M^*(n,\omega)}{(\log n)^\beta} > \Big(\dfrac{1-\delta}{\theta}\Big)^\beta$.


Proof. It is sufficient to check that $\sum_{n=1}^{\infty} P\big(M^*(n) \le \big(\frac{1-\delta}{\theta}\log n\big)^\beta\big) < \infty$. For a fractional Brownian queueing model, it is known from [10, Equation (23)] that there exists $t_0 < \infty$ such that for $t \ge t_0$,

(33) $$P\big(M^*(t) \le u(t)\big) \le \exp\Big\{-\frac{c_2\,t}{2}\,(u(t))^{h}\,e^{-\theta(u(t))^{2-2H}}\Big\},$$

where $u(t)$ is a function of $t$, $h = \frac{2(1-H)}{H} - 1$ and $c_2$ is a positive constant depending on $c$ and $H$. Then from (33), for the fixed $\delta$ and all sufficiently large $n$, that is, for all $n \ge \lfloor t_0\rfloor + 1$,
$$P\Big(M^*(n) \le \Big(\frac{1-\delta}{\theta}\log n\Big)^\beta\Big) \le \exp\Big\{-\frac{c_2}{2}\Big(\frac{1-\delta}{\theta}\Big)^{\beta h} n^{\delta}(\log n)^{\beta h}\Big\}.$$
Thus it can be obtained that
$$\sum_{n=1}^{\infty} P\Big(M^*(n) \le \Big(\frac{1-\delta}{\theta}\log n\Big)^\beta\Big) \le (\lfloor t_0\rfloor + 1) + \sum_{n=\lfloor t_0\rfloor+1}^{\infty}\exp\Big\{-\frac{c_2}{2}\Big(\frac{1-\delta}{\theta}\Big)^{\beta h} n^{\delta}(\log n)^{\beta h}\Big\} < \infty. \qquad\square$$

Proposition 4.2. Let $M^*(t)$ be defined in (8) and $\theta$, $\beta$ be given in (10) and (11), respectively. Then $\liminf_{t\to\infty}\dfrac{M^*(t)}{(\log t)^\beta} \ge \Big(\dfrac{1}{\theta}\Big)^\beta$ a.s.

Proof. From the definition of $M^*(t)$, it can be observed that

(34) $$\frac{M^*(t)}{(\log t)^\beta} \ge \frac{M^*(\lfloor t\rfloor)}{(\log(\lfloor t\rfloor + 1))^\beta} = \frac{M^*(\lfloor t\rfloor)}{(\log\lfloor t\rfloor)^\beta}\cdot\frac{(\log\lfloor t\rfloor)^\beta}{(\log(\lfloor t\rfloor + 1))^\beta}.$$

From Lemma 4.5, it is obtained that for a fixed $\delta \in (0, 1)$ and almost all $\omega \in \Omega$, there exists $t_0(\omega)$ such that for $t \ge t_0(\omega)$,

(35) $$\frac{M^*(\lfloor t\rfloor)}{(\log\lfloor t\rfloor)^\beta} > \Big(\frac{1-\delta}{\theta}\Big)^\beta.$$

β .

Let t → ∞ and δ be arbitrarily small, from (34) and (35), the proposition follows. Proof of Theorem 3.2. From Proposition 4.1 and 4.2, it follows that M ∗ (t) lim = t→∞ (log t)β

 β 1 θ

a.s.

In the following, the proof is extended to M (t). For the upper bound, recall that for all t ≥ 0, Q(t) ≤ Q∗ (t), which implies that M (t) ≤ M ∗ (t) for t ≥ 0, then (36)

lim sup t→∞

M (t) ≤ (log t)β

 β 1 θ

a.s.


For the lower bound, rewrite $M^*(t) = \max_{0\le s\le t} Q^*(s)$ as
$$M^*(t) = \max_{0\le s\le t}\max\big\{Q^*(0) + \sigma B^H(s) - cs,\ Q(s)\big\} = \max\Big\{Q^*(0) + \max_{0\le s\le t}\big(\sigma B^H(s) - cs\big),\ M(t)\Big\};$$
it can be observed that $M(t) \ge M^*(t) - Q^*(0) - \max_{s\ge 0}\big(\sigma B^H(s) - cs\big)$. Since $Q^*(0) \stackrel{d}{=} \max_{s\ge 0}\big(\sigma B^H(s) - cs\big) < \infty$ a.s., it can be derived that

(37) $$\liminf_{t\to\infty}\frac{M(t)}{(\log t)^\beta} \ge \Big(\frac{1}{\theta}\Big)^\beta \quad \text{a.s.}$$

Therefore from (36) and (37), the almost sure convergence of (14) follows. $\square$

References

[1] Asmussen, S. (1998). Extreme value theory for queues via cycle maxima. Extremes 1 137–168.
[2] Cohen, J. W. (1968). Extreme value distribution for the M/G/1 and the G/M/1 queueing systems. Ann. Inst. H. Poincaré B 4 83–98.
[3] Dębicki, K., Michna, Z. and Rolski, T. (2003). Simulation of the asymptotic constant in some fluid models. Stoch. Models 19 (3) 407–423.
[4] Dębicki, K. and Palmowski, Z. (1999). On-off fluid models in heavy traffic environment. Queueing Systems 33 (4) 327–338.
[5] Dieker, A. B. (2005). Extremes of Gaussian processes over an infinite horizon. Stochastic Process. Appl. 115 (2) 207–248.
[6] Duffield, N. G. and O'Connell, N. (1995). Large deviations and overflow probabilities for the general single-server queue, with applications. Math. Proc. Camb. Phil. Soc. 118 363–374.
[7] Duncan, T. E. (2001). Some aspects of fractional Brownian motion. Nonlinear Analysis 47 4775–4782.
[8] Erramilli, A., Narayan, O. and Willinger, W. (2001). Experimental queueing analysis with long-range dependent packet traffic. IEEE/ACM Trans. Networking 4 209–223.
[9] Glynn, P. W. and Zeevi, A. J. (2000). Estimating tail probabilities in queues via extremal statistics. In Analysis of Communication Networks: Call Centres, Traffic and Performance (D. R. McDonald and S. R. E. Turner, eds.) Fields Inst. Commun. 28.
[10] Hüsler, J. and Piterbarg, V. (2004). Limit theorem for maximum of the storage process with fractional Brownian motion as input. Stochastic Process. Appl. 114 231–250.
[11] Iglehart, D. L. (1972). Extreme values in the GI/G/1 queue. Ann. Math. Statist. 43 627–635.
[12] Jiang, H. and Dovrolis, C. (2005).

  it can be observed that M (t) ≥ M ∗ (t) − Q∗ (0) − maxs≥0 σB H (s) − cs . Since   d Q∗ (0) = maxs≥0 σB H (s) − cs < ∞ a.s, then it can be derived that  β 1 M (t) lim inf (37) ≥ a.s. t→∞ (log t)β θ Therefore from (36) and (37), the almost sure convergence of (14) follows. References [1] Asmussen, S. (1998). Extreme value theory for queues via cycle maxima. Extremes 1 137–168. [2] Cohen, J. W. (1968). Extreme value distribution for the M/G/1 and the G/M/1 queueing systems. Ann. Inst. H. Poincar´e B4 83–98. [3] De ¸ bicki, K., Michna, Z. and Rolski, T. (2003). Simulation of the asymptotic constant in some fluid models. Stoch. Models 19 (3) 407–423. [4] De ¸ bicki, K. and Palmowski, Z. (1999). On-off fluid models in heavy traffic environment. Queueing Systems 33 (4) 327–338. [5] Dieker, A. B. (2005). Extremes of Gaussian processes over an infinite horizon. Stochastic Process. Appl. 115 (2) 207–248. [6] Duffield, N. G. and O’Connell, N. (1995). Large deviations and overflow probabilities for the general single-server queue, with applications. Math. Proc. Camb. Phil. Soc. 118 363–374. [7] Duncan, T. E. (2001). Some aspects of fractional Brownian motion. Nonlinear Analysis 47 4475–4782. [8] Erramilli, A., Narayan, O. and Willinger, W. (2001). Experimental queueing analysis with long-range dependent packet traffic. IEEE/ACM Trans. Networking 4 209–223. [9] Glynn, P. W. and Zeevi, A. J. (2000). Estimating tail probabilities in queues via extremal statistics. In Analysis of Communication Networks: Call Centres, Traffic and Performance (D. R. McDonald and S. R. E. Turner, eds.) Fields Inst. Commun. 28. ¨sler, J. and Piterbarg, V. (2004). Limit theorem for maximum of the [10] Hu storage process with fractional Brownian motion as input. Stochastic Processes and Their Applications 114 231–250. [11] Iglehart, D. L. (1972). Extreme values in the GI/G/1 queue. Ann. Math. Statist. 43 627–635. [12] Jiang, H. and Dovrolis, C. (2005). 
Why is the Internet traffic bursty in short time scales? In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems 241–252. ACM Press.
[13] Konstantopoulos, T., Zazanis, M. and de Veciana, G. (1995). Conservation laws and reflection mappings with an application to multiclass mean value analysis for stochastic fluid queues (extended version). http://www.ece.utexas.edu/ takis/PAPERS/ccc.new.pdf.
[14] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer, New York.
[15] Mandelbrot, B. B. and Van Ness, J. (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review 10 (4) 422–437.
[16] Michna, Z. (1998). Self-similar processes in collective risk theory. J. Appl. Math. Stochastic Anal. 11 (4) 429–448.
[17] Norros, I. (1994). A storage model with self-similar input. Queueing Systems 16 387–396.
[18] Norros, I. (2000). Queueing behavior under fractional Brownian traffic. In Self-Similar Network Traffic and Performance Evaluation (K. Park and W. Willinger, eds.) 101–114. Wiley-Interscience.
[19] Orenstein, P., Kim, H. and Lau, C. L. (2001). Bandwidth allocation for self-similar traffic consisting of multiple traffic classes with distinct characteristics. In Proc. of IEEE GLOBECOM'01.
[20] Piterbarg, V. I. (1996). Asymptotic Methods in the Theory of Gaussian Processes and Fields. Amer. Math. Soc., Providence, RI.
[21] Riedi, R. H. and Willinger, W. (2000). Toward an improved understanding of network traffic dynamics. In Self-Similar Network Traffic and Performance Evaluation (K. Park and W. Willinger, eds.) 507–530. Wiley-Interscience.
[22] Roberts, J., Mocci, U. and Virtamo, J., eds. (1996). Broadband Network Teletraffic — Performance Evaluation and Design of Broadband Multiservice Networks: Final Report of Action COST 242. Springer, Berlin.
[23] Zeevi, A. J. and Glynn, P. W. (2000). On the maximum workload of a queue fed by fractional Brownian motion. Ann. Appl. Probab. 10 (4) 1084–1099.
[24] Zeevi, A. J. and Glynn, P. W. (2004). Estimating tail decay for stationary sequences via extreme values. Adv. in Appl. Probab. 36 (1) 198–226.
IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 253–265
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000417

Fluid Model for a Data Network with α-Fair Bandwidth Sharing and General Document Size Distributions: Two Examples of Stability

H. C. Gromoll¹,* and R. J. Williams²,†

University of Virginia and University of California, San Diego

Abstract: The design and analysis of congestion control mechanisms for modern data networks such as the Internet is a challenging problem. Mathematical models at various levels have been introduced in an effort to provide insight to some aspects of this problem. A model introduced and studied by Roberts and Massoulié [13] aims to capture the dynamics of document arrivals and departures in a network where bandwidth is shared fairly amongst flows that correspond to continuous transfers of individual elastic documents. Here we consider this model under a family of bandwidth sharing policies introduced by Mo and Walrand [14]. With generally distributed interarrival times and document sizes, except for a few special cases, it is an open problem to establish stability of this stochastic flow level model under the nominal condition that the average load on each resource is less than its capacity. As a step towards the study of this model, in a separate work [8], we introduced a measure valued process to describe the dynamic evolution of the residual document sizes and proved a fluid limit result: under mild assumptions, rescaled measure valued processes corresponding to a sequence of flow level models (with fixed network structure) are tight, and any weak limit point of the sequence is almost surely a solution of a certain fluid model. The invariant states for the fluid model were also characterized in [8]. In this paper, we review the structure of the stochastic flow level model, describe our fluid model approximation and then give two interesting examples of network topologies for which stability of the fluid model can be established under a nominal condition. The two types of networks are linear networks and tree networks.

1. Introduction

The design and analysis of congestion control mechanisms for modern data networks such as the Internet is a challenging problem. Mathematical models at various levels have been introduced in an effort to provide insight to some aspects of this problem. Roberts and Massoulié [13] have introduced and studied a flow level model of congestion control that represents the randomly varying number of flows present in

¹ Department of Mathematics, University of Virginia, Charlottesville, VA 22903; e-mail: [email protected]
² Department of Mathematics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0112; e-mail: [email protected]
* Research supported in part by an NSF Mathematical Sciences Postdoctoral Research Fellowship, NSF Grant DMS FRG 0244323, a European Union Marie Curie Postdoctoral Research Fellowship, and EURANDOM.
† Research supported in part by NSF grants DMS 0305272 and DMS 0604537.
AMS 2000 subject classifications: Primary 60K30; secondary 60F17, 90B15.
Keywords and phrases: Bandwidth sharing, α-fair, flow level Internet model, connection level model, congestion control, measure valued process, fluid model, workload, Lyapunov function, simultaneous resource possession, Lagrange multipliers, stability, sensitivity.


a data network where bandwidth is shared fairly between flows that correspond to continuous transfers of individual elastic documents. This model assumes a "separation of time scales" such that the time scale of the flow dynamics (i.e., of document arrivals and departures) is much longer than the time scale of the packet level dynamics on which rate control schemes such as TCP converge to equilibrium. Subsequent to the work of Roberts and Massoulié, assuming Poisson arrivals and exponentially distributed document sizes, de Veciana, Lee and Konstantopoulos [7] and Bonald and Massoulié [1] studied the stability of the flow level model operating under various bandwidth sharing policies. Lyapunov functions constructed in [7] for weighted max-min fair and proportionally fair policies, and in [1] for weighted α-fair policies (α ∈ (0, ∞)) [14], imply positive recurrence of the Markov chain associated with the model when the average load on each resource is less than its capacity. Lin, Shroff and Srikant [11] have recently given sufficient conditions for stability of a Markov model where the assumption of time scale separation is relaxed. Here we consider the model of Roberts and Massoulié with generally distributed document sizes and interarrival times. We are interested in the stability and heavy traffic behavior of this flow level model operating under a weighted α-fair bandwidth sharing policy (α ∈ (0, ∞)) [14]. (Despite the claim in [1], the proof of sufficient conditions for stability given there does not apply when document sizes are other than exponentially distributed. The reason for this is that the method of Dai [5] quoted there implicitly assumes (through the form of the model equations) that the service discipline is a head-of-the-line discipline and consequently the method does not apply in general to processor sharing types of disciplines, such as the bandwidth sharing policy considered here.)
There are a few results on sufficient conditions for stability of the flow level model with general document size distributions. With Poisson arrivals and document sizes having a phase type distribution, for α = 1, Lakshmikantha et al. [10] have established stability of some two resource linear networks and a 2 × 2 grid network when the average load on each resource is less than its capacity. For generally distributed interarrival and document sizes, Bramson [3] has shown sufficiency of such a condition for stability under a max-min fair policy (corresponding to α → ∞). Under proportional fair sharing, Massouli´e [12] has recently established stability of a fluid model for the flow level model with exponential interarrival and document sizes, and additional routing. From this he infers stability when documents have phase type distributions. In contemporaneous work, Chiang, Shah and Tang [4] have developed a fluid approximation for the flow level model when the arrival rate and capacity are allowed to grow proportionally but the bandwidth per flow stays uniformly bounded. Using their fluid model, they derive some conclusions concerning stability for general document size distributions when α ∈ (0, ∞) is sufficiently small. However, in general, it remains an open question whether, with renewal arrivals and arbitrarily (rather than exponentially) distributed document sizes, the flow level model is stable under an α-fair bandwidth sharing policy when the nominal load placed on each resource is less than its capacity. This paper reports on some first steps in our study of the flow level model operating under a weighted α-fair bandwidth sharing policy with general interarrival and document size distributions. Here we review the definition of a measure valued process that keeps track of the residual sizes of all documents in the system at any given time. We describe a fluid model (or formal functional law of large numbers approximation) for the flow level model. 
In a separate work [8], we showed that under mild conditions, appropriately rescaled measure valued processes corresponding to a sequence of flow level models (with fixed network structure) are tight, and any


weak limit point of the sequence is almost surely a fluid model solution. The invariant states for the fluid model were also characterized in [8]. Here, as an illustration of sufficient conditions for stability of the fluid model, we establish stability of fluid model solutions with finite initial workload for linear networks and tree networks, under the nominal condition that the average load placed on each resource is less than its capacity. The result for tree networks is particularly interesting as there the distribution of the number of documents process in steady state is expected to be sensitive to the (non-exponential) document size distribution [2]. Future work will be aimed at further analysis of the fluid model and at using it for studying stability and heavy traffic behavior of the flow level model.

The paper is organized as follows. In Section 2, we define the network structure, the weighted α-fair bandwidth sharing policy, the stochastic model, and we introduce the measure valued processes used to describe the evolution of the system. The notion of a fluid model solution is defined in Section 3. In Section 4 we give sufficient conditions for stability of fluid model solutions with finite initial workload for linear networks and tree networks.

1.1. Notation

Let $\mathbb{N} = \{1, 2, \ldots\}$, $\mathbb{R} = (-\infty, \infty)$, and let $\mathbb{R}^d$ denote $d$-dimensional Euclidean space for any $d \ge 1$. For $x, y \in \mathbb{R}$, $x \wedge y$ is the minimum of $x$ and $y$, and $x^+$ is the positive part of $x$. For $x, y \in \mathbb{R}^d$, let $\|x\| = \max_{i=1}^{d}|x_i|$, and interpret vector inequalities componentwise: $x \le y$ means $x_i \le y_i$ for all $i = 1, \ldots, d$. The positive $d$-dimensional orthant is denoted $\mathbb{R}^d_+ = \{x \in \mathbb{R}^d : x \ge 0\}$. To ease notation throughout the paper, define $c/0$ to be zero for any real constant $c$, and define a sum over an empty set of indices, or of the form $\sum_{k=j}^{l}$ with $j > l$, to be zero. For two functions $f$ and $g$ with the same domain, $f \equiv g$ means $f(x) = g(x)$ for all $x$ in the domain.
For a bounded function $f : \mathbb{R}_+ \to \mathbb{R}$, let $\|f\|_\infty = \sup_{x\in\mathbb{R}_+}|f(x)|$. Let $C_b(\mathbb{R}_+)$ be the set of bounded continuous functions $f : \mathbb{R}_+ \to \mathbb{R}$, let $C^1(\mathbb{R}_+)$ be the set of once continuously differentiable functions $f : \mathbb{R}_+ \to \mathbb{R}$, and let $C^1_b(\mathbb{R}_+)$ be the set of functions $f$ in $C^1(\mathbb{R}_+)$ that together with the first derivative $f'$ are bounded on $\mathbb{R}_+$. If $w \in C^1(\mathbb{R}_+)$ is considered as a function of time, its first derivative will be denoted by $\dot w$. For a Polish space (i.e., a complete separable metrizable space) $S$, let $D([0, \infty), S)$ denote the space of right continuous functions from $[0, \infty)$ into $S$ that have left limits in $S$ on $(0, \infty)$. We endow this space with the Skorohod $J_1$-topology. All stochastic processes used in this paper will be assumed to have paths in $D([0, \infty), S)$ for a suitable Polish space $S$. For a finite non-negative Borel measure $\xi$ on $\mathbb{R}_+$ and a $\xi$-integrable function $f : \mathbb{R}_+ \to \mathbb{R}$, define
$$\langle f, \xi\rangle = \int_{\mathbb{R}_+} f\,d\xi.$$

If $\xi = (\xi_1, \ldots, \xi_d)$ is a vector of such measures, then we use $\langle f, \xi\rangle$ to denote the vector $(\langle f, \xi_1\rangle, \ldots, \langle f, \xi_d\rangle)$. All functions $f : \mathbb{R}_+ \to \mathbb{R}$ are extended to be identically zero on $(-\infty, 0)$ so that $f(\cdot - x)$ is well defined on $\mathbb{R}_+$ for all $x > 0$. Let $\chi : \mathbb{R}_+ \to \mathbb{R}_+$ denote the identity function $\chi(x) = x$. Let $\mathbf{M}$ be the set of finite non-negative Borel measures on $\mathbb{R}_+$, endowed with the weak topology: $\xi^k \xrightarrow{w} \xi$ in $\mathbf{M}$ if and only if $\langle f, \xi^k\rangle \to \langle f, \xi\rangle$ as $k \to \infty$, for all $f \in C_b(\mathbb{R}_+)$. For $I \in \mathbb{N}$, let $\mathbf{M}^I = \{(\xi_1, \ldots, \xi_I) : \xi_i \in \mathbf{M}$ for all $i \le I\}$.


The spaces $\mathbf{M}$ and $\mathbf{M}^I$ are Polish spaces. Convergence in $\mathbf{M}^I$ is also denoted $\xi^k \xrightarrow{w} \xi$. The zero measure in $\mathbf{M}$ is denoted $\mathbf{0}$.

2. Flow level model

2.1. Network structure

Consider a network with finitely many resources labeled by $j = 1, \ldots, J$, and a finite set of routes labeled by $i = 1, \ldots, I$. A route $i$ is a non-empty subset of $\{1, \ldots, J\}$, interpreted as the set of resources used by the route. Let $A$ be the $J \times I$ incidence matrix satisfying $A_{ji} = 1$ if resource $j$ is used by route $i$, and $A_{ji} = 0$ otherwise. Since each route is a non-empty subset of $\{1, \ldots, J\}$, no column of $A$ is identically zero. A flow on route $i$ is the continuous transfer of a document through the resources used by the route. Assume that, while being transferred, a flow takes simultaneous possession of all resources on its route. The processing rate allocated to a flow is the rate at which the document associated with the flow is being transferred. There may be multiple flows on a route, and the bandwidth $\Lambda_i$ allocated to route $i$ is the sum of the processing rates allocated to flows on route $i$. The bandwidth allocated through resource $j$ is the sum of the bandwidths allocated to routes using resource $j$. Assume that each resource $j \le J$ has finite capacity $C_j > 0$, interpreted as the maximum bandwidth that can be allocated through it. Let $C = (C_1, \ldots, C_J)$ be the vector of capacities in $\mathbb{R}^J_+$. Then any vector $\Lambda = (\Lambda_1, \ldots, \Lambda_I)$ of bandwidth allocations must satisfy $A\Lambda \le C$.

2.2. Bandwidth sharing policy

We consider the network operating under a bandwidth sharing policy first introduced by Mo and Walrand [14]. Bandwidth is dynamically allocated to routes as a function of the number of flows on all routes, and the resulting allocation is shared equally among individual flows on each route. Let $Z_i(t)$ denote the number of flows on route $i \le I$ at time $t$, and let $Z(t) = (Z_1(t), \ldots, Z_I(t))$ be the corresponding vector in $\mathbb{R}^I_+$. The bandwidth allocated to route $i$ at time $t$ is a function of the vector $Z(t)$ and is denoted $\Lambda_i(Z(t))$. The corresponding vector of bandwidth allocations at time $t$ is given by $\Lambda(Z(t)) = \big(\Lambda_1(Z(t)), \ldots, \Lambda_I(Z(t))\big)$. Although the coordinates of $Z(\cdot)$ are non-negative and integer valued, the function $\Lambda$ is defined on the entire orthant $\mathbb{R}^I_+$ to accommodate fluid analogues of $Z(\cdot)$ later.

Fix a parameter $\alpha \in (0, \infty)$ and a vector of strictly positive weights $\kappa = (\kappa_1, \ldots, \kappa_I)$. For $z \in \mathbb{R}^I_+$, let $\mathcal{I}_0(z) = \{i \le I : z_i = 0\}$ and $\mathcal{I}_+(z) = \{i \le I : z_i > 0\}$. Let $\mathcal{O}(z) = \{\lambda \in \mathbb{R}^I_+ : \lambda_i = 0$ for all $i \in \mathcal{I}_0(z)\}$. Define a function $G_z : \mathbb{R}^I_+ \to [-\infty, \infty)$ by

(2.1) $$G_z(\lambda) = \begin{cases}\displaystyle\sum_{i\in\mathcal{I}_+(z)} \kappa_i z_i^{\alpha}\,\frac{\lambda_i^{1-\alpha}}{1-\alpha}, & \alpha \in (0, \infty)\setminus\{1\},\\[2mm] \displaystyle\sum_{i\in\mathcal{I}_+(z)} \kappa_i z_i \log\lambda_i, & \alpha = 1,\end{cases}$$

where the value of Gz (λ) is taken to be −∞ if α ∈ [1, ∞) and λi = 0 for some i ∈ I+ (z). For each z ∈ RI+ , define Λ(z) as the unique vector λ ∈ RI+ that solves

Fluid Model for Bandwidth Sharing

257

the optimization problem: (2.2)

maximize

(2.3) (2.4)

subject to Aλ ≤ C over O(z).

Gz (λ)

For existence, uniqueness and other properties of the solution Λ(z) of this convex optimization problem, see for example Appendix A in [9]. (Although A is assumed to have full row rank in [9], that property is not needed for the results proved in Appendix A there.) The resulting allocation is called a weighted α-fair allocation, and the function Λ : RI+ → RI+ is called a weighted α-fair bandwidth sharing policy. Note that by (2.3), (2.5)

sup Λ(z) ≤ C.

z∈RI+

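In the special case of a single shared resource used by every route, the optimization (2.2)–(2.4) admits a closed form via the KKT conditions: the constraint binds, the gradients κ_i z_i^α λ_i^{−α} equalize, and λ_i is proportional to z_i κ_i^{1/α} over routes with z_i > 0. The following sketch is illustrative only — the network and all numerical values are invented, not taken from the paper:

```python
import math

def alpha_fair_single_resource(z, kappa, alpha, C):
    """Weighted alpha-fair allocation for one shared resource of capacity C.
    KKT for (2.2)-(2.4) gives lambda_i proportional to z_i * kappa_i**(1/alpha)
    over routes with z_i > 0; routes with z_i = 0 get nothing (the set O(z)).
    Assumes at least one z_i > 0."""
    w = [zi * ki ** (1.0 / alpha) if zi > 0 else 0.0 for zi, ki in zip(z, kappa)]
    s = sum(w)
    return [C * wi / s for wi in w]

def G(z, kappa, alpha, lam):
    """Objective (2.1), summed over routes with z_i > 0."""
    total = 0.0
    for zi, ki, li in zip(z, kappa, lam):
        if zi > 0:
            total += ki * zi * math.log(li) if alpha == 1 else \
                     ki * zi ** alpha * li ** (1 - alpha) / (1 - alpha)
    return total

z, kappa, C, alpha = [2.0, 1.0], [1.0, 1.0], 3.0, 2
lam = alpha_fair_single_resource(z, kappa, alpha, C)   # -> [2.0, 1.0]
# the closed form beats nearby feasible allocations with the same total:
assert G(z, kappa, alpha, lam) >= G(z, kappa, alpha, [1.9, 1.1])
assert G(z, kappa, alpha, lam) >= G(z, kappa, alpha, [2.1, 0.9])
```

For α = 1 the formula reduces to proportional sharing, λ_i = C κ_i z_i / Σ_j κ_j z_j; a general network requires a numerical convex solver as in Appendix A of [9].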
Note also that for any z ∈ R^I_+, Λ_i(z) = 0 for all i ∈ I_0(z); that is, no bandwidth is allocated to routes with no flows. The bandwidth Λ_i(Z(t)) allocated to route i at time t is shared equally by all flows on the route. That is, if there are Z_i(t) > 0 flows on route i at time t, then each flow on route i is allocated a processing rate of Λ_i(Z(t))/Z_i(t) at time t.

When κ_i = 1 for all i ≤ I, the cases α → 0, α → 1, and α → ∞ correspond respectively to a bandwidth allocation which achieves maximum throughput, is proportionally fair, or is max-min fair [1, 14]. Weighted α-fair allocations provide a tractable theoretical abstraction of decentralized packet-based congestion control algorithms such as TCP, the transmission control protocol of the Internet, particularly when α = 2 and κ_i is the reciprocal of the square of the round trip time on route i.

2.3. Stochastic model

Fix a network structure (A, C) and a weighted α-fair bandwidth sharing policy Λ with parameters (α, κ). Our stochastic model of document flows consists of the following: a collection of stochastic primitives E_1, ..., E_I and {v_{1k}}_{k=1}^∞, ..., {v_{Ik}}_{k=1}^∞ describing the arrivals of document flows (including their sizes) to the network; a random initial condition 𝒵(0) ∈ M^I specifying the state of the system at time zero; and a collection of performance processes describing the time evolution of the system state. The performance processes are defined in terms of the primitives and initial condition through a set of descriptive equations.

The stochastic primitives consist of an exogenous arrival process E_i and a sequence of document sizes {v_{ik}}_{k=1}^∞ for each route i ≤ I. The arrival process E_i is a delayed renewal process of rate ν_i > 0 with kth jump time U_{ik}. For t ≥ 0, E_i(t) represents the number of flows that have arrived to route i during the time interval (0, t].
The kth such arrival is called flow k for route i and arrives at time U_{ik}; flows already on route i at time zero are called initial flows for route i. For each i ≤ I and k ≥ 1, the random variable v_{ik} represents the initial size of the document associated with flow k for route i. This is the cumulative amount of processing that must be allocated to the flow to complete its transfer through the network. The flow is considered to depart, or to become inactive, once it receives this total amount of processing. Assume that the random variables {v_{ik}}_{k=1}^∞ are strictly positive and form an independent and identically distributed sequence with


H. C. Gromoll and R. J. Williams

common distribution ϑ_i on R_+. Assume that the mean ⟨χ, ϑ_i⟩ ∈ (0, ∞) and let μ_i = ⟨χ, ϑ_i⟩^{−1}. Define the traffic intensity on route i by ρ_i = ν_i/μ_i.

The initial condition specifies Z(0) = (Z_1(0), ..., Z_I(0)), the number of initial flows on each route at time zero, as well as the initial sizes of the documents on these flows at time zero. Assume that the components of Z(0) are non-negative, integer valued random variables. The initial document sizes of the initial flows on route i ≤ I are the first Z_i(0) elements of a sequence {ṽ_{il}}_{l=1}^∞ of strictly positive random variables.

The performance processes consist of a measure valued process 𝒵, taking values in D([0, ∞), M^I), and a collection of auxiliary processes (Z, T, U, W). The process Z = (Z_1, ..., Z_I) takes values in D([0, ∞), R^I_+). For i ≤ I and t ≥ 0, Z_i(t) is the number of (active) flows on route i at time t. Recall that at time t, the bandwidth allocated to route i is Λ_i(Z(t)), and this bandwidth is shared equally by all Z_i(t) flows on route i; each such flow receives a processing rate of Λ_i(Z(t))/Z_i(t), which equals zero by convention if Z_i(t) = 0. Thus, a flow that is active on route i during a time interval [s, t] ⊂ [0, ∞) receives cumulative service during [s, t] equal to

(2.6)    S_i(s, t) = ∫_s^t Λ_i(Z(u))/Z_i(u) du.

Consider the kth flow for route i. This flow arrives at time U_{ik} and has initial document size v_{ik}. At time t ≥ U_{ik}, the cumulative service received by this flow during [U_{ik}, t] equals S_i(U_{ik}, t) ∧ v_{ik}. The amount of service still required therefore equals (v_{ik} − S_i(U_{ik}, t))^+. (Once this latter quantity becomes zero, the flow becomes inactive, i.e., it departs from the system.) For t ≥ 0, k ≤ E_i(t), and l ≤ Z_i(0), define the residual document size at time t of the kth flow for route i, and the lth initial flow for route i, by

(2.7)    v_{ik}(t) = (v_{ik} − S_i(U_{ik}, t))^+,
         ṽ_{il}(t) = (ṽ_{il} − S_i(0, t))^+.

The measure valued process 𝒵 = (𝒵_1, ..., 𝒵_I) is called the state descriptor; it tracks the residual document sizes of the flows for all routes at any given time. Let δ_x^+ ∈ M denote the Dirac measure at x if x ∈ (0, ∞) and let δ_0^+ = 0. For t ≥ 0 and i ≤ I,

(2.8)    𝒵_i(t) = Σ_{l=1}^{Z_i(0)} δ^+_{ṽ_{il}(t)} + Σ_{k=1}^{E_i(t)} δ^+_{v_{ik}(t)}.

We can recover Z from 𝒵 by

(2.9)    Z_i(t) = ⟨1, 𝒵_i(t)⟩,   for all t ≥ 0, i ≤ I.
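Equations (2.6)–(2.9) can be exercised on a tiny, hypothetical single-route example: with a constant bandwidth Λ shared equally (processor sharing), the flow count is piecewise constant between arrivals and departures, so the service integral (2.6) is a finite sum and the residuals (2.7) can be advanced event by event. A sketch, with all inputs invented for illustration:

```python
LAMBDA = 1.0  # constant bandwidth on the single route (invented value)

def residuals_at(t, arrivals):
    """Residual document sizes (2.7) at time t for flows (U_k, v_k) on one
    route, with bandwidth LAMBDA shared equally among the active flows."""
    flows = []                     # active flows as [arrival time, residual]
    pending = sorted(arrivals)     # arrivals not yet admitted
    now = 0.0
    while now < t:
        # next event: an arrival, the first departure, or the horizon t
        next_arr = pending[0][0] if pending and pending[0][0] <= t else t
        if flows:
            rate = LAMBDA / len(flows)              # per-flow rate Lambda/Z
            next_dep = now + min(r for _, r in flows) / rate
        else:
            next_dep = float("inf")
        nxt = min(next_arr, next_dep, t)
        if flows:                                   # drain all flows equally
            drain = (nxt - now) * LAMBDA / len(flows)
            flows = [[u, r - drain] for u, r in flows if r - drain > 1e-12]
        if pending and pending[0][0] == nxt:        # admit the arriving flow
            flows.append(list(pending.pop(0)))
        now = nxt
    return sorted(r for _, r in flows)

# two unit-size documents arriving at time 0 share rate 1.0 equally,
# so each residual is 0.5 at t = 1 and both flows depart at t = 2
assert residuals_at(1.0, [(0.0, 1.0), (0.0, 1.0)]) == [0.5, 0.5]
assert residuals_at(2.0, [(0.0, 1.0), (0.0, 1.0)]) == []
```

Here `len(flows)` plays the role of Z_i(t) = ⟨1, 𝒵_i(t)⟩ in (2.9), and the list of residuals is a finite encoding of the measure 𝒵_i(t) in (2.8).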

The process T takes values in D([0, ∞), R^I_+) and tracks the cumulative bandwidth allocated to each route. For t ≥ 0 and i ≤ I,

(2.10)    T_i(t) = ∫_0^t Λ_i(Z(s)) ds.

The process U takes values in D([0, ∞), R^J_+) and tracks the cumulative unused bandwidth capacity of each resource. For t ≥ 0,

(2.11)    U(t) = Ct − AT(t).


The process W takes values in D([0, ∞), R^I_+) and tracks the immediate amount of work still to be transferred on each route. For t ≥ 0,

(2.12)    W(t) = ⟨χ, 𝒵(t)⟩.

Recall that χ(x) = x and that integration against the vector of measures 𝒵(t) is interpreted componentwise.

3. Fluid model

Fix a network structure (A, C) and a weighted α-fair bandwidth sharing policy Λ with parameters (α, κ). This section defines a fluid analogue of the stochastic model introduced in Section 2.3. In [8], under mild assumptions, it was shown that this fluid model is a first order approximation (under functional law of large numbers scaling) to the stochastic model. For details of when this approximation holds, we refer the reader to [8].

As in the stochastic model, fix a vector of positive arrival rates ν = (ν_1, ..., ν_I) and a vector of probability measures ϑ = (ϑ_1, ..., ϑ_I) in M^I, satisfying the assumptions of Section 2. Recall that μ_i = ⟨χ, ϑ_i⟩^{−1} and ρ_i = ν_i/μ_i for each i ≤ I. The fluid model consists of a deterministic measure valued function of time, called the fluid model solution, and a collection of auxiliary functions of time defined below.

Definition 3.1. Given a continuous function ζ : [0, ∞) → M^I, define the auxiliary functions (z, τ, u, w) of ζ, with respect to the data (A, C, α, κ, ν, ϑ), by

    z(t) = ⟨1, ζ(t)⟩,
    τ_i(t) = ∫_0^t [Λ_i(z(s)) 1_{(0,∞)}(z_i(s)) + ρ_i 1_{{0}}(z_i(s))] ds,   i ≤ I,
    u(t) = Ct − Aτ(t),
    w(t) = ⟨χ, ζ(t)⟩,

for all t ≥ 0. Here z(t) and τ(t) take values in R^I_+ and u(t) takes values in R^J_+. On the other hand, w(t) takes values in [0, ∞]^I, as the fluid model solution need not have finite first moments.

The function ζ is the fluid analogue of the measure valued process 𝒵. The functions z, τ, u, and w are fluid analogues of the processes Z, T, U, and W, which keep track of queue length, cumulative bandwidth allocation, unused capacity and workload, respectively. The equation satisfied by τ_i may seem counterintuitive at first. However, the presence of the term involving ρ_i is accounted for by the fact that, in passing to a fluid limit of the stochastic model, bandwidth allocations made when a queue length is near zero in the stochastic model are averaged with the zero bandwidth allocations made when a queue length is zero. The fact that ρ_i is the correct form here is related to the fact that when the fluid workload function w_i is real-valued, at a positive time where it is differentiable (which occurs a.e.) and at which the value of w_i is zero, the derivative of the workload function must be zero (cf. (3.2) below).

The notion of a fluid model solution is defined via projections against test functions in the class

    C = {f ∈ C_b^1(R_+) : f(0) = f'(0) = 0}.
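The form of τ_i can be checked numerically: since 1_{(0,∞)}(z_i) + 1_{{0}}(z_i) = 1 pointwise, ρ_i t − τ_i(t) = ∫_0^t (ρ_i − Λ_i(z(s))) 1_{(0,∞)}(z_i(s)) ds, which is the integral appearing in (3.2) below. A toy Riemann-sum check, with an invented path z(s) = (1 − s)^+ and a constant allocation (all numerical values are hypothetical):

```python
# Riemann-sum check of: rho*t - tau(t) = int_0^t (rho - Lambda) 1{z(s) > 0} ds,
# for the invented path z(s) = max(1 - s, 0) and constant allocation lam = 0.8.
rho, lam, t, n = 0.5, 0.8, 2.0, 200_000
ds = t / n
tau = lhs_integral = 0.0
for k in range(n):
    s = (k + 0.5) * ds                       # midpoint rule
    z = max(1.0 - s, 0.0)
    tau += (lam if z > 0 else rho) * ds      # integrand of tau_i in Def. 3.1
    lhs_integral += (rho - lam) * (1.0 if z > 0 else 0.0) * ds
assert abs((rho * t - tau) - lhs_integral) < 1e-9
```

On [0, 1) the allocation Λ is charged to τ; on [1, 2], where z = 0, the rate ρ is charged instead, which is exactly the averaging effect described above.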



Definition 3.2. A fluid model solution for the data (A, C, α, κ, ν, ϑ) is a continuous function ζ : [0, ∞) → M^I that, together with its first three auxiliary functions (z, τ, u), satisfies

(i) ⟨1_{{0}}, ζ(t)⟩ = 0 for all t ≥ 0,
(ii) u_j is non-decreasing for all j ≤ J,
(iii) for each f ∈ C, i ≤ I, and t ≥ 0,

(3.1)    ⟨f, ζ_i(t)⟩ = ⟨f, ζ_i(0)⟩ − ∫_0^t ⟨f', ζ_i(s)⟩ (Λ_i(z(s))/z_i(s)) ds
                        + ν_i ⟨f, ϑ_i⟩ ∫_0^t 1_{(0,∞)}(z_i(s)) ds.

Recall that in (3.1), the integrand in the first integral term is defined to be zero when its denominator is zero. When there is mass present in the system, the first integral term in (3.1) corresponds to the movement to the left of the measure ζ_i at the processing rate Λ_i(z(s))/z_i(s), and the second integral term corresponds to the infusion of new mass due to new arrivals coming at rate ν_i with distribution ϑ_i for route i. The appearance of the indicator function in the last term may seem counterintuitive. The correct form for this term is discerned using the fact that at a time t > 0 for which z_i(t) = 0 and ⟨f, ζ_i(·)⟩ is differentiable, the time derivative of ⟨f, ζ_i(·)⟩ must be zero. Since the integrand in the first integral term is zero by definition at such times, the same must be true for the second integral term.

When the initial fluid workload is finite, we have the following result, which is proved in [8].

Lemma 3.3. Suppose ζ is a fluid model solution with finite initial workload, i.e., w_i(0) = ⟨χ, ζ_i(0)⟩ < ∞ for all i ≤ I. Then, for each i ≤ I and t ≥ 0,

(3.2)    w_i(t) = w_i(0) + ∫_0^t (ρ_i − Λ_i(z(s))) 1_{(0,∞)}(z_i(s)) ds
               = w_i(0) + ρ_i t − τ_i(t).

In particular, the fluid workload w_i(t) is finite for all t ≥ 0 and i ≤ I.

For later use, when ζ(·) is a fluid model solution with finite initial workload and fluid workload function w, we define υ : [0, ∞) → R^J_+ by

(3.3)    υ(t) = Aw(t),   t ≥ 0,

so that the jth component of υ(t) gives the fluid workload at resource j at time t. In other words, υ is a resource level workload, whereas w is a route level workload.

4. Fluid stability for some network topologies

In this section, we use Lyapunov functions to show stability of fluid model solutions with finite initial workload for two types of network topologies, linear networks and tree networks, under the nominal condition:

(4.1)    Σ_{i≤I} A_{ji} ρ_i < C_j   for all j ≤ J,


Fig 1. A linear network with 3 resources (denoted by circles) and 4 routes (denoted by line segments)

i.e., the average load placed on each resource is less than its capacity. (We note that it follows from the characterization of invariant states for the fluid model given in [8] that under this nominal condition, the only invariant state is the zero state.) We assume that (4.1) holds henceforth. Let

(4.2)    ε = min_{j≤J} ( C_j − Σ_{i≤I} A_{ji} ρ_i ),

so that ε > 0.

4.1. Linear network

A linear network consists of J resources and I = J + 1 routes, where route j consists of resource j alone for j = 1, ..., J, and route J + 1 consists of all of the J resources. A schematic of such a network is shown in Figure 1 for J = 3.

Consider a fluid model solution ζ with finite initial workload w(0) = ⟨χ, ζ(0)⟩ and associated resource level workload function υ as defined in (3.3). Consider the Lyapunov function H : R^J_+ → R_+ defined by

(4.3)    H(υ) = max_{j≤J} υ_j.

A Lipschitz continuous function x : [0, ∞) → R is absolutely continuous; hence it is differentiable almost everywhere and it can be recovered by integration from its a.e. defined derivative. We call a point at which such a Lipschitz continuous function is differentiable a regular point for the function. The auxiliary functions τ_i : [0, ∞) → R_+, i ≤ I, are Lipschitz continuous, and hence so too are u_j, j ≤ J, w_i, i ≤ I, and υ_j, j ≤ J. The function H(·) is Lipschitz continuous, and hence so too is H(υ(·)). Let t > 0 be a regular point for H(υ(·)), for τ_i, w_i, i ≤ I, and for u_j, υ_j, j ≤ J, such that for all i ≤ I,

(4.4)    τ̇_i(t) = Λ_i(z(t)) 1_{(0,∞)}(z_i(t)) + ρ_i 1_{{0}}(z_i(t))

(such points occur a.e.). Suppose that H(υ(t)) > 0 and let J_t = {j ≤ J : H(υ(t)) = υ_j(t)}. Then

    H(υ(t)) = υ_j(t)   for j ∈ J_t,
    H(υ(t)) > υ_j(t)   for j ∉ J_t,


and by the fact that t > 0 is a regular point for H(υ(·)) and for υ_j, j ∈ J_t, we have (cf. [6], Section 3)

(4.5)    (d/dt) H(υ(t)) = υ̇_j(t)   for all j ∈ J_t.

Now, by Lemma 3.3 and (4.4),

(4.6)    υ̇_j(t) = Σ_{i≤I} A_{ji} ẇ_i(t) = Σ_{i≤I} A_{ji} (ρ_i − Λ_i(z(t))) 1_{(0,∞)}(z_i(t)).

We consider two cases.

Case (a). Suppose z_j(t) > 0 for some j ∈ J_t. Then, by the definition of Λ(·) and the fact that route j contains just resource j, the full capacity of resource j is used by Λ(z(t)), i.e.,

    Σ_{i≤I} A_{ji} Λ_i(z(t)) 1_{(0,∞)}(z_i(t)) = C_j.

Thus, for this j ∈ J_t, (4.6) becomes

    υ̇_j(t) = Σ_{i≤I} A_{ji} ρ_i 1_{(0,∞)}(z_i(t)) − C_j ≤ Σ_{i≤I} A_{ji} ρ_i − C_j ≤ −ε < 0,

by the assumption (4.1). It follows that in Case (a),

    (d/dt) H(υ(t)) ≤ −ε.

Case (b). Suppose z_j(t) = 0 for all j ∈ J_t. Then w_j(t) = 0 for all j ∈ J_t, and since

    υ_j(t) = w_j(t) + w_{J+1}(t),   j ≤ J,

we have

    υ_j(t) = w_{J+1}(t)   for all j ∈ J_t.

Since

    w_{J+1}(t) ≤ υ_l(t) < υ_j(t)   for all l ∉ J_t, j ∈ J_t,

it follows that J_t = {1, ..., J} and

    υ_j(t) = w_{J+1}(t)   for all j ≤ J.

By Lemma 3.3 and (4.4), since H(υ(t)) = w_{J+1}(t) > 0 and hence z_{J+1}(t) > 0, we have

(4.7)    ẇ_{J+1}(t) = ρ_{J+1} − Λ_{J+1}(z(t)).

Since z_j(t) = 0 for all j ≤ J, we have Λ_j(z(t)) = 0 for all j ≤ J, and it follows from the definition of Λ(z(t)) as the solution of an optimization problem in which at least one constraint must be binding that there is at least one j ≤ J such that Λ_{J+1}(z(t)) = C_j.


Fig 2. A tree network with 4 resources and 3 routes

Here C_j > ρ_j + ρ_{J+1} by (4.1). It follows that, for this j,

    ẇ_{J+1}(t) = ρ_{J+1} − C_j < ρ_j + ρ_{J+1} − C_j ≤ −ε < 0.

Hence in Case (b),

    (d/dt) H(υ(t)) = ẇ_{J+1}(t) ≤ −ε.

Thus, in either Case (a) or (b), at the regular point t > 0,

    (d/dt) H(υ(t)) ≤ −ε   when H(υ(t)) > 0.

Since H(υ(·)) is non-negative, it follows from Lemma 2.2 of Dai and Weiss [6] that

    H(υ(t)) = 0   for all t ≥ δ = H(υ(0))/ε.
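The concluding step uses only the drift bound: by Lemma 2.2 of Dai and Weiss [6], any non-negative absolutely continuous function whose derivative is at most −ε whenever the function is positive reaches zero by time H(υ(0))/ε and stays there. A toy Euler-scheme check of this mechanism (the drift profile below is invented purely for illustration):

```python
import math

eps, h0, dt = 0.5, 3.0, 1e-4
delta = h0 / eps               # Dai-Weiss bound: h(t) = 0 for t >= h(0)/eps

h, t, hit = h0, 0.0, None
while t < 2 * delta:
    if h > 0:
        drift = -eps - 0.3 * abs(math.sin(5 * t))   # any drift <= -eps
        h = max(h + drift * dt, 0.0)                # h stays non-negative
        if h == 0.0 and hit is None:
            hit = t                                  # first time h hits zero
    t += dt

assert hit is not None and hit <= delta   # zero is reached no later than h(0)/eps
```

Once h reaches zero it is never pushed up again, matching the "stays at zero" part of the conclusion.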

We summarize the above analysis as follows.

Lemma 4.1. Consider a linear network satisfying the condition (4.1) and let ε > 0 be as defined in (4.2). Suppose that ζ is a fluid model solution with finite initial workload w(0) = ⟨χ, ζ(0)⟩. Then

    ζ(t) = 0   for all t ≥ δ,

where δ = max_{j≤J} υ_j(0)/ε. In this sense, the fluid model for any linear network is stable under the natural condition (4.1).

4.2. Tree network

As pointed out by Bonald and Proutière [2], tree networks, as illustrated in Figure 2, are practically interesting, as they may represent an access network consisting of several multiplexing stages. Furthermore [2], they typically exhibit sensitivity to document size distributions. A tree network consists of J ≥ 2 resources and I = J − 1 routes such that a single resource (labeled J and referred to as the trunk) belongs to all routes, and each of the other resources (labeled 1, ..., J − 1) belongs to a single route.

Proceeding in a similar manner to that for the linear network, consider a fluid model solution ζ with finite initial workload ⟨χ, ζ(0)⟩. We use the total workload function H : R_+^{J−1} → R_+ defined by

(4.8)    H(w) = Σ_{i=1}^{J−1} w_i


as a Lyapunov function. Note that H(w(·)) = υ_J(·), the resource level workload for the trunk resource J.

Suppose t > 0 is a regular point for τ_i, i ≤ J − 1, such that for all i ≤ J − 1,

(4.9)    τ̇_i(t) = Λ_i(z(t)) 1_{(0,∞)}(z_i(t)) + ρ_i 1_{{0}}(z_i(t))

(such points t occur a.e.). Then t is a regular point for all w_i, i ≤ J − 1. Suppose H(w(t)) > 0. Then by Lemma 3.3 and (4.9) we have

(4.10)    (d/dt) H(w(t)) = Σ_{i≤J−1} (ρ_i − Λ_i(z(t))) 1_{(0,∞)}(z_i(t)).

We consider two cases.

Case (a). Suppose

    Σ_{i≤J−1} Λ_i(z(t)) 1_{(0,∞)}(z_i(t)) = C_J.

Then by (4.10) and (4.1) with j = J, we have

    (d/dt) H(w(t)) ≤ Σ_{i≤J−1} ρ_i − C_J ≤ −ε.

Case (b). Suppose

    Σ_{i≤J−1} Λ_i(z(t)) 1_{(0,∞)}(z_i(t)) < C_J.

Then, by the definition of Λ(z(t)), we must have

(4.11)    Λ_i(z(t)) = C_i   for those i ≤ J − 1 satisfying z_i(t) > 0.

For if not, the value of Λ_i(z(t)) could be increased on some non-empty route i without exceeding the capacity of the resources i and J on that route. From (4.10) and (4.11), it follows that

    (d/dt) H(w(t)) = Σ_{i≤J−1} (ρ_i − C_i) 1_{(0,∞)}(z_i(t)) ≤ −ε < 0,

since ρ_i < C_i for all i ≤ J − 1 by (4.1), and since z_i(t) > 0 for some i ≤ J − 1 as H(w(t)) > 0. Thus, in either Case (a) or (b),

    (d/dt) H(w(t)) ≤ −ε < 0   when H(w(t)) > 0.

Since H(w(·)) is non-negative, it follows from Lemma 2.2 of [6] that H(w(t)) = 0 for all t ≥ δ = H(w(0))/ε. We summarize the above analysis as follows.

Lemma 4.2. Consider a tree network satisfying the condition (4.1) and let ε > 0 be as defined in (4.2). Suppose that ζ is a fluid model solution with finite initial workload w(0) = ⟨χ, ζ(0)⟩. Then

    ζ(t) = 0   for all t ≥ δ,

where δ = Σ_{i≤J−1} w_i(0)/ε.


References

[1] Bonald, T. and Massoulié, L. (2001). Impact of fairness on Internet performance. In Proceedings of ACM Sigmetrics 2001.
[2] Bonald, T. and Proutière, A. (2003). Insensitive bandwidth sharing in data networks. Queueing Systems 44 69–100.
[3] Bramson, M. (2005). Stability of networks for max-min fair routing. Presentation at the 13th INFORMS Applied Probability Conference, Ottawa.
[4] Chiang, M., Shah, D. and Tang, A. K. (2006). Stochastic stability under network utility maximization: general file size distribution. In Proceedings of 44th Allerton Conference on Communication, Control and Computing.
[5] Dai, J. G. (1995). On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models. Annals of Applied Probability 5 49–77.
[6] Dai, J. G. and Weiss, G. (1996). Stability and instability of fluid models for re-entrant lines. Mathematics of Operations Research 21 115–134.
[7] de Veciana, G., Konstantopoulos, T. and Lee, T.-J. (2001). Stability and performance analysis of networks supporting elastic services. IEEE/ACM Transactions on Networking 9 2–14.
[8] Gromoll, H. C. and Williams, R. J. (2008). Fluid limits for networks with fair bandwidth sharing and general document size distributions. Annals of Applied Probability, to appear.
[9] Kelly, F. P. and Williams, R. J. (2004). Fluid model for a network operating under a fair bandwidth sharing policy. Annals of Applied Probability 14 1055–1083.
[10] Lakshmikantha, A., Beck, C. L. and Srikant, R. (2004). Connection level stability analysis of the internet using the sum of squares (SoS) techniques. In Conference on Information Sciences and Systems, Princeton.
[11] Lin, X., Shroff, N. and Srikant, R. (2008). On the connection-level stability of congestion-controlled communication networks. IEEE Transactions on Information Theory 54 2317–2338.
[12] Massoulié, L. (2007). Structural properties of proportional fairness: stability and insensitivity. Annals of Applied Probability 17 809–839.
[13] Massoulié, L. and Roberts, J. (2000). Bandwidth sharing and admission control for elastic traffic. Telecommunication Systems 15 185–201.
[14] Mo, J. and Walrand, J. (2000). Fair end-to-end window-based congestion control. IEEE/ACM Transactions on Networking 8 556–567.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 267–283
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000426

No Arbitrage and General Semimartingales

Philip Protter^{1,∗,†} and Kazuhiro Shimbo^{2,∗}

Cornell University and Mizuho Alternative Investments, LLC

Abstract: No free lunch with vanishing risk (NFLVR) is known to be equivalent to the existence of an equivalent martingale measure for the price process semimartingale. We give necessary conditions for such a semimartingale to have the property NFLVR. We also extend Novikov's criterion for the stochastic exponential of a local martingale to be a martingale to the general case, that is, the case where the paths need not be continuous.

1. Introduction

The question of whether the absence of arbitrage is equivalent to the existence of an equivalent measure has now been clarified for some time, in the papers of Delbaen and Schachermayer ([5] and [6]). They showed that one has no arbitrage, in the sense of no free lunch with vanishing risk, if and only if there exists an equivalent probability measure rendering the price process a sigma martingale. (In the continuous case, all sigma martingales are local martingales.) Their condition, known by its acronym NFLVR, implies also that the price process must be a semimartingale, as a consequence of the Bichteler-Dellacherie theorem. Therefore a natural question arises: which semimartingales actually satisfy NFLVR, and thus can be used to model price processes in arbitrage free models?

To analyze this, one wants to give conditions on the original semimartingale which imply that it is a sigma martingale after one changes to a risk neutral measure. Once one has the risk neutral measure, checking when a semimartingale is a sigma martingale follows from Proposition 6.35 on page 215 of [10]; what we are concerned with here is giving the conditions on the original semimartingale, before the change to the risk neutral measure. Partial results in this direction have been obtained by E. Strasser (see [28]) in the continuous case, and also by E. Eberlein and J. Jacod (see [9]) in the case of geometric Lévy processes. In the first half of this paper we consider the general situation and obtain primarily necessary conditions for a semimartingale price process to engender a model without arbitrage. Our primary result in this half is Theorem 2.

When dealing with sufficient conditions, some difficult issues arise: how does one find an equivalent sigma martingale measure? Obvious constructions lead to

∗ Tom Kurtz has been a leader in the field of stochastic processes his entire career, and without his vision, guidance, and generosity, the field would be much the poorer. The authors wish to dedicate this paper to Tom Kurtz on his sixty-fifth birthday.
† Supported in part by NSF grant DMS-0202958 and NSA grant H98230-06-1-0079.
1 ORIE, Cornell University, Rhodes Hall, Ithaca, NY 14853-3801, USA. e-mail: [email protected]
2 Mizuho Alternative Investments, LLC, 1251 Avenue of the Americas, New York, NY 10020, USA. e-mail: [email protected]
JEL Classification: G10.
AMS 2000 subject classifications: Primary 60G44, 62P05.
Keywords and phrases: equivalent martingale measure, Girsanov's theorem, no free lunch with vanishing risk, semimartingales, Novikov's condition.



measures which a priori could be sub probability measures, and not true probability measures. The Radon-Nikodym densities of these measures can often be constructed as stochastic exponentials of local martingales. A classic tool (in the continuous case) used to verify that the exponential of a local martingale is itself a martingale, and not just a supermartingale, is Novikov's theorem. Often Novikov's theorem is insufficient, but it is always appealing due to its simple nature and ease of computation. In the second half of this paper we propose an analog of Novikov's criterion for the general case (that is, the case with jumps). Our results build on the pioneering work of J. Mémin, A. N. Shiryaev, and their co-authors. Our primary result in this half of the paper is Theorem 9.

Acknowledgement. We wish to thank Jean Jacod for enlightening discussions concerning the first half of this paper. We also wish to thank Darrell Duffie for complaining years ago to the first author that there was no good Novikov type criterion for the general case.

2. Necessary Conditions for No Arbitrage

2.1. The Continuous Case

Let X_t = X_0 + M_t + A_t, t ≥ 0, be a continuous semimartingale on a filtered probability space (Ω, F, F, P) where F = (F_t)_{t≥0}. Here M represents the continuous local martingale part and A is a process with paths of finite variation on compact time sets, almost surely. We seek necessary and sufficient conditions such that there exists an equivalent probability measure P∗ such that X is a P∗ sigma martingale. Since X is continuous, and since all continuous sigma martingales are in fact local martingales, we need only concern ourselves with local martingales. Theorem 1 below is essentially already well known. See for example [27], Theorem 1, which itself has its roots in [1]; we include it here for the reader's convenience, and since it illustrates what we are trying to do in Theorem 2.

Theorem 1.
Let X_t = X_0 + M_t + A_t, 0 ≤ t ≤ T, be a continuous semimartingale on a filtered probability space (Ω, F, F, P) where F = (F_t)_{0≤t≤T}. Let C_t = [X, X]_t = [M, M]_t, 0 ≤ t ≤ T. There exists an equivalent probability measure P∗ on F_T such that X is a P∗ sigma martingale only if:

1. dA ≪ dC a.s.;
2. If J is such that A_t = ∫_0^t J_s dC_s for 0 ≤ t ≤ T, then ∫_0^T J_s^2 dC_s < ∞ a.s.

If in addition one has the condition below, then we have sufficient conditions for there to exist an equivalent probability measure P∗ on F_T such that X is a P∗ sigma martingale:

3. E{E(−J · M)_T} = 1, where E(U) denotes the stochastic exponential of a semimartingale U.

Proof. Suppose there exists P∗ equivalent to P such that X is a P∗ local martingale. Let Z = dP∗/dP and let Z_t = E{Z | F_t} for all t, 0 ≤ t ≤ T. We then have, by Girsanov's theorem, that the decomposition of X under P∗ is given by:

(1)    X_t = X_0 + ( M_t − ∫_0^t (1/Z_s) d[Z, M]_s ) + ( A_t + ∫_0^t (1/Z_s) d[Z, M]_s ).
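As an aside, the decomposition (1) and condition 3 can be checked by simulation in the simplest continuous example: take M = W a standard Brownian motion and J ≡ θ constant, so that Z_T = E(−θW)_T = exp(−θW_T − θ²T/2). Novikov's condition holds, E{Z_T} = 1, and under dP∗ = Z_T dP the terminal value W_T has P∗-mean E{Z_T W_T} = −θT. A hypothetical Monte Carlo sketch (not part of the proof; all numbers invented):

```python
import math, random

# M = W Brownian motion, J = theta constant, T = 1:
# Z_T = exp(-theta*W_T - theta^2*T/2) is the stochastic exponential E(-theta W)_T.
random.seed(0)
theta, T, n = 0.5, 1.0, 200_000

z_mean = 0.0   # estimate of E[Z_T]       (should be 1: condition 3 holds)
zw_mean = 0.0  # estimate of E[Z_T * W_T] (= E*[W_T], should be -theta*T)
for _ in range(n):
    w_T = math.sqrt(T) * random.gauss(0.0, 1.0)
    z_T = math.exp(-theta * w_T - 0.5 * theta ** 2 * T)
    z_mean += z_T / n
    zw_mean += z_T * w_T / n

assert abs(z_mean - 1.0) < 0.02           # E{E(-J.M)_T} = 1
assert abs(zw_mean + theta * T) < 0.02    # drift -theta*T under P*
```

The negative P∗-drift of W is exactly the compensating finite-variation term M_t − ∫_0^t (1/Z_s) d[Z, M]_s = W_t + θt being a P∗-martingale.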


Since X is a P∗ local martingale and continuous semimartingales have unique decompositions of the type (1), we conclude that we must have

(2)    A_t = − ∫_0^t (1/Z_s) d[Z, M]_s,

and since further, by the Kunita-Watanabe inequality, we have d[Z, M] ≪ d[M, M] a.s., we conclude that for some predictable process J,

(3)    A_t = ∫_0^t J_s d[M, M]_s.

Since Z is a strictly positive P martingale, we can write it as the solution of an exponential equation. (Note that even though X is assumed to be a continuous semimartingale, that does not imply that Z too must be continuous.)

(4)    Z_t = 1 + ∫_0^t Z_{s−} dY_s,   Z_0 = 1,

for a local martingale Y with Y_0 = 0. This Y is often called the stochastic logarithm of Z and is given by Y = ∫_0^· (1/Z_{s−}) dZ_s. A local martingale Y has a decomposition Y = Y^c + Y^d, where Y^c is the continuous martingale part of Y and Y^d is the purely discontinuous martingale part of Y (see [10, p. 85]). Since Y^c is continuous, it is a locally square integrable local martingale. Therefore we have a unique representation of the form

(5)    Y_t^c = ∫_0^t H_s dM_s + N_t^c,

where H is a predictable process such that the stochastic integral in (5) exists, and N^c is a continuous local martingale orthogonal to H · M in the sense that [H · M, N^c] is a local martingale. Since the stochastic integral in (5) exists, we have of necessity that ∫_0^t H_s^2 d[M, M]_s < ∞ a.s. for each t, 0 ≤ t ≤ T. Let N = N^c + Y^d. Then

(6)    [H · M, N] = [H · M, N^c] + [H · M, Y^d] = [H · M, N^c].

It follows that [H · M, N] is also a local martingale, and we have a decomposition of Y into two orthogonal components:

(7)    Y_t = ∫_0^t H_s dM_s + N_t^c + Y_t^d = ∫_0^t H_s dM_s + N_t.

We next apply the Meyer-Girsanov theorem to calculate the decomposition of X under P∗. (Since M is continuous there is no issue about the existence of ⟨Z, M⟩.) We get:

    X_t = X_0 + ( M_t − ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s ) + ( A_t + ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s ).

By the uniqueness of the decomposition, we must have

    A_t = − ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s = −⟨M, Y⟩_t = −⟨M, H · M + N⟩_t
        = − ∫_0^t H_s d[M, M]_s = ∫_0^t J_s dC_s,


and by the definition of C we have −H = J, d[M, M]_s dP-almost everywhere. Thus ∫_0^t J_s^2 d[M, M]_s < ∞ for all t, 0 ≤ t ≤ T. This gives the necessity.

For the sufficiency, let us take Z to be the stochastic exponential of −J · M. Applying the Meyer-Girsanov theorem we again have

    X_t = X_0 + ( M_t − ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s ) + ( A_t + ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s ),

and by construction we have that A_t + ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s = 0. The process Z is a strictly positive local martingale with Z_0 = 1, hence it is a positive supermartingale, and it is a martingale as soon as E{Z_t} = 1 for all t, 0 ≤ t ≤ T. If Z is known to be a martingale on [0, T], then we define P∗ by dP∗ = Z_T dP, and we can conclude that M_t − ∫_0^t (1/Z_{s−}) d⟨Z, M⟩_s is a local martingale under P∗. However, the third hypothesis guarantees that Z is a martingale, and hence that P∗ is a probability measure (and not a sub probability measure), and we have sufficiency.

Remark 1. The sufficiency is not as useful in practice as it might seem. The first two conditions should be, in principle, possible to verify, but the third condition in general is not. Depending on the structure of Y, different techniques are available. An obvious one is Novikov's condition, but while easy to state, this too is difficult to verify in practice.

Remark 2. If condition (2) of Theorem 1 is satisfied for all ω (instead of P-a.s.), then condition (3) is automatically satisfied (see, for example, [20]). This is sufficient but not necessary in general. This difference seems subtle but plays an important role. Essentially this is because a probability measure P∗ such that X is a P∗-local martingale, if it exists, is not necessarily equivalent to P in general a priori.

Remark 3. Condition (1) is often called a structure condition (SC) in the literature. See for example Schweizer [26, page 1538]. Also see Jarrow and Protter (2004) [11] for a constructive example of an arbitrage opportunity when this condition is violated.

Remark 4. In an interesting paper, Strasser [28] discusses a similar problem in the case of continuous semimartingales. She focuses on condition (1), and does not take the approach we take here.

2.2. General Case

The techniques used in the continuous case break down in the general case (i.e., the case with jumps). The reason is that, to use formally the same ideas, one would need the Meyer-Girsanov theorem, which requires the existence of the process ⟨Z, M⟩. When M has continuous paths, such a process always exists, even if Z can have jumps. But if both Z and M have jumps, then the process ⟨Z, M⟩ exists if and only if the process [Z, M] is locally integrable, which need not in general be the case. (We mention here that [Z, M] is called locally integrable if there exists a sequence of stopping times (τ_n)_{n≥1} such that τ_{n−1} ≤ τ_n a.s. for each n ≥ 1, lim_{n→∞} τ_n ≥ T a.s., and E{[Z, M]_{τ_n}} < ∞ for each n ≥ 1.) A technique developed to circumvent this kind of technical integrability problem is that of random measures, and in particular the use of the characteristics of a semimartingale. We assume the reader is familiar with the basic definitions and theorems concerning the characteristics


of a semimartingale. We refer the reader to (for example) [10] for an expository treatment of them.

Let X be an arbitrary semimartingale with characteristics (B, C, ν) on our usual filtered probability space (Ω, F, F, P), F = (F_t)_{t≥0}. Then there exists a predictable process A_t with A_0 = 0 such that

(8)    ν(ds, dx) = K_s(ω, dx) dA_s(ω),   C_t = ∫_0^t c_s dA_s,   B_t = ∫_0^t b_s dA_s.



Let P* be another probability measure equivalent to P. Then of course X is a semimartingale under P*, with characteristics (B*, C, ν*). (We write C instead of C* because it is the same process for any equivalent probability measure.) We then know (see Theorem 3.17 on page 170 of [10]) that the random measure ν* is absolutely continuous with respect to ν, and that there exists a predictable process (predictable in the extended sense) Y(s, x)_{s≥0, x∈ℝ} such that

(9)  ν* = Y · ν.

We have the following theorem, which gives necessary conditions for X to have no arbitrage in the Delbaen–Schachermayer sense of "no free lunch with vanishing risk," hereafter abbreviated NFLVR. See Delbaen and Schachermayer [5] or alternatively [12]. One can also consult [13]. In Kabanov's paper [13], conditions are given for a semimartingale to be a sigma martingale; these are also given in [10]. In the theorem below we present conditions on the semimartingale such that it is not necessarily a sigma martingale, but that it is one when viewed under a risk neutral measure, which of course is a different situation. The authors just learned that Karatzas and Kardaras ([15]) have recently obtained similar results, although in a different context.

Theorem 2. Let X be a P semimartingale with characteristics (B, C, ν). For X to have an equivalent sigma martingale measure, and hence satisfy the NFLVR condition, there must exist a predictable process β = (β_t)_{t≥0} and an (extended) predictable process Y(·, t, x) such that the following four conditions are satisfied:
1. b_t + β_t c_t + ∫ x(Y(t, x) − 1_{|x|≤1}) K_t(dx) = 0, P(dω)dA_t(ω)-almost everywhere;
2. ∫₀ᵀ β_s² dC_s < ∞, a.s.;
3. ΔA_t > 0 implies that ∫ x Y(t, x) K_t(dx) = 0;
4. ∫ (|x|² ∧ |x|) Y(t, x) K_t(dx) < ∞, P(dω)dA_t(ω)-almost everywhere,
where the predictable process A_t is defined by (8).

Proof. Our primary tool will be the Jacod–Mémin version of a Girsanov theorem with characteristics (see Theorem 3.24 on page 172 of [10]). Let P* be an equivalent sigma martingale measure. Let (B*, C*, ν*) be the characteristics of X under P*. Then there exist c_s, b*_s, K*_s such that

(10)  C*_t = ∫₀ᵗ c_s dA_s,   B*_t = ∫₀ᵗ b*_s dA_s;   ν*(ds, dx) = dA_s K*_s(ω, dx).

Note that in the above we write c and not c*, and also A and not A*, since under our hypothesis we can take A* = A. In addition, the process C does not change under an equivalent change of measure. We next invoke Proposition 6.35 on page 215 of [10] to conclude that X is a P* sigma martingale if and only if


Philip Protter and Kazuhiro Shimbo
1. b*_t + ∫ x 1_{|x|>1} K*_t(dx) = 0, P(dω)dA_t(ω)-almost everywhere;
2. When ΔA_t > 0, then ∫ x K*_t(dx) = 0; and
3. ∫ (|x|² ∧ |x|) K*_t(dx) < ∞, P(dω)dA_t(ω)-almost everywhere.
We wish to interpret these three conditions in terms of the original characteristics under P. We know from the continuous case that for C we need a new predictable process coming from the density process Z of dP*/dP, which we denote β, with the property that ∫₀ᵗ β_s² dC_s = ∫₀ᵗ β_s² c_s dA_s < ∞ a.s. We also use a key fact that ν* must be absolutely continuous with respect to ν and that there must exist Y = Y(s, x), predictable in the extended sense, such that

(11)  ν* = Y · ν.

This is proved in Theorem 3.17 on page 170 of [10]. (We remark that both β and Y derive from the P martingale Z where Z_T = dP*/dP, with β coming from the continuous martingale part of Z, and Y coming from the 'purely discontinuous' part of Z.) Moreover, since for any bounded U we have

∫₀ᵗ ∫ U(ω, s, x) ν*(dx, ds) = ∫₀ᵗ ∫ U(ω, s, x) dA_s K*_s(dx) = ∫₀ᵗ ∫ U(ω, s, x) Y(ω, s, x) dA_s K_s(dx),

we can conclude that K* = Y · K. Now we need only re-express the three conditions above to conclude that we must have:
1. b_t + β_t c_t + ∫ x(Y(t, x) − 1_{|x|≤1}) K_t(dx) = 0, P(dω)dA_t(ω)-almost everywhere;
2. ∫₀ᵀ β_s² dC_s < ∞, a.s.;
3. ΔA_t > 0 implies that ∫ x K*_t(dx) = ∫ x Y(t, x) K_t(dx) = 0;
4. ∫ (|x|² ∧ |x|) K*_t(dx) = ∫ (|x|² ∧ |x|) Y(t, x) K_t(dx) < ∞, P(dω)dA_t(ω)-almost everywhere.
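To make condition 1 concrete, here is a minimal numerical sketch for a toy jump-diffusion; the parameter values, the single jump size x0, and the chosen multiplier Y are illustrative assumptions of ours, not part of the theorem.

```python
# Toy check of condition 1: drift b, diffusion coefficient c, and a single
# jump size x0 = 2 arriving at rate lam, so that K_t(dx) = lam * delta_{x0}(dx)
# and A_t = t.  Since |x0| > 1, the indicator 1_{|x|<=1} vanishes at x0,
# and condition 1 reduces to  b + beta*c + x0*Y*lam = 0.
b, c, lam, x0 = 0.3, 0.25, 0.5, 2.0
Y = 0.1                                 # hypothetical jump multiplier under P*
beta = -(b + x0 * Y * lam) / c          # solve the drift condition for beta
indicator = 1.0 if abs(x0) <= 1 else 0.0
lhs = b + beta * c + x0 * (Y - indicator) * lam
assert abs(lhs) < 1e-12                 # condition 1 holds for this (beta, Y)
```

As the sketch suggests, for a given Y one can always absorb the jump drift into β when c > 0, which is the freedom discussed in Remark 5 below.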

Corollary 3. Let X be a semimartingale as in Theorem 2. Suppose in addition that F is a quasi-left continuous filtration. If X is a P* sigma martingale, then we must have the following three conditions satisfied:
1. b_t + β_t c_t + ∫ x(Y(t, x) − 1_{|x|≤1}) K_t(dx) = 0, P(dω)dA_t(ω)-almost everywhere;
2. ∫₀ᵀ β_s² dC_s < ∞, a.s.;
3. ∫ (|x|² ∧ |x|) Y(t, x) K_t(dx) < ∞, P(dω)dA_t(ω)-almost everywhere.

Proof. We are able to remove the condition on the jumps of A because if F is quasi-left continuous, then A does not jump, it being increasing and predictably measurable.

Corollary 4. Let X be a semimartingale as in Theorem 2. If X is a P* local martingale, then we must have the following three conditions satisfied:
1. b_t + β_t c_t + ∫ x(Y(t, x) − 1_{|x|≤1}) K_t(dx) = 0, P(dω)dA_t(ω)-almost everywhere;
2. ∫₀ᵀ β_s² dC_s < ∞, a.s.;


3. ΔA_t > 0 implies that ∫ x Y(t, x) K_t(dx) = 0;

and if the filtration F is quasi-left continuous, we must have the following two conditions satisfied:
1. b_t + β_t c_t + ∫ x(Y(t, x) − 1_{|x|≤1}) K_t(dx) = 0, P(dω)dA_t(ω)-almost everywhere;
2. ∫₀ᵀ β_s² dC_s < ∞, a.s.

Proof. This follows from Theorem 2 and Proposition 6.35 on page 215 of [10]. That quasi-left continuity of F implies we can drop the condition on the jumps of A is a trivial consequence of A not having jumps when the filtration is quasi-left continuous.

Remark 5. Comparing Theorem 1 and Theorem 2 illustrates how incompleteness of the market corresponding to the price process X can arise in two different ways. Theorem 1 shows that (in the continuous case) the choice of the orthogonal martingale N is essentially arbitrary, and each such choice potentially leads to a different equivalent probability measure rendering X a local martingale. Theorem 2 shows that in the general case (the case where jumps are present) incompleteness can still arise for the same reasons as in the continuous case, but also because of the jumps, through the choice of Y. Indeed, we are free to change Y appropriately at the cost of changing b. Only if K reduces to a point mass is it then possible to have uniqueness of P* (and hence market completeness), and then of course only if C = 0.

Remark 6. For the special case where X is a geometric Lévy process, Eberlein and Jacod [9] give a necessary and sufficient condition for the existence of an equivalent martingale measure.

We can derive a structure condition for the general case, with an additional hypothesis involving integrability.

Theorem 5. Let X be a special semimartingale with characteristics (B, C, ν). Then X has a canonical decomposition X = X_0 + M + A. Assume (|x|² ∧ |x|) ∗ ν_t < ∞. If there exists P* such that X is a P* local martingale, then

(12)  dA_t ≪ d(C_t + (|x|² ∧ |x|) ∗ ν_t).

In particular, if X is locally square integrable then ⟨M, M⟩ exists and

(13)  dA_t ≪ d⟨M, M⟩_t   a.s.

Proof. Suppose an equivalent local martingale measure P* exists. Let (B, C, ν) and (B*, C, ν*) be the characteristics of X under P and P* with truncation function h(x) = 1_{|x|≤1}. Let μ be the jump measure of X. Since X is a P*-local martingale, (x 1_{|x|>1}) ∗ μ is P*-locally integrable and X has the representation

(14)  X_t = X_0 + X^{c*}_t + x ∗ (μ − ν*)_t.

Since P ≪ P* by hypothesis, applying Girsanov's theorem (Theorem 3.24 on page 172 of [10]), there exist a predictable process β′ and a 𝒫 ⊗ B(ℝ) measurable nonnegative function Y′ such that

(15)  B = B* + β′ · [X^c, X^c] + (x 1_{|x|≤1} (1 − Y′)) ∗ ν,
(16)  ν* = Y′ · ν.

Hence (x 1_{|x|>1}) ∗ μ is P-locally integrable and

(17)  X_t = X_0 + X^c_t + x ∗ (μ − ν)_t + x ∗ (ν − ν*)_t + ∫₀ᵗ β′_s dC_s = X_0 + M_t + A_t.

By the uniqueness of the Doob–Meyer decomposition, we have

(18)  A_t = x(1 − Y′) ∗ ν_t + ∫₀ᵗ β′_s dC_s.

Then clearly dA_t ≪ d((|x|² ∧ |x|) ∗ ν_t + C_t). Finally, suppose in addition that X is locally square integrable. Then [M, M] is locally integrable and ⟨M, M⟩ = C + x² ∗ ν exists. It is clear from (18) that dA_t ≪ d⟨M, M⟩_t, a.s.

Remark 7. The case when X is locally bounded (and hence X is a special semimartingale such that M is automatically a locally square integrable local martingale) is shown by Delbaen and Schachermayer [7, Theorem 3.5]. Theorem 5 extends their result to the case when X is not necessarily locally bounded. In addition, Theorem 5 does not depend on the notion of admissibility.

Remark 8. The structure condition has a clear economic interpretation. On a set E such that ∫_E d⟨M, M⟩ = 0, M is constant and P(∫_E ∫_ℝ |x| μ(dx, dt) = 0) = 1, where μ is the jump measure of X. Therefore any trading strategy supported on E is risk-free in the sense that any movement of X comes from the predictable component A, and hence we can construct a trading strategy which takes advantage of the information of an infinitesimal future. Indeed, it is easy to construct such a trading strategy to exploit an arbitrage opportunity if dA ≪ d⟨M, M⟩ fails: Consider a price process X on a finite time horizon [0, T]. Without loss of generality, we assume that A is an increasing predictable process. Suppose there exists a set E ∈ B(ℝ₊) such that ∫_E d⟨M, M⟩_s = 0 but P(∫_E dA_s > 0) > η for some η > 0. Let A^c_t = A_t − Σ_{0≤s≤t} ΔA_s. Let h_t be the predictable process defined by

(19)  h_t = A^c_t 1_{E∩{t: ΔA_t = 0}} + sgn(ΔA_t) 1_{{ΔA_t ≠ 0}}.

The following equation is well defined:

(20)  ∫₀ᵀ h_s dX_s = ∫_E h_s dM_s + ∫_{E∩{ΔA_t=0}} A^c_s dA^c_s + Σ_{s≤T} |ΔA_s|
            = ½ ∫_E d((A^c_s)²) + Σ_{s≤T} |ΔA_s|.

Therefore P(∫₀ᵀ h_s dX_s ≥ 0) = 1 and P(∫₀ᵀ h_s dX_s > 0) > η > 0. Since (h · X)_t ≥ 0 for all t ∈ [0, T], h is a 0-admissible trading strategy and hence this is an arbitrage opportunity.

3. Stochastic exponential of local martingales

3.1. Definition and notations

One of the key components of the sufficient conditions for no arbitrage is that a martingale density Z be a true martingale. However, it is not easy to verify this



directly in general. The literature is rich on this topic, especially for the case when a martingale density Z is continuous. For example, Novikov [22], Kazamaki [16], [17], [18], and Cherney and Shiryaev [4] studied this question for the continuous case and derived several sufficient conditions in terms of integrability conditions. Mémin [21], Lépingle and Mémin [19], and Kallsen and Shiryaev [14] studied the same question in a general (non-continuous) setting. The purpose of this section is to show that a formula similar to the famous Novikov condition works in a general setting. More precisely, we want to show that a Novikov-type condition E[exp{c⟨M, M⟩}] < ∞ for some c is sufficient to show that E(M) is a martingale. This condition belongs to the predictable type introduced by Revuz and Yor [24]. It should be noted that a Novikov-type condition is often difficult (even in the continuous case) to apply directly. Therefore a common use of this type of condition occurs together with a localization argument. We illustrate this with Example 14.

Let M = {M_t}_{t≥0} be a càdlàg local martingale vanishing at 0 on a given filtered probability space (Ω, F, (F_t)_{t≥0}, P). A process X = {X_t}_{t≥0} defined by

(21)  X_t = exp(M_t − ½[M, M]^c_t) ∏_{s≤t} (1 + ΔM_s) exp(−ΔM_s)

is called a Doléans exponential or the stochastic exponential of M, and it is denoted by E(M)_t. X_t is also given as the solution of the stochastic differential equation

(22)  dX_t = X_{t−} dM_t,   X_0 = 1.
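As a quick sanity check of formula (21), consider the compensated Poisson martingale M_t = N_t − t (our choice of example): each jump has ΔM_s = 1 and [M, M]^c = 0, so (21) collapses to E(M)_t = 2^{N_t} e^{−t}. A small sketch:

```python
import math

def stoch_exp(t, jump_times):
    """Evaluate (21) along a path of M_s = N_s - s with unit jumps."""
    n = sum(1 for s in jump_times if s <= t)   # N_t
    m = n - t                                  # M_t; [M, M]^c_t = 0 here
    prod = 1.0
    for _ in range(n):                         # each jump contributes (1 + 1)e^{-1}
        prod *= 2.0 * math.exp(-1.0)
    return math.exp(m) * prod

t, jumps = 3.0, [0.5, 1.2, 2.7, 3.4]           # hypothetical jump times
assert abs(stoch_exp(t, jumps) - 2.0 ** 3 * math.exp(-3.0)) < 1e-12
```

(One can also verify E[2^{N_t} e^{−t}] = e^{−t} e^{t(2−1)} = 1, consistent with E(M) being a martingale started at 1.)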

Since X_− is left continuous and therefore locally bounded, it follows that E(M) is a local martingale in all cases. When ΔM_t > −1 for all t a.s., it is a positive local martingale. By Fatou's lemma, E(M) is also a positive supermartingale. Throughout this paper, we assume that ΔM_t > −1.

3.2. Results

3.2.1. Lévy process and additive martingales

We start with a basic result shown by Lépingle and Mémin [19].

Theorem 6 (Lépingle and Mémin). Let M be a local martingale. If the compensator C of the process

(23)  ⟨M^c, M^c⟩_t + Σ_{s≤t} ΔM_s² 1_{|ΔM_s|≤1} + Σ_{s≤t} |ΔM_s| 1_{|ΔM_s|>1}

is bounded, then E[[E(M), E(M)]_t^{1/2}] < ∞. In particular, E(M)_t is a martingale.

Proof. See Lépingle and Mémin [19].

Although the requirement of boundedness looks strong, it is enough to show the following well known fact:

Corollary 7. Let M be a Lévy martingale. Then E(M) is a martingale.



Proof. Fix T > 0 and let M_t = M^T_t. Then, by the Lévy decomposition theorem,
M_t = W_t + ∫_{|x|<1} x (N(·, [0, t], dx) − tν(dx)) + Σ_{s≤t} ΔM_s 1_{|ΔM_s|≥1} − αt,

and the compensator of the process in (23) is deterministic and bounded on [0, T], so Theorem 6 applies and E(M) is a martingale.

Theorem 9. Let M be a locally square integrable local martingale with ΔM > −1. If

(26)  E[exp(½⟨M^c, M^c⟩_T + ⟨M^d, M^d⟩_T)] < ∞,

where M^c and M^d are the continuous and purely discontinuous martingale parts of M, then E(M) is a martingale on [0, T], where T can be ∞.

Proof. Let f(x) = (1 + x) ln(1 + x) − x and g(x) = x². For x > −1, g(x) ≥ f(x). It follows that

(27)  L_t := Σ_{s≤t} {g(ΔM_s) − f(ΔM_s)} ≥ 0

is a locally integrable increasing process, and there exists a compensator L̃_t of L_t by the Doob–Meyer decomposition. Since L̃_t, the difference of the compensators of Σ_{s≤t} g(ΔM_s) and Σ_{s≤t} f(ΔM_s), is nonnegative, the compensator of Σ_{s≤t} f(ΔM_s) is dominated by that of Σ_{s≤t} g(ΔM_s) = Σ_{s≤t} Δ[M, M]_s, namely ⟨M^d, M^d⟩_t.
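The elementary inequality used here (and again, with the sharper constant, in Corollary 11 below) can be checked numerically; the grid is our own:

```python
import math

f = lambda x: (1 + x) * math.log(1 + x) - x      # f as in the proof
xs = [-0.99 + 0.01 * i for i in range(1, 1100)]  # grid in (-0.98, 10]
# g(x) = x^2 dominates f on x > -1 ...
assert all(x * x >= f(x) - 1e-12 for x in xs)
# ... and (1/2) x^2 already dominates f on x > 0 (the alpha(1) = 1/2 case)
assert all(0.5 * x * x >= f(x) - 1e-12 for x in xs if x > 0)
```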



Thus E[exp(½⟨M^c, M^c⟩_T + ⟨M^d, M^d⟩_T)] < ∞ implies the conditions of Theorem 8, and E(M) is a martingale on [0, T].

A natural question is whether we can improve the constant multiplying ⟨M^d, M^d⟩. The next example shows that the answer is negative. Namely, E[e^{(1−ε)⟨M,M⟩_∞}] < ∞ for any ε > 0 is not sufficient in general.

Example 10 (α < 1 is not sufficient). Let (N_t)_{t≥0} be a standard Poisson process. Define T^b = inf{s : N_s − (1 − b)s = 1} for b ∈ (0, 1). Let U_n = inf_s{s : N_s = n}. Then P(T^b = ∞) = 0, since {T^b = ∞} = ∩_n {U_n ≥ (n − 1)/(1 − b)} and U_n/n → 1 almost surely. Then

(28)  N_{T^b} − (1 − b)T^b = 1   a.s.
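A concrete path makes (28) transparent: between jumps N_s − (1 − b)s drifts down at rate 1 − b, so after an overshoot above level 1 the level is attained exactly, on the way down. A deterministic sketch (the jump times are our own choice, b = 1/2):

```python
b = 0.5
jump_times = [0.4, 0.9, 2.5]           # hypothetical Poisson jump times
# value just after the n-th jump is n - (1 - b) * U_n:
#   after the jump at 0.4: 1 - 0.5 * 0.4 = 0.80  (below 1)
#   after the jump at 0.9: 2 - 0.5 * 0.9 = 1.55  (above 1)
# drifting down at rate 1 - b, the path then hits level 1 at
T_b = 0.9 + (1.55 - 1.0) / (1 - b)     # = 2.0, before the next jump at 2.5
N_at_T_b = 2
assert abs(N_at_T_b - (1 - b) * T_b - 1.0) < 1e-12   # equation (28)
```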

The moment generating function of N_t − (1 − b)t exists, and for any λ ∈ ℝ

(29)  E[exp{−λ[N_t − (1 − b)t]}] = e^{λ(1−b)t} E[exp(−λN_t)] = exp{tf(λ)},

where

(30)  f(λ) = e^{−λ} + λ(1 − b) − 1,

and Z_t := exp{−λ[N_t − (1 − b)t] − tf(λ)} is a martingale. Since Z_t is non-negative, by Doob's supermartingale inequality E(Z_{T^b}) ≤ E(Z_0) = 1. From (28), we obtain E[exp{−λ[N_{T^b} − (1 − b)T^b] − T^b f(λ)}] = E[exp{−λ − T^b f(λ)}] ≤ 1, so that

(31)  E[e^{−T^b f(λ)}] ≤ e^λ.

Now define M_t = −a(N_t − t)^{T^b}, where a ∈ (0, 1). M_t is a martingale and E(M)_t = exp{N_{t∧T^b} ln(1 − a) + a(t ∧ T^b)}. Hence

(32)  E[E(M)_{T^b}] = E[exp{N_{T^b} ln(1 − a) + aT^b}]
           = E[exp{(1 + (1 − b)T^b) ln(1 − a) + aT^b}]
           = (1 − a) E[exp{T^b((1 − b) ln(1 − a) + a)}].
Let λ* = ln((1 + b)/(2(1 − a))). Then by (30),

−f(λ*) = −(1 − a)/((1 + b)/2) + (1 − b) ln((1 − a)/((1 + b)/2)) + 1.

Next define k(a, b) = −f(λ*) − {(1 − b) ln(1 − a) + a}. Simplifying terms,

(33)  k(a, b) = (1 − b) ln(2/(1 + b)) − (1 − a)(2/(1 + b) − 1).

Let g(b) = 1 − (1 − b) ln{2/(1 + b)} / (2/(1 + b) − 1). Then k(a, b) > 0 if a > g(b). Observe that on {b : 0 < b < 1}, g(b) is an increasing function of b and 1 − ln 2 < g(b) < 1. Thus for every b ∈ (0, 1), there exists a* ∈ (0, 1) such that for all a ≥ a*, k(a, b) > 0. Fix b and choose a so that k(a, b) > 0. Then by (31),

(34)  E[E(M)_{T^b}] = (1 − a) E[exp{T^b((1 − b) ln(1 − a) + a)}]
           ≤ (1 − a) E[exp{−T^b f(λ*)}]
           ≤ (1 − a) e^{λ*} = (1 + b)/2 < 1.
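The claims about g(b), k(a, b) and (below) h(b) are elementary but fiddly; here is a numerical spot check over a grid (grid and tolerances are ours):

```python
import math

f = lambda lam, b: math.exp(-lam) + lam * (1 - b) - 1            # as in (30)
g = lambda b: 1 - (1 - b) * math.log(2 / (1 + b)) / (2 / (1 + b) - 1)
k = lambda a, b: (1 - b) * math.log(2 / (1 + b)) - (1 - a) * (2 / (1 + b) - 1)
h = lambda b: b + (1 - b) * math.log(1 - b)

bs = [0.01 * i for i in range(1, 100)]
assert all(1 - math.log(2) < g(b) < 1 for b in bs)               # stated bounds
assert all(g(bs[i]) < g(bs[i + 1]) for i in range(len(bs) - 1))  # increasing
assert all(k(min(g(b) + 1e-6, 1.0), b) > 0 for b in bs)          # k > 0 once a > g(b)
assert all(h(b) < g(b) ** 2 for b in bs)                         # h(b) < g(b)^2
b0 = 0.3                                    # h(b) = max over lambda of -f(lambda, b)
assert abs(h(b0) + f(-math.log(1 - b0), b0)) < 1e-12
```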



E(M) is not a uniformly integrable martingale. ⟨M, M⟩_t = a²(t ∧ T^b), since a stopped predictable process is still a predictable process. Finally, define h(b) by h(b) = b + (1 − b) ln(1 − b) = −f(−ln(1 − b)) = max_λ{−f(λ)}. For all b with 0 < b < 1, h(b) < g(b)². However, for every ε > 0 there exists b*_ε < 1 such that b > b*_ε implies h(b) > (1 − ε)g(b)². Fix ε > 0, and choose b′ ∈ (b*_ε, 1) and a′ ∈ (g(b′), √(h(b′)/(1 − ε))) ⊂ (0, 1), so that

(35)  h(b′)/(1 − ε) > a′² > g(b′)².

With M_t = −a′(N_t − t)^{T^{b′}}, we then obtain by (31)

(36)  E[e^{(1−ε)⟨M,M⟩_∞}] = E[e^{(1−ε)a′² T^{b′}}] ≤ E[e^{h(b′) T^{b′}}] = E[e^{−T^{b′} f(−ln(1−b′))}] ≤ e^{−ln(1−b′)} < ∞,
so the expectation in (36) is finite although E(M) is not a uniformly integrable martingale; the constant 1 multiplying ⟨M^d, M^d⟩ in (26) cannot be lowered in general. If ΔM > 0, however, one can take α = 1/2.

Corollary 11. Fix ε ∈ (0, 1]. Let M be a locally square integrable martingale such that ΔM > −1 + ε. Then there exists α(ε) ∈ [1/2, 1] such that

(37)  E[exp(½⟨M^c, M^c⟩_T + α(ε)⟨M^d, M^d⟩_T)] < ∞

implies that E(M)_t is a martingale on [0, T], where T can be ∞.

Proof. Let f(x) = (1 + x) ln(1 + x) − x. Then there exists α(ε) = inf{a : ax² − f(x) ≥ 0 on x > −1 + ε} such that α(ε) ∈ [1/2, 1]. In particular, when ε = 1 we can take α(ε) = 1/2, and α(ε) is a decreasing function of ε. Let g(x) = α(ε)x². For x > −1 + ε, g(x) ≥ f(x). It follows that

L_t := Σ_{s≤t} {g(ΔM_s) − f(ΔM_s)} ≥ 0

is a locally integrable increasing process, and there exists a compensator L̃_t of L_t by the Doob–Meyer decomposition. Since L̃_t, the difference of the compensators of Σ_{s≤t} g(ΔM_s) and Σ_{s≤t} f(ΔM_s), is nonnegative, the compensator of Σ_{s≤t} f(ΔM_s) is dominated by that of Σ_{s≤t} g(ΔM_s) = α(ε) Σ_{s≤t} Δ[M, M]_s, namely α(ε)⟨M^d, M^d⟩_t.

Thus E[exp(½⟨M^c, M^c⟩_T + α(ε)⟨M^d, M^d⟩_T)] < ∞ implies the condition of Theorem 8, and hence E(M) is a martingale on [0, T].

Remark 9. This integrability approach provides sufficient but not necessary conditions. While it is possible to derive a sequence of sufficient conditions converging in some sense to a necessary and sufficient condition, those stronger conditions become more difficult to verify at the same time. For details on this issue, see Kallsen and Shiryaev [14]. In the continuous framework, it is well known that the Novikov condition is not optimal. The symmetric nature of the quadratic variation process requires that if a



continuous martingale M satisfies Novikov's condition, then −M has to satisfy Novikov's condition as well. This implies that Novikov's condition cannot identify a class of martingales M such that E(M) is a martingale but E(−M) is not. More generally, if there exists a predictable process h_s such that ∫ h_s dM_s is a continuous local martingale satisfying Novikov's condition, then for all predictable g_s ∈ L(M) such that ∫ |g_s|² d[M, M]_s = ∫ h_s² d[M, M]_s, E(∫ g_s dM_s) is a uniformly integrable martingale. See, for example, Stroock [29]. (The authors thank Marc Yor for calling this reference to their attention.) Some of these examples can be dealt with using a stronger condition derived in an integrability approach, such as Kazamaki's condition. But other examples require totally different approaches. See for example Liptser and Shiryaev [20] and Cheridito, Filipovic, and Yor [3]. Despite these examples showing its limitations, a Novikov-type condition is the kind of condition that we could hope to verify in a practical setting. This is due to the fact that the condition is given in terms of an increasing process and the quadratic variation of Log(Z), where Log(·) denotes the stochastic logarithm.

3.3. Examples and applications

The following example shows that when the stochastic exponential comes from a driving Lévy martingale, then the condition in Theorem 9 becomes easier to compute. (We could phrase this as "Let M be a Lévy local martingale . . . ," but a Lévy process which is a local martingale is a fortiori a martingale, so it is not any more general, and indeed misleading, to state this example for Lévy local martingales.)

Example 12. Let M be a Lévy martingale with Lévy triplet (B, C, ν). Let h ∈ L(M) be a predictable process such that X = ∫ h_s dM_s is locally square integrable and ΔX_t = h_t ΔM_t > −1. Then [X^c, X^c]_t = ∫₀ᵗ h_s² C ds, and

(38)  ⟨X^d, X^d⟩_t = ∫₀ᵗ h_s² d⟨M^d, M^d⟩_s = ∫₀ᵗ h_s² (∫_ℝ x² ν(dx)) ds.

Let K = ∫_ℝ x² ν(dx). Then condition (26) for X reads

(39)  E[exp{(½C + K) ∫₀ᵀ h_s² ds}] < ∞.
For example, if ξ is bounded, or if ξ = |χ| where χ is normally distributed, then this condition is satisfied. In this case, E(h · M) is a martingale.

References

[1] Ansel, J.-P. and Stricker, C. (1992). Lois de martingale, densités et décomposition de Föllmer–Schweizer. Ann. Inst. H. Poincaré Probab. Statist. 28 375–392.
[2] Beneš, V. E. (1971). Existence of optimal stochastic control laws. SIAM J. Control 9 446–472.
[3] Cheridito, P., Filipovic, D. and Yor, M. (2005). Equivalent and absolutely continuous measure changes for jump-diffusion processes. Ann. Applied Probability 15 1713–1732.
[4] Cherney, A. S. and Shiryaev, A. N. (2001). On criteria for the uniform integrability of Brownian stochastic exponentials. In Optimal Control and Partial Differential Equations, in honour of Alain Bensoussan's 60th birthday, 80–92. IOS Press.
[5] Delbaen, F. and Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing. Math. Ann. 300 463–520.
[6] Delbaen, F. and Schachermayer, W. (1998). The fundamental theorem for unbounded stochastic processes. Math. Ann. 312 215–250.
[7] Delbaen, F. and Schachermayer, W. (1995). The existence of absolutely continuous local martingale measures. Ann. Applied Probability 5 926–945.
[8] Dellacherie, C. and Meyer, P. A. (1978). Probabilities and Potential B. North-Holland Mathematics Studies 72. North-Holland, Amsterdam.
[9] Eberlein, E. and Jacod, J. (1997). On the range of options prices. Finance Stochast. 1 131–140.



[10] Jacod, J. and Shiryaev, A. N. (2002). Limit Theorems for Stochastic Processes, second edition. Springer-Verlag, Heidelberg.
[11] Jarrow, R. and Protter, P. (2005). Large traders, hidden arbitrage and complete markets. Journal of Banking and Finance 29 2803–2820.
[12] Jarrow, R. and Protter, P. (2008). A partial introduction to financial asset pricing theory. In Handbooks in OR & MS: Financial Engineering 15 (J. R. Birge and V. Linetsky, eds.) 1–59. Elsevier.
[13] Kabanov, Yu. M. (1997). On the FTAP of Kreps–Delbaen–Schachermayer. In The Liptser Festschrift. Papers from the Steklov Seminar held in Moscow, 1995–1996 (Yu. M. Kabanov, B. L. Rozovskii and A. N. Shiryaev, eds.) 191–203. World Scientific Publishing Co., Inc., River Edge, NJ.
[14] Kallsen, J. and Shiryaev, A. N. (2002). The cumulant process and Esscher's change of measure. Finance Stoch. 6 (4) 397–428.
[15] Karatzas, I. and Kardaras, C. (2007). The numéraire portfolio in semimartingale financial models. Preprint.
[16] Kazamaki, N. (1977). On a problem of Girsanov. Tôhoku Math. J. 29 (4) 597–600.
[17] Kazamaki, N. (1978). Correction: On a problem of Girsanov (Tôhoku Math. J. (2) 29 (4) (1977) 597–600). Tôhoku Math. J. (2) 30 (1) 175.
[18] Kazamaki, N. (1994). Continuous Exponential Martingales and BMO. Lecture Notes in Mathematics 1579. Springer-Verlag, Berlin.
[19] Lépingle, D. and Mémin, J. (1978). Sur l'intégrabilité uniforme des martingales exponentielles. Z. Wahrsch. Verw. Gebiete 42 (3) 175–203.
[20] Liptser, R. S. and Shiryaev, A. N. (2001). Statistics of Random Processes. I. General Theory, expanded edition. Applications of Mathematics 5. Springer-Verlag, Berlin. Translated from the 1974 Russian original by A. B. Aries.
[21] Mémin, J. (1978). Décompositions multiplicatives de semimartingales exponentielles et applications. Séminaire de Probabilités, XII (Univ. Strasbourg, Strasbourg, 1976/1977). Lecture Notes in Math. 649 35–46. Springer, Berlin.
[22] Novikov, A. A. (1980). On conditions for uniform integrability for continuous exponential martingales. In Stochastic Differential Systems (Proc. IFIP-WG 7/1 Working Conf., Vilnius, 1978). Lecture Notes in Control and Information Sci. 25 304–310. Springer, Berlin.
[23] Protter, P. (2005). Stochastic Integration and Differential Equations, second edition, Version 2.1. Springer-Verlag, Heidelberg.
[24] Revuz, D. and Yor, M. (1999). Continuous Martingales and Brownian Motion, third edition. Grundlehren der Mathematischen Wissenschaften 293. Springer-Verlag, Berlin.
[25] Sato, H. (1990). Uniform integrability of an additive martingale and its exponential. Stochastics Stochastics Rep. 30 163–169.
[26] Schweizer, M. (1994). Approximating random variables by stochastic integrals. Ann. Probability 22 1536–1575.
[27] Schweizer, M. (1995). On the minimal martingale measure and the Föllmer–Schweizer decomposition. Stochastic Analysis and Applications 13 573–599.
[28] Strasser, E. (2005). Characterization of arbitrage-free markets. Ann. Appl. Probability 15 116–124.
[29] Stroock, D. W. (2003). Markov Processes from K. Itô's Perspective. Annals of Mathematics Studies 155. Princeton University Press, Princeton, NJ.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 285–300
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000435

Optimal Asset Allocation under Forward Exponential Performance Criteria

Marek Musiela¹,* and Thaleia Zariphopoulou²,*,†

BNP Paribas, London and the University of Texas at Austin

Abstract: This work presents a novel concept in stochastic optimization, namely, the notion of forward performance. As an application, we analyze a portfolio management problem with exponential criteria. Under minimal model assumptions we explicitly construct the forward performance process and the associated optimal wealth and asset allocations. For various model parameters, we recover a range of investment policies that correspond to distinct financial applications.

1. Introduction

Optimal asset allocation problems can be formulated as classical stochastic optimization problems. They typically consist of a time horizon, a controlled process (the investor's wealth) and an optimization criterion represented as the conditional expectation of a wealth functional, given a relevant filtration. Maximizing this expectation, over a given set of admissible policies, yields the so-called value function. To facilitate the exposition, we denote the state controlled process by X, the set of admissible controls by A and the relevant filtration by F_t, 0 ≤ t ≤ T. The criterion to be optimized is of the form J = E_P^{(x,t)}(U(X_T)), with U being a concave and increasing function, often referred to as the investor's bequest or utility. The value function V is, in turn, defined as

(1.1)  V(x, t; T) = sup_A E_P^{(x,t)}(U(X_T)).

At t = T, it coincides with the utility datum and, for previous times, it satisfies, under weak model assumptions, the Dynamic Programming Principle. Namely,

(1.2)  V(X*_s, s; T) = E_P[V(X*_{s′}, s′; T) | F_s]   for t ≤ s ≤ s′ < T,
     V(X*_s, s; T) = U(X*_T)   for s = T,
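A minimal numerical illustration of (1.1)–(1.2), under assumptions entirely of our own choosing: a binomial one-stock market and exponential utility U(x) = −e^{−γx}. Wealth then factors out of the value function, V(x, t) = −e^{−γx} k_t, and the backward recursion of the Dynamic Programming Principle acts on the scalars k_t:

```python
import math

gamma = 1.0                   # risk aversion
u, d, p = 1.1, 0.9, 0.6       # one-period stock return factors and up-probability
T = 3                         # number of trading periods

# One-step problem: k_t = min over pi of E[exp(-gamma * pi * dS)] * k_{t+1},
# where pi is the dollar amount held in the stock (grid search over pi).
grid = [i / 1000 for i in range(-5000, 5001)]
one_step = min(p * math.exp(-gamma * pi * (u - 1))
               + (1 - p) * math.exp(-gamma * pi * (d - 1)) for pi in grid)

k = [0.0] * (T + 1)
k[T] = 1.0                    # terminal datum: V(x, T) = U(x)
for t in range(T - 1, -1, -1):
    k[t] = one_step * k[t + 1]

# pi = 0 is admissible with one-step factor 1, so optimizing can only help:
assert all(k[t] <= 1.0 + 1e-12 for t in range(T + 1))
assert k[0] <= k[1] <= k[T]   # the value function is generated backwards in time
```

Along the optimal policy the process V(X*_t, t) is (by construction) a martingale, and a supermartingale for any other admissible policy, which is exactly the property the forward approach below takes as its definition.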

* This work was presented at the Conference on Markov Processes and Related Topics, honoring the 65th birthday of Tom Kurtz (Madison, July 2006), at the 4th World Congress of the Bachelier Finance Society (Tokyo, August 2006) and at the Conference in Honor of the 60th Birthday of Dilip Madan (College Park, September 2006). The authors would like to thank the participants for fruitful comments and, especially, P. Carr and R. Jarrow. They would also like to thank an anonymous referee for valuable suggestions.
† The second author acknowledges partial support from the National Science Foundation (NSF grants DMS-0091946 and DMS-FRG-0456118).
1 BNP Paribas, London, UK, e-mail: [email protected]
2 Department of Mathematics, College of Natural Sciences and Department of Information, Risk and Operations Management, Red McCombs School of Business, University of Texas at Austin, USA, e-mail: [email protected]
AMS 2000 subject classifications: Primary 91B16, 91B28
Keywords and phrases: forward stochastic optimization, performance process, feedback controls, exponential preferences, portfolio choice, utility theory, incomplete markets



Marek Musiela and Thaleia Zariphopoulou

where X* stands for the optimized state process, with X*_t = x (see, for example, [1] and [14]). What the above tells us is that the value function is prespecified at the end of the horizon and, for earlier times, is generated backwards in time. It is a martingale at the optimum, and a supermartingale otherwise. In essence, all that is needed in order to specify the value function is to find a martingale that coincides with the utility at maturity.¹

Assigning a datum at a future time is in accordance with classical control criteria, as, for example, in applications in manufacturing, supply chain management, production planning and inventory control (see [14] for a concise collection of applications). In other settings, however, it might not be a very realistic modeling assumption. This was, for example, observed by the authors in utility models used in the so-called indifference valuation of claims in incomplete markets. Therein, the following issues were observed. Firstly, fixing the trading horizon makes the valuation of claims of arbitrary maturities impossible. One might try to remedy this by allowing for an infinite horizon and incorporating either a running discounted payoff on [t, ∞) or asymptotic growth criteria. However, infinite horizon problems, albeit more tractable than time dependent ones, are often not suitable for modeling realistic situations generated, for example, by sudden changes of the investment opportunity set, defaults, etc. Secondly, the fact that, on the one hand, the utility is exogenously chosen far ahead in the future and, on the other, it is used to make investment decisions for today, does not appear very natural. Besides, the optimal expected utility is generated backwards in time while the market moves in the opposite direction (forward), an apparently not very intuitive situation.
Motivated by the above considerations, the authors proposed an alternative approach to stochastic optimization, introduced in [4] (see also [5] and [8]). Firstly, the horizon dependence was relaxed by removing the assumption of preassigned future data. The only standing requirement for the solution is that it is an adapted process and a supermartingale for arbitrary controls, and becomes a martingale at an optimum. Such a process is called a dynamic performance. For its full specification, a datum needs to be introduced. In contrast to the traditional (backwards) framework, the authors proposed to have a condition assigned at initial time. The solution is then called a forward performance. We refer the reader to [4] for a detailed exposition of the new approach and its applicability to valuation and hedging in the presence of unhedgeable risks.

The martingality property stems from the natural requirement that, if the system is currently at an optimal state, one needs to seek controls so that the same level of average performance is preserved at all future times. On the other hand, supermartingality is associated with declining upcoming average performance and, thus, suboptimal system behavior. We comment that the latter requirement is not crucial for the construction of the (optimal) martingale process. For the applications we are interested in, it is, however, a natural consequence of the inherent concavity properties the solution process has.

Herein, we extend the results obtained in [4], for incomplete binomial models, to the case in which asset prices are modeled as Ito processes. The model is rather general and allows for market incompleteness as well as for investment in many assets. No Markovian assumptions are introduced. The solution approach is based

¹ The above conditions are easily modified when a running criterion is also incorporated. For simplicity, and in order to expose the new concepts, we choose not to consider this case.

Forward Exponential Performance and Portfolio Choice


entirely on stochastic calculus and yields explicit expressions for the forward performance process and the optimal policies. The forward performance is constructed by combining differential and stochastic input, namely, a deterministic function of wealth and time, and two auxiliary processes (see, respectively, (4.2), and (4.3), (4.4)). The first auxiliary process may be interpreted as a benchmark. The other is associated with a change of measure and may be used to represent the investor’s views for the market’s state away from equilibrium, or to model trading constraints. We work with exponential criteria (see (4.1)). We choose to do so for two reasons. Firstly, exponential preferences are most frequently used for pricing in incomplete markets, currently a very active area of research and applications. Their popularity is coming from the explicit solutions they generate as well as their direct connection to entropic dynamic risk measures. Secondly, the aim herein is to expose the advantages of working with forward exponential criteria instead of the backward ones. We will see that the proposed model is not only general and tractable, but it also yields a rich class of policies that capture distinct realistic situations. Indeed, we show that judicious choices of the coefficients of the market input processes generate a range of interesting strategies, including, among others, two extreme situations, namely, strategies that allocate zero or the entire wealth in the riskless asset. Our findings suggest that, if put in the right modeling perspective, exponential criteria do not produce naive, wealth-independent strategies, as it is the case in the traditional framework. Rather, they generate policies that seem suitable for a variety of applications in portfolio choice, and derivative pricing and hedging. The paper is organized as follows. In section 2 we introduce the model and the notion of forward performance. In section 3 we present two motivational examples. 
In section 4 we analyze the general exponential case. We construct the solution and the optimal strategies and wealth. In section 5, we analyze the optimal investments, wealth and performance for various choices of market parameters and coefficients of the auxiliary processes. We conclude in section 6.

Acknowledgement: This work is dedicated to Tom Kurtz on the occasion of his 65th birthday. The second author would like to express her gratitude for all the support, guidance and advice she received from him throughout the years.

2. The model and its forward performance

The market environment consists of one riskless and k risky securities. The risky securities are stocks and their prices are modeled as Ito processes. Namely, for i = 1, ..., k, the price S^i of the ith risky asset solves

(2.1)   dS^i_t = S^i_t (μ^i_t dt + Σ_{j=1}^d σ^{ji}_t dW^j_t)

with S^i_0 > 0. The process W = (W^1, ..., W^d) is a standard d-dimensional Brownian motion, defined on a filtered probability space (Ω, F, P). For simplicity, it is assumed that the underlying filtration, F_t, coincides with the one generated by the Brownian motion, that is, F_t = σ(W_s : 0 ≤ s ≤ t). The coefficients μ^i and σ^i, i = 1, ..., k, are bounded F_t-adapted processes with values in R and R^d, respectively. For brevity, we write σ = σ_t to denote the


Marek Musiela and Thaleia Zariphopoulou

volatility matrix, i.e. the d × k stochastic matrix (σ^{ji}_t), whose ith column represents the volatility σ^i_t of the ith risky asset. We may, then, alternatively write (2.1) as

dS^i_t = S^i_t (μ^i_t dt + σ^i_t · dW_t).

The riskless asset, the savings account, has the price process B satisfying dB_t = r_t B_t dt with B_0 = 1, for a bounded, nonnegative, F_t-adapted interest rate process r.

A fundamental assumption in the financial applications that motivated this study is the so-called absence of arbitrage. Consequently, it is postulated that there exists an F_t-adapted process λ, taking values in R^d, such that the equality

μ^i_t − r_t = Σ_{j=1}^d σ^{ji}_t λ^j_t = σ^i_t · λ_t

is satisfied for t ≥ 0, for all i = 1, ..., k. Using vector and matrix notation, the above becomes

(2.2)   μ_t − r_t 1 = σ^T_t λ_t,

where σ^T stands for the matrix transpose of σ, and 1 denotes the k-dimensional vector with every component equal to one. The process λ is often referred to as a market price of risk. Note that, in general, it is not uniquely determined.

Starting at t = 0 with an initial endowment x ∈ R, at future times the investor invests the amounts π^0_t and π^i_t, i = 1, ..., k, respectively, in the riskless and the ith risky asset. The present value of his/her investment is then given by

X_t = Σ_{i=0}^k π^i_t / B_t.

We will refer to X as the discounted wealth process. The investment strategies will play the role of control processes and are taken to satisfy the standard assumption of being self-financing, i.e., for s ≥ t,

X_s = x + Σ_{i=1}^k ∫_t^s (π^i_u / B_u)(μ^i_u − r_u) du + Σ_{i=1}^k ∫_t^s (π^i_u / B_u) σ^i_u · dW_u,

with X_t = x. Writing the above in differential form yields the evolution of the discounted wealth,

(2.3)   dX_t = Σ_{i=1}^k (π^i_t / B_t) σ^i_t · (λ_t dt + dW_t) = β_t · (λ_t dt + dW_t).

Herein,

(2.4)   β_t = Σ_{i=1}^k (π^i_t / B_t) σ^i_t

or, equivalently,

(2.5)   B^{-1}_t σ_t π_t = β_t,

Forward Exponential Performance and Portfolio Choice


where π_t = (π^i_t ; i = 1, ..., k) is the (column) vector of allocations in the risky assets. The set of admissible strategies, A, consists of all self-financing F_t-adapted processes, π, for which

E_P ∫_0^t |Σ_{i=1}^k (π^i_s / B_s) σ^i_s|² ds < ∞,   t > 0.

Whenever needed, we will be using the notation X^π to denote the solution of (2.3) when the control π is used.

We next introduce the notion of dynamic performance.

Definition 2.1. An F_t-adapted process U_t(x) is a dynamic performance process if:
i) the mapping x → U_t(x) is increasing and concave, for each t ≥ 0,
ii) for each self-financing strategy, π, and s ≥ t,

(2.6)   E_P(U_s(X^π_s) | F_t) ≤ U_t(X^π_t),

and iii) there exists a self-financing strategy, π^*, for which

(2.7)   E_P(U_s(X^{π^*}_s) | F_t) = U_t(X^{π^*}_t),   s ≥ t.

Remark: We easily see that the traditional value function V (cf. (1.1)) is a dynamic performance. Indeed, if we define

U_t(x) = V(x, t; T) for 0 ≤ t ≤ T,   and   U_t(x) = V(x, T; T) for t ≥ T,

then U_t(x) satisfies the criteria in the above definition. Notice, however, the stringent requirement that the process U_t does not change for t ≥ T.

Herein, we focus our attention on dynamic performance processes that are specified at initial time, to be henceforth called forward performance processes. We give their formal definition below.

Definition 2.2. An F_t-adapted process U_t(x) is a forward performance process if it satisfies the assumptions of Definition 2.1 together with the initial condition

(2.8)   U_0(x) = u_0(x),

where u_0 is a concave and increasing function of wealth.

We note that the forward performance process might not be unique. While lack of uniqueness is not important for the applications in mind, characterizing the class of all solutions is, in our opinion, an interesting and challenging question.

We conclude this section mentioning that forward formulations of optimal control problems have been proposed and analyzed in the past. For deterministic models we refer the reader, among others, to [3], [12] and [13]. In stochastic settings, forward optimality has been studied, primarily under Markovian assumptions, in [2] via the associated martingale problems and construction of the Nisio semigroup (see, also, [9]). The object of study is

(2.9)   V_t(x) = sup_A E^{(x,t)}(U_0(X_t)),   t ≥ 0,



with X_0(x) = x and U_0 a given initial input. A rich theory has been developed which addresses a variety of questions related, among others, to the validity of the Dynamic Programming Principle and the construction of the solution and optimal policies across all times. While the forward performance process introduced herein plays a different role than V_t, exploring how this theory can contribute to the study of forward solutions, as well as to addressing some of the shortcomings of the existing terminal horizon (backward) problems, is certainly worth pursuing.²

3. Two examples

In order to provide intuition for the upcoming construction of the exponential forward performance process we present two representative examples. To facilitate the exposition, we assume that the market consists of a single stock and a bond and that the interest rate is zero. In the first example, we consider a binomial model while in the second we model the stock as in (2.1). To highlight the generality of the construction method, we take the binomial model to be incomplete. In both cases, the initial data is given by u_0(x) = −e^{−x}, x ∈ R.

The solution of the binomial example, see (3.3), suggests that the forward process can be constructed using a deterministic function of wealth and time, with the latter argument replaced by an appropriately chosen process. While the form of the deterministic input is, to some extent, not too surprising - due to the specific assumptions on the initial data - changing time is by no means standard. Notice that this is performed via a positive and non-decreasing process (cf. (3.2)) which depends on market movements but not on the investor's preferences. In the second example, we use these insights and produce a similar representation of the solution.

Example 1: We consider a single stock whose levels are denoted by S_t > 0, t = 0, 1, ..., and define the variables ξ_{t+1} = S_{t+1}/S_t, with ξ_{t+1} ∈ {ξ^d_{t+1}, ξ^u_{t+1}} and 0 < ξ^d_{t+1} < 1 < ξ^u_{t+1}. A non-traded factor might be present whose values are denoted by Y_t (Y_t ≠ 0), t = 0, 1, .... We then view {(S_t, Y_t) : t = 0, 1, ...} as a two-dimensional stochastic process defined on the probability space (Ω, F, (F_t), P), with P being the historical measure. The filtration F_t is generated by the random variables S_i and Y_i, for i = 0, 1, ..., t.

We denote by X_t, t = 0, 1, ..., the investor's wealth process associated with a multi-period self-financing portfolio. We take α_t, t = 0, 1, ..., to be the number of shares of the traded asset held in this portfolio over the interval [t − 1, t). Then, denoting by ΔS_t the increment ΔS_t = S_t − S_{t−1}, we have, for s = t + 1, t + 2, ..., the binomial analogue of (2.3), namely, X_s = X_t + Σ_{i=t+1}^s α_i ΔS_i with X_t = x ∈ R.

Proposition 3.1. Consider, for i = 1, 2, ..., the sets B_i = {ω : ξ_i(ω) = ξ^u_i} and the associated nested risk neutral probabilities

q_i = (1 − ξ^d_i) / (ξ^u_i − ξ^d_i).

Let

(3.1)   u(x, t) = −e^{−x+t}

and introduce the process

(3.2)   A_t = Σ_{i=1}^t h_i

with A_0 = 0, where

h_i = q_i log(q_i / P(B_i | F_{i−1})) + (1 − q_i) log((1 − q_i) / (1 − P(B_i | F_{i−1}))).

Then

(3.3)   U_t(x) = u(x, A_t),   t = 0, 1, ...,

is a forward performance process.

² The authors thank an anonymous referee for bringing these results to their attention.
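The ingredients of Proposition 3.1 are elementary to compute. The following sketch (plain Python; the helper names are ours, not from the paper) evaluates the nested risk-neutral probabilities q_i, the entropic increments h_i and the time change A_t, and numerically confirms the one-period identity sup_α E_P(−e^{−α ΔS + h} | F) = −1 that drives the proof sketch.

```python
import math

def h_increment(xi_d, xi_u, p_up):
    """h_i of Proposition 3.1 for one binomial period.

    xi_d, xi_u : down/up price ratios, with 0 < xi_d < 1 < xi_u
    p_up       : historical up-probability P(B_i | F_{i-1})
    """
    q = (1.0 - xi_d) / (xi_u - xi_d)          # nested risk-neutral probability q_i
    return q * math.log(q / p_up) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p_up))

def time_change(periods):
    """A_t = sum_{i<=t} h_i with A_0 = 0; `periods` lists tuples (xi_d, xi_u, p_up)."""
    A = [0.0]
    for xi_d, xi_u, p in periods:
        A.append(A[-1] + h_increment(xi_d, xi_u, p))
    return A

def one_period_optimum(S, xi_d, xi_u, p_up):
    """sup over alpha of E_P[-exp(-alpha * DS + h)], DS = S*(xi - 1); equals -1."""
    h = h_increment(xi_d, xi_u, p_up)
    a, b = S * (xi_u - 1.0), S * (1.0 - xi_d)   # gain / loss per share held
    # first-order condition: p*a*exp(-alpha*a) = (1-p)*b*exp(alpha*b)
    alpha = math.log(p_up * a / ((1.0 - p_up) * b)) / (a + b)
    return -(p_up * math.exp(-alpha * a)
             + (1.0 - p_up) * math.exp(alpha * b)) * math.exp(h)
```

Each h_i is a relative entropy between the one-step risk-neutral and historical distributions, so h_i ≥ 0 and A_t is non-decreasing, as the time change in (3.2) requires; h_i = 0 exactly when the market is locally at equilibrium, P(B_i | F_{i−1}) = q_i.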

Sketch of the proof: Using that sup_{α_s} E_P(−e^{−α_s ΔS_s + h_s} | F_{s−1}) = −1 (see, for example, [4]), we observe that for s = t + 1, t + 2, ...,

sup_{α_{t+1},...,α_s} E_P(U_s(X_s) | F_t) = sup_{α_{t+1},...,α_{s−1}} E_P(−exp(−X_{s−1} + Σ_{i=1}^{s−1} h_i) | F_t),

and proceeding inductively we conclude.

Example 2: We consider a single stock whose price solves (cf. (2.1))

(3.4)   dS_t = S_t σ_t (λ_t dt + dW_t)

with S_0 = S > 0. The wealth process X satisfies (cf. (2.3))

(3.5)   dX_t = σ_t π_t (λ_t dt + dW_t)

with X_0 = x. We look for a forward solution in the form U_t(x) = u(x, A_t) for some smooth, concave and increasing function u(x, t) with u(x, 0) = u_0(x). For reasons that will be apparent in the sequel, we choose A_t = ∫_0^t λ²_s ds. For an arbitrary control π, we then have

dU_t(X_t) = u_x(X_t, A_t) σ_t π_t dW_t + (u_t(X_t, A_t) λ²_t + u_x(X_t, A_t) σ_t π_t λ_t + ½ u_xx(X_t, A_t) σ²_t π²_t) dt
          = u_x(X_t, A_t) σ_t π_t dW_t + λ²_t (u_t(X_t, A_t) + u_x(X_t, A_t) α_t + ½ u_xx(X_t, A_t) α²_t) dt

with α = σπλ^{-1}. We readily see that, due to the concavity assumption on u, it suffices to have that the above drift remains non positive. Because of its quadratic form, the appropriate drift sign is guaranteed if

u_t(x, t) u_xx(x, t) ≥ ½ u²_x(x, t),   (x, t) ∈ R × (0, +∞).

Let us now look for a concave and increasing function solving

u_t = ½ u²_x / u_xx   and   u(x, 0) = −e^{−x},   t ≥ 0, x ∈ R.

Notice that a solution to the above is given by

(3.6)   u(x, t) = −e^{−x + t/2}.

Next, consider the control policy

(3.7)   π^*_t = −σ^{-1}_t λ_t u_x(X^*_t, A_t) / u_xx(X^*_t, A_t),



with X^* being the wealth process associated with π^*. Assuming that the appropriate regularity conditions hold that guarantee a solution to (3.5) when π^*_t is used, we easily deduce that the above drift term vanishes, yielding

dU_t(X^*_t) = u_x(X^*_t, A_t) σ_t π^*_t dW_t.

Using Definition 2.2 we conclude. We summarize these findings below.

Proposition 3.2. Let the process λ be as in (2.2) and define

(3.8)   A_t = ∫_0^t λ²_s ds,   t ≥ 0.

Let, also, u : R × (0, +∞) → R be given by (3.6). Then, the process

(3.9)   U_t(x) = u(x, A_t)

is a forward performance.

Observe that the associated optimal policy (3.7) is not only explicitly given but, also, constructed in feedback form via the stochastic functional Π^*_t(x) = −σ^{-1}_t λ_t r(x, A_t) with r(x, t) = −u_x(x, t)/u_xx(x, t). This feedback format comes as a surprise given the non-Markovian nature of the model.

4. Forward exponential performance and log-affine solutions

In this section, we construct a class of forward performance processes under the assumption that the initial datum is of the exponential form,

(4.1)   U_0(x) = −exp(−x/y),

for x ∈ R and y > 0. We recall that in the traditional exponential case, the coefficient y is a given positive constant, expressed in wealth units. It is the reciprocal of risk aversion and is often called the investor's risk tolerance. In the forward framework we propose herein, y will not be a constant. Rather, it will parametrize, as its initial condition, an auxiliary state process (see (4.3) below).

Following the insights gained by the two examples presented in the previous section, we seek a solution process constructed by combining a deterministic and a stochastic input. The first is given by the function u : R × R+ × R → R−,

(4.2)   u(x, y, z) = −exp(−x/y + z),

and is called the differential performance input. It depends on individual characteristics, i.e. on the investor's wealth and initial risk preferences. The stochastic input consists of a pair of Ito processes, (Y, Z), solving, respectively,

(4.3)   dY_t = Y_t δ_t · (κ_t dt + dW_t),   Y_0 = y > 0,


and

(4.4)   dZ_t = η_t dt + ξ_t · dW_t,   Z_0 = 0.

Their coefficients satisfy the assumptions given in Condition 4.2 below.

In the analysis that follows, we will be using the Moore-Penrose pseudo-inverse matrix, denoted by σ^+, of the volatility matrix σ. This concept was developed independently by Moore in 1920 and by Penrose in 1955 (see [10]). The matrix σ^+ always exists even if σ fails to be invertible. This is often the case in incomplete markets and, thus, this (pseudo) invertibility notion seems to be very suitable for the applications we want to study.

Definition 4.1. Let σ be a d × k matrix. Its Moore-Penrose pseudo-inverse σ^+ is the unique k × d matrix satisfying

(4.5)   σσ^+σ = σ,   σ^+σσ^+ = σ^+,   (σσ^+)^T = σσ^+,   (σ^+σ)^T = σ^+σ.

Condition 4.2. The processes δ, κ, η, ξ are taken to be bounded and F_t-adapted. It is, also, assumed that

(4.6)   σσ^+δ = δ,

and

(4.7)   δ · (κ − λ) = 0.

Moreover, the drift η of the process Z satisfies

(4.8)   2η = |δ − σσ^+(λ + ξ)|² − |ξ|².

We are now ready to present one of the main results.

Theorem 4.3. Let U_0 be given by (4.1), and the processes Y and Z solving (4.3) and (4.4), respectively, with the coefficients δ, κ, η, ξ satisfying Condition 4.2. Then, for x ∈ R and t ≥ 0, the process

(4.9)   U_t(x) = −exp(−x/Y_t + Z_t)

is a forward exponential performance.

Proof. We first observe that (4.1) is automatically satisfied by the choice of the initial conditions of Y and Z. The fact that U_t(x) is F_t-adapted is, also, immediate. We continue with the derivation of the semimartingale representation for the process U_t(X_t), where X_t satisfies (2.3) for a fixed π. We set, for x = (x, y, z), F(x) = u(x, y, z) with u as in (4.2). Setting X_t = (X_t, Y_t, Z_t), we have

dF(X_t) = DF(X_t) · dX_t + ½ D²F(X_t) · d⟨X⟩_t,



where "·" stands for the inner product in the appropriate space. Direct calculations yield

DF(x) = F(x) (−y^{-1}, xy^{-2}, 1)^T

and

D²F(x) = F(x) [  y^{-2}             y^{-2} − xy^{-3}      −y^{-1}
                 y^{-2} − xy^{-3}   x²y^{-4} − 2xy^{-3}    xy^{-2}
                −y^{-1}             xy^{-2}                1      ].

Moreover, the joint quadratic variation ⟨X⟩ satisfies

d⟨X⟩_t = [  |β_t|²          Y_t δ_t · β_t   ξ_t · β_t
            Y_t δ_t · β_t   Y²_t |δ_t|²     Y_t δ_t · ξ_t
            ξ_t · β_t       Y_t δ_t · ξ_t   |ξ_t|²        ] dt.

Therefore, for U_t(X_t) = F(X_t), we can write (to ease the presentation, we omit for the moment the time indices)

dU(X) = U(X) (−Y^{-1} dX + XY^{-2} dY + dZ)
  + ½ U(X) [ Y^{-2}|β|² + (Y^{-2} − XY^{-3}) Yδ·β − Y^{-1} ξ·β
            + (Y^{-2} − XY^{-3}) Yδ·β + (X²Y^{-4} − 2XY^{-3}) Y²|δ|² + XY^{-2} Yδ·ξ
            − Y^{-1} ξ·β + XY^{-2} Yδ·ξ + |ξ|² ] dt.

Using the dynamics of X, Y and Z, and the definition of β (cf. (2.5)), we deduce

dU(X) = U(X) (−Y^{-1} β + XY^{-1} δ + ξ) · dW
  + ½ U(X) [ −2Y^{-1} β·λ + 2XY^{-1} δ·κ + 2η + Y^{-2}|β|²
            + 2(Y^{-1} − XY^{-2}) δ·β − 2Y^{-1} ξ·β + 2XY^{-1} δ·ξ
            + (X²Y^{-2} − 2XY^{-1}) |δ|² + |ξ|² ] dt

      = U(X) (−Y^{-1} β + XY^{-1} δ + ξ) · dW
  + ½ U(X) [ |Y^{-1} β − (λ + ξ) − (XY^{-1} − 1) δ|²
            + 2XY^{-1} δ·(κ − λ) + 2η + |ξ|² − |δ − (λ + ξ)|² ] dt

and, in turn,

(4.10)  dU(X) = U(X) (−Y^{-1} B^{-1}σπ + XY^{-1} δ + ξ) · dW
        + ½ U(X) [ |Y^{-1} B^{-1}σπ − σσ^+(λ + ξ) − (XY^{-1} − 1) δ|² + |(I − σσ^+)(λ + ξ)|²
                  + 2XY^{-1} δ·(κ − λ) + 2η + |ξ|² − |δ − (λ + ξ)|² ] dt.

Next, we observe that Condition 4.2, together with the orthogonality of the vectors (I − σσ^+)(δ − (λ + ξ)) and δ − σσ^+(λ + ξ), yields

2XY^{-1} δ·(κ − λ) + |(I − σσ^+)(λ + ξ)|² + 2η + |ξ|² − |δ − (λ + ξ)|² = 0.

Therefore, (4.10) simplifies to

(4.11)  dU(X) = U(X) (−Y^{-1} B^{-1}σπ + XY^{-1} δ + ξ) · dW
        + ½ U(X) |Y^{-1} B^{-1}σπ − σσ^+(λ + ξ) − (XY^{-1} − 1) δ|² dt.

We next choose the feedback portfolio control process

(4.12)  π^* = Y B σ^+ (λ + (X^* Y^{-1} − 1) δ + ξ).

Clearly, (2.3) then has a unique solution, denoted by X^*, solving

(4.13)  dX^* = B^{-1}σπ^* · (λ dt + dW) = [Y (σσ^+(λ + ξ) − δ) + X^* δ] · (λ dt + dW).

Consider now the process U_t(X^*_t) = u(X^*_t, Y_t, Z_t), and recall that U_t(X_t) = u(X_t, Y_t, Z_t) with X solving (2.3) for a generic policy π. To complete the proof, it suffices to establish that they are, respectively, a martingale and a supermartingale with respect to F_t and under P. The latter assertion follows directly from (4.11) and the negativity of U. For the former, we see from (4.11) that U_t(X^*_t) satisfies

(4.14)  dU(X^*) = U(X^*) (−Y^{-1} B^{-1}σπ^* + X^* Y^{-1} δ + ξ) · dW

and, from (4.12),

dU(X^*) = U(X^*) (σσ^+(δ − λ) + (I − σσ^+) ξ) · dW.

The martingality property then follows from the assumptions on the coefficients and the choice of U.

Remark 1: Note that under the assumption δ · (κ − λ) = 0, the dynamics of the auxiliary process Y can, also, be written as

(4.15)  dY_t = Y_t δ_t · (λ_t dt + dW_t)

with Y_0 = y > 0. Consequently, without loss of generality, in choosing the process Y we assume from now on that κ = λ. We have

(4.16)  Y_t = y exp( ∫_0^t (δ_s · λ_s − ½|δ_s|²) ds + ∫_0^t δ_s · dW_s ).

Remark 2: Under the choice of drift (4.8), the dynamics of the second auxiliary process Z become

(4.17)  dZ_t = ½ (|δ_t − σ_t σ^+_t (λ_t + ξ_t)|² − |ξ_t|²) dt + ξ_t · dW_t

with Z_0 = 0. Thus,

(4.18)  Z_t = ∫_0^t ½ (|δ_s − σ_s σ^+_s (λ_s + ξ_s)|² − |ξ_s|²) ds + ∫_0^t ξ_s · dW_s.

Next, we construct the optimal wealth process. For completeness, we restate some of the above findings. The proof of (4.21) follows directly from (4.13), (4.19), and Theorem 53 in [11].
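As a sanity check, the closed form (4.16) can be compared with a direct Euler discretisation of dY_t = Y_t δ_t · (λ_t dt + dW_t) driven by the same Brownian increments. The sketch below (our own illustration, scalar case d = 1 with constant coefficients δ and λ) does this; the 5% tolerance reflects the strong error of the Euler scheme at this step size.

```python
import numpy as np

def euler_Y(y0, delta, lam, dW, dt):
    """Euler scheme for dY = Y * delta * (lam*dt + dW), scalar case."""
    Y = y0
    for dw in dW:
        Y += Y * delta * (lam * dt + dw)
    return Y

def exact_Y(y0, delta, lam, dW, dt):
    """Closed form (4.16): Y_t = y * exp((delta*lam - delta^2/2) * t + delta * W_t)."""
    t = dt * len(dW)
    return y0 * np.exp((delta * lam - 0.5 * delta**2) * t + delta * dW.sum())

rng = np.random.default_rng(0)
dt, n = 1e-4, 10_000                        # horizon t = 1
dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Brownian increments
y_euler = euler_Y(1.0, 0.5, 0.1, dW, dt)
y_exact = exact_Y(1.0, 0.5, 0.1, dW, dt)
```

With the noise switched off (all increments zero), both reduce to the deterministic exponential y e^{δλt}, which gives a second, fully deterministic check.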



Theorem 4.4. Let Y and Z satisfy (4.16) and (4.18). For t ≥ 0, the associated optimal allocation process (cf. (4.12)) is given by

(4.19)  π^*_t = B_t Y_t σ^+_t (λ_t + ξ_t − δ_t) + B_t X^*_t σ^+_t δ_t,

where X^* is the unique solution to the wealth equation (2.3) with π^* being used. Thus, the optimal discounted wealth process X^* solves, for t ≥ 0,

(4.20)  dX^*_t = B^{-1}_t σ_t π^*_t · (λ_t dt + dW_t) = [Y_t (σ_t σ^+_t (λ_t + ξ_t) − δ_t) + X^*_t δ_t] · (λ_t dt + dW_t).

It is, in turn, given by

(4.21)  X^*_t = E_t ( x + ∫_0^t E^{-1}_s Y_s (σ_s σ^+_s (λ_s + ξ_s) − δ_s) · ((λ_s − δ_s) ds + dW_s) ),

where

(4.22)  E_t = exp( ∫_0^t (δ_s · λ_s − ½|δ_s|²) ds + ∫_0^t δ_s · dW_s ).

Corollary 4.5. The optimal π^* defined in (4.19) is an affine function of the initial wealth x, namely, for t ≥ 0,

(4.23)  π^*_t = x E_t B_t σ^+_t δ_t + B_t Y_t σ^+_t (λ_t + ξ_t − δ_t)
        + B_t E_t ( ∫_0^t E^{-1}_s Y_s (σ_s σ^+_s (λ_s + ξ_s) − δ_s) · ((λ_s − δ_s) ds + dW_s) ) σ^+_t δ_t.

The next result yields the optimal level of the investment system's performance. It follows directly from (4.9) and (4.14).

Proposition 4.6. At the optimum, the forward exponential performance process is given by U_t(X^*_t) = u(X^*_t, Y_t, Z_t), with u as in (4.2) and X^*, Y and Z as in (4.21), (4.3) and (4.4). It has the semimartingale representation

dU_t(X^*_t) = U_t(X^*_t) (σ_t σ^+_t (δ_t − λ_t) + (I − σ_t σ^+_t) ξ_t) · dW_t

and, hence, it is given by the martingale

(4.24)  U_t(X^*_t) = −exp( −x/y − ∫_0^t ½|σ_s σ^+_s (δ_s − λ_s) + (I − σ_s σ^+_s) ξ_s|² ds
        + ∫_0^t (σ_s σ^+_s (δ_s − λ_s) + (I − σ_s σ^+_s) ξ_s) · dW_s ).

5. Examples

We construct the forward performance process for various choices of model coefficients. We also compute and analyze the associated optimal wealth and asset



allocation, as well as the optimal performance level. For convenience, we assume that the initial datum is assigned at t = 0.

We recall that π^* = (π^{1,*}, ..., π^{k,*}) is the vector of the optimal allocations in the k risky assets. It is given by (4.19), while the optimal discounted wealth, X^*, is given in (4.21). Recall that the amount π^{0,*}/B = X^* − 1 · π^*/B is the optimal allocation in the riskless asset, the discounted bond.

Case 1: δ = ξ = 0.

Then Y_t = y for t ≥ 0, and the forward performance process takes the form

U_t(x) = −exp( −x/y + ∫_0^t ½|σ_s σ^+_s λ_s|² ds ).

Note that even in this simple case, the solution is equal to the classical exponential utility only at t = 0. The optimal discounted wealth and optimal asset allocation are given, respectively, by

X^*_t = x + ∫_0^t y σ_s σ^+_s λ_s · (λ_s ds + dW_s)

and

π^*_t = y B_t σ^+_t λ_t.

At the optimum,

U_t(X^*_t) = −exp( −x/y − ∫_0^t ½|σ_s σ^+_s λ_s|² ds − ∫_0^t σ_s σ^+_s λ_s · dW_s ).

Observe that π^* is independent of the initial wealth x. Consequently, the total amount allocated in the risky assets is given by

1 · π^*_t / B_t = 1 · y σ^+_t λ_t

and, thus, the amount invested in the riskless asset is π^{0,*}_t = X^*_t − 1 · y σ^+_t λ_t. Clearly, such an allocation is rather conservative and is often viewed as an argument against the classical exponential utility. However, as the examples below demonstrate, the class of forward exponential performances is rich enough to present an interesting range of allocations.

Case 2: σσ^+(δ − λ) + (I − σσ^+)ξ = 0.

We observe that this condition yields σ^+(δ − λ) = 0 and σσ^+ξ = ξ. It is, then, easy to see that Z_t = ∫_0^t ξ_s · dW_s and, in turn,

U_t(x) = −exp( −x/Y_t + ∫_0^t ξ_s · dW_s )

with Y as in (4.16). The optimal discounted wealth is given by

X^*_t = E_t ( x + ∫_0^t E^{-1}_s Y_s ξ_s · ((λ_s − δ_s) ds + dW_s) )



with E as in (4.22). Respectively,

π^*_t = x E_t B_t σ^+_t δ_t + Y_t B_t σ^+_t ξ_t + B_t E_t ( ∫_0^t E^{-1}_s Y_s ξ_s · dW_s ) σ^+_t δ_t.

At the optimum,

U_t(X^*_t) = U_0(x) = −exp(−x/y),

namely, the optimal level of forward performance remains constant across times.

Case 3: δ = 0 and λ + ξ = 0.

In this case, E_t = 1, Y_t = y > 0 and Z_t = −∫_0^t ½|λ_s|² ds − ∫_0^t λ_s · dW_s. Then,

U_t(x) = −exp( −x/y − ∫_0^t ½|λ_s|² ds − ∫_0^t λ_s · dW_s ).

The optimal discounted wealth remains constant, X^*_t = x. In turn, the optimal allocations are

(5.1)  π^*_t = 0   and   π^{0,*}_t = X^*_t = x.

Moreover,

U_t(X^*_t) = U_t(x).

It is important to notice that, for all trading times, the optimal allocation consists of putting zero into the risky assets and, therefore, investing the entire wealth into the riskless asset. Such a solution seems to capture quite accurately the strategy of a derivatives trader, whose underlying objective is to hedge, as opposed to the asset manager, whose objective is to invest.

Case 4: δ = λ + ξ with λ + ξ ≠ 0.

Observe that this condition implies that δ = σσ^+(λ + ξ) and, in turn, that Z_t = −∫_0^t ½|ξ_s|² ds + ∫_0^t ξ_s · dW_s. Therefore,

U_t(x) = −exp( −x/Y_t − ∫_0^t ½|ξ_s|² ds + ∫_0^t ξ_s · dW_s ).

We easily see that

X^*_t = x E_t.

Note that the returns of the processes X^* and Y are the same, i.e. dX^*_t/X^*_t = dY_t/Y_t, and, thus, X^*_t = (x/y) Y_t. The optimal asset allocation is given by

π^*_t = B_t X^*_t σ^+_t δ_t

and the optimal performance level by

U_t(X^*_t) = −exp( −x/y − ∫_0^t ½|ξ_s|² ds + ∫_0^t ξ_s · dW_s ).



Observe that, contrary to what we have observed in traditional backward exponential utility problems, the optimal portfolio is a linear functional of the wealth and not independent of it.

Let us, next, assume that 1 · σ^+_t (λ_t + ξ_t) = 1. We then have

(5.2)  1 · π^*_t / B_t = X^*_t   and   π^{0,*}_t = 0.

Hence, the optimal allocation π^* puts zero amount in the riskless asset and invests all wealth in the risky assets, according to the weights specified by the vector σ^+(λ + ξ). Note, also, that for an arbitrary vector ν_t with 1 · σ^+_t ν_t ≠ 0, the vector

ξ_t = ((1 − 1 · σ^+_t λ_t) / (1 · σ^+_t ν_t)) ν_t

satisfies the above constraint, since 1 · σ^+_t (λ_t + ((1 − 1 · σ^+_t λ_t) / (1 · σ^+_t ν_t)) ν_t) = 1. It is, then, natural to ask whether we can generate optimal portfolios that allocate arbitrary, but constant, fractions of wealth to the different accounts. The answer is affirmative. Indeed, for p ∈ R, we set

1 · σ^+_t (λ_t + ξ_t) = p,   a.e., for t ∈ [0, T].

Then, the total investment in the risky assets and the allocation in the riskless bond are given, respectively, by

1 · π^*_t / B_t = p X^*_t   and   π^{0,*}_t / B_t = (1 − p) X^*_t.
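The pseudo-inverse algebra behind these allocations is easy to reproduce numerically. The sketch below (our illustration; all numbers are arbitrary) computes σ^+ with NumPy, verifies the four Penrose conditions (4.5), and confirms the Case 4 accounting: choosing ξ = cν so that 1 · σ^+(λ + ξ) = p puts the fraction p of wealth in the risky assets and 1 − p in the riskless bond.

```python
import numpy as np

d, k = 3, 2                               # more Brownian motions than stocks: incomplete
rng = np.random.default_rng(1)
sigma = rng.normal(size=(d, k))           # d x k volatility matrix, full column rank a.s.
sigma_p = np.linalg.pinv(sigma)           # Moore-Penrose pseudo-inverse sigma^+, k x d

# the four Penrose conditions (4.5)
assert np.allclose(sigma @ sigma_p @ sigma, sigma)
assert np.allclose(sigma_p @ sigma @ sigma_p, sigma_p)
assert np.allclose((sigma @ sigma_p).T, sigma @ sigma_p)
assert np.allclose((sigma_p @ sigma).T, sigma_p @ sigma)

# Case 4: take lambda and nu in the range of sigma, so sigma sigma^+ delta = delta holds
lam = sigma @ rng.normal(size=k)
nu = sigma @ rng.normal(size=k)
ones, p = np.ones(k), 0.7
c = (p - ones @ sigma_p @ lam) / (ones @ sigma_p @ nu)
xi = c * nu                               # then 1 . sigma^+ (lam + xi) = p by construction
X_star, B = 5.0, 1.0                      # current discounted wealth, bond price (arbitrary)
pi_star = B * X_star * sigma_p @ (lam + xi)   # pi* = B X* sigma^+ delta, delta = lam + xi
risky_amount = ones @ pi_star / B         # total amount allocated to the risky assets
```

Only σσ^+(λ + ξ) enters the optimal policy, so components of λ + ξ orthogonal to the range of σ (the non-hedgeable directions) are automatically ignored.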

6. Conclusions and extensions

We introduced a new concept in stochastic optimization, namely, that of forward performance. A forward performance is an adapted process that is a martingale at an optimum and a supermartingale otherwise. In addition, in contrast to the traditional control approach, it is prespecified today and not at the end of the horizon. This removes the horizon dependence and enables us to define the optimal solution for all future times.

As an application, we study a portfolio choice problem in incomplete markets. The model is general and the forward performance is obtained under minimal assumptions on the underlying dynamics. It is constructed by combining appropriately chosen deterministic and stochastic market inputs. The deterministic input depends on the investor's preferences while the stochastic input incorporates exclusively information from the market changes. An interesting class of policies was discovered yielding, among others, two extreme situations. In one of them, the investor allocates zero wealth in the risky assets, while in the other the situation is totally reversed.

Working with forward performance criteria, instead of the classical (backward) ones, seems to give more intuitive and tractable solutions for both the performance process and the optimal policies. It is worth observing that the classical solutions can be thought of as special, but rather limited, cases of forward solutions.

Interesting questions arise. They are related, among others, to necessary and sufficient conditions for the solution to remain log-affine (cf. (4.9)). A more challenging question is to solve the problem for arbitrary initial data. While the power



and logarithmic cases appear sufficiently tractable, the general case, currently under study (see [7]), poses several difficulties, related among others to the existence of solutions to inverse problems of fast diffusion type. In a different direction, one could try to price claims using forward performance criteria. This has been done by the authors for incomplete binomial models and for diffusion models with stochastic volatility (see, respectively, [4] and [8], and [6]). The emerging forward indifference prices do not coincide with their traditional counterparts and have more intuitive structural representation properties.

References

[1] Fleming, W. H. and Soner, H. M. (2006). Controlled Markov Processes and Viscosity Solutions, second edition. Springer-Verlag.
[2] Kurtz, T. (1984). Martingale problems for controlled processes. In Stochastic Modeling and Filtering (M. Thoma and A. Wyner, eds.) 75–90. Lecture Notes in Control and Information Sciences. Springer-Verlag.
[3] Larson, R. E. (1968). State increment dynamic programming. In Modern Analytic and Computational Methods in Science and Mathematics (R. Bellman, ed.) Elsevier.
[4] Musiela, M. and Zariphopoulou, T. (2003). Backward and forward utilities and the associated pricing systems: The case study of the binomial model. Preprint.
[5] Musiela, M. and Zariphopoulou, T. (2008). Derivative pricing, investment management and the term structure of exponential utilities: The single period binomial model. In Indifference Pricing (R. Carmona, ed.) Princeton University Press, in press.
[6] Musiela, M. and Zariphopoulou, T. (2006). Investment and valuation under backward and forward exponential utilities in a stochastic factor model. Dilip Madan's Festschrift, in press.
[7] Musiela, M. and Zariphopoulou, T. (2006). Investments and forward utilities. Preprint.
[8] Musiela, M., Sokolova, E. and Zariphopoulou, T. (2007). Indifference pricing under forward valuation criteria: The case study of the binomial model. Preprint.
[9] Nisio, M. (1981). Lectures on Stochastic Control Theory. ISI Lecture Notes 9. Macmillan.
[10] Penrose, R. (1955). A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society 51 406–413.
[11] Protter, P. (1990). Stochastic Integration and Differential Equations. Springer-Verlag.
[12] Seinfeld, J. H. and Lapidus, L. (1968). Aspects of the forward dynamic programming algorithm. I&EC Process Design and Development 475–478.
[13] Vit, K. (1977). Forward differential dynamic programming. Journal of Optimization Theory and Applications 21 (4) 487–504.
[14] Yong, J. and Zhou, X.-Y. (1999). Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer-Verlag.

IMS Collections
Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz
Vol. 4 (2008) 301–314
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000444

Estimates of Dynamic VaR and Mean Loss Associated to Diffusion Processes

Laurent Denis,¹,* Begoña Fernández,²,* and Ana Meda²,*

Université d'Evry-Val-d'Essonne, and Universidad Nacional Autónoma de México

Abstract: Let X_t be a stochastic process driven by a differential equation of the form dX_t = σ(t, X_t)dW_t + b(t, X_t)dt, t > 0, and let X^*_{s,t} = sup_{s≤u≤t} X_u be the maximum of the diffusion. In this work we obtain bounds for the tail distribution of X^*_{s,t}, define several dynamic VaR type quantiles for this process, and give upper and lower bounds for both the VaR quantile and the conditioned mean loss associated to it. The results we obtain are based on the change of time property of the Brownian Motion, and can be applied to a large class of examples used in Finance, in particular where σ(t, X_t) = σ_t X_t^γ, with 0 ≤ γ < 1. The estimates we obtain are sharp. We discuss carefully the Geometric Brownian Motion, the Cox-Ingersoll-Ross and the Vasicek type models, and give an application to Russian options.

1. Introduction

For Risk Theory it is of interest to estimate high quantiles (Value at Risk: VaR) and the mean loss given that an extreme event has occurred. One approach to deal with this is to use Extreme Value Theory, by fitting Fréchet, Gumbel or Weibull distributions to approximate VaR, and the Generalized Pareto distribution to fit the conditional loss distribution (see, for example, an excellent account in the book [EKM], in references therein and in subsequent work by the authors). An enormous amount of work in this direction has been done also for time series (see, for example, [McNeil]).

The behavior of extremes for diffusion processes has been studied by Davis (1982, [Dav]), who found a distribution F_t which is the asymptotic limit for the distribution of the maxima of the process as t tends to infinity. On the other hand, Borkovec and Klüppelberg (1998, [BoKl]) described the tail behavior of the limit F_t - using again Extreme Value Theory - in terms of the coefficients of the equation, proved that the number of ε-upcrossings of a certain level converges to a homogeneous Poisson process as t tends to infinity, and mentioned that the application of these results to study risk measures of financial products was work in progress.

Another point of view is that of Talay and Zheng (2002, [TaZh]), who combined Monte-Carlo methods with the Euler discretization to calculate VaR for diffusion processes that have densities (uniformly elliptic, or a more general setting as in Bally and Talay [BaTa1] and [BaTa2]). They applied their results to find VaR for portfolios.

* This work was partially supported by Grants CONACYT 37922E, and PAPIIT-DGAPA-UNAM, IN103606, México.
¹ Département de Mathématiques, Equipe "Analyse et probabilités", Université d'Evry-Val-d'Essonne, Bâtiment Maupertuis, rue du père Jarlan, 91025 Evry Cedex, France, e-mail: [email protected]
² Departamento de Matemáticas, Facultad de Ciencias, UNAM.
Circuito Exterior s/n, Ciudad Universitaria, Coyoacán, 04510, México DF, Mexico, e-mail: [email protected]; [email protected]

Keywords and phrases: dynamic VaR, mean loss, diffusion process



Laurent Denis, Begoña Fernández, and Ana Meda

In this work we consider continuous processes driven by a stochastic differential equation dX_t = σ(t, X_t) dW_t + b(t, X_t) dt, t > 0, define VaR quantiles for sup_{s≤u≤t} X_u, and give bounds for both the VaR quantile and the conditional mean loss associated to it. The second quantity is generally considered better than VaR, not only because it is subadditive, but also because it provides useful extra information. Our approach is completely different from that of the authors mentioned above and can be applied to a general class of diffusions, in particular to typical examples in Finance where σ(t, X_t) = σ_t X_t^γ, with 0 ≤ γ < 1 and σ_t bounded. We use the change of time property of Brownian Motion to give upper and lower bounds for the tail distribution of the process sup_{s≤u≤t} X_u, and apply those results to obtain estimates for different measures of risk. We pay special attention to the Geometric Brownian Motion, the Vasicek and the Cox-Ingersoll-Ross type models. It is important to remark that our estimates are sharp, as can be seen in all the corollaries where we combine upper and lower bounds. This property, sharpness of the bounds, which is important by itself, seems relevant for practitioners, who in general obtain bounds for the coefficients of the equations they work with. We also note that the coefficients that define the process (σ and b in equation (1) below) can be random, as long as they satisfy certain hypotheses defined in Section 2. The same type of analysis has already been done for diffusion processes with jumps in [DFM], using other techniques. Those bounds, if restricted to processes without jumps as in the present setting, are not as good as the estimates found here. Additionally, in this paper three important classes of examples are carefully discussed.
The structure of the paper is as follows. In Section 2 we state the notation and the basic assumptions we will use (Hypotheses (UB) and (LB)), and we describe the main examples discussed throughout the paper. Section 3 contains the upper and lower estimates for the tail distribution of sup_{s≤u≤t} X_u, based on the change of time property of Brownian Motion; that Section also contains the computations of the estimates for the main examples, including an application to Russian options. We devote Section 4 to the definition of different kinds of dynamic VaR quantiles, with general upper and lower estimates for them based on the previous results. Finally, in Section 5, we consider the expected shortfall as well as a second order VaR conditioned on the past, and obtain the corresponding estimates.

2. Preliminaries: Hypotheses and Notation

We consider a one-dimensional Brownian Motion W and a diffusion process defined by the stochastic differential equation

(1)    dX_t = σ(t, X_t) dW_t + b(t, X_t) dt,   t > 0,

which models financial assets such as interest rates. The functions σ and b are measurable, real valued, and such that equation (1) admits a unique solution which is a diffusion process with state space IR (see for example [ReYor], Chapter IX, or [KarShr], Chapter 6). Throughout this paper we shall denote by (UB) the hypotheses needed to obtain upper bounds, and by (LB) (correspondingly, to obtain lower bounds) the following sets of conditions:

Hypotheses (UB):

Dynamic VaR Associated to Diffusion Processes


1. For all z ∈ IR and t ≥ 0, b(t, z) is uniformly bounded above by the constant b* ≥ 0: b(t, z) ≤ b*.
2. For all z ∈ IR, and uniformly in t ≥ 0, |σ(t, z)| ≤ √(a*) |z|^γ, where a* > 0 and γ ∈ [0, 1) are constants.

Observe that in the case γ = 0, Hypothesis 2 of (UB) means that σ is bounded. The case γ = 1 is not considered in (UB): it shall be treated separately, for example in what we call the Geometric Brownian type process case.

Hypotheses (LB):
1. For all z ∈ IR, t ≥ 0, b_* ≤ b(t, z), with the constant b_* ≤ 0.
2. For all z ∈ IR, t ≥ 0, √(a_*) ≤ σ(t, z), where a_* > 0 is constant.

Note that Hypotheses (LB) are just uniform lower bounds on the coefficients. In a natural way, for all x ∈ IR, we shall denote by P_x the probability associated to X such that P_x(X_0 = x) = 1. Adopting conventional notation we define, for all 0 ≤ s < t,

    X*_{s,t} = sup_{s≤u≤t} X_u,

and we put X*_t = X*_{0,t}. Set a(t, z) = σ²(t, z) for all z ∈ IR, and fix q = 1 − α in (0, 1). Let us denote by Φ̄ the tail of the standard normal distribution function defined on IR:

    Φ̄(x) = (1/√(2π)) ∫_x^{+∞} e^{−u²/2} du.
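Numerically, Φ̄ and its inverse (used repeatedly in the estimates below) are available in the Python standard library; the following helper is an illustration, not part of the paper:

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def phi_bar(x: float) -> float:
    """Tail of the standard normal: Phi_bar(x) = P(Z >= x)."""
    return 1.0 - _STD.cdf(x)

def phi_bar_inv(p: float) -> float:
    """Inverse tail: the x with Phi_bar(x) = p, for p in (0, 1)."""
    return _STD.inv_cdf(1.0 - p)
```

For instance, phi_bar(0.0) = 0.5 and phi_bar_inv(0.025) is approximately 1.96, the usual 97.5% normal quantile.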

For any real number r, we shall denote by r⁺ (resp. r⁻) its positive part (resp. negative part), so that r = r⁺ − r⁻. We are interested in finding upper bounds for some dynamic measures of risk of quantile type (VaR) and expected shortfall (analogous to those defined in [McNeil] for time series), but for the maximum of the process between times s and t. We will define a quantile for the process X*_{s,t} given that the underlying process is observed at time s and found equal to m. In order to do that we set, for x, m ∈ IR and 0 < s < t (writing VaR* for the quantile of the running maximum),

    VaR*^{s,t}_{m,α}(X) = inf{ z ∈ IR : P_x( X*_{s,t} ≤ z | X_s = m ) ≥ q }.

Another definition for VaR (as in [TaZh]) is

    VaR^{s,t}_{m,α}(X) = inf{ z : P_x( X_t ≤ z | X_s = m ) ≥ q }.

One can easily verify that VaR^{s,t}_{m,α}(X) ≤ VaR*^{s,t}_{m,α}(X), so all the upper estimates that follow are also valid for VaR^{s,t}_{m,α}(X). We shall prove general results under (UB) or (LB), and apply them either directly or after a change of variable. This will be done mainly for three classes of examples:

Example 1: The Geometric Brownian type process. In this example, X satisfies the SDE

(2)    dX_t / X_t = σ_t dW_t + b_t dt,



with X_0 = m, m > 0, where σ_t and b_t are real functions defined on IR. In the context of the original model (1), b(t, z) = b_t z and σ(t, z) = σ_t z.

Example 2: The Cox-Ingersoll-Ross (CIR) type model. The CIR process X is the solution of the SDE

(3)    dX_t = √(a* X_t) dW_t + (b* − c* X_t) dt,

with X_0 = m, m > 0 and a*, b*, c* ∈ IR⁺. It can be shown (see [IkWa]) that for all m ∈ IR⁺ this equation admits a unique IR⁺-valued solution with X_0 = m. This process satisfies (UB) with γ = 1/2:

    b(t, z) = b(z) = b* − c* z⁺ ≤ b*,   and   σ(t, z) = σ(z) = √(a* |z|).

Let us remark that, in general, this model is used in the theory of interest rates.

Example 3: The Vasicek type model. In this model X satisfies the SDE

(4)    dX_t = σ dW_t + (β − μ_t X_t) dt,

where σ ∈ IR⁺, β ∈ IR, and μ_t is a positive continuous function. Let us remark that not all the examples above satisfy Hypotheses (UB) or (LB) directly.
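The three example classes can be simulated with a basic Euler–Maruyama scheme. This is a standard discretization, not part of the paper; all coefficient values below are hypothetical, and the max(x, 0) inside the CIR volatility is a common fix for the discretized process leaving IR⁺:

```python
import math
import random

def euler_path(m, drift, vol, t=1.0, n=1000, rng=None):
    """Euler-Maruyama discretization of dX = vol(X) dW + drift(X) dt, X_0 = m."""
    rng = rng or random.Random(0)
    dt, x, path = t / n, m, [m]
    for _ in range(n):
        x += vol(x) * rng.gauss(0.0, math.sqrt(dt)) + drift(x) * dt
        path.append(x)
    return path

# Example 1 (Geometric Brownian type): b(t, z) = b z, sigma(t, z) = s z
gbm = euler_path(1.0, lambda x: 0.05 * x, lambda x: 0.2 * x)
# Example 2 (CIR type): drift b* - c* z, volatility sqrt(a* z+)
cir = euler_path(1.0, lambda x: 0.5 - 0.3 * x, lambda x: math.sqrt(0.04 * max(x, 0.0)))
# Example 3 (Vasicek type): drift beta - mu z, constant volatility sigma
vas = euler_path(1.0, lambda x: 0.5 - 0.3 * x, lambda x: 0.1)
```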

3. Estimates for the distribution of X* and applications

3.1. Estimates for the distribution of the sup

Lemma 3.1. Let X be a solution of equation (1), and let us assume (UB). Then, for t > 0, m ∈ IR and z ∈ IR we have

    P_m( X*_t ≥ z ) ≤ 2Φ̄( (z − m − b* t)⁺ / (√(a* t) |z|^γ) ).

Proof. If z < m, the result is clear. If z ≥ m, then (w.l.o.g.) we can assume that a(t, y) = a(t, z) for all t ≥ 0 and all y ≥ z. We have

    P_m( X*_t ≥ z ) ≤ P_m( sup_{0≤s≤t} ∫_0^s σ(u, X_u) dW_u ≥ z − m − b* t ).

Let us denote, for all s ≥ 0,

    R_s = ∫_0^s σ(u, X_u) dW_u.

By the change of time property, there exists a Brownian Motion B̃ such that R_s = B̃_{⟨R,R⟩_s} for all s ≥ 0. This yields

    P_m( X*_t ≥ z ) ≤ P_m( sup_{0≤s≤t} B̃_{⟨R,R⟩_s} ≥ z − m − b* t )
                   ≤ P_m( sup_{0≤u≤a* t |z|^{2γ}} B̃_u ≥ z − m − b* t ),

since ⟨R,R⟩_s ≤ a* t |z|^{2γ} for all s ∈ [0, t]. As, by the Reflection Principle,

    P_m( sup_{0≤u≤a* t |z|^{2γ}} B̃_u ≥ z − m − b* t ) = P_m( |B̃_{a* t |z|^{2γ}}| ≥ z − m − b* t ),

we get the result.

To get a lower bound, we assume conditions (LB), and we obtain the following:

Lemma 3.2. Let X be a solution of equation (1), and let us assume conditions (LB). Then, for all z, m ∈ IR and t ≥ 0,

    2Φ̄( (z − m − b_* t)⁺ / √(a_* t) ) ≤ P_m( X*_t ≥ z ).

Proof. We have

    P_m( sup_{s∈[0,t]} ∫_0^s σ(u, X_u) dW_u ≥ z − m − b_* t ) ≤ P_m( X*_t ≥ z ).

By making the same change of variable as in the previous proof, with the same notation, and since

    a_* t ≤ ∫_0^t σ²(u, X_u) du = ⟨R,R⟩_t   a.e.,

we deduce, using the Reflection Principle, that

    2Φ̄( (z − m − b_* t)⁺ / √(a_* t) ) = P_m( |B̃_{a_* t}| ≥ z − m − b_* t )
        = P_m( sup_{s∈[0, a_* t]} B̃_s ≥ z − m − b_* t )
        ≤ P_m( sup_{s∈[0, ⟨R,R⟩_t]} B̃_s ≥ z − m − b_* t )
        ≤ P_m( X*_t ≥ z ),

and so the proof is complete.

Combining both results above, we have:

Corollary 3.3. Let X be a solution of equation (1), and assume both (UB) and (LB). Then, for all z ≥ m,

    2Φ̄( (z − m − b_* t)⁺ / √(a_* t) ) ≤ P_m( X*_t ≥ z ) ≤ 2Φ̄( (z − m − b* t)⁺ / (√(a* t) |z|^γ) ).

3.2. Application to the Examples

We keep the notation of Section 2.

Example 1: The Geometric Brownian type process. In this example, the idea is to apply Itô's formula to the process Y_t = ln(X_t), t ≥ 0.
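The sandwich of Corollary 3.3 is easy to evaluate; the sketch below (not from the paper, with hypothetical coefficient bounds) computes both sides and checks their ordering:

```python
import math
from statistics import NormalDist

PHI_BAR = lambda x: 1.0 - NormalDist().cdf(x)

def sup_tail_bounds(z, m, t, b_up, a_up, gamma, b_lo, a_lo):
    """Lower/upper bounds of Corollary 3.3 for P_m(sup_{u<=t} X_u >= z), z >= m."""
    lower = 2.0 * PHI_BAR(max(z - m - b_lo * t, 0.0) / math.sqrt(a_lo * t))
    upper = 2.0 * PHI_BAR(max(z - m - b_up * t, 0.0) / (math.sqrt(a_up * t) * abs(z) ** gamma))
    return min(lower, 1.0), min(upper, 1.0)

# hypothetical coefficient bounds: b_* = -0.1 <= b <= 0.2 = b*, a_* = 0.04 <= sigma^2 <= 0.09 = a*
for z in (1.0, 1.5, 2.0, 3.0):
    lo, up = sup_tail_bounds(z, m=1.0, t=1.0, b_up=0.2, a_up=0.09,
                             gamma=0.5, b_lo=-0.1, a_lo=0.04)
    assert 0.0 <= lo <= up <= 1.0
```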



Indeed, under P_m,

    Y_t = ln m + ∫_0^t σ_s dB_s + ∫_0^t b̃_s ds,

where b̃_s = b_s − σ_s²/2 for all s ≥ 0. As a consequence of Corollary 3.3, we have:

Proposition 3.4. In Example 1, if one assumes that there exist constants 0 < a_* ≤ a* and b_* ≤ b* such that b_* ≤ b_t ≤ b* and a_* ≤ σ_t² ≤ a* for all t ≥ 0, then for all z ≥ m and t ≥ 0

    2Φ̄( (ln z − ln m + t(b_* − a*/2)⁻)⁺ / √(a_* t) ) ≤ P_m( X*_t ≥ z ) ≤ 2Φ̄( (ln z − ln m − t(b* − a_*/2)⁺)⁺ / √(a* t) ).
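Proposition 3.4 is likewise straightforward to evaluate numerically; the following sketch (hypothetical parameters) implements both bounds:

```python
import math
from statistics import NormalDist

PHI_BAR = lambda x: 1.0 - NormalDist().cdf(x)

def gbm_sup_tail_bounds(z, m, t, b_lo, b_up, a_lo, a_up):
    """Bounds of Proposition 3.4 for P_m(sup_{u<=t} X_u >= z) when X is of
    Geometric Brownian type with b_* <= b_t <= b* and a_* <= sigma_t^2 <= a*."""
    L = math.log(z) - math.log(m)
    num_lo = max(L + t * max(-(b_lo - a_up / 2.0), 0.0), 0.0)   # L + t (b_* - a*/2)^-
    num_up = max(L - t * max(b_up - a_lo / 2.0, 0.0), 0.0)      # L - t (b* - a_*/2)^+
    lower = 2.0 * PHI_BAR(num_lo / math.sqrt(a_lo * t))
    upper = 2.0 * PHI_BAR(num_up / math.sqrt(a_up * t))
    return min(lower, 1.0), min(upper, 1.0)
```

Since the lower bound's numerator dominates the upper bound's while its denominator is smaller, the returned pair is always ordered.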

American options are good candidates for an application of this result, in particular the so-called Russian option.

Application: Let us consider a Russian option whose underlying asset X is a Geometric Brownian type process satisfying all the hypotheses of Proposition 3.4. Assume that its maturity is at time t > 0, and recall that the pay-off at each s > 0 is given by

    f_s = M_0 ∨ sup_{u≤s} X_u.

If we assume that M_0 is fixed, the quantity P_m( X*_t ≥ M_0 ) represents the risk for the seller that the option is exercised at a price greater than M_0, and Proposition 3.4 gives an estimate of this risk.

Example 3: The Vasicek type model. We introduce the process

    Y_t = e^{∫_0^t μ_s ds} X_t,   t ≥ 0.

One can easily verify that for all t ≥ 0,

    Y_t = m + ∫_0^t σ e^{∫_0^s μ_r dr} dW_s + ∫_0^t β e^{∫_0^s μ_r dr} ds;

this yields:

Proposition 3.5. If X denotes the Vasicek type process introduced in Example 3, then for all z ≥ 0 and m ∈ IR we have

    2Φ̄( (e^{∫_0^t μ_s ds} z − m − βt)⁺ / (σ√t) ) ≤ P_m( X*_t ≥ z ) ≤ 2Φ̄( (z − m − βt e^{∫_0^t μ_s ds})⁺ / (σ e^{∫_0^t μ_s ds} √t) ).

Proof. It is an immediate consequence of

    P_m( Y*_t ≥ e^{∫_0^t μ_s ds} z ) ≤ P_m( X*_t ≥ z ) ≤ P_m( Y*_t ≥ z ).



4. Estimates of different kinds of VaR

Throughout this section we assume that X satisfies equation (1) and that the coefficients σ and b do not depend on t. We make the latter assumption in order to apply the Markov property and consider dynamic VaR; all the results in this Section remain valid for s = 0 without the extra condition on the coefficients.

4.1. Dynamic VaR

4.1.1. Estimates

Thanks to the estimates of the previous section, we are able to estimate VaR*^{s,t}_{m,α}(X):

Theorem 4.1. Let us assume (UB). Then, for all 0 < s < t,

    VaR*^{s,t}_{m,α}(X) ≤ r,

where r is the unique root on [b*(t − s) + m, +∞[ of the equation

(5)    z − |z|^γ √(a*(t − s)) Φ̄⁻¹(α/2) − m − b*(t − s) = 0.

Proof. Thanks to the Markov property we have

    P_x( X*_{s,t} < z | X_s = m ) = P_m( X*_{t−s} < z ),

so P_x( X*_{s,t} < z | X_s = m ) > q if and only if P_m( X*_{t−s} ≥ z ) ≤ α, and because of Lemma 3.1 this is implied, for z ≥ m, by the inequality

    2Φ̄( (z − m − b*(t − s))⁺ / (√(a*(t − s)) |z|^γ) ) ≤ α.

Since γ < 1, the left member of this inequality is equal to one for z ≤ m + b*(t − s) and goes to zero as z → ∞, so there exists at least one root of equation (5). Uniqueness is easy to verify.

Corollary 4.2. (i) If γ = 0, that is, σ bounded, we have the estimate

    VaR*^{s,t}_{m,α}(X) ≤ m + b*(t − s) + √(a*(t − s)) Φ̄⁻¹(α/2).

(ii) If γ = 1/2 and m > 0, which corresponds to the CIR type model, we have

    VaR*^{s,t}_{m,α}(X) ≤ m + b*(t − s) + (1/2) a*(t − s) (Φ̄⁻¹)²(α/2)
        + (1/2) Φ̄⁻¹(α/2) √(a*(t − s)) √( a*(t − s)(Φ̄⁻¹)²(α/2) + 4(m + b*(t − s)) ).

Proof. One just has to calculate the root r of equation (5) in both cases.

Theorem 4.3. Let us assume (LB). Then for all 0 ≤ s < t,

    m + b_*(t − s) + √(a_*(t − s)) Φ̄⁻¹(α/2) ≤ VaR*^{s,t}_{m,α}(X).
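Equation (5) has no closed form for general γ, but it is a one-dimensional root-finding problem. The sketch below (hypothetical parameters, not from the paper) solves it by bisection and, for γ = 1/2, recovers the closed form of Corollary 4.2 (ii):

```python
import math
from statistics import NormalDist

q = NormalDist().inv_cdf(1.0 - 0.05 / 2.0)            # Phi_bar^{-1}(alpha/2), alpha = 0.05
m, b_up, a_up, dt, gamma = 1.0, 0.1, 0.04, 1.0, 0.5   # hypothetical CIR-type parameters

def eq5(z):
    """Left-hand side of equation (5)."""
    return z - abs(z) ** gamma * math.sqrt(a_up * dt) * q - m - b_up * dt

# bisection on [m + b* dt, large]; eq5 is negative at the left end, positive at the right
lo, hi = m + b_up * dt, 100.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if eq5(mid) > 0.0:
        hi = mid
    else:
        lo = mid
root = 0.5 * (lo + hi)

# closed form of Corollary 4.2 (ii) for gamma = 1/2
K = m + b_up * dt
closed = K + 0.5 * a_up * dt * q * q \
    + 0.5 * q * math.sqrt(a_up * dt) * math.sqrt(a_up * dt * q * q + 4.0 * K)
```

The two values agree because for γ = 1/2 equation (5) is a quadratic in √z.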



Proof. It is just a consequence of Lemma 3.2.

Let us sum up the results obtained in the case where σ and b are bounded and s = 0 (see the comment at the beginning of this section):

Proposition 4.4. Assume (UB) with γ = 0, together with (LB), so that we consider the process

    X_t = m + ∫_0^t σ(u, X_u) dW_u + ∫_0^t b(u, X_u) du,   t ≥ 0.

Then for all t > 0,

    m + b_* t + √(a_* t) Φ̄⁻¹(α/2) ≤ VaR*^{0,t}_{m,α}(X) ≤ m + b* t + √(a* t) Φ̄⁻¹(α/2).

Remark: This last inequality proves that in this case (γ = 0) the estimates we obtained are sharp.

4.1.2. Application to examples

Example 1: The Geometric Brownian type process. In this case, as previously, we consider the process Y = ln X and keep the same notation. We have, for all 0 ≤ s < t,

    VaR*^{s,t}_{m,α}(X) = e^{VaR*^{s,t}_{m,α}(Y)}.

This leads to the following estimate, which we state for the case s = 0, so that the coefficients σ and b may depend on t:

Proposition 4.5. In Example 1, if one assumes that there exist constants 0 < a_* ≤ a* and b_* ≤ b* such that b_* ≤ b_t ≤ b* and a_* ≤ σ_t² ≤ a* for all t ≥ 0, then for all t ≥ 0

    m e^{−t(b_* − a*/2)⁻ + √(a_* t) Φ̄⁻¹(α/2)} ≤ VaR*^{0,t}_{m,α}(X) ≤ m e^{t(b* − a_*/2)⁺ + √(a* t) Φ̄⁻¹(α/2)}.
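For the Geometric Brownian case, the two sides of Proposition 4.5 are explicit. A quick evaluation with hypothetical coefficient bounds (an illustration, not part of the paper):

```python
import math
from statistics import NormalDist

q = NormalDist().inv_cdf(1.0 - 0.01 / 2.0)       # Phi_bar^{-1}(alpha/2), alpha = 0.01
m, t = 1.0, 1.0
b_lo, b_up, a_lo, a_up = 0.02, 0.08, 0.04, 0.09  # hypothetical bounds on b_t and sigma_t^2

# Proposition 4.5: exp of the log-price VaR interval
var_lower = m * math.exp(-t * max(-(b_lo - a_up / 2.0), 0.0) + math.sqrt(a_lo * t) * q)
var_upper = m * math.exp(t * max(b_up - a_lo / 2.0, 0.0) + math.sqrt(a_up * t) * q)
```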

Example 3: The Vasicek type model. In this case we consider μ not depending on t, and set Y_t = e^{μt} X_t, t ≥ 0.

Theorem 4.6. If VaR*^{0,t}_{m,α}(e^{−μt} Y) ≥ 0, then for t ≥ 0

    e^{−μt} ( m + βt + σ√t Φ̄⁻¹(α/2) ) ≤ VaR*^{0,t}_{x,α}(X) ≤ m + βt e^{μt} + e^{μt} σ√t Φ̄⁻¹(α/2).

Proof. Following the same argument as in Proposition 4.4, we have

    VaR*^{0,t}_{m,α}(e^{−μt} Y) ≤ VaR*^{0,t}_{x,α}(X) ≤ VaR*^{0,t}_{m,α}(Y),

and

    m + βt + σ√t Φ̄⁻¹(α/2) ≤ VaR*^{0,t}_{m,α}(Y) ≤ m + βt e^{μt} + e^{μt} σ√t Φ̄⁻¹(α/2),

so we get the desired inequality.



5. Expected shortfalls and other kinds of VaR

In this Section we assume again that X satisfies equation (1), that the coefficients σ and b do not depend on t, and that m > 0. The assumption on m is natural, and it simplifies the calculations since, when applying the estimates of Lemma 3.1, we can drop the absolute value.

5.1. Mean of the excess distribution over the threshold VaR*^{s,t}_{m,α}(X)

We want to measure the expected shortfall of X*_{s,t} given that the process exceeds VaR between times s and t. In order to do that, we find bounds for the excess distribution (see [EKM] and references therein) and obtain estimates for what we call a "second order" VaR. More precisely, we define for all 0 < s < t:

    VVaR*^{s,t}_{m,α}(X) = inf{ z ∈ IR : P_x( X*_{s,t} < z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) ≥ q }.

Notice that for z < VaR*^{s,t}_{m,α}(X), P_x( X*_{s,t} < z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) = 0.

Lemma 5.1.

    P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) ≤ P_m( X*_{t−s} ≥ z ) / α,

with equality if P_m( X*_{t−s} = VaR*^{s,t}_{m,α}(X) ) = 0.

Proof. Thanks to the Markov property, if z ≥ VaR*^{s,t}_{m,α}(X) one has

    P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m )
        = P_x( X*_{s,t} ≥ z | X_s = m ) / P_x( X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X) | X_s = m )
        = P_m( X*_{t−s} ≥ z ) / P_m( X*_{t−s} ≥ VaR*^{s,t}_{m,α}(X) )
        ≤ P_m( X*_{t−s} ≥ z ) / α.

As P_m( X*_{t−s} ≥ VaR*^{s,t}_{m,α}(X) ) = α if P_m( X*_{t−s} = VaR*^{s,t}_{m,α}(X) ) = 0, the last assertion of the Lemma is clear.

So, we have the following:

Corollary 5.2. For all 0 < s < t, we have

    VVaR*^{s,t}_{m,α}(X) ≤ VaR*^{s,t}_{m,α²}(X),

with equality if P_m( X*_{t−s} = VaR*^{s,t}_{m,α}(X) ) = 0.

Proof.

    VVaR*^{s,t}_{m,α}(X) = inf{ z : P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) ≤ α }
        ≤ inf{ z : P_m( X*_{t−s} ≥ z ) / α ≤ α }
        = inf{ z : P_x( X*_{s,t} ≥ z | X_s = m ) ≤ α² }
        = VaR*^{s,t}_{m,α²}(X).



We are also able to estimate the mean loss under this conditional probability:

Proposition 5.3. Let us assume Hypotheses (UB). Then, if m ≤ VaR*^{s,t}_{m,α}(X),

    E_x[ X*_{s,t} | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ] = VaR*^{s,t}_{m,α}(X) + R,

where

    R ≤ (2/α) ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} Φ̄( (z − m − b*(t − s))⁺ / (√(a*(t − s)) z^γ) ) dz.

Proof. We have

    E_x[ X*_{s,t} | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ]
        = VaR*^{s,t}_{m,α}(X) + ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) dz.

From Lemma 5.1 and Lemma 3.1, we have

    ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) dz
        ≤ (1/α) ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} P_m( X*_{t−s} ≥ z ) dz
        ≤ (2/α) ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} Φ̄( (z − m − b*(t − s))⁺ / (√(a*(t − s)) z^γ) ) dz.







Remark: This estimate seems complicated but, from a numerical point of view, it is easy to evaluate: since we assumed m > 0, we have VaR*^{s,t}_{m,α}(X) ≥ 0, and integrating by parts we obtain

    R ≤ K ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} ( (1 − γ) z^{1−γ} + γ(m + b*(t − s)) z^{−γ} ) e^{−(z − m − b*(t−s))² / (2a*(t−s)z^{2γ})} dz
        − (2 VaR*^{s,t}_{m,α}(X) / α) Φ̄( (VaR*^{s,t}_{m,α}(X) − m − b*(t − s)) / (√(a*(t − s)) VaR*^{s,t}_{m,α}(X)^γ) ),

where

    K = 2 / (α √(2π a*(t − s))).

Moreover, if γ = 0, we have a more tractable formula:

. α 2πa∗ (t − s) Moreover, if γ = 0, we have a more tractable formula: K=

Corollary 5.4. Let us assume (UB), γ = 0, then the constant R as in Proposition 5.3 satisfies # s,t ∗ (t−s))2 2 a∗ (t − s) − (VaRm,α (X)−m−b 2a∗ (t−s) √ R≤ e α 2π   s,t s,t VaR m,α (X) − m − b∗ (t − s) VaR m,α (X) − m − b∗ (t − s) ¯ # Φ . −2 α a∗ (t − s)



Proof. If γ = 0, we have

    R ≤ (2/α) ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} Φ̄( (z − m − b*(t − s)) / √(a*(t − s)) ) dz
      = (2/α) √(a*(t − s)) ∫_{(VaR*^{s,t}_{m,α}(X) − m − b*(t−s))/√(a*(t−s))}^{+∞} Φ̄(u) du,

which yields the result.

One can also consider the risk at time t as defined by Talay and Zheng, and so define

    VVaR^{s,t}_{m,α}(X) = inf{ z ∈ IR : P_x( X_t < z | X_t ≥ VaR^{s,t}_{m,α}(X), X_s = m ) ≥ q }.

The same arguments as those we used throughout this section yield:

Lemma 5.5. For all 0 < s < t, we have

    VVaR^{s,t}_{m,α}(X) ≤ VaR^{s,t}_{m,α²}(X),

with equality if P_m( X_{t−s} = VaR^{s,t}_{m,α}(X) ) = 0.
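As a numerical sanity check (not from the paper), the γ = 0 closed form of Corollary 5.4 above can be compared against direct quadrature of the integral bound of Proposition 5.3; all parameter values below are hypothetical:

```python
import math
from statistics import NormalDist

ND = NormalDist()
phi_bar = lambda x: 1.0 - ND.cdf(x)

alpha, m, b_up, a_up, dt = 0.05, 1.0, 0.1, 0.04, 1.0   # hypothetical parameters, gamma = 0
V = 1.8                                                # a level >= m + b* dt, standing in for VaR

# closed-form bound of Corollary 5.4
d = (V - m - b_up * dt) / math.sqrt(a_up * dt)
closed = (2.0 / alpha) * (math.sqrt(a_up * dt / (2.0 * math.pi)) * math.exp(-d * d / 2.0)
                          - (V - m - b_up * dt) * phi_bar(d))

# midpoint-rule evaluation of (2/alpha) * integral_V^inf PhiBar((z - m - b* dt)/sqrt(a* dt)) dz
n = 20000
upper_cut = V + 20.0 * math.sqrt(a_up * dt)   # tail beyond this is negligible
h = (upper_cut - V) / n
grid = ((2.0 / alpha) * h *
        sum(phi_bar((V + (i + 0.5) * h - m - b_up * dt) / math.sqrt(a_up * dt))
            for i in range(n)))
```

The agreement reflects the identity ∫_d^∞ Φ̄(u) du = φ(d) − d Φ̄(d) used in the proof.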

Proposition 5.6. If VaR^{s,t}_{m,α}(X) ≥ m, then

    E_x[ X_t | X_t ≥ VaR^{s,t}_{m,α}(X), X_s = m ] ≤ VaR^{s,t}_{m,α}(X) + R,

where

    R ≤ (2/α) ∫_{VaR^{s,t}_{m,α}(X)}^{+∞} Φ̄( (z − m − b*(t − s))⁺ / (√(a*(t − s)) |z|^γ) ) dz.

Example 1: Consider now the case where X is a Geometric Brownian Motion, i.e. the coefficients σ and b are constant and do not depend on t. Then we have the following estimate:

Proposition 5.7. If VaR*^{s,t}_{m,α}(X) ≥ m, then for all 0 ≤ s < t

    E_x( X*_{s,t} | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) = VaR*^{s,t}_{m,α}(X) + R,

where

    R ≤ (2/α) ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} Φ̄( (ln z − ln m − (b − σ²/2)⁺(t − s))⁺ / √(σ²(t − s)) ) dz
      ≤ (2/α) K Φ̄( (ln VaR*^{s,t}_{m,α}(X) − ln m − (b − σ²/2)⁺(t − s) − σ²(t − s)) / √(σ²(t − s)) )
        − (2/α) VaR*^{s,t}_{m,α}(X) Φ̄( (ln VaR*^{s,t}_{m,α}(X) − ln m − (b − σ²/2)⁺(t − s)) / √(σ²(t − s)) ),

where K = e^{σ²(t−s)/2 + ln m + (b − σ²/2)⁺(t−s)}.



Proof. As in the previous proofs, we start with

    E_x( X*_{s,t} | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m )
        = VaR*^{s,t}_{m,α}(X) + ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m ) dz.

For all z ≥ VaR*^{s,t}_{m,α}(X), thanks to Proposition 3.4 we have

    P_x( X*_{s,t} ≥ z | X*_{s,t} ≥ VaR*^{s,t}_{m,α}(X), X_s = m )
        ≤ (1/α) P_m( X*_{t−s} ≥ z ) = (1/α) P_m( Y*_{t−s} ≥ ln z )
        ≤ (2/α) Φ̄( (ln z − ln m − (b − σ²/2)⁺(t − s))⁺ / √(σ²(t − s)) ).

This yields

    ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} P_x( X*_{s,t} ≥ z ) dz
        ≤ (2/α) ∫_{VaR*^{s,t}_{m,α}(X)}^{+∞} Φ̄( (ln z − ln m − (b − σ²/2)⁺(t − s))⁺ / √(σ²(t − s)) ) dz
        = (2/α) ∫_{ln VaR*^{s,t}_{m,α}(X)}^{+∞} e^u Φ̄( (u − ln m − (b − σ²/2)⁺(t − s)) / √(σ²(t − s)) ) du
        = (2/α) [ − VaR*^{s,t}_{m,α}(X) Φ̄( (ln VaR*^{s,t}_{m,α}(X) − ln m − (b − σ²/2)⁺(t − s)) / √(σ²(t − s)) )
            + (1/√(2π σ²(t − s))) ∫_{ln VaR*^{s,t}_{m,α}(X)}^{+∞} e^u e^{−(u − ln m − (b − σ²/2)⁺(t−s))² / (2σ²(t−s))} du ].

One can verify that

    (1/√(2π σ²(t − s))) ∫_{ln VaR*^{s,t}_{m,α}(X)}^{+∞} e^u e^{−(u − ln m − (b − σ²/2)⁺(t−s))² / (2σ²(t−s))} du
        = K Φ̄( (ln VaR*^{s,t}_{m,α}(X) − ln m − (b − σ²/2)⁺(t − s) − σ²(t − s)) / √(σ²(t − s)) ),

where K = e^{σ²(t−s)/2 + ln m + (b − σ²/2)⁺(t−s)};

this ends the proof.

5.2. Second Order VaR conditioned on the past

In this Section we want to measure the VaR of X*_{s,t} given that the process X exceeded VaR before time s. This would help to know how risky the future is (up to time t > s), given that VaR has already been exceeded in the past. In a natural way we define

    E-VaR^{s,t}_{x,α}(X) = inf{ z ∈ IR : P_x( X*_{s,t} < z | X_s ≥ VaR^{0,s}_{x,α}(X) ) ≥ q }.



Theorem 5.8. Let us assume (UB). Then

    E-VaR^{s,t}_{x,α}(X) ≤ r,

where r is the unique root on [b* t + VaR^{0,s}_{x,α}(X), +∞[ of the equation

(6)    z − |z|^γ √(a* t) Φ̄⁻¹(α/2) − VaR^{0,s}_{x,α}(X) − b* t = 0.

Proof. Let z > VaR^{0,s}_{x,α}(X). Then

    P_x( X*_{s,t} ≥ z | X_s ≥ VaR^{0,s}_{x,α}(X) ) = P_x( X*_{s,t} ≥ z, X_s ≥ VaR^{0,s}_{x,α}(X) ) / P_x( X_s ≥ VaR^{0,s}_{x,α}(X) ).

We now introduce

    T = inf{ u > 0 : X_u = VaR^{0,s}_{x,α}(X) },

and denote by μ the law of T under P_x. Then

    P_x( X*_{s,t} ≥ z, X_s ≥ VaR^{0,s}_{x,α}(X) ) = ∫_0^s P_x( X*_{s,t} ≥ z | T = r ) μ(dr)
        ≤ ∫_0^s P_x( X*_{r,t} ≥ z | T = r ) μ(dr)
        = ∫_0^s P_{VaR^{0,s}_{x,α}(X)}( X*_{t−r} ≥ z ) μ(dr)
        ≤ ∫_0^s 2Φ̄( (z − VaR^{0,s}_{x,α}(X) − b*(t − r))⁺ / (√(a*(t − r)) |z|^γ) ) μ(dr)
        ≤ 2Φ̄( (z − VaR^{0,s}_{x,α}(X) − b* t)⁺ / (√(a* t) |z|^γ) ) P_x( T ≤ s )
        = 2Φ̄( (z − VaR^{0,s}_{x,α}(X) − b* t)⁺ / (√(a* t) |z|^γ) ) P_x( X_s ≥ VaR^{0,s}_{x,α}(X) ).

So,

    P_x( X*_{s,t} ≥ z | X_s ≥ VaR^{0,s}_{x,α}(X) ) ≤ 2Φ̄( (z − VaR^{0,s}_{x,α}(X) − b* t)⁺ / (√(a* t) |z|^γ) ),

and we conclude as in Theorem 4.1.

Remark: In other words, the bound we obtain is the same as the one obtained for VaR*^{0,t}_{VaR^{0,s}_{x,α}(X),α}(X).

As previously, for γ = 0 or γ = 1/2 we are able to calculate this bound, and this yields:

Corollary 5.9. (i) If γ = 0, that is, σ bounded, we have the estimate

    E-VaR^{s,t}_{x,α}(X) ≤ VaR^{0,s}_{x,α}(X) + b* t + √(a* t) Φ̄⁻¹(α/2).

(ii) If γ = 1/2, we have

    E-VaR^{s,t}_{x,α}(X) ≤ VaR^{0,s}_{x,α}(X) + b* t + (1/2) a* t (Φ̄⁻¹)²(α/2)
        + (1/2) Φ̄⁻¹(α/2) √(a* t) √( a* t (Φ̄⁻¹)²(α/2) + 4( VaR^{0,s}_{x,α}(X) + b* t ) ).

314

Laurent Denis, Begoña Fernández, and Ana Meda

References

[BaTa1] Bally, V. and Talay, D. (1996). The law of the Euler scheme for stochastic differential equations. I. Convergence rate of the distribution function. Probab. Theory Related Fields 104 (1) 43–60.
[BaTa2] Bally, V. and Talay, D. (1996). The law of the Euler scheme for stochastic differential equations. II. Convergence rate of the density. Monte Carlo Methods Appl. 2 (2) 93–128.
[BoKl] Borkovec, M. and Klüppelberg, C. (1998). Extremal behavior of diffusion models in finance. Extremes 1 (1) 47–80.
[Dav] Davis, R. (1982). Maximum and minimum of one-dimensional diffusions. Stoch. Proc. Appl. 13 1–9.
[DFM] Denis, L., Fernández, B. and Meda, A. (2006). Estimation of Value at Risk for diffusion processes with jumps and their ruin probabilities. Preprint.
[EKM] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1999). Modelling Extremal Events for Insurance and Finance. Springer Verlag, Berlin, Heidelberg, New York.
[IkWa] Ikeda, N. and Watanabe, S. (1981). Stochastic Differential Equations and Diffusion Processes. North-Holland, Tokyo.
[KarShr] Karatzas, I. and Shreve, S. (1991). Brownian Motion and Stochastic Calculus. Springer Verlag, Berlin, Heidelberg, New York.
[McNeil] McNeil, A. (2000). Extreme value theory for risk managers. In Extremes and Integrated Risk Management (P. Embrechts, ed.) 3–18. Risk Books, Risk Waters Group, London.
[ReYor] Revuz, D. and Yor, M. (1994). Continuous Martingales and Brownian Motion. Springer Verlag, Berlin, Heidelberg, New York.
[TaZh] Talay, D. and Zheng, Z. (2003). Quantiles of the Euler scheme for diffusion processes and financial applications. Conference on Applications of Malliavin Calculus in Finance (Rocquencourt, 2001). Math. Finance 13 (1) 187–199.

IMS Collections Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz Vol. 4 (2008) 315–324 c Institute of Mathematical Statistics, 2008  DOI: 10.1214/074921708000000453

A Duality Identity between a Model of Bacterial Recombination and the Wright–Fisher Diffusion Xavier Didelot,1 Jesse E. Taylor,2 and Joseph C. Watkins3,∗ University of Oxford and University of Arizona Abstract: In this article, we establish, using a duality argument, an identity stating that the Laplace transform of the length of a contiguous bacterial recombination region equals the probability of choosing a given allele in a stationary population evolving according to the one-dimensional Wright–Fisher diffusion model. Beyond giving us an improved inferential strategy for parameter estimation in bacterial recombination, the matching of the selection and recombination parameters in the identity also suggests the existence of an intriguing formal relationship between gene conversion and the ancestral selection graph.

1. Introduction

Bacterial genomes are made up of one or a handful of chromosomes which are usually circular, with sizes ranging from 140 Kb for the endosymbiont Carsonella (Nakabachi et al. 2006) to over 12 Mb for the myxobacterium Sorangium cellulosum (Pradella et al. 2002). Recombination is not obligate in bacteria but has been shown to happen frequently in nature in a variety of species (e.g. Maynard Smith et al. 1993, Guttman and Dykhuizen 1994, Feil et al. 2001). Bacterial sex is analogous to gene conversion rather than crossing-over, in the sense that there are always a clear recipient and a clear donor cell, and the resulting bacterium has the same DNA as the recipient for all of its genome except for a small contiguous segment where it is identical to the donor. The average tract length of the imported regions has previously been estimated to be of the order of 1000 bp in several species (Milkman and Bridges 1990, Jolley et al. 2005, Fearnhead et al. 2005). When modeling recombination, the tract length distribution is usually assumed to be geometric (or exponential) with a mean estimated from the data (Falush et al. 2001, McVean et al. 2002, Falush et al. 2003, Suchard et al. 2003). The same assumption is usually made when modeling gene conversion in eukaryotes (Wiuf and Hein 2000, Frisse et al. 2001).

∗ Completed during JCW's visit to University of Oxford. JCW would like to take this opportunity to acknowledge the Department of Statistics and the Mathematical Genetics group for their hospitality. This research was supported in part by National Science Foundation grant BCS-0432262.
1 Department of Statistics, 1 South Parks Road, University of Oxford, Oxford OX1 3TG, UK, e-mail: [email protected]
2 Department of Statistics, 1 South Parks Road, University of Oxford, Oxford OX1 3TG, UK, e-mail: [email protected]
3 Joseph C. Watkins, Department of Mathematics, University of Arizona, 617 North Santa Rita Road, Tucson, Arizona 85716, USA, e-mail: [email protected]
AMS 2000 subject classifications: Primary 92D10; secondary 60K25
Keywords and phrases: bacterial recombination, gene conversion, ancestral selection graph, Wright–Fisher diffusion, M/M/∞ queue, duality identity




Most previous methods estimating the recombination rate and tract length assume that each imported region on the genome is due to exactly one recombination event. However, as the rate of recombination, the average tract length and the time of exposure to recombination increase, so does the probability that several recombination events overlap, meaning that the intersection of chromosomal positions affected by at least two recombination events is non-empty (cf. Figure 1). If the sequences imported by different recombination events cannot be distinguished (for example, in the case of inter-population recombination as in Falush et al. (2003)), it is possible to observe which regions of the genome have been imported (shown in grey on Figure 1), but not the exact starting point and tract length of individual recombination events. Thus, to do inference on the parameters governing the recombination process itself, we need to know how the distribution of the length of contiguous imported regions is related to the initiation rate and tract length distribution of individual recombination events. In particular, the derivations described here are used in the computer package ClonalFrame, which infers bacterial microevolution using multilocus sequence data (Didelot and Falush 2007). Here we consider the genome to be continuous and of size L. Let ρ/2, μ⁻¹ and δ denote the rate of recombination per genome, the average recombination tract length and the time of exposure of the genome to recombination, respectively. We assume that recombination is uniformly likely to be initiated at any position of the genome, so that the rate of initiation per unit length is λ = (ρ/2)(δ/L), and that the tract length of a single recombination event is exponentially distributed with mean μ⁻¹.
We will also suppose that recombination events are initiated at their upstream boundaries and then proceed downstream; an equivalent result would be obtained if we allowed events to be initiated downstream and proceed upstream or even if we allowed the orientation to be determined at random but independently of all other recombination events. We would obtain a different process if we allowed recombination to proceed in both directions from the initiation point. Because we are concerned with genomes which are large in comparison with the total amount of material likely to have been imported, we will ignore the possibility of wrap-around recombination events in circular genomes or edge effects in linear genomes. With these assumptions in mind, we can model the distribution of imported material in the genome as being generated by a Poisson point process on R, with intensity measure λ dx, which determines the location of the recombination initiation points, each of which is the left end point of an interval of exponentially distributed length. Our problem is to determine the distribution of the length of maximally overlapping intervals as a function of λ and μ. To do so, we first observe that this interval-valued process is related to a queue. Indeed, each recombination event (interval) can be identified with a customer who stays in the queue for an exponentially distributed period of time with mean μ−1 . Since prior recombination events do not alter the tract length of subsequent recombination events starting in the same region, the queue can be said to have an infinite number of servers. Using this analogy, the distribution of the length of imported regions is the same as that of the busy period of an M/M/∞ queue (i.e. the contiguous periods of time when there is at least one customer in the system) with arrival rate λ and mean service time requirement μ−1 . The length of non-imported regions is distributed as the idle periods of that same queue (i.e. 
the contiguous periods of time when there is no customer in the system). Figure 2 shows the cumulative density functions of the busy periods of an M/M/∞ queue for different values of λ/μ estimated using Monte-Carlo simulations. Although the Laplace transform of the busy period queue has been determined and

A Model of Bacterial Recombination




Fig 1. Illustration of the effect of bacterial recombination. The circle represents the bacterial genome and the bold arcs around it represent the different recombination events. The fragments of the genome affected by recombination are in grey.
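The busy-period distribution discussed above (and estimated by Monte-Carlo for Fig 2) can be reproduced with a short simulation of the queue's jump chain. This sketch is not from the paper; the closing check uses the standard M/M/∞ busy-period mean (e^{λ/μ} − 1)/λ:

```python
import random

def busy_period(lam, mu, rng):
    """One busy period of an M/M/infinity queue: time for the chain started
    with 1 customer to hit 0 (arrival rate lam, each customer served at rate mu)."""
    n, t = 1, 0.0
    while n > 0:
        rate = lam + n * mu                  # total jump rate in state n
        t += rng.expovariate(rate)           # holding time
        n += 1 if rng.random() < lam / rate else -1
    return t

rng = random.Random(42)
samples = [busy_period(1.0, 1.0, rng) for _ in range(20000)]
mean_busy = sum(samples) / len(samples)
# For lam = mu = 1 the expected busy period is (e - 1), roughly 1.718.
```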



developed by Guillemin and Simonian (1995) and Preater (1997), and is the subject of recent work by Roijers et al. (2006), we provide an alternative derivation of their results which exploits the method of duality. This approach is of interest because it proceeds via the Wright–Fisher diffusion and its moment dual, two stochastic processes which are at the heart of theoretical population genetics.

2. M/M/∞ queues and the Wright–Fisher diffusion

We first recall that the M/M/∞ queue with arrival rate λ and mean waiting time μ⁻¹ is a Markov process (M_t, t ≥ 0) with state space N = {0, 1, 2, ...} and generator

(1)    G_M φ(n) = nμ( φ(n − 1) − φ(n) ) + λ( φ(n + 1) − φ(n) ),

for any bounded function φ : N → R. Because M_t satisfies the strong Markov property, the busy period has the same distribution as the stopping time τ_M = inf{t > 0 : M_t = 0} if M_0 = 1. To determine the Laplace transform of τ_M, we will exploit the fact that a simple time change of M_t leads to a moment dual for the Wright–Fisher diffusion. To see that this is true, let (p_t, t ≥ 0) be a Wright–Fisher diffusion with state space [0, 1] and generator

Gp φ(p) =

  1 p(1 − p)φ (p) + ν1 (1 − p) − ν2 p − σp(1 − p) φ (p). 2

for any twice continuously differentiable function φ : [0, 1] → R. As shown in Ethier and Kurtz (1986, Chapter 10), this diffusion process arises as the weak limit of a sequence of suitably scaled Markov chains which model the effects of genetic drift, mutation, and selection on the relative frequency p ∈ [0, 1] of an allele A1 in a finite population segregating two alleles, A1 and A2. On the diffusive time scale we assume that A1 mutates to A2 at rate ν2, that A2 mutates to A1 at rate ν1, and that A2 has selective advantage σ ≥ 0 over A1. We also note that if ν1 > 0 and ν2 > 0, then pt has a unique stationary distribution with density

(3)  π(p) = C p^{2ν1−1} (1 − p)^{2ν2−1} e^{−2σp},

where C is a normalizing constant (Ethier and Kurtz 1986, Chapter 10, Lemma 2.1). If we let (Nt, t ≥ 0) be a pure-jump Markov process on N corresponding to the generator

(4)  G_N φ(n) = (1/2) n(n − 1)(φ(n − 1) − φ(n)) + nν1(φ(n − 1) − φ(n)) + nσ(φ(n + 1) − φ(n)),

and we set f(p, n) = p^n, then

(5)  G_p f(p, n) = G_N f(p, n) − ν2 n f(p, n).

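Identity (5) can be checked mechanically: differentiate f(p, n) = p^n to apply the diffusion generator, apply the jump generator, and compare. A minimal Python sketch of this check, with arbitrarily chosen test parameters, is:

```python
# Numerical check of the duality identity (5): for f(p, n) = p**n,
# G_p f  ==  G_N f - nu2 * n * f  pointwise.

def Gp_f(p, n, nu1, nu2, sig):
    # Generator (2) applied to p -> p**n:
    # (1/2) p (1-p) f''(p) + (nu1 (1-p) - nu2 p - sig p (1-p)) f'(p)
    fpp = n * (n - 1) * p ** (n - 2) if n >= 2 else 0.0
    fp = n * p ** (n - 1)
    return 0.5 * p * (1 - p) * fpp \
        + (nu1 * (1 - p) - nu2 * p - sig * p * (1 - p)) * fp

def GN_f(p, n, nu1, sig):
    # Generator (4) applied to n -> p**n: coalescence and mutation move
    # n down by one, selective branching moves n up by one.
    down = p ** (n - 1) - p ** n
    up = p ** (n + 1) - p ** n
    return 0.5 * n * (n - 1) * down + n * nu1 * down + n * sig * up

nu1, nu2, sig = 0.5, 0.7, 1.3   # arbitrary test values
for n in range(1, 7):
    for p in (0.2, 0.5, 0.9):
        lhs = Gp_f(p, n, nu1, nu2, sig)
        rhs = GN_f(p, n, nu1, sig) - nu2 * n * p ** n
        assert abs(lhs - rhs) < 1e-12
print("identity (5) holds at all sampled points")
```

The check passes for every n and p sampled, reflecting the algebraic cancellation behind (5).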
Since all of the terms appearing in Eq. (5) are bounded, Theorem 4.11 and Corollary 4.13 of Ethier and Kurtz (1986, Chapter 4) imply that pt and Nt are related by the following duality identity:

(6)  E_p[p_t^{n0}] = E_{n0}[p^{N_t} e^{−ν2 ∫_0^t N_s ds}],


Fig 2. Cumulative distribution function of the busy period of an M/M/∞ queue with mean service time μ−1 = 1 and different values of the arrival rate λ.

which holds for all p ∈ [0, 1], all n0 ∈ N, and all t ≥ 0. Furthermore, because Nt almost surely absorbs at 0 at some finite time T = inf{t > 0 : Nt = 0} (see for example Donnelly and Kurtz 1999), it follows that the right-hand side, and therefore also the left-hand side, of Eq. (6) converges as t → ∞. Since pt takes values in a compact space, this fact, along with the uniqueness of the stationary measure π(p) dp, implies that the law of pt tends weakly to π(p) dp as t → ∞ and leads to the following equation for the moments of the stationary measure:

(7)  ∫_0^1 p^{n0} π(p) dp = E_{n0}[e^{−ν2 ∫_0^T N_s ds}].

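Empirical curves of the kind shown in Figure 2 can be reproduced with a short Gillespie simulation of the queue; the Python sketch below is an illustration under the stated model, not the code used for the figure:

```python
import random

def busy_period(lam: float, mu: float, rng: random.Random) -> float:
    """Simulate one busy period: start with a single customer and run the
    M/M/inf dynamics (arrivals at rate lam, departures at rate n * mu)
    until the system empties."""
    n, t = 1, 0.0
    while n > 0:
        rate = lam + n * mu            # total jump rate in state n
        t += rng.expovariate(rate)     # exponential holding time
        if rng.random() < lam / rate:
            n += 1                     # arrival
        else:
            n -= 1                     # service completion
    return t

def empirical_cdf(samples, x):
    return sum(s <= x for s in samples) / len(samples)

rng = random.Random(1)
for lam in (0.5, 1.0, 2.0):            # mu = 1, as in Fig 2
    samples = [busy_period(lam, 1.0, rng) for _ in range(10000)]
    print(lam, [round(empirical_cdf(samples, x), 2) for x in (1.0, 2.0, 4.0)])
```

The sample mean of these busy periods converges to (e^{λ/μ} − 1)/λ, the closed form obtained below in Eq. (16).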
Now let T_1, · · · , T_J be the jump times of the process Nt, where T_J = T and T_0 = 0, so that J is the number of jumps taken by the process until it absorbs at 0, and let n_k = N_{T_k} be the state occupied by Nt immediately following the k-th jump. Conditional on n_{k−1} = n, the holding time T_k − T_{k−1} is exponentially distributed with parameter (1/2)n(n − 1) + nν1 + nσ, and is conditionally independent of the other states n_j and of the other holding times T_j − T_{j−1}, j ≠ k. Equation (7) can be rewritten as:

(8)  ∫_0^1 p^{n0} π(p) dp = E_{n0}[ ∏_{k=1}^{J} e^{−ν2 n_{k−1}(T_k − T_{k−1})} ]
                          ≡ E_{n0}[ ∏_{k=1}^{J} e^{−ν2 (τ_k − τ_{k−1})} ]
                          ≡ E_{n0}[ e^{−ν2 τ} ],

where, conditional on n_{k−1} = n, the increment τ_k − τ_{k−1} is equal in distribution to n_{k−1}(T_k − T_{k−1}) and hence exponentially distributed with parameter (1/2)(n − 1) + ν1 + σ, with the same independence structure as above, and τ ≡ τ_J.


Introducing the process Vt with generator

(9)  G_V φ(n) = (1/2)(n − 1)(φ(n − 1) − φ(n)) + ν1(φ(n − 1) − φ(n)) + σ(φ(n + 1) − φ(n)),

we see that Vt and Nt differ only by a time change, that τ_k is equal in distribution to the time of the k-th jump of Vt, and that τ is equal in distribution to inf{t > 0 : Vt = 0}. It follows from Eq. (8) that

(10)  ∫_0^1 p^{n0} π(p) dp = E_{n0}[e^{−ν2 τ}].

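As a sanity check on Eq. (10), one can simulate the absorption time τ of Vt directly and compare E[e^{−ν2 τ}] with a numerical quadrature of the stationary moment. The Python sketch below uses arbitrary test parameters, not values taken from the paper:

```python
import math
import random

def tau_V(n0: int, nu1: float, sig: float, rng: random.Random) -> float:
    """Absorption time tau = inf{t : V_t = 0} for the process of Eq. (9):
    from state n, jump down at rate (n - 1)/2 + nu1 and up at rate sig."""
    n, t = n0, 0.0
    while n > 0:
        down = 0.5 * (n - 1) + nu1
        total = down + sig
        t += rng.expovariate(total)
        n += -1 if rng.random() < down / total else 1
    return t

def stationary_moment(n0, nu1, nu2, sig, panels=20000):
    """Midpoint-rule approximation of int_0^1 p^{n0} pi(p) dp for the
    stationary density (3), normalization handled by the ratio."""
    num = den = 0.0
    for i in range(panels):
        p = (i + 0.5) / panels
        w = p ** (2 * nu1 - 1) * (1 - p) ** (2 * nu2 - 1) * math.exp(-2 * sig * p)
        num += p ** n0 * w
        den += w
    return num / den

# Arbitrary test parameters (nu1, nu2 > 0 so that pi is integrable).
nu1, nu2, sig, n0 = 1.0, 0.8, 0.6, 2
rng = random.Random(2)
runs = 100000
mc = sum(math.exp(-nu2 * tau_V(n0, nu1, sig, rng)) for _ in range(runs)) / runs
print(mc, stationary_moment(n0, nu1, nu2, sig))   # the two numbers should agree
```

The Monte Carlo average and the quadrature agree to within sampling error, as Eq. (10) predicts.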
Finally, to relate this result to the original M/M/∞ queue corresponding to (1), observe that by taking ν1 = 1/2 and σ = λ/(2μ) we will have M(t) equal in distribution to V(2μt), and therefore τM equal in distribution to τ/(2μ). Thus, if n0 = 1 (so that the busy period is initiated by the arrival of a single customer), then, taking ν2 = α/(2μ) in Eqs. (3) and (10), we obtain the following equation for the Laplace transform, ψ(α), of the busy period τM:

(11)  ψ(α) ≡ E_1[e^{−ατM}] = E_1[e^{−(α/2μ)τ}] = ∫_0^1 p(1 − p)^{α/μ−1} e^{−λp/μ} dp / ∫_0^1 (1 − p)^{α/μ−1} e^{−λp/μ} dp.

This Laplace transform uniquely specifies the distribution of τM. In particular, the moments of τM can be calculated by evaluating the derivatives of the moment generating function: ψ^(k)(0) = (−1)^k E_1[τM^k].

3. Discussion

We now return to the statistical problem of estimating the parameters λ and μ−1 of the recombination process when all that is known is which regions of the genome have been imported. Since the mean length of non-imported regions is λ−1, the maximum likelihood estimator of λ is the inverse of the mean length of the non-imported regions of the bacterial genome. A second parameter estimate can be obtained by taking the inverse Laplace transform of the busy period distribution of τM and maximizing the likelihood. More generally, we can take the lengths of busy and idle periods and maximize the product of their likelihoods over different values of λ and μ−1. A simpler approach is to differentiate the Laplace transform and perform a method of moments estimate following the strategy in Section 5 of Roijers et al. (2007). Defining

(12)  I(a, b) ≡ ∫_0^1 (1 − p)^{a−1} e^{−bp} dp = e^{−b} ∫_0^1 p^{a−1} e^{bp} dp
             = e^{−b} Σ_{k=0}^∞ (b^k/k!) ∫_0^1 p^{k+a−1} dp = e^{−b} Σ_{k=0}^∞ (b^k/k!) · 1/(k + a) ≡ e^{−b} S(a, b),

and noting that

(13)  ∫_0^1 p(1 − p)^{a−1} e^{−bp} dp = e^{−b} ∫_0^1 (1 − p) p^{a−1} e^{bp} dp = I(a, b) − I(a + 1, b),


it follows that

(14)  ψ(α) = (I(α/μ, λ/μ) − I(α/μ + 1, λ/μ)) / I(α/μ, λ/μ) = 1 − S(α/μ + 1, λ/μ) / S(α/μ, λ/μ).

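Expression (14) lends itself to a direct numerical check against the integral form (11). The sketch below evaluates the series S(a, b) and a midpoint-rule quadrature of (11); it is an illustration, restricted to α ≥ μ so that the integrand is bounded:

```python
import math

def S(a: float, b: float, kmax: int = 80) -> float:
    """Series S(a, b) = sum_{k>=0} b^k / (k! (k + a)) from Eq. (12)."""
    total, term = 0.0, 1.0            # term holds b^k / k!
    for k in range(kmax):
        total += term / (k + a)
        term *= b / (k + 1)
    return total

def psi_series(alpha, lam, mu):
    # Eq. (14): psi(alpha) = 1 - S(alpha/mu + 1, lam/mu) / S(alpha/mu, lam/mu)
    return 1.0 - S(alpha / mu + 1.0, lam / mu) / S(alpha / mu, lam / mu)

def psi_quadrature(alpha, lam, mu, panels=20000):
    # Midpoint rule for the ratio of integrals in Eq. (11); for alpha >= mu
    # the integrands are bounded on [0, 1].
    num = den = 0.0
    for i in range(panels):
        p = (i + 0.5) / panels
        w = (1.0 - p) ** (alpha / mu - 1.0) * math.exp(-lam * p / mu)
        num += p * w
        den += w
    return num / den

for alpha in (1.0, 2.0, 3.0):          # lam = mu = 1
    print(alpha, psi_series(alpha, 1.0, 1.0), psi_quadrature(alpha, 1.0, 1.0))
```

For α = λ = μ = 1 the series reduces to S(1, 1) = e − 1 and S(2, 1) = 1, so ψ(1) = 1 − 1/(e − 1) ≈ 0.418, which the quadrature reproduces.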
To find the expected duration of the busy period, we use Equations (12) and (14) to calculate

(15)  ψ′(α) = − ∂αS(α/μ + 1, λ/μ) / S(α/μ, λ/μ) + (∂αS(α/μ, λ/μ) / (S(α/μ, λ/μ))²) S(α/μ + 1, λ/μ).

For the first term, the numerator is bounded and the denominator diverges like μ/α as α → 0; thus this term has limit 0. For the fraction in the second term, the singular part of the numerator for small α is −μ/α² and that of the denominator is (μ/α)²; thus this fraction has limit −1/μ as α → 0. Consequently,

(16)  E_1[τM] = −ψ′(0) = (1/μ) S(1, λ/μ) = (1/μ) Σ_{k=0}^∞ (λ/μ)^k / ((k + 1)k!) = (e^{λ/μ} − 1)/λ.

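Eq. (16), together with the fact that idle periods have mean 1/λ, suggests a simple moment-based estimation scheme. The Python sketch below simulates alternating idle and busy periods and recovers λ and μ by inverting (16) with bisection; the helper names and the bracket [0.01, 100] for μ are our own assumptions, not part of the paper:

```python
import math
import random

def simulate_periods(lam, mu, n_busy, rng):
    """Alternating idle and busy periods of an M/M/inf queue (Gillespie)."""
    busy, idle = [], []
    for _ in range(n_busy):
        idle.append(rng.expovariate(lam))      # idle period ~ Exp(lam)
        n, t = 1, 0.0                          # busy period starts with one customer
        while n > 0:
            rate = lam + n * mu
            t += rng.expovariate(rate)
            n += 1 if rng.random() < lam / rate else -1
        busy.append(t)
    return busy, idle

def estimate(busy, idle):
    """Method-of-moments estimates: lam from the mean idle length, then mu
    by solving (e^{lam/mu} - 1)/lam = mean busy length with bisection.
    The bracket [0.01, 100] is an assumption on the true mu."""
    lam_hat = len(idle) / sum(idle)
    target = sum(busy) / len(busy)
    lo, hi = 0.01, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (math.exp(lam_hat / mid) - 1.0) / lam_hat > target:
            lo = mid      # predicted busy periods too long: mu must be larger
        else:
            hi = mid
    return lam_hat, 0.5 * (lo + hi)

busy, idle = simulate_periods(1.0, 2.0, 50000, random.Random(3))
print(estimate(busy, idle))   # should be close to the true values (1.0, 2.0)
```

The bisection works because the mean busy period (e^{λ/μ} − 1)/λ is strictly decreasing in μ.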
This identity allows us to use the sample mean of the busy periods to estimate the average recombination tract length μ−1 (Didelot and Falush 2007). Alternatively, this expression can be derived from the detailed balance condition satisfied by the stationary distribution of the queue.

The Wright–Fisher diffusion describes the forward evolution of a population subject to random genetic drift, mutation and selection. The moment duality established in the previous section is closely related to the ancestral selection graph (ASG, Krone and Neuhauser 1997, Neuhauser and Krone 1997), which characterizes the genealogy of a sample of genes collected from such a population. In particular, the duality calculation matches the selective advantage to a term proportional to the recombination rate. An example of an ancestral selection graph is shown in Figure 3: looking back in time, when k lineages are present, the rate of coalescence is k(k − 1)/2, as in the coalescent (Kingman 1982), and the rate of branching is σk/2. Coalescence represents two lineages finding a common ancestor (represented by two lines merging on the graph), and branching accounts for the unobserved selective deaths (represented by a lineage splitting into two on the graph). Exactly the same rates of coalescence and branching occur when considering recombination instead of selection, and the resulting graph is then called an ancestral recombination graph (ARG, Hudson 1983, Griffiths and Marjoram 1996). In the ARG, the rate of branching per lineage is usually denoted ρ/2, and branchings represent recombination events through which a lineage inherits ancestral material from two parents.

Eq. (7) has the following genealogical interpretation. Observe that ∫_0^1 p^{n0} π(p) dp is the probability that a sample of size n0 drawn from a stationary population evolving according to the Wright–Fisher diffusion consists only of individuals with allele A1. The process Nt can be thought of as a lines-of-descent modification of the ancestral selection graph (ASG, Krone and Neuhauser 1997), in which n lineages, all of type A1, are subject to the following events: pairs of lines coalesce at rate 1/2, individual lines each undergo selective branching at rate σ, and each line survives only until the most recent (forwards-in-time) A2-to-A1 mutation, which it encounters at rate ν1. The resulting disconnected graph is a subgraph of the ASG and does not contain complete information about the genealogy of the sample. The e^{−ν2 n_{k−1}(T_k − T_{k−1})} terms in Eq.


Fig 3. An example of an ancestral recombination/selection graph for n = 10 individuals and a rate of recombination/selection of ρ/2 = σ/2 = 1. Horizontal arrows represent donor/incoming branches.


(8) arise from our assumption that the sample consists of A1 individuals only and that the lines-of-descent survive only until the most recent mutation from A2; there is a factor of n_{k−1} in the exponent because there are n_{k−1} lines-of-descent between times T_{k−1} and T_k. Finally, τ is just the extinction time for the lines-of-descent process, i.e., the time when all lineages, ancestral and virtual, have been absorbed by A2 individuals.

While it is intriguing that, in attempting to understand a phylogenetic model of evolution describing the effects of gene conversion on genomic diversification, we have been led to a population-genetic model of the evolution of a non-neutral allele, further research will be necessary to determine whether this is coincidental or hints at some deeper connection. One good starting point would be the particle models of Donnelly and Kurtz (1999), which incorporate both ancestral selection graphs and ancestral recombination graphs into an ancestral inference graph.

References

[1] Didelot, X. and Falush, D. (2007). Inference of bacterial microevolution using multilocus sequence data. Genetics 175 1251–1266.
[2] Donnelly, P. and Kurtz, T. G. (1999). Genealogical processes for Fleming–Viot models with selection and recombination. Ann. Appl. Prob. 9 1091–1148.
[3] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. John Wiley & Sons, New York.
[4] Falush, D., Kraft, C., Taylor, N. S., Correa, P., Fox, J. G., Achtman, M. and Suerbaum, S. (2001). Recombination and mutation during long-term gastric colonization by Helicobacter pylori: Estimates of clock rates, recombination size, and minimal age. Proc. Natl. Acad. Sci. USA 98 15056–15061.
[5] Falush, D., Stephens, M. and Pritchard, J. K. (2003). Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics 164 1567–1587.
[6] Fearnhead, P., Smith, N. G., Barrigas, M., Fox, A. and French, N. (2005). Analysis of recombination in Campylobacter jejuni from MLST population data. J. Mol. Evol. 61 333–340.
[7] Feil, E. J., Holmes, E. C., Enright, M. C., Bessen, D. E., Day, N. P. J., Chan, M.-S., Hood, D. W., Zhou, J. and Spratt, B. G. (2001). Recombination within natural populations of pathogenic bacteria: Short-term empirical estimates and long-term phylogenetic consequences. Proc. Natl. Acad. Sci. USA 98 182–187.
[8] Frisse, L., Hudson, R. R., Bartoszewicz, A., Wall, J. D., Donfack, J. and Di Rienzo, A. (2001). Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69 831–843.
[9] Griffiths, R. C. and Marjoram, P. (1996). Ancestral inference from samples of DNA sequences with recombination. J. Computational Biology 3 479–502.
[10] Guillemin, F. and Simonian, A. (1995). Transient characteristics of an M/M/∞ system. Advances in Applied Probability 27 862–888.
[11] Guttman, D. S. and Dykhuizen, D. E. (1994). Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266 1380–1383.
[12] Hudson, R. R. (1983). Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23 183–201.


[13] Jolley, K. A., Wilson, D. J., Kriz, P., McVean, G. and Maiden, M. C. J. (2005). The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis. Mol. Biol. Evol. 22 562–569.
[14] Kingman, J. F. C. (1982). The coalescent. Stochastic Processes and their Applications 13 235–248.
[15] Krone, S. M. and Neuhauser, C. (1997). Ancestral processes with selection. Theor. Pop. Biol. 51 210–237.
[16] Maynard Smith, J., Smith, N. H., O'Rourke, M. and Spratt, B. (1993). How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90 4384–4388.
[17] McVean, G., Awadalla, P. and Fearnhead, P. (2002). A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160 1231–1241.
[18] Milkman, R. and Bridges, M. M. (1990). Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126 505–517.
[19] Nakabachi, A., Yamashita, A., Toh, H., Ishikawa, H., Dunbar, H. E., Moran, N. A. and Hattori, M. (2006). The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science 314 267.
[20] Neuhauser, C. and Krone, S. M. (1997). The genealogy of samples in models with selection. Genetics 145 519–534.
[21] Pradella, S., Hans, A., Spröer, C., Reichenbach, H., Gerth, K. and Beyer, S. (2002). Characterisation, genome size and genetic manipulation of the myxobacterium Sorangium cellulosum So ce56. Arch. Microbiol. 178 484–492.
[22] Preater, J. (1997). M/M/∞ transience revisited. Journal of Applied Probability 34 1061–1067.
[23] Roijers, F., Mandjes, M. and van den Berg, H. (2007). Analysis of congestion periods of an M/M/∞-queue. Performance Evaluation 64 737–754.
[24] Suchard, M. A., Weiss, R. E., Dorman, K. S. and Sinsheimer, J. S. (2003). Inferring spatial phylogenetic variation along nucleotide sequences: A multiple change-point model. Journal of the American Statistical Association 98 427–437.
[25] Wiuf, C. and Hein, J. (2000). The coalescent with gene conversion. Genetics 155 451–462.