Theoretical Computer Science 53 (1987) 99-124
North-Holland
RANDOM WALKS, GAUSSIAN PROCESSES AND LIST STRUCTURES
G. LOUCHARD
Laboratoire d'Informatique
Abstract. An asymptotic analysis of list structure properties leads to limiting Gaussian Markovian processes. Several cost functions are shown to have asymptotic normal distributions.
1. Introduction
List structures are well-known objects in Computer Science; let us mention dictionaries, priority queues, symbol tables, linear lists, stacks, etc. (see [7, Ch. IV] for a detailed description). We will consider here lists of length $2n$, i.e. initially of size 0 and returning to size 0 at step $2n$, on which some operations are performed, such as insertions, deletions, successful queries, and unsuccessful queries. Let us call $N_{2n}$ the total number of such lists, of any type, with all possible operations. We define a probability measure on lists by assigning to each history (sequence of values of the list and operations performed) the probability $1/N_{2n}$. Some cost functions can be defined on each history: storage costs (total integrated size on $[0, 2n]$) and time costs (total costs related to the operations performed). The time costs depend of course on the list implementation.

Only a few results are available on the probability distributions of these variables: the stack storage is analyzed in [14-16], where it is shown to be asymptotically ($n \to \infty$) equivalent to a Brownian excursion area. Flajolet, Puech and Vuillemin [9] obtain the exact mean and variance of storage and time costs for dictionaries and priority queues. The purpose of the present paper is to develop general techniques to derive asymptotic distributions of list structure cost functions. We actually obtain limiting Gaussian Markovian processes for the histories and normal variables for linear and polynomial cost functions.

The paper is organized as follows: in Section 2, we summarize basic notations. Section 3 describes some classical list structures and known results. Section 4 is devoted to the simplest list, the linear one, which we treat in some detail. In Section 5 we analyze priority queues and in Section 6 dictionaries. Section 7 concludes the paper. A preliminary presentation of our results is included in [17].

0304-3975/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)
2. Basic notations

The following notations will be used throughout the paper.

$2n$ := size of the structure,
LL := linear list,
PQ := priority queue,
D := dictionary,
$\sim$ := asymptotic to, for $n \to \infty$,
$\to$ := converges to, for $n \to \infty$,
$\Rightarrow$ := weak convergence of random functions in the space of all right continuous functions having left limits and endowed with the Skorohod metric (see [2, Chapter III]),
$M$ := mean of some random variable (RV),
$V$ := variance of some RV,
$\mu_k$ := $k$th moment of some RV,
Y, p, Y-, ? := classical random walks,
$Y^*$ := weighted random walk,
$E_a(B)$ := expectation of event $B$ for a random walk starting from $a$ at time 0,
$E^*(B)$ := expectation of event $B$ for a weighted random walk,
g(B) := expectation of event $B$ for a product of rectangular RV,
GF := generating function,
$\mathcal{N}(M, V)$ := the normal (or Gaussian) RV,
BM := the classical Brownian motion (see [12] for a good introduction and [13] for some complexity applications),
BE := the standard Brownian excursion (see [3]),
$X(\cdot)$ := Markovian, Gaussian process with zero mean,
$E_{2n}$ := the $2n$th Euler number (or secant number) with exponential GF: $\sum_{n \ge 0} E_{2n} z^{2n}/(2n)! = \sec z$ (see [7, p. 142]),

$$E_{2n} \sim 4^{n+1}(2n)!/\pi^{2n+1} \quad \text{(see [1, eqs. (4.3.69) and (23.1.15)])}, \tag{1}$$

$n?$ := $1 \cdot 3 \cdot 5 \cdots (2n-1)$,

$$n? \sim \sqrt{2}\, e^{-n} (2n)^n, \tag{2}$$

$C_{2n}$ := the $n$th Catalan number $\binom{2n}{n}/(n+1)$, with GF: $\sum_{n \ge 0} C_{2n} z^n = (1 - \sqrt{1-4z})/(2z)$ (see [6, p. 135]),

$$C_{2n} \sim 4^n/(\sqrt{\pi}\, n^{3/2}), \tag{3}$$

$M_{2n}$ := the number of paths (see Section 3) of length $2n$ with upward, downward and two types of level steps, with GF (see appendix):

$$\sum_{n \ge 0} M_n z^n = (1 - 2z - \sqrt{1-4z})/(2z^2); \tag{4}$$

this is a generalization of classical Motzkin numbers.
$\{x\}$ := $x - \lfloor x \rfloor$,
i.i.d. RV := independent identically distributed random variables.
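The asymptotic formulas (2) and (3) and the generating function (4) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes that (4) is indexed by path length, so that the coefficient of $z^m$ counts nonnegative paths of length $m$ from level 0 to level 0, which then equals the Catalan number $\binom{2m+2}{m+1}/(m+2)$.

```python
from math import comb, exp, pi, sqrt


def catalan(n):
    """n-th Catalan number, binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def odd_product(n):
    """The product 1 * 3 * 5 * ... * (2n - 1), written n? in the text."""
    result = 1
    for k in range(1, 2 * n, 2):
        result *= k
    return result


def two_level_paths(length):
    """Count nonnegative paths from 0 to 0 with up, down and two kinds of level steps."""
    counts = {0: 1}  # height -> number of partial paths ending at that height
    for _ in range(length):
        nxt = {}
        for h, c in counts.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c      # upward step
            nxt[h] = nxt.get(h, 0) + 2 * c          # two types of level steps
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + c  # downward step, staying >= 0
        counts = nxt
    return counts.get(0, 0)


def ratio_odd_product(n):
    """Exact value of n? divided by the asymptotic form sqrt(2) e^{-n} (2n)^n of (2)."""
    return odd_product(n) / (sqrt(2.0) * exp(-n) * (2.0 * n) ** n)


def ratio_catalan(n):
    """Exact Catalan number divided by the asymptotic form 4^n / (sqrt(pi) n^{3/2}) of (3)."""
    return catalan(n) * sqrt(pi) * n ** 1.5 / 4 ** n
```

Both ratios approach 1 as $n$ grows, and `two_level_paths(m)` agrees with `catalan(m + 1)` for small `m`, which is what expanding the right-hand side of (4) predicts.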
3. Some list structures

This section summarizes the main properties of the structures we will analyze in the sequel.

(i) Following Flajolet et al. [9], we define a schema (or path) as a word $\Omega := O_1 O_2 \cdots O_{2n} \in \{I, D, Q^+, Q^-\}^*$ such that for all $j$, 1
ifu < 1,
2
ifu = 1.
2-(“E+l) i Proof.
cp(n.5 u),
Firstly assume that $u < 1$. From the reflection principle [4, p. 72] we conclude that, for $m \to \infty$, the probability of reaching $du$ is asymptotically

$$[\varphi(m, u - 1/m) - \varphi(m, u + 1/m)]\,du \sim -\frac{2}{m}\,\partial_u \varphi(m, u)\,du = \frac{2}{m}\left[-\frac{u}{1-u^2} + \frac{m}{2}\log\frac{1+u}{1-u}\right]\varphi(m, u)\,du \sim \log\frac{1+u}{1-u}\,\varphi(m, u)\,du.$$

By the random walk symmetry, we deduce the first part of (11). Between $n\varepsilon$ and $n(2-\varepsilon)$, ? and Y- are equivalent in probability. The coefficient involving $(2n)^{3/2}$ is a normalizing factor (see [3, eq. (4.4)] for a detailed justification). When $u = 1$, the discrete probability related to $Y(m)$ leads to $2^{-m}$, but $Y(m)$ and $m$ do have the same parity, hence the factor $\tfrac12$ in the asymptotic density. □
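The reflection principle used in this proof can be illustrated on exact path counts. The following sketch is a small check under stated assumptions, not the paper's construction: the walk has steps $\pm 1$, starts at level 1 and must stay at level 1 or above, and the function names are hypothetical. It compares a brute-force count of such paths with the difference of two unconstrained counts.

```python
from math import comb


def free_paths(m, d):
    """Number of +-1 paths of m steps with total displacement d (no constraint)."""
    if (m + d) % 2 or abs(d) > m:
        return 0
    return comb(m, (m + d) // 2)


def positive_paths(m, k, start=1):
    """Paths of m steps from `start` to k that never touch level 0 (dynamic programming)."""
    counts = {start: 1}  # level -> number of admissible partial paths
    for _ in range(m):
        nxt = {}
        for h, c in counts.items():
            for step in (1, -1):
                if h + step >= 1:  # discard any path that would hit 0
                    nxt[h + step] = nxt.get(h + step, 0) + c
        counts = nxt
    return counts.get(k, 0)


def reflected_count(m, k):
    """Reflection principle: paths 1 -> k avoiding 0 = free paths 1 -> k minus free paths -1 -> k."""
    return free_paths(m, k - 1) - free_paths(m, k + 1)
```

The difference `free_paths(m, k - 1) - free_paths(m, k + 1)` is the discrete counterpart of $\varphi(m, u - 1/m) - \varphi(m, u + 1/m)$ with $u = k/m$.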
4.3. Weighted random walk Y*

The weight defined by (6) and Table 1 must now be taken into account. Let us tentatively set

$$\sqrt{n}\,X_n(v) = Y^*([nv]) - n\,y(v), \quad v \in [0, 2], \tag{12}$$

where $y(\cdot)$ is a (deterministic) continuous nonnegative symmetric function (with $y(0) = y(2) = 0$) and $X_n(\cdot)$ a random process with asymptotic zero mean (coefficients will be justified in the sequel). Moreover, as $Y^*(1) = Y^*(2n-1) = 1$ and $Y^* \le n$, we must have

$$\lim n\,y(1/n) = \lim n\,y(2 - 1/n) = 1, \quad n\,y(\cdot) \le n.$$

The constraints we put on $y(\cdot)$ in the sequel will be summarized by the y-conditions (Cy):

$$y \in C^1, \quad y(\cdot) \le 1, \quad y(0) = y(2) = 0, \quad y'(0) = -y'(2) = 1.$$
According to (6), we put on each trajectory of type (12) a total measure which is the product of the probability measures as defined by (11) and the weight $W = \prod_{i=1}^{2n-1} Y^*(i)$. We must first find $y(\cdot)$, then establish the stochastic properties of $X_n(\cdot)$ and justify (12). Let us first look for an asymptotic formula for $W$ along $n\,y(\cdot)$. We have:

Lemma 4.3. Let $y(\cdot)$ satisfy
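(Cy). Let

$$W_1 := \exp\left[\sum_{i=1}^{2n-1}\log\big(n\,y(i/n)\big)\right] \quad \text{and} \quad W_2(j, k) := \exp\left[\sum \log\big(y(i/k)\big)\right].$$

Then

$$W_1 \sim \exp\left[2n\log n + n\int_0^2 \log(y(v))\,dv + 2 + \int_{1/n}^{2-1/n} B_1(\{nv\})\,\frac{y'(v)}{y(v)}\,dv\right], \tag{13}$$

$$W_2(j, k) \sim \exp\left[\int \log(y(v))\,dv - \tfrac12\big[\log(y(\cdot)) - \log(y(\cdot))\big] + \cdots\right], \tag{14}$$

where $B_1$ is the first Bernoulli polynomial. By convention, we say that the 'convergence condition' (CC) is satisfied if the last terms of (13) and (14) converge.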
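A numerical sanity check of the main terms of (13) is sketched below. It rests on assumptions that are not in the text: (13) is read as $\log W_1 \approx 2n\log n + n\int_0^2 \log(y(v))\,dv + 2$ plus a bounded Bernoulli term, and the test function is the hypothetical choice $y(v) = v(2-v)/2$, which satisfies $y(0) = y(2) = 0$, $y'(0) = -y'(2) = 1$ and $y \le 1$. For this $y$, $\int_0^2 \log(y(v))\,dv = 2\log 2 - 4$ and $W_1 = ((2n-1)!)^2/(2n)^{2n-1}$, so Stirling's formula predicts that the remainder tends to $\log(2\pi) - 2$.

```python
from math import log, pi


def y_parabola(v):
    """Hypothetical test function y(v) = v (2 - v) / 2 on [0, 2]."""
    return v * (2.0 - v) / 2.0


def log_w1(n, y):
    """log W_1 = sum over i = 1 .. 2n-1 of log(n y(i/n)), computed directly."""
    return sum(log(n * y(i / n)) for i in range(1, 2 * n))


def main_terms(n):
    """2n log n + n * integral_0^2 log(y) dv + 2, with the integral for y_parabola."""
    integral = 2.0 * log(2.0) - 4.0
    return 2 * n * log(n) + n * integral + 2.0


def remainder(n):
    """What is left for the Bernoulli-polynomial term once the main terms are removed."""
    return log_w1(n, y_parabola) - main_terms(n)
```

For this particular $y$ the remainder stays bounded and settles near $\log(2\pi) - 2 \approx -0.162$, which is consistent with the last term of (13) converging, i.e. with (CC) holding for this choice.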
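Proof. Write $\log W_1 = \sum_{i=1}^{2n-2}\log(n\,y(i/n)) + \log(n\,y((2n-1)/n))$. The second term $\to 0$ by (Cy). Let $f(x) := \log(y(x/n))$. The first term leads, by Euler's summation formula, to

$$(2n-2)\log n + \int_1^{2n-1}\log(y(x/n))\,dx \quad \text{(term (a))}$$

$$-\tfrac12\left[\log\big(y((2n-1)/n)\big) - \log\big(y(1/n)\big)\right] \quad \text{(term (b))}$$

$$+\int_1^{2n-1} B_1(\{x\})\,f'(x)\,dx \quad \text{(term (c))}.$$

Term (b) $\to 0$ by (Cy); term (a) is equal to

$$n\int_{1/n}^{2-1/n}\log(y(v))\,dv = n\int_0^2 \log(y(v))\,dv - n\int_0^{1/n}\log(y(v))\,dv - n\int_{2-1/n}^{2}\log(y(v))\,dv.$$

By (Cy),

$$-n\int_0^{1/n}\log(y(v))\,dv \sim \log n + 1.$$

Expression (13) is now straightforward and (14) is similarly proved. □

4.4. Determination of y(·)

This problem is solved in Theorem 4.6 at the end of this subsection.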
4.4.1. Asymptotic probability measure

To obtain $y(\cdot)$, we use Laplace's asymptotic method on functional space. Let $\varepsilon$, $\delta$ be small positive constants. Along any trajectory of type (12), for $i \in [2n\varepsilon, 2n(1-\delta)]$, it is clear that, asymptotically, YP (see (10)) and Y are identically distributed (the hitting probability of the zero boundary is exponentially small). Let us assume that $y$ satisfies (Cy) (this will have to be checked later on). When we let $\varepsilon, \delta \to 0$, it follows from (11) that the total probability measure on (12) is
asymptotically ($n \to \infty$) the same measure as the limiting one ($\varepsilon, \delta \to 0$) deduced from (7) for $i \in [2n\varepsilon, 2n(1-\delta)]$, multiplied by
$&(2n)3/22P”(‘+fi’_
(15)
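We now define $x_1, x_2 \in [\varepsilon, 2-\delta]$, $x_2 = x_1 + \Delta$ (with $\Delta > 0$), $y_1 = y(x_1)$, $y_2 = y(x_2)$. The transition probability from $[nx_1, ny_1]$ to $[nx_2, n(y_2 + dy_2)]$, as given by (7), leads (if we neglect $p(u)$) to

$$(1-u^2)^{-n\Delta/2}\left(\frac{1-u}{1+u}\right)^{n\Delta u/2}\frac{\sqrt{n}\,dy_2}{\sqrt{2\pi\Delta(1-u^2)}} \tag{16}$$

with $u = n(y_2 - y_1)/(n\Delta)$. To apply the Laplace method, we must obtain the dominant term in the log of this density, which is given by

$$-\frac{n\Delta}{2}\left[\log(1-u^2) + u\log\frac{1+u}{1-u}\right].$$

Now as $\Delta \to 0$ (in a sense to be made precise later on), $u \to y'(x_1)$. The dominant term in the log of the total asymptotic probability measure along $n\,y(\cdot)$ is now easily derived as

$$-\frac{n}{2}\int_{\varepsilon}^{2-\delta}\left[\log\big(1-(y'(v))^2\big) + y'(v)\log\frac{1+y'(v)}{1-y'(v)}\right]dv. \tag{17}$$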
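The exponent entering (16) and (17) is the classical large-deviation rate of the symmetric random walk, and it can be compared with exact binomial probabilities. The sketch below assumes a walk $S_m$ with steps $\pm 1$ and uses hypothetical function names; it checks that $\log P(S_m = mu)$ is close to $-\tfrac{m}{2}[\log(1-u^2) + u\log\tfrac{1+u}{1-u}]$ plus the logarithm of the local prefactor $2/\sqrt{2\pi m(1-u^2)}$.

```python
from math import lgamma, log, pi


def log_exact(m, k):
    """log P(S_m = k) for a symmetric +-1 walk; m + k is assumed to be even."""
    j = (m + k) // 2  # number of upward steps
    return lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1) - m * log(2.0)


def log_dominant(m, u):
    """Dominant term -(m/2) [log(1 - u^2) + u log((1 + u)/(1 - u))]."""
    return -0.5 * m * (log(1.0 - u * u) + u * log((1.0 + u) / (1.0 - u)))


def log_prefactor(m, u):
    """log of the local-limit prefactor 2 / sqrt(2 pi m (1 - u^2))."""
    return log(2.0) - 0.5 * log(2.0 * pi * m * (1.0 - u * u))


def approximation_error(m, k):
    """Exact log-probability minus (dominant term + prefactor)."""
    u = k / m
    return log_exact(m, k) - (log_dominant(m, u) + log_prefactor(m, u))
```

With $m = n\Delta$ the dominant term is the one integrated in (17). The prefactor is written here for a single lattice point; counting the lattice points of the right parity inside $n\,dy_2$ turns it into a factor of order $\sqrt{n}\,dy_2/\sqrt{2\pi\Delta(1-u^2)}$.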
Three problems remain to be solved:

(1) We must take into account, in (16), the factor

$$\frac{\sqrt{n}}{\sqrt{2\pi\Delta(1-u^2)}}. \tag{18}$$

As we will see in Section 4.5, this is related to the asymptotic distribution of $X_n(\cdot)$.

(2) The factor $2^{-n(\varepsilon+\delta)}$ in (15) must be considered. It can easily be checked that, if the function $y(\cdot)$ satisfies the expansion condition (CE)

$$y'(v) = 1 + K v^j + o(v^j), \quad j \ge 1,$$

then the integrand of (17) tends to $2\log 2$ at both ends of $[0, 2]$, which reproduces this factor. The integral in (17) can thus be extended to $[0, 2]$.

(3) We must check that we can neglect the contribution from $p(u)$ in (7). This can be done by carefully coupling $\Delta$ with $n$. To ease the analysis, let $\varepsilon = \delta = \Delta = 2/k$, say. By (8), we must add to (17) a term which is
-$&J [Y' (; )I .
By Euler's summation formula, this is equivalent to
(term (a)) (term @)I
-~[P[Y’(~-A)I-P[v’(A)II k-l
(term (c)). 1 Let us assume
that (CE) is satisfied.
p[y’(v)l =$+o
(
5
>
Then, by (8), for some K.
Term (a) becomes:
P[Y’(~)I dv and term (a) + 0 if nA’+’
+ ~0. n-m
Term (b) $\to 0$ if $n\Delta^{j} \to \infty$.

Term (c) $\to 0$ if $1/(n\Delta^{j+1}) \to 0$.

The contribution from $p(\cdot)$ can thus be neglected if $n \to \infty$, $\Delta \to 0$ with $n\Delta^{j+1} \to \infty$. Note that, with this last condition, term (b) $\to 0$, even if the measure in (17) is computed on $[\varepsilon, 2-\delta]$.

Summarizing our results,

Lemma 4.4.

4.4.2.