Deletions in Random Binary Search Trees: A Story of Errors

Wolfgang Panny

Applied Computer Science, WU-Wien, Augasse 2–6, A-1190 Vienna, Austria
Abstract

The usual assumptions for the average case analysis of binary search trees (BSTs) are random insertions and random deletions. If a BST is built by n random insertions, the expected number of key comparisons necessary to access a node is 2 ln n + O(1). This well-known result is already contained in the first papers on such 'random' BSTs. However, if random insertions are intermixed with random deletions, the analysis of the resulting BST seems to become more intricate. At least this is the impression one gets from the related publications since 1962, and it is quite appropriate to speak of a story of errors in this context, as the present survey paper will show.

Key words: random binary search tree, analysis of algorithms, deletions
1 Introduction and prerequisites
Binary search trees (BSTs) are among the most prominent and commonly used data structures for symbol table algorithms [Knu98, 426]. The usual search and insertion algorithms for BSTs (cf. e.g., [Knu98, 429]) are quite efficient for this purpose. BSTs are especially suitable for applications where (apart from accessing and dynamically inserting the symbols into the table) it is also required to process them linearly according to their sort sequence, e.g., to print a sorted list of the symbols. The first publications on BSTs are due to P. F. Windley [Win60], A. D. Booth and A. J. T. Colin [BC60], and T. N. Hibbard [Hib62]. Each of these papers comprises, among other things, a description of binary tree insertion and the expected number of key comparisons thereby incurred. 1

⋆ A preliminary German version of this paper has appeared in [GST03, 75–88]. However, the present version has been enlarged and reworked to shed more light on the technical aspects.
Email address: [email protected] (Wolfgang Panny).
Preprint submitted to Journal of Statistical Planning and Inference, 19 November 2008

Hibbard first showed how to realize deletions from a BST in a reasonable way [Hib62], thereby considerably extending the application range of BSTs. The BST structure is not restricted to accessing the nodes 'by key'. The nodes can alternatively be retrieved 'by rank', which only requires straightforward modifications. In this way BSTs can be extended to a versatile and efficient data organisation for the representation and manipulation of linear lists [Knu98, 471–475]. Other refinements aim at protecting against the O(n) worst case behavior of the original structure. This is achieved by imposing additional constraints on the shape of the trees to ensure an access time of O(log n) even in the worst case. The most prominent species of such balanced search trees is due to G. M. Adelson-Velsky and E. M. Landis [AVL62], who devised their scheme as early as 1962. Essentially the same purpose can also be achieved by randomization. To this approach belong Seidel and Aragon's "treaps" [SA96], which in fact are a reinvention of Vuillemin's "cartesian trees" [Vui80]. Other proponents of this approach are Martinez and Roura with their "randomized binary search trees" [MR98]. In this paper only the original BST structure with access functions search, insert and delete will be considered. The usual (and reasonable) assumption for the average case analysis of binary search trees is random insertions. In a BST built by n random insertions the expected number of key comparisons necessary to access a node is 2 ln n + O(1), which is a well-known result already contained in the first papers on BSTs [Win60], [BC60], [Hib62]. However, if random insertions are intermixed with random deletions, the analysis of the resulting BST seems to become much more intricate and involved.
At least this is the impression one gets from the publications on the subject since 1962, and it is quite appropriate to speak of a story of errors in this context. In this survey we take a closer look at this story; this is done in section 2. In the remainder of the present section some conceptual and notational prerequisites are compiled.
1.1 Binary search trees and access functions
A binary tree can be defined as a finite set of nodes that is either empty, or consists of a root and the elements of two disjoint binary trees called the left and right subtrees of the root [Knu97, 312].

1 Additionally, Windley's paper contains a comprehensive discussion of tree insertion sorting; Booth and Colin consider the effect of arranging the first 2^i − 1 elements to form a complete binary tree. As predecessors one should also mention A. I. Dumey [Knu98, 453], D. J. Wheeler and C. M. Berners-Lee [Dou59, 5], [Win60, 84].
It is clear from this definition that a non-empty binary tree always has at least one empty subtree. Sometimes it is convenient to consider these empty subtrees as additional (special) nodes, referred to as external nodes. A binary tree with explicit external nodes is called an extended binary tree [Knu97, 399]. In this case the “ordinary” nodes are called internal nodes. A binary tree with n (internal) nodes has n + 1 external nodes. In an extended binary tree the external nodes coincide with the leaves.
Fig. 1. Binary tree and corresponding extended binary tree
In a binary search tree (BST) a key is attached to every node such that the following condition holds: For every node the keys in its left (right) subtree are smaller (greater) than the key attached to the node.

Fig. 2. Binary search tree
Implementation of the search operation follows immediately from the above search condition. If the BST contains a matching key, the search is successful; otherwise it is an unsuccessful search. Implementation of the insertion operation is straightforward, too: the actual insertion is preceded by a search for the key to be inserted. Since multiple keys are not allowed, this search should be unsuccessful. Hence it eventually terminates at an empty subtree. As a final step the corresponding external node is replaced by the new node (with empty left and right subtrees). The delete operation is a bit more complicated. To describe it more easily
the following notation will be used: Let T denote a BST, and let v ∈ T be a node of T. Then the left and right subtrees of v will be denoted by L(v) and R(v). The key of node v is denoted by k(v) and the references to the root nodes of L(v), R(v) are denoted by ℓ(v) and r(v), respectively. The father of v is symbolized by f(v) (to simplify matters it will be assumed that also the root of T has a father w such that L(w) = T). The actual deletion must be preceded by a search to localize the node to be deleted. This search must be successful. Let v denote the node to be deleted. For Hibbard's deletion method [Hib62, 24] the following two cases must be considered:

a) R(v) = ∅: Replace the reference to v in f(v) with ℓ(v).
b) R(v) ≠ ∅: Delete the node vmin with minimal key 2 from R(v) and replace k(v) with k(vmin).

Knuth suggested to deal separately with the case R(v) ≠ ∅ and L(v) = ∅, which results in the following modified deletion method 3:

a) R(v) = ∅: Replace the reference to v in f(v) with ℓ(v).
a') R(v) ≠ ∅ and L(v) = ∅: Replace the reference to v in f(v) with r(v).
b') R(v) ≠ ∅ and L(v) ≠ ∅: Delete the node vmin with minimal key from R(v) and replace k(v) with k(vmin).

Accordingly, both methods behave differently only when the left subtree is empty and the right subtree is not.
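The two deletion methods can be sketched in Python as follows. This is a minimal illustration only; the class and function names are mine, not Hibbard's or Knuth's.

```python
class Node:
    """A BST node; empty subtrees are represented by None."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard BST insertion (no duplicate keys)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def delete_min(node):
    """Remove the minimal node v_min of a non-empty subtree and
    return (k(v_min), new subtree); L(v_min) is necessarily empty."""
    if node.left is None:
        return node.key, node.right
    k, node.left = delete_min(node.left)
    return k, node

def hibbard_delete(root, key):
    """Hibbard's method: case a) R(v) empty -> replace v by L(v);
    case b) otherwise move the minimal key of R(v) up into v."""
    if root is None:
        return None
    if key < root.key:
        root.left = hibbard_delete(root.left, key)
    elif key > root.key:
        root.right = hibbard_delete(root.right, key)
    elif root.right is None:                       # case a)
        return root.left
    else:                                          # case b)
        root.key, root.right = delete_min(root.right)
    return root

def knuth_delete(root, key):
    """Knuth's modification: extra case a') R(v) non-empty, L(v) empty ->
    replace v by R(v); otherwise identical to Hibbard's method."""
    if root is None:
        return None
    if key < root.key:
        root.left = knuth_delete(root.left, key)
    elif key > root.key:
        root.right = knuth_delete(root.right, key)
    elif root.right is None:                       # case a)
        return root.left
    elif root.left is None:                        # case a')
        return root.right
    else:                                          # case b')
        root.key, root.right = delete_min(root.right)
    return root
```

Deleting the root 2 of the tree built from the insertion sequence 2, 4, 3 illustrates the one situation where the two methods disagree: Hibbard's method yields the key 3 in the root with right child 4, Knuth's yields 4 in the root with left child 3.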
1.2 Random insertions, random deletions and random binary search trees

Regarding random insertions, we often encounter the assumption that the n keys are independent of each other and uniformly distributed in [0, 1]. In fact every continuous distribution can be taken as well, since it is only required that each permutation of the keys has the same probability to occur as input sequence [Knu77]. Therefore it suffices to take random permutations of {1, 2, . . . , n} as input sequences, which is often done for analytical purposes in order to simplify matters. In this context H. M. Mahmoud [MN03, 254] quite appropriately coined the phrase of the 'random permutation model of randomness'.
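Under this model the exact shape probabilities of small random BSTs can be obtained by simply enumerating all n! input permutations. A minimal Python sketch (all names are mine):

```python
from itertools import permutations
from fractions import Fraction

def insert(t, key):
    """BSTs as nested tuples (key, left, right); None is the empty tree."""
    if t is None:
        return (key, None, None)
    k, l, r = t
    return (k, insert(l, key), r) if key < k else (k, l, insert(r, key))

def shape(t):
    """Forget the keys: only the branching structure remains."""
    return None if t is None else (shape(t[1]), shape(t[2]))

def shape_distribution(n):
    """Exact distribution of shapes under the random permutation model."""
    counts = {}
    for perm in permutations(range(1, n + 1)):
        t = None
        for k in perm:
            t = insert(t, k)
        s = shape(t)
        counts[s] = counts.get(s, 0) + 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}
```

For n = 3 this yields the probabilities 1/6, 1/6, 1/3, 1/6, 1/6 for the five possible shapes; for n = 4 it confirms that 14 different shapes occur.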
A BST with n nodes can take (2n choose n)/(n + 1) different shapes. If a BST is built by n random insertions (which will be symbolized by I n), each single shape

2 The deletion of vmin can be done analogously to case a), since we necessarily have L(vmin) = ∅. Hence we only have to replace the reference to vmin in f(vmin) by r(vmin).
3 [Knu97, 435] refers to the corresponding new step in Algorithm D as 'step D1½'.
has a well-defined probability. Hence a distribution Fn of shapes results. If a further random insertion is applied to a BST built by I n, the distribution of shapes is Fn+1. Binary search trees with a distribution of shapes coinciding with Fn are called random binary search trees.

Fig. 3. Shapes of BSTs with n = 2 and n = 3 nodes (the two shapes for n = 2 are labelled A and B, the five shapes for n = 3 are labelled I to V)
Example: Figure 3 shows all shapes for n = 2 and n = 3. For random binary search trees the probabilities for these shapes are p2 = (1/2, 1/2) and p3 = (1/6, 1/6, 1/3, 1/6, 1/6), respectively.

A deletion is a random deletion if each key present in the BST is equally likely to be deleted (clearly the corresponding probability is 1/n if the BST has n nodes). Sometimes a sequence of random insertions and random deletions is considered. Such a mixed sequence can be represented by a word w ∈ {I, D}*. For instance IIIDI represents three random insertions followed by a random deletion followed by one more random insertion. It goes without saying that the Ds in w must not outnumber the Is, and this condition must hold for each prefix of w, too.

1.3 Search efficiency of BSTs

Let T be a BST and let v ∈ T be a node of T. The distance of v from the root of T (measured by the number of arcs on the corresponding path) is called the level d(v) of v [Knu97, 308]. The quantity

I(T) = Σ_{i=1}^{n} d(vi)    (1)
is called the internal path length of T, where summation extends over all (internal) nodes v1, v2, . . ., vn. Correspondingly an external path length E(T) can be defined, where now summation extends over the n + 1 external nodes. It is not hard to see that both quantities are connected by the relation

E(T) = I(T) + 2n.    (2)
An important quantity to characterize the search efficiency of a BST is the mean number of comparisons C(T) in a successful search. Since the level d(v) of a node v is exactly one less than the number of comparisons necessary to locate it, one gets

C(T) = I(T)/n + 1.    (3)
On the other hand the number of comparisons in an unsuccessful search corresponds with the level of the external node at which the search terminates. Consequently the mean number of comparisons C′(T) in an unsuccessful search equals

C′(T) = E(T)/(n + 1).    (4)
Applying (2) and (3) to the last equation yields the relation

C(T) = (1 + 1/n) C′(T) − 1.    (5)
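The four quantities and the relations (2) and (5) are easy to check mechanically. The following Python sketch (function names are mine) computes I(T), E(T), C(T) and C′(T) for a BST represented as nested tuples:

```python
from fractions import Fraction

def insert(t, key):
    """BSTs as nested tuples (key, left, right); None is the empty tree."""
    if t is None:
        return (key, None, None)
    k, l, r = t
    return (k, insert(l, key), r) if key < k else (k, l, insert(r, key))

def internal_levels(t, d=0):
    """Levels d(v) of all internal nodes."""
    if t is not None:
        yield d
        yield from internal_levels(t[1], d + 1)
        yield from internal_levels(t[2], d + 1)

def external_levels(t, d=0):
    """Levels of all external nodes (empty subtrees)."""
    if t is None:
        yield d
    else:
        yield from external_levels(t[1], d + 1)
        yield from external_levels(t[2], d + 1)

def path_quantities(t):
    """Return (I, E, C, C') and check relations (2) and (5)."""
    I = sum(internal_levels(t))
    E = sum(external_levels(t))
    n = sum(1 for _ in internal_levels(t))
    C = Fraction(I, n) + 1                         # equation (3)
    Cp = Fraction(E, n + 1)                        # equation (4)
    assert E == I + 2 * n                          # relation (2)
    assert C == (1 + Fraction(1, n)) * Cp - 1      # relation (5)
    return I, E, C, Cp
```

For the seven-node tree of Figure 2 (insertion order 4, 3, 1, 8, 9, 5, 6) this yields I = 11, E = 25, C = 18/7 and C′ = 25/8.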
Clearly the quantities I(T), E(T), C(T) and C′(T) are equivalent: if the value of one of them is known, the values of the other three are determined, too. Another quantity helpful to characterize the worst case behavior is the height

D(T) = 1 + max {d(vi) | i = 1, 2, . . . , n}.    (6)
The height of T equals the maximal level of an external node [Knu98, 459] and corresponds with the number of comparisons in the worst case (for both successful and unsuccessful searches). Example: Let T denote a BST with the same shape as the binary tree of Figure 1. Then we have I(T) = 11, E(T) = 25, C(T) = 18/7, C′(T) = 25/8, and D(T) = 4. To get a better understanding of the behavior of random BSTs, the expected values of the above quantities for BSTs generated by n random insertions are of particular interest. Let the expected values of (1)–(4) and (6) be denoted by In, En, Cn, Cn′ and Dn, respectively. Using the harmonic numbers Hn = 1 + 1/2 + 1/3 + . . . + 1/n, the first four quantities can be expressed in the following way (cf. e.g., [Knu98, 431]):

In = 2(n + 1)Hn − 4n = 2n ln n + O(n)    (7)
En = 2(n + 1)Hn − 2n = 2n ln n + O(n)    (8)
Cn = 2(1 + 1/n)Hn − 3 = 2 ln n + O(1)    (9)
Cn′ = 2Hn+1 − 2 = 2 ln n + O(1)    (10)
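For small n the closed forms (7)–(10) can be cross-checked against a brute-force average over all n! insertion orders. A Python sketch (names are mine):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def H(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def I_closed(n):
    """Equation (7): I_n = 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * H(n) - 4 * n

def insert(t, key):
    """BSTs as nested tuples (key, left, right)."""
    if t is None:
        return (key, None, None)
    k, l, r = t
    return (k, insert(l, key), r) if key < k else (k, l, insert(r, key))

def ipl(t, d=0):
    """Internal path length of a nested-tuple BST."""
    return 0 if t is None else d + ipl(t[1], d + 1) + ipl(t[2], d + 1)

def I_exhaustive(n):
    """Average internal path length over all n! insertion orders."""
    total = 0
    for perm in permutations(range(1, n + 1)):
        t = None
        for k in perm:
            t = insert(t, k)
        total += ipl(t)
    return Fraction(total, factorial(n))
```

Equations (8)–(10) then follow via relations (2)–(4); for instance, Cn = In/n + 1 reproduces (9).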
Example: For n = 2, 3 one obtains I2 = 1, E2 = 5, C2 = 3/2, C2′ = 5/3; I3 = 8/3, E3 = 26/3, C3 = 17/9, C3′ = 13/6. One of the easiest ways to derive equations (7)–(10) seems to be the following one: Let ve denote an external node of T. Let us assume that ve shall be replaced by an additional internal node, where the resulting BST is symbolized by T′. It is not hard to see that the effect of such an insertion on the external path length can be expressed by E(T′) = E(T) + d(ve) + 2, where d(ve) is the level of ve in T. If an additional node is randomly inserted into a random BST of size n, each of the n + 1 external nodes has the same chance to be replaced by the node to be inserted. Since the expected level of an external node is En/(n + 1), the expected net increase of En is En/(n + 1) + 2. Hence

En+1 = En + En/(n + 1) + 2 = ((n + 2)/(n + 1)) En + 2,

which furnishes

En+1/(n + 2) = En/(n + 1) + 2/(n + 2).

Applying identity (4) and decrementing n yields the well-known recurrence relation

Cn′ = Cn−1′ + 2/(n + 1).    (11)
Taking into account that C0′ = 0 one immediately gets the solution

Cn′ = 2 (1/2 + 1/3 + . . . + 1/(n + 1)),    (12)
which checks with equation (10), and equations (7)–(9) follow. As to the expected height of a random BST only the asymptotic equivalent

Dn = α ln n − (3α/(2(α − 1))) ln ln n + O(1)    (13)

is known, where α = 4.3110704 . . . is the largest (real) solution of (2e/α)^α = e [Ree03].
2 The story of errors
2.1 Hibbard’s Theorem and an incorrect generalization
As already mentioned Hibbard was the first author who also devised a delete operation for BSTs which (together with the search and insert operation) has been specified as an ALGOL 60 procedure in [Hib62]. Incidentally, Hibbard motivates the BST structure as a compromise to achieve a reasonable balance between search and update efficiency: Whereas an ordered table has optimal search efficiency (average and worst case time complexity of O(log n)), its update efficiency is rather poor (average and worst case time complexity of O(n)). On the other hand a linked list has optimal update efficiency (because an insertion or deletion only needs a fixed time amount), but now search efficiency degrades pretty much (average and worst case time complexity of O(n)). The trade-off brought about by the BST structure consists in offering, under random conditions, an expected search and insertion time both of O(log n). Hibbard introduces the usual (and reasonable) assumption for the average case analysis of BSTs, namely that each permutation of keys to be inserted has the same probability to occur. 4 Adopting this random permutation model of randomness (cf. section 1.2), Hibbard in his Theorem 1 gives the results (9) and (10) for Cn and Cn′ , where he refers to these quantities as mean internal search length l′ (n) and mean open search length l(n), respectively. Hibbard also deals with the question whether random deletions (as defined in section 1.2) affect the randomness of BSTs. The answer, somewhat surprisingly, is No. The corresponding result referred to as Hibbard’s Theorem has been restated by Knuth [Knu98, 432] in the following way: H: After a random element is deleted from a random tree by algorithm D, the resulting tree is still random. 5 To properly understand this theorem one should keep in mind that the statement ‘a BST is random’ is a statement about the distribution of its shapes (cf. section 1.2). 
Thus given a BST T randomly constructed by I n+1 D, Hibbard's Theorem claims that T has the same distribution of shapes as a BST randomly constructed by I n.

4 Hibbard speaks of randomly constructed (binary) search trees in this context.
5 Algorithm D is Hibbard's deletion algorithm (cf. section 1.1), of course.
Theorem H has been shown by Hibbard [Hib62] and, a bit more palatably, by Knuth [Knu77], [Knu98]. Therefore the following presentation is oriented towards the exposition adopted by Knuth. To simplify matters advantage is taken of the random permutation model of randomness, where it suffices to deal with random permutations of {1, 2, . . . , n} as input sequences, each permutation being equally likely (cf. section 1.2). We therefore consider the set Π of all permutations of Nn = {1, 2, . . . , n} and the set Σ of all permutations of Nn+1 = {1, 2, . . . , n, n + 1}. Let π ∈ Π and let S(π) denote the shape of the BST T(π) resulting from π. Then the probability P(Si) of shape Si of a BST randomly constructed by I n can be given essentially by counting the permutations resulting in shape Si (it has already been mentioned that there are (2n choose n)/(n + 1) such shapes altogether):

P(Si) = |{π | π ∈ Π, S(π) = Si}| / n!    (14)
Regarding the probability Q(Si) of shape Si of a BST randomly constructed by I n+1 D, we have to consider histories (σ, d), σ ∈ Σ, d ∈ Nn+1, where σ specifies a BST of size n + 1 and d is the element to be deleted. As before, S(σ, d) denotes the shape of the BST emerging from history (σ, d). It follows from the definition of random insertions and deletions that all such histories are equally likely. Thus the probability Q(Si) can be given essentially by counting the histories leading to shape Si:

Q(Si) = |{(σ, d) | σ ∈ Σ, d ∈ Nn+1, S(σ, d) = Si}| / ((n + 1)! (n + 1))    (15)
Hibbard's Theorem states that Q(Si) = P(Si) for all shapes Si. In order to show this a surjective mapping f : Σ × Nn+1 → Π, (σ, d) 7→ π will be constructed by which each history maps into a permutation such that S(f(σ, d)) = S(σ, d). This means that the shape resulting from permutation f(σ, d) is identical with the shape resulting from history (σ, d). If it is possible to define mapping f in such a way that |f −1 (π)| = (n + 1)^2 for each π ∈ Π then we are done, since this has the following consequence: Whenever there are m (say) permutations contributing to the numerator of (14), there are exactly m(n + 1)^2 histories contributing to the numerator of (15), which entails Q(Si) = P(Si) for all shapes, and Hibbard's Theorem follows. But such a mapping f exists and can be specified in the following way. Let σ = (s1, s2, . . . , sn+1), d ∈ Nn+1, and let sj = d and sk = d + 1 if d < n + 1. Furthermore, let

s′i = si, if si ≤ d;    s′i = si − 1, if si > d;

and

ℓ = j, if d = n + 1;    ℓ = max{j, k}, otherwise.

Then f defined by f(σ, d) = (s′1, s′2, . . . , s′ℓ−1, s′ℓ+1, . . . , s′n+1) has the required properties.
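The mapping f is easy to implement, and its crucial property |f −1 (π)| = (n + 1)^2 for every π can be verified exhaustively for small n. A Python sketch (names are mine):

```python
from itertools import permutations

def f(sigma, d):
    """The mapping f(sigma, d) defined above: relabel the entries and
    drop position ell (positions are 1-based, as in the text)."""
    n1 = len(sigma)                        # n + 1
    j = sigma.index(d) + 1                 # position of d in sigma
    if d == n1:
        ell = j
    else:
        k = sigma.index(d + 1) + 1         # position of d + 1 in sigma
        ell = max(j, k)
    s = [x if x <= d else x - 1 for x in sigma]
    del s[ell - 1]
    return tuple(s)

def preimage_counts(n):
    """|f^{-1}(pi)| for every permutation pi of {1, ..., n}."""
    counts = {}
    for sigma in permutations(range(1, n + 2)):
        for d in range(1, n + 2):
            pi = f(sigma, d)
            counts[pi] = counts.get(pi, 0) + 1
    return counts
```

For n = 2 the 3! · 3 = 18 histories split into 9 preimages for each of the two permutations of {1, 2}, as the theorem requires.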
As a straightforward consequence of Hibbard's Theorem we are led to the following generalization:

H′: After m2 random elements are deleted from a random tree of size m1 by Hibbard's deletion algorithm, the resulting tree is still random (m1 > m2).

This means that the distribution of shapes of a BST randomly constructed by I m1 D m2 is equal to the distribution resulting from I m1−m2. Generalization H′ is completely decent (cf. also [Knu77, 355]). Hence the following further generalization H′′ also seems self-evident, at least intuitively, and it is almost shocking to learn that H′′ (which initiates the 'story of errors') is incorrect:

H′′: In general, therefore, we assert that, starting with an empty list and performing m1 random insertions and m2 random deletions in any order, if m1 − m2 = n then

Cn′ = 2 (1/2 + 1/3 + . . . + 1/(n + 1))

and

Cn = ((n + 1)/n) Cn′ − 1 ≈ Cn′ − 1.
The above formulation checks with the original wording. Only the symbols t and t′ used by Hibbard have been replaced by Cn′ and Cn, respectively. Hibbard speaks of an empty list where one would expect an empty (binary search) tree. This is due to the fact that Hibbard refers to his implementation of the BST structure as a doubly linked list because of the two pointers to the left and right subtree. As to Hibbard's formulation 'm1 random insertions and m2 random deletions in any order', it goes without saying that the number of deletions must not be greater than the number of insertions at any point of time. After all, the incorrect generalization H′′ was only a consequence of the "fact", taken for granted by Hibbard, that every (feasible) sequence of random insertions arbitrarily mixed with random deletions results in a random BST. This is indeed not true, but it was not easy to become aware of this error, given the true assertions that BSTs randomly constructed by I n D or I m1 D m2 really are random.
2.2 Knott discovers Hibbard’s mistake, Knott’s conjecture
After Hibbard's paper [Hib62] had appeared in 1962, the whole scientific community overlooked Hibbard's incorrect generalization H′′ and cherished the illusion that each possible sequence of random insertions and deletions results in a random BST. Even D. E. Knuth, at the time of writing the first edition of [Knu73a], was not aware of this error, 6 as can be seen from the following passage [Knu73a, 429]: K.1: Since Algorithm D 7 is quite unsymmetrical between left and right, it stands to reason that a long sequence of random deletions and insertions will make the tree get way out of balance, so that the efficiency estimates we have made will be invalid. But actually the trees do not degenerate at all! The reason why 'the trees do not degenerate at all' (unfortunately it turned out that this is not true) lies in Knuth's belief, at the time of writing [Knu73a], that a random BST is not affected by mixed update sequences as long as they are random. It has already been pointed out in section 1.1 that a modification of Hibbard's deletion method has been suggested by Knuth. It has been proved in [Knu73a, ex. 6.2.2-14] that Knuth's modification in fact improves Hibbard's method in the following sense: Let T denote an arbitrary BST and let TH′ and TK′ denote the BSTs resulting from deletion of node v by Hibbard's and Knuth's deletion method, respectively. Then regarding the internal path length (cf. 1.3) we always have I(TK′) ≤ I(TH′). Actually there are many instances where the strict inequality I(TK′) < I(TH′) holds. Hence with respect to search efficiency of the resulting BST, Knuth's method is superior to Hibbard's.
These circumstances result in another, far too optimistic comment, which again has to be understood as an implication of the invalid generalization H′′ [Knu73a, 431f]: K.2: Exercise 14 shows that Algorithm D with this extra step 8 always leaves a tree that is at least as good as the original Algorithm D, in the path-length sense, and sometimes the result is even better. Thus, a sequence of insertions and deletions using this modification of algorithm D will result in trees which are actually better than the theory of random trees would predict: the average computation time for search and insertion will tend to decrease as time goes on.
6 However, Knuth has also contributed a great deal to discovering Hibbard's error, as will be seen in the sequel of the present section.
7 Algorithm D specifies Hibbard's deletion method (cf. section 1.1).
8 This is step D1½ of Knuth's modification.
According to Knuth it was Gary D. Knott who first discovered the inadmissibility of Hibbard's generalization H′′. Knott was a Ph.D. student at Stanford; the title of his thesis reads Deletions in Binary Storage Trees (cf. [Kno75]). Knuth himself was the advisor of this thesis, which was completed in 1975. Knott's discovery dates back to 1972, when he happened to detect Hibbard's fallacy in the course of preparatory studies for his thesis (cf. [Knu73b, 431], [Knu98, 435]). But it was not published before 1975, when it appeared within the framework of his thesis [Kno75, 35]; it has also been dealt with in [Knu73b, 2nd printing (1975), 431], [Knu77, 353], and [JK78, 302], always emphasizing Knott's authorship. For instance [Knu73b, 2nd printing (1975), 431] contains the following passage: K.3a: Although Theorem H 9 is rigorously true, in the precise form we have stated it, it cannot be applied, as we might expect, to a sequence of deletions followed by insertions. The shape of the tree is random after deletions, but the relative distribution of values in a given tree shape may change, and it turns out that the first random insertion after deletion actually destroys the randomness property on the shapes. This startling fact, first observed by Gary Knott in 1972, must be seen to be believed. In [Knu77, 353] Knuth frankly admits that even he had been taken in by Hibbard's mistake: K.3b: The I ∗ Dr property 10 might seem to be all that one needs to guarantee insensitivity to any number of deletions, when they are intermixed with insertions in any order. At least, many people (including the present author when writing the first edition of [Knu73a]) believed this, and the subtle fallacy in this reasoning was apparently first pointed out by G. D. Knott in his thesis [Kno75].
A similar passage can be found in [JK78, 302]: K.3c: However, Knott also discovered a surprising paradox: Although Hibbard's theorem establishes that n + 1 random insertions followed by a random deletion produce a tree whose shape has the distribution of n random insertions, it does not follow that a subsequent random insertion yields a tree whose shape has the distribution of n + 1 random insertions! For ten years it had been believed that Hibbard's theorem proved the stability of the algorithms under repeated insertions and deletions (cf. [Hib62, p.25], [Knu73a, first printing, pp
9 Hibbard's Theorem, i.e. Theorem H of section 2.1.
10 In [Knu77] several kinds of deletion disciplines are studied, where random deletions as defined in section 1.2 of the present paper are denoted by the symbol Dr. Accordingly, the I ∗ Dr notation corresponds to our I n D notation, meaning several random insertions followed by a single random deletion.
429–432]; the discovery of a subtle fallacy in this reasoning therefore came as a shock.

Hibbard's generalization H′′ can conveniently be disproved by a simple counter-example. Such a counter-example has been given in [Knu77, 353], [Knu73b, 2nd printing (1975), ex. 6.2.2-15] and in [JK78, 302f]. For that purpose the distribution of shapes of a BST randomly constructed by IIIDI is calculated. If the resulting distribution differs from the corresponding distribution for a random BST of size 3, Hibbard's generalization has been shown to be invalid. As in the proof of Hibbard's Theorem H, the probability for each shape can be computed essentially by counting the pertaining histories, since by definition of random insertions and deletions each such history is equally likely. Each such history can be characterized by a tuple (i1, i2, i3, d, i4), where (i1, i2, i3, i4) represents a permutation of {1, 2, 3, 4} and d ∈ {1, 2, 3} indicates the node subject to deletion (depending on the value of d, the node with the smallest, middle, or largest key is deleted). By way of example, the tuple (2, 4, 3, 1, 1) represents the following history: First the keys 2, 4, 3 are inserted, then key 2 is deleted, and eventually key 1 is inserted, leading to a BST of shape III, as shown in Figure 3. There are 4! × 3 = 72 histories altogether, and 11, 13, 25, 11, and 12 of them result in a BST of shape I, II, III, IV, and V, respectively. 11 Hence the shapes of BSTs randomly constructed by IIIDI occur with probabilities p3,1 = (11/72, 13/72, 25/72, 11/72, 12/72), which are different from the corresponding probabilities for BSTs randomly constructed by III (or random BSTs, for short), p3 = (1/6, 1/6, 1/3, 1/6, 1/6), as already given in section 1.2, and this proves the invalidity of Hibbard's generalization H′′.
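This counting argument can be replayed mechanically. The following Python sketch (all names are mine) enumerates the 72 histories of IIIDI under Hibbard's deletion method and tallies the resulting shapes:

```python
from itertools import permutations

def insert(t, key):
    """BSTs as nested tuples (key, left, right); None is the empty tree."""
    if t is None:
        return (key, None, None)
    k, l, r = t
    return (k, insert(l, key), r) if key < k else (k, l, insert(r, key))

def delete_min(t):
    """Remove the minimal node; return (its key, remaining subtree)."""
    k, l, r = t
    if l is None:
        return k, r
    m, l2 = delete_min(l)
    return m, (k, l2, r)

def hibbard_delete(t, key):
    k, l, r = t
    if key < k:
        return (k, hibbard_delete(l, key), r)
    if key > k:
        return (k, l, hibbard_delete(r, key))
    if r is None:                # case a)
        return l
    m, r2 = delete_min(r)        # case b)
    return (m, l, r2)

def shape(t):
    return None if t is None else (shape(t[1]), shape(t[2]))

def keys(t):
    """In-order (= sorted) list of keys."""
    return [] if t is None else keys(t[1]) + [t[0]] + keys(t[2])

def ipl(t, d=0):
    return 0 if t is None else d + ipl(t[1], d + 1) + ipl(t[2], d + 1)

# The five shapes of Figure 3 in the order I, II, III, IV, V
# (left-left chain, left-right chain, balanced, right-left, right-right).
LEAF = (None, None)
SHAPES = [((LEAF, None), None), ((None, LEAF), None), (LEAF, LEAF),
          (None, (LEAF, None)), (None, (None, LEAF))]

counts, total_ipl = {}, 0
for perm in permutations((1, 2, 3, 4)):
    for d in (1, 2, 3):                       # delete the d-th smallest key
        t = None
        for k in perm[:3]:
            t = insert(t, k)
        t = hibbard_delete(t, keys(t)[d - 1])
        t = insert(t, perm[3])
        counts[shape(t)] = counts.get(shape(t), 0) + 1
        total_ipl += ipl(t)
```

The tally reproduces the counts 11, 13, 25, 11, 12 quoted above, and an expected internal path length of 191/72.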
Since we know from Hibbard's Theorem H that the BST randomly constructed by IIID is random indeed (from the corresponding 18 histories there are 9 leading to shape A and 9 leading to B), it must have been insertion i4 after deletion d which destroyed randomness. However, something is wrong with the trees resulting from deletion d, namely that the sets of the two remaining keys (i.e. the sets {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}) are no longer independent of the shapes. If they were independent, each of them would have the same probability 1/6 to occur under both shapes. But they actually occur with probabilities 6/36, 7/36, 8/36, 5/36, 6/36, 4/36 under shape A and 6/36, 5/36, 4/36, 7/36, 6/36, 8/36 under shape B. Hence, 'the relative distribution of values in a given tree shape may change' as stated in

11 The following histories, e.g., are contributing to shape I: (3, 2, 4, 2, 1), (3, 4, 2, 2, 1), (3, 2, 4, 3, 1), (3, 4, 2, 3, 1), (4, 1, 3, 1, 2), (4, 2, 3, 1, 1), (4, 2, 3, 2, 1), (4, 3, 1, 1, 2), (4, 3, 2, 1, 1), (4, 3, 2, 2, 1), (4, 3, 2, 3, 1).
citation K.3a, and that is the reason why the next random insertion destroys randomness. The expected internal path length of a BST randomly constructed by IIIDI is 191/72 = 2.652777 . . ., and this is better than the expected value under randomness, which amounts to I3 = 8/3 = 2.666 . . . 12 This just hints at the next error in our history. Gary Knott in his thesis also gave some empirical data summarizing the results of simulation experiments, where BSTs randomly constructed by I n (ID)m have been monitored (m = 24 and n = 2, . . . , 9, 11, 14, 19, 49, 98). For each n, 1600 BSTs have been generated both for Hibbard's and Knuth's deletion method. The empirical results strongly suggested that random deletions have a positive impact on the expected internal path length as compared to randomness, and that this effect is even more pronounced for Knuth's deletion method. Based on these observations Knott came to the following conjecture, which in the formulation of [JK78, 304] reads: More precisely, Knott's conjecture is this: Consider a pattern of m + k insertions and m deletions, in some order, where the number of deletions never exceeds the number of insertions. For example, one of the patterns with m = 4 and k = 4 is I I I D I I D I I I D D. To do each insertion, put a new random element into the tree, say a uniform random number between 0 and 1; to do each deletion, choose a random element uniformly from among those present. All of these random choices are to be independent. Then for each fixed pattern of I's and D's, the average path length of the resulting tree is conjectured to be at most equal to the average path length of the pattern consisting solely of k I's. To get further evidence for Knott's conjecture (apart from the empirical data contained in his thesis), Jonassen and Knuth in [JK78] were studying BSTs randomly constructed by I I I, I I I D I, I I I D I D I, . . . , I I I (D I)m , . . ., where they deal with the probabilities p3,m for the shapes I, II, . . . , V resulting from such random sequences for general m (the special values for m = 0 and m = 1 have just been given in the present section). In particular they were able to show that a steady state exists and that the values of the
steady state probabilities p3,∞ = limm→∞ p3,m are

p3,∞ = (5/3 − (4/3)α + 2β, 2α − (12/5)β − 3, 2α − (18/5)β − 1, 10/3 − 2α + (12/5)β, (8/5)β − (2/3)α)
     = (0.150, 0.196, 0.353, 0.137, 0.164),

where α = eI0(1) = 3.44152386 . . ., β = eI1(1) = 1.53626172 . . ., and Iν(x) is the modified Bessel function of the first kind (the numerical values are rounded to 3 decimal places). The pertaining internal path length is 2.64749 . . ., which again is better than the random value I3 = 8/3 = 2.666 . . . 13

12 If an analogous enumeration of the histories IIIDI based on Knuth's deletion method is done, one obtains the probabilities p′3,1 = (13/72, 14/72, 24/72, 10/72, 11/72) for the shapes. Surprisingly, the expected internal path length of 192/72 = 8/3 equals the random value in that case.

Another interesting result is the observation that the sequence of expected internal path lengths I3,m, m = 0, 1, 2, . . ., is not monotonic, e.g., I3,0 = 2.66666, I3,1 = 2.65278, I3,2 = 2.64815, I3,3 = 2.64691, I3,4 = 2.64680, I3,5 = 2.64697, I3,6 = 2.64715, I3,7 = 2.64729, . . . It should be remarked that the problem at hand seems simpler than it actually is. This is also reflected by the title of the paper 'A Trivial Algorithm Whose Analysis Isn't'. Even though the size of the resulting trees is only as small as 3, the analysis involves Bessel functions (as just encountered above) and the solution of bivariate integral equations. 14 Jonassen and Knuth in their paper also study the distribution of shapes resulting from the same random sequences, but this time based on Knuth's deletion method. The pertaining steady state probabilities p′3,∞ (again the numerical values are rounded to 3 decimal places) are

p′3,∞ = (8√e − 13, 20 − 12√e, 1/3, 12√e − 59/3, 40/3 − 8√e)
= (0.190, 0.215, 0.333, 0.118, 0.144) corresponding to a path length of exactly 8/3 which equals the random value ′ I3 . Regarding the expected internal path lengths I3,m , m = 0, 1, 2, . . ., it turns ′ out rather surprisingly that they do not vary with p 3,m but remain fixed to the 13
In a similar study R. A. Baeza-Yates [BY89] addressed the same problem for n = 4. Among other things he deals with the limiting probabilities p4,∞. It turns out that the corresponding internal path length again is better than the random value I4.
14 For instance, the solution of the following equation was crucial:

f∞(x, y) = (1/3) ( 2 − 2x + f∞(x, y) + ∫_0^x f∞(t, y) dt + ∫_x^y f∞(x, t) dt ),

where f∞(x, y)/2 is the conditional steady state probability for shape A immediately after a deletion has been performed, given that the BST contains the keys x, y after deletion (0 ≤ x < y ≤ 1).
random value I3 for all m. This means that, for n = 3, Hibbard's method yields a better internal path length than Knuth's improved method. Jonassen and Knuth comment on this as follows: … the average internal path length actually turns out to be worse when we use the “improved” algorithm. On the other hand, Knott's empirical data in [Kno75] indicate that the modified algorithm does indeed lead to an improvement when the trees are larger. On the basis of Knott's empirical findings and the analytical results due to himself and Jonassen, Knuth continues remark K.3a [Knu73b, 431] in the following way: K.3a': Empirical evidence suggests strongly that the path length tends to decrease after repeated deletions and insertions, so the departure from randomness seems to be in the right direction; a theoretical explanation for this behavior is still lacking. The status reached so far can be summarized by saying that it is entirely to Knott's credit to have detected Hibbard's incorrect generalization H′′. However, with his conjecture stating that the departure from randomness is in the right direction, he was wrong himself.
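The limiting values quoted in this section are easy to verify numerically. The sketch below (my own code) evaluates α = e·I0(1) and β = e·I1(1) from the power series of the modified Bessel functions and plugs them into the closed forms for p3,∞ (whose exact grouping of coefficients is reconstructed here from the printed values, so treat that grouping as an assumption) and for p′3,∞:

```python
import math

def bessel_I(nu, x, terms=30):
    # modified Bessel function of the first kind, by its power series:
    # I_nu(x) = sum_k (x/2)^(2k+nu) / (k! (k+nu)!)
    return sum((x / 2) ** (2 * k + nu) / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))

e = math.e
alpha = e * bessel_I(0, 1.0)   # should be about 3.44152386
beta = e * bessel_I(1, 1.0)    # should be about 1.53626172

# Steady-state shape probabilities for I I I (D I)^m, m -> infinity,
# under Hibbard's deletion (coefficient grouping reconstructed):
p = (5 / 3 - 4 * alpha / 3 + 2 * beta,
     2 * alpha - 12 * beta / 5 - 3,
     2 * alpha - 18 * beta / 5 - 1,
     10 / 3 - 2 * alpha + 12 * beta / 5,
     8 * beta / 5 - 2 * alpha / 3)

# Under Knuth's deletion method the closed forms involve sqrt(e) instead:
r = math.sqrt(e)
p_knuth = (8 * r - 13, 20 - 12 * r, 1 / 3, 12 * r - 59 / 3, 40 / 3 - 8 * r)

print([round(x, 3) for x in p])
print([round(x, 3) for x in p_knuth])
```

Both tuples sum to 1 exactly (the α and β terms cancel), which is a useful sanity check on the coefficients.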
2.3 Eppinger disproves Knott’s conjecture
To examine Knott's conjecture more closely, J. L. Eppinger [Epp83] did extensive simulation runs. For convenience he confined himself to BSTs randomly constructed by sequences of the form I^n (ID)^m, as Knott did. However, the BSTs studied by Eppinger cover a much wider range of sizes than Knott's trees. Eppinger considers sizes of n = 2^6, 2^7, …, 2^11, whereas Knott's sizes range from 2 to 98 only. Also the number m of paired (ID)-updates observed by Eppinger is much larger. Whereas m always remains fixed at m = 24 in Knott's setting, Eppinger embarks on a study of trees with m of an order of magnitude of 2n^2, which amounts to about 9 million for n = 2048. Eppinger now observes the behavior of the mean internal path length IPLn,m and how it develops for increasing values of m, where the mean value is based on sample sizes ranging from 750 to 6 800. For the largest generated BST the sample size was 5 840. Eppinger in his own words describes his findings in the following way [Epp83, 667f]: Initially, IPLn,m decreases, as Knott observed. After some critical point, though, IPLn,m starts to increase, eventually levelling off after approximately n^2 I/D pairs. Figure 7 is a comparison chart in which IPLn,i/In is plotted
as a function of i/n^2 for each of the values of n tested. 15 (The latter ratio normalizes the x axis.) Perhaps the most significant observation is that as n increases so does the asymptotic value for IPLn,m/In. Binary tree operations, such as insertion and deletion, can be modeled by Markov chains (but the state space would be quite large). Since any binary tree may be obtained by applying some combination of I/D pairs to any other binary tree, limm→∞ IPLn,m exists. Figure 7 suggests that limm→∞ IPLn,m > In
for sufficiently large values of n (roughly greater than 128). Thus binary trees seem to become “worse than random” after many insertions and deletions.

Fig. 4. Comparison chart for asymmetric deletions (Eppinger's Fig. 7): IPLn,i/In plotted against i/n^2 for trees of 64, 128, 256, 512, 1024, and 2048 nodes.
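Eppinger's experimental setup is easy to reproduce in miniature. The following is a minimal sketch (my own code, not Eppinger's): it builds a BST by n random insertions and then applies 2n^2 random delete/insert pairs, either with Hibbard's asymmetric deletion or with a symmetric variant obtained by toggling between the right- and left-biased rules. A single run at n = 64 only illustrates the mechanics; Eppinger averaged over samples of 750 to 6 800 such runs.

```python
import random

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    else:
        t.right = insert(t.right, key)
    return t

def delete(t, key, mirror=False):
    # Hibbard's right-biased deletion; mirror=True uses the reflected,
    # left-biased rule (replace by the predecessor instead of the successor).
    if key < t.key:
        t.left = delete(t.left, key, mirror)
    elif key > t.key:
        t.right = delete(t.right, key, mirror)
    elif not mirror:
        if t.right is None:
            return t.left
        s = t.right
        while s.left is not None:
            s = s.left
        t.key = s.key
        t.right = delete(t.right, s.key, mirror)
    else:
        if t.left is None:
            return t.right
        s = t.left
        while s.right is not None:
            s = s.right
        t.key = s.key
        t.left = delete(t.left, s.key, mirror)
    return t

def ipl(t, depth=0):
    # internal path length: sum over all nodes of their depth (root = 0)
    return 0 if t is None else depth + ipl(t.left, depth + 1) + ipl(t.right, depth + 1)

def experiment(n, pairs, rng, symmetric=False):
    # I^n (ID)^pairs: n random insertions, then delete/insert pairs;
    # the symmetric method toggles between the two biased deletion rules.
    keys = [rng.random() for _ in range(n)]
    t = None
    for k in keys:
        t = insert(t, k)
    for i in range(pairs):
        victim = rng.choice(keys)
        keys.remove(victim)
        t = delete(t, victim, mirror=symmetric and i % 2 == 1)
        keys.append(rng.random())
        t = insert(t, keys[-1])
    return t, keys

rng = random.Random(42)
n = 64
t_asym, keys = experiment(n, 2 * n * n, rng)
t_sym, _ = experiment(n, 2 * n * n, rng, symmetric=True)
print(ipl(t_asym), ipl(t_sym))
```

Randomized switching instead of toggling would work equally well; Eppinger reports no significant difference between the two.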
Eppinger applies regression analysis methods to get an indication of how the steady state internal path length limm→∞ IPLn,m depends on n. The resulting approximation reads

limm→∞ IPLn,m = 0.0280 n (log2 n)^3 − 0.392 n (log2 n)^2 + 3.03 n log2 n − 4.81 n,

suggesting a behavior of Θ(n log^3 n). 15
To illustrate the above quotation a redrawing of Eppinger’s original Figure 7 is shown as Figure 4 of the present paper.
However that may be, by Eppinger's empirical results Knott's conjecture definitely turned out to be wrong. There were at least two shortcomings in Knott's setup that made it impossible to detect the true behavior of the internal path length under repeated random insertions and deletions: 16 a) The largest BST generated by Knott had a size of only 98. For such a relatively small BST the steady state value at which IPLn,m stabilizes is smaller than the random value In. b) Even if the trees generated by Knott had had a sufficient size, the number of iterated (ID)-pairs (namely m = 24) is far too small to get to the region where IPLn,m starts to increase after its initial decrease. By a second series of experiments Eppinger shows that the unfavorable behavior of the internal path length is due solely to the obvious asymmetry of Hibbard's deletion method. Using a symmetric version of Hibbard's deletion method, 17 Eppinger observes a much better behavior of the internal path length, since now IPLn,m continually decreases for increasing values of m until it eventually stabilizes at a level of about 88% of the corresponding random value In. It must be remarked that Eppinger's argument intended to establish the existence of a steady state distribution of the BST shapes contains yet another error in our story. Unfortunately, the shapes of the BSTs randomly constructed by the sequence I^n (ID)^m cannot be interpreted as a Markov chain. It is well-known that a finite irreducible and aperiodic Markov chain (with stationary transition probabilities) has a steady state, and this fact is brought into play by Eppinger in order to support his (erroneous) proposition. It is actually true that there are only finitely many shapes and that every shape may be obtained from any other shape by an appropriate sequence of (ID) pairs. Regarding aperiodicity it can easily be shown that every shape can be obtained from itself by a single update pair.
Hence all preconditions for a steady state seem to be fulfilled. But the argument breaks down, since we have no Markov chain at all. This can be shown by means of a counter-example which, regarding its underlying idea, is quite similar to the one used to disprove Hibbard's incorrect 16
However, in order to exonerate Knott it should be noted that the hardware available to Eppinger for his simulation runs had improved by about two orders of magnitude since the time of writing of Knott's thesis.
17 The original version of Hibbard's deletion method shows a bias towards the right. It is straightforward to formulate a reflected version showing a bias towards the left. The symmetric version results from switching between the two biased versions, which can be done by toggling or in a randomized manner. Eppinger reports no significant difference between these two switching alternatives.
generalization H′′ in section 2.2. For that purpose let us consider the random sequence II(ID)^m, resulting in a BST Tm with 2 nodes for every m ≥ 0. The shape of Tm shall be denoted by Sm. Of course Sm ∈ {A, B}, adopting the notation introduced in Figure 3. The transition probability P{S2 = z | S1 = y} can be calculated by determining M1 and M2, where M1 is the set of all histories (i1, i2, i3, d1, i4, d2) whose prefix (i1, i2, i3, d1) results in shape S1 = y, and M2 ⊂ M1 contains those histories from M1 leading to shape S2 = z. Then the transition probability is

P{S2 = z | S1 = y} = 0 if |M2| = 0, and |M2|/|M1| otherwise,
since by definition of random insertions and deletions every history is equally likely. For instance, let y = A, z = B, and h1 = (4, 2, 1, 2, 3, 3), h2 = (4, 2, 1, 1, 3, 1) and h3 = (3, 4, 1, 3, 2, 1). Then all three histories belong to M1, but only h1 also belongs to M2. By enumerating the pertaining histories one eventually obtains |M1| = 108 and |M2| = 25. If the process of shapes S0, S1, S2, … actually constituted a Markov chain, the transition probability P{S2 = z | S1 = y} = 25/108 would have to be independent of the earlier state S0. To check this we consider, e.g., S0 = A, and determine M1′ ⊂ M1 and M2′ ⊂ M2, where the primed subsets contain the respective histories visiting state S0 = A after the first two insertions. It turns out that |M1′| = 84 and |M2′| = 19. Hence the corresponding conditional transition probability is 19/84, which is different from the unconditional transition probability 25/108. Hence the process under consideration is not a Markov chain at all, not even a Markov chain with time dependent transition probabilities. 18 However, this error cannot impair Eppinger's merit: the discovery that the departure from randomness is by no means in the right direction is entirely due to his empirical study.
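The counting argument above can be verified by brute force. The sketch below (my own reimplementation; it takes shape A to be the 2-node tree whose root has a left child, which is an assumption about Figure 3's labelling) enumerates all 4!·3·3 = 216 equally likely histories of II(ID)^2, records (S0, S1, S2) for each, and compares the transition frequency with the frequency additionally conditioned on S0.

```python
from itertools import permutations, product

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    else:
        t.right = insert(t.right, key)
    return t

def hibbard_delete(t, key):
    # textbook reading of Hibbard's rule: a node with empty right subtree
    # is replaced by its left subtree, otherwise by its inorder successor
    if t is None:
        return None
    if key < t.key:
        t.left = hibbard_delete(t.left, key)
    elif key > t.key:
        t.right = hibbard_delete(t.right, key)
    elif t.right is None:
        return t.left
    else:
        s = t.right
        while s.left is not None:
            s = s.left
        t.key = s.key
        t.right = hibbard_delete(t.right, s.key)
    return t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def shape(t):
    # 2-node shapes; 'A' = root with a left child (assumed labelling)
    return 'A' if t.left is not None else 'B'

# enumerate all 4! * 3 * 3 = 216 equally likely histories (i1,i2,i3,d1,i4,d2)
runs = []
for perm in permutations(range(1, 5)):
    for d1, d2 in product(range(3), repeat=2):
        t = insert(insert(None, perm[0]), perm[1])
        s0 = shape(t)
        t = insert(t, perm[2])
        t = hibbard_delete(t, inorder(t)[d1])
        s1 = shape(t)
        t = insert(t, perm[3])
        t = hibbard_delete(t, inorder(t)[d2])
        runs.append((s0, s1, shape(t)))

m1 = [r for r in runs if r[1] == 'A']    # the paper reports |M1| = 108
m2 = [r for r in m1 if r[2] == 'B']      # the paper reports |M2| = 25
m1p = [r for r in m1 if r[0] == 'A']     # the paper reports |M1'| = 84
m2p = [r for r in m2 if r[0] == 'A']     # the paper reports |M2'| = 19
print(len(m1), len(m2), len(m1p), len(m2p))
print(len(m2) / len(m1), len(m2p) / len(m1p))  # unequal => no Markov chain
```

That the two printed probabilities differ is exactly the failure of the Markov property described in the text.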
2.4 Culberson and Munro conjecture that it is even worse than suspected by Eppinger and conceive an analytic model for their conjecture

Also J. Culberson and J. I. Munro in several papers [Cul84], [Cul85], [Cul86], [CM89], [CM90] are dealing with Knott's conjecture. They again consider BSTs randomly constructed by sequences I^n (DI)^m, as Knott and Eppinger did before. All empirical results contained in the above papers are in accordance 18
In the following section EFD updates will be brought into play. It is perhaps worth mentioning at this point that the process induced under EFD updates actually is a Markov chain. Moreover, it has been shown in [Gri07] that this is not only the case for Hibbard's deletion algorithm but remains true for various deletion disciplines under EFD updates.
with and confirm Eppinger's observations. They even suggest that the mean internal path length behaves still worse than suspected by Eppinger (who assumed an order of Θ(n log^3 n)). Culberson's and Munro's experiments support the conjecture that the steady state value of the internal path length is of order Θ(n^{3/2}). This behavior has not only been observed for Hibbard's deletion method but also for Knuth's improved method, which only performs marginally better than Hibbard's. Culberson and Munro also did experiments to study the behavior of symmetric versions of Hibbard's and Knuth's deletion method. Again the results confirm Eppinger's observations or are complementary to them as far as Knuth's method is concerned. For the symmetric versions of both methods the internal path lengths observed in the steady state were definitely better than random. For larger values of n (n ≥ 1024) the ratios limm→∞ IPLn,m/In seem to amount to about 0.88 and 0.87, 19 respectively. This conforms to Eppinger's results and confirms that the unfavorable behavior of the internal path length under both of the original deletion methods is primarily due to their asymmetry. Also for the symmetric versions the improvement of Knuth's method compared to Eppinger's is marginal. Correspondingly Culberson and Munro endorse Eppinger's recommendation to use a symmetric version of Knuth's deletion method (cf., e.g., [CM90, 311]). In addition to their empirical work, Culberson and Munro also address analytical aspects of the problem. Thus they propose a model for both of the asymmetric deletion methods in order to theoretically explain the Θ(n^{3/2}) behavior of limm→∞ IPLn,m, which is the behavior they conjectured based on their empirical observations. Such a model has been devised in [CM90].
20 However, this model merely applies to the rather special case that for each pair (DI) occurring in a sequence I^n (DI)^m the insertion I following the random deletion D is not random at all but merely reinserts the key that has just been deleted. Since under this model the set of keys does not change after the initial BST has been randomly constructed by I^n, the corresponding BSTs have been called Exact Fit Domain (EFD) trees by the authors. Under this restricted model it actually could be proved that limm→∞ IPLn,m = Θ(n^{3/2}), where this result applies to both Hibbard's and Knuth's asymmetric deletion method. The authors believe that their EFD model could be generalized to also comprise the more general update pairs (DI), as usually considered. A corresponding model, put forward in [CM89], is intuitively appealing but remains 19
Knuth even reports a ratio of only 0.86 [Knu98, 435].
20 This paper was submitted earlier than [CM89] (namely July 28, 1986 versus Dec. 1987). Hence regarding its content it is a predecessor of [CM89] even though it appeared later.
somewhat hypothetical, since the presentation is not mathematically complete. Under this model a steady state internal path length of Θ(n^{3/2}) follows also in the general case. Moreover, based on this model it seems quite plausible to conjecture that the leading term asymptotically is

limm→∞ IPLn,m ∼ (1/3) (2/π)^{1/2} n^{3/2} = 0.266… n^{3/2}

as n → ∞, which also shows a good agreement with the empirical data. After all, the above conjecture and its explanation by the authors appeared reliable and convincing enough to Knuth to comment on it in the following way [Knu98, 435]: Further study by Culberson and Munro [CM89], [CM90] has led to a plausible conjecture that the average search time in the steady state is asymptotically √(2n/9π). It is to be hoped that this puts an end to the story of errors.
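Treating the two asymptotic statements purely as formulas, one can check that they agree: an internal path length of (1/3)(2/π)^{1/2} n^{3/2} corresponds to an average search depth of IPL/n = √(2n/(9π)), since (1/3)√(2/π) = √(2/(9π)). A small sketch (the function names are mine) verifies this identity and tabulates Eppinger's Θ(n log^3 n) regression against the Culberson and Munro Θ(n^{3/2}) leading term:

```python
import math

# Culberson-Munro constant for the conjectured steady-state IPL
c_ipl = (1 / 3) * math.sqrt(2 / math.pi)   # = 0.26596..., the 0.266 of the text

def culberson_munro(n):
    # conjectured steady-state internal path length, leading term only
    return c_ipl * n ** 1.5

def eppinger(n):
    # Eppinger's regression fit, a Theta(n log^3 n) form
    lg = math.log2(n)
    return 0.0280 * n * lg ** 3 - 0.392 * n * lg ** 2 + 3.03 * n * lg - 4.81 * n

for n in (64, 256, 1024, 2048, 2 ** 16):
    print(n, round(eppinger(n)), round(culberson_munro(n)))
```

Within the range Eppinger actually measured (n ≤ 2048) the two predictions are of comparable size, which is why distinguishing Θ(n log^3 n) from Θ(n^{3/2}) empirically was genuinely hard.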
References

[AVL62] G.M. Adelson-Velsky and E.M. Landis. An algorithm for the organization of information. Soviet Math. Doklady, Vol. 3:1259–1263, 1962.
[BC60] A.D. Booth and A.J.T. Colin. On the efficiency of a new method of dictionary construction. Information and Control, Vol. 3:327–334, 1960.
[BY89] R.A. Baeza-Yates. A trivial algorithm whose analysis isn't: A continuation. BIT, Vol. 29:378–394, 1989.
[CM89] J. Culberson and J.I. Munro. Explaining the behaviour of binary search trees under prolonged updates: A model and simulations. The Computer Journal, Vol. 32(1):68–75, 1989.
[CM90] J. Culberson and J.I. Munro. Analysis of the standard algorithms in exact fit domain binary search trees. Algorithmica, Vol. 5(3):295–311, 1990.
[Cul84] J. Culberson. Updating Binary Trees. Technical Report CS-84-08, University of Waterloo, Canada, 1984.
[Cul85] J. Culberson. The effect of updates in binary search trees. Proceedings of the 17th ACM Symposium on the Theory of Computing (STOC), pages 205–212, 1985.
[Cul86] J. Culberson. The Effect of Asymmetric Updates in Binary Search Trees. PhD thesis, Computer Science Dept., University of Waterloo, Waterloo, Ontario, 1986.
[Dou59] A.S. Douglas. Techniques for the recording of, and reference to data in a computer. The Computer Journal, Vol. 2:1–9, 1959.
[Epp83] J.L. Eppinger. An empirical study of insertion and deletion in binary search trees. Communications of the ACM, Vol. 26(9):663–669, 1983.
[Gri07] S. Grill. Gleichgewichtsverteilungen bei zufälligen Updates in binären Suchbäumen. PhD thesis, Dept. of Information Systems and Operations, Vienna University of Economics and Business Administration, Vienna, Austria, 2007.
[GST03] Andreas Geyer-Schulz and Alfred Taudes, editors. Informationswirtschaft: Ein Sektor mit Zukunft, volume 33 of LNI. GI, 2003.
[Hib62] T.N. Hibbard. Some combinatorial properties of certain trees with applications to searching and sorting. JACM, Vol. 9:13–28, 1962.
[JK78] A. Jonassen and D.E. Knuth. A trivial algorithm whose analysis isn't. J. Comput. Syst. Sci., Vol. 16:301–322, 1978.
[Kno75] G.D. Knott. Deletions in Binary Storage Trees. PhD thesis, Computer Science Dept., Stanford University, Stanford, Calif., 1975.
[Knu73a] D.E. Knuth. The Art of Computer Programming, volume 3. Addison-Wesley, Reading, MA, 1973.
[Knu73b] D.E. Knuth. The Art of Computer Programming, volume 3, 2nd printing (1975). Addison-Wesley, Reading, MA, 1973.
[Knu77] D.E. Knuth. Deletions that preserve randomness. IEEE Transactions on Software Engineering, Vol. SE-3(5):351–359, 1977.
[Knu97] D.E. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, Reading, MA, 3rd edition, 1997.
[Knu98] D.E. Knuth. The Art of Computer Programming, volume 3. Addison-Wesley, Reading, MA, 2nd edition, 1998.
[MN03] H.M. Mahmoud and R. Neininger. Distribution of distances in random binary search trees. The Annals of Applied Probability, Vol. 13(1):253–276, 2003.
[MR98] C. Martinez and S. Roura. Randomized binary search trees. JACM, Vol. 45(2):288–323, 1998.
[Ree03] B. Reed. The height of a random binary search tree. JACM, Vol. 50:306–332, 2003.
[SA96] R. Seidel and C.R. Aragon. Randomized search trees. Algorithmica, Vol. 16:464–497, 1996.
[Vui80] J. Vuillemin. A unifying look at data structures. Communications of the ACM, Vol. 23(4):229–239, 1980.
[Win60] P.F. Windley. Trees, forests, and rearranging. The Computer Journal, Vol. 3:84–88, 1960.