Representing Dynamic Binary Trees Succinctly

J. Ian Munro*    Venkatesh Raman†    Adam J. Storm*

*Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada, email: {imunro, astorm}@uwaterloo.ca
†Institute of Mathematical Sciences, Chennai, India 600 113, vraman@imsc.ernet.in

Abstract

We introduce a new updatable representation of binary trees. The structure requires the information theoretic minimum 2n + o(n) bits and supports basic navigational operations in constant time and subtree size in O(lg n). In contrast to the linear update costs of previously proposed succinct representations, our representation supports updates in O(lg^2 n) amortized time.

1 Introduction

Trees, particularly binary trees, are elementary structures in many aspects of computing. The standard representation of a tree, with a pointer or two per parent-child relationship, is easy to navigate and update. Furthermore, the structure can be easily augmented so that operations such as determining subtree size can also be supported in constant time. Unfortunately, this representation can be very costly, even prohibitive, in terms of space. This is particularly true in applications such as text indexing, where the node of a binary tree corresponds to an index point in a text file.

Taking this reasonable point of view that a pointer used in representing an n node tree takes lg n bits, the usual representation of a binary tree requires 2n lg n bits (even without parent pointers). On the other hand, a binary tree can be represented in fewer than 2n bits, as there are only (2n choose n)/(n + 1), or about 2^{2n}/n^{3/2}, binary trees on n nodes. Indeed Jacobson[5], Munro and Raman[7], and others have proposed 2n + o(n) bit representations that permit fast navigation of a tree. These approaches, and the work presented here, all lead to a mapping from the n (internal) nodes of a tree onto the integers [1, n] and from the external nodes onto the integers [1, n + 1]. This leads to a way of associating auxiliary data with nodes and external nodes. Most of these approaches, however, are inherently static.

The focus of this paper is a succinct, quickly navigable and updatable representation of binary trees. Our representations deal with arbitrary binary trees on n (internal) nodes. Logically attaching an external node to each position in the tree without a child, we have n + 1 external nodes. Data may be associated with all internal and/or all external nodes. This data is taken to be either of constant size or a lg n bit reference (we use lg n to denote ⌈log_2 n + 1⌉). One choice may be made for the size of internal node data and another for external node data. Proofs, however, are given only for the case where data is associated with external nodes. The updates permitted are the natural insertion or deletion of a single node. We allow insertions to the tree along an edge or by inserting a new leaf. Conversely, a node with one child or a leaf may be deleted.

We adopt a natural model of a random access machine under which a lg n bit word can be manipulated with the usual operations in unit time, i.e. the size of the tree roughly matches word size. It was under this model that Jacobson[5] showed how to represent a tree using 2n + o(n) bits and be able to determine the parent or child of a node in lg n bit inspections. Munro and Raman[7] improved this to inspecting a constant number of lg n bit words, and added a number of operations including subtree size. Clark and Munro[3] gave a representation aimed at large trees to be kept on secondary storage. They broke the tree into pieces so that each piece could be stored on a page of memory using a 3n + o(n) bit representation. An update could be made by totally recomputing the page in question and modifying any other pages along the path from the root. In a disk based model, of course, this implies rewriting all pages along such a path. Nevertheless, their approach was effective in the practice of maintaining suffix trees. Although the approach taken here is very different, their work is, in a very loose sense, the starting point for the work presented here. Our main results can be stated as follows:

Theorem 1.1 There exists a 2n + o(n) bit binary tree representation that can be created in linear time and facilitates navigation and subtree size queries in constant time. The structure also supports finding any extra (fixed size) data associated with nodes. Given the location at which an insertion or deletion is to be performed, updates to the tree can be made in poly-log time depending on data associated with internal and/or external nodes. In particular:
• If no data is associated with nodes, update time is O(lg^2 n) worst case and O(lg lg n) amortized.
• If data of fixed constant size is associated with internal nodes and/or external nodes, update time is O(lg^3 n) worst case and O(lg n) amortized.
• If fixed size data of O(lg n) bits (such as references to an arbitrary record) is associated with internal and/or external nodes, update time is O(lg^4 n) worst case and O(lg^2 n) amortized.

In the next section we give a high level description of our structure. Subsequently, we provide a more detailed description of the structure and how it facilitates insertions and deletions.
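The counting argument from the introduction can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and `n.bit_length()` is used as a stand-in for lg n. It compares the minimum number of bits needed to distinguish all binary trees on n nodes with the 2n bound and with the cost of two lg n bit child pointers per node.

```python
from math import comb


def num_binary_trees(n: int) -> int:
    """Number of binary trees on n nodes: (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def min_bits(n: int) -> int:
    """Fewest bits that can give every binary tree on n nodes a distinct code."""
    return (num_binary_trees(n) - 1).bit_length()


def pointer_bits(n: int) -> int:
    """Two child pointers per node; n.bit_length() is an assumed stand-in for lg n."""
    return 2 * n * n.bit_length()


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # columns: n, information-theoretic minimum, 2n, pointer-based cost
        print(n, min_bits(n), 2 * n, pointer_bits(n))
```

The gap between the last two columns is the space that a succinct representation aims to recover.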
2 Overview of the Structure

We first describe our data structure, giving the invariants that are later used by the search procedures and maintained by the update algorithms. The basic notion is to divide the tree into subtrees of size O(lg^2 n) (the root's subtree may be smaller). We call these O(lg^2 n) sized subtrees small trees and we store them in blocks. Each of these small trees is then subdivided into tiny trees of O(lg n) nodes. These are stored in sub-blocks. The limited size of these tiny trees enables us to maintain a table of the representations of all possible binary trees of size at most c lg n (for any constant c < 1/2). Such a table permits the representation of a tiny tree of size O(lg n) using an O(lg n) sized pointer to its representation. Moreover, since there are only O(lg n) tiny trees for each small tree, we can use explicit O(lg lg n) sized pointers between tiny trees. We now give a more detailed description of the blocking structure and how it responds to additions (the structure's response to deletions is analogous).
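The idea of naming a tiny tree by its position in a table of all trees of the same size can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual encoding: trees are written as preorder bit strings ('1' for a node, '0' for an external node), each size r has its own sorted table, and the names `encodings`, `designation` and `insert_leaf` are hypothetical.

```python
from bisect import bisect_left
from functools import lru_cache


@lru_cache(maxsize=None)
def encodings(r: int) -> tuple:
    """Sorted table of all binary trees on r nodes, as preorder bit strings."""
    if r == 0:
        return ("0",)
    out = []
    for left_size in range(r):
        for left in encodings(left_size):
            for right in encodings(r - 1 - left_size):
                out.append("1" + left + right)
    return tuple(sorted(out))


def designation(enc: str) -> tuple:
    """Return (size, offset): the tree's size and its index in that size's table."""
    r = enc.count("1")
    table = encodings(r)
    offset = bisect_left(table, enc)  # binary search of the size-r table
    if offset == len(table) or table[offset] != enc:
        raise ValueError("not a valid tree encoding")
    return r, offset


def insert_leaf(enc: str, k: int) -> str:
    """Replace the k-th external node (0-based, preorder) by a new leaf node."""
    seen = -1
    for pos, bit in enumerate(enc):
        if bit == "0":
            seen += 1
            if seen == k:
                return enc[:pos] + "100" + enc[pos + 1:]
    raise IndexError("tree has fewer than k + 1 external nodes")
```

For example, `designation(insert_leaf("100", 0))` looks up the two-node tree obtained by hanging a new leaf on the left of a single node. Because the size-r table has fewer than 4^r entries, the offset always fits in fewer than 2r bits.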
2.1 Blocks

The tree is divided into subtrees of between lg^2 n and 3 lg^2 n nodes. These small trees are stored in blocks. (Note that the root's block is the only block that may be less than lg^2 n nodes in size.) This division can be done using a greedy algorithm which performs a postorder traversal of the tree in the following manner:

At each node we determine the size of the "incomplete blocks" presently containing each of its children. Each external node is viewed as an incomplete block of size 0 and passed to its parent. At each internal node a new block is found by taking the sub-block part from each child together with the node itself. If this new block contains at least lg^2 n nodes it is a "complete block" and a new incomplete block is passed to its parent. Otherwise, the combined block remains incomplete and is passed to the parent. Finally, the block containing the root is viewed as complete regardless of its size.

Clearly we could restrict the size of a (complete) block to being between lg^2 n and 2 lg^2 n nodes; however, relaxing the upper bound to 3 lg^2 n will be helpful in performing updates.

The matter of references between small trees is, however, of some concern. As each small tree will have one parent node in another small tree, only O(n/lg^2 n) pointers (= O(n/lg n) = o(n) bits) are required for references between parent and child small trees. However, an individual small tree could have O(lg^2 n) child small trees. Hence these inter-block child pointers are not stored in the blocks themselves, but in an auxiliary structure. As a consequence the size of a block will depend only on the number of nodes in the small tree it represents.

2.1.1 Block Organization

In allocating storage during updates, it is convenient to group together subtrees of roughly the same size. Hence we say that blocks with between lg^2 n + (i - 1) lg n and lg^2 n + i lg n nodes are in group i. Each block in the grouping is allocated the same amount of space: adequate for the largest but wasteful only by a factor of (1 + 1/lg n) for the smallest. Within a block grouping, blocks are stored contiguously so that no space is maintained between blocks.

Block groupings are stored in an array, ordered by block size. The grouping with smallest blocks is first in the array, and the grouping of largest sized blocks is at the end. We use the optimally resizable arrays of Brodnik et al.[2], which permit accesses, extensions, and contractions of the array to be performed in constant time. The space overhead is proportional to the square root of the number of "words" in the array. Between each block grouping is some empty space to facilitate growth and contraction. This will be at most 3 lg n words of lg n bits each (i.e. at most 3 lg^2 n bits).

To maintain a traversable structure, blocks are connected to their children (and children to parent) using explicit pointers of size lg n. This gives a total of 2 pointers per block (one parent, one child) since each block can be referenced only once as a child. As a result, while one block may have O(lg^2 n) pointers, there will be no more than O(n/lg^2 n) pointers in total.

2.1.2 Inter-Block Pointers

All inter-block pointers for a given block are stored contiguously in a separate pointer block. The main difference in the storage technique used for blocks and that used for pointer blocks is that between pointer block groupings there is no unused space. This is because, as we shall see later, pointer blocks only need modification upon block splitting or merging. As a result, when pointer block sizes change, they do so dramatically and so the mechanism employed in block rearranging is invalid. The technique used to maintain pointer blocks is somewhat involved. It can, however, be found in [8]. The inter-block pointers are arranged in a B-tree within the pointer block so that an external node can find its pointer in constant time. We discuss the details of this B-tree in Section 2.2.3.
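The greedy postorder division of Section 2.1 can be sketched as below. This is an illustration only: a tree is assumed to be either `None` (an external node) or a pair `(left, right)`, the threshold `B` stands in for lg^2 n, and the function name `greedy_blocks` is hypothetical. The sketch records only block sizes, not the blocks themselves.

```python
def greedy_blocks(tree, B):
    """Return the sizes of the complete blocks, in postorder, root's block last.

    tree: None for an external node, or a (left, right) pair for a node.
    B:    minimum size of a complete block (stand-in for lg^2 n).
    """
    complete = []

    def visit(node):
        if node is None:
            return 0  # external node: an incomplete block of size 0
        left, right = node
        # merge the incomplete blocks of both children with the node itself
        size = visit(left) + visit(right) + 1
        if size >= B:
            complete.append(size)  # block is complete
            return 0               # a new, empty incomplete block goes to the parent
        return size                # still incomplete: pass it up

    leftover = visit(tree)
    if leftover > 0:
        complete.append(leftover)  # the root's block is complete regardless of size
    return complete
```

Since each child hands up fewer than B nodes, every complete block other than the root's has between B and 2B - 1 nodes, which is consistent with the remark that the upper bound could be restricted to 2 lg^2 n. The recursion is only suitable for shallow example trees; an explicit stack would be needed for deep ones.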
2.2 Sub-Blocks

There are two main components of each sub-block: a pointer to the table representation, and leaf numbering information. We first discuss the pointer component, since it is by far the more important, and then we go on to explain the external node numbering information, why it is necessary, and how it is used.

2.2.1 Pointer To Tree Representation

The blocks of size O(lg^2 n) are divided into sub-blocks of between ¼ lg n and [?] lg n nodes. As mentioned before, these sub-blocks are pointers to a table containing a representation of every possible binary tree with at most ¼ lg n nodes. Since there are roughly 2^{2r} binary trees on r (or up to r) nodes, there will be roughly sqrt(n) entries in the table. As a consequence there is no problem with the space requirements of representing a copy of each possible subtree; therefore, we will ignore the actual table of sqrt(n) trees for the present. The table will actually maintain additional information useful when performing insertions and deletions, so we shall return to the issue later.

The references to subtrees in this table could be given by the parenthesis encoding of Jacobson[5] or of Munro and Raman[7]. We observe, however, that as there are about 4^r/(sqrt(π) r^{3/2}) binary trees on r nodes, 2r - (3/2) lg r - O(1) bits suffice. Adding a lg r + 2 lg lg r prefix to indicate the value of r gives us a 2r - ½ lg r + o(lg r) bit designation for a subtree. Hence each sub-block of size r can use fewer than 2r bits to encode itself.

Ultimately the space taken by our encoding is dominated by the virtually optimal encodings of these "tiny" trees. All other space used is to facilitate navigation, updates and interpretation. Within this extra space we require references to parent and child sub-blocks and to external data fields.

Lemma 2.1 (Table Size) Representing the Θ(lg n) tables can be achieved with O(n^c lg n) bits.

It is interesting that the tables do not contribute to the dominant space term of our structure.

2.2.2 External Node Numberings

In addition to a pointer to the explicit tree representation, each sub-block stores some information used to determine external node numbers within a given block. As described below, there are three types of external nodes in a sub-block: inter-sub-block pointers, inter-block pointers, and genuine external nodes of the tree (these may be implications of real data). Due to the way pointers are stored in our structure, we must be able to determine in constant time how many external nodes precede the node, within the block, in a preorder traversal.

To achieve this we store an array (called the external node numbering array) in each block, which has an entry for each sub-block. A given sub-block's entry in the array stores the number of external nodes that precede the first external node of the sub-block. Additionally, in the table of tree representations, for each node of the "tiny" tree we store the number of external nodes preceding the node within the sub-block. With this information, and the number of external nodes before the root of the sub-block and within the block, we can determine external node numberings for each node of the tree in constant time.

2.2.3 Sub-Block Pointers

We know that inter-sub-block pointers need only be of size lg lg n. Moreover, the number of inter-sub-block pointers in a given sub-block is one less than the number of sub-blocks in the block. To store the pointers we number the external nodes in postorder and, as previously mentioned, maintain a count of the number of external nodes before a sub-block. We store the list of a block's inter-sub-block pointers in a B-tree that is in fact a simplified version of a fusion tree[4] with the following properties:

1. Each node of the B-tree is of size ½ lg n bits.
2. The keys of the tree are of size lg lg n bits and represent the external node's number in the block.
3. The ½ lg n bit nodes each store from lg n/(4 lg lg n) to lg n/(2 lg lg n) of these lg lg n sized keys.
4. At all times the tree is of height 2. We know that a B-tree with these properties will remain height 2 with up to lg^2 n/(4 lg lg n)^2 > lg n keys.
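The capacity bound in property 4 can be unpacked with a short derivation. It assumes that the quantity lg^2 n/(4 lg lg n)^2 is the square of a per-node key count of lg n/(4 lg lg n), applied once at each of the two levels:

```latex
\left(\frac{\lg n}{4\,\lg\lg n}\right)^{2}
  \;=\; \frac{\lg^{2} n}{(4\,\lg\lg n)^{2}}
  \;>\; \lg n
\quad\Longleftrightarrow\quad
\lg n \;>\; 16\,(\lg\lg n)^{2}.
```

The right-hand condition holds for all sufficiently large n, since lg n grows faster than (lg lg n)^2, so the bound is best read as an asymptotic statement.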
Since we have O(lg n) external nodes that are inter-sub-block pointers in each block, we say that an external node is an inter-sub-block pointer if it is represented by a key in the tree. Assuming that the external node is found in the tree, we now must find the pointer associated with the desired position. To allow this to be done in constant time we store an additional B-tree node for each of the nodes on the second level of the B-tree which, instead of storing keys, stores the inter-sub-block pointers (we call this node the pointer node). This can be done since our keys and pointers are of the same size. When we find that a key is in a given second level node of the B-tree and at a given position, we go to the same position in the pointer node and we will find the pointer associated with that external node.

With nodes containing up to lg n/(2 lg lg n) keys, searching for a desired node could take O(lg lg n) time with binary search. Instead, we maintain a table that will allow us to achieve the constant time bound. We store a table outside of the block structure which is indexed by a ½ lg n sized key. Since there exists an entry of the table for each of the ½ lg n sized keys, there are sqrt(n) entries in total. Each entry stores in sequential order lg n/lg lg n records of size lg lg n. When we wish to know which branch to take on the top level of our B-tree, or if a key exists on the second level, we index into the table based on the B-tree node's bit representation (note we choose the table's index to be the same size as each node). Having found the correct entry in the table, we use the lg lg n sized external node number to determine how many records into the entry we must skip (i.e. external node ℓ in the block accesses the ℓth record in the list). The record to which we skip encodes the branch of the tree we must traverse or, if on the second level, where the corresponding key should reside. This table will allow us to index into the tree in constant time, and hence we will be able to determine whether an external node represents an inter-sub-block pointer in constant time.
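The table-driven search inside a packed B-tree node can be sketched at toy scale. Everything here is an assumption made for illustration: keys are 3 bits wide (standing in for lg lg n), every node holds exactly two keys, and the names `pack`, `unpack`, `RANK`, `FIND` and `pointer_for` are hypothetical. The point is only that, once the table is built, a search costs a constant number of array accesses indexed by the node's bit pattern and the key.

```python
KEY_BITS = 3        # stand-in for lg lg n
KEYS_PER_NODE = 2   # stand-in for the number of keys packed into one node
NODE_BITS = KEY_BITS * KEYS_PER_NODE
KEY_MASK = (1 << KEY_BITS) - 1


def pack(keys):
    """Pack exactly KEYS_PER_NODE small integers into one node-sized integer."""
    node = 0
    for i, key in enumerate(keys):
        node |= key << (KEY_BITS * i)
    return node


def unpack(node):
    """Inverse of pack: list the fields stored in a node."""
    return [(node >> (KEY_BITS * i)) & KEY_MASK for i in range(KEYS_PER_NODE)]


# Built once, outside all blocks: one entry per possible node bit pattern,
# one record per possible key value.
RANK = []  # RANK[node][key]: how many keys in the node are smaller than key
FIND = []  # FIND[node][key]: position of key in the node, or -1 if absent
for _node in range(1 << NODE_BITS):
    _keys = unpack(_node)
    RANK.append([sum(k < key for k in _keys) for key in range(1 << KEY_BITS)])
    FIND.append([_keys.index(key) if key in _keys else -1
                 for key in range(1 << KEY_BITS)])


def pointer_for(key_node, pointer_node, key):
    """Look up key in a second-level node; read the same slot of its pointer node."""
    pos = FIND[key_node][key]
    if pos < 0:
        return None  # this external node is not an inter-sub-block pointer
    return unpack(pointer_node)[pos]
```

`RANK` plays the role of choosing a branch at the top level, and `FIND` the role of locating a key on the second level. Partially filled nodes are not modelled in this sketch.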
Lemma 2.2 (Inter-Sub-Block Pointers) O(n/lg lg n) bits suffice to represent all inter-sub-block pointers.

Proof. In each block we have O(lg n) inter-sub-block pointers each of size lg lg n. The structure that holds the inter-sub-block pointers consists of one root node and lg n/lg lg n external nodes each of size ½ lg n. Additionally, the actual lg lg n sized pointers are stored in a second external node level containing lg n/lg lg n leaf nodes each of size ½ lg n. Therefore in total we have lg^2 n/lg lg n + ½ lg n bits used up in storing a block's inter-sub-block pointers. Additionally, outside the block structure, there is a table of sqrt(n) entries each of size O(lg n), giving us O(sqrt(n) lg n) extra space. Since there are O(n/lg^2 n) blocks there will be n/lg lg n + n/(2 lg n) + O(sqrt(n) lg n), or more simply O(n/lg lg n), bits used in storing the structure's inter-sub-block pointers. □

2.2.4 Internal Sub-block Organization

The internal organization of blocks is crucial to achieving quick accesses and updates. The sub-blocks within a block are arranged similarly to how we store blocks. Sub-blocks are stored in sub-block groupings such that all sub-blocks in a sub-block grouping are the same size ± lg lg n. The sub-block groupings are arranged such that the grouping that contains the sub-blocks of smallest size is first and the grouping with the largest sub-blocks is last. Between sub-block groupings we maintain a gap of at most lg n bits so that when sub-blocks grow upon insertions they can be rearranged easily. Since there are O(lg n/lg lg n) sub-block groupings and there are at most lg n bits between each pair of groupings, the total space used in inter-sub-block grouping gaps is lg^2 n/lg lg n.

2.3 Data Representation

The data representation technique used in our model is key to achieving the space constraints desired. Since pointers to data dominate the overall number of pointers, explicit pointers from the external nodes of the tree to the data set would require O(n lg n) bits of additional space, which would push us beyond the desired 2n space bound. As a result we implement a storage technique for the data that mirrors the structural storage technique.

In our model we consider the case in which data records are of constant size (the case where data records are lg n bit references is analogous). We store these fixed size records in blocks of size i lg n (i is at most [?] lg n). Like our structural blocks, the data blocks are stored in memory according to their size with all blocks of the same size stored contiguously. Within these blocks we contiguously store our fixed sized data records. Additionally, as with previously defined structural blocks, we maintain gaps between groupings of same sized blocks so that shuffling of blocks can be done efficiently.

In each structural block is a pointer to the data block that stores its data. Additionally we know that if an external node of the subtree is not an inter-sub-block or inter-block pointer, then it is external to the entire tree and as such is associated with some data. After verifying that the leaf is external to the tree we can find its data in the block's corresponding data block.

To simplify searching for the data within the data block, we use the previously described external node numberings. To determine which record to access in
the data block we must know precisely the number of external pointers that precede us within the block and sub-block. These values can be found in the respective B-trees and, when subtracted from the external node numbering, give the data record that is to be accessed. It should be noted that if the tree's internal nodes are to have associated data, then an analogous method can be used to represent the internal data.

3 Modifying the Structure

When an insertion is made into a block its size increases; however, provided it does not increase beyond the lg n block increment, it remains valid in its current location. If, however, the block size has increased beyond the lg n block increment, it must be relocated.
3.1 Relocating Blocks

When a block grows we know that relocation will result in the block being placed in the next largest block grouping (if one is not available the block must be split). To do this a copy of the block is made and then the last block of the block grouping is moved into the moving block's location (we know the last block will fit since all blocks in a block grouping are of the same size). This move increases the gap between the two block groupings by the old block size. After the gap grows, its size will be large enough so that the moving block can be placed within. As a result, the moving block becomes the first block of the following block grouping. Figure 1 shows how blocks are relocated.

Figure 1: (Relocating Blocks) In (a) block 2.2 is the destination of a new node. With the new node block 2.2 becomes too large and must move. In (b) the block is copied to a new location and the last block of the block grouping takes its place. Finally in (c), the now larger 2.2 is placed at the top of the next block grouping and the gap between the two block groupings decreases by lg n.

Lemma 3.1 (Block Relocation) A block can be relocated in O(lg n) time.

3.2 Shrinking Gap Sizes

We can see that in the process of relocating blocks upon the addition of a new node, gap sizes decrease. From this we can deduce that there will be a situation where a given gap will close (i.e. be reduced to 0 bits in size) and the above described block relocation algorithm will fail. To avoid gap closure we monitor the gaps, and when a gap is reduced to lg n bits we perform gap resizing.

3.2.1 Gap Resizing

To resize the gap between block groupings a and b (where block size(a) < block size(b)) we make a copy of the first block in block grouping b (b_1) and delete b_1 so that its previous location is now a gap. Then we make a copy of the first block of block grouping b + 1 ((b + 1)_1), and place b_1 at the end of block grouping b. After this has been done, the gap between a and b is now sizeof(b) (or at least lg^2 n). We continue copying and moving blocks until the first block of the last block grouping has been moved to the end of the array and the array is extended if necessary (see Figure 2). This will guarantee that block size will always be between 2 lg n and 3 lg^2 n. Finally we must update block pointers to maintain the tree's structure.

Figure 2: (Gap Resizing) In (a) block 1.2 grows and is relocated. This leaves the gap between block grouping 1 and 2 at lg n bits and so the gap must be resized. Block 1.2 moves to the bottom of the second block grouping and block 3.0 moves to the bottom of the third block grouping. This continues until (c) where block (b-1).0 is moved and b.0 is moved to the bottom of the array of blocks. Notice that in (d) the array of blocks has grown by the size of the last block.

A similar mechanism is implemented to prevent gap sizes from becoming too large upon deletions. The details of this complementary mechanism are similar to the gap resizing algorithm above and as such are left to the reader.

3.3 Splitting Blocks

Another implication of having block size constraints in the presence of insertions is that block sizes could exceed the maximum allowable size. In this case we must split the block so that the two resultant blocks are both of a legal size. Splitting a block is thus performed in three steps: finding a node at which the block can be validly split, splitting the block, and placing the resulting two blocks in their proper places. We will assume for now we can perform a block split in constant time by dereferencing one pointer to disconnect the two trees. Later we will show that in fact O(lg^2 n) amount of work may be necessary if a sub-block is split as well. Since we have shown that block reallocation takes O(lg n) time, to show that block splitting requires O(lg^2 n) time we must show that we will always be able to find a valid splitting node in O(lg^2 n) time.

We can see that if we can split the block such that both parts have between 1/3 and 2/3 of the bits we will be left with 2 blocks both of which are now validly sized. It is known from [6] that it is possible to split a binary tree of n nodes through removal of an edge so that each subtree has no more than 2n/3 vertices. This leads to the following claim:

Claim 3.1 A block split can be performed in O(lg^2 n) time.

We can see that splitting a block requires O(lg^2 n) time if and only if at the sub-block level, all splitting operations can be completed in O(lg^2 n) time or less. In the next section we go on to describe the sub-block structure and eventually show that the above claim is true. In the process, we prove Theorem 1.1; that insertions and deletions can be done in O(lg^2 n) time.

3.4 Inserting into Sub-blocks

When a new node is inserted into the structure, the actual insertion takes place at the sub-block level. After the correct block and sub-block are found, the sub-block is traversed to find the location where the new node will be placed. Once the location where the new node is to be placed is found, the insertion can take place.

An insertion requires us to change the sub-block pointer so that it points to the new representation of the tree. We know that this new representation will be in the table representing trees of size r + 1. Accordingly, after generating the new representation of the tree, by modifying the encoding to account for the insertion, we simply perform a binary search of the r + 1 size table to find the offset of the correct encoding. Once we have found the correct encoding, we set that to be the new offset and we set r + 1 to be the new size. This takes O(lg n) time as the tables are of size n^c.

When we add a node to a sub-block, its size increases by two bits. As a result, we may have to move the sub-block to the next sub-block grouping (since sub-block sizes are in increments of lg lg n). This sub-block reorganization is similar to the block reorganization previously described.

Additionally, since we assume that insertions may occur at the leaf level, the external node's data must be added to the tree structure. This is done by inserting the leaf's record into the external data structure defined above. Finally, the increase in leaves forces us to increment external node numberings (we omit these details).

Lemma 3.2 Modifying the tree structure upon insertion of a node requires O(lg n) time.

3.4.1 Splitting Sub-blocks

When an insertion is made into a sub-block of largest allowable size, the sub-block must be split. This can be done using the three step method outlined for block splitting. The actual splitting of a sub-block is a bit more complicated since we do not have explicit pointers to simply reassign.

To perform the split we first determine the node at which the sub-tree will be split. After the node has been found, we split the sub-tree and generate the encodings for the two new sub-trees. To find the tree representations for the two new trees we first must determine the size of the trees and then search the corresponding tables for the representation's offset. Following the search of the appropriate tables, we determine the external node numberings by splitting the original external node numberings.

Since we now have a new sub-block we must add one inter-sub-block pointer to the B-tree where the pointers are stored. This is the final stage in the splitting process.

Lemma 3.3 A sub-block can be split in O(lg n) worst case time and O(1) amortized time.

Lemma 3.4 A block split can be performed in O(lg^2 n) time and O(1) amortized time.
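The search for a valid splitting node in Section 3.3 can be sketched as a walk down the heavier child. This is an illustration under assumptions, not the procedure of [6]: the tree is given as a hypothetical dictionary mapping each node to its `(left, right)` children (with `None` for a missing child), and the stopping threshold is (2n + 1)/3, which is what this simple walk guarantees and is marginally weaker than the 2n/3 quoted above. The tree must have at least two nodes.

```python
def subtree_sizes(children, root):
    """Size of the subtree under every node, computed with an explicit stack."""
    size = {}
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        kids = [c for c in children[node] if c is not None]
        if done:
            size[node] = 1 + sum(size[c] for c in kids)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in kids)
    return size


def split_node(children, root):
    """Return (v, below, rest): cutting the edge above v leaves parts of these sizes."""
    size = subtree_sizes(children, root)
    n = size[root]
    node = root
    # descend to the heavier child while the current subtree is still too large
    while 3 * size[node] > 2 * n + 1:
        node = max((c for c in children[node] if c is not None),
                   key=size.__getitem__)
    return node, size[node], n - size[node]
```

Both passes touch each node of the block a constant number of times, so on a block of O(lg^2 n) nodes the search fits within the O(lg^2 n) bound the text asks for.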
3.5 Modifying Data

When an insertion occurs, the data associated with the given node must be added to the data structure. To do this we first determine (by examining the external node numberings) the location within the data block where the data record belongs. Then, we locate the data block through its pointer in the block and rewrite the block with the inserted record in place. When our records are of constant size, rewriting a data block will take O(lg n) amortized time. Conversely, when the data records are lg n bit references we will require O(lg^2 n) amortized time to rewrite blocks. Following the insertion the block is larger and must be relocated. This relocation is performed as described previously when dealing with structural block relocation.

Lemma 3.5 Modifying data blocks takes O(lg n) amortized time for fixed sized records and O(lg^2 n) amortized time for lg n sized references.

This is the dominant cost in modifying our structure and proves Theorem 1.1.
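Locating and inserting a data record from an external node number can be sketched as follows. The sketch makes several assumptions: external nodes of a block are numbered from 0, `pointer_ext_numbers` is a hypothetical sorted list of the external node numbers that are inter-block or inter-sub-block pointers (already updated for the insertion), and a binary search stands in for the constant-time B-tree lookup of the paper.

```python
from bisect import bisect_left


def record_index(ext_number, pointer_ext_numbers):
    """Index of the data record for a genuine external node of the block.

    The pointers preceding the node carry no data, so their count is
    subtracted from the node's external node number.
    """
    preceding_pointers = bisect_left(pointer_ext_numbers, ext_number)
    return ext_number - preceding_pointers


def insert_record(data_block, ext_number, pointer_ext_numbers, record):
    """Rewrite the data block with the new record in its place."""
    idx = record_index(ext_number, pointer_ext_numbers)
    return data_block[:idx] + [record] + data_block[idx:]
```

For instance, if external nodes 2 and 5 of a block are pointers, external node 3 is preceded by one pointer and so maps to record 2. Rewriting the whole block, as `insert_record` does, mirrors the cost described in the text.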
3.6 Changes in lg n

One difficulty with our model is that it relies on the value of lg n; however, this value has the potential to change over time. As a result there are steps that must be taken when the value of lg n increases or decreases. We know that before lg n doubles or halves, n must be squared or rooted. This means that we can amortize the cost of changing the structure over O(n^2) operations.
3.7 Subtree Size

Since we divide our tree structure into small blocks, if we maintain at the root of the small block the block's subtree size, in the worst case updating subtree size will require visiting each of these small blocks. Additionally, if we maintain the subtree size within the small block at the root of each tiny block, when updating, these values must also be modified. While performing these updates would take o(n) time, we would like to have the ability to update subtree size in constant time. To achieve this we consider a certain class of accesses to the tree. When concerned with subtree size we say that navigation through the tree begins at the root and may end at any point in the tree (although for purposes of claims regarding worst case time for updating the size of subtrees, we will assume navigation ends at the root).

Each small block contains a count of its subtree size beginning at its root and ending at its leaves. Additionally, each tiny block in the structure maintains a count of the number of nodes in its subtree starting at its root. Finally, in the table of tree representations, we store the number of descendant nodes within the tiny block for each node of the tree.

When we wish to determine the subtree size at a given node of the tree we take the value in the table and we add to it the subtree sizes of each of its descendant tiny blocks. Then we add to that number the subtree sizes of each of its descendant small blocks. Since in the worst case there are O(lg^2 n) descendant small blocks and O(lg n) descendant tiny blocks, determining subtree size takes O(lg^2 n) time.

When an insertion or deletion takes place, we must update the subtree size of each of the tiny blocks in the current small block. This update will potentially require visiting all the tiny blocks within the small block and accordingly will take O(lg n) time. After this operation the small block has correct current subtree size information; however, all ancestor small blocks may have incorrect information. To correct this we must visit all the small blocks which we traversed to get to the node at which the insertion or deletion took place. Since we already have visited these small blocks we can amortize the cost of updating the subtree size over the steps taken in traversing to the current node. This allows us to achieve amortized constant time updates to subtree size.

It should be noted that we could compute the subtree size in constant time by maintaining the subtree size sums of descendant blocks (i.e. each block maintains the sum of all preceding blocks' subtree sizes). With these sums we could simply determine the first descendant block and the last descendant block and from these, determine the subtree size. The problem with this model is that updating the sums would take O(lg^2 n) time.
Corollary 3.1 The results of Theorem 1.1 apply to a forest of binary trees with the added result that two trees in the forest can be joined in O(lg^2 n) time.

Corollary 3.2 Updates can be performed on a binary tree in amortized constant time with the use of O(n lg lg n) space.

Corollary 3.2 follows by storing the small trees in a conventional manner with explicit parent-child pointers and explicit pointers to the data, though these pointers require only lg lg n bits each. A few other minor modifications are required, but we omit these details.

4 Conclusions

We have presented a binary tree representation that is within a lower order term of the information theoretically optimal number of bits. Additionally we have shown how our structure, unlike the ones previously proposed, facilitates insertions and deletions in O(lg^2 n) time. It would be interesting to consider the problem of improving the time for an update.

While our model can represent arbitrary k-degree ordinal trees, it does so by performing a trivial mapping which requires O(k) time to determine the kth child of any of the tree's nodes. The problem of succinctly representing, and efficiently updating, trees of higher degree so that navigation can be performed efficiently [1] remains open. It may well be amenable to our techniques.

References

[1] D. Benoit, E. D. Demaine, J. I. Munro, and V. Raman, "Representing Trees of Higher Degree", Proceedings of the 6th International Workshop on Algorithms and Data Structures (WADS), volume 1663 of LNCS, Springer-Verlag, (1999) 169-180.
[2] A. Brodnik, S. Carlsson, E. D. Demaine, J. I. Munro, and R. Sedgewick, "Resizable Arrays in Optimal Time and Space", Proceedings of the 6th International Workshop on Algorithms and Data Structures (WADS), volume 1663 of LNCS, Springer-Verlag, (1999) 37-48.
[3] D. R. Clark and J. I. Munro, "Efficient Suffix Trees on Secondary Storage", Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms (SODA), (1996) 383-391.
[4] M. L. Fredman and D. E. Willard, "Surpassing the Information Theoretic Bound with Fusion Trees", Journal of Computer and System Sciences, 43 (1993) 424-436.
[5] G. Jacobson, "Space-efficient Static Trees and Graphs", Proceedings of the IEEE Symposium on the Foundations of Computer Science (FOCS), (1989) 549-554.
[6] R. J. Lipton and R. E. Tarjan, "A Separator Theorem For Planar Graphs", SIAM Journal of Applied Mathematics, 36(2) (1979) 177-189.
[7] J. I. Munro and V. Raman, "Succinct Representations of Balanced Parentheses, Static Trees and Planar Graphs", Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), (1997) 118-126.
[8] A. J. Storm, Representing Dynamic Binary Trees Succinctly, MMath thesis, U. Waterloo, 2000.