Cache-Oblivious Persistence

Pooya Davoodi∗¹, Jeremy T. Fineman†², John Iacono‡¹, and Özgür Özkan¹

¹ New York University
² Georgetown University

arXiv:1402.5492v2 [cs.DS] 1 Jul 2014
Abstract

Partial persistence is a general transformation that takes a data structure and allows queries to be executed on any past state of the structure. The cache-oblivious model is the leading model of a modern multi-level memory hierarchy. We present the first general transformation for making cache-oblivious model data structures partially persistent.
1 Introduction
Our result is a general transformation to make a data structure partially persistent in the cache-oblivious model. We first review both the persistence paradigm and the cache-oblivious model before presenting our result.

Persistence. Persistence is a fundamental data-structuring paradigm whereby operations are allowed not only on the current state of the data structure but also on past states. Being able to efficiently work with the past has become a basic feature of many real-world systems, including consumer-oriented software such as Apple's Time Machine, as well as being a fundamental tool for algorithm development. There are several types of persistence, which vary in power as to how new versions can be created from existing ones. One can also view persistence as a transformation that extends the operations of an underlying data structure ADT. Data structures, generally speaking, have two types of operations: queries and updates. A query is an operation that does not change the structure, while updates do.¹ A version is a snapshot of the data structure at the completion of an update, and updates generate new versions. In all variants of persistence, queries are allowed in any version, past or present. The simplest form of persistence is partial persistence, whereby updates can only be performed on the most recently generated version, called the present. The ensuing collection of versions models a linear view of time, whereby one can execute operations only in the present and have a read-only view back into the past. A more complicated form of persistence is known as full persistence, and allows
∗ Research supported by NSF grant CCF-1018370 and BSF grant 2010437.
† Research supported by NSF grants CCF-1314633 and CCF-1218188.
‡ Research supported by NSF grant CCF-1018370.
¹ Note that, for example, in self-adjusting structures (e.g. [10, 13, 14]) operations considered to be queries are not queries by this definition, since rotations are performed to move the searched item to the root, thus changing the structure.
new versions to be generated by performing an update operation on any of the previous versions; the new version is branched off from the old one, and the relationship among the versions forms a tree. The most complicated form of persistence arises when the underlying data structure allows the combination of two data structures into one. Such operations are called meld-type operations, and if meld-type operations are allowed to be performed on any two versions of the data structure, the result is said to be confluently persistent. In confluent persistence, the versions form a DAG. In [7], the retroactive model was introduced, which differs from persistence by allowing insertion and deletion of operations in the past, with the results propagated to the present; this is substantially harder than forking off new versions.

Much of the work so far on persistence has been done in the pointer model. The landmark result, shown in [8, 12], is that any pointer-based structure with constant in-degree can be made fully persistent with no additional asymptotic cost in time and space; the space used is simply the total number of changes to the structure. The first attempt at supporting confluent operations was [9], but the query runtimes could degrade significantly as a result of that transformation. This is largely because, by merging a structure with itself, one can obtain an implicit representation of an exponential-sized structure in a linear number of steps. In [6] the situation was improved by showing that if a piece of data can occur only once in each structure, then the process of making a structure confluently persistent becomes reasonable.

The cache-oblivious model. The Disk-Access Model (DAM) is the classic model of a two-level memory hierarchy: there is a computer with internal memory of size M and an external memory (typically called, for historical reasons, the disk), and data can be moved back and forth between internal memory and disk in blocks of a given size B at unit cost. The underlying premise is that since disk is so much slower than internal memory, counting the number of disk-block transfers, while ignoring all else, is a good model of runtime. A classic data structure in the DAM model is the B-tree [1]. However, as the modern computer has evolved, its memory has become not just a two-level hierarchy but a multi-level one, ranging from the registers to the main storage, with several levels of cache in between; each level has a smaller amount of faster memory than the previous one. The most successful attempt to cope with this hierarchy to date has been to employ cache-oblivious algorithms [11], which are not parameterized by M and B. These algorithms are still analyzed in a two-level memory hierarchy like the DAM, but the values of M and B are used only in the analysis and are not known to the algorithm. Since the analysis holds for an arbitrary two-level memory hierarchy, it applies automatically to all the levels of a multi-level memory hierarchy. Cache-oblivious algorithms are analyzed in the ideal-cache model [11], which is the same as the DAM model with one addition: the system automatically and optimally decides which block to evict from internal memory when loading a new block. We use the term cache-oblivious model to encapsulate both the ideal-cache model and the restriction that the algorithm be cache oblivious. Some DAM-model algorithms are trivially cache oblivious; for example, scanning an array of size N has cost O(N/B + 1) in both models.
Other structures, such as the B-tree, are parameterized by the value of B, and thus cache-oblivious alternatives to the B-tree require completely different methods (e.g. [3, 4]).

Previous persistence results for memory models. In [5] a general method to make a pointer-based DAM-model structure fully persistent (assuming constant in-degree) was presented, where
updates incur an O(log B) slowdown compared to the ephemeral version. In [2] a partially persistent dictionary for the cache-oblivious model was presented, with a runtime of $O(\log_B(V+N))$ for all queries and updates, where N is the size of the dictionary and V is the number of updates. Both of these results are based on adapting the methods of the original persistence result [12] to the memory model at hand, and are still tied to the view of a structure as being pointer-based. In contrast, we make no assumption of a pointer-based structure and do not use any of the techniques from [12].

Persistence in the cache-oblivious model. We present a general transformation to make any cache-oblivious data structure partially persistent; this is the first such general result. In a cache-oblivious algorithm, from the point of view of the algorithm, the only thing that is manipulated is the lowest level of memory (the disk). The algorithm simply acts as if there is one memory array, which is initially empty, and the only thing the algorithm can do is look at and change memory cells. As the ideal-cache assumption states that the optimal offline paging can be approximated within a constant factor online, the algorithm does not directly do any memory management; it just looks at and changes the memory array. Working directly with the memory array as a whole is a necessary feature of efficient cache-oblivious algorithms, and the overriding design paradigm is to ensure data locality at all levels of granularity. We define the space used by a cache-oblivious algorithm, U, to be the highest address in memory touched so far. The runtime of any operation on a particular state of a data structure is expressed as a function of B and M (only), which are unknown to the algorithm and in fact differ across the levels of the memory hierarchy. In partial persistence, another parameter of interest is the number V of updates performed; this is simply defined as the total number of memory cells changed by the algorithm. So, the goal is to keep a history of all previous states of the data structure, and to support queries in the past for any cache-oblivious structure, while minimizing any additional space and time needed.

As a warm-up, consider two simple methods to implement cache-oblivious persistence. If one were to copy the entire memory with every update operation, one would achieve partial persistence, but with unacceptable space usage and update cost (this method would spend O(U/B) time copying the memory for every update). At the other extreme, one could store for each location in memory a dynamic predecessor-query structure that records all the values held by that cell. While this approach brings down the space usage, it destroys any locality in memory that the algorithm tried to create; for example, an array scan will cost O(U) rather than O(U/B), as array elements that were sequential in the ephemeral structure would no longer be stored sequentially.

Our results. The most straightforward way to present our results is in the case where the ephemeral data structure's runtime does not assume more than a constant-sized memory and does not benefit from keeping anything in memory between operations. These assumptions often correspond to query-based structures, where blocks that happen to be in memory from previous queries are of no use, and are often reflected in the fact that there is no M in the runtime of the structure.
In this case our results are as follows: queries in the present take no additional asymptotic cost, while queries in the past and updates incur an $O(\log_B U)$-factor time overhead over the cost of executing them with a polynomially smaller block size. In particular, a runtime of $t_q(B)$ in the ephemeral structure becomes $O(t_q(\Theta(B^{(1-\epsilon)/\log 3}))\log_B U)$ for any constant $\epsilon > 0$. Thus, as a simple example, a structure of size $\Theta(N)$ and runtime $O(\log_B N)$ will support queries in the
present in time $O(\log_B N)$, and persistent queries in the past and updates in time $O(\log_B^2 N)$.

Structures where memory plays a role are more involved to analyze. There are two cases. One is where memory is needed only within an operation, but the data structure's runtime does not depend on anything being left in memory from one operation to the next. In this case, as above, queries in the present run asymptotically as fast as in the ephemeral structure, using a memory a constant factor smaller. For queries in the past and updates, our transformation has the effect of using the runtime of the ephemeral structure with the memory size reduced by an $O(B^{1-(1-\epsilon)/\log 3}\log_B U)$ and an $O(B^{1-(1-\epsilon)/\log 3})$ factor, respectively. Thus, in this case, a query of time $t_q(M, B)$ in the ephemeral structure will take time $O(t_q(\Theta(M), \Theta(B)))$ for a query in the present, $O(t_q(\Theta(M/(B^{1-(1-\epsilon)/\log 3}\log_B U)), \Theta(B^{(1-\epsilon)/\log 3}))\log_B U)$ for a query in the past, and $O(t_q(\Theta(M/B^{1-(1-\epsilon)/\log 3}), \Theta(B^{(1-\epsilon)/\log 3}))\log_B U)$ for an update.

When the runtime of a structure depends on keeping something in memory between operations, the runtimes of the previous paragraph hold for updates and queries in the present. But for queries in the past this is not possible: one cannot store the memory states of all previous versions in memory. For queries in the past, suppose that executing a sequence of queries in the ephemeral structure takes time $t_q(M, B)$ if memory is initially empty. By treating the sequence as one meta-query, the results of the previous paragraph apply, and our structure will execute the sequence in $O(t_q(\Theta(M/(B^{1-(1-\epsilon)/\log 3}\log_B U)), \Theta(B^{(1-\epsilon)/\log 3}))\log_B U)$ time. Having $t_q(M, B)$ represent the runtime in the ephemeral structure under the assumption that memory starts empty could make $t_q(M, B)$ higher than without this requirement. Short sequences of operations that have zero or sub-constant cost will be impacted the most, while long or expensive sequences will not.

Data structures often have runtimes that are a function of a single variable, commonly denoted N, representing the size of the structure. This variable makes no appearance in our construction or analysis, however, as that would be too restrictive: bounds that involve multiple variables (e.g. for graph algorithms) or more complicated competitive or instance-based analyses can all be accommodated in the framework of our result. For the same reason we treat space separately, and can freely work with structures with non-linear space or whose space usage is not neatly expressed as a function of one variable.

All of our update times are amortized, as doubling tricks are used in the construction. The amortization cannot be removed easily with standard lazy-rebuilding methods, as in the cache-oblivious model one cannot run an algorithm for a certain amount of time; this would imply knowledge of B.

A lower bound on the space usage is the total number V of memory changes that the structure must remember. If the number of updates is sufficiently large compared to the ephemeral space usage (in particular, if $V = \Omega(U^{\log 3})$), then our construction has a space usage of $O(V \log U)$.

Overview of methods. The basic design principle of a cache-oblivious data structure is to ensure as much locality as possible in executing operations. So, our structure tries to maintain locality.
It does so directly, by ensuring that the data of any consecutive range of memory of size w of any past version is stored in a constant number of consecutive ranges of memory of size $O(w^{\log 3})$. Our structure is based on viewing changes to memory as points in two dimensions, with one dimension being the memory address and the other the time at which the change in memory was made. With this view, our structure is a decomposition of 2-D space-time, with a particular embedding into memory, and routines to update and query it.
2 Preliminaries
First we define precisely what a cache-oblivious data structure is, and how its runtime is formulated. A cache-oblivious data structure manipulates a contiguous array of cells in memory by performing read and write operations on the cells of this array, which we denote by A. The array A represents the lowest level of the memory hierarchy. By the ideal-cache assumption [11], cache-oblivious algorithms need not (and in fact must not) manage transfers between levels of memory; they simply work with the lowest level.

Ephemeral primitives. The ephemeral data structure can be viewed as simply wishing to interact with the lowest level of memory using reads and writes:

• Read(i): returns A[i].
• Write(i, x): sets A[i] = x.

Ephemeral cache-oblivious runtime and space usage. Given a sequence of primitive operations, we can define the runtime to execute the sequence in the cache-oblivious model. The runtime of a single primitive operation depends on the sequence of primitives executed before it, and is a function of M and B that returns either zero, if the desired element is in a block in memory, or one if it is not. This is computed as follows: for a given B, view A as being partitioned into contiguous blocks of size B. Then compute the most recent M/B blocks that have had a primitive operation performed in them. If the current operation falls in one of those M/B blocks, its cost is zero; otherwise, its cost is one. Space usage is simply defined as the largest memory address touched so far, which we require to be upper bounded by a linear function of the number of Write operations. A sketch of this cost computation appears below.

Persistence. In order to support partial persistence, the notion of a version number is needed. We let V denote the total number of versions so far, defined as the number of Write operations executed so far. Let $A_v$ denote the state of memory array A after the vth Write operation. Then supporting partial persistence is simply a matter of supporting one additional primitive, in addition to Read and Write:

• Persistent-Read(v, i): returns $A_v[i]$.

We wish to provide a data structure that supports persistent operations with runtime in the cache-oblivious model as close as possible to that of structures supporting only ephemeral operations.
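To make the cost rule concrete, here is a minimal sketch (ours, not from the paper): it replays a trace of touched addresses of A for concrete values of M and B, charging one block transfer exactly when the touched block is not among the M/B most recently used blocks.

```python
from collections import OrderedDict

def cache_oblivious_cost(addresses, M, B):
    """Block transfers incurred by a trace of Read/Write addresses, per the
    cost rule above: an access is free iff its block is among the M/B most
    recently touched blocks."""
    cache = OrderedDict()              # block id -> None, in recency order
    transfers = 0
    for i in addresses:
        blk = i // B                   # A is partitioned into blocks of size B
        if blk in cache:
            cache.move_to_end(blk)     # hit: cost zero, refresh recency
        else:
            transfers += 1             # miss: cost one
            cache[blk] = None
            if len(cache) > M // B:
                cache.popitem(last=False)  # forget the least recent block
    return transfers

# Scanning N cells costs O(N/B + 1), matching the example in the introduction.
assert cache_oblivious_cost(range(1000), M=64, B=8) == 125
```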
3 Data Structure
We view the problem in two dimensions, where an integer grid represents space-time. We say there is a point labeled x at (i, v) if at time v a Write(i, x) was executed, setting A[i] = x. Thus the x-axis represents space and the y-axis represents time; we assume the coordinates increase up and to the right. Let P refer to the set of all points associated with all Write operations. In this view, P is sufficient to answer all Read and Persistent-Read queries. A Read(i) simply returns the
label of the highest point in P with x-coordinate i, and Persistent-Read(v, i) returns the label of the highest point in P at or directly below (i, v). We refer to the value of the point (i, v) as the value associated with memory location i at time v, that is, $A_v[i]$. We denote by V the number of Write operations performed, which is the index of the most recent version. All of the points lie in a rectangular region bounded by horizontal lines representing time zero at the bottom and the most recent version at the top, and by vertical lines representing array location zero (on the left) and the maximum space usage so far (on the right).

At a high level, we store a decomposition of this rectangular region into vertically separated square regions. For each of these square regions, we use a balanced hierarchical decomposition of the square that stores the points and supports the needed queries, which we call the ST-tree. As new points are only added at the top of the structure, only the top square's ST-tree needs to support insertion. As such, the non-top squares' ST-trees are stored in a more space-efficient manner, and as new squares are created the old ones are compressed.
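The following minimal sketch (ours) pins down these point-set semantics in executable form. It is essentially the per-cell predecessor-structure strawman from the introduction: correct, but with none of the locality the ST-tree is designed to provide.

```python
import bisect

class NaiveSpaceTime:
    """Point-set semantics of partial persistence; no locality guarantees."""
    def __init__(self):
        self.versions = {}   # column i -> sorted list of versions written there
        self.values = {}     # column i -> labels, parallel to self.versions
        self.v = 0           # V, the number of Writes so far

    def write(self, i, x):   # adds the point (i, V) labeled x
        self.v += 1
        self.versions.setdefault(i, []).append(self.v)
        self.values.setdefault(i, []).append(x)

    def read(self, i):       # label of the highest point in column i
        vals = self.values.get(i)
        return vals[-1] if vals else None

    def persistent_read(self, v, i):   # highest point at or directly below (i, v)
        k = bisect.bisect_right(self.versions.get(i, []), v)
        return self.values[i][k - 1] if k else None

st = NaiveSpaceTime()
st.write(3, "a"); st.write(3, "b")
assert st.persistent_read(1, 3) == "a" and st.read(3) == "b"
```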
3.1 Space-Time Trees
We define the Space-Time Tree, or ST-tree, which is the primary data structure we use to store the point set P and support fast queries on it. This tree is composed of nodes, each of which has an associated space-time rectangle (which we simply call the rectangle of the node). See Figure 1. The tree has the following properties:

• Each node at height h in the tree (a leaf is defined to be at height 0) corresponds to a rectangle of width $2^h$. This implies all leaves are at the same depth.
• Internal nodes have two or three children. An internal node's rectangle is partitioned by its children's rectangles.
• A leaf is full if and only if it contains a point in P. An internal node is full if and only if it has two full children. If an internal node has three children, one must be full.
• Some rectangles may be three-sided, open in the upward direction (future time); these are called open, while four-sided rectangles are called closed. All nodes whose rectangles are open are required to be non-full.

The above conditions imply that a node's rectangle is partitioned by the rectangles, of half its width, belonging to its children, which we call left, right, and, if there is a third one, upper. Each node stores:

• Pointers to its three children (left, right, and upper) and to its parent.
• The coordinates of its rectangle.

Additionally, each leaf node (height 0, and thus width 1) stores the following:

• The at most one point in P that intersects the rectangle of the leaf.
• The result of the Persistent-Read corresponding to the point at the bottom of the rectangle.

Lemma 1. If a node at height h is full, then its rectangle intersects at least $2^h$ points in P.
Figure 1: The integer grid representing a sequence of Write operations is shown for U = 16. Red circles represent points of P . Hollow circles represent the answers to the Persistent-Read queries at the bottom of each leaf node’s rectangle. Each rectangle is associated with a node in an ST-tree. Three subtrees of the ST-tree are also highlighted. The blue shaded rectangle (and the tree above) corresponds to the top tree. The green (plus red) shaded rectangle (and the tree below) corresponds to the bottom tree. The red shaded rectangle corresponds to the subtree shown to the left of the grid.
Proof. This follows directly from the facts that full nodes have at least two full children, that a full leaf intersects one point of P contained in its rectangle, and that the rectangles of all leaves are disjoint and at the same level.
3.2 Global data structure
• The variable U is stored, representing the space usage of the data structure rounded up to the next power of two.
• The data structure stores an array of ⌈V/U⌉ ST-trees. The last one is called the top tree, and the others are called bottom trees. The roots' rectangles partition the grid [1..U] × [1..∞]. The bottom trees' roots correspond to squares of size U; the j-th bottom root corresponds to the square [1..U] × [((j − 1)U + 1)..jU]. The top tree's rectangle is the remaining three-sided rectangle [1..U] × [((⌈V/U⌉ − 1)U + 1)..∞].
• An array storing a log of all V Write operations, which is used for rebuilding.
• A current memory array C of size U. The entry C[i] contains the value of A[i] at the present, and a pointer to the leaf containing the highest point in P in column i (this leaf also contains the value of A[i] at the present).
• A pointer p to the leaf corresponding to the most recent Persistent-Read.
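For orientation, a hypothetical skeleton of this global state follows (field names are ours; the ST-tree internals are elided):

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

@dataclass
class GlobalState:
    # Space usage rounded up to the next power of two.
    U: int = 1
    # ceil(V/U) ST-trees: bottom trees first, the top tree last.
    trees: List[Any] = field(default_factory=list)
    # Log of all (i, x) Write operations, kept for rebuilding.
    write_log: List[Tuple[int, Any]] = field(default_factory=list)
    # Current memory array: C[i] = (value of A[i] now, pointer to the
    # leaf holding the highest point of P in column i).
    C: List[Tuple[Any, Any]] = field(default_factory=list)
    # Finger p: leaf of the most recent Persistent-Read.
    p: Optional[Any] = None
```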
3.3 How the ST-trees are stored
The roots of all ST-trees correspond to rectangles of width U, and thus have height log U; structurally, the trees are subgraphs of the complete 3-ary tree of height log U. The top tree is stored in memory in a brute-force way, as a complete 3-ary tree of height log U, using a biased van Emde Boas layout [11]. This layout depends on a constant $0 < \epsilon < 1$ and can be viewed as partitioning a tree of height h into a top tree of height $\epsilon h$ and $3^{\epsilon h}$ bottom trees of height $h(1-\epsilon)$. Each of the nodes of these trees is then placed recursively into memory. This wastes some space, as nodes and subtrees that do not exist in the tree will have space reserved for them in memory; thus the top ST-tree uses space $U^{\log 3}$. There is a level of the van Emde Boas layout that includes the leaves and has size in the range $3^{(1-\epsilon)\log_3 B} = B^{1-\epsilon}$ to $3^{\log_3 B} = B$ nodes. Any path of length k in an ST-tree is stored in $O(1 + \frac{k}{\log B})$ blocks. Additionally, we have the following lemma:

Lemma 2. Any induced binary tree of height h will be stored in $O\left(1 + \frac{2^h}{B^{(1-\epsilon)/\log 3}}\right)$ blocks.
Proof. There is a height h′ in the range $(1-\epsilon)\log_3 B$ to $\log_3 B$ whereby all induced subtrees of nodes at that height fit into a block. A binary tree of height h will have $2^{h-h'}$ trees of height h′, and $2^{h-h'} - 1$ nodes of height greater than h′. Each tree of height h′ is stored in at most B consecutive memory locations and therefore intersects at most two blocks. Thus, even if each node above height h′ is in a different block, the total number of blocks the subtree is stored in is
$$O(2^{h-h'}) = O\left(2^{h - (1-\epsilon)\log_3 B}\right) = O\left(\frac{2^h}{2^{(1-\epsilon)\log_3 B}}\right) = O\left(\frac{2^h}{B^{(1-\epsilon)\log_3 2}}\right) = O\left(\frac{2^h}{B^{(1-\epsilon)/\log 3}}\right).$$

The bottom trees are stored in a more compressed manner; the nodes appear in the same order as if they were in the top tree, but instead of storing all nodes of a complete 3-ary tree, only those
nodes that are actually in the bottom tree are stored. Thus the size of a bottom tree is simply the number of nodes in the tree. The facts presented for the top tree, and Lemma 2, also hold for the compressed representation. A sketch of the layout order appears below.
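The following sketch (our own rendering, for a complete 3-ary tree) generates the biased layout order: a top piece containing roughly an $\epsilon$ fraction of the levels is placed first, followed recursively by each bottom piece, so the nodes of any sufficiently small subtree end up contiguous in memory.

```python
def biased_veb_order(levels, eps=0.5):
    """Node order of a complete 3-ary tree with the given number of levels
    under the biased van Emde Boas layout. A node is named (depth, index),
    with the root at (0, 0) and index in [0, 3^depth)."""
    def rec(depth, index, lv):
        if lv == 1:
            return [(depth, index)]
        top = max(1, min(lv - 1, round(eps * lv)))   # levels in the top piece
        order = rec(depth, index, top)               # lay out the top piece first
        for j in range(3 ** top):                    # then each bottom piece
            order += rec(depth + top, index * 3 ** top + j, lv - top)
        return order
    return rec(0, 0, levels)

# The root comes first, and each bottom piece occupies a contiguous run.
order = biased_veb_order(4)
assert order[0] == (0, 0) and len(order) == 1 + 3 + 9 + 27
```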
3.4 Executing operations

3.4.1 Read(i)
We simply return C[i].
3.4.2 Persistent-Read(v, i)
The answer is in the tree leaf containing (i, v), the point associated with this operation. We find this leaf by moving p in the obvious way: move the pointer p to parent nodes until reaching a rectangle containing (i, v) or the root of a tree. If a root is reached, set p to the root of the appropriate tree, the ⌊v/U⌋-th one. Then move the pointer p down until it hits a leaf. The answer is in that leaf.
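A minimal, hypothetical sketch of this finger search (the fields mirror Section 3.1; leaf payloads are elided, and we assume i ≤ U and v ≤ V so the target leaf exists):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    x_lo: int; x_hi: int                # space interval of the rectangle
    t_lo: int                           # bottom side (time)
    t_hi: Optional[int] = None          # top side; None = open (infinity)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def contains(self, i: int, v: int) -> bool:
        in_time = self.t_lo <= v and (self.t_hi is None or v <= self.t_hi)
        return self.x_lo <= i <= self.x_hi and in_time

def locate(p: Node, roots: List[Node], U: int, v: int, i: int) -> Node:
    """Move the finger p to the leaf whose rectangle contains (i, v)."""
    while not p.contains(i, v):         # walk up until a rectangle
        if p.parent is None:            # contains (i, v), or jump to the
            p = roots[(v - 1) // U]     # tree whose square contains time v
            break                       # (0-indexed here; times start at 1)
        p = p.parent
    while p.children:                   # children partition the rectangle,
        p = next(c for c in p.children if c.contains(i, v))  # so descend
    return p                            # the answer is stored at this leaf
```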
3.4.3 Write(i, x)
We call a node a top node if its rectangle is open. We maintain the invariant that no top node is full. Since Write modifies P by adding a point above all others, this guarantees that the new point will intersect only non-full nodes.

The insertion begins by checking for two special cases. The first is if the memory location i is larger than U, which is an upper bound on the highest memory location written so far. In this case, we apply the standard doubling trick: U is doubled, and all the trees are destroyed and rebuilt by re-executing all Writes, which are stored in the log mentioned in Section 3.2. The current memory array C is also doubled in size. The other special case occurs once every U operations, when the point associated with the Write is on the (U + 1)-th row of the rectangle of the top tree and a new top tree is needed. In this case, the representation of the existing top tree is compressed and copied as a new bottom tree, removing any unused memory locations, and the top tree is reinitialized as a binary tree whose leaves contain the values stored in the corresponding cells of C.

Then the main part of the Write operation proceeds as follows:

• Set C[i] = x.
• Increment V.
• Follow the pointer to the leaf containing (i, V). Note that this is a top node. Add the data to this leaf and mark it full.

This means that the point set P now contains the new Write, as it must. However, the leaf is a top node and is full, which violates the previously stated invariant. We then use the following general reconfiguration to preserve the invariant that top nodes cannot be full.
Reconfiguration. When a node becomes full, we mark it as such and proceed according to one of the following two cases. 1) The parent of the node already has three children, in which case the parent also becomes full; we recurse with the parent. 2) The parent of the node has only two children, which implies the parent is still not full. At this point, we close all open rectangles in the subtree of the current node and add a third child to the parent. (These procedures are explained below in more detail.) The rectangle of the new child is on top of the rectangle of the current node.

In other words, when a leaf node becomes full, this leads to that leaf node and zero or more of its ancestors becoming full. We recurse up to the highest such ancestor p, close every open rectangle in its subtree, and add, as the third child of the parent of p, a sibling node whose open rectangle is on top of p's newly closed rectangle. See Figure 2. A sketch of this logic appears below.

Closing rectangles. We close the open rectangles in the subtree of a node by traversing the (at most two) children of each node that correspond to open rectangles, changing the top side of each open rectangle from ∞ to V.

Adding the third child. In order to add a third child to a node at height h + 1, we create a complete binary tree of height h whose root becomes the third child. Note that the leaves of this tree contain the answers to the Persistent-Read queries at the corresponding points; we copy this information from C.

Lemma 3. The amortized number of times a node at height h + 1 gains a third child following an insertion into its subtree is $O(\frac{1}{2^h})$.

Proof. We use a simple potential function where adding a third child on top of a node p at height h has cost 1, and all top nodes at height $l \le h$ in the subtree of p have a potential of $\frac{1}{2^{h-l}}$ for each full child they have. Observe that at each step during the handling of an insert, a top node with two full children stops being a top node and is marked as full. Thus, since the potential difference at the level of p matches the cost, the amortized cost at any other level is zero, except at the leaf level, where the amortized cost is $\frac{1}{2^h}$.
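A compressed, hypothetical sketch of this reconfiguration logic (the new sibling is created here as a bare node; in the actual structure it is a complete binary tree of the same height, with open rectangles, whose leaves are populated from C):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    height: int
    t_hi: Optional[int] = None          # None = open rectangle
    full: bool = False
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def close_open(node: Node, V: int) -> None:
    """Change the top side of every open rectangle in the subtree to V."""
    if node.t_hi is not None:           # a closed node has no open descendants
        return
    node.t_hi = V
    for child in node.children:         # at most two children can be open
        close_open(child, V)

def on_leaf_filled(leaf: Node, V: int) -> None:
    """Restore the invariant that no top (open) node is full."""
    node = leaf
    node.full = True
    # Case 1: the parent already has three children; it now has two full
    # children, so it becomes full as well, and we recurse upward.
    while node.parent is not None and len(node.parent.children) == 3:
        node = node.parent
        node.full = True
    # Case 2: the parent has only two children. Close every open rectangle
    # under the highest newly full node, then hang an open sibling on top.
    close_open(node, V)
    if node.parent is not None:         # a full top-tree root instead
        sibling = Node(height=node.height, parent=node.parent)
        node.parent.children.append(sibling)  # triggers a new top tree
```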
4 Analysis

4.1 Space usage
Lemma 4. The space usage of a bottom tree is O(U log U).

Proof. Since each internal node in the tree has at least two children, we need only bound the number of leaves in the tree. The leaves of the tree can be partitioned into two sets depending on whether or not they are full. Recall that a leaf is full if and only if its rectangle intersects a point in P; if a leaf is not full, we call it an empty leaf. Note that each empty leaf contains the answer to the Persistent-Read operation at the bottom of its rectangle. There are U points from P in the square of each bottom tree, and we will charge the empty leaves to these U points. Observe that an empty leaf is only created when we create the initial tree or when we create a new complete subtree as the third child of a node in the tree. In the former case, there are at most U leaves, and this happens once per tree. In the latter case, call the third child q and denote by p the sibling of q whose rectangle is below the rectangle of q. Note that by construction p is full. Assuming q
Figure 2: The integer grid and the resulting tree after executing a sequence of Write operations.
and p are of height h, by Lemma 1 there are at least $2^h$ points from P in the rectangle of p. We can charge the $2^h$ leaves created in the subtree of q to these points. How many times can such a point be charged? It is charged at most once per level, and there are log U levels. Thus, there are at most O(U log U) leaves in a bottom tree. Since each bottom tree is stored contiguously in space linear in the number of leaves, the lemma follows.

Lemma 5. The space usage of the entire structure is $O(U^{\log 3} + V \log U)$.

Proof. The top tree uses space $O(U^{\log 3})$. By Lemma 4, the O(V/U) bottom ST-trees use space O(U log U) each. The log of Write operations has size O(V). The current memory array has size O(U). The other global information takes size O(1).
4.2 Comments on memory allocation
According to the ideal-cache assumption, in the cache-oblivious model the runtime is within a constant factor of that achieved by the optimal movement of blocks in and out of cache. Thus, we can assume a particular active memory-management strategy, and the runtime using this strategy is an upper bound on the runtime in the cache-oblivious model. In particular, we can base our assumed memory-management strategy on the one an ephemeral structure would use on the same sequence of operations, with particular memory and block sizes. Globally, we assume the memory M is split into three equal parts, with each part dedicated to the execution of one of the three primitive operations.
4.3 Analysis of Read
Theorem 6. Suppose a sequence X of Read operations takes time T(M, B) to execute ephemerally in the cache-oblivious model on a machine with memory M and block size B and an initially empty memory. Then in our structure, the runtime is O(T(M/3, B)).

Proof. Executing Read operations is done using only the current memory array, and a memory of size M/3 is allocated for Reads; the result follows.
4.4 Analysis of Persistent-Read
Lemma 7. Consider the leaves of the ST-tree corresponding to a subarray of ephemeral memory of size w at a fixed time v ($A_v[i..i+w-1]$ for some i). These leaves and all of their ancestors in the ST-tree are contained in $O\left(\frac{w}{B^{(1-\epsilon)/\log 3}} + \log_B U\right)$ blocks.
Proof. Let L be the set of all points corresponding to the memory locations $A_v[i..i+w-1]$. There are two disjoint rectangles, corresponding to nodes of the ST-tree, of width at most 2w, such that the two rectangles together contain and partition all elements of L. Let $x_l$ and $x_r$ be the nodes corresponding to these two rectangles. The leaves in the subtrees of $x_l$ and $x_r$ corresponding to points in L are the leaves of two binary trees with roots $x_l$ and $x_r$. Since the heights of $x_l$ and $x_r$ are at most $1 + \log w$, by Lemma 2 any binary tree rooted at them occupies at most $O\left(1 + \frac{w}{B^{(1-\epsilon)/\log 3}}\right)$ blocks. The nodes on the path between $x_l$ and $x_r$ are stored in $O(\log_B U)$ blocks.
Theorem 8. Suppose a sequence X of Persistent-Read operations, executed on the same version, takes time T(M, B) to execute ephemerally in the cache-oblivious model on a machine with memory M and block size B and an initially empty memory. Then in our structure, the runtime is
$$O\left(T\left(\frac{M}{3B^{1-\frac{1-\epsilon}{\log 3}}(1+\log_B U)},\; B^{\frac{1-\epsilon}{\log 3}}\right)\cdot(1+\log_B U)\right).$$

Proof. Consider executing X ephemerally with memory $\frac{M}{3(1+\log_B U)B^{1-(1-\epsilon)/\log 3}}$ and block size $B^{(1-\epsilon)/\log 3}$. It keeps in memory
$$\frac{M}{3(1+\log_B U)B^{1-(1-\epsilon)/\log 3}} \cdot \frac{1}{B^{(1-\epsilon)/\log 3}} = \frac{M}{3(1+\log_B U)B}$$
ephemeral blocks. Each ephemeral block is stored in $O(B^{(1-\epsilon)/\log 3}/B^{(1-\epsilon)/\log 3}) = O(1)$ blocks; including the ancestors of the block, this becomes $O(1+\log_B U)$. Now, by Lemma 7, the number of memory blocks needed to keep the leaves representing $\frac{M}{3(1+\log_B U)B}$ consecutive memory ranges of length $B^{(1-\epsilon)/\log 3}$ in the persistent structure, together with their ancestors, is
$$\frac{M}{3(1+\log_B U)B}\cdot(1+\log_B U) = \frac{M}{3B}.$$
Thus they can all be stored in the third of memory allocated to Persistent-Read operations ($\frac{M}{3B}$ blocks), and so moving p to any location that is in memory in the ephemeral execution involves walking it entirely through locations that are in memory in the persistent structure, and thus has no cost. Now suppose the ephemeral structure moves one block into memory at unit cost. This could require moving $O(B^{(1-\epsilon)/\log 3}/B^{(1-\epsilon)/\log 3} + \log_B U) = O(1+\log_B U)$ blocks to bring the associated leaves and their ancestors in the persistent structure into memory, by Lemma 7. Thus, this is the slowdown factor the persistent structure incurs relative to the ephemeral one.
4.5 Analysis of Write
We charge the cost of rebuilding to Write operations; this only increases their cost by a multiplicative constant factor, since we double U after Ω(U) Writes. Every U Writes, we make the existing top tree a bottom tree and create a new top tree. The three steps involved in this process are closing the rectangles of the top tree, compressing the top tree into a bottom tree, and initializing a new top tree and populating its leaves. The following lemmas bound the cost of these steps.

Lemma 9. Copying performed to populate the leaves of a newly created subtree of height h, or closing the open rectangles in the subtree of a node at height h, costs $O\left(\frac{2^h}{B^{(1-\epsilon)/\log 3}}\right)$ block transfers.
Proof. We only close rectangles that are open, and we copy values from the array C to the leaves of the new subtree, which only has open rectangles. If a node is closed, all of its children are closed and need not be traversed. Given any node, since at most two out of its potentially three children can be open, the tree we traverse to close rectangles or copy values is a binary tree. By Lemma 2, each of these tasks costs $O\left(\frac{2^h}{B^{(1-\epsilon)/\log 3}}\right)$ block transfers.
Lemma 10. Compression of the top tree into a bottom tree costs $O\left(\frac{U \log U}{B^{(1-\epsilon)/\log 3}}\right)$ block transfers.

Proof. The top tree is initially a complete binary tree and is stored in $O\left(\frac{U}{B^{(1-\epsilon)/\log 3}}\right)$ blocks by Lemma 2. We have to account for the additional blocks used to store the elements inserted and the nodes created in the top tree after it was created. By Lemma 3, $O(\frac{U}{2^k})$ nodes are added to the top tree at height k. When a node at height k is added to the top tree, it is the root of a complete binary tree, and its subtree is stored in $O(2^k/B^{(1-\epsilon)/\log 3})$ blocks by Lemma 2. This implies that the added nodes take an additional
$$\sum_{k=1}^{\log U} O\left(\frac{U}{2^k}\right)\cdot O\left(\frac{2^k}{B^{(1-\epsilon)/\log 3}}\right) = O\left(\frac{U \log U}{B^{(1-\epsilon)/\log 3}}\right)$$
blocks.
Theorem 11. Suppose a sequence of k Write operations takes time T(M, B) to execute ephemerally in the cache-oblivious model on a machine with memory M and block size B and an initially empty memory. Then in our structure, the runtime is
$$O\left(T\left(\frac{M}{3B^{1-\frac{1-\epsilon}{\log 3}}},\; B^{\frac{1-\epsilon}{\log 3}}\right) + k\cdot\frac{\log U}{B^{\frac{1-\epsilon}{\log 3}}}\right).$$

Proof. To find the item, the analysis is similar to that of Persistent-Read, except that we can use the pointers in C to go directly to the item; this removes the $\log_B U$ terms. We also need to bound the cost of the operations performed to maintain the ST-tree. Recall that after each insertion, we go up the tree until we find an ancestor p whose parent is not full. Then we create a new sibling node q, whose leaves are populated with copies of the values stored in the corresponding top leaf nodes of p, and close the rectangles in the subtree of p. Letting the height of p's parent be h, we refer to this sequence of operations as the expansion of a node at height h. The creation of the new top tree occurs once every U Writes, and thus its amortized cost per Write is $O(\log U/B^{(1-\epsilon)/\log 3})$ by Lemma 10. Since the expansion of a node at height h happens only because there have been at least $2^h$ insertions since the node was created, by Lemma 9 the amortized cost of a Write operation is at most
$$\frac{\log U}{B^{(1-\epsilon)/\log 3}} + \sum_{h=1}^{\log U}\frac{1}{2^h}\cdot\frac{2^h}{B^{(1-\epsilon)/\log 3}} = O\left(\frac{\log U}{B^{(1-\epsilon)/\log 3}}\right).$$

We also consider the case when all Write operations are executed on unique memory cells. This is a reasonable assumption when we have update operations involving numerous Write operations, where for each memory cell we only need to remember the last Write executed during that update operation.

Theorem 12. Suppose a sequence of Write operations performed on unique cells of A takes time T(M, B) to execute ephemerally in the cache-oblivious model on a machine with memory M and block size B and an initially empty memory. Then in our structure, the runtime is
$$O\left(T\left(\frac{M}{3B^{1-\frac{1-\epsilon}{\log 3}}\log B},\; \frac{B^{\frac{1-\epsilon}{\log 3}}}{\log B}\right)\log_B U\right).$$

Proof. Observe that, given M′, M′′, B′, B′′ with B′ ≤ B′′ and M′/B′ = M′′/B′′, we have T(M′, B′) ≥ T(M′′, B′′). Thus, since no two Write operations overlap, we have in the worst case that
$$k \le T\left(\frac{M}{3B^{1-\frac{1-\epsilon}{\log 3}}\log B},\; \frac{B^{\frac{1-\epsilon}{\log 3}}}{\log B}\right)\cdot\frac{B^{\frac{1-\epsilon}{\log 3}}}{\log B}.$$
The result then follows by substituting this bound on k into Theorem 11.
References

[1] Rudolf Bayer and Edward M. McCreight. Organization and maintenance of large ordered indices. Acta Informatica, 1:173–189, 1972.
[2] Michael A. Bender, Richard Cole, and Rajeev Raman. Exponential structures for efficient cache-oblivious algorithms. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming, pages 195–207, 2002.
[3] Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-oblivious B-trees. SIAM J. Comput., 35(2):341–358, 2005.
[4] Michael A. Bender, Ziyang Duan, John Iacono, and Jing Wu. A locality-preserving cache-oblivious dynamic dictionary. J. Algorithms, 53(2):115–136, 2004.
[5] Gerth Stølting Brodal, Konstantinos Tsakalidis, Spyros Sioutas, and Kostas Tsichlas. Fully persistent B-trees. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 602–614, 2012.
[6] Sébastien Collette, John Iacono, and Stefan Langerman. Confluent persistence revisited. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 593–601, 2012.
[7] Erik D. Demaine, John Iacono, and Stefan Langerman. Retroactive data structures. ACM Transactions on Algorithms, 3(2), 2007.
[8] James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, and Robert Endre Tarjan. Making data structures persistent. J. Comput. Syst. Sci., 38(1):86–124, 1989.
[9] Amos Fiat and Haim Kaplan. Making data structures confluently persistent. J. Algorithms, 48(1):16–58, 2003.
[10] Michael L. Fredman, Robert Sedgewick, Daniel Dominic Sleator, and Robert Endre Tarjan. The pairing heap: A new form of self-adjusting heap. Algorithmica, 1(1):111–129, 1986.
[11] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. ACM Transactions on Algorithms, 8(1):4, 2012.
[12] Neil Sarnak and Robert Endre Tarjan. Planar point location using persistent search trees. Commun. ACM, 29(7):669–679, 1986.
[13] Daniel Dominic Sleator and Robert Endre Tarjan. Self-adjusting binary search trees. J. ACM, 32(3):652–686, 1985.
[14] Robert Endre Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215–225, 1975.