Efficient dynamic method-lookup for object oriented languages* (Extended Abstract)

Paolo Ferragina¹ and S. Muthukrishnan²

¹ Dipartimento di Informatica, Università di Pisa, Italy. [email protected]
² Dept. of Computer Science, Univ. of Warwick, UK. [email protected]

1 Introduction

We consider the following dynamic data structural problem. We are given a rooted tree of n nodes and a set {1, 2, ..., C} of colors. Each node u has a subset of these colors, say of size d_u, and Σ_u d_u = D. Note D ≤ nC. The problem is to dynamically maintain this tree under updates, that is, Insert(p, c) and Delete(p, c) operations, and answer Find(p, c) queries. The operations Insert(p, c) and Delete(p, c) respectively add and remove the color c from the node pointed to by pointer p (the tree does not change topology under these dynamic operations). The Find(p, c) query returns the nearest ancestor, if any, of the node pointed to by p (possibly that node itself) which has the color c, 1 ≤ c ≤ C. If no such ancestor exists, Find(p, c) returns Null. We call this the dynamic colored-ancestors problem. (If update operations are not allowed, we have the static colored-ancestors problem.)

The dynamic colored-ancestors problem is an abstraction of a problem in compilers for object oriented languages (OOLs) like Smalltalk and Objective C. There this problem is called the dynamic method look-up problem (described below in context). Since nearly every statement of such programs may rely on answering a query in the dynamic method look-up problem, fast dynamic method look-up is a major issue in implementing these OOLs.

In this paper, we present highly efficient algorithms for the dynamic method look-up problem that show a range of deterministic as well as randomized trade-offs among the crucial practical parameters for this problem, namely, update time, query time and space utilization. Our results are the first nontrivial theoretical results for this fundamental problem. In practice, only ad hoc techniques have been employed thus far in implementing dynamic method look-up; we expect our theoretical solutions to help improve performance in practice as well, due to their extreme simplicity.
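As a concrete point of reference, the operations just defined admit a trivial implementation that stores each node's color set and walks toward the root on a query. This is a minimal sketch (class and variable names are ours, not from the paper); it uses Θ(n + D) space, O(1) updates, and O(n) worst-case query time:

```python
class ColoredTree:
    """Naive dynamic colored-ancestors structure, for reference only."""

    def __init__(self, parent):
        # parent[u] is the parent of node u; parent[root] is None
        self.parent = parent
        self.colors = [set() for _ in parent]  # colors[u] = colors held at u

    def insert(self, p, c):
        self.colors[p].add(c)

    def delete(self, p, c):
        self.colors[p].discard(c)

    def find(self, p, c):
        # nearest ancestor of p (p included) holding color c, else None
        u = p
        while u is not None:
            if c in self.colors[u]:
                return u
            u = self.parent[u]
        return None

# Tiny hierarchy: the path 0 -> 1 -> 2, with 0 as the root
t = ColoredTree([None, 0, 1])
t.insert(0, 7)
t.insert(1, 3)
assert t.find(2, 3) == 1   # color 3 first appears at ancestor 1
assert t.find(2, 7) == 0   # inherited from the root
assert t.find(2, 5) is None
t.delete(1, 3)
assert t.find(2, 3) is None
```

The data structures developed in this paper improve on the O(n) query time of this naive scheme while keeping updates fast.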
Our results are achieved by establishing trade-offs for the problem of maintaining nested intervals dynamically, which may be of independent interest.

* The first author was partially supported by MURST of Italy. The second author was partially supported by ESPRIT LTR Project no. 20244 - ALCOM-IT.

Background and Relevance. Object-oriented languages (OOLs) are popular in software development (see [5] on OOLs). The modular units are abstract data types called classes, comprising data and functions (or selectors in the OOL parlance); each selector has possibly multiple implementations (or methods), each in a different class. When a selector s is invoked in a class c, the relevant method for s inherited by c has to be determined. That is the fundamental problem of method-lookup.

OOLs support single or multiple inheritance depending, respectively, on whether a class can inherit from only one superclass or from several superclasses. Because of its conceptual simplicity, single inheritance is a popular choice, which we have adopted here. Abstractly, the hierarchy of classes with single inheritance is represented as a rooted tree in which each node is a class. Function invocations in OOLs are replaced by message expressions, each of which is a receiver-selector pair ⟨r, s⟩. Here, r is an instance of some class and s is some selector. Say the class of r is p. Then resolving this message expression invokes the method implementing s in p, or that inherited by p. Formally, a method for some selector s that is inherited by a class p is that in the nearest ancestor, if any, of p in the hierarchy tree which has an implementation for s. When abstracted, that gives the colored-ancestors problem above, where each node is a class, each distinct color is a selector, and each instance of a particular color c is a method. From now on we refer to the method-lookup and colored-ancestors problems interchangeably.

Since nearly every statement in a purely OOL may involve resolving a message expression (including assignments, integer addition, etc.), implementations spend considerable time on method-lookup (the time spent on this varies depending on applications, languages, etc. Some figures are: 23% in the SOAR system [19], 26.9% in C++ programs [17], over 50% in [20]).
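The correspondence between method-lookup and colored ancestors can be made concrete with a toy single-inheritance hierarchy; the class and selector names below are invented for illustration:

```python
# Single-inheritance hierarchy: each class names its superclass (None = root).
hierarchy = {"Object": None, "Collection": "Object", "Array": "Collection"}
# (cls, selector) is present iff cls has its own implementation of selector.
methods = {("Object", "print"), ("Collection", "size")}

def lookup(cls, selector):
    # Find(p, c): nearest ancestor (cls included) implementing selector.
    while cls is not None:
        if (cls, selector) in methods:
            return cls
        cls = hierarchy[cls]
    return None  # "message not understood"

assert lookup("Array", "size") == "Collection"   # inherited implementation
assert lookup("Array", "print") == "Object"
methods.discard(("Collection", "size"))          # Delete(p, c)
assert lookup("Array", "size") is None
```

Here classes play the role of tree nodes, selectors that of colors, and each implemented method is one instance of a color at a node.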
We study a dynamic version of the method-lookup problem in which the hierarchy tree is fixed but classes can be frequently modified by the addition or removal of methods.³ This dynamic formulation can be easily reduced to the dynamic colored-ancestors problem: adding/removing a method in a class is inserting/deleting a color in a node of the (hierarchy) tree; resolving a message expression ⟨r, s⟩ is answering Find(p, c), where p is the (class) node of r and c is the color associated with s.

There are two chief scenarios where this dynamic method-lookup problem arises. First, consider the compiling phase. There the compiler starts with a "base" hierarchy tree provided by the language, and the programs typically change the classes in this hierarchy by adding/removing methods.⁴ So resolving a message expression ⟨r, s⟩, even for statically-typed OOLs, where the class of

³ It is quite reasonable to consider also the version of the colored-ancestors problem where the tree hierarchy itself is modified by adding/deleting classes/nodes, although such changes are relatively rare. Here we do not explicitly consider that problem; however, our solutions extend to that version, giving the same trade-offs when O(log n) arbitrary changes are made on the tree.
⁴ Programs might add new functions, or completely delete all methods associated with a function. These cases are trivially handled by our data structures. So, we focus here only on the case when methods are added/deleted.

r is known at compile-time, requires solving the dynamic colored-ancestors problem. Second, consider any software development environment (see Chapter 9 in [4]) or a continuously running application [12]; here, developers and users modify the classes on-line in response to changing demands. There too, method-lookup involves maintaining the hierarchy under insertion or deletion of methods so as to answer lookup queries efficiently.

Whatever the scenario, the dynamic method-lookup problem can easily be solved using the static colored-ancestors problem, provided the hierarchy is recompiled for each update operation. This is unacceptable, since the hierarchies are huge and recompilation is prohibitively time-consuming (recompilation can be done "while having lunch", or for large inheritance structures "overnight"; see page 57 in [17]). It is preferable to have a method-lookup procedure that incrementally adjusts to the methods being added/removed; this is most appropriately addressed within the paradigm of dynamic algorithms, as we do here.

Status and Our Results. As a benchmark, we first review the bounds for the

static colored-ancestors problem. Two straightforward algorithmic approaches to this problem are as follows:

A. For each possible (p, c) pair, we precompute the value returned for Find(p, c) and store that in a two-dimensional table indexed by p and c. Subsequently, Find(p, c) is answered by looking up this table.

B. Any Find(p, c) query is processed by traversing up the tree from p, checking at each ancestor whether it contains c.

The best bounds obtainable by these approaches, as well as the known non-trivial bounds for this problem, are as follows:⁵

Table 1
  Algorithm   Type           Space      Query Time
  A           Deterministic  O(nC)      O(1)
  B           Randomized     Θ(n + D)   O(n)
  [14]        Randomized     Θ(n + D)   O(log log n)
  [14]        Deterministic  Θ(n + D)   O(√log n)

For the dynamic colored-ancestors problem, the known bounds are only those obtained by extending the approaches for the static version.⁶ We summarize these below.

Table 2
  Algorithm   Type           Space      Update Time          Query Time
  A           Deterministic  O(nC)      O(n)                 O(1)
  B           Randomized     Θ(n + D)   O(1)                 O(n)
  [14]        Deterministic  Θ(n + D)   O(log n)             O(log n)
  [1]         Deterministic  Θ(n + D)   O(log n/log log n)   O(log n/log log n)

⁵ A number of details have been left out in citing these results. For instance, in [14], the preprocessing algorithm is randomized and the queries are processed deterministically. Also, they have varying preprocessing times.
⁶ In [1], they solve the problem of maintaining a set of well-balanced parentheses dynamically; that can be used to solve the dynamic colored-ancestors problem.
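Approach A can be sketched as follows: one top-down pass fills the O(nC) table so that queries take O(1) time, and the O(n) update cost arises because an update at a node must refill its entire subtree. A minimal sketch (function names are ours):

```python
from collections import deque

def build_table(parent, colors, C):
    # table[p][c] = answer to Find(p, c); O(nC) entries, O(1) query thereafter.
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for u, par in enumerate(parent):
        if par is None:
            root = u
        else:
            children[par].append(u)
    table = [[None] * C for _ in range(n)]
    queue = deque([root])          # BFS: a parent is filled before its children
    while queue:
        u = queue.popleft()
        inherited = table[parent[u]] if parent[u] is not None else [None] * C
        for c in range(C):
            table[u][c] = u if c in colors[u] else inherited[c]
        queue.extend(children[u])
    return table

# Path 0 -> 1 -> 2; node 0 holds color 0, node 1 holds color 1.
table = build_table([None, 0, 1], [{0}, {1}, set()], C=2)
assert table[2][1] == 1 and table[2][0] == 0
# An update at node u must refill the whole subtree of u: O(n) time per update.
```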

In this paper, we design new data structures for this problem and exhibit a range of trade-offs among update times, query times and space utilization. Here we list some selected trade-offs (ε < 1 is any positive constant).

Table 3
  Algorithm   Type   Space       Update time       Query time
  Section 4   Det    O(nC + D)   o(n^ε)            O(log log n)
  Section 4   Det    O(nC + D)   O(log^{1+ε} n)    O(log n/log log n)
  Section 5   Det    Θ(n + D)    o(n^ε)            O(√log n)
  Section 4   Rand   Θ(n + D)    o(n^ε)            O(log log n)
  Section 4   Rand   Θ(n + D)    polyloglog(n)     O(log n/log log log n)

In fact, we prove general theorems, namely, Theorems 7 and 8, in terms of a parameter which is set judiciously to get the trade-offs above (actually we obtain slightly stronger bounds for these cases). The last three lines deal with results that use optimal space. Since in practice O(nC) space turns out to be prohibitively large⁷, there is a need to optimize space usage; hence, we focus on optimal-space algorithms. Our noteworthy technical achievements are the trade-offs in lines 3 and 4 above, where we achieve the same query time as for the static version (with and without randomization, respectively) with o(n^ε) update time and optimal space.

We briefly comment on the potential relevance of our results in practice. Since method look-up is of fundamental importance, a number of practical solutions have been proposed and evaluated. (Over half a dozen papers have appeared in recent premier conferences on OOLs, e.g., OOPSLA and ECOOP. See [7, 13, 17] for a survey of these, together with existing practical approaches and performance comparisons.) Their approach to dynamic method-lookup is essentially to solve the static method-lookup problem for the large "base" hierarchy as efficiently as possible and then handle updates separately in some ad hoc manner, so that periodically the relevant portions of the hierarchy may be recompiled [7, 17].
While these methods are shown to give extremely good performance for the static problem, convincing evidence is missing on their performance in dynamic situations. Indeed, their approach to solving the static version seems to additionally complicate the dynamic case. For instance, in practice method A is used and the table is stored in a compact form so as to use a realistic amount of space. However, this compact form is difficult to update: a sophisticated algorithm has to be invoked to recompact the table of Θ(nC) entries!⁸ We believe a careful experimental study should be performed to evaluate the various approaches to the dynamic method-lookup problem in practice. We expect our theoretical approach to be attractive in practice as well.

⁷ For common hierarchies such as NeXTStep 3.2 (for Objective-C) and Objectworks 4.1 (for Smalltalk), O(nC) space is 8 and 16 Megabytes respectively! [14]
⁸ Minimal compaction is NP-hard. Various heuristics are used in practice. See [7, 17] for more details.

Technical Overview. Consider the following geometric stabbing problem. We are given a set of nested intervals with distinct integer endpoints from a universe 1...U. This set must be maintained under update operations: insert (delete), which adds (removes) an interval in this set. A Stab(i) query returns the smallest interval in the current set that contains the integer i (which is well-defined since the intervals are nested).

The dynamic colored-ancestors problem can easily be reduced to the stabbing problem above with U = 2n (recall that the hierarchy tree is fixed). Consider the Euler tour of the rooted (hierarchy) tree. We associate the interval [f, l] with each node which occurs in the Euler tour first at the f-th position and last at the l-th position. Note that 1 ≤ f < l ≤ 2n. For each color c, 1 ≤ c ≤ C, S_c is defined to be the set of all intervals associated with the tree nodes that have color c. Clearly, the intervals in the set S_c are nested and their endpoints are distinct. Inserting or deleting a color c in a node u corresponds to inserting or deleting the interval of u in S_c. To answer a Find(p, c) query, it suffices to determine the smallest interval in S_c that contains the left (or, equivalently, the right) endpoint of the interval associated with the node pointed to by p; that is, it suffices to answer Stab(i) in S_c, where i is the left (or right) endpoint of the interval at the node pointed to by p.

The stabbing problem can be solved using a number of well-known data structures such as the interval tree [8], the segment tree [2], the interval trie [15], etc. (see [3] for an excellent survey). However, these data structures were designed for handling the more general situation in which the intervals are not necessarily nested and may share endpoints. Thus they give weak bounds (that is, only comparable to the bounds in Table 2) for our constrained problem. Our bounds are achieved by adapting the interval trie structure due to Overmars [15].
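The reduction just described is easy to implement: an Euler tour assigns each node u an interval [f, l], and Find(p, c) becomes a smallest-enclosing-interval query over the intervals of the c-colored nodes. In this sketch the final stabbing step is done by a naive linear scan, which is precisely the step that the data structures of this paper replace:

```python
def euler_intervals(children, root):
    # interval[u] = [f, l]: first and last positions of u in the Euler tour
    # (1-based); over n nodes the largest position is 2n.
    interval, clock = {}, 0

    def tour(u):
        nonlocal clock
        clock += 1
        interval[u] = [clock, clock]
        for v in children.get(u, []):
            tour(v)
        clock += 1
        interval[u][1] = clock

    tour(root)
    return interval

def find(p, c, interval, colored):
    # Stab query: smallest interval of a c-colored node containing f_p.
    i = interval[p][0]
    best = None
    for u in colored.get(c, ()):
        f, l = interval[u]
        if f <= i <= l and (best is None
                            or l - f < interval[best][1] - interval[best][0]):
            best = u
    return best

children = {0: [1, 3], 1: [2]}
iv = euler_intervals(children, 0)       # e.g. node 0 gets [1, 8]
colored = {7: {0, 1}}                   # nodes 0 and 1 carry color 7
assert find(2, 7, iv, colored) == 1     # nearest 7-colored ancestor of 2
assert find(3, 7, iv, colored) == 0
```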
At a high level, our problem divides into subproblems in much the same way as the general stabbing problem in [15]. However, because of the special properties of our problem, the subproblems can be represented by an appropriate "sample" of intervals. We develop a bucketing strategy that deterministically maintains these sample intervals to represent collections of nested intervals in the subproblems; this allows us to design a space-optimal solution.

2 Preliminaries

Since we are working on a fixed and small universe U = 2n, we can take advantage of some known results that exploit the power of the RAM model. We start by recalling two data structures that allow all the retrieval and update operations commonly associated with binary trees to be implemented in sublogarithmic time. The query operations we are interested in are: Succ(k), which retrieves the smallest element in a set with value greater than k; and Pred(k), which retrieves the greatest element in a set with value smaller than k. The following trade-offs between space/randomization and update times can be achieved.

Theorem 1. Given N distinct integers in the universe 1...U,

1. The q-fast trie uses O(N) space and performs insertions, deletions, Succ and Pred operations in O(√log U) time deterministically [18].
2. The Van Emde Boas data structure uses O(U) space and performs insertions, deletions, Succ and Pred operations in O(log log U) time deterministically. With randomization, the space becomes O(N) with no change to the other bounds [16].
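The operations of Theorem 1 are the usual ordered-dictionary operations. Their semantics can be pinned down with a simple sorted-list stand-in (this stand-in is a semantic reference only, not the q-fast trie or the Van Emde Boas structure; its bounds are much weaker):

```python
import bisect

class OrderedSet:
    """Reference semantics for Insert, Delete, Succ and Pred of Theorem 1."""

    def __init__(self):
        self.keys = []                  # kept sorted

    def insert(self, k):
        i = bisect.bisect_left(self.keys, k)
        if i == len(self.keys) or self.keys[i] != k:
            self.keys.insert(i, k)

    def delete(self, k):
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            self.keys.pop(i)

    def succ(self, k):
        # smallest element strictly greater than k, else None
        i = bisect.bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def pred(self, k):
        # greatest element strictly smaller than k, else None
        i = bisect.bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None

s = OrderedSet()
for x in (3, 9, 5):
    s.insert(x)
assert s.succ(5) == 9 and s.pred(5) == 3
s.delete(9)
assert s.succ(5) is None
```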

The stabbing problem we consider can be equivalently formalized as the dynamic parenthesis matching problem [10]. We use the following simple result on this problem. (The best known bounds for this problem are in [1]; they do not help us obtain improved trade-offs, so we omit them here.)

Theorem 2. [10] The parentheses tree data structure solves the dynamic parentheses problem using O(N) space in O(log N) time per operation.

Our deterministic and randomized solutions take as their starting point an elegant and simple solution of Overmars [15] to the general stabbing query problem, that is, one in which Stab(p) returns the list of all the intervals that contain p in a set of general (not necessarily nested) intervals. In the theorem below, the parameter F(U) can be chosen appropriately to get different trade-offs.
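The equivalence with parenthesis matching is direct: write an opening parenthesis at each left endpoint and a closing one at each right endpoint; Stab(p) then asks for the innermost parenthesis still open at position p. A static stack-based sketch of this view (Theorem 2's parentheses tree is what makes it dynamic in O(log N) time per operation):

```python
def stab_by_parens(intervals, p):
    # intervals: nested, with distinct integer endpoints. Returns the
    # smallest interval containing p, or None. O(N) static scan, for
    # exposition only: the innermost '(' still open at p is the answer.
    events = sorted([(f, '(', (f, l)) for f, l in intervals] +
                    [(l, ')', (f, l)) for f, l in intervals])
    stack = []
    for pos, kind, iv in events:
        if pos > p or (pos == p and kind == ')'):
            break      # an interval closing exactly at p still contains p
        if kind == '(':
            stack.append(iv)
        else:
            stack.pop()
    return stack[-1] if stack else None

nested = [(1, 10), (2, 5), (3, 4)]
assert stab_by_parens(nested, 3) == (3, 4)
assert stab_by_parens(nested, 6) == (1, 10)
assert stab_by_parens(nested, 11) is None
```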

Theorem 3. [15] Let V be a set of N intervals in the universe 1...U. The interval trie data structure performs insertions and deletions of intervals in V taking O(F(U) + log log U) time. Stab(p) can be answered in O(log U/log F(U) + R) time, where R is the number of reported intervals. The total space required is O(U · log U/log F(U) + N · F(U)). Using randomization, the space becomes O(N · F(U)).

3 A general improvement

In this section, we describe a general reduction that converts any data structure for our stabbing problem into another that uses less space (modulo linear space) at the cost of a small slowdown in updates and queries. We use the property that the intervals are nested and that their endpoints are distinct.

Theorem 4. Let V be a set of N nested intervals having distinct endpoints in the universe 1...U. Let D(U, N) be a data structure that uses sp(U, N) space, performs insertion and deletion of intervals in id(U, N) time, and answers a Stab(p) query in sq(U, N) time. For any positive integer parameter K, we can design a data structure D'(U, N, K) that uses O(N + sp(U, N/K)) space, performs insertion and deletion of intervals in O(id(U, N/K) + √log U + log K) time, and answers a Stab(p) query in O(sq(U, N/K) + √log U + log K) time.

Proof. (Sketch) Let EV be the ordered sequence of endpoints of the intervals in V. We partition EV into subsets EV_1, EV_2, ... of size K each, where EV_i contains all the endpoints in EV between the ((i-1)K+1)-th and the iK-th one (for i = 1, ..., N/K). Let V_i be the set of intervals which start or end in EV_i; V_i has size O(K). The set V_i can be partitioned into three subsets, called RR_i, P_i, and LL_i. Set RR_i (resp. LL_i) contains all the intervals in V_i which have only their right (resp. left) endpoint in EV_i. Set P_i contains all the intervals in V_i both of whose endpoints belong to EV_i. Since the intervals in V are nested, the intervals in LL_i do not intersect the ones belonging to RR_i; in fact, the former lie entirely to the right of the latter. Furthermore, all the intervals in LL_i (resp. RR_i) enclose the largest (resp. smallest) endpoint in EV_i.

We build a separate parentheses tree on each set V_i (Theorem 2), and a q-fast trie (Theorem 1) on the set formed by the smallest and largest endpoints of all the EV_i's. Both use O(N) space in total. The latter allows the retrieval of the set EV_i which contains a given endpoint in O(√log U) time. Furthermore, we build the data structure D(U, N/K), described in the theorem, on a sample subset of the intervals in V of size O(N/K). This sample comprises four intervals from each set V_i: the smallest and largest intervals in RR_i, and the smallest and largest intervals in LL_i. In a sense, the sample of these four intervals contains all the necessary information about the intervals in LL_i and RR_i. Note that the samples from RR_i are nested, as are the samples from LL_i. The whole set of data structures gives D'(U, N, K).

Insertions and deletions on D' can be done paying attention to the fact that we need to maintain appropriate samples for each V_i. Maintaining each set V_i of size O(K) under the update operations presents some difficulties. This requires split and merge operations on a set when it becomes too large or too small. The following can be accomplished.
The sets EV_i and EV_j, with i ≤ j, that should contain the endpoints of the inserted (deleted) interval can be found in O(√log U) time. The cost of updating the parentheses trees built on V_i and V_j is O(log K). Splitting (and merging) the sets EV_i and EV_j takes amortized time O(log K). The changes, if any, needed in D(U, N/K) for the sampled intervals from V_i and V_j (there are at most eight such intervals) take O(id(U, N/K)) time in all. Finally, all amortized bounds can be converted to worst-case ones by performing "lazy" split and merge operations (details in the full paper; see also [9]).

Processing a Stab(p) query is more involved. If p lies between two consecutive EV sets, then Stab(p) returns the smallest interval (if any) containing p in D(U, N/K), which can be retrieved in O(sq(U, N/K)) time. For the rest of this proof, assume p lies within some set EV_i; this set can be found in O(√log U) time using the q-fast trie. If an interval in V_i contains p (this interval is found by using the parentheses tree built on V_i), then we can immediately conclude that this interval is the answer to the Stab(p) query, because of the nesting property. Otherwise, we have to cope with the more general situation in which a long interval, spanning more than one EV set, contains p. By definition, this interval belongs to some LL_j (or, equivalently, it belongs to some RR_{j+h}, with h > 0),

and thus we can use the data structure D(U, N/K). First, we answer Stab(p) in the sampled set; this requires O(sq(U, N/K)) time. Second, let EV_j and EV_{j'} be the two sets containing the endpoints of the retrieved interval. We answer two Stab(p) queries in the sets V_j and V_{j'}, and return the smaller of the two retrieved intervals. We claim that this interval is the answer to the original Stab(p) query. Observe that the last two Stab(p) queries on the sets EV_j and EV_{j'} are necessary because of our sampling from V. □
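The bucketing and sampling in the proof above can be illustrated statically: endpoints are grouped K at a time, each bucket classifies its straddling intervals into LL and RR, and only the extreme (smallest and largest) intervals of each class are handed to the inner structure D. A sketch with invented function names:

```python
def sample_buckets(intervals, K):
    # intervals: nested, with distinct integer endpoints.
    # Returns the intervals that the inner structure D would receive.
    endpoints = sorted(e for iv in intervals for e in iv)
    buckets = [set(endpoints[i:i + K]) for i in range(0, len(endpoints), K)]
    samples = []
    for EV in buckets:
        # LL: only the left endpoint falls in this bucket; RR: only the right.
        LL = [iv for iv in intervals if iv[0] in EV and iv[1] not in EV]
        RR = [iv for iv in intervals if iv[1] in EV and iv[0] not in EV]
        for group in (LL, RR):
            if group:   # D only sees the extreme intervals of each class
                samples.append(min(group, key=lambda iv: iv[1] - iv[0]))
                samples.append(max(group, key=lambda iv: iv[1] - iv[0]))
    return samples

# Three nested intervals; with K = 2 the outer two straddle bucket borders.
ivs = [(1, 12), (2, 11), (5, 6)]
picked = sample_buckets(ivs, 2)
assert (5, 6) not in picked          # both endpoints in one bucket: not sampled
assert (1, 12) in picked and (2, 11) in picked
```

Intervals falling entirely inside one bucket (the sets P_i) are handled locally by that bucket's parentheses tree and never reach D.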
If we use the randomized Van Emde Boas tree in place of the q-fast trie in the discussion above, we achieve the following result.

Corollary 5. A randomized data structure D''(U, N, K) can be designed that uses O(N + sp(U, N/K)) space, performs insertion and deletion of intervals in O(id(U, N/K) + log log U + log K) time, and answers a Stab(p) query in O(sq(U, N/K) + log log U + log K) time.

4 First solution: Very fast query

We modify the interval trie data structure of Theorem 3 for our problem. (If we used the interval trie without modifications, a Stab(p) query would take O(n) time in the worst case.) Let F(U) be a positive integer. Consider V as a set of N intervals in the universe 1...U which are nested and whose endpoints are distinct. We split the set V into subsets V_1, V_2, ..., where V_i contains all the intervals of length between F(U)^{i-1} and F(U)^i. We treat each V_i separately.

For each V_i, we divide the universe into equal parts of length F(U)^{i-1}. An interval has its left endpoint in one part, its right endpoint in another part, and overlaps at most F(U) parts. For each part in V_i we have three lists: L (resp. R), containing all the left (resp. right) endpoints of the intervals in V_i starting (resp. ending) in this part; and LR, containing all the right endpoints of the intervals in V_i starting in this part (i.e., having their left endpoint in L). See Figure 1 for an illustrative example. We store each list LR in a q-fast trie, thus using O(N) total space (Theorem 1). Then, let L_i (resp. R_i) be the set of endpoints contained in all the lists L (resp. R) of the parts in V_i. We build a Van Emde Boas tree on L_i (resp. R_i), denoted BL_i (resp. BR_i). Therefore, we have two Van Emde Boas trees per set V_i, thus using O(U · log U/log F(U)) space in total. Clearly, the leftmost endpoint of each list L (and the rightmost endpoint of each list R) can easily be maintained under the update operations by using BL_i (resp. BR_i) and by exploiting the total ordering among the endpoints in L_i (resp. R_i); hence their retrieval takes O(1) time.

Note that the intervals in V_i overlapping a given part are nested. Therefore, for each part we maintain the smallest interval in V_i that covers this part; this interval is denoted S. This removes the additive O(N · F(U)) term from the space complexity of the interval trie (Theorem 3). The resulting data structure

[Figure 1: An example of an interval in V_i, and a part of length F(U)^{i-1} with its lists L, R, LR and its smallest covering interval S.]

requires overall O(U · log U/log F(U) + N) space.⁹

⁹ In the classical structure [15], the list of all the intervals overlapping a given part is maintained. That simplifies update operations but slows down queries significantly. In our solution here, we maintain S, which speeds up the query time and reduces the space used; however, our approach faces some difficulties in the design of the update operations. We overcome these difficulties by maintaining the list LR and by exploiting the special properties of our stabbing problem.

Processing Stab(p) is a simple variation of the procedure for the interval trie. We scan the sets V_1, V_2, ... in order. We are done as soon as we find the first V_h that has an interval overlapping the query point, since for j > h the set V_j contains only intervals larger than the one retrieved in V_h. To retrieve this set V_h, we proceed as follows. For each examined set V_i, we determine the part that contains the query point in O(1) time by arithmetic operations. We check its lists L and R and its interval S, if any, in that order. Note that only one of the lists L or R may contain an interval overlapping the query point, because of the nesting condition. L and R are processed similarly, and here we describe the processing of L only. If the smallest endpoint in L is to the left of p (this can be checked in O(1) time as indicated above), then there exists an interval in L overlapping p. We retrieve Pred(p) in L_i, which is exactly the left endpoint of the smallest interval containing p, because of the nesting condition and the definition of the list L_i. We can perform this step using the Van Emde Boas tree BL_i associated with the currently examined set V_i. If neither L nor R contains an interval covering p, then we look at the smallest interval S. If S is well-defined, that is, if there exists an interval in V_i covering this part, then we are done, because S is the smallest interval containing p. Otherwise (the part under consideration is not covered), we proceed to the next set V_{i+1}. Therefore, processing each set V_i takes O(1) time if the desired interval is not found; otherwise, it takes O(log log U) time to retrieve the desired interval from L_i, R_i or S. Processing Stab(p) therefore takes O(log U/log F(U) + log log U) time.

This data structure can be maintained under update operations, provided that care is taken in maintaining the intervals S. We can show that the insertion of a new interval in V takes O(F(U) + √log U) time, and the deletion of an interval from V takes O(F(U) · √log U) time. Due to lack of space, we defer the details to the full paper. There we also show how to reduce the space required by a Van Emde Boas tree by a poly-logarithmic factor in the deterministic case, while still preserving the same query and update bounds. Although this is only a small improvement, we use it for implementing the Van Emde Boas trees BL_i and BR_i of each set V_i. This way, we achieve an improved deterministic solution that reduces the space usage from O(U · log U/log F(U)) to O(U + N). We therefore conclude:

Theorem 6. Let V be a set of N nested intervals having distinct endpoints in the universe 1...U. There exists a deterministic data structure that uses O(U + N) space, answers Stab(p) queries in O(log U/log F(U) + log log U) time, supports the insertion of a new interval in O(F(U) + √log U) time, and supports the deletion of an interval in O(F(U) · √log U) time.

For the randomized result, we use the randomized version of the Van Emde Boas tree with dynamic perfect hashing [6]. We therefore have all the ingredients to state our first result on the dynamic colored-ancestors problem.
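The level-by-level scan underlying the query bound of Theorem 6 can be sketched naively, with the per-part L/R/S tests replaced by direct per-part interval lists (0-based level indexing and function names are ours; the actual structure performs each per-part test in O(1) time, falling back to the Van Emde Boas trees only at the final level):

```python
def build_levels(intervals, F, max_level):
    # levels[i] indexes the intervals of length in [F**i, F**(i+1))
    # by every length-F**i part of the universe they overlap (naive).
    levels = [dict() for _ in range(max_level)]
    for f, l in intervals:
        i = 0
        while (l - f) >= F ** (i + 1):
            i += 1
        w = F ** i
        for idx in range((f - 1) // w, (l - 1) // w + 1):
            levels[i].setdefault(idx, []).append((f, l))
    return levels

def stab(p, levels, F):
    # Scan the level sets in order: the first level holding an interval
    # covering p yields the answer, since higher levels hold only longer
    # intervals and intervals overlapping one part are nested.
    for i, parts in enumerate(levels):
        w = F ** i
        hit = [iv for iv in parts.get((p - 1) // w, []) if iv[0] <= p <= iv[1]]
        if hit:
            return min(hit, key=lambda iv: iv[1] - iv[0])
    return None

lv = build_levels([(1, 10), (2, 5), (3, 4)], F=3, max_level=3)
assert stab(3, lv, 3) == (3, 4)
assert stab(6, lv, 3) == (1, 10)
```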

Theorem 7. There exists a deterministic data structure for solving the dynamic colored-ancestors problem using O(nC + D) space. Find(p, c) takes O(log n/log F(n) + log log n) worst-case time, Insert(p, c) takes O(F(n) + √log n) worst-case time, and Delete(p, c) takes O(F(n) · √log n) worst-case time with this data structure. There exists a randomized data structure for solving the dynamic colored-ancestors problem using Θ(n + D) space. Find(p, c) takes O(log n/log F(n) + log log n) time, Insert(p, c) takes O(F(n) + log log n) time, and Delete(p, c) takes O(F(n) · log log n) time with this data structure.

Proof. For each color c, 1 ≤ c ≤ C, consider the set of N_c intervals induced by the nodes with color c (note that Σ_c N_c = D; see Section 1). We use Theorem 6 with N = N_c and U = 2n on each set of intervals. We process a Find(p, c), Insert(p, c) or Delete(p, c) operation by first determining the interval associated with the node pointed to by p (in O(1) time), and then executing the appropriate operation on the data structure associated with color c. Summing up all these time complexities, we achieve the bounds stated in the theorem. □

The first two lines of Table 3 are obtained by setting F(n) = 2^{log n/log log n} and F(n) = log n, respectively, in Theorem 7. The last two lines of Table 3 are obtained by setting F(n) = 2^{log n/log log n} and F(n) = log log n, respectively.

5 Second Solution: Deterministic Optimal Space

The solution presented in the previous section requires O(n + N_c) space to store the N_c intervals corresponding to the nodes colored c, for each 1 ≤ c ≤ C; therefore, summed over all colors, this requires O(nC + D) space in all, where D = Σ_c N_c (see Section 1). Our goal here is to design a data structure for that problem which uses Θ(N_c) space for each color c, independent of the hierarchy size n. Then the entire data structure would occupy merely Θ(n + D) space when summed over all the colors, while still preserving efficient updates and queries.

Consider the data structure in Section 4 for the set V of intervals associated with a color c. There are two sources for the additive term O(U) = O(n) in the space complexity of Theorem 6. One is the Van Emde Boas trees that store the sets L_i and R_i. The other is the array that stores the parts in each set V_i so that they may be retrieved in O(1) time. We can easily implement the lists L_i and R_i using q-fast tries, thereby occupying O(N_c) space in total (Theorem 1). Unfortunately, no deterministic scheme exists for storing the parts forming each set V_i in optimal space while guaranteeing O(1) access time (see the survey in [11], and observe that our randomized approach in Section 4 used perfect hash functions). The best we can do is to use the (compressed trie) data structure for storing a set of n integers in O(nk) space, supporting Insert, Delete and Retrieve operations in O(log N/log k) time, where N is the size of the universe and k, an integer parameter, is the branching factor of the trie [11]. However, this slows down all the bounds by an O(log N/log k) factor.

In what follows, we adopt an alternative approach of simultaneously storing the parts of all the V_i's deterministically in a trie organization, which achieves optimal space without compromising performance.
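As a structural preview of the trie organization developed below, a node of the pruned trie can be sketched as carrying a child-pointer array of size F(U), its lists L, R and LR, and its smallest covering interval S; a node survives pruning only if it, or some descendant, carries information. Field and function names are ours:

```python
class TrieNode:
    __slots__ = ("children", "L", "R", "LR", "S")

    def __init__(self, F):
        self.children = [None] * F   # size-F(U) pointer array: O(1) child access
        self.L, self.R, self.LR = [], [], []
        self.S = None                # smallest covering interval, if any

    def is_empty(self):
        return (not self.L and not self.R and not self.LR and self.S is None
                and all(c is None for c in self.children))

def prune(node):
    # Delete empty descendants bottom-up, as in the construction of T from T'.
    if node is None:
        return None
    node.children = [prune(c) for c in node.children]
    return None if node.is_empty() else node

root = TrieNode(2)
root.children[0] = TrieNode(2)           # empty child: pruned away
root.children[1] = TrieNode(2)
root.children[1].S = (3, 8)              # carries a smallest covering interval
root = prune(root)
assert root.children[0] is None and root.children[1].S == (3, 8)
```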
Our overall approach is first to build a trie T, for each color c, having branching factor F(U) (and therefore height H = O(log U/log F(U))) and occupying space O(N_c · F(U) · H); subsequently, we reduce the space to Θ(N_c) by applying Theorem 4.

Recall the partition of the set V of intervals into subsets V_1, V_2, ..., as defined in Section 4. As described there, for each set V_i we partition the universe into parts of size F(U)^{i-1}, and for each part we have the lists L, R and LR, and S, the smallest interval in V_i overlapping this part (hereafter called the smallest covering interval). The lists L, R and LR are each stored in a q-fast trie. Before describing T, we define a complete balanced tree T', of branching factor F(U), which is a natural representation of the partition V_1, V_2, ... in the form of a tree (T' has height H). The root of T' denotes the entire universe 1...U. Level i represents the partition of the universe into disjoint parts of size F(U)^{i-1} (leaves are at level 1). This way, a node of T' at level i denotes a part in V_i of

size F (U )i?1. Each such node thus has three lists L, R and LR and its smallest covering interval S , all of them de ned on set Vi and thus containing only intervals of length between F (U )i?1 and F (U )i. Notice that, the part denoted by a node w is divided into F (U ) parts of equal size and each of them is associated with a child of w. Therefore, a node w at level i + 1 is the parent of a node v at level i if the part of length F (U )i?1 denoted by v is enclosed in the part (of length F (U )i) denoted by w. Clearly all the parts containing a given point lie on a downward path in T 0. The nodes on this path can be identi ed by starting from the root of T 0 and performing a downward traversal driven by appropriate arithmetic calculations. Hence, all the algorithms designed in Section 4 can be easily extended to work on this complete tree organization (rather than on the array), by observing that each operation in Section 4 a ects (at most) F (U ) parts in a set Vi , and these parts now correspond to (at most) F (U ) sibling nodes in T 0 which are the children of at most two nodes at level i + 1. The drawback of this approach is that T 0 still requires O(n + Nc ) space. We reduce the space usage by pruning T 0 and thus obtain the trie T : If all associated lists (i.e., L, R, LR, and the smallest covering interval S ) of a node v are empty and all its descendant nodes have been deleted, then we delete v. We associate an array of size F (U ) with each internal node v; this array stores the (possibly NULL) pointers to the children of v in T . This ensures that any speci c child of node v can be retrieved in O(1) time. We claim that T uses O(H Nc F (U )) space. We account for that space by considering each interval in V . Namely, any interval in Vi can determine the existence in the trie T of at most F (U ) nodes at level i. Clearly at most two nodes at level i + 1 remain undeleted because of the presence of these F (U ) sibling nodes at level i. 
Consider now the path of length ≤ H leading to each of those nodes. Each node on that path is also not deleted and uses at most O(F(U)) space (to store the array of pointers to its children). Thus, at most Nc · 2 · H · O(F(U)) = O(H · Nc · F(U)) space is used for T in total. As in [18], we can safely assume that the memory for any internal node of T (i.e., an uninitialized array of size F(U)) can be allocated or deallocated in O(1) time. That completes the description of T; it remains to show how queries and updates are implemented on this pruned trie.

The Stab(p) query is answered as in Section 4. The problem now is that we cannot determine in O(1) time the part in Vi (for any i) containing the query point p, because the trie is pruned. However, as observed above, the structure of T ensures that all the parts (nodes) containing p lie on a downward path in T. Therefore, we traverse T downward, driving the traversal by the regular way in which the part associated with a node w is decomposed among its children. At each visited node (i.e., for each part), we proceed as in Section 4, taking into account that we move from the largest part to the smallest one (i.e., from the root of T toward a leaf). We omit the precise details here. We conclude that a Stab(p) query can be answered in O(H + √(log U)) time: the term O(H) comes from traversing T downward with p, and the term O(√(log U)) comes from manipulating the various lists stored as q-fast tries (Theorem 1).

Updating the pruned trie T under the insertion or the deletion of an interval is more involved, although still based on the operations of Section 4. Consider inserting (or deleting) an interval in Vi. The key issue is how to retrieve the parts in Vi affected by the update; note that the nodes associated with these parts might not exist in T (because their lists could be empty and their descendants deleted). Using downward traversals of T, we either retrieve the nodes of T associated with those affected parts in Vi or determine that these nodes do not exist in T. Since the F(U) parts in Vi affected by the update are the children of at most two (sibling) nodes at level i+1 in T, two downward traversals suffice to retrieve the (at most) F(U) affected parts in Vi. Once this is accomplished, the algorithm proceeds as in Section 4 (Theorem 7). This may require installing two paths of length ≤ H and at most F(U) new sibling nodes at level i (corresponding to the affected parts in Vi whose lists are no longer empty after the update). Summing up all the time bounds, we obtain that a new interval can be inserted in O(H + F(U) + √(log U)) time, and an interval can be deleted in O(H + F(U)·√(log U)) time. Finally, we apply Theorem 4 with K = O(H · F(U)), thus reducing the space to Θ(Nc), which is optimal. As a result, O(log F(U) + log H + √(log U)) time is added to the query and update bounds. Thus we conclude (setting F(n) = 2^{√(log n)} gives the third bound in Table 3):
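To make the traversal concrete, the following is a minimal Python sketch (with illustrative names and toy values of F and U; the q-fast tries of Theorem 1 are replaced by plain lists) of a pruned-trie node and of the arithmetic-driven downward walk that visits every surviving part containing a point p:

```python
F = 16            # branching factor F(U); illustrative toy value
H = 4             # height: parts at level i have size F**(i-1)
U = F ** H        # universe 1..U (here 65536)

def part_index(p, level):
    """Number of the size-F**(level-1) part containing point p."""
    return p // (F ** (level - 1))

class Node:
    """A node of the pruned trie T, denoting one part of the universe."""
    def __init__(self):
        self.children = [None] * F            # child array: O(1) child access
        self.L, self.R, self.LR = [], [], []  # stand-ins for the q-fast tries
        self.S = None                         # smallest covering interval

def descend(root, p):
    """Yield the surviving nodes of T on the downward path of parts
    that contain p, from the root (whole universe) toward a leaf."""
    node, level = root, H + 1                 # the root denotes all of 1..U
    while node is not None:
        yield level, node
        if level == 1:
            return
        level -= 1
        # the child holding p is its part number modulo F, i.e. its rank
        # among the F equal-size sub-parts of the current part
        node = node.children[part_index(p, level) % F]
```

A Stab(p) query walks this path from the largest part to the smallest, consulting each node's lists as in Section 4; an update uses the same walk (twice) to locate, or install, the at most F(U) affected sibling nodes.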

Theorem 8. There exists a data structure that deterministically solves the dynamic colored-ancestors problem using Θ(n + D) space. Find(p, c) takes O(log n / log F(n) + √(log n) + log F(n)) worst-case time. Insert(p, c) takes O(log n / log F(n) + F(n) + √(log n)) worst-case time. Delete(p, c) takes O(log n / log F(n) + F(n)·√(log n)) worst-case time.

Acknowledgments: We sincerely thank Dr. Ian Maung for discussions on OOLs.
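As a numeric sanity check of the tradeoff in Theorem 8, the short sketch below (with an illustrative n) evaluates the three terms of the Find bound under the balancing choice F(n) = 2^{√(log n)}, which equates the height term log n / log F(n) with the q-fast-trie term √(log n):

```python
import math

def find_bound_terms(n, Fn):
    """The three terms of the Find(p, c) bound of Theorem 8:
    trie height, q-fast-trie search, and child-array branching."""
    lg = math.log2
    return lg(n) / lg(Fn), math.sqrt(lg(n)), lg(Fn)

n = 2 ** 64
Fn = 2 ** math.sqrt(math.log2(n))   # F(n) = 2^{sqrt(log n)} = 2^8 = 256
height, qtrie, branch = find_bound_terms(n, Fn)
# all three terms equal sqrt(log n) = 8, so Find costs O(sqrt(log n))
```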

References

1. A. Amir, M. Farach, R. M. Idury, H. La Poutré, and A. A. Schäffer. Improved dynamic dictionary matching. Information and Computation, 119:258–282, 1995.
2. J. L. Bentley. Algorithms for Klee's rectangle problems. Dept. of Computer Science, Carnegie-Mellon Univ., unpublished notes, 1977.
3. Y. Chiang and R. Tamassia. Dynamic algorithms in computational geometry. Proc. IEEE, vol. 80, no. 9, 1992.
4. J. Coplien. Advanced C++ Programming Styles and Idioms. Addison-Wesley, 1992.
5. B. J. Cox and A. J. Novobilski. Object-Oriented Programming: An Evolutionary Approach. Addison-Wesley, Reading, MA, 1991.
6. M. Dietzfelbinger, A. Karlin, K. Mehlhorn, F. Meyer auf der Heide, H. Rohnert, and R. E. Tarjan. Dynamic perfect hashing: upper and lower bounds. In Proc. IEEE Symp. on Foundations of Computer Science, 524–531, 1988.
7. K. Driesen. Method lookup strategies in dynamically-typed object-oriented programming languages. Master's thesis, Vrije Universiteit Brussel, 1993.
8. H. Edelsbrunner. A new approach to rectangle intersections, Part I. Int. J. Computer Mathematics, vol. 13, 209–219, 1983.
9. P. Ferragina and R. Grossi. A fully-dynamic data structure for external substring search. In ACM Symp. on Theory of Computing, 693–702, 1995. Full version in Technical Report 18/96, Dipartimento di Sistemi e Informatica, Università di Firenze, Italy.
10. R. H. Güting and D. Wood. The parentheses tree. Information Sciences, vol. 27, 151–162, 1982.
11. K. Mehlhorn and A. Tsakalidis. Data Structures. In Handbook of Theoretical Computer Science. Ed. J. van Leeuwen. Elsevier Science Publishers, 1990.
12. B. Meyer. Object-Oriented Software Construction. Prentice-Hall Inc., Englewood Cliffs, NJ, 1988.
13. M. Müller. Method dispatch in dynamically-typed object-oriented languages. Master's thesis, University of New Mexico, Albuquerque, 1995.
14. S. Muthukrishnan and M. Müller. Time space tradeoffs for method look-up in object-oriented programs. Proc. 7th ACM-SIAM Symp. on Discrete Algorithms, 1996.
15. M. H. Overmars. Computational geometry on a grid: an overview. NATO ASI Series, vol. F40, 167–184, 1988.
16. P. van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6(3):80–82, 1977.
17. J. Vitek. Compact dispatch tables for dynamically-typed object-oriented languages. Research thesis, University of British Columbia, Vancouver, 1995.
18. D. E. Willard. New trie data structures which support very fast search operations. Journal of Computer and System Sciences, 28:379–394, 1984.
19. D. Ungar. The design and evaluation of a high performance Smalltalk system. ACM Distinguished Dissertation, The MIT Press, 1987.
20. D. Ungar, R. Blau, P. Foley, D. Samples, and D. Patterson. Architecture of SOAR: Smalltalk on a RISC. Proc. 11th Int. Symp. on Computer Architecture, 1984.

