Nested Intervals Tree Encoding with Continued Fractions
VADIM TROPASHKO
Oracle Corp.

We introduce a new variation of Tree Encoding with Nested Intervals, find connections with Materialized Path, and suggest a method for moving parts of the hierarchy.

Categories and Subject Descriptors: H.2 [Database Management]
General Terms: SQL, Hierarchical Query
Additional Key Words and Phrases: Tree Encoding, Nested Intervals, Continued Fractions, Materialized Path
1. INTRODUCTION

There are several methods to query graph structures in general, and trees in particular, in SQL. They can be grouped into two major categories:
• Hierarchical/recursive SQL extensions
• Tree Encodings
This article focuses on Tree Encodings. Until recently, not much was happening in that area, as only two methods were dominant: Nested Sets (Celko [2000], [2004]) and the nearly ubiquitous encoding of the path from the root to the node (see, for example, Roy [2003], where the path is enveloped into a user-defined type). We call such an encoding Materialized Path in order to emphasize the analogy with other incremental evaluation structures (and this term seems to be gaining slow adoption in the field). Indeed, the full path carries redundant information: if we know a node's path, then we can immediately tell what the path of its parent is. Like redundancies in the other incremental evaluation structures, Materialized Path enhances our query answering abilities. Unlike those, however, maintaining Materialized Path is a trivial task.

While Nested Sets certainly appealed to many users as an elegant technique (especially compared to goofy string parsing in the case of Materialized Path), it has two fundamental disadvantages:
• Nested Sets encoding is volatile. Roughly half of the tree nodes have to be relabeled whenever a new node is inserted.
• Querying ranges is asymmetric from a performance perspective. It is easy to answer whether a point falls inside some interval, but it is hard to index a set of intervals so as to find those that contain a given point. For Nested Sets this translates into difficulty answering queries about a node's ancestors.
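The volatility of Nested Sets can be illustrated with a minimal Python sketch. It assumes the usual integer (left, right) labeling; the function names `insert_last_child` and `sample_tree` and the five-node tree are hypothetical and do not come from the article.

```python
def insert_last_child(labels, parent, name):
    """Insert `name` as the last child of `parent` in a Nested Sets labeling.

    `labels` maps node -> (left, right). Returns how many existing nodes
    had at least one of their two bounds changed.
    """
    pivot = labels[parent][1]  # the new leaf takes over the parent's right bound
    relabeled = 0
    for node, (lft, rgt) in list(labels.items()):
        # Every bound at or beyond the pivot moves by 2 to open a gap.
        shifted = (lft + 2 if lft >= pivot else lft,
                   rgt + 2 if rgt >= pivot else rgt)
        if shifted != (lft, rgt):
            labels[node] = shifted
            relabeled += 1
    labels[name] = (pivot, pivot + 1)
    return relabeled


def sample_tree():
    # root has children a, b, d; c is a child of b (hypothetical example data)
    return {"root": (1, 10), "a": (2, 3), "b": (4, 7), "c": (5, 6), "d": (8, 9)}


early = insert_last_child(sample_tree(), "a", "a1")  # insertion near the left edge
late = insert_last_child(sample_tree(), "d", "d1")   # insertion near the right edge
print(early, late)  # 5 2
```

In this small tree an insertion near the left edge touches all five existing nodes, while one near the right edge touches two, so the cost depends on where the new node lands.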
Tropashko [2003a] introduced Nested Intervals, which generalize Nested Sets. Since Nested Sets encoding with integers allows only finite gaps for inserting new nodes, it is natural to use a dense domain such as the rational numbers. One particular encoding schema, with Binary Rational Numbers, was developed in the rest of that article and was the subject of further improvements in the follow-up articles. Binary Rational Encoding has many nice theoretical properties, and essentially is a numeric reflection of Materialized Path. It has, however, one significant flaw from a practical perspective: Binary Fractions use the integer domain rather uneconomically, so that overflow prevents the tree from scaling to any significant size.

In general, Nested Intervals allow a certain freedom in choosing a particular encoding schema. Tropashko [2004] developed an alternative encoding with Farey Fractions. It solved the scalability problems, but it remained unclear how this new encoding is related to Materialized Path. Furthermore, a predictable question from the developers' community was "How to relocate subtrees in this new schema?"

This article addresses both concerns. We'll have to modify Farey Encoding slightly, but the reward is that the connection with Materialized Path becomes transparent. The method for relocating subtrees is almost as simple as the one with Binary Fractions.

2. CONTINUED FRACTIONS

A Simple Continued Fraction is a list of integers structurally arranged like this:
\[
3 + \cfrac{1}{12 + \cfrac{1}{5 + \cfrac{1}{1 + \cfrac{1}{21}}}}
\]
where (3,12,5,1,21) in the general case is a sequence of arbitrary natural numbers. The continued fraction representation is ambiguous; for example,
\[
3 + \cfrac{1}{12 + \cfrac{1}{5 + \cfrac{1}{1 + \cfrac{1}{21}}}}
=
3 + \cfrac{1}{12 + \cfrac{1}{5 + \cfrac{1}{1 + \cfrac{1}{20 + \cfrac{1}{1}}}}}
\]
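The ambiguity can be checked with exact rational arithmetic. The following is a minimal Python sketch; the helper name `evaluate` is hypothetical, and the term lists (3,12,5,1,21) and (3,12,5,1,20,1) are the two expansions from the example.

```python
from fractions import Fraction


def evaluate(terms):
    """Evaluate the simple continued fraction a0 + 1/(a1 + 1/(... + 1/an)) exactly."""
    value = Fraction(terms[-1])
    # Fold from the innermost term outwards.
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


left = evaluate([3, 12, 5, 1, 21])
right = evaluate([3, 12, 5, 1, 20, 1])
print(left, right, left == right)  # 4913/1594 4913/1594 True
```

Both term lists evaluate to the same rational number, because the final term 21 can always be rewritten as 20 + 1/1.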
We, therefore, have to be careful when associating Continued Fraction with Materialized Path. The critical idea is the following “identity”
\[
3.12.5.1.20 = 3 + \cfrac{1}{12 + \cfrac{1}{5 + \cfrac{1}{1 + \cfrac{1}{20 + \cfrac{1}{x}}}}}
\]
where we allow x to range between 1 and ∞. Our primary motivation for changing a rational number into a rational function is the ability to nest such functions. For example, concatenating paths 3.12 and 5.1.20 corresponds to nesting
\[
x = 5 + \cfrac{1}{1 + \cfrac{1}{20 + \cfrac{1}{y}}}
\]
inside of
\[
3.12 = 3 + \cfrac{1}{12 + \cfrac{1}{x}}
\]
which is a simple substitution of x.

3. NESTED INTERVALS

For our application purposes it is important that continued fractions can be interpreted as nested intervals. Simplifying the right side of the path 3.12.5.1.20 we get
\[
3.12.5.1.20 = \frac{4688x + 225}{1521x + 73}
\]
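The simplification, and the substitution that corresponds to concatenating paths, can be reproduced mechanically. In the minimal Python sketch below, a path is represented by the four integer coefficients (a, b, c, d) of the rational function (a·x + b)/(c·x + d). This representation and the names `path_to_coefficients` and `substitute` are assumptions made for illustration, not notation from the article.

```python
def path_to_coefficients(path):
    """Return (a, b, c, d) such that the path equals (a*x + b) / (c*x + d)."""
    a, b, c, d = 1, 0, 0, 1  # the empty path is the identity function x
    for k in path:
        # Replace x by k + 1/x, which turns (a*x + b)/(c*x + d)
        # into ((a*k + b)*x + a) / ((c*k + d)*x + c).
        a, b, c, d = a * k + b, a, c * k + d, c
    return a, b, c, d


def substitute(outer, inner):
    """Substitute the function `inner` for x in the function `outer`."""
    a, b, c, d = outer
    e, f, g, h = inner
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)


full = path_to_coefficients([3, 12, 5, 1, 20])
print(full)  # (4688, 225, 1521, 73)

# Concatenating 3.12 and 5.1.20 is the substitution of one function into the other.
nested = substitute(path_to_coefficients([3, 12]), path_to_coefficients([5, 1, 20]))
print(nested == full)  # True
```

The substitution is the product of the two 2×2 coefficient matrices, which is why concatenating paths reduces to integer multiplications and additions.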
It is easy to see that the rational function (4688x + 225)/(1521x + 73) is monotonic in x, so it attains its extreme values at the ends of the segment x∈[1,∞). Therefore, the path 3.12.5.1.20 can alternatively be associated with the semi-open segment (4688/1521, 4913/1594]. Let's double-check that the latter is nested inside the interval corresponding to the path 3.12. Indeed,
[3+1/(12+1/1), 37/12) = [40/13, 37/12) and 40/13 < 4688/1521