Solving Schema Conversion Problem between XML and Relational Models: Semantic Approach Dongwon Lee
Murali Mani
Wesley W. Chu
Penn State University
UCLA
UCLA
[email protected]
[email protected]
[email protected]
June 13, 2003
Abstract Schema conversion problem aims to convert a source schema S in the given model M1 to an equivalent target schema T in the desired model M2 . In this paper, we especially study schema conversion problem between XML and relational models. We present three semantics-based schema conversion algorithms: 1) CPI converts an XML schema to a relational schema while preserving semantic constraints of the original XML schema, 2) NeT derives a nested structured XML schema from a flat relational schema by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical, and 3) CoT takes a relational schema as input, where multiple tables are interconnected through inclusion dependencies and generates an equivalent XML schema as output.
Index Terms: XML, Schema, Schema Conversion Problem, Semantic Constraints
1
Introduction
Recently, XML [13] has emerged as the de facto standard for data format on the web. The use of XML as the common format for representing, exchanging, storing, and accessing data poses many new challenges to database systems. Since the majority of everyday data is still stored and maintained in relational database systems, we expect that the needs to convert data format between XML and relational models will grow substantially. To this end, several schema conversion algorithms have been proposed 1
(e.g., [25, 31, 69, 15]). Although they work well for the given applications, the XML-to-Relational or Relational-to-XML conversion algorithms only capture the structure of the original schema and largely ignore the hidden semantic constraints. Consider the following example for XML-to-Relational conversion case. Example 1. Consider a DTD that models journal publications:
Suppose the combination of title and year uniquely identifies the journal (i.e., key). Using the hybrid inlining algorithm introduced in [69], the DTD would be transformed to the following relational schema: journal
(title,year,month,society)
paper (pid,title,journal_title,journal_year,abstract)
While the relational schema correctly captures the structural aspect of the DTD, it does not enforce correct semantics. For instance, it cannot prevent a tuple t1 : paper(100,’DTD...’,’TKDE’, 3000,’...’) from being inserted. However, tuple t1 is inconsistent with the semantics of the given DTD since the DTD implies that the journal paper cannot exist without being associated with a journal and there is apparently no journal TKDE published in the year 3000 yet. In database terms, this kind of violation can be easily prevented by an inclusion dependency enforcing “paper[journal title,journal year] 2
⊆ journal[title,year]”.
The reason for this inconsistency between the DTD and the transformed relational schema is that most of the proposed conversion algorithms, so far, have largely ignored the hidden semantic constraints of the original schema. Therefore, in this paper, we address the semantic aspect of schema conversion between XML and relational models. Our contributions are: • We present an extensive survey with respect to the schema matching problem among various data models and their proposed solutions. With our classification, one can appreciate the extent and difficulty of the problem. 2
NeT & CoT
RDB
Schema Designer
XML Schemas
CPI
Figure 1: Overview of our schema conversion algorithms. • We present three novel schema conversion algorithms between XML and relational models (illustrated in Figure 1): 1. CPI (Constraints-preserving Inlining Algorithm): identifies various semantics constraints in the original XML schema and preserves them by rewriting them in the final relational schema. 2. NeT (Nesting-based Translation Algorithm): derives a nested structure from a flat relational schema by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical. 3. CoT (Constraints-based Translation Algorithm): Since NeT is applicable to a single table at a time, in order to capture the overall picture of relational schema, CoT considers inclusion dependencies during the translation, and merges multiple inter-connected tables into a coherent and hierarchical parent-child structure in the final XML schema. • We present experimental results of the proposed algorithms against both real and synthetic XML schema and data, and show that our proposals are very effective in capturing hidden semantics from a source schema and translating them into a target schema.
2
Related Work
It is important to differentiate the problem that we deal with in this paper, named as schema conversion problem, from another similar one known as schema matching problem. Given a source schema s1 and target schema t1 , the schema matching problem finds a “mapping” that relates elements in s1 to ones in t1 . On the other hand, in the schema conversion problem, only a source schema s2 is given and the 3
% Network Hierarchical Relational Source ER model OO UML XML Meta-model
Network [52, 36] [74, 71]
Hierarchical [51] [62, 55]
Target model Relational ER [61] [57] [73] [24, 39, 2] [37, 43] [66, 8] [9] [56] [54] [22, 26] [3, 65, 6]
OO [52, 36] [64] [28, 44] [33, 5] [60] [34, 1]
UML
XML
[56] [56]
[50, 27] [76] [7, 21]
[10, 38]
Table 1: Classification of the related works of schema conversion problem. and represent the focus of the schema matching problem (within various models) and that of the schema conversion problem (between XML and relational models), respectively. is the focus of this paper. Works in the meta-model row aim at developing a universal matching/conversion framework supporting multiple data models. goal is to find a target schema t2 that is equivalent to s2 . Often, the source and target schemas in the schema matching problem belong to the same data model1 (e.g., relational model), while they belong to the different models in the schema conversion problem (e.g., relational and XML models). Schema matching problem itself is a difficult problem with many important applications and deserves special attention. In this section, therefore, we only focus on the related works of the schema conversion problem among several popular data models. For other discussion on the schema matching problem, refer to [53] (survey), [58] (latest development), etc.
2.1
Schema Conversion Problem in the Literature
Table 1 classfies related works on the schema conversion problem among seven popular data models – Network2 , Hierarchical3 , ER [16], Relational [20], OO, UML [11], and XML [13] models. Table 1 can be functionally divided into two parts. The first part containing ER, OO and UML data models and the second part containing the remaining network, hierarchical, relational and XML data models. The first part comprises of conceptual schema modeling schemes used to model real world enterprises. It is primarily used to represent all aspects of the real world enterprise which include the user 1
There are cases where schema matching problem deals with a mapping between different data models (e.g., [53]), but we believe most of such cases can be replaced by: 1) a schema conversion between different models, followed by 2) a schema matching within the same model. 2 Also known as CODASYL model or DBTG model. 3 e.g., IBM IMS.
4
Conversion Methods XML to Relational Relational to XML
Structure-oriented [25, 69, 31, 12, 41, 67, 42, 18] [72, 30, 68, 4, 15]
Constraints-oriented [46, 17, 23] [48]
Table 2: Classification of schema conversion between XML and relational models. view, structural view, object view, the behavioral view and the physical view. The real world information stated in these three models is easily understandable. OO data models can directly be instantiated and the data instances or databases queried Figure 1 gives the evolution chart of the different data models. The trend has always been for schema and data translation, between consecutive models or models with overlapping time frames, as they have evolved. This explains why there has been no direct literature comparing older Network and hierarchical data models with more recently developed data models like UML or XML. Relational databases have been far the most popular and accepted data models and has huge amount of research. Numerous researches for schema conversion have been carried out and are listed in the Table 1 Conversion between different models has been extensively investigated. For instance, [19] deals with conversion problems in the OODB area; since OODB is a richer environment than RDB, their work is not readily applicable to our application. The logical database design methods and their associated conversion techniques to other data models have been extensively studied in ER research. For instance, [5] presents an overview of such techniques. However, due to the differences between ER and XML models, those conversion techniques need to be modified substantially. More recently, [6] studies a generic mapping between arbitrary models with the focus of developing a framework for model management. Apart from conversion approaches, it is worthwhile to note that there have been also recent investigations on native XML storage systems such as [40].
2.2
Schema Conversion between XML and Relational Models
Towards conversion between XML and relational models, an array of research has addressed the particular issues lately. On the commercial side, database vendors are busily extending their databases to adopt XML types. Typically, they can handle XML data using BLOB/CLOB formats along with a limited 5
keyword searching or using some object-relational features [18, 4], but not many details have been revealed. On the research side, Table 2 shows the classification of such related work. • Structure-oriented XML to Relational conversion: Work done in STORED [25] is one of the first significant and concrete attempts to this end and deals with non-valid XML documents. STORED uses a data mining technique to find a representative DTD whose support exceeds the predefined threshold and convert XML documents to relational format using the DTD. [12] discusses template language-based conversion from DTD to relational schema which requires human experts to write an XML-based conversion rule. [69] presents three inlining algorithms that focus on the table level of the schema conversions. On the contrary, [31] studies different performance issues among eight algorithms that focus on the attribute and value level of the schema. [70] proposes a DTD-independent mapping algorithm. While ignoring specific characteristics hidden in each DTD, [70] decomposes XML documents into element, attribute, text and path tables, so that the changes of DTDs of the XML documents do not necessarily result in invalid mapping as found in examples [25, 69]. Since our CPI algorithm provides a systematic way of finding and preserving constraints from a DTD, ours is an improvement to the existing conversion algorithms. Recent work in [41] attempts a conversion approach based on the notion of meta schema between XML and relational models, but mainly focuses on the structural mapping unlike ours. • Constraints-oriented XML to Relational conversion: [46] proposes a method where the hidden semantic constraints in DTD are systematically found and translated into relational formats. Since the method is orthogonal to the structure-oriented conversion methods, it can be used along with algorithms [25, 12, 69, 31] with little change. We are not aware of any other work on this problem. This paper is an extended work of [46]. • Structure-oriented Relational to XML conversion: Some primitive work has been done in [72] dealing with the conversion from relational tables to XML documents. SilkRoute [30] provides a declarative query language (RXL) for viewing relational data in XML. Applications express the answer data as a query over the view and SilkRoute dynamically materializes the fragment of an XML view. [68] extends SQL to specify the conversion process declaratively, whereas SilkRoute proposes 6
a new language RXL and describes an extensive study on the issues of efficiently implementing the algorithms. Similar to SilkRoute, XPERANTO [15] aims to provide a uniform XML interface to underlying ORDB, transparently providing an XML-to-SQL query rewriter and a table-to-XML answer converter. Its output XML view is, however, mainly specified by the user’s input XML queries. In addition, there have been other DTD inference algorithms that take as “input” a set of XML documents [32] or a view description [63]. • Constraints-oriented Relational to XML conversion: Path constraints on a semi-structured model [14] or XML model [29] have been studied, mostly with respect to their implication problems. However, to our best knowledge, there has not been much work on this direction of conversion problem. For instance, [29] proposes three languages to capture the semantics of XML model and presents implication results, but does not deal with issues on converting constraints from RDB to XML model. Recently, the authors proposed to convert relational schema to XML schema using the hidden data semantics found by the nest operator in [48].
3
The CPI Algorithm
Transforming a hierarchical XML model to a flat relational model is not a trivial task due to several inherent difficulties such as non-trivial 1-to-1 mapping, existence of set values, complicated recursion, and/or fragmentation issues [69]. Most XML-to-Relational conversion algorithms (e.g., [12, 25, 31, 69]) have so far mainly focused on the issue of structural conversion, largely ignoring the semantics already existed in the original XML schema. Let us first describe various semantic constraints that one can mine from the DTD. Throughout the discussion, we will use the example DTD and XML document in Tables 3 and 4.
3.1
Semantic Constraints in DTDs
Cardinality Constraints: In a DTD declaration, there are only 4 possible cardinality relationships between an element and its sub-elements as illustrated below: 7
conf conf title date date
(title,date,editor?,paper*)> id ID #REQUIRED> (#PCDATA)> EMPTY> year CDATA #REQUIRED mon CDATA #REQUIRED day CDATA #IMPLIED>
Table 3: A DTD for Conference.
1. (0,1): An element can have either zero or one sub-element. (e.g., sub-element price) 2. (1,1): An element must have one and only one sub-element. (e.g., sub-element title) 3. (0,N): An element can have zero or more sub-elements. (e.g., sub-element ref) 4. (1,N): An element can have one or more sub-elements. (e.g., sub-element author) Following the notations in [5], let us call each cardinality relationship as type (0,1), (1,1), (0,N), (1,N), respectively. From these cardinality relationships, mainly three constraints can be inferred. First is whether or not the sub-element can be null. We use the notation “X 9 ∅” to denote that an element X cannot be null. This constraint is easily enforced by the NULL or NOT NULL clause in SQL. Second is whether or not more than one sub-element can occur. This is also known as singleton constraint in [75] and is one kind of equality-generating dependencies. Third, given an element, whether or not its subelement should occur. This is one kind of tuple-generating dependencies. The second and third types will be further discussed below. 8
Int’l Conf. on Conceptual Modeling 2005 May 20
[email protected] Indexing Model for Structured... Logical Information Modeling...
[email protected] Making Sense of Scientific... 391.4337 Constraints-preserving Trans...
[email protected] ...
Table 4: An example XML document conforming to the DTD in Table 3. Inclusion Dependencies (INDs): An Inclusion Dependency assures that values in the columns of one fragment must also appear as values in the columns of other fragments and is a generalization of the notion of referential integrity. Trivial form of INDs found in the DTD is that “given an element X and its sub-element Y , Y must be included in X (i.e., Y ⊆ X)”. For instance, from the conf element and its four sub-elements in the Conference DTD, the following INDs can be found as long as conf is not null: {conf.title ⊆ conf, conf.date ⊆ conf, conf.editor ⊆ conf, conf.paper ⊆ conf}. Another form of INDs can be found in the attribute definition part of the DTD with the use of the IDREF(S) keyword. For instance, consider the contact and editor elements in the Conference DTD shown below:
(name,(email|phone)?>
id
ID
#REQUIRED>
9
IDREF
#REQUIRED>
(person*)>
eids IDREFS #IMPLIED>
The DTD restricts the aid attribute of the contact element such that it can only point to the id attribute of the person element4 . Further, the eids attribute can only point to multiple id attributes of the person element. As a result, the following INDs can be derived: {editor.eids ⊆ person.id, contact.aid ⊆ person.id}. Such INDs can be best enforced by the “foreign key” if the attribute being referenced is a primary key. Otherwise, it needs to use the CHECK, ASSERTION, or TRIGGERS facility of SQL. Equality-Generating Dependencies (EGDs): The Singleton Constraint [75] restricts an element to have “at most” one sub-element. When an element type X satisfies the singleton constraint towards its sub-element type Y , if an element instance x of type X has two sub-elements instances y1 and y2 of type Y , then y1 and y2 must be the same. This property is known as Equality-Generating Dependencies (EGDs) and denoted by “X → Y ” in database theory. For instance, two EGDs: {conf → conf.title, conf → conf.date} can be derived from the conf element in Table 3. This kind of EGDs can be enforced by SQL UNIQUE construct. In general, EGDs occur in the case of the (0,1) and (1,1) mappings in the cardinality constraints. Tuple-Generating Dependencies (TGDs): TGDs in a relational model require that some tuples of a certain form be present in the table and use the “” symbol. Two useful forms of TGDs from DTD are the child and parent constraints [75]. 1. Child constraint: "Parent Child" states that every element of type P arent must have at least one child element of type Child. This is the case of the (1,1) and (1,N) mappings in the cardinality constraints. For instance, from the DTD in Table 3, because the conf element must contain the title and date sub-elements, the child constraint conf {title, date} holds. 4 Precisely, an attribute with IDREF type does not specify which element it should point to. This information is available only by human experts. However, new XML schema languages such as XML-Schema and DSD can express where the reference actually points to [45].
10
Relationship (0,1) (1,1) (0,N) (1,N)
Symbol ? * +
not null no yes no yes
EGDs yes yes no no
TGDs no yes no yes
Table 5: Cardinality relationships and their corresponding semantic constraints. 2. Parent constraint: "Child Parent" states that every element of type Child must have a parent element of type P arent. According to XML specification, XML documents can start from any level of element without necessarily specifying its parent element, when a root element is not specified by . In the DTD in Table 3, for instance, the editor and date elements can have the conf element as their parent. Further, if we know that all XML documents were started at the conf element level, rather than the editor or date level, then the parent constraint {editor, date} conf holds. Note that the title conf does not hold since the title element can be a sub-element of either the conf or paper element.
3.2
Discovering and Preserving Semantic Constraints from DTDs
The CPI algorithm utilizes a structure-based conversion algorithm as a basis and identifies various semantic constraints described in Section 3.1. We will use the hybrid algorithm [69] as the basis algorithm. CPI first constructs a DTD graph that represents the structure of a given DTD. A DTD graph can be constructed when parsing the given DTD. Its nodes are elements, attributes, or operators in the DTD. Each element appears exactly once in the graph, while attributes and operators appear as many times as they appear in the DTD. CPI then annotates various cardinality relationships (summarized in Table 5) among nodes to each edge of the DTD graph. Note that the cardinality relationship types in the graph consider not only element vs. sub-element relationships but also element vs. attribute relationships. Figure 2 illustrates an example of such annotated DTD graph for the Conference DTD in Table 3. Once the annotated DTD graph is constructed, CPI follows the basic navigation method provided by the hybrid algorithm; it identifies top nodes [69, 47] that are the nodes: 1) not reachable from any nodes (e.g., source node), 2) direct child of “*” or “+” operator node, 3) recursive node with indegree > 11
year mon
(1,1) (1,1) conf (0,1) (1,1) (1,1) date (1,1) (0,N) title id (0,1)
day aid
(1,1) contact
(0,1)
(0,1) (1,1) (0,N) (1,N)
?
paper
(1,1)
author
(1,1) (0,N)
cite
id
(1,1) (0,1) id
* +
(0,N) eids
(0,N)
(1,1)
(0,1) top node
editor
(1,N)
person
(1,1) id
name
(0,1)
format
(0,1)
(1,1)
fn
phone
(1,1) ln
(0,1) email
Figure 2: An annotated DTD graph for the Conference DTD in Table 3. paper id p1 p2 p3 p7
root elm conf conf conf paper
parent elm conf conf cite –
fk conf er05 er05 – –
fk cite – – c100 –
title Indexing ... Logical ... Making ... Constraints ...
contact aid dao shah – lee
cite id – c100 – c200
cite format – ACM – IEEE
Table 6: Final relational “data” for the paper element in the Conference DTD in Table 3, generated by CPI algorithm. 1, or 4) one node between two mutually recursive nodes with indegree = 1. Then, starting from each top node T , inline all the elements and attributes at leaf nodes reachable from T unless they are other top nodes. In doing so, each annotated cardinality relationship can be properly converted to its counterpart in SQL syntax as described in Section 3.1. The details of the algorithm is beyond the scope of this paper and interested readers are referred to [47]. For instance, Figure 3 and Table 6 are such output relational schema and data in SQL notation, automatically generated by the CPI algorithm.
4
The NeT Algorithm
The simplest Relational-to-XML translation method, termed as FT (Flat Translation) in [48], is to translate 1) tables in a relational schema to elements in an XML schema and 2) columns in a relational schema to attributes in an XML schema. FT is a simple and effective translation algorithm. However, since FT translates the “flat” relational model to a “flat” XML model in a one-to-one manner, it does not 12
CREATE TABLE paper ( id NUMBER NOT NULL, title VARCHAR(50) NOT NULL, contact_aid VARCHAR(20), cite_id VARCHAR(20), cite_format VARCHAR(50) CHECK (VALUE IN ("ACM", "IEEE")), root_elm VARCHAR(20) NOT NULL, parent_elm VARCHAR(20), fk_cite VARCHAR(20) CHECK (fk_cite IN (SELECT cite_id FROM paper)), fk_conf VARCHAR(20), PRIMARY KEY (id), UNIQUE (cite_id), FOREIGN KEY (fk_conf) REFERENCES conf(id), FOREIGN KEY (contact_aid) REFERENCES person(id) );
Figure 3: Final relational “schema” for the paper element in the Conference DTD in Table 3, generated by CPI algorithm. utilize several basic “non-flat” features provided by the XML model for data modeling such as representing repeating sub-elements through regular expression operators (e.g., “*”, “+”). To remedy the shortcomings of FT, we propose the NeT algorithm that utilizes various element content models of the XML model. NeT uses the nest operator [35] to derive a “good” element content model. Informally, for a table t with a set of columns C, nesting on a non-empty column X ∈ C collects all tuples that agree on the remaining columns C − X into a set5 . Formally, Definition 1 (Nest) [35]. Let t be a n-ary table with column set C, and X ∈ C and X = C − X. For each (n − 1)-tuple γ ∈ ΠX (t), we define an n-tuple γ ∗ as follows: γ ∗ [X] = γ, and γ ∗ [X] = {κ[X] | κ ∈ 2
t ∧ κ[X] = γ. Then, nestX (t) = {γ ∗ | γ ∈ ΠX (t)}.
After nestX (t), if column X has only a set with “single” value {v} for all the tuples, then we say that nesting failed and we treat {v} and v interchangeably (i.e., {v} = v). Thus when nesting failed, the following is true: nestX (t) = t. Otherwise, if column X has a set with “multiple” values {v1 , ..., vk } with k ≥ 2 for at least one tuple, then we say that nesting succeeded. Example 2. Consider a table R in Table 7. Here we assume that the columns A, B, C are non-nullable. 5
Here, we only consider single attribute nesting.
13
#1 #2 #3 #4 #5 #6 #7
A 1 1 2 3 4 4 5
B a a a a b b b
C 10 20 10 10 10 20 20
+
A {1,2,3} 1 4 {4,5}
(a) R
A+ 1 {2,3} 4 5
B a a b b
C+ {10,20} 10 {10,20} 20
(f) nestA (nestC (R))
B a a b b
A 1 1 2 3 4 4 5
C 10 20 10 20
(b) nestA (R) A 1 2 3 4 5
B a a a b b
C 10 20 10 10 10 20 20
A 1 2 3 4 5
(c) nestB (R) = R
C+ {10,20} 10 10 {10,20} 20
(g) nestB (nestC (R))
B a a a a b b b
A+ {1,2,3} 1 4 {4,5} (h)
B a a b b
B a a a b b
C+ {10,20} 10 10 {10,20} 20
(d) nestC (R)
A+ 1 {2,3} 4 5
C 10 20 10 20
nestC (nestB (nestA (R))) = nestB (nestC (nestA (R)))
(i)
A+ {1,2,3} 1 4 {4,5}
(e)
B a a b b
B a a b b
C 10 20 10 20
nestB (nestA (R)) = nestC (nestA (R)) C+ {10,20} 10 {10,20} 20
nestB (nestA (nestC (R))) = nestA (nestB (nestC (R)))
Table 7: A relational table R and its various nested forms. Column names containing a set after nesting (i.e., nesting succeeded) are appended by “+” symbol. In computing nestA (R) at (b), the first, third, and fourth tuples of R agree on their values in columns (B, C) as (a, 10), while their values of the column A are all different. Therefore, these different values are grouped (i.e., nested) into a set {1,2,3}. The result is the first tuple of the table nestA (R) – ({1,2,3}, a, 10). Similarly, since the sixth and seventh tuples of R agree on their values as (b, 20), they are grouped to a set {4,5}. In computing nestB (R) at (c), there are no tuples in R that agree on the values of the columns (A, C). Therefore, nestB (R) = R. In computing nestC (R) at (d), since the first two tuples of R – (1, a, 10) and (1, a, 20) – agree on the values of the columns (A, B), they are grouped to (1, a, {10,20}). Nested tables (e) through (j) are constructed similarly.
2
Since the nest operator requires scanning of the entire set of tuples in a given table, it can be quite expensive. In addition, as shown in Example 2, there are various ways to nest the given table. Therefore, it is important to find an efficient way (that uses the nest operator minimum number of times) of obtaining an acceptable element content model. For a detailed description on the various properties of the nest operator, the interested are referred to [48, 49]. Lemma 1. Consider a table t with column set C, candidate keys, K1 , K2 , . . . , Kn ⊆ C, and column set K such that K = K1 ∩ K2 ∩ . . . ∩ Kn . Further, let |C| = n and |K| = m (n ≥ m). Then, the number of P k necessary nestings, N , is bounded by N ≤ m k=1 m 14
Lemma 1 implies that when candidate key information is available, one can avoid unnecessary nestings substantially. For instance, suppose attributes A and C in Table 7 constitute a key for R. Then, one needs to compute only: nestA (R) at (b), nestC (R) at (d), nestC (nestA (R)) at (e), nestA (nestC (R)) at (f) in Table 7. After applying the nest operator to the given table repeatedly, there can be still several nested tables where nesting succeeded. In general, the choice of the final schema should take into consideration the semantics and usages of the underlying data or application and this is where user intervention is beneficial. By default, without further input from users, NeT chooses the nested table where the most number of nestings succeeded as the final schema, since this is a schema which provides low “data redundancy”. The outline of the NeT algorithm is as follows: 1. For each table ti in the input relational schema R, apply the nest operator repeatedly until no nesting succeeds. 2. Choose the best nested table based on the selected criteria. Denote this table as t0i (c1 , . . . , ck−1 , ck , . . . , cn ), where nesting succeeded on the columns {c1 , . . . , ck−1 }. (a) If k = 1, follow the FT translation. (b) If k > 1, i. For each column ci (1 ≤ i ≤ k − 1), if ci was nullable in R, use c∗i for the element content model, and c+ i otherwise. ii. For each column cj (k ≤ j ≤ n), if ci was nullable in R, use c?j for the element content model, and cj otherwise.
5
The CoT Algorithm
The NeT algorithm is useful for decreasing data redundancy and obtaining a more intuitive schema by 1) removing redundancies caused by multivalued dependencies, and 2) performing grouping on attributes. However, NeT considers tables one at a time, and cannot obtain a overall picture of the relational schema where many tables are interconnected with each other through various other dependencies. To remedy this problem, we propose the CoT algorithm that uses Inclusion Dependencies (INDs) of relational schema. 15
General forms of INDs are difficult to acquire from the database automatically. However, we shall consider the most pervasive form of INDs, foreign key constraints, which can be queried through ODBC/JDBC interface. The basic idea of the CoT is the following: For two distinct tables s and t with lists of columns X and Y , respectively, suppose we have a foreign key constraint s[α] ⊆ t[β], where α ⊆ X and β ⊆ Y . Also suppose that Ks ⊆ X is the key for s. Then, different cardinality binary relationships between s and t can be expressed in the relational model by a combination of the following: 1) α is unique/not-unique, and 2) α is nullable/non-nullable. Then, the translation of two tables s, t with a foreign key constraint works as follows: 1. If α is non-nullable (i.e., none of the columns of α can take null values), then: (a) If α is unique, then there is a 1 : 1 relationship between s and t, and can be captured as . (b) If α is not-unique, then there is a 1 : n relationship between s and t, and can be captured as . 2. If s is represented as a sub-element of t, then the key for s will change from Ks to (Ks − α). The key for t will remain the same.
Extending this to the general case where multiple tables are interconnected via INDs, consider the schema with a set of tables {t1 , ..., tn } and INDs ti [αi ] ⊆ tj [βj ], where i, j ≤ n. We consider only those INDs that are foreign key constraints (i.e., βj constitutes the primary key of the table tj ), and where αi is non-nullable. The relationships among tables can be captured by a graph representation, termed as IND-Graph. Definition 2 (IND-Graph) An IND-Graph G = (V, E) consists of a node set V and a directed edge set E, such that for each table ti , there exists a node Vi ∈ V , and for each distinct IND ti [α] ⊆ tj [β], there exists an edge Eji ∈ E from the node Vj to Vi .
2
Note the edge direction is reversed from the IND direction for convenience. Given a set of INDs, the IND-Graph can be easily constructed. Once an IND-Graph G is constructed, CoT needs to decide the 16
student(Sid, Name, Advisor) emp(Eid, Name, ProjName) prof(Eid, Name, Teach) course(Cid, Title, Room) dept(Dno, Mgr) proj(Pname, Pmgr) student(Advisor) ⊆ prof(Eid) emp(ProjName) ⊆ proj(Pname) prof(Teach) ⊆ course(Cid) prof(Eid, Name) ⊆ emp(Eid, Name) dept(Mgr) ⊆ emp(Eid) proj(Pmgr) ⊆ emp(Eid)
Table 8: An example schema with associated INDs. student
proj prof
emp
course
dept
Figure 4: The IND-Graph representation of the schema in Table 8 (top nodes denoted by rectangular nodes). starting point to apply translation rules. For that purpose, we use the notion of top nodes. Intuitively, an element is a top node if it cannot be represented as a sub-element of any other element. Let T denote the set of top nodes. Then, CoT traverses G, using say Breadth-First Search (BFS), until it traverses all the nodes and edges, while capturing the INDs on edges as either sub-elements (when the node is visited for the first time) or IDREF attributes (when the node was visited already). Example 3. Consider a schema and its associated INDs in Table 8. The IND-Graph with two top nodes is shown in Figure 4: 1) course: There is no node t, where there is an IND of the form course[α] ⊆ t[β], and 2) emp: There is a cyclic set of INDs between emp and proj, and there exists no node t such that there is an IND of the form emp[α] ⊆ t[β] or proj[α] ⊆ t[β]. Then, • First, starting from a top node course, do BFS scan. Pull up a reachable node prof into course and make it as sub-element by . Similarly, the node student is also pulled up into its parent node prof by . Since the node student is a leaf, no nodes can be pulled in: . Since there is no more unvisited reachable node from course, the scan stops. 17
DTD Name novel play tstmt vCard ICE MusicML OSD PML Xbel XMI BSML
Semantics Domain literature Shakespeare religious text business card content synd. music desc. s/w desc. web portal bookmark metadata DNA seq.
DTD Schema Elm/Attr ID/IDREF(S) 10/1 1/0 21/0 0/0 28/0 0/0 23/1 0/0 47/157 0/0 12/17 0/0 16/15 0/0 46/293 0/0 9/13 3/1 94/633 31/102 112/2495 84/97
Relational Schema Table/Attr → 5/13 6 9 14/46 17 30 17/52 17 22 8/19 18 13 27/283 43 60 8/34 9 12 15/37 2 2 41/355 29 36 9/36 9 1 129/3013 10 7 104/2685 99 33
9∅ 9 30 22 13 60 12 2 36 1 7 33
Table 9: Summary of CPI algorithm. • Next, starting from another top node emp, pull up neighboring node dept into emp similarly by and . Then, visit a neighboring node prof, but prof was visited already. To avoid data redundancy, an attribute Ref prof is added to emp accordingly. Since attributes in the left-hand side of the corresponding IND, prof (Eid, N ame) ⊆ emp(Eid, N ame), form a super key, the attribute Ref prof is assigned type IDREF, and not IDREFS: and . • Next, visit a node proj and pull it up to emp by and . In next step, visit a node emp from prof, but since it was already visited, an attribute Ref emp of type IDREFS is added to proj, and scan stops. It is worthwhile to point out that there are several places in CoT where human experts can help to find a better mapping based on the semantics and usages of the underlying data or application.
6
Experimental Results
6.1
CPI Results
CPI was tested against DTDs gathered from OASIS6 . For all cases, CPI successfully identified hidden semantic constraints from DTDs and correctly preserved them by rewriting them in SQL. Table 9 shows a summary of our experimentation. Note that people seldom used the ID and IDREF(S) constructs in 6
http://www.oasis-open.org/cover/xml.html
18
Test Set Balloons1 Balloons2 Balloons3 Balloons4 Hayes Bupa Balance TA Eval Car Flare
# of attr. / tuple 5 / 16 5 / 16 5 / 16 5 / 16 6 / 132 7 / 345 5 / 625 6 / 110 7 / 1728 13 / 365
NestRatio 42 / 64 42 / 64 40 / 64 42 / 64 1/6 0/7 56 / 65 253 / 326 1870 / 1957 11651 / 13345
ValueRatio 80 / 22 80 / 22 80 / 42 80 / 22 792 / 522 2387 / 2387 3125 / 1120 660 / 534 12096 / 779 4745 / 2834
Size before / after 0.455 / 0.152 0.455 / 0.150 0.455 / 0.260 0.455 / 0.149 1.758 / 1.219 7.234 / 7.234 6.265 / 2.259 1.559 / 1.281 51.867 / 3.157 9.533 / 5.715
# of nested attr. 3 3 3 3 1 0 4 5 6 4
Time (sec.) 1.08 1.07 1.14 1.07 1.01 4.40 21.48 24.83 469.47 6693.41
Table 10: Summary of NeT experimentations. their DTDs except the XMI and BSML cases. The number of tables generated in the relational schema was usually smaller than that of elements/attributes in DTDs due to the inlining effect. The only exception to this phenomenon was the XMI case, where extensive use of types (0,N) and (1,N) cardinality relationships resulted in many top nodes in the ADG. The number of semantic constraints had a close relationship with the design of the DTD hierarchy and the type of cardinality relationship used in the DTD. For instance, the XMI DTD had many type (0,N) cardinality relationships, which do not contribute to the semantic constraints. As a result, the number of semantic constraints at the end was small, compared to that of elements/attributes in the DTD. This was also true for the OSD case. On the other hand, in the ICE case, since it used many type (1,1) cardinality relationships, it resulted in many semantic constraints.
6.2
NeT Results
Our preliminary results comparing the goodness of the XSchema obtained from NeT and FT with that obtained from DB2XML v 1.3 [72] appeared in [48]. We further applied our NeT algorithm on several test sets drawn from UCI KDD7 / ML8 repositories, which contain a multitude of single-table relational schemas and data. Sample results are shown in Table 10. Two metrics are used as follows:
NestRatio = ValueRatio = 7 8
# of successful nesting # of total nesting # of original data values # of data values D in the nested table
http://kdd.ics.uci.edu/ http://www.ics.uci.edu/∼mlearn/MLRepository.html
19
part
partsupp
lineitem
supplier
customer
orders
nation
region
Figure 5: The IND-Graph representation of TPC-H schema. where D is the number of individual data values present in the table. For example, the D in the row ({1, 2, 3}, a, 10) of a nested table is 5. High value for NestRatio shows that we did not perform unnecessary nesting and high value for ValueRatio shows that the nesting removed a lot of redundancy. In our experimentation9 , we observed that most of the attempted nestings are successful, and hence our optimization rules are quite efficient. In Table 10, we see that nesting was useful for all the data sets except for the Bupa data set. Also nesting was especially useful for the Car data set, where the size of the nested table is only 6% of the original data set. Time required for nesting is an important parameter, and it jointly depends on the number of attempted nestings and the number of tuples. The number of attempted nestings depends on the number of attributes, and increases drastically as the number of attributes increases. This is observed for the Flare data set, where we have to do nesting on 13 attributes.
6.3
CoT Results
For testing CoT, we need some well-designed relational schema where tables are interconnected via inclusion dependencies. For this purpose, we use the TPC-H schema v 1.3.010 , which is an ad-hoc, decision support benchmark and has 8 tables and 8 inclusion dependencies. The IND-Graph for the TPC-H schema is shown in Figure 5. CoT identified two top-nodes – part and region, and eventually generated the XML document having interwoven hierarchical structures; six of the eight inclusion dependencies are mapped using sub-element, and the remaining two are mapped using IDREF attributes. Figure 6 shows a comparison of the number of data values originally present in the database, and the number of data values in the XML document generated by FT and CoT. Because FT is a flat translation, 9 10
Available at http://www.cs.ucla.edu/∼mani/xml http://www.tpc.org/tpch/spec/h130.pdf
20
# of data values in XML document
250000
FT CoT
200000
150000
100000
50000
0
0
0.5
1 1.5 size of TPC-H raw data (MB)
2
2.5
Figure 6: Size comparison of two algorithms. the number of data values in the XML document generated by FT is the same as the number of data values in the original data. However, CoT is able to decrease the number of data values in the generated XML document by more than 12%.
7
Conclusion
We have presented a method to transform a relational schema to an XML schema, and two methods to transform an XML schema to a relational schema, both in structural and semantic aspects. All three algorithms are “correct” in the sense that they all have preserved the original information of relational schema. For instance, using the notion of information capacity [59], a theoretical analysis for the correctness of our translation procedures is possible; we can actually show that CPI, NeT and CoT algorithms are equivalence preserving transformations. Despite the difficulties in conversions between XML and relational models, there are many practical benefits. We strongly believe that devising more accurate and efficient conversion methodologies between XML and relational models is important. The prototypes of our algorithms are available at: http://www.cobase.cs.ucla.edu/projects/xpress/
21
References [1] L. Al-Jadir and F. El-Moukaddem. “F2/XML: Storing XML Documents in Object Databases”. In Int’l Conf. on Object Oriented Info. Systems (OOIS), Montpellier, France, Sep. 2002. [2] M. Andersson. “Extracting an Entity Relationship Schema from a Relational Database through Reverse Engineering”. In Int’l Conf. on Conceptual Modeling (ER), Manchester, U.K., Dec. 1994. [3] P. Atzeni and R. Torlone. “Schema Translation between Heterogeneous Data Models in a Lattice Framework”. In IFIP Working Conf. on Data Semantics (DS), Atlanta, GA, 1985. [4] S. Banerjee, V. Krishnamurthy, M. Krishnaprasad, and R. Murthy. “Oracle8i - The XML Enabled Data Management System.”. In IEEE ICDE, San Diego, CA, Feb. 2000. [5] C. Batini, S. Ceri, and S. B. Navathe. “Conceptual Database Design: An Entity-Relationship Approach”. The Benjamin/Cummings Pub., 1992. [6] P. Bernstein, A. Halevy, and R. Pottinger. “A Vision of Management of Complex Models ”. ACM SIGMOD Record, 29(3):55–63, Dec. 2000. [7] V. Bisova and K. Richta. “Transformation of UML Models into XML”. In ADBIS-DASFAA Symp. on Advances in Databases and Information Systems, Prague, Czech Republic, Sep. 2000. [8] M. Blaha, W. Premerlani, and H. Shen. “Converting OO Models Into RDBMS Schema”. IEEE Software, 11(3):28–39, 1994. [9] A. Boccalatte, D. Giglio, and M. Paolucci. “An Object-Oriented Modeling Approach Based on Entity-Relationship Diagrams and Petri Nets”. In IEEE Int’l conf. on Systems, Man and Cybernetics (SMC), San Diego, CA, Oct. 1998. [10] G. Booch, M. Christerson, M. Fuchs, and J. Koistinen. “UML for XML Schema Mapping Specification”. http://www.rational.com/media/uml/resources/media/uml xmlschema33.pdf. [11] G. Booch, J. Rumbaugh, and I. Jacobson. “The Unified Modeling Language User Guide”. AddisonWesley, 1998. 22
[12] R.
Bourret.
“XML
and
Databases”.
Web
page,
Sep.
1999.
http://www.rpbourret.com/xml/XMLAndDatabases.htm. [13] T. Bray, J. Paoli, and C. M. Sperberg-McQueen (Eds). “Extensible Markup Language (XML) 1.0 (2nd Edition)”. W3C Recommendation, Oct. 2000. http://www.w3.org/TR/2000/REC-xml-20001006. [14] P. Buneman, W. Fan, and S. Weinstein. “Path Constraints in Semistructured and Structured Databases”. In ACM PODS, Seattle, WA, Jun. 1998. [15] M. Carey, D. Florescu, Z. Ives, Y. Lu, J. Shanmugasundaram, E. Shekita, and S. Subramanian. “XPERANTO: Publishing Object-Relational Data as XML”. In Int’l Workshop on the Web and Databases (WebDB), Dallas, TX, May 2000. [16] P. P. Chen. “The Entity-Relationship Model - Toward a Unified View of Data”. ACM Trans. on Database Systems (TODS), 1(1):9–36, 1976. [17] Y. Chen, S. Davidson, and Y. Zheng. “Constraint Preserving XML Storage in Relations”. In Int’l Workshop on the Web and Databases (WebDB), Madison,WI, Jun. 2002. [18] J. M. Cheng and J. Xu. “XML and DB2”. In IEEE ICDE, San Diego, CA, Feb. 2000. [19] V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl. “From Structured Document to Novel Query Facilities”. In ACM SIGMOD, Minneapolis, MN, Jun. 1994. [20] E. F. Codd. “A Relational Model of Data for Large Shared Data Banks”. Comm. ACM, 13(6):377– 387, 1970. [21] R. Conrad, D. Scheffner, and J. C. Freytag. “XML Conceptual Modeling using UML”. In Int’l Conf. on Conceptual Modeling (ER), Salt Lake City, UT, Oct. 2000. [22] R. d. S. Mello and C. A. Heuser. “A Bottom-Up Approach for Integration of XML Sources”. In Int’l Workshop on Information Integration on the Web (WIIW), Rio de Janeiro, Brazil, Apr. 2001. [23] S. Davidson, W. Fan, C. Hara, and J. Qin. “Propagating XML Constraints to Relations”. In IEEE ICDE, Bangalore, India, Mar. 2003. 23
[24] K. H. Davis and A. K. Arora. “Converting A Relational Database Model into an Entity-Relationship Model”. In Int’l Conf. on Conceptual Modeling (ER), New York, NY, Nov. 1987. [25] A. Deutsch, M. F. Fernandez, and D. Suciu. “Storing Semistructured Data with STORED”. In ACM SIGMOD, Philadephia, PA, Jun. 1998. [26] R. dos S. Mello and C. A. Heuser. “A Rule-Based Conversion of a DTD to a Conceptual Schema”. In Int’l Conf. on Conceptual Modeling (ER), Yokohama, Japan, Nov. 2001. [27] D. W. Embley and W. Y. Mok. “Developing XML Documents with Guranteed “Good” Properties”. In Int’l Conf. on Conceptual Modeling (ER), Yokohama, Japan, Nov. 2001. [28] C. Fahrner and G. Vossen.
“Transforming Relational Database Schemas into Object-Oriented
Schemas according to ODMG-93”. In Int’l Conf. on Deductive and Object-Oriented Databases (DOOD), Singapore, Dec. 1995. [29] W. Fan and J. Sim´eon. “Integrity Constraints for XML”. In ACM PODS, Dallas, TX, May 2000. [30] M. F. Fernandez, W.-C. Tan, and D. Suciu. “SilkRoute: Trading between Relations and XML”. In Int’l World Wide Web Conf. (WWW), Amsterdam, Netherlands, May 2000. [31] D. Florescu and D. Kossmann. “Storing and Querying XML Data Using an RDBMS”. IEEE Data Eng. Bulletin, 22(3):27–34, Sep. 1999. [32] M. N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri, and K. Shim. “XTRACT: A System for Extracting Document Type Descriptors from XML Documents”. In ACM SIGMOD, Dallas, TX, May 2000. [33] M. Gogolla, A. K. Huge, and B. Randt. “Stepwise Re-Engineering and Development of ObjectOriented Database Schemata”. In Int’l Workshop on Database and Expert Systems Applications, Vienna, Austria, Aug. 1998. [34] J. Hou, Y. Zhang, and Y. Kambayashi. “Object-Oriented Representation for XML Data”. In Int’l Symp. on Cooperative Database Systems for Advanced Applications (CODAS), Beijing, China, Apr. 2001. 24
[35] G. Jaeschke and H.-J. Schek. “Remarks on the Algebra of Non First Normal Form Relations”. In ACM PODS, Los Angeles, CA, Mar. 1982. [36] S. Jajodia. “On Equivalence of Relational and Network Database Models”. Information Processing Letters, 20(1):51–54, 1985. [37] S. Jajodia and P. A. Ng. “Translation of Entity-Relationship Diagrams into Relational Structures”. J. Systems and Software (JSS), 4(2/3):123–133, Jul. 1984. [38] M. R. Jensen, T. H. Moller, and T. B. Pedersen. “Converting XML Data to UML Diagrams for Conceptual Data Integration”. In Int’l Workshop on Data Integration Over the Web (DIWeb), Interlaken, Switzerland, Jun. 2001. [39] W. Ji, C. R. Carlson, and D. Dreyer. “An Algorithm Converting Relational Schemas to Nested Entity-Relationship Schemas”. In Int’l Conf. on Conceptual Modeling (ER), San Mateo, CA, Oct. 1991. [40] C.-C. Kanne and G. Moerkotte. “Efficient Storage of XML Data”. In IEEE ICDE, San Diego, CA, Feb. 2000. [41] G. Kappel, E. Kapsammer, S. Rausch-Schott, and W. Retschitzegger. “X-Ray - Towards Integrating XML and Relational Database Systems.”. In Int’l Conf. on Conceptual Modeling (ER), pages 339– 353, Salt Lake City, UT, Oct. 2000. [42] M. Klettke and H. Meyer. “XML and Object-Relational Database Systems - Enhancing Structural Mappings Based on Statistics”. In Int’l Workshop on the Web and Databases (WebDB), Dallas, TX, May 2000. [43] W. Kozaczynski and L. Lilien. “An Extended Entity-Relationship (ER) Database Specification and its Automatic Verification and Transformation into the Logical Relational Design”. In Int’l Conf. on Conceptual Modeling (ER), New York, NY, Nov. 1987. [44] N. Lammari. “An Algorithm to Extract IS-A Inheritance Hierarchies from a Relational Database”. In Int’l Conf. on Conceptual Modeling (ER), Paris, France, Nov. 1999. 25
[45] D. Lee and W. W. Chu. “Comparative Analysis of Six XML Schema Languages”. ACM SIGMOD Record, 29(3):76–87, Sep. 2000. [46] D. Lee and W. W. Chu. “Constraints-preserving Transformation from XML Document Type Definition to Relational Schema”. In Int’l Conf. on Conceptual Modeling (ER), pages 323–338, Salt Lake City, UT, Oct. 2000. [47] D. Lee and W. W. Chu. “CPI: Constraints-Preserving Inlining Algorithm for Mapping XML DTD to Relational Schema”. J. Data & Knowledge Engineering (DKE), 39(1):3–25, Oct. 2001. [48] D. Lee, M. Mani, F. Chiu, and W. W. Chu. “Nesting-based Relational-to-XML Schema Translation”. In Int’l Workshop on the Web and Databases (WebDB), Santa Barbara, CA, May 2001. [49] D. Lee, M. Mani, F. Chiu, and W. W. Chu. “NeT & CoT: Translating Relational Schemas to XML Schemas using Semantic Constraints”. In ACM CIKM, McLean, VA, Nov. 2002. [50] M.-L. Lee, S. Y. Lee, T. W. Ling, G. Dobbie, and L. A. Kalinichenko. “Designing Semistructured Databases: A Conceptual Approach”. In Int’l Conf. on Database and Expert Systems Applications (DEXA), Munich, Germany, Sep. 2001. [51] Y. E. Lien. “Hierarchical Schemata for Relational Databases”. ACM Trans. on Database Systems (TODS), 6(1):48–69, Mar. 1981. [52] Y. E. Lien. “On the Equivalence of Database Models”. J. ACM, 29(2):333–362, Apr. 1982. [53] J. Madhavan, P. A. Berstein, and E. Rahm. “Generic Schema Matching with Cupid”. In VLDB, Roma, Italy, Sep. 2001. [54] E. Marcos, B. Vela, and J. M. Cavero. “Extending UML for Object-Relational Database Design”. In Int’l Conf. on the Unified Modeling Language (UML), Toronto, Ontario, Canada, Oct. 2001. [55] G. Margrave, E. L. Lusk, and R. A. Overbeek. “Tools for the Creation of IMS Database Designs from Entity-Relationship Diagrams”. In Int’l Conf. on Conceptual Modeling (ER), Anaheim, CA, 1983. 26
[56] P. McBrien and A. Poulovassilis. “A Uniform Approach to Inter-Model Transformations”. In Conf. on Advanced Information Systems Engineering (CAiSE), Heidelberg, Germany, Jun. 1999. [57] E. Meier, R. Dippold, J. Mercerat, A. Muriset, J.-C. Untersinger, R. Eckerlin, and Flavio Ferrara. “Hierarchical to Relational Database Migration”. IEEE Software, 11(3):21–27, 1994. [58] R. J. Miller, L. Haas, and M. A. Hernandez. “Schema Mapping as Query Discovery”. In VLDB, Cairo, Egypt, Sep. 2000. [59] R. J. Miller, Y. E. Ioannidis, and R. Ramakrishnan. “Schema Equivalence in Heterogeneous Systems: Bridging Theory and Practice (Extended Abstract)”. In EDBT, Cambridge, UK, Mar. 1994. [60] W.-Y. Mok and D. P. Paper.
“On Transformations from UML Models to Object-Relational
Databases”. In Hawaii Int’l Conf. on System Sciences (HICSS), Maui, HI, Jan. 2001. [61] S. B. Navathe. “An Intuitive Approach to Normalize Network Structured Data”. In VLDB, Montreal, Quebec, Canada, Oct. 1980. [62] S. B. Navathe and A. Cheng. “A Methodology for Database Schema Mapping from Extended Entity-Relationship Models into the Hierarchical Model”. In Int’l Conf. on Conceptual Modeling (ER), Anaheim, CA, 1983. [63] Y. Papakonstantinou and P. Velikhov. “Enhancing Semistructured Data Mediators with Document Type Definitions”. In IEEE ICDE, pages 136–145, Sydney, Austrialia, Mar. 1999. [64] W. J. Premerlani and M. R. Blaha. “An Approach for Reverse Engineering of Relational Databases”. Comm. ACM, 37(5):42–49, 1994. [65] X. Qian. “Correct Schema Transformations”. In EDBT, Avignon, France, 1996. [66] A. Ramfos, N. J. Fiddian, and W. A. Gray. “A Meta-Translation System for Object-Oriented to Relational Schema Translations”. In British National Conf. on Databases (BNCOD), ButterworthHeinemann, 1991.
27
[67] A. Schmidt, M. L. Kersten, M. Windhouwer, and F. Waas. “Efficient Relational Storage and Retrieval of XML Documents”. In Int’l Workshop on the Web and Databases (WebDB), Dallas, TX, May 2000. [68] J. Shanmugasundaram, E. J. Shekita, R. Barr, M. J. Carey, B. G. Lindsay, H. Pirahesh, and B. Reinwald. “Efficiently Publishing Relational Data as XML Documents”. In VLDB, Cairo, Egypt, Sep. 2000. [69] J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, and J. Naughton. “Relational Databases for Querying XML Documents: Limitations and Opportunities”. In VLDB, Edinburgh, Scotland, Sep. 1999. [70] T. Shimura, M. Yoshikawa, and S. Uemura. “Storage and Retrieval of XML Documents using ObjectRelational Databases”. In Int’l Conf. on Database and Expert Systems Applications (DEXA), pages 206–217, Florence, Italy, Aug. 1999. [71] D. Steinberg, Y. Raz, and E. Kantorowitz. “Translating ERROL, a High Level, ER, Structured English Languages for DBTG Databases”. In Int’l Conf. on Conceptual Modeling (ER), New York, NY, Nov. 1987. [72] V. Turau.
“Making Legacy Data Accessible for XML Applications”.
Web page, 1999.
http://www.informatik.fh-wiesbaden.de/∼turau/veroeff.html. [73] J. Winans and K. H. Davis. “Software Reverse Engineering from a Currently Existing IMS Database to an Entity-Relationship Model”. In Int’l Conf. on Conceptual Modeling (ER), Lausanne, Switzerland, Oct. 1990. [74] E. Wong and R. H. Katz. “Logical Design and Schema Conversion for Relational and DBTG Databases”. In Int’l Conf. on Conceptual Modeling (ER), Los Angeles, CA, 1979. [75] P. T. Wood. “Optimizing Web Queries Using Document Type Definitions”. In Int’l Workshop on Web Information and Data Management (WIDM), pages 28–32, Kansas City, MO, Nov. 1999.
28
[76] R. Xiaou, T. S. Dillon, E. Chang, and L. Feng. “Modeling and Transformation of Object-Oriented Conceptual Models into XML Schema”. In Int’l Conf. on Database and Expert Systems Applications (DEXA), Munich, Germany, Sep. 2001.
29