Available online at www.sciencedirect.com
ScienceDirect
Available online at www.sciencedirect.com Procedia Computer Science 00 (2017) 000–000
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 110 (2017) 422–427
The International Workshop on Big Data and Networks Technologies (BDNT 2017)
Integrating XML and Relational Data Hassana NASSIRI a *, Mustapha MACHKOUR b, Mohamed HACHIMI a b
a Laboratory of Engineering Sciences, University Ibn Zohr, Faculty of Sciences, Agadir, Morocco Laboratory of the Computing Systems and Vision, University Ibn Zohr, Faculty of Sciences, Agadir, Morocco
Abstract For the past few decades, there has been a significant interest in data integration, and lots of work has been introduced in this field. Herein we discussed a way to handle the problem of integrating heterogeneous data model, namely relational and XML. Since the relational model is the most data model used to manage data for years. Similarly, XML is rapidly becoming more and more popular as a standard format for exchanging information. In such way, building a sort of connection bridging these two models is clearly a need. To this point, we aim to define a system to extract data regardless of the nature of their model and make one query enough to retrieve data from different models, which are XML and relational in our case. Accordingly, users do not have to be familiar with multiple query language syntax at a time, they can extract data represented in both models by any query language, even if it is not the correspondent query language of the model in question. © 2017 The Authors. Published by Elsevier B.V. © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs. Peer-review under responsibility of the Conference Program Chairs. Keywords: XML; XPath; SQL; Relational Model; Model Integration
1. Introduction This paper addresses the problem of accessing data represented in different data models by using a unique query language. Since users may not be familiar with multiple query language syntax at a time, we need to facilitate accessing data by making one single query in any query language enough to retrieve data on any data model. As a start, the models under study are the relational and XML (eXtensible Markup Language), and we aim to integrate others especially the most used and famous ones. Why exactly these two models? In one hand, relational databases
* Corresponding author. Tel.: +212 699-702925 E-mail address:
[email protected] 1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs.
1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs. 10.1016/j.procs.2017.06.107
2
Hassana NASSIRI et al. / Procedia Computer Science 110 (2017) 422–427 Author name / Procedia Computer Science 00 (2015) 000–000
423
are still very essential and critical infrastructure in most organizations and have utility widely used to manage and maintain a large volume of data. On the other hand, XML has received considerable attention due to its multiple benefits, especially as it is auto-descriptive, extensible and usable in all fields of applications. Last, they are complementary in practice. There have been many attempts to query XML and relational data. Moreover, with XML becomes the lingua franca of data interchange increasingly, many research has been done to query XML Database using relational database system1,2. Furthermore, others have been focused on designing general systems to manage XML among other data formats3. Such approaches have great opportunities as it has some limitations as well 4. Our purpose, therefore, is to define a system for querying data stored in different models, such as the relational and XML with any query language of these models. That means that users do not need to know each query language of each data model, one query language is enough and will meet the purpose, even if the language is none of the corresponding ones of the model in question. Hence, our proposal system will be independent of the data model and the query language as well. The remainder of this work is organized as follows. Section 2 introduces some terms related to our zone. Section 3 describes some related work and compares our approach to them. Section 4 explains the main idea of our approach. Finally, Section 5 summarizes our contribution. 2. Terminology 2.1. XMLType Since Release 1 version 9i, the XMLType data type was introduced to facilitate the native management of XML documents by providing different storage and indexing models to best fit the kind of XML data and the pattern of its use5. The data type XMLType provides a number of the features, most relating to XML (validation of XML schema and XSL transformation), other which relate to Structured Query Language (SQL), like: definition of a column of a table (playing the role of a type of data), definition of a table (playing the role of an object type), declaration of variables in PL/SQL, and, call for stored PL/SQL Procedures. 2.2. SQL/XML Since SQL is the standard language for the access and the management of data stored in relational Database Management System (RDBMS), it is natural that businesses and consumers around the world want to have the opportunity to integrate their XML data into relational data, through the language of SQL. In the second half of 2000, the SQL/XML project6 was approved, it is a new part of SQL, which is XML-Related specifications, it was then only the infrastructure7. After that, SQL/XML have known a few small corrections adding new features8,9 and making good progress. Besides, different software providers have made their implementations standard including IBM Corporation and Oracle Corporation. SQL/XML Introduced XML data types and added a number of functions to SQL in such a way that it is possible to build XML elements and attributes from relational. This language includes data type XMLType, functions of XPath extraction, and, integration of XQuery features to SQL. 2.3. XML DB In Oracle Database 12c, the module XML DB 10,11,12 provides new features so users can get all the benefits of relational database technology plus all the advantages of the W3C's XML standards. This technology expands the capabilities of the relational database by providing full support for all of the key XML standards. The module XML DB offers an independent structure for the storage and management of XML data. Regardless of the storage mode, the functions of extraction of the XML DB produce the same results. It offers the following features: an XML data type which is used to store and manage XML content; a collection of methods and SQL operators to act on the XML; and, the possibility to save a schema consistent with the W3C in the Oracle database.
424
Hassana NASSIRI et al. / Procedia Computer Science 110 (2017) 422–427 Author name / Procedia Computer Science 00 (2015) 000–000
3
Oracle XML DB also supports the SQL/XML standard which includes many functions to construct XML data. This allows SQL-centric development techniques to be used to publish XML directly from relational data stored in Oracle Database 12c. 3. Related work Recent RDBMS such as Oracle supports some kind of uniform querying of mixed relational and XML data. As mentioned before, Oracle XMLDB technology extends the possibilities of the relational database by offering all the features of an XML database and provides an independent structure for the storage and management of XML data. Furthermore, there has been various work related to XML and relational database and variant studies trying to figure out a link between these models in order to efficiently store and query data, through varied approaches: XML views over relational, using RDBMS to store XML data and query rewrite and translation. According to the systems studied in literature and based on our understanding of them, these approaches focused on querying data via various directions: relational to XML sense or XML to relational sense by a translation tool to transform XML queries into SQL or the opposite. In relational to XML sense, here are some approaches that go with it: the ROX (Relational over XML) project at IBM13, presents a way to efficiently support relational over XML and the SQL to XQuery translation approach, and discuss the feasibility of querying natively stored XML data through SQL interfaces. It is an approach that provides access to relational data, based on SQL along relational views over XML. Likewise, The BEA AquaLogic Data Services Platform14, a unified, service-oriented, XML-based view of data from heterogeneous information sources, which can be queried using XQuery. It proposes a framework that can transform SQL statements to XPath expressions. Also, Vidhya et al15 examines how XML data can be queried using XPath or SQL and introduces a framework where SQL statements can be transformed to XPath queries that enable users to access XML and relational database through SQL. Again, Kore et al16 designs an SQL interface for XML databases that can convert SQL queries to XPath expressions and extract data from XML documents. And, Bhargavi et al17 proposes a framework for converting SQL join (Left, Right, and Full) queries into XPath expressions to allow users to access XML database through SQL queries only. In the other sense, i.e. XML to relational, some techniques are discussed here: Tatarinov et al18 explains the manner to support XML ordered data model using a relational database system, by encoding order as a data value. It proposes three order encoding methods and algorithms for translating ordered XPath expressions into SQL using these encoding methods. In the same token, Bin Chiu et al19 presents a way to process queries over XML by RDBMS through mapping XML documents/schema to RDBMS schema and use XQuery to retrieve XML documents. It introduces a system to store in and retrieve XML documents from a relational database system. XML documents are translated into tables in the relational database and stored in a shredded schema and could be queried by the query language XQuery. The input is the user’s query, and then the XQuery expressions are translated into SQL queries. The results are transformed into XML documents and returned to the user as output. Whereas, Manolescu et al20 presents a methodology for integrating heterogeneous data sources, including relational and treestructured data sources, under an XML global schema, which is implemented in the Agora data integration system21, and explains their approach for translating XQuery to SQL. Moreover, Fan et al22 discusses the Query translation in the presence of recursive data schemas and provides algorithms for rewriting an XPATH query into an equivalent XPATH query over a recursive DTD. DeHaan et al23 addresses two issues: the translation of XQuery expressions to SQL statements and the development of efficient execution strategies of the resulting queries. The proposed techniques target a relational implementation, but it can be used within native XML system too. Additionally, Chen et al24 presents BLAS25 a Bi-LAbeling based XPath Processing System, as a generic and efficient system for XML storage and XPath query processing by leveraging relational databases. Also, it represents algorithms for translating XPath query to SQL query. What makes our approach different and new is that it works on double sense, we mean that there is a way to query relational over XML and vice versa. In other words, it is possible for users to access and extract data with either SQL or XPath from heterogeneous models (relational and XML case), so they do not need to control the use of the two different types of query languages to retrieve data on various database systems. It is an independent system of either data model and query language as well. On the other hand, our approach is valid with or without the
4
Hassana NASSIRI et al. / Procedia Computer Science 110 (2017) 422–427 Author name / Procedia Computer Science 00 (2015) 000–000
425
need of storing XML in RDBMS. Also, it eliminates the need of learning another language to manipulate hybrid data such as SQL/XML, just SQL or XPath can do the task. 4. Our system 4.1. Motivation Here a conversation between a user (A), and what existed approaches (B) can offer, which personalizes some of the problems that users can face in the database world. • • • • • •
A: “Oh no I have some data sources in XML and other ones in relational databases; I know I can use SQL to retrieve data in the relational model, but what I’m supposed to do in XML case?” B: “It is possible to query your XML and relational with SQL, we can translate your SQL so you can access both models.” B: “You can manage your XML data using RDBMS by choosing what storage mode fit your situation the most.” A: “My colleague only knows XPath, he does not have a way to access data, right?” B: “Sorry, we are afraid that you are right.” Well, the answer given through our project is “do not panic, you do not have to do any extra work or your colleague neither, use SQL or XPath for the two cases, and we will take care of the rest to let you get the query result in the right form.”
4.2. Objective Each database query language is specific to a particular data model, for example, SQL to extract data from the relational and XPath or XQuery for querying XML. Then, it is difficult for those users to retrieve data because they need the correspondent query language. Hence, to overcome this and make it easier for them to get what they want with less effort, we aim to make one query - no matter what the query language can be - sufficient to retrieve data even if it is none the correspondent data model. Also, the user does not have to store or manage XML data using RDBMS also no need to make any physical changes at the databases level. We discussed here a way to handle the problem of integrating relational and XML data to support both XPath and SQL queries. Fig. 1 explains this goal; it shows that with SQL we can extract data in relational and in XML model, the same as with XPath.
Fig. 1. One query to retrieve data from XML and/or relational data models
4.3. Challenges Many defies may make the task difficult, the most challenging aspect that needs to be addressed is how to efficiently bridge semantics gaps between these technologies. Table 1 shows some of these differences in brief.
Hassana NASSIRI et al. / Procedia Computer Science 110 (2017) 422–427 Author name / Procedia Computer Science 00 (2015) 000–000
426
5
Table 1. XML and Relational differences. Relational
XML
Regular structure
Heterogeneous structure
Flat data
Nested elements on several levels
The order has no importance
The Order has an importance
Static schemas
Schemas tend to be more extensible
Always have a schema
May or may not have a schema
4.4. The infrastructure The groundwork for the bulk of our approach involves two phases, as presented in fig. 2. In the first phase, we generate a Universal Query Language (UQL), where the input is the user Query. In the second one, we identify the query through the Common Model of query specification (CMQS).
Fig. 2. (a) Logical infrastructure of the system; (b) Retrieving data from heterogeneous models
The Initial query can be written with either SQL or XPath. UQL presents our Intermediate Query Language (IQL). Why using an IQL? Because it can operate on the lowest level semantics, can increase possibilities to use more transformations and optimizations, can be an aid to switch between several query languages, and conversion between two languages will be through it. CMQS is the abstract layer where we specify queries against the specific data model to extract data. As shown in Fig.3, the procedure begins with one query, which will be decomposed into a set of sub-queries, each query interrogates the suitable model resulting from an answer. Then, the answers to all these queries are recomposed to form an answer to the initial query. Besides, we will be able to provide the user the answer according to the query language used in the first place. For instance, if the initial query was made using SQL, then the result will be displayed in a tabular form.
6
Hassana NASSIRI et al. / Procedia Computer Science 110 (2017) 422–427 Author name / Procedia Computer Science 00 (2015) 000–000
427
5. Conclusion and outlook We try to extract data with both SQL and XPath. In fact, even if query languages are specific to a particular data model, we will be able to query a data model with the database query language of the other (with the noncorresponding Query language of the concerning model). Hence, we have built a system that can extract data independently from the query language and the storage model of data. The relational model is the most data model used to manage data for years. Similarly, XML is rapidly becoming popular, and its importance as a standard format for the exchange of information with management more and more powerful of the documents is very remarkable in the last years. This creates the need for building some bridge between the two. Due to that, we choose to make them the first models under study in this contribution. In the future work, we will aim to integrate further data models to our system, so it can be independent of any data model as possible as we can, at least for the most known ones. References 1. D. Florescu and D. Kossmann, “Storing and Querying XML Data using an RDMBS,” IEEE Data Eng. Bull., vol. 3, pp. 27–34, 1999. 2. J. Shanmugasundaram, E. Shekita, J. Kiernan, R. Krishnamurthy, E. Viglas, J. Naughton and I. Tatarinov, “A general technique for querying XML documents using a relational database system,” ACM SIGMOD Rec., vol. 30, no. 3, p. 20, 2001. 3. M. Rys, D. Chamberlin, and D. Florescu, “XML and Relational Database Management Systems: the Inside Story,” SIGMOD ’05 Proc. 2005 ACM SIGMOD Int. Conf. Manag. data, pp. 945–947, 2005. 4. J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. De- Witt, and J. Naughton. Relational Databases for Querying XMLDocuments: Limitations and Opportunities. In VLDB, 1999. 5. Oracle Corporation, “Oracle XML DB: Choosing the best XMLType Storage Option for Your Use Case,” no. October, 2009. 6. D. Mann and P. Northwest, “Iso / Iec Jtc 1 / Sc 32 N00575 H2-2000-331R2,” pp. 1–6, 2000. 7. A. Eisenberg and J. Melton, “SQL/XML and the SQLX Informal Group of Companies”, ACM SIGMOD Record, Vol. 30 No. 3, Sept. 2001 8. A. Eisenberg and J. Melton, “SQL/XML is making good progress,” ACM SIGMOD Rec., vol. 31, no. 2, p. 101, 2002. 9. A. Eisenberg and J. Melton, “Advancements in SQL/XML,” SIGMOD Rec., vol. 33, no. 3, pp. 79–86, 2004. 10. R. Murthy and S. Banerjee, “Xml schemas in Oracle XML DB,” Proc. 29th Int. Conf. Very large databases, vol. 29, pp. 1009–1018, 2003. 11. M. Krishnaprasad, Z. H. Liu, A. Manikutty, J. W. Warner, V. Arora, and S. Kotsovolos, “Query Rewrite for XML in Oracle XML DB,” Data Base, 2004. 12. Z. H. Liu, M. Krishnaprasad, and V. Arora, “Native XQuery processing in oracle XMLDB,” Proc. ACM SIGMOD Int. Conf. Manag. Data, pp. 828–833, 2005. 13. A. Halverson, V. Josifovski, G. Lohman, H. Pirahesh, and M. Mörschel, “ROX: Relational Over XML,” Proc. 30th Int. Conf. Very Large Data Bases, pp. 264–275, 2004. 14. S. Jigyasu, S. Banerjee, V. Borkar, M. Carey, K. Dixit, A. Malkani and S. Thatte, “SQL to XQuery translation in the aquaLogic data services platform,” Proc. - Int. Conf. Data Eng., vol. 2006, p. 97, 2006. 15. P. M. Vidhya and P. Samuel, “Query translation from SQL to XPath,” 2009 World Congr. Nat. Biol. Inspired Comput. NABIC 2009 - Proc., pp. 1749–1752, 2009. 16. H. A. Kore, S. D. Hivarkar, N. K. Pathak, R. S. Bakle, and P. S. S. Kaushik, “Querying XML Documents by Using Relational Data base System,” vol. 3, no. 3, pp. 5322–5324, 2014. 17. K. Bhargavi and H. S. Chaithra, “Join queries translation from SQL to XPath,” 2013 IEEE Int. Conf. Emerg. Trends Comput. Commun. Nanotechnology, ICE-CCN 2013, no. Iceccn, pp. 346–349, 2013. 18. I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang, “Storing and querying ordered XML using a relational database system,” 2002 ACM SIGMOD Int. Conf. Manag. Data, SIGMOD’02, no. October, pp. 204–215, 2002. 19. Y. Bin Chiu, H. H. Chen, C. Y. Liu, S. C. Chen, and C. W. Hung, “Efficient Storage and Retrieval of XML Documents Using XQuery,” Adv. Mater. Res., vol. 779–780, pp. 1685–1688, Sep. 2013. 20. I. Manolescu, D. Florescu, and D. Kossmann, “Answering XML Queries on Heterogeneous Data Sources.,” Vldb, vol. 1, pp. 241–250, 2001. 21. I. Manolescu, D. Florescu, D. Kossmann, F. Xhumari, and D. Olteanu, “Agora: Living with XML and relational,” Vldb, pp. 623–626, 2000. 22. W. Fan, J.X. Yu, H. Lu, J. Lu, R. Rastogi, Query translation from XPATH to SQL in the presence of recursive DTDs, in: Proceedings of International Conference on Very Large Data Bases (VLDB), 2005, pp. 337–348 23. D. DeHaan, D. Toman, M. P. Consens, and M. T. Özsu, “A comprehensive XQuery to SQL translation using dynamic interval encoding,” Proc. SIGMOD 2003, pp. 623–634, 2003. 24. Y. Chen, S. B. Davidson, and Y. Zheng, “A bi-labeling based XPath processing system,” Inf. Syst., vol. 35, no. 2, pp. 170–185, 2010. 25. Y. Chen, S. B. Davidson, and Y. Zheng, “BLAS: An Efficient XPath Processing System,” Proc. ACM SIGMOD Int. Conf. Manag. data, pp. 47–58, 2004.