Missing:
Spring 2000
The eXtensible Markup Language: An Introduction to XML Documents & Databases
1
Christophides Vassilis
Spring 2000
Preliminary Issues
Christophides Vassilis
2
1 1
Spring 2000
What is a document?
●
●
●
Content: the components (words, images etc). which make up a document Structure: the organization and inter-relationship of the components Presentation: how a document looks and what processes are applied to it
3
Christophides Vassilis
Spring 2000
Separating these things means... ●
●
●
●
The content can be re-used ◆ for printing ◆ for querying ◆ for exchanging The structure can be formally validated The presentation can be customized for ◆ different media ◆ different audiences … in short, the information can be uncoupled from its processing
Christophides Vassilis
4
2 2
Spring 2000
Documents vs Databases Document world ●
plenty of small documents ◆ usually
●
implicit structure
●
●
◆ machine ●
◆ data, ●
methods
paradigms
as”, WYSIWYG name, date, subject
friendly
content
◆ Data
Independence, Transaction Management, Query Languages
metadata ◆ author
types
records
annotation
paradigms
dynamic
explicit structure ◆
●
content
◆ “Save
◆ usually
friendly
◆ form/layout, ●
a few large databases
paragraph, toc,
tagging ◆ human
●
●
static
◆ section, ●
Database world
●
metadata ◆ schema
description
5
Christophides Vassilis
Spring 2000
DBMS ANSI/SPARC Architecture (;7(51$/ /(9(/ VIEW 1
VIEW 2
VIEW 3
INTEGRATION &21&(378$/ /(9(/
LOGICAL SCHEMA PHYSICAL SCHEMA Christophides Vassilis
,17(51$/ /(9(/ 6
3 3
Spring 2000
What to do with them Database
Documents ●
editing
●
updating
●
spell-checking
●
cleaning
●
counting words ●
querying
●
composing/transforming
●
retrieving (IR)
●
printing
7
Christophides Vassilis
Spring 2000
Query Languages Document Retrieval Claude Monet DQG San Diego Museum of Art
Christophides Vassilis
Database Querying VHOHFW p IURP Artists a, a.artwork p ZKHUH a.first = “Claude” DQG a.last = “Monet” DQG p.located = “San Diego Museum of Art”
8
4 4
Spring 2000
The Long Road of Document Standards
Christophides Vassilis
Rick Jelliffe 1999
9
Spring 2000
What’s Wrong with HTML ●
If written properly, normal HTML may reflect document presentation, but it cannot adequately represent the semantics & structure of data
$UWLVW1DPH
$UWLIDFW7LWOH
%!021(7&ODXGH%!%5! +D\VWDFNVDW&KDLOO\DW6XQULVH%5! 'DWH %5! 'LPHQVLRQV 2LORQFDQYDV%5! 0DWHULDO [FP[LQ %5! ,PDJH 6DQ'LHJR0XVHXPRI$UW%5! 5HIHUHQFH 0XVHXP 3! ,0*65& ³KWWSDUWFKLYH PPRQHWKD\ULFNVMSJ´!
Christophides Vassilis
10
5 5
Spring 2000
HTML Document Presentation vs. …
11
Christophides Vassilis
Spring 2000
… XML Data Representation ●
A possible XML markup of the same information will retain the structure (and the semantics) of the various data objects
ClaudeMonet Haystacks at Chailly at Sunrise 1865 Oil on canvas 3060 11 7/823 3/4 San Diego Museum of Art Christophides Vassilis
12
6 6
Spring 2000
XML can be Published as normal Web Data
13
Christophides Vassilis
Spring 2000
What is XML? ●
Markup Meta-Language for domain or application specific structured documentation ◆ Mathematical,
●
chemical, musical, publishing, etc.
Developed by the SGML Editorial Board formed under the auspices of the World Wide Web Consortium (W3C) ◆ Founded
in 1996 by Jon Bosac (Sun) and various Web/SGML vendors: Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArbortText, GRIF, SoftQuand
●
Subset of SGML optimized for use in the Inter/Intranet ◆ SGML
is proving difficult to implement for Web/Intranet applications ◆ SGML has been hard to cost-justify to management ●
Opens the way for a new generation of Web applications ◆ Improve precision during searching and retrieval ◆ Enable multiple usage of the same data ◆ Facilitate distributed processing with more versatile
Christophides Vassilis
ways to manipulate data 14
7 7
Spring 2000
Why XML? ●
●
XML provides key features for a new generation of Web applications: ◆ Structuring: unlike HTML it preserves the structure of the data ◆ Extensibility: not a fixed format like HTML but user-oriented tagging ◆ Validation: provides the means to consuming applications to check data for structural validity on importation ◆ Presentation Late Binding: describes data, not visual presentation ◆ Human Readable: similar to HTML ◆ Interchange: good for transmission of data from server to browser, and from application to application, or machine to machine ◆ Open standard: non proprietary format XML becomes an integral part of the Web infrastructure ◆ Microsoft Explorer (V5.0) already offers XML browsing ◆ Ongoing XML implementation by Netscape ◆ Various XML middleware and manipulation tools 15
Christophides Vassilis
Spring 2000
The XML Language Family ●
XML (Extensible Markup Language) ◆A
subset of SGML (ISO 8879) designed for easy implementation
●
XLink (Extensible Linking Language) ◆A
set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI)
●
XSL (Extensible Stylesheet Language) ◆A
standard stylesheet language for structured information derived from DSSSL (ISO/IEC 10179) and key CSS concepts
Christophides Vassilis
∆τ N $ 35 0 30 0 25 0 20 0 15 0
Ta gs A t tribu te
10 0
50
0
HTML 0. 0
TEI L ite 1. 6
IB M ID Doc 4. 0
16
8 8
Spring 2000
Interrelationships Among the Various W3C Efforts
17
Christophides Vassilis
Spring 2000
XML Syntax and Semantics
Christophides Vassilis
18
9 9
Spring 2000
An Example of XML Markup (OHPHQW1DPH (OHPHQW&RQWHQW Claude Monet Haystacks at Chailly at Sunrise $WWULEXWH1865 Oil on canvas $WWULEXWH9DOXH 1DPH 3060 11 7/823 3/4 San Diego Museum of Art (PSW\(OHPHQW Christophides Vassilis
19
Spring 2000
The Logical Tree Structure of XML ARTIST NAME
ARTWORK
FIRST
LAST
ARTIFACT
&ODXGH
021(7
TITLE
DATE
DIM
DIM
H W
H W
+D\VWDFNV
IMAGE KD\ULFNVMSJ
2LORQFDQYDV MATERIAL Christophides Vassilis
6DQ 'LHJR LOCATION 0XV
20
10 10
Spring 2000
XML Document Type Definitions '2&7 (/(0(17DUWLVWQDPHERUQGHDWKDUWZRUNQDWLRQDOLW\"LQIOXHQFHV ! $77/,67DUWLVWRLG,'5(48,5('[POODQJ1072.(1,03/,('! (/(0(17QDPHILUVWODVW ! (/(0(17ILUVW3&'$7$ ! (/(0(17ODVW3&'$7$ ! (/(0(17DUWZRUNDUWLIDFW ! (/(0(17DUWIDFWWLWOHGDWHPDWHULDOGLP ORFDWLRQLPDJH ! (/(0(17WLWOH3&'$7$ ! (/(0(17GLPKHLJKWZLGWK ! $77/,67GLPPHWULFFP_LQ µFP¶! (/(0(17ORFDWLRQ3&'$7$ ! (/(0(17LPDJH(037
●
Mixed models must be optional repeatable OR-groups, with #PCDATA first 23
Christophides Vassilis
Spring 2000
What XML can express? ●
Sequence « , » (/(0(17QDPHILUVWODVW !
●
Choice « | » (/(0(17PHGLDLPDJH_YLGHR !
●
Option ( 1 or 0 ) « ? » (/(0(17DUWLVW«QDWLRQDOLW\"«
●
Repetition (1 or more ) « + » (/(0(17DUWZRUNDUWLIDFW !
●
Option and Repetition ( 0, 1 or more ) « * » (/(0(17DUWIDFWGLP !
Christophides Vassilis
24
12 12
Spring 2000
XML Content Models and Regular Expressions ●
Each element content model is defined by a regular expression ◆ Example: QDPHDGGU HPDLO
●
Each regular expression determines a corresponding finite state
automaton ●
This suggests a simple parsing program
DGGU QDPH ●
HPDLO
Content Models should be defined by unambiguous regular expressions 25
Christophides Vassilis
Spring 2000
XML Regular Expressions: Another Example ●
Adding in the optional greet further complicates things ◆ Example: name,address*,(tel | fax)*,email*
DGGUHVV QDPH
HPDLO
WHO WHO
HPDLO
ID[ ID[
Christophides Vassilis
HPDLO
26
13 13
Spring 2000
Definition of Attribut’s Content (value1 | value2 | .. |valuen) value enumeration CDATA string of valid XML character data
$77/,67GLPPHWULFFP_LQ ! GLPPHWULF µLQ¶! $77/,67UHIKUHI&'$7$! UHIKUHI µKWWS DUWFKLYHPPRQHW[PO¶! $77/,67DUWLVWRLG,'! DUWLVWRLG 0RQHW! $77/,67DUWLIDFWFUHDWRU,'5()! DUWLIDFWFUHDWRU 0RQHW!
ID symbolic identifier of an element IDREF(S) reference (or list of) to element’s identifiers ENTITY(IES) $77/,67LPDJHILOH(17,7
More types (e.g., DATE) may soon be part of the standard 27
Christophides Vassilis
Spring 2000
Attribute Default Values ●
●
●
●
Value µYL¶ ◆ a given value from an enumeration of values ),;(' value ◆ the value is the only possible instance for the attribute 5(48,5(' ◆ the value must be supplied ,03/,(' ◆ the value can be optionally supplied
Christophides Vassilis
28
14 14
Spring 2000
XML Entities ●
Entities allow the definition of short strings to stand for more complex information, which can reside inside or outside the document or its DTD
●
Used for substitutions of data or markup: ◆ DTD
level e.g., markup declaration (Parameter entity)
◆ Document ●
level e.g., data and markup instances (General entity)
Used for references to external data or markup sources: ◆ the
content of the entity can be found using an XML system-specific storage location (Specific entity)
◆ the
content of the entity can be found by mapping a public identifier to a system-specific storage location (Public entity)
29
Christophides Vassilis
Spring 2000
XML Parameter Entities ●
Parameter entities are used for extensible declarations (e.g., macros) of complex content models or attributes in a DTD (17,7
42
21 21
Spring 2000
Mapping Between XML and Objects
43
Christophides Vassilis
Spring 2000
XML vs Relational DBMS SURMHFWV WLWOHEXGJHWPDQDJHG%\
HPSOR\HHV QDPHVVQDJH
'2&7 (/(0(17GE SURMHFWVHPSOR\HHV ! (/(0(17SURMHFWVSURMHFW ! (/(0(17HPSOR\HHVHPSOR\HH ! (/(0(17SURMHFWWLWOHEXGJHWPDQDJHG%\ ! (/(0(17HPSOR\HHQDPHVVQDJH ! @!
Christophides Vassilis
'2&7 (/(0(17GESURMHFW_HPSOR\HH ! (/(0(17SURMHFWWLWOHEXGJHWPDQDJHG%\ ! (/(0(17HPSOR\HHQDPHVVQDJH ! @!
44
22 22
Spring 2000
Recursive DTDs '2&7 (/(0(17JHQHDORJ\SHUVRQ ! (/(0(17SHUVRQQDPH GDWH2I%LUWK SHUVRQPRWKHU SHUVRQIDWKHU ! @! ●
What is the problem with this? 45
Christophides Vassilis
Spring 2000
Recursive DTDs cont’d. '2&7 (/(0(17JHQHDORJ\SHUVRQ ! (/(0(17SHUVRQQDPH GDWH2I%LUWK SHUVRQ"PRWKHU SHUVRQ" !IDWKHU @! ●
What is now the problem with this?
Christophides Vassilis
46
23 23
Spring 2000
Some Things are Hard to Specify ● Each employee element is to contain name, age and ssn elements in some order
(/(0(17HPSOR\HH QDPHDJHVVQ _DJHVVQQDPH _VVQQDPHDJH _« ! ●
Suppose there were many more fields !
47
Christophides Vassilis
Spring 2000
Specifying ,' and ,'5()Attributes '2&7 (/(0(17IDPLO\SHUVRQ ! (/(0(17SHUVRQQDPH ! (/(0(17QDPH3&'$7$ ! $77/,67SHUVRQLG,'5(48,5(' PRWKHU,'5(),03/,(' IDWKHU,'5(),03/,(' FKLOGUHQ,'5()6,03/,('! @!
Christophides Vassilis
48
24 24
Spring 2000
Some Conforming XML data IDPLO\! SHUVRQLG MDQHPRWKHU PDU\IDWKHU MRKQ! QDPH!-DQH'RHQDPH! SHUVRQ! SHUVRQLG MRKQFKLOGUHQ MDQHMDFN! QDPH!-RKQ'RHQDPH! SHUVRQ! SHUVRQLG PDU\FKLOGUHQ MDQHMDFN! QDPH!0DU\'RHQDPH! SHUVRQ! SHUVRQLG MDFNPRWKHU ´PDU\IDWKHU MRKQ! QDPH!-DFN'RHQDPH! SHUVRQ! IDPLO\! Christophides Vassilis
49
Spring 2000
An Alternative XML DTD Specification '2&7 (/(0(17IDPLO\SHUVRQ ! (/(0(17SHUVRQPRWKHU"IDWKHU"FKLOGUHQQDPH ! $77/,67SHUVRQLG,'5(48,5('! (/(0(17QDPH3&'$7$ ! (/(0(17PRWKHU(037
50
25 25
Spring 2000
The Revised XML Data IDPLO\! SHUVRQLG MDQH´! QDPH!-DQH'RHQDPH! PRWKHULGUHI PDU\´!PRWKHU! IDWKHULGUHI MRKQ!IDWKHU! SHUVRQ! SHUVRQLG MRKQ´! QDPH!-RKQ'RHQDPH! FKLOGUHQLGUHIV MDQHMDFN!FKLOGUHQ! SHUVRQ! IDPLO\! 51
Christophides Vassilis
Spring 2000
Mapping between XML and Tables
Christophides Vassilis
52
26 26
Spring 2000
Bluring the Frontiers between Data & Documents
53
Christophides Vassilis
Spring 2000
Towards XML-enabled DBMS ●
●
Xml-enabled database system ◆ Store XML data/documents into the database server ◆ Query and search valid and wellformed XML ◆ Generate XML data from the database server ◆ Add XML capabilities in supporting database facilities XML has the potential to impact four important markets ◆ Web integration ◆ Web publishing ◆ Application integration ◆ Electronic commerce
Christophides Vassilis
DBMS Integrate with other facilities Generate XML Store XML
54
27 27
Spring 2000
Storing XML Data ●
Enhance XML storage facilities in the database: ◆ Utilities to load XML data into the database ◆ Provide more efficient database storage (componentized storage, compression, indexing,…) ◆ XML export tools from the server ◆ Allow server-to-server replication of XML data
Database HTML HTML XML
Database 55
Christophides Vassilis
Spring 2000
Querying and Searching XML Data
●
●
Fine-grained access to XML documents Search XML data efficiently ◆ Special SQL queries over valid + well-formed XML ◆ Content-based indexing (e.g. Text indexes) for searching XML data efficiently ◆ Support for XML query languages (e.g. XQL) on XML data
Christophides Vassilis
Database
HTML HTML XML
Web 56
28 28
Spring 2000
Generating and Manipulating XML ●
●
Generate XML from the database server ◆
Map ODMG, SQL92, SQL3 and PL/SQL datatypes to XML
◆
Provide mappings between java, SQL and XML types
Script XML content from the database ◆ Allow SQL queries to return XML results ◆ Provide embedded XML in stored procedures ◆ Java scripting: support embedded XML in java ◆ Common APIs to access any XML content in databases
Web
HTML XML
Database
57
Christophides Vassilis
Spring 2000
Database X
Database Y XML
XML Total
Christophides Vassilis
XML
XML Sorted
58
29 29
Spring 2000
Epilogue
● ● ● ● ●
1960’s: Data Centric 1970’s: Process Centric 1980’s: Object Oriented 1990’s: Component Based 2000’s: XML?
59
Christophides Vassilis
Spring 2000
Data was our First Focus
5HFRUG/D\RXWV 3ULQWHU/D\RXWV 6\VWHP)ORZ&KDUWV 'HFLVLRQ7DEOHV ¶V 'DWD
%DWFK-REVZHUHD6HULHV RIVPDOO3URJUDPV Christophides Vassilis
60
30 30
Spring 2000
Then we Focused on Logic
*272/HVV3URJUDPPLQJ 6WUXFWXUHG3URJUDPPLQJ 7RS'RZQ'HVLJQ ¶V 'DWD
¶V /RJLF
3URJUDPV%HFDPH 9HU\/DUJH 61
Christophides Vassilis
Spring 2000
Object Oriented Programming Focused on Runtime Behavior &RPPRQ7HUPVIRU $QDO\VLVDQG'HVLJQ 7LJKWO\&RXSOHG&RGH ¶V 'DWD
¶V /RJLF
¶V
&RGH5HXVHZDVWKH+RO\*UDLO 5DUHO\$FKLHYHG Christophides Vassilis
22
62
31 31
Spring 2000
Component Programming Shifted the Focus to Interfaces
&RGH5HXVH ,'(%DVHG&RPSRVLWLRQ /LPLWHG$FFHSWDQFH ¶V 'DWD
¶V
¶V
/RJLF
&RPS
¶V 22
6HULDOL]DWLRQ7LHGWR&RGH
63
Christophides Vassilis
Spring 2000
XML Returns the Focus to Data
;0/:UDSSHUVIRU,QFRPSDWLEOH6\VWHPV ,QGXVWU\6SHFLILF0DUNXS/DQJXDJHV ;0/IRU3HUVLVWHQW'DWDDQG&RPSRVLWLRQ ¶V ;0/
¶V
¶V
/RJLF
&RPS
¶V
;0/(QDEOHV0LGGOHZDUH IRU$SSOLFDWLRQ6SHFLILF'DWD Christophides Vassilis
22
64
32 32
Spring 2000
BIBLIOGRAPHY ●
● ● ●
● ● ● ● ●
●
●
Charles F. Goldfarb, Paul Prescod, Paper Michael, Leventhal, et al. “The XML Handbook”. Printice Hall, 1998. David Megginson. “Structuring XML Documents”. Printice Hall, 1998. Simon St. Laurent. “XML : Extensible Markup Language”. IDG Books, 1998. Rick Jelliffe. “The XML and SGML Cookbook : Recipes for Structured Information”. Printice Hall, 1998. Simon St. Laurent. “Xml : A Primer”. IDG Books, 1998. Steven Holzner. “XML Complete”. McGraw-Hill, 1997. Richard Light, Tim Bray. “Presenting Xml”. Macmillan Publishing, 1997. Bryan Pfaffengerger. “Web Publishing With XML in Six Easy Steps”, 1997. Steven J. DeRose. “The SGML FAQ Book : Understanding the Foundation of HTML and XML”. Kluwer Academic Publishers, 1997. Sean McGrath. “XML by Example: Building E-Commerce Applications”. Printice Hall, 1998 Charles F. Goldfarb, Steve Pepper, Chet Ensign. “SGML Buyer’s Guide : A Unique Guide to Determining Your Requirements and Choosing the Right SGML and XML Products and Services”. Printice Hall, 1998. 65
Christophides Vassilis
33 33