The eXtensible Markup Language: An Introduction to XML Documents

0 downloads 0 Views 492KB Size Report
... or more times by reference to their names ... element content can consist of a free ... reference (or list of) to entity names ..... q Steven Holzner. “XML Complete”.
Missing:
Spring 2000

The eXtensible Markup Language: An Introduction to XML Documents & Databases

1

Christophides Vassilis

Spring 2000

Preliminary Issues

Christophides Vassilis

2

1 1

Spring 2000

What is a document?







Content: the components (words, images etc). which make up a document Structure: the organization and inter-relationship of the components Presentation: how a document looks and what processes are applied to it

3

Christophides Vassilis

Spring 2000

Separating these things means... ●







The content can be re-used ◆ for printing ◆ for querying ◆ for exchanging The structure can be formally validated The presentation can be customized for ◆ different media ◆ different audiences … in short, the information can be uncoupled from its processing

Christophides Vassilis

4

2 2

Spring 2000

Documents vs Databases Document world ●

plenty of small documents ◆ usually



implicit structure





◆ machine ●

◆ data, ●

methods

paradigms

as”, WYSIWYG name, date, subject

friendly

content

◆ Data

Independence, Transaction Management, Query Languages

metadata ◆ author

types

records

annotation

paradigms

dynamic

explicit structure ◆



content

◆ “Save

◆ usually

friendly

◆ form/layout, ●

a few large databases

paragraph, toc,

tagging ◆ human





static

◆ section, ●

Database world



metadata ◆ schema

description

5

Christophides Vassilis

Spring 2000

DBMS ANSI/SPARC Architecture (;7(51$/ /(9(/ VIEW 1

VIEW 2

VIEW 3

INTEGRATION &21&(378$/ /(9(/

LOGICAL SCHEMA PHYSICAL SCHEMA Christophides Vassilis

,17(51$/ /(9(/ 6

3 3

Spring 2000

What to do with them Database

Documents ●

editing



updating



spell-checking



cleaning



counting words ●

querying



composing/transforming



retrieving (IR)



printing

7

Christophides Vassilis

Spring 2000

Query Languages Document Retrieval Claude Monet DQG San Diego Museum of Art

Christophides Vassilis

Database Querying VHOHFW p IURP Artists a, a.artwork p ZKHUH a.first = “Claude” DQG a.last = “Monet” DQG p.located = “San Diego Museum of Art”

8

4 4

Spring 2000

The Long Road of Document Standards

Christophides Vassilis

 Rick Jelliffe 1999

9

Spring 2000

What’s Wrong with HTML ●

If written properly, normal HTML may reflect document presentation, but it cannot adequately represent the semantics & structure of data

$UWLVW1DPH

$UWLIDFW7LWOH

%!021(7&ODXGH%!%5! +D\VWDFNVDW&KDLOO\DW6XQULVH%5! 'DWH %5! 'LPHQVLRQV 2LORQFDQYDV%5! 0DWHULDO [FP [LQ %5! ,PDJH 6DQ'LHJR0XVHXPRI$UW%5! 5HIHUHQFH 0XVHXP 3! ,0*65& ³KWWSDUWFKLYH PPRQHWKD\ULFNVMSJ´!

Christophides Vassilis

10

5 5

Spring 2000

HTML Document Presentation vs. …

11

Christophides Vassilis

Spring 2000

… XML Data Representation ●

A possible XML markup of the same information will retain the structure (and the semantics) of the various data objects

ClaudeMonet Haystacks at Chailly at Sunrise 1865 Oil on canvas 3060 11 7/823 3/4 San Diego Museum of Art Christophides Vassilis

12

6 6

Spring 2000

XML can be Published as normal Web Data

13

Christophides Vassilis

Spring 2000

What is XML? ●

Markup Meta-Language for domain or application specific structured documentation ◆ Mathematical,



chemical, musical, publishing, etc.

Developed by the SGML Editorial Board formed under the auspices of the World Wide Web Consortium (W3C) ◆ Founded

in 1996 by Jon Bosac (Sun) and various Web/SGML vendors: Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArbortText, GRIF, SoftQuand



Subset of SGML optimized for use in the Inter/Intranet ◆ SGML

is proving difficult to implement for Web/Intranet applications ◆ SGML has been hard to cost-justify to management ●

Opens the way for a new generation of Web applications ◆ Improve precision during searching and retrieval ◆ Enable multiple usage of the same data ◆ Facilitate distributed processing with more versatile

Christophides Vassilis

ways to manipulate data 14

7 7

Spring 2000

Why XML? ●



XML provides key features for a new generation of Web applications: ◆ Structuring: unlike HTML it preserves the structure of the data ◆ Extensibility: not a fixed format like HTML but user-oriented tagging ◆ Validation: provides the means to consuming applications to check data for structural validity on importation ◆ Presentation Late Binding: describes data, not visual presentation ◆ Human Readable: similar to HTML ◆ Interchange: good for transmission of data from server to browser, and from application to application, or machine to machine ◆ Open standard: non proprietary format XML becomes an integral part of the Web infrastructure ◆ Microsoft Explorer (V5.0) already offers XML browsing ◆ Ongoing XML implementation by Netscape ◆ Various XML middleware and manipulation tools 15

Christophides Vassilis

Spring 2000

The XML Language Family ●

XML (Extensible Markup Language) ◆A

subset of SGML (ISO 8879) designed for easy implementation



XLink (Extensible Linking Language) ◆A

set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI)



XSL (Extensible Stylesheet Language) ◆A

standard stylesheet language for structured information derived from DSSSL (ISO/IEC 10179) and key CSS concepts

Christophides Vassilis

∆τ N $ 35 0 30 0 25 0 20 0 15 0

Ta gs A t tribu te

10 0

50

0

HTML 0. 0

TEI L ite 1. 6

IB M ID Doc 4. 0

16

8 8

Spring 2000

Interrelationships Among the Various W3C Efforts

17

Christophides Vassilis

Spring 2000

XML Syntax and Semantics

Christophides Vassilis

18

9 9

Spring 2000

An Example of XML Markup (OHPHQW1DPH (OHPHQW&RQWHQW Claude Monet Haystacks at Chailly at Sunrise $WWULEXWH1865 Oil on canvas $WWULEXWH9DOXH 1DPH 3060 11 7/823 3/4 San Diego Museum of Art (PSW\(OHPHQW Christophides Vassilis

19

Spring 2000

The Logical Tree Structure of XML ARTIST NAME

ARTWORK

FIRST

LAST

ARTIFACT

&ODXGH

021(7

TITLE

DATE

DIM

DIM

H W

H W

+D\VWDFNV 

IMAGE KD\ULFNVMSJ

      2LORQFDQYDV MATERIAL Christophides Vassilis

6DQ 'LHJR LOCATION 0XV

20

10 10

Spring 2000

XML Document Type Definitions '2&7 (/(0(17DUWLVW QDPHERUQGHDWKDUWZRUNQDWLRQDOLW\"LQIOXHQFHV ! $77/,67DUWLVWRLG,'5(48,5('[POODQJ1072.(1,03/,('! (/(0(17QDPH ILUVWODVW ! (/(0(17ILUVW 3&'$7$ ! (/(0(17ODVW 3&'$7$ ! (/(0(17DUWZRUN DUWLIDFW ! (/(0(17DUWIDFW WLWOHGDWHPDWHULDOGLP ORFDWLRQLPDJH ! (/(0(17WLWOH 3&'$7$ ! (/(0(17GLP KHLJKWZLGWK ! $77/,67GLPPHWULF FP_LQ µFP¶! (/(0(17ORFDWLRQ 3&'$7$ ! (/(0(17LPDJH(037


Mixed models must be optional repeatable OR-groups, with #PCDATA first 23

Christophides Vassilis

Spring 2000

What XML can express? ●

Sequence « , » (/(0(17QDPH ILUVWODVW !



Choice « | » (/(0(17PHGLD LPDJH_YLGHR !



Option ( 1 or 0 ) « ? » (/(0(17DUWLVW «QDWLRQDOLW\"«



Repetition (1 or more ) « + » (/(0(17DUWZRUN DUWLIDFW !



Option and Repetition ( 0, 1 or more ) « * » (/(0(17DUWIDFW GLP  !

Christophides Vassilis

24

12 12

Spring 2000

XML Content Models and Regular Expressions ●

Each element content model is defined by a regular expression ◆ Example: QDPHDGGU HPDLO



Each regular expression determines a corresponding finite state

automaton ●

This suggests a simple parsing program

DGGU QDPH ●

HPDLO

Content Models should be defined by unambiguous regular expressions 25

Christophides Vassilis

Spring 2000

XML Regular Expressions: Another Example ●

Adding in the optional greet further complicates things ◆ Example: name,address*,(tel | fax)*,email*

DGGUHVV QDPH

HPDLO

WHO WHO

HPDLO

ID[ ID[

Christophides Vassilis

HPDLO

26

13 13

Spring 2000

Definition of Attribut’s Content (value1 | value2 | .. |valuen) value enumeration CDATA string of valid XML character data

$77/,67GLPPHWULF FP_LQ ! GLPPHWULF µLQ¶! $77/,67UHIKUHI&'$7$! UHIKUHI µKWWS DUWFKLYHPPRQHW[PO¶! $77/,67DUWLVWRLG,'! DUWLVWRLG 0RQHW! $77/,67DUWLIDFWFUHDWRU,'5()! DUWLIDFWFUHDWRU 0RQHW!

ID symbolic identifier of an element IDREF(S) reference (or list of) to element’s identifiers ENTITY(IES) $77/,67LPDJHILOH(17,7
More types (e.g., DATE) may soon be part of the standard 27

Christophides Vassilis

Spring 2000

Attribute Default Values ●







Value µYL¶ ◆ a given value from an enumeration of values ),;(' value ◆ the value is the only possible instance for the attribute 5(48,5(' ◆ the value must be supplied ,03/,(' ◆ the value can be optionally supplied

Christophides Vassilis

28

14 14

Spring 2000

XML Entities ●

Entities allow the definition of short strings to stand for more complex information, which can reside inside or outside the document or its DTD



Used for substitutions of data or markup: ◆ DTD

level e.g., markup declaration (Parameter entity)

◆ Document ●

level e.g., data and markup instances (General entity)

Used for references to external data or markup sources: ◆ the

content of the entity can be found using an XML system-specific storage location (Specific entity)

◆ the

content of the entity can be found by mapping a public identifier to a system-specific storage location (Public entity)

29

Christophides Vassilis

Spring 2000

XML Parameter Entities ●

Parameter entities are used for extensible declarations (e.g., macros) of complex content models or attributes in a DTD (17,7 42

21 21

Spring 2000

Mapping Between XML and Objects

43

Christophides Vassilis

Spring 2000

XML vs Relational DBMS SURMHFWV WLWOHEXGJHWPDQDJHG%\

HPSOR\HHV QDPHVVQDJH

'2&7 (/(0(17GE  SURMHFWVHPSOR\HHV ! (/(0(17SURMHFWV SURMHFW ! (/(0(17HPSOR\HHV HPSOR\HH ! (/(0(17SURMHFW WLWOHEXGJHWPDQDJHG%\ ! (/(0(17HPSOR\HH QDPHVVQDJH !  @!

Christophides Vassilis

'2&7 (/(0(17GE SURMHFW_HPSOR\HH ! (/(0(17SURMHFW WLWOHEXGJHWPDQDJHG%\ ! (/(0(17HPSOR\HH QDPHVVQDJH !  @!

44

22 22

Spring 2000

Recursive DTDs '2&7 (/(0(17JHQHDORJ\ SHUVRQ ! (/(0(17SHUVRQ QDPH GDWH2I%LUWK SHUVRQPRWKHU SHUVRQIDWKHU !  @! ●

What is the problem with this? 45

Christophides Vassilis

Spring 2000

Recursive DTDs cont’d. '2&7 (/(0(17JHQHDORJ\ SHUVRQ ! (/(0(17SHUVRQ QDPH GDWH2I%LUWK SHUVRQ"PRWKHU SHUVRQ" !IDWKHU  @! ●

What is now the problem with this?

Christophides Vassilis

46

23 23

Spring 2000

Some Things are Hard to Specify ● Each employee element is to contain name, age and ssn elements in some order

(/(0(17HPSOR\HH   QDPHDJHVVQ _ DJHVVQQDPH _ VVQQDPHDJH _« ! ●

Suppose there were many more fields !

47

Christophides Vassilis

Spring 2000

Specifying ,' and ,'5()Attributes '2&7 (/(0(17IDPLO\ SHUVRQ ! (/(0(17SHUVRQ QDPH ! (/(0(17QDPH 3&'$7$ ! $77/,67SHUVRQLG,'5(48,5(' PRWKHU,'5(),03/,(' IDWKHU,'5(),03/,(' FKLOGUHQ,'5()6,03/,('! @!

Christophides Vassilis

48

24 24

Spring 2000

Some Conforming XML data IDPLO\!  SHUVRQLG MDQHPRWKHU PDU\IDWKHU MRKQ!  QDPH!-DQH'RHQDPH!  SHUVRQ!  SHUVRQLG MRKQFKLOGUHQ MDQHMDFN!  QDPH!-RKQ'RHQDPH!  SHUVRQ!  SHUVRQLG PDU\FKLOGUHQ MDQHMDFN!  QDPH!0DU\'RHQDPH!  SHUVRQ!  SHUVRQLG MDFNPRWKHU ´PDU\IDWKHU MRKQ!  QDPH!-DFN'RHQDPH!  SHUVRQ! IDPLO\! Christophides Vassilis

49

Spring 2000

An Alternative XML DTD Specification '2&7 (/(0(17IDPLO\ SHUVRQ ! (/(0(17SHUVRQ PRWKHU"IDWKHU"FKLOGUHQQDPH ! $77/,67SHUVRQLG,'5(48,5('! (/(0(17QDPH 3&'$7$ ! (/(0(17PRWKHU(037
50

25 25

Spring 2000

The Revised XML Data IDPLO\!  SHUVRQLG MDQH´! QDPH!-DQH'RHQDPH!  PRWKHULGUHI PDU\´!PRWKHU!  IDWKHULGUHI MRKQ!IDWKHU! SHUVRQ!  SHUVRQLG MRKQ´! QDPH!-RKQ'RHQDPH! FKLOGUHQLGUHIV MDQHMDFN!FKLOGUHQ!  SHUVRQ!  IDPLO\! 51

Christophides Vassilis

Spring 2000

Mapping between XML and Tables

Christophides Vassilis

52

26 26

Spring 2000

Bluring the Frontiers between Data & Documents

53

Christophides Vassilis

Spring 2000

Towards XML-enabled DBMS ●



Xml-enabled database system ◆ Store XML data/documents into the database server ◆ Query and search valid and wellformed XML ◆ Generate XML data from the database server ◆ Add XML capabilities in supporting database facilities XML has the potential to impact four important markets ◆ Web integration ◆ Web publishing ◆ Application integration ◆ Electronic commerce

Christophides Vassilis

DBMS Integrate with other facilities Generate XML Store XML

54

27 27

Spring 2000

Storing XML Data ●

Enhance XML storage facilities in the database: ◆ Utilities to load XML data into the database ◆ Provide more efficient database storage (componentized storage, compression, indexing,…) ◆ XML export tools from the server ◆ Allow server-to-server replication of XML data

Database HTML HTML XML

Database 55

Christophides Vassilis

Spring 2000

Querying and Searching XML Data





Fine-grained access to XML documents Search XML data efficiently ◆ Special SQL queries over valid + well-formed XML ◆ Content-based indexing (e.g. Text indexes) for searching XML data efficiently ◆ Support for XML query languages (e.g. XQL) on XML data

Christophides Vassilis

Database

HTML HTML XML

Web 56

28 28

Spring 2000

Generating and Manipulating XML ●



Generate XML from the database server ◆

Map ODMG, SQL92, SQL3 and PL/SQL datatypes to XML



Provide mappings between java, SQL and XML types

Script XML content from the database ◆ Allow SQL queries to return XML results ◆ Provide embedded XML in stored procedures ◆ Java scripting: support embedded XML in java ◆ Common APIs to access any XML content in databases

Web

HTML XML

Database

57

Christophides Vassilis

Spring 2000

Database X

Database Y XML

XML Total

Christophides Vassilis

XML

XML Sorted

58

29 29

Spring 2000

Epilogue

● ● ● ● ●

1960’s: Data Centric 1970’s: Process Centric 1980’s: Object Oriented 1990’s: Component Based 2000’s: XML?

59

Christophides Vassilis

Spring 2000

Data was our First Focus

‡5HFRUG/D\RXWV ‡3ULQWHU/D\RXWV ‡6\VWHP)ORZ&KDUWV ‡'HFLVLRQ7DEOHV ¶V 'DWD

%DWFK-REVZHUHD6HULHV RIVPDOO3URJUDPV Christophides Vassilis

60

30 30

Spring 2000

Then we Focused on Logic

‡*272/HVV3URJUDPPLQJ ‡6WUXFWXUHG3URJUDPPLQJ ‡7RS'RZQ'HVLJQ ¶V 'DWD

¶V /RJLF

3URJUDPV%HFDPH 9HU\/DUJH 61

Christophides Vassilis

Spring 2000

Object Oriented Programming Focused on Runtime Behavior ‡&RPPRQ7HUPVIRU $QDO\VLVDQG'HVLJQ ‡7LJKWO\&RXSOHG&RGH ¶V 'DWD

¶V /RJLF

¶V

&RGH5HXVHZDVWKH+RO\*UDLO 5DUHO\$FKLHYHG Christophides Vassilis

22

62

31 31

Spring 2000

Component Programming Shifted the Focus to Interfaces

‡&RGH5HXVH ‡,'(%DVHG&RPSRVLWLRQ ‡/LPLWHG$FFHSWDQFH ¶V 'DWD

¶V

¶V

/RJLF

&RPS

¶V 22

6HULDOL]DWLRQ7LHGWR&RGH

63

Christophides Vassilis

Spring 2000

XML Returns the Focus to Data

‡;0/:UDSSHUVIRU,QFRPSDWLEOH6\VWHPV ‡,QGXVWU\6SHFLILF0DUNXS/DQJXDJHV ‡;0/IRU3HUVLVWHQW'DWDDQG&RPSRVLWLRQ ¶V ;0/

¶V

¶V

/RJLF

&RPS

¶V

;0/(QDEOHV0LGGOHZDUH IRU$SSOLFDWLRQ6SHFLILF'DWD Christophides Vassilis

22

64

32 32

Spring 2000

BIBLIOGRAPHY ●

● ● ●

● ● ● ● ●





Charles F. Goldfarb, Paul Prescod, Paper Michael, Leventhal, et al. “The XML Handbook”. Printice Hall, 1998. David Megginson. “Structuring XML Documents”. Printice Hall, 1998. Simon St. Laurent. “XML : Extensible Markup Language”. IDG Books, 1998. Rick Jelliffe. “The XML and SGML Cookbook : Recipes for Structured Information”. Printice Hall, 1998. Simon St. Laurent. “Xml : A Primer”. IDG Books, 1998. Steven Holzner. “XML Complete”. McGraw-Hill, 1997. Richard Light, Tim Bray. “Presenting Xml”. Macmillan Publishing, 1997. Bryan Pfaffengerger. “Web Publishing With XML in Six Easy Steps”, 1997. Steven J. DeRose. “The SGML FAQ Book : Understanding the Foundation of HTML and XML”. Kluwer Academic Publishers, 1997. Sean McGrath. “XML by Example: Building E-Commerce Applications”. Printice Hall, 1998 Charles F. Goldfarb, Steve Pepper, Chet Ensign. “SGML Buyer’s Guide : A Unique Guide to Determining Your Requirements and Choosing the Right SGML and XML Products and Services”. Printice Hall, 1998. 65

Christophides Vassilis

33 33

Suggest Documents