Session 10: XML Information Retrieval - New York University

XML for Java Developers G22.3033-002

Session 10 - Main Theme
XML Information Retrieval (Part II) and XML-Based Frameworks (Part I)

Dr. Jean-Claude Franchitti

New York University Computer Science Department Courant Institute of Mathematical Sciences


Agenda
• Summary of Previous Session
• XML Object Persistence
• Advanced XQuery Concepts
• Presentation Oriented Publishing (POP) Frameworks
• Web Services
• Assignment 5a-c (due next week)


Summary of Previous Session
• Summary of Previous Session
• Applications of XML to Database Technology
  – XML Query Languages
  – XPath
  – XML Queries
  – XQuery: A Query Language for XML
  – XML Query Engines
  – XML Registries API
• Web Services Support in J2EE / .Net
• Assignment 5a+5b

Part I: XML Object Persistence
Also see: http://www.rpbourret.com/xml/XMLAndDatabases.htm

Object Persistence
• Need for objects to exist beyond an application’s lifetime
(Figure: objects passing between the Application and the Data Store)
• Limitations of the OOP / RDBMS direct coding approach
  – Difficult to handle data structure changes
  – Difficult to reuse persistence logic
• Solution is to use a “Persistence Framework”

Direct JDBC
• JDBC is NOT a persistence framework
  – JDBC is a database connection utility
  – OK for “window on data” style applications
  – OK when business logic is entrenched in the database
(Figure: the Application sends SQL through JDBC and receives rows)

XML Object Persistence (“SOAP Competitors” in http://www.cis.ohio-state.edu/rsrg/ase/colloquia/craig-swartout-xml.pdf)
• Started as SODL and XMOP
  – Simple Object Definition Language
  – XML Metadata Object Persistence
• XML and JavaBeans interoperability (e.g., BML, Coins, etc.)
• XML and EJB integration
• XML serialization for Java (e.g., Koala, etc.)
• SOAP - XML-RPC protocol
  – Ability to invoke (persistent) objects’ methods remotely

Transferring Data Between Databases and Java Objects via XML
• Example project: XSU (www.cis.ksu.edu/~htu3434/presentation.ppt)
  – Transform data retrieved from object-relational database tables into XML
  – Extract data from an XML document and insert it into the appropriate columns/attributes of a table
  – Extract data from an XML document and use it to update or delete values in the appropriate columns/attributes
• Implementation
  – DB: Oracle 9i Release 1 Server
  – XSU Utility/Tool + Castor
  – Alternatives to Castor: XML DB (Oracle 9i Release 2), http://abra.sourceforge.net/doc/javadoc/org/ephman/abra/tools/Marshaller.html
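XSU's first task, turning table rows into XML, can be approximated with nothing but the JDK's DOM and transformer APIs. This is a minimal sketch, not XSU's API: the rows are assumed to be in-memory maps of column names to values (a real version would read them from a JDBC `ResultSet`), and the `ROWSET` / `ROW num="..."` element names are modeled on XSU's canonical output format as an assumption.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RowsToXml {

    // Builds <ROWSET><ROW num="1"><COLUMN>value</COLUMN>...</ROW>...</ROWSET>.
    // Column names must be legal XML element names; the serializer escapes values.
    public static String toXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element rowset = doc.createElement("ROWSET");
            doc.appendChild(rowset);
            int num = 1;
            for (Map<String, String> row : rows) {
                Element rowElem = doc.createElement("ROW");
                rowElem.setAttribute("num", Integer.toString(num++));
                for (Map.Entry<String, String> col : row.entrySet()) {
                    Element colElem = doc.createElement(col.getKey());
                    colElem.setTextContent(col.getValue());
                    rowElem.appendChild(colElem);
                }
                rowset.appendChild(rowElem);
            }
            // Serialize the DOM tree without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        // One row as it might come back from an EMP table; LinkedHashMap keeps column order.
        Map<String, String> emp = new LinkedHashMap<>();
        emp.put("EMPNO", "7369");
        emp.put("ENAME", "SMITH");
        System.out.println(toXml(List.of(emp)));
    }
}
```

The reverse direction (XML back into table columns) would walk the same `ROW` elements and bind each child's text to an SQL parameter.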

XSU Architecture (www.cis.ksu.edu/~htu3434/presentation.ppt)
(Figure components: Oracle 9i sample data, XSU, XML document, Castor XML, Java classes, objects)

Persistence Broker Design Pattern (www.lap.ttu.ee/erki/failid/konspekt/bakalaureusetoo/defence.ppt)
• Enables full separation of business and persistence logic
• Easily customizable for different applications
• Enables added functionality (e.g., object caching)
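A persistence broker's responsibilities (mapping classes to tables, reading fields reflectively, converting values to column literals, caching instances) fit in a short Java sketch. All names here are hypothetical: `PersistenceBrokerSketch`, the convention that a field named `id` is the identity, and the SQL it produces are assumptions, and the broker records SQL strings rather than executing them over JDBC so that it runs without a database.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PersistenceBrokerSketch {

    // Hypothetical business class: it contains no persistence code at all.
    public static class Person {
        long id;
        String firstName;
        String lastName;

        public Person(long id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    private final Map<Class<?>, String> tableFor = new HashMap<>(); // persistence mapping
    private final Map<String, Object> cache = new HashMap<>();      // "TABLE:id" -> instance
    private final List<String> issuedSql = new ArrayList<>();        // SQL that JDBC would run

    public void map(Class<?> type, String table) {
        tableFor.put(type, table);
    }

    // Stores an object: reflection reads its fields, conversion turns them into SQL literals.
    public void store(Object obj) {
        String table = tableFor.get(obj.getClass());
        if (table == null) {
            throw new IllegalArgumentException("No table mapped for " + obj.getClass().getName());
        }
        Field[] fields = obj.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // deterministic column order
        List<String> columns = new ArrayList<>();
        List<String> values = new ArrayList<>();
        Object id = null;
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (f.getName().equals("id")) id = value; // assumed identity field
            columns.add(f.getName());
            values.add(toColumnValue(value));
        }
        issuedSql.add("INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + String.join(", ", values) + ")");
        cache.put(table + ":" + id, obj);
    }

    // Looks only in the cache; a real broker would fall back to a SELECT on a miss.
    public <T> T lookup(Class<T> type, Object id) {
        return type.cast(cache.get(tableFor.get(type) + ":" + id));
    }

    public List<String> issuedSql() {
        return issuedSql;
    }

    // Conversion: field value -> SQL literal (naive quoting, for illustration only).
    private static String toColumnValue(Object v) {
        if (v == null) return "NULL";
        if (v instanceof Number) return v.toString();
        return "'" + v.toString().replace("'", "''") + "'";
    }
}
```

The point of the design is that `Person` carries no persistence code; moving to another database would only touch the broker's mapping and conversion parts.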

Persistence Broker Architecture
(Figure: a simple architecture with layers User Interface Classes, Business classes, and Database, compared with an architecture using a Persistence Broker, with layers User Interface Classes, Business classes, Persistence Broker with a Cache, and Database)

(Figure: Persistence Broker internals)
• Application: issues queries for persistent objects and receives persistent objects back
• Cache: caches persistent objects and looks up cached instances by identity
• Persistence Broker: delegates queries from the application to the lower level for retrieving, storing, and deleting objects
• Persistence Mapping: contains mappings between classes and tables
• Reflection: accesses object fields dynamically, supplying the field values of objects to store or delete
• Query Mechanism: handles executing database-specific SQL statements (SELECT and ACTION queries)
• Conversion: converts object field values to table column values and vice versa
• Database

Object Persistence Layer in J2EE (http://www.jfs2003.de/folien/A6_Clarke_Oracle.PPT)
• Abstracts persistence details from the application layer
(Figure: J2EE & Web Services code works with objects through the Persistence Layer, which provides object-level querying and creation, object creation and updates through an object-level API, and returns results as objects; the layer itself uses SQL or database-specific calls through JDBC, where results are returned as raw data rows)

TopLink Runtime Architecture
(Figure components: Application Server; application logic in JSP, Servlet, Struts, etc.; J2EE container with EJB entity beans (CMP/BMP) and JTA; TopLink session with query, cache, transaction (TX), and data access; TopLink mappings; entities as Java objects; JDBC; data source)

TopLink & XML / J2EE
• XML documents can be represented at different levels of abstraction in J2EE applications:
  – Parsed document (DOM, SAX, ...): parser
  – Unmanaged Java objects from a non-transactional data source: data converter
  – Managed Java objects from a transactional data source: persistence manager
• Developers do not need to work with low-level XML documents or manually code persistence manager functionality
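The gap between the first two levels can be shown with the JDK alone: a parser produces a DOM tree, and a small hand-written data converter copies its values into a plain Java object. The `XmlLevels` and `Book` names and the element names are hypothetical; the third level (a managed object with transactions and identity, as a persistence manager such as TopLink would provide) is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XmlLevels {

    // Level 2: an unmanaged Java object; nothing tracks or persists changes to it.
    public static final class Book {
        public final String title;
        public final String author;
        public final double price;

        public Book(String title, String author, double price) {
            this.title = title;
            this.author = author;
            this.price = price;
        }
    }

    // Level 1: the parser turns XML text into a DOM tree.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // O-X data converter: copies element values from the DOM into a Book.
    public static Book toBook(Document doc) {
        Element root = doc.getDocumentElement();
        return new Book(text(root, "title"), text(root, "author"),
                Double.parseDouble(text(root, "price")));
    }

    private static String text(Element parent, String name) {
        Node n = parent.getElementsByTagName(name).item(0);
        if (n == null) throw new IllegalArgumentException("Missing <" + name + ">");
        return n.getTextContent().trim();
    }
}
```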

Three Levels of XML Representation
(Figure: an XML Parser produces a DOM, an O-X Data Converter produces an Unmanaged Object, and a Persistence Manager produces a Managed Object/EJB; XML documents may come from a file, Web service, BPM, JDBC, J2C, XDB, EIS, or JMS)

Sun JDO
• Java Data Objects (JDO)
• First standardized, completely object-oriented approach to object persistence
• Developed as Java Specification Request 12 (JSR 12) under the Java Community Process
• Application programmers can use JDO to store their Java domain model instances directly in the persistent store
• JDO fills a large gap in the area of database programming

Benefits of Using JDO
• Transparent persistence
• Database independence
• Portability
• Ease of use
• High performance
• Integration with EJB

JDO Architecture


Software Stacks Supported (layers listed top to bottom)
• MyApplication; Persistence Manager (XJDO Implementation); Java-based Data Store Access (eXist XML DB Driver Class Impl); Data Store (eXist Native XML DB)
• MyApplication; Persistence Manager (LiDO JDO Implementation); Java-based Data Store Access (mysql-connector-java2.0.14); Data Store (MySQL)
• MyApplication; Persistence Manager (LiDO JDO Implementation); Java-based Data Store Access (LIBeLIS file db connector); Data Store (LIBeLIS' FileDB)

eXist Native XML DB (http://exist.sourceforge.net/)
• eXist is an open source, native XML DB
• eXist features
  – Completely written in Java
  – Lightweight
  – Efficient
  – Index-based XPath query processing
  – Extensions for keyword search
  – Tight integration with existing XML development tools
  – Can be deployed in two ways:
    • As a stand-alone server process or inside a servlet engine
    • Directly embedded into an application
• Other popular native XML DB: Apache Xindice (http://xml.apache.org/xindice/)

Conversion Process for Retrieved Objects (http://students.depaul.edu/~lahrens/se690/deliver/XJDO-Final%20Pres.ppt)
• XML document stored in the XML database (eXist XML Database)
• Document object when retrieved from the XML database as DOM: an org.w3c.dom.Document whose element name="xobj" carries the attribute package="xjdo.test.beans"
• Object instance returned to the application developer: xjdo.test.beans.Person with firstName="Luann", lastName="Ahrens", address="555 NoWhere St."
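The DOM-to-object step can be sketched with reflection. The slide shows only an `xobj` element with a `package` attribute, so the exact XJDO layout is not known here; this sketch assumes a hypothetical `class` attribute holding the fully qualified class name and one child element per public `String` field. `XjdoConverter` and its nested `Person` are stand-ins for the real XJDO classes and `xjdo.test.beans.Person`.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XjdoConverter {

    // Hypothetical bean standing in for xjdo.test.beans.Person.
    public static class Person {
        public String firstName;
        public String lastName;
        public String address;

        public Person() {}
    }

    // Converts <xobj class="..."><field>value</field>...</xobj> into a new object instance.
    public static Object toObject(String xml) {
        try {
            Element xobj = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            if (!xobj.getTagName().equals("xobj")) {
                throw new IllegalArgumentException("Expected <xobj>, got <" + xobj.getTagName() + ">");
            }
            // The class name stored in the document drives instantiation.
            Class<?> type = Class.forName(xobj.getAttribute("class"));
            Object instance = type.getDeclaredConstructor().newInstance();
            NodeList children = xobj.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
                // Each child element name is assumed to match a public String field.
                Field f = type.getField(child.getNodeName());
                f.set(instance, child.getTextContent());
            }
            return instance;
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot convert xobj document", e);
        }
    }
}
```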


XJDO and XSLT Compatibility
• Data in the XML database can be used for other purposes without going through the XJDO application
• Data can be accessed by applications other than the Java applications that use XJDO
• Using XSLT and XPath, the XJDO XML format can be transformed into a different XML format
(Figure: XSLT transformation of the XJDO record into a document containing “Luann Ahrens 555 NoWhere St.”)
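The XSLT route can be tried with the JDK's built-in `javax.xml.transform` processor. The stylesheet embedded below is a hypothetical example that assumes an `xobj` record with `firstName`, `lastName`, and `address` children and reshapes it into a `contact` element; it is not the stylesheet from the XJDO project.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XjdoToContact {

    // Hypothetical stylesheet: one template turns the xobj record into a <contact> element.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/xobj\">"
      + "<contact name=\"{firstName} {lastName}\">"
      + "<street><xsl:value-of select=\"address\"/></street>"
      + "</contact>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Because the data is plain XML, any XSLT processor (not only a Java one) could apply the same stylesheet.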

Part II: Advanced XQuery Concepts

XQuery Today
• XQuery: The W3C XML Query Language
• DOM+XPath+XSLT applications can now be implemented in just one language: XQuery
• XQuery is expressive, concise, and easy to learn
• XQuery is implementable and optimizable
• XQuery supports integration of data from multiple sources
• Several implementations of XQuery are currently available
• XQuery provides preliminary support for update

XQuery Design Goals
• Language expressive power
  – Functionality derived from XML-QL, XQL, SQL, OQL
  – Applicable to the many different types of XML data
  – Implementation based on published use cases
• XQuery engine implementations
  – Can be implemented on top of traditional databases, XML repositories, XML programming libraries, etc.
  – Queries may combine data from many sources
• Minimalist design
  – Small, easy to understand, clean semantics

Querying Heterogeneous Data
(Figure: data reached through DOM, SAX, DBMS XML interfaces, Java, and COBOL is exposed through the W3C XML Query Data Model, over which XQuery operates)

XQuery Expressions
• XQuery is a functional language
  – Each query is an expression
  – Expressions can be easily combined
• Structure of a query
  – Namespace declarations (optional)
  – Function definitions (optional)
  – Query expression: may include many expressions

XQuery Expressions
• Path expressions: /a//b[c = 5]
• FLWR expressions: FOR ... LET ... WHERE ... RETURN
• Element constructors: ...
• Variables and constants: $x, 5
• Operators and function calls: x + y, -z, foo(x, y)
• Conditional expressions: IF ... THEN ... ELSE
• Quantifiers: EVERY var IN expr SATISFIES expr
• Sorted expressions: expr SORTBY (expr ASCENDING, ...)
• Update expressions: INSERT, REPLACE, DELETE
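Because XQuery path expressions follow XPath, the sample path `/a//b[c = 5]` can be evaluated from Java with the JDK's XPath 1.0 engine, no XQuery processor required. The document and the `id` attribute used to identify matches are made up for illustration, and the helper assumes the path selects elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PathExpressionDemo {

    // Returns the id attribute of every element selected by the XPath expression, in document order.
    public static List<String> selectIds(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad path or document", e);
        }
    }

    public static void main(String[] args) {
        // b elements at any depth under the root a, kept only when a c child equals 5.
        String xml = "<a><b id='1'><c>5</c></b><x><b id='2'><c>5</c></b></x><b id='3'><c>7</c></b></a>";
        System.out.println(selectIds(xml, "/a//b[c = 5]")); // prints [1, 2]
    }
}
```

The `//` step is what picks up the `b` nested under `x`; with `/a/b[c = 5]` only the element with id 1 would match.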

Sample Document
(Sample book entry: title XML and Java, author Maruyama H., publisher Addison-Wesley, price 34.99)

Path Expressions

FLWR Expressions
• FOR - LET - WHERE - RETURN
• Similar to SQL’s SELECT - FROM - WHERE

  FOR $book IN document("books.xml")//book
  WHERE $book/publisher = "Addison-Wesley"
  RETURN { $book/title, $book/author }

FOR vs. LET
• FOR iterates over a sequence, binding a variable to each node
• LET binds a variable to the sequence as a whole

  FOR $book IN document("books.xml")//book
  LET $a := $book/author
  WHERE contains($book/publisher, "Addison-Wesley")
  RETURN { $book/title, Number of authors: { count($a) } }
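The FOR/LET distinction maps onto ordinary Java control flow: FOR becomes a loop over book nodes, while LET binds the whole author `NodeList` at once so that its length plays the role of `count($a)`. This DOM version is an illustrative translation under simplifying assumptions (flat book records, and `getElementsByTagName`, which returns descendants, standing in for the child path `$book/author`), not how an XQuery engine evaluates the query.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ForLetDemo {

    // Java rendering of: FOR $book ... LET $a := $book/author
    //                    WHERE contains($book/publisher, publisher) RETURN title + count($a)
    public static List<String> authorCounts(String xml, String publisher) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        List<String> results = new ArrayList<>();
        NodeList books = doc.getElementsByTagName("book"); // like //book
        // FOR: the loop body runs once per book node.
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            // LET: the whole author sequence is bound to one variable, not iterated.
            NodeList authors = book.getElementsByTagName("author");
            // WHERE: skip books whose publisher does not contain the search string.
            Node pub = book.getElementsByTagName("publisher").item(0);
            if (pub == null || !pub.getTextContent().contains(publisher)) continue;
            // RETURN: one result per surviving FOR binding, using the LET-bound sequence.
            Node title = book.getElementsByTagName("title").item(0);
            String t = title == null ? "" : title.getTextContent();
            results.add(t + ": " + authors.getLength() + " author(s)");
        }
        return results;
    }
}
```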

Inner Joins

  FOR $book IN document("www.books.com/bib.xml")//book,
      $quote IN document("www.amazon.com/quotes.xml")//listing
  WHERE $book/isbn = $quote/isbn
  RETURN { $book/title } { $quote/price }
  SORTBY (title)
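Outside an XQuery engine, the same inner join is commonly computed as a hash join: index one input by ISBN, then probe the index with the other. `InnerJoinDemo`, `Book`, and `Listing` are hypothetical stand-ins for the two documents' elements; the final sort mirrors `SORTBY (title)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InnerJoinDemo {

    public static final class Book {
        final String isbn;
        final String title;

        public Book(String isbn, String title) {
            this.isbn = isbn;
            this.title = title;
        }
    }

    public static final class Listing {
        final String isbn;
        final String price;

        public Listing(String isbn, String price) {
            this.isbn = isbn;
            this.price = price;
        }
    }

    // Returns "title price" for each (book, listing) pair with equal ISBNs, sorted by title.
    public static List<String> join(List<Book> books, List<Listing> listings) {
        // Build phase: index listings by ISBN (one ISBN may have several listings).
        Map<String, List<Listing>> byIsbn = new HashMap<>();
        for (Listing l : listings) {
            byIsbn.computeIfAbsent(l.isbn, k -> new ArrayList<>()).add(l);
        }
        // Probe phase: a book without a matching listing contributes nothing (inner join).
        List<String[]> rows = new ArrayList<>();
        for (Book b : books) {
            for (Listing l : byIsbn.getOrDefault(b.isbn, List.of())) {
                rows.add(new String[] {b.title, l.price});
            }
        }
        rows.sort(Comparator.comparing((String[] r) -> r[0])); // SORTBY (title)
        List<String> out = new ArrayList<>();
        for (String[] r : rows) out.add(r[0] + " " + r[1]);
        return out;
    }
}
```

The hash index makes the join roughly linear in the input sizes, instead of comparing every book with every listing.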


Outer Joins

  FOR $book IN document("books.xml")//book
  RETURN
    { $book/title }
    { FOR $review IN document("reviews.xml")//review
      WHERE $book/isbn = $review/isbn
      RETURN $review/rating }
  SORTBY (title)
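What makes this query an outer join is that the nested FLWR runs once per book, so a book with no reviews still appears, with an empty rating list. A minimal Java sketch of that nested-loop left outer join follows; the map-based inputs are assumptions (titles are taken to be unique), `OuterJoinDemo` is a hypothetical name, and the `TreeMap` supplies the title ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OuterJoinDemo {

    // isbnByTitle: book title -> ISBN (titles assumed unique); reviews: {isbn, rating} pairs.
    // Returns title -> ratings in title order; books without reviews keep an empty list.
    public static Map<String, List<String>> join(Map<String, String> isbnByTitle, List<String[]> reviews) {
        Map<String, List<String>> result = new TreeMap<>(); // sorted keys play the role of SORTBY (title)
        for (Map.Entry<String, String> book : isbnByTitle.entrySet()) {
            List<String> ratings = new ArrayList<>();
            // Nested FLWR: scan all reviews for this book's ISBN.
            for (String[] review : reviews) {
                if (review[0].equals(book.getValue())) ratings.add(review[1]);
            }
            result.put(book.getKey(), ratings); // added even when no review matched
        }
        return result;
    }
}
```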

Combining Expressions

  {
    FOR $book IN document("books.xml")//book
    RETURN { $book/author, $book/title }
    SORTBY (author, title)
  }

Combining Expressions

  Expression

Combining Expressions
Expression can be extended as:

  {
    FOR $book IN Expression
    RETURN Expression
  }

Combining Expressions (continued)

  {
    FOR $book IN Expression
    RETURN { Expression, Expression }
    SORTBY (Expression, Expression)
  }

Combining Expressions (example)

  {
    FOR $book IN document("bib.xml")//book
    RETURN { $book/author, $book/title }
    SORTBY (author, title)
  }

Functions
• Built-in functions
  – max(), min(), sum(), count(), avg()
  – distinct(), empty(), contains()
  – The normative set has not yet been fixed
• User-defined functions
  – Defined in XQuery syntax
  – May be recursive
  – May be typed
• Extensibility mechanisms will be added

Functions (continued)

  FUNCTION depth(ELEMENT $e) RETURNS integer
  {
    -- An empty element has depth 1
    -- Otherwise, add 1 to max depth of children
    IF empty($e/*) THEN 1
    ELSE max(depth($e/*)) + 1
  }

  depth(document("partlist.xml"))
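The recursive `depth` function translates directly to Java over a DOM tree. This sketch applies it to the document element rather than the document node and counts only element children, matching `$e/*`; `ElementDepth` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ElementDepth {

    // Same recursion as the XQuery depth(): no child elements -> 1,
    // otherwise 1 + the largest depth among the child elements.
    public static int depth(Element e) {
        int maxChild = 0;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                maxChild = Math.max(maxChild, depth((Element) n));
            }
        }
        return maxChild + 1;
    }

    // Parses the text and measures from the document element.
    public static int depthOf(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return depth(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }
}
```

For example, `<parts><part><part/></part><part/></parts>` has depth 3: the innermost `part` is 1, its parent 2, and the root 3. Text content does not count, so `<leaf>text</leaf>` has depth 1.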


Data Types
• W3C XML Schema simple types
  – string: "Hello"
  – boolean: true, false
  – integer: 47, -369
  – float: -2.57, 3.805E-2
• Type constructor functions
  – date("2000-06-25")
• Operators and functions to be defined...

Bibliography Example
(Sample bibliography entry: title XML and Java, author Maruyama Hiroshi, publisher Addison Wesley, price 34.99, year 2002)

Books by Author
(Figure: the bibliography inverted so that each author, e.g. Maruyama Hiroshi, lists their titles, such as XML and Java (1st Edition) and XML and Java (2nd Edition), along with publisher Addison Wesley, price 34.99, and year 2002)

Inverting the Hierarchy

  FOR $a IN distinct(document("books/books.xml")//book/author)
  LET $b := document("books/books.xml")//book[author = $a]
  RETURN { $a } { $b/title SORTBY (.) }
  SORTBY (author/last, author/first)
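Inverting a book-to-author hierarchy into author-to-book is a group-by. A minimal Java sketch, assuming in-memory `Book` records and authors written as `"last, first"` strings so that plain string order approximates `SORTBY (author/last, author/first)`; `InvertHierarchy` and `Book` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertHierarchy {

    // Hypothetical book record; authors are "last, first" strings.
    public static final class Book {
        final String title;
        final List<String> authors;

        public Book(String title, List<String> authors) {
            this.title = title;
            this.authors = authors;
        }
    }

    // book -> authors becomes author -> titles.
    // Map keys give distinct() on authors; TreeMap/TreeSet give both SORTBY orderings.
    public static Map<String, SortedSet<String>> byAuthor(List<Book> books) {
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (Book b : books) {
            for (String author : b.authors) {
                result.computeIfAbsent(author, k -> new TreeSet<>()).add(b.title);
            }
        }
        return result;
    }
}
```

Each outer map entry corresponds to one binding of `$a`, and its sorted title set corresponds to `$b/title SORTBY (.)`.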


INSERT, DELETE, REPLACE
• INSERT

  FOR $e IN /emp
  INSERT count($e/skill) BEFORE $e/skill[1]

• REPLACE

  FOR $e IN /emp
  WHERE $e/empno = "1234"
  REPLACE $e/job WITH "Software Architect"

INSERT, DELETE, REPLACE (continued)
• DELETE

  FOR $e IN /emp[job = "Programmer"],
      $s IN $e/skill
  WHERE $s/rating < 4 OR $s/cert_date < date("2000-01-01")
  DELETE $s
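The three updates can also be expressed against a DOM tree in Java, using XPath to locate nodes and DOM methods to change them. This is a sketch, not an implementation of the draft update syntax: `DomUpdates` is a hypothetical name, the statements are applied in the order shown, the inserted count is wrapped in a hypothetical `skillCount` element because a DOM needs a node to insert, and dates are assumed to be ISO `yyyy-mm-dd` strings.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomUpdates {

    // Copies an XPath result into a plain list so the DOM can be changed while looping.
    private static List<Element> select(XPath xp, String path, Object context) throws XPathExpressionException {
        NodeList nl = (NodeList) xp.evaluate(path, context, XPathConstants.NODESET);
        List<Element> out = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) out.add((Element) nl.item(i));
        return out;
    }

    public static String apply(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xp = XPathFactory.newInstance().newXPath();

            // INSERT count($e/skill) BEFORE $e/skill[1], wrapped in a hypothetical <skillCount>.
            for (Element emp : select(xp, "//emp[skill]", doc)) {
                Element count = doc.createElement("skillCount");
                count.setTextContent(xp.evaluate("count(skill)", emp));
                Node firstSkill = (Node) xp.evaluate("skill[1]", emp, XPathConstants.NODE);
                emp.insertBefore(count, firstSkill);
            }

            // REPLACE $e/job WITH "Software Architect" WHERE $e/empno = "1234".
            for (Element job : select(xp, "//emp[empno = '1234']/job", doc)) {
                job.setTextContent("Software Architect");
            }

            // DELETE programmers' skills rated below 4 or certified before 2000-01-01.
            for (Element skill : select(xp, "//emp[job = 'Programmer']/skill", doc)) {
                double rating = (Double) xp.evaluate("number(rating)", skill, XPathConstants.NUMBER);
                String cert = xp.evaluate("cert_date", skill); // ISO dates compare correctly as text
                boolean oldCert = !cert.isEmpty() && cert.compareTo("2000-01-01") < 0;
                if (rating < 4 || oldCert) {
                    skill.getParentNode().removeChild(skill);
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Update failed", e);
        }
    }
}
```

Because the statements run in sequence, the inserted count reflects the skills present before the DELETE step removes any of them.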


Current Limitations on Update
• No distributed update: single data source only
• No updates on views

Advanced XQuery Concepts
• Mainstream XQuery Engines
  – Software AG’s QuiP (http://developer.softwareag.com/tamino/quip/)
  – H. Katz XQEngine (http://www.fatdog.com)
• API for XML Databases (http://www.xmldb.org/xapi/)
  – Supported in eXist and ozone-db
• Experiment with Complex Queries and QuiP

XQuery Additional Information
• W3C XQuery: http://www.w3.org/TR/xquery.html
• W3C XML Query Use Cases: http://www.w3.org/TR/xmlquery-use-cases.html
• W3C XML Query Requirements: http://www.w3.org/TR/xmlquery-req.html
• W3C XML Query Data Model: http://www.w3.org/TR/query-datamodel.html
• W3C XML Query Algebra: http://www.w3.org/TR/query-algebra.html

Part III: Presentation-Oriented Publishing Frameworks

POP Frameworks
• Client-Side POP
  – IE5
• Server-Side POP
  – Cocoon & XSP
  – Rocket
  – CPAN’s Perl Framework
• References
  – http://www.runtime-collective.com/JavaXML.html
  – http://www.andrena.de/Objektforum/Archiv/Download/200210-ka-PortalsWithArachne3.pdf

Apache Cocoon (http://xml.apache.org/cocoon)
• XML-based publishing framework
• An Apache Software Foundation open source project
• Written in Java, runs mostly as a servlet
• Started as a simple servlet-based XSL styling engine for the http://java.apache.org site
• Current version is in the second generation of evolution
• Designed for scalability (uses SAX processing): can process huge documents using a small amount of memory

Apache Cocoon (continued)

• Cocoon promotes the separation of Content, Logic, Presentation and Management in website design.


Cocoon Servlet Dirs. & Files
(Figure: Cocoon servlet directory layout showing the Cocoon configuration file, the main sitemap file, a directory for auto-mounting sub-sitemaps, a sub-sitemap directory, and a directory for log files)

Cocoon Sitemap
• Sitemap goals
  – Used to decouple the exposed URI space from the actual location of resources
  – Allows easily changeable specification of processing steps
• Sitemap contents
  – Component declarations: generators, transformers, serializers, ...
  – Resource declarations: named collections of pipeline components
  – Pipeline declarations: sequential arrangements of components for processing
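The sitemap's first goal, decoupling exposed URIs from resource locations, amounts to ordered pattern matching. The class below is a simplified, hypothetical model of that idea, not Cocoon's API: it assumes wildcard patterns in which `*` stays within one path segment and `**` may span segments, with `{1}`, `{2}`, ... substituting the matched parts into the resource location.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SitemapSketch {

    // Ordered like a sitemap: the first matching pattern wins.
    private final Map<Pattern, String> routes = new LinkedHashMap<>();

    // Registers a URI pattern and the resource it maps to; {1}, {2}, ... refer to wildcards.
    public SitemapSketch match(String uriPattern, String source) {
        routes.put(compile(uriPattern), source);
        return this;
    }

    // Resolves an exposed URI to the actual resource location, if any pattern matches.
    public Optional<String> resolve(String uri) {
        for (Map.Entry<Pattern, String> route : routes.entrySet()) {
            Matcher m = route.getKey().matcher(uri);
            if (m.matches()) {
                String src = route.getValue();
                for (int g = 1; g <= m.groupCount(); g++) {
                    src = src.replace("{" + g + "}", m.group(g));
                }
                return Optional.of(src);
            }
        }
        return Optional.empty();
    }

    // "**" matches across '/', "*" matches within one path segment; everything else is literal.
    private static Pattern compile(String glob) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            if (glob.startsWith("**", i)) {
                re.append("(.*)");
                i += 2;
            } else if (glob.charAt(i) == '*') {
                re.append("([^/]*)");
                i += 1;
            } else {
                re.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                i += 1;
            }
        }
        return Pattern.compile(re.toString());
    }
}
```

With `match("docs/*.html", "content/{1}.xml")`, a request for `docs/intro.html` resolves to `content/intro.xml`, so the XML sources can move without changing any public URL.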


Cocoon Sitemap (continued)

• A sitemap is an XML file
• Sitemaps are hierarchical: a sitemap can point, explicitly or implicitly, to sub-sitemaps
• A sitemap is translated into a Java program and compiled into bytecode
• Changes to sitemaps can be loaded dynamically and asynchronously

Sample Cocoon Pipeline


Cocoon Request Processing
• Request is dispatched to the matching pipeline
• Basic pipeline operation
  – The generator generates XML content
  – Zero or more transformers transform the content
  – The serializer writes it to the output stream
• Different kinds of generators
  – File, Directory, XSP, JSP, Stream, ...
• Different kinds of transformers
  – XSLT, I18N, Log, ...
• Different kinds of serializers
  – HTML, XML, Text, PDF, SVG, ...
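The generator, transformer, serializer chain can be imitated with standard SAX classes, which also shows why a SAX-based design keeps memory use low: in this identity setup, events stream from stage to stage rather than being collected into a document tree first. This is a sketch of the pipeline idea, not Cocoon code; `PipelineSketch` and its `para`-to-`p` renaming transformer are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class PipelineSketch {

    // Transformer stage (hypothetical): renames <para> to <p>, passes all other events through.
    static final class RenameFilter extends XMLFilterImpl {
        RenameFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
            super.startElement(uri, rename(local), rename(qName), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, rename(local), rename(qName));
        }

        private static String rename(String name) {
            return "para".equals(name) ? "p" : name;
        }
    }

    public static String run(String xml) {
        try {
            // Generator stage: a SAX parser turns the source document into a stream of events.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader generator = spf.newSAXParser().getXMLReader();

            // Transformer stages: each SAX filter wraps the previous stage (zero or more).
            XMLReader pipeline = new RenameFilter(generator);

            // Serializer stage: an identity TransformerHandler writes events out as XML text.
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = stf.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            // Connect the last stage to the serializer and start the flow of events.
            pipeline.setContentHandler(serializer);
            pipeline.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }
}
```

Adding another transformer means wrapping `pipeline` in one more `XMLFilterImpl` subclass; the generator and serializer stay unchanged.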


Dynamic Content Generation from XSP