Extreme Markup Languages 2001

Montréal, Québec August 14-17, 2001

Describing Structure and Semantics of Graphs Using an RDF Vocabulary

John Punin
Rensselaer Polytechnic Institute, Department of Computer Science
[email protected]

Mukkai Krishnamoorthy
Rensselaer Polytechnic Institute, Department of Computer Science
[email protected]

Keywords: RDF; RGML; graph, node, and edge; hypergraphs; Exposition: RGML; Experiments and demos: RGML describing a web collection; NOT3

Abstract The RDF Graph Modeling Language (RGML) is a W3C RDF vocabulary to describe graph structures, including semantic information associated with a graph. Viewing general graphs as Web resources, RGML defines graph, node, and edge as RDF classes and attributes of graphs (such as label and weight) as RDF properties. Some of these RDF properties establish relationships between graph, node, and edge instances. RDF Statements about graph elements involve subjects, predicates and objects. Subjects and predicates are RDF Resources, while objects are either RDF Resources or RDF Literals. RGML uses the XML Schema datatypes for RDF Literals. RGML can be easily combined with other RDF vocabularies, for example, to add Dublin Core properties. RGML is very useful for describing webgraphs (the structure of a web site), web collections, and sitemaps.

§ 1 Introduction We present the RDF Graph Modeling Language (RGML), a W3C RDF vocabulary [RDF] to describe graph structures. General graphs can be seen as Web resources. RGML defines graph, node, and edge as RDF classes and attributes of graphs, such as label and weight, as RDF properties. Some of these RDF properties establish relationships between graph, node, and edge instances. Hence, a graph instance "knows" which node and edge instances belong to it. There is no restriction on the number of graph, node, and edge instances that can be defined in an RGML file, nor on sharing nodes and edges between several graphs. Graphs can be included in other graphs; such graphs are called subgraphs. We use RDF Schema [RDFS] to define the RGML classes and properties. RGML can be easily combined with other RDF vocabularies to add new properties; for example, a Dublin Core [DC] dc:title property can be added to a node instance to assign a string title to the node. Several examples in this paper show how different kinds of graphs can be described using the RGML vocabulary. Our main interest is to describe webgraphs [WWWPAL]. A webgraph describes the structure of a web site, where the nodes are web pages and the edges are hyperlinks. We use the Dublin Core [DC] vocabulary to add information to nodes (web pages), such as dc:title (title of the web page), dc:format (MIME type of the web page), and dc:date (date of creation of the web page). Using RGML we can also define web collections (a collection of web pages that represent one document) as subgraphs of a webgraph. Site-maps can also be seen as graph instances; hence RGML is a good candidate for describing site-maps.

§ 2 Motivation XML [XML] and XML Schema [XMLSCHEMA] provide a syntactic description of an underlying document. The existing XML vocabularies designed to describe graph structures [GRAPHXML] [GXL] [XGMML] provide a means to describe structural information about nodes, edges, subgraphs, etc. RGML, based on the W3C RDF/XML model, describes the semantics of graphs as well as their structural information. RGML files are written in XML syntax, and their semantics follow the RDF model. By using RGML, one can describe graph structures and add statements that describe the elements of the graph. Two graphs may have the same structure even though one describes a network topology while the other describes a web site; the metadata related to these graphs are different. The RDF model permits the combination of different metadata using XML Namespaces [XMLNS], so it is possible to differentiate between several types of graphs. RGML describes the semantics of an abstract graph based on the appropriate vocabulary. So, for example, when we describe a webgraph, we use the Dublin Core vocabulary. The most interesting aspect of RGML is that the semantics of an arbitrary graph can be specified using RDF.

§ 3 RGML Datatypes RDF Statements about graph elements involve subjects, predicates and objects. Subjects and predicates are RDF Resources, while objects are either RDF Resources or RDF Literals. It is not possible to distinguish the datatypes of RDF Literal values, and hence RGML adopts the XML Schema datatypes. RGML follows the same syntax as the DAML+OIL ontology markup language [DAML]. The core RGML datatypes are integer, boolean, and string. These datatypes are the ranges of some RGML properties (a property maps from a domain to a range).

§ 4 RGML Classes The RGML classes of a general graph are Graph, Node, and Edge. Properties of these classes establish relationships among the three: a graph instance relates to several node and edge instances, and an edge instance relates two node instances.

4.1 Graph Class The Graph class is an RDF class that defines a general graph. It is the domain of the label and directed RDF properties. The nodes and edges RDF properties relate instances of the Graph class to instances of the Node and Edge classes. Other kinds of graphs, such as webgraphs, RDF graphs, and network graphs, can be subclasses of this Graph class.

4.2 Node Class The Node class is an RDF class that defines a general node. The label and weight RDF properties have the Node class as domain. All nodes of a graph are instances of this class. Webnodes (web pages) of a webgraph can be defined as a subclass of the Node class.

4.3 Edge Class The Edge class is an RDF class that defines a general edge. The label and weight RDF properties have the Edge class as domain. The source and target RDF properties of the Edge class allow it to relate two node instances or, in the case of a hypergraph, many node instances. (The instances of edges that have a list of nodes are called hyperedges. A hyperedge is defined as a subset of a set of nodes. Graphs that contain hyperedges are called hypergraphs.) Webedges (hyperlinks) of a webgraph can be defined as a subclass of the Edge class.
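The three classes and their relationships can be pictured with a small in-memory model. The following is only a minimal Python sketch, not part of RGML itself: the class and field names mirror the RGML class and property names, while the WebGraph subclass and the sample labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    weight: Optional[float] = None


@dataclass
class Edge:
    label: str
    source: Node
    target: Node
    weight: Optional[float] = None


@dataclass
class Graph:
    label: str
    directed: bool = False
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


# Hypothetical subclass, mirroring the remark that webgraphs specialize Graph.
class WebGraph(Graph):
    pass


# Sample instances (labels are illustrative only).
a, b = Node("a"), Node("b")
g = WebGraph(label="example", directed=True, nodes=[a, b],
             edges=[Edge("a-b", source=a, target=b)])
```

Here the graph instance holds references to its node and edge instances, and each edge instance refers to two node instances, which is the same "knows about" relationship the RGML properties express.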




§ 5 RGML Properties The domains of the RGML properties are the Graph, Node, and Edge RDF classes. These properties relate instances of the RGML classes to build the structure of a general graph.

5.1 label Property The label property is a global property for all instances of RGML classes. It gives a unique string identifier to the instances of the graphs, nodes and edges.

5.2 directed Property The directed property has a Boolean range and a Graph domain. Its value is either true or false and indicates whether the graph is directed. The directed property can also be set on a specific edge instance to indicate whether that edge is directed. This makes it possible to describe mixed graphs, which allow directed and undirected edges in the same graph.


5.3 nodes Property The nodes property is an RDF Bag of the node instances that belong to the graph. This expresses inclusion, so we can determine which nodes are members of the graph. The nodes property can also belong to the Edge class. The instances of edges that have a list of nodes are called hyperedges. A hyperedge is defined as a subset of a set of nodes. Graphs that contain hyperedges are called hypergraphs. If the hypergraph is directed, the nodes property of the hyperedges is an RDF Seq. The RDF Seq gives an order to the list of nodes in the hyperedge.
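The difference between the unordered Bag and the ordered Seq can be illustrated with ordinary Python containers. This is a rough sketch under stated assumptions: the function name hyperedge is hypothetical, a tuple stands in for an RDF Seq, and a frozenset stands in for an RDF Bag (a real Bag may hold duplicates, which a hyperedge defined as a subset of nodes does not need).

```python
def hyperedge(nodes, directed):
    """Return the member nodes of a hyperedge.

    Directed hypergraph: order matters, so keep a tuple (stand-in for rdf:Seq).
    Undirected hypergraph: order is irrelevant, so keep a frozenset
    (stand-in for rdf:Bag).
    """
    return tuple(nodes) if directed else frozenset(nodes)


# The same three nodes listed in two different orders.
ordered_1 = hyperedge(["a", "b", "c"], directed=True)
ordered_2 = hyperedge(["c", "b", "a"], directed=True)
unordered_1 = hyperedge(["a", "b", "c"], directed=False)
unordered_2 = hyperedge(["c", "b", "a"], directed=False)
```

With this representation the two directed hyperedges compare as different, while the two undirected ones compare as equal.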

5.4 edges Property The edges property is an RDF Bag of the edge instances that belong to the graph. This expresses inclusion, so we can determine which edges are members of the graph.

5.5 graphs Property The graphs property is an RDF Bag of the subgraph instances of the parent graph. By using this property, we can relate subgraphs to the graph that they originated from.

5.6 weight Property The weight property belongs to node and edge instances. The weight is usually a numerical value assigned to a node or edge in a weighted graph.

5.7 source Property The source property is used to indicate the source node instance of an edge instance.

5.8 target Property The target property is used to indicate the target node instance of an edge instance.

§ 6 RGML Examples

6.1 Simple Graph Example


Figure 1: Example: simple graph

This is a simple graph with three nodes and two edges. See Figure 1 [above] for a graphical representation of this simple graph.
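To make the structure of such a description concrete, the following Python sketch lists the statements for a graph with three nodes and two edges as plain (subject, predicate, object) tuples. It is an illustration under assumptions, not the RGML serialization: the identifiers g, n1..n3, e1, e2 and the choice of which nodes each edge connects are hypothetical, and the Bag-valued nodes and edges properties are flattened to one tuple per member for brevity.

```python
def simple_graph_triples():
    """Build (subject, predicate, object) tuples for a 3-node, 2-edge graph."""
    nodes = ["n1", "n2", "n3"]                          # hypothetical node ids
    edges = [("e1", "n1", "n2"), ("e2", "n2", "n3")]    # hypothetical wiring

    triples = [("g", "rdf:type", "rgml:Graph")]
    for n in nodes:
        triples.append((n, "rdf:type", "rgml:Node"))
        triples.append(("g", "rgml:nodes", n))          # flattened Bag member
    for e, src, tgt in edges:
        triples.append((e, "rdf:type", "rgml:Edge"))
        triples.append(("g", "rgml:edges", e))          # flattened Bag member
        triples.append((e, "rgml:source", src))
        triples.append((e, "rgml:target", tgt))
    return triples


triples = simple_graph_triples()
```

Each node contributes a type statement and a membership statement, and each edge additionally contributes its source and target statements.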




6.2 Webgraph Example

Figure 2: Example: webgraph

This graph is a webgraph where the two nodes represent two web pages and the edges represent the hyperlinks between these web pages. We used the Dublin Core vocabulary to add properties to the nodes, such as dc:title, dc:format, and dc:date. The dc:title indicates the title of the web page, the dc:format indicates the MIME type of the web page, and the dc:date indicates the date of creation of the web page. The rgml:weight property indicates the size in bytes of the web page. The rgml:label of a node holds the URL of the web page, and the rgml:label of an edge holds the anchor text of the hyperlink. See Figure 2 [above] for a graphical representation of this webgraph.
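The way RGML and Dublin Core properties sit side by side on a node can be sketched with plain dictionaries. This is only an illustrative Python sketch: the example.org URLs, titles, dates, page sizes, anchor texts, and the two helper functions are all hypothetical, and only the property names come from the description above.

```python
HOME = "http://www.example.org/"               # hypothetical URL
PEOPLE = "http://www.example.org/people.html"  # hypothetical URL

# Each web page is a node keyed by its URL (the rgml:label of the node).
pages = {
    HOME: {"dc:title": "Home", "dc:format": "text/html",
           "dc:date": "2001-03-01", "rgml:weight": 4096},
    PEOPLE: {"dc:title": "People", "dc:format": "text/html",
             "dc:date": "2001-02-26", "rgml:weight": 2048},
}

# Each hyperlink is an edge; its rgml:label holds the anchor text.
links = [
    {"rgml:label": "People", "rgml:source": HOME, "rgml:target": PEOPLE},
    {"rgml:label": "Home", "rgml:source": PEOPLE, "rgml:target": HOME},
]


def anchor_texts_from(url):
    """Anchor texts of the hyperlinks that start at the given page."""
    return [l["rgml:label"] for l in links if l["rgml:source"] == url]


def collection_size_bytes():
    """Total size of all pages, using rgml:weight as the size in bytes."""
    return sum(p["rgml:weight"] for p in pages.values())
```

Because the two vocabularies use distinct prefixes, structural properties (rgml:) and descriptive metadata (dc:) can be read independently from the same node.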


Rensselaer Computer Science Department 2001-03-01 text/html
People at Rensselaer Computer Science Department 2001-02-26 text/html
Courses at Rensselaer Computer Science Department 2001-01-25 text/html

6.3 RDF Graph Example

Figure 3: Example: RDF graph

This is the RGML description of the famous RDF Graph [RDF] that represents the statement: "Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila". See Figure 3 [above] for a graphical representation of this simple graph.
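The mapping from an RDF statement to a graph is direct: the subject and the object become nodes, and the predicate becomes a directed edge between them. A minimal Python sketch of that mapping follows; the function name statement_to_graph, the dictionary layout, and the edge label "Creator" are assumptions made for illustration.

```python
def statement_to_graph(subject, predicate, obj):
    """Turn one RDF statement into a two-node, one-edge directed graph."""
    nodes = [{"label": subject}, {"label": obj}]
    edge = {"label": predicate, "source": subject, "target": obj}
    return {"directed": True, "nodes": nodes, "edges": [edge]}


# The statement "Ora Lassila is the creator of the resource ...".
rdf_graph = statement_to_graph(
    "http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila")
```

The resulting graph has the resource as the source node and the literal "Ora Lassila" as the target node.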




6.4 Hypergraph Example A hypergraph is defined as a set of vertices and a set of edges called hyperedges. A hyperedge is a subset of the set of vertices. If the hypergraph is directed, the hyperedge is an ordered set of vertices; if the hypergraph is undirected, the hyperedge is an unordered set of vertices. The following RGML example describes a directed hypergraph.




§ 7 Adding graph rules using RGML and Logic primitives Tim Berners-Lee has designed a new language, Notation 3 [NOT3], to describe the RDF data model. Using Notation 3, we can add logic primitives to RGML to generate RDF properties that are commonly used to describe graph information, such as cycle, adjacent, and path. We will add simple rules to define two graph concepts: adjacency and paths between nodes. The latter is a modification of the example given by the Euler proof engine [EULER]. Two nodes are adjacent if they are connected by an edge. This can be expressed using two simple rules, where e is an edge, and u and v are nodes:

    forall (e, u, v) : source(e,u) and target(e,v) -> adjacent(u,v)
    forall (u,v) : adjacent(u,v) -> adjacent(v,u)

Using Notation 3 we can express these rules as:

    @prefix rdf: .
    @prefix rdfs: .
    @prefix rgml: .
    @prefix log: .
    @prefix : .

    :adjacent rdfs:domain rgml:Node; rdfs:range rgml:Node.

    {{:e rgml:source :u. :e rgml:target :v.} log:implies { :u :adjacent :v } } a log:Truth; log:forAll :e, :u, :v.

    {{ :u :adjacent :v. } log:implies { :v :adjacent :u. }; } a log:Truth; log:forAll :u, :v.
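The effect of the two adjacency rules can be checked by hand on a small edge list. The Python sketch below applies them directly; it is an illustration only and does not reproduce how a rule engine evaluates the Notation 3 rules. The function name adjacent_pairs and the (edge, source, target) tuple layout are assumptions.

```python
def adjacent_pairs(edges):
    """Derive adjacent(u, v) facts from (edge_id, source, target) tuples."""
    adjacent = set()
    for _edge_id, u, v in edges:
        adjacent.add((u, v))  # rule 1: source(e,u) and target(e,v) -> adjacent(u,v)
        adjacent.add((v, u))  # rule 2: adjacent(u,v) -> adjacent(v,u)
    return adjacent


# Hypothetical three-node chain n1 - n2 - n3.
adjacency = adjacent_pairs([("e1", "n1", "n2"), ("e2", "n2", "n3")])
```

Since the second rule only mirrors pairs produced by the first, a single pass over the edges already yields every derivable adjacency fact.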

A path is a sequence of consecutive nodes in a graph. Two nodes, u and v, are said to be consecutive if there is a directed edge from u to v, where u is the parent of v (rule 1). The second rule states that if the node u is the parent of the node v, there is a directed path from u to v. The third, a recursive rule, states the condition for the existence of a path from node u to node w. These three rules are the following (where u, v, w are nodes; e is an edge and g is a graph):

    forall (g,e,u,v) : g is directed and source(e,u) and target(e,v) -> parent(u,v)
    forall (g,u,v) : g is directed and parent(u,v) -> path(u,v)
    forall (g,u,v,w) : g is directed and parent(u,v) and path(v,w) -> path(u,w)

Using Notation 3 we can express these rules as:

    @prefix rdf: .
    @prefix rdfs: .
    @prefix rgml: .
    @prefix log: .
    @prefix : .

    :parent rdfs:domain rgml:Node; rdfs:range rgml:Node.
    :path rdfs:domain rgml:Node; rdfs:range rgml:Node.

    {{ :g rgml:directed "true". :e rgml:source :u. :e rgml:target :v.} log:implies { :u :parent :v. }; } a log:Truth; log:forAll :e, :u, :v , :g.

    {{ :g rgml:directed "true". :u :parent :v} log:implies {:u :path :v}} a log:Truth; log:forAll :g, :u, :v.

    {{ :g rgml:directed "true". :u :parent :v. :v :path :w} log:implies {:u :path :w}} a log:Truth; log:forAll :g, :u, :v, :w.

To express the rules we use the logic properties log:implies and log:forAll that are part of the logic schema [NOT3]. We use the cwm engine [CWM] to process these rules on a simple graph expressed in RGML.
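The three path rules amount to computing reachability by repeated rule application until no new fact appears. The following Python sketch does this with a naive fixpoint loop; it is a hand-written illustration under assumptions, not the behavior of cwm or Euler. The function name derive_paths is hypothetical, and the "g is directed" condition is simplified to a boolean argument.

```python
def derive_paths(directed, edges):
    """Derive path(u, w) facts from (edge_id, source, target) tuples."""
    if not directed:
        return set()  # every rule requires a directed graph

    # Rule 1: source(e,u) and target(e,v) -> parent(u,v)
    parent = {(u, v) for _edge_id, u, v in edges}
    # Rule 2: parent(u,v) -> path(u,v)
    path = set(parent)

    # Rule 3: parent(u,v) and path(v,w) -> path(u,w), applied to a fixpoint.
    changed = True
    while changed:
        changed = False
        for u, v in parent:
            for x, w in list(path):
                if x == v and (u, w) not in path:
                    path.add((u, w))
                    changed = True
    return path


# Hypothetical directed chain n1 -> n2 -> n3.
paths = derive_paths(True, [("e1", "n1", "n2"), ("e2", "n2", "n3")])
```

The loop stops once a full pass adds nothing, so it also terminates on graphs that contain cycles, because the set of possible node pairs is finite.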

§ 8 Conclusion In this paper, we defined a new RDF vocabulary (RGML) to describe the structure and semantics of graphs. We showed how to combine different vocabularies in order to add rich metadata to the graph components, and we defined logic rules to express graph properties. We are currently expanding our logic rules to express many other graph properties. Future RGML modules will add new vocabularies to describe different types of graphs, such as webgraphs, network graphs, organization charts, UML graphs, etc.

§ 9 RDF/XML Serialization of the RGML RDF Schema


Bibliography

[CWM] Tim Berners-Lee. Semantic Web Area for Play: Closed World Machine. http://www.w3.org/2000/10/swap/Overview.html, February, 2001.

[DAML] DARPA Agent Markup Language (DAML). http://www.daml.org/

[DC] S. Weibel, J. Kunze, C. Lagoze, and M. Wolf. Dublin Core Metadata for Resource Discovery, Internet RFC 2413. http://purl.oclc.org/dc/, 1998.

[EULER] Jos De Roo. Euler proof mechanism. http://www.agfa.com/w3c/euler/, June, 2001.

[GRAPHXML] I. Herman, M. S. Marshall. GraphXML - An XML-based graph description format. Proceedings of the Symposium on Graph Drawing, 2000; 52-62.

[GXL] R. C. Holt, A. Winter, A. Schurr. GXL: Towards a Standard Exchange Format. Proceedings 7th Working Conference on Reverse Engineering (WCRE 2000).

[NOT3] Tim Berners-Lee. Notation 3. http://www.w3.org/DesignIssues/Notation3.html, April, 2001.

[RDF] O. Lassila and R. Swick. W3C, Resource Description Framework (RDF) Model and Syntax Specification. http://www.w3.org/TR/REC-rdf-syntax, 1999.

[RDFS] D. Brickley and R.V. Guha. W3C, Resource Description Framework (RDF) Schema Specification 1.0. http://www.w3.org/TR/rdf-schema/, 2000.

[WWWPAL] J. Punin, M. Krishnamoorthy. WWWPal System - A System for Analysis and Synthesis of Web Pages. In Proceedings of the WebNet 98 Conference, Orlando, November, 1998.

[XGMML] J. Punin, M. Krishnamoorthy. XGMML, Extensible Graph Markup and Modeling Language Specification, 1999. http://www.cs.rpi.edu/~puninj/XGMML/draft-xgmml.html

[XML] T. Bray, J. Paoli and C. M. Sperberg-McQueen. Extensible Markup Language (XML 1.0). http://www.w3.org/TR/REC-xml, 2000.

[XMLNS] T. Bray, D. Hollander and A. Layman. Namespaces in XML. http://www.w3.org/TR/REC-xml-names, 1999.

[XMLSCHEMA] H. Thompson et al. XML Schema Part 1: Structures. http://www.w3.org/TR/xmlschema-1, 2001.


Montréal, Québec, August 14-17, 2001 This paper produced from XML source via XSL, Saxon and Apache FOP Mulberry Technologies, Inc., August 2001
