Future Generation Computer Systems 20 (2004) 27–46

Extending RDF in distributed knowledge-intensive applications

Jun Shen∗, Yun Yang

CICEC (Center for Internet Computing and E-Commerce), School of Information Technology, Swinburne University of Technology, PO Box 218, Hawthorn, Melbourne 3122, Australia

Abstract

The resource description framework (RDF) has become a formal language tool for specifying the semantics of distributed systems such as web services. It can also be extended to describe entities and relationships within specific application environments, in support of knowledge sharing and ontology construction. This paper presents two case studies: a network management knowledge model and a process ontology for distributed workflow systems. Drawing on this practical experience, the authors suggest how RDF can be applied innovatively and effectively to re-engineer data integration solutions in novel knowledge-intensive areas that were previously built upon traditional modelling languages such as XML.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Resource description framework; Network management; Knowledge sharing; Distributed workflow systems; Ontology construction

1. Introduction

In the semantic web area, the resource description framework (RDF) plays a promising role [8]. RDF and RDF schema (RDFS) have absorbed ideas from object-oriented programming, relational databases and knowledge representation. Grounded in first-order logic, the 3-tuples (triples) of RDF statements, which describe relations between resources and properties, are concise, natural and, above all, flexible. Building on our experience, we have extended RDF-based languages to diverse applications that complement the deployment of web services, such as knowledge sharing between network management agents and process ontology construction among workflow engines. The basic capability we exploit is RDF’s rich semantic power for knowledge representation, which allows traditional network information models and data integration solutions to be upgraded to express more complicated computing entities and relations. The case studies presented in this paper demonstrate these advantages of RDF in areas beyond the semantic web.

The most important enrichments of RDF that we have used in our research are OIL (ontology inference layer) [12] and its extension and integration with the agent language DAML [7], as well as RDF Context [15] and FIPA-RDF [11]. OIL and the related languages introduce new concepts and primitives, for example expression (oil:ClassExpression), rule (fipa:Rule), axiom (daml:TransitiveProperty), context (rdfc:asserts) and activity (fipa:Action). Based on the XML syntax, a suite of supporting parsers and analysis tools for RDF descriptions has been developed and integrated into many prototypes and systems.¹

∗ Corresponding author. Tel.: +61-3-9214-8752; fax: +61-3-9819-0823. E-mail addresses: [email protected] (J. Shen), [email protected] (Y. Yang).
¹ http://www.semanticweb.org.

0167-739X/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0167-739X(03)00163-8


J. Shen, Y. Yang / Future Generation Computer Systems 20 (2004) 27–46

In this paper, after introducing our basic objectives and related work, we describe our methods in the following sections. Section 3 discusses an agent-based network management knowledge base (MKB), and Section 4 investigates a data integration solution for workflow deployment based on constructing a process ontology. Sections 5 and 6 present a comparative discussion and our conclusions.

1.1. Agent-based network management: from one standard information model to knowledge model

Mobile and intelligent agents (IAs) play active roles in current network management platforms and products [5]. Meanwhile, new information models and protocol interfaces are emerging within the Internet communities, for example Script MIB (Management Information Base) [19]. The next generation structure of management information has therefore become a critical issue, which has led to the work in progress on SMIng (Next Generation Structure of Management Information) [20]. However, agent communication must also be considered when deploying a multi-agent system on network management platforms. In our previous prototype [22], KQML (Knowledge Query and Manipulation Language) [10] was used in dialogs between managing agents by taking advantage of the JatLite toolkit.² The basic contents of the agents’ dialogs include script code for operating actions and attribute-value pairs of managed objects, but they lack sufficient ontology support to describe the relations and semantics of objects, whether managing agents or managed agents. In this paper, we discuss how to construct a self-contained knowledge model based on RDF and its extensions. Our underlying information model is SMIng, which is independent of ASN.1 but explicitly defines terms that were derived from former versions of the structure of management information (SMI) [6].
We avoid making network management agents act as inference machines, but require them to offer capabilities to handle messages and store triples, which are bound to a lightweight knowledge base. Moreover, inline coding of URIs [3] helps us to locate related management data and the agents involved conveniently. In this paper, we mainly show how SMIng modules can be described in RDFS by integrating knowledge elements.

1.2. Process ontology for workflow system: from different languages to one integrated ontological language

As advanced e-services boom, distributed workflow systems remain one of the most promising solutions for supporting processes such as e-business processes. As an integration language, XML plays an important role in data and application integration. Many standards and protocols built on XML have emerged, often redundantly. In the e-business area alone, we have to tolerate the coexistence of cXML,³ ebXML,⁴ xCBL⁵ and so forth. We face a comparable difficulty when considering development and interoperability issues in workflow systems themselves. Since Hsu [13] edited a collection of technical reports from various groups, little substantial work has been done on data integration for data sharing and exchange among workflow systems. In the context of database research, certain theories and models have been proposed in [2], where the semantics and logic of both distributed data and concurrent activities have been made clearer, more formal and explicit. Nevertheless, given the current state of cooperation among dispersed and heterogeneous workflow applications, we should place more emphasis on simpler representation and interchange of data between Internet/Intranet-wide entities. WfMC⁶ strives to fix this problem and takes XML for granted as the carrier by specifying XPDL (XML Process Definition Language) [24] and WfXML.
Meanwhile, Riempp [17] has proposed an additional ‘Interface 6’, or an enlargement of Interface 4, for interoperability between workflow engines and/or managers. Ironically, few implementations were wholly bound to such standards or interfaces, owing to their inherent lack of sound support for explicit semantics. From this point of view, Zhuge’s work has given further consideration to knowledge sharing among workflow engines, the so-called knowledge flow or knowledge grid [29,30].

Our solution is to construct and define an integrated process ontology that lets workflow engines understand each other. RDF and its extensions can describe workflow entities and relationships explicitly and formally. As in the network management agent case mentioned above, we avoid making workflow engines behave like inference machines, but require of them the basic capabilities to handle messages and store triples, which are bound to a concise ontology rather than a fully capable knowledge base. Moreover, inline coding of URIs also helps us to locate workflow-related data and entities conveniently. We will show how the RDF-based process ontology is advantageous over the XML-based definitions used so far. We are also interested in applying RDF mechanisms in soft-device cluster environments [31].

² http://java.stanford.edu.
³ http://www.cxml.org.
⁴ http://www.ebXML.org.
⁵ http://www.commercenet.com.
⁶ http://www.wfmc.org.

2. Related work

2.1. Next generation network management

SMIng is devised as a long-term network information model, although it is still based on the model of hierarchical managed objects, which are organised in an object identifier tree where only leaf nodes represent objects [20]. Specific SMIng modules are readable by both MIB compilers and human beings; SMIng therefore has a minimal but complete set of data types, suitable for both programming languages and management information models. The ‘module’ statement in SMIng, which corresponds to a certain MIB module, contains mandatory and optional statements and sections in an obligatory order. The most important statements include the ‘typedef’, ‘scalar’, ‘table’ and ‘extension’ statements. The ‘column’ statements, which are defined within ‘table’ and ‘row’ statements, specify leaf nodes, as do ‘scalar’ statements.
Two core modules, IRTF-NMRG-SMING and IRTF-NMRG-SMING-TYPES, define the root node and basic data types. ‘Extension’ statements may describe how an agent’s capabilities deviate from the ‘compliance’ statements of the modules that the agent implements [20]. In all network management frameworks, managed resource objects are classified in a hierarchical tree structure, and every node is assigned an object identifier (oid). The data types of SMIng variables include octets, numeric, enumeration and bit strings, which are the basis for defining new data types. Any implementation of SMIng modules describes the formal semantics and interpretation of the variables, belonging to the sets of scalar, columnar or notification variables, that appear in these modules. However, the concrete semantics of every managed object is determined by its physical semantics in the real network environment.

An SNMP agent $A_i$ at time $t$ is specified as a simple abstract fact base $f_i$, which includes only the predicate “equals” (“=”). Every statement or sentence in $f_i$ is a triple of the form $\{=, v, o\}$, where $v$ is the variable name and $o$ the object value. The SNMP primitives GET, SET and TRAP (GET–RESPONSE) implement value retrieval, value assignment and notification on the instances of managed object variables in $v$. $A_i$ constitutes the local context of network management information, $C_i$, and is identified by a domain name or network address of the host system together with a corresponding process or interface id. SMIng defines the naming rule for instances of a variable $v$: instances of scalar variables $v_s$ are named by oid + “.” + “0”; instances of columnar variables are named by oid + “.” + subid, where subid is represented by the values of the instances of the vector $v_c(1), \ldots, v_c(m)$, based on the function Index, which uniquely identifies the columnar variables $v_c$ of every row.
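The instance naming rule and the “equals” fact base can be illustrated with a minimal Python sketch. It is not part of the authors' prototype: the function names are hypothetical, and the two oids (MIB-II sysUpTime and ifInErrors) and their values are chosen purely for illustration.

```python
from typing import Iterable, Optional, Set, Tuple

# One fact is a triple ("=", instance name v, object value o).
Fact = Tuple[str, str, object]


def scalar_instance_name(oid: str) -> str:
    """Scalar instances are named oid + "." + "0"."""
    return oid + ".0"


def columnar_instance_name(oid: str, index_values: Iterable[object]) -> str:
    """Columnar instances are named oid + "." + subid, where subid is
    built from the index values that identify the row."""
    subid = ".".join(str(v) for v in index_values)
    return oid + "." + subid


def get(fact_base: Set[Fact], name: str) -> Optional[object]:
    """GET-like retrieval: return the value bound to an instance name."""
    for predicate, variable, value in fact_base:
        if predicate == "=" and variable == name:
            return value
    return None


# Illustrative fact base f_i of one agent (values are made up).
fact_base: Set[Fact] = {
    ("=", scalar_instance_name("1.3.6.1.2.1.1.3"), 123456),          # sysUpTime
    ("=", columnar_instance_name("1.3.6.1.2.1.2.2.1.14", [2]), 7),   # ifInErrors, row 2
}
```

Because the fact base only ever contains the “=” predicate, retrieval reduces to a lookup by instance name, which is the limitation the richer predicates of Section 3 are meant to lift.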
In order to be more intelligent, management agents should understand each other through a language with more formal semantics, which describes the basic definitions and relations of the corresponding management entities. XML has been used to specify an SMIng DTD; XML versions of SMIng therefore provide a basic schema vocabulary on which a more complex management knowledge model can build.


2.2. Workflow ontology

XML has been widely accepted as a uniform data format, not only for organisational processes but also for task-specific tools. Its flexibility and portability enable distributed software environments, such as workflow systems, to be interoperable to a much greater extent. By evaluating our work and comparing it with related research, we have identified the following main drawbacks of XML in workflow environments: excessive time consumption for queries and updates, the lack of mechanisms for universal distribution and discovery, and pervasive misunderstanding between different contexts. For complex workflow systems, XML-based languages fall short because of their limited semantic representation capability. Other recent work has enriched this capability; for example, van der Aalst et al. [1] incorporate the power of Petri nets and/or XML algebra. Moreover, we also investigate the similar complexity brought by the existing B2B XML standards described in Section 1.2, as the many specifications become a burden, especially to novice business process engineers. We believe that a workflow-specific domain ontology will help us identify problems from a different yet consistent point of view, where XML alone cannot solve the entire problem.

Ontology-based data integration has been studied to some extent. Omelayenko and Fensel [16], for example, introduced a synonym of ‘ontology’, the universal catalogue, which acts as a bridge between heterogeneous product information described by different standards such as xCBL and cXML. In their implementation, XSLT (Extensible Stylesheet Language Transformations) transforms between DTDs.
However, we prefer a middle-out method in the development of the workflow ontology, because we should stipulate commonality, stability, and verifiability of consistency and accuracy rather than process description details, which would be hard to manage and handle, especially when the basic vocabulary and semantics are confusing [23]. Similar work can be found on the website of Australia’s CSIRO Mathematical and Information Sciences, where research has been undertaken on agent-based e-commerce and ontology metadata thesauri in the groups AI in E-Business and Technologies for Electronic Documents.⁷ Because the WfMC specifications reflect shared understanding to a great extent, our candidate process ontology vocabulary can stem from their glossary, although it is highly informal. On the other hand, NIST’s Process Specification Language (PSL) is well structured and based on the formal Knowledge Interchange Format (KIF) [18]. The rich expressive power of PSL provides a sound skeleton for us. When mapping PSL concepts and semantics onto the XML representation, RDF’s benefits have also been explored and compared with KIF. Thus, for the interoperability of workflow systems, we are not short of process-oriented vocabularies at different levels.

3. Case study 1: agent-based network management

3.1. RDF descriptions of SMIng modules

3.1.1. From SMIng modules to RDFS

Every SMIng module has its own namespace, which is identified by its authors’ organisation and its version and corresponds to an RDFS definition. Unique URIs [3] identify modules, which include statements such as ‘revision’, ‘contact’ and ‘organisation’, and which introduce identifiers from other modules through ‘import’ statements. In this paper, we define xmlns:sming = “http://www.it.swin.edu.au/centres/cicec/sming-schema”. Besides the seven basic data types, declared in the form <rdfs:Class ID="Float"/>, other data types, such as Float128, are defined as subclasses. The common statements within the parameter blocks correspond to, in other words can be mapped to, properties in the manner of rdfs:comment, for example <rdf:Property ID="description"/> for the ‘description’ statement. Statements such as ‘default’, ‘format’ and ‘units’ are treated in the same way, except that their physical semantics is interpreted by management entities. These properties therefore share the same rdfs:domain, rdfs:Class, and the same rdfs:range, rdfs:Literal.

⁷ http://www.cmis.csiro.au/ted.
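As a rough illustration of the mapping in Section 3.1.1, the following Python sketch builds the corresponding RDFS statements as plain (subject, predicate, object) tuples. It uses no RDF library; the "#" fragment separator appended to the sming namespace and the function name are assumptions made for this sketch, not details taken from the paper.

```python
# Namespace URIs. The RDF and RDFS namespaces are the W3C ones; the "#"
# appended to the sming namespace is an assumption of this sketch.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SMING = "http://www.it.swin.edu.au/centres/cicec/sming-schema#"


def sming_schema_triples():
    """Return a small part of the SMIng-to-RDFS mapping as a set of triples."""
    triples = set()
    # A basic data type becomes an RDFS class ...
    triples.add((SMING + "Float", RDF + "type", RDFS + "Class"))
    # ... and a derived data type becomes a subclass of it.
    triples.add((SMING + "Float128", RDFS + "subClassOf", SMING + "Float"))
    # Common parameter-block statements become properties whose domain is
    # rdfs:Class and whose range is a literal.
    for statement in ("description", "default", "format", "units"):
        prop = SMING + statement
        triples.add((prop, RDF + "type", RDF + "Property"))
        triples.add((prop, RDFS + "domain", RDFS + "Class"))
        triples.add((prop, RDFS + "range", RDFS + "Literal"))
    return triples
```

Holding the schema as a set of triples mirrors the paper's point that agents only need to store and match triples rather than perform inference.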


While the root node has the class ID zeroDotZero, other nodes have corresponding IDs of the form ‘Parent.Key’. Based on the above definitions, a ‘scalar’ statement is an RDFS class definition in the following form.

[RDFS listing: the markup was lost in extraction. The surviving element content is, in order: notifyonly, readonly, readwrite; “some default value of this scalar variable”; “display format for this variable”; “descriptor of units of this variable”; “status word”; “description lines”; “reference statement”; noaccess, notifyonly, readonly, readwrite.]

The property access defines the access level as a selection list. For a whole table, including rows and columns, and for the ‘notification’ statement, see Appendix A. The ‘extension’, ‘group’ and ‘compliance’ statements of SMIng modules can easily be translated into RDFS classes as well. In summary, the information elements of resource objects can be mapped onto RDFS classes or properties, with new keywords introduced into the sming namespace.

3.1.2. RDF descriptions of MIB

Instances of SMIng modules are implementations of a MIB, which is abstracted as a set of statements that declare managed object variables within a specific managing or managed entity at a specific point in time (see Section 2.1); it also maps the identifiers of scalar or columnar object instances onto values of specific data types [22]. We assume that the RDFS descriptions of certain SMIng modules, constructed by the methods described in the preceding section, are located in the namespace xmlns:agent = “http://www.it.swin.edu.au/centres/cicec/smingmodule/rdf-schema”, a reification of xmlns:sming. For a scalar object, the RDF description of its instance is then as follows, where agent is a service, processID:port is a process and its interface, and sming:time and sming:value are properties whose rdfs:domain is ScalarVariableID or ColumnID.

[RDF listing: the markup was lost in extraction. The surviving element content is a time stamp of the form yy:mm:dd:hh:mm:ss and “value of certain DataType”.]
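Since the original listing for a scalar instance is not available here, the following Python sketch only suggests what such a description might contain. The subject URI layout (an "agent://" scheme followed by host, processID:port and the instance name), the "#" separators and the function name are all hypothetical; only the sming:time and sming:value properties and the yy:mm:dd:hh:mm:ss time format come from the text.

```python
from datetime import datetime

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# "#" separators are assumptions of this sketch.
SMING = "http://www.it.swin.edu.au/centres/cicec/sming-schema#"
AGENT = "http://www.it.swin.edu.au/centres/cicec/smingmodule/rdf-schema#"


def describe_scalar_instance(host, process_id, port, variable_id, value, when):
    """Return triples describing one scalar instance at one point in time.

    The subject URI layout is hypothetical: service "agent", then the host,
    then processID:port, then the instance name (variable id + ".0").
    """
    subject = "agent://{}/{}:{}/{}.0".format(host, process_id, port, variable_id)
    timestamp = when.strftime("%y:%m:%d:%H:%M:%S")  # yy:mm:dd:hh:mm:ss
    return [
        (subject, RDF_TYPE, AGENT + variable_id),
        (subject, SMING + "time", timestamp),
        (subject, SMING + "value", value),
    ]
```

A call such as `describe_scalar_instance("host.example", 42, 161, "sysUpTime", 123456, datetime(2003, 1, 2, 3, 4, 5))` yields three triples that bind the instance to its class, time stamp and value.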


The RDF description of instances of columnar objects is given in Appendix B. All object variable resources of the SNMP architecture can be described in the RDF framework, with RDFS corresponding to SMIng modules and the RDF model specification mapping SMIng instances, that is, the MIB. The RDF framework inherently implies methods and forms for the description, storage, transfer and access of resource information. First, RDF has a complete and scalable syntax suitable for describing all types of network resources. Secondly, the formal logic model of RDF is a set of triples, which also determines the physical format of the information base. Moreover, RDF-based entities encode messages in XML, communicate over HTTP and process information with the help of different query languages. As XML has already been introduced as a carrier of SMIng modules, these benefits of RDF(S) may play a promising role.

3.2. RDF-based MKBs

3.2.1. Specification of intelligent management elements

A relational MIB, developed on the basis of the traditional SMI, may be extended to an MKB with the rich semantic capabilities of RDFS; similarly, traditional network management agents can be extended to IAs with certain knowledge processing capabilities. Assuming that a centralised network management expert system consists of a predicate set, a rule set and an action script set, every agent in a multi-agent environment contains some of these basic elements. Here we apply RDF Context and FIPA-RDF to describe an agent’s knowledge base in the following format, which corresponds to the agent’s MKB.

[RDF listing: the markup was lost in extraction and only ellipses survive.]

Note that sming:value becomes a basic predicate of the MKB: the description of a MIB variable instance becomes a proposition specifying the subject–predicate–object and truth-value relationship, as follows, where columnar variables are distinguished by the column values of different rows using rdf:Seq tags.

[RDF listing: the markup was lost in extraction. The surviving element content is, in order: agent:VariableID, sming:value, “value of certain Data Type”, true.]
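The idea of a proposition carrying a subject, predicate, object and truth value, listed in a context by asserts or assumes, can be sketched in Python as follows. The class and method names are hypothetical and merely echo rdfc:Context, rdfc:asserts and rdfc:assumes; they are not the authors' implementation or any library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A subject-predicate-object statement with a truth value."""
    subject: str
    predicate: str
    obj: object
    truth: bool = True


class Context:
    """A minimal stand-in for a context that lists asserted facts and
    assumed propositions (names are illustrative only)."""

    def __init__(self):
        self.asserted = set()
        self.assumed = set()

    def asserts(self, proposition):
        self.asserted.add(proposition)

    def assumes(self, proposition):
        self.assumed.add(proposition)

    def holds(self, subject, predicate, obj):
        """True if the statement has been asserted as true in this context."""
        return Proposition(subject, predicate, obj, True) in self.asserted
```

An agent would, for example, call `ctx.asserts(Proposition("agent:ifInErrors.2", "sming:value", 7))` for each MIB variable instance and later test `ctx.holds(...)` when evaluating rule premises.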


The richer predicates of the MKB may replace sming:value in order to describe more complex relationships between managed resource objects. The MKB applies rdfc:asserts or rdfc:assumes to list facts or propositions, which are the basis of the agent’s reasoning. A proposition with a specific predicate may describe terms (statements). Similarly, the operations on managed objects can be expanded with new management action scripts beyond GET, SET or RESPONSE. Note that FIPA [11] also defines actions in the following format.

[RDF listing: the markup was lost in extraction. The surviving element content is, in order: ActorName, ActionName, ArgumentValue, ScriptLanguage, ScriptMapRDF, ScriptURI, ScriptLines.]

3.2.2. Example of fault management rule

Compared with Script MIB, our knowledge base system for network management can provide more diverse functions more flexibly. For example, within FIPA-RDF, rules, as the core of reasoning, are regarded as compositions of two parts: selection and manipulation. fipa:selection selects resources according to specific predicate expressions with the SQL-like RDF Query specification,⁸ while fipa:manipulation describes the corresponding actions on these resources, such as updating the values of variables. The following is a complete example for classifying faults of network interfaces.

[RDF listing: the markup was lost in extraction. The surviving element content is, in order: agent:ifInErrors, 1000, bad-interface.]

In the above example, if certain rows in a MIB-II ifTable have ifInErrors values greater than 1000, then the interfaces of these rows are regarded as failure-prone bad interfaces. In most cases, fipa:selection-result corresponds to rdf:Bag or rdf:Seq resources. In the MKB context, every scalar managed object has only one instance, so rules about scalar variables scarcely include ‘selection’ semantics. A simpler description of rules is given below.

[RDF listing: the markup was lost in extraction. The surviving element content is “#ClassExpression” followed by ellipses.]

The ClassExpression above can easily be represented with the logic operators of OIL to describe the premises of rules; see Appendix C for an example. The management process of every agent is the repeated application of fipa:Rule or sming:Rule for reasoning, or the invocation of fipa:Action to operate on sets of fipa:Proposition, which are defined by rdfc:asserts or rdfc:assumes within a certain rdfc:Context.

3.2.3. Implementation

Our prototype is based on MCT (Mobile Code Toolkit), which was developed at Carleton University, Canada [4]. MCT offers basic functions for managing and programming mobile agents. Our mobile agents run on top of the TCP/IP/SNMP protocols and also JMX⁹ (Java Management eXtensions). As mentioned in Section 1, JatLite helps us process KQML performatives and messages, which carry the contents of dialogs between agents. The KQML message parameters are redefined in our context: :ontology refers to SMIng modules and :language to RDF and its extensions. Besides SiRPAC (Simple RDF Parser and Compiler),¹⁰ there are also Java-based toolkit packages available for processing the RDF syntax, for example RDF API¹¹ and Jena,¹² which may run on top of popular XML parsers such as SAX¹³ and Xerces.¹⁴ The whole implementation model at the current stage is shown in Fig. 1, where Java RMI acts as a platform for agent mobility and Router/ANS, a component of JatLite, helps route messages by agents’ names. The CIA (coordinating intelligent agent) maintains the global knowledge of a group of IAs, which communicate with and understand each other in the SMI or SMIng ontologies. The dialogs are encapsulated packets with RDF(S) and XML headers as well as KQML performatives. As for concrete management based on rules and ontologies, our prototype supports heterogeneous implementations of management agents. RDF-enabled agents not only communicate with each other to exchange basic management information but also understand each other through explicit semantics. For example, special requests and inferences for coordination, which solely query and manipulate static variables, add value to the traditional network management framework.

Fig. 1. Multi-agent implementation model. [Figure not reproduced; labelled components: IA, CIA, SMI Onto, SMIng Onto, Globe MKB, Globe MIB, RDF API, Jena, SAX, XML Serv, Xerces, JatLite, Router ANS, MCT/RMI, RMIClient, TCP/IP/SNMP/JMX.]

⁸ http://www.w3.org/TandS/QL/QL98.
⁹ http://java.sun.com.
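Returning to the fault management rule of Section 3.2.2, its two-part structure (selection, then manipulation) can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the table is a plain dictionary keyed by row index, the function names are hypothetical, and the "status" column that receives the bad-interface label is invented for illustration, since the original listing does not show which variable is updated.

```python
ERROR_THRESHOLD = 1000  # ifInErrors value above which an interface is "bad"


def select(rows, column, predicate):
    """Selection part: return the indices of rows whose column value
    satisfies the predicate (analogous to an ordered selection result)."""
    return [index for index, row in sorted(rows.items()) if predicate(row[column])]


def manipulate(rows, selected, column, value):
    """Manipulation part: update a column on every selected row."""
    for index in selected:
        rows[index][column] = value


def classify_faults(if_table):
    """Apply the rule: ifInErrors > 1000 implies bad-interface."""
    selected = select(if_table, "ifInErrors", lambda v: v > ERROR_THRESHOLD)
    manipulate(if_table, selected, "status", "bad-interface")  # hypothetical column
    return selected
```

Keeping selection and manipulation as separate steps makes it straightforward to reuse one selection with several actions, which is the flexibility the text contrasts with Script MIB.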

¹⁰ http://www.w3.org/RDF/Implementations.
¹¹ http://www-db.stanford.edu/∼melnik.
¹² http://www-uk.hpl.hp.com/people/bwm/rdf/jena.
¹³ http://megginson.com/SAX.
¹⁴ http://xml.apache.org/xerces-j.

4. Case study 2: ontology integration in workflows

The following requirements on the ontology and on a suitable description language remain to be addressed, and we believe that RDF-based languages can meet them:

• The basic vocabulary should act as a proper interlingua with a well-formed scope, which should be not only necessary but also sufficient. XPDL, for example, in its current standard shape can only form a base for ontological use.
• The degree of formality should be reasonably modest. XML alone falls short because of the semantic ambiguity it inherently brings. Formal or semi-formal languages such as PSL and KIF are possible options, but are too complicated to be applied flexibly in the implementation and deployment of workflow systems.
• Explicit semantics should be as simple as possible, yet provide enough extensibility whenever more complex logic or constraints need to be expressed. However, we would not bind special axioms to the properties of, or relations between, entities and/or classes.
• Unified syntax and universal distribution should be guaranteed intrinsically. This requirement makes communication, discovery and distribution of ontology data possible to the greatest extent.
• Performance can be optimised in comparison with semi-structured XML databases. It should also be easy to represent data dependency and reduce data redundancy.

We have proposed the framework in Fig. 2 [21] to illustrate the contribution addressed in this section. For data integration, three levels exist along the vertical dimension, where the ontology level explicitly describes and represents abstract semantics beyond processes. Integration tools, such as the CORBA platform and e-service technologies like the Simple Object Access Protocol (SOAP) and the Web Services Description Language (WSDL), ease the realisation. Given the degree of formality required, we believe that the RDF-based ontology is easier to realise when considering the overall requirements of workflow environments, though PSL or some types of Petri net are complementary. In this section, we first sketch what is to be considered and included in our prototype ontology, followed by an example mapping of workflow constructs. Finally, the prototyping is discussed.

Fig. 2. Framework for workflow data integration. [Figure not reproduced; labels: CORBA Services, Web Services (SOAP/WSDL), Ontology Level, PSL, Petri Net, RDF-Based Descriptions, Process Level, WfXML, XPDL, Application Level, XML, xCBL, ebXML, cXML, B2B Spec…]

4.1. Description of workflow ontology using RDF tools

Bajaj and Ram [2] stated that their SEAM model captured different aspects of workflow and presented it as an amalgamation of current models. Their contribution lies in system development from the database modelling point of view, although their considerations cover most constructs that should be included in a process description. However, besides Entity, State and Activity in their proposal, the most contentious and least settled construct is temporal modelling and representation. Fortunately, PSL includes rich axiomatic paradigms for describing timing constraints and relationships between activities. From the complexity point of view, rules are the most difficult thing to model and manage, especially since the Event Condition Action (ECA) mechanism is widely adopted in workflow systems.
In fact, on a distributed object platform, Kappel et al. [14] had also incorporated rules together with roles as another important concept. Similar to their special handling of the class of roles, Fan et al. [9] even argued for keeping the role concept overlapping with, but also separate from, entities, whose purposes in activities should be represented by proper roles. Through comprehensive research, we have reached a tentative agreement on the workflow ontology. The basic class hierarchy of rdfs:Resource (rdfs:Class) is shown in Fig. 3. There are some meta-classes, such as Activity, Entity, Role, Rule and TimePoint, which describe common constructs within a generic workflow system. Furthermore, subclasses such as ComposedActivity, Loop, human, agent, auditor and compensator, which represent characteristic kinds of conceptual entities, are regarded as necessary. For example, one human-like autonomous agent program may play the role of a compensator that is responsible for exception handling. In Fig. 3, proj 01 and task 1001 are represented as flat activities, where the subsumption relationship has been hidden. This meta-level method is appropriate for routine processes, so that every occurrence or instance of a certain activity takes such a description for granted as a template. When the processes are ad hoc, the designer may instead choose to represent every activity at the instance level rather than the class level; the same applies to the descriptions of events and states (classes Event and State).

Fig. 3. Basic class hierarchy in RDFS.

Besides the core properties of workflow resources, such as activityName, executedBy and participatedBy, the most important relations between process activities are sequences, loops and branches. The basic ordering relations are nextActivity and its inverse, previousActivity. nextActivity is a linear and transitive relation, guaranteeing, for example, that every occurrence of task 0001 must be prior to that of task 1001. We divide transitions into two disjoint classes, Join and Split, which are connected to other activities by joinFrom and splitTo. Each class or instance of transition may be a composition of activities through connectors such as AND, OR and XOR. The following is an excerpt of our RDF model with the OIL syntax.

[RDF/OIL listing not preserved in this text.]
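The authors' RDF/OIL excerpt is not available in this text, so the following Python sketch is offered only to make the semantics of the ordering relations concrete; it is not a rendering of that excerpt. It treats nextActivity as a set of pairs, derives its transitive closure, and obtains previousActivity as the inverse. The identifiers use underscores and the intermediate activity task_0501 is invented for illustration.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def inverse(pairs):
    """previousActivity is the inverse of nextActivity."""
    return {(b, a) for a, b in pairs}


def precedes(next_activity, first, second):
    """True if every occurrence of `first` must be prior to `second`."""
    return (first, second) in transitive_closure(next_activity)


# Directly stated nextActivity pairs (task_0501 is a made-up intermediate step).
next_activity = {("task_0001", "task_0501"), ("task_0501", "task_1001")}
```

With these pairs, `precedes(next_activity, "task_0001", "task_1001")` holds even though the two activities are not directly linked, which is what declaring the property transitive buys.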


Table 1 shows the mapping of concepts and relations from our workflow ontology onto our XML-based prototype and two other specification languages. The mapping is not one-to-one but rather overlapping. The following section gives an example that illustrates the description of Rule.

4.2. Example of ECA rules described by RDF languages

When workflow activities are instantiated, they maintain and update their states, and publish and subscribe to events. Both states and events are fixed by a time stamp, so that semantics such as ‘state unenacted holds within a time duration of interval’ or ‘event1 occurs at a time point’ are clearly represented. Moreover, every reified RDF description of an activity instance should explicitly include entity-value pairs and related events, as well as timing information if necessary. Above all, RDF and RDFS can hardly be regarded as a strict combination of a type system, an entity-relationship model and first-order predicate logic. Beyond range and domain constraints and relationships such as set-element, attribute-value or class-instance, they cannot easily describe the logic constraints that an ontology specification requires [8]. For example, the description of an ECA rule requires more semantic expression capabilities than we have discussed so far. The following excerpt gives some options, where part (a) introduces some specific namespaces such as rdfq (see Section 3.2.2) and wfonto, besides fipa. Here, terminateException is an event that may trigger corresponding actions such as updating currentState. To enable the action, certain conditions must be satisfied in advance. For task 1001, if the activatedInstance has been inactive for, say, more than 48 h, we might regard it as an unrecoverable task, and some compensation actions should perhaps be initiated. RDF queries should be supported in this situation; however, there is as yet no standard solution for them.
Other ways of describing rules are shown in parts (b) and (c), which leave the complex semantic representation to the implementation or to artefacts such as ClassExpression or Context. As XPDL and PSL also disregard this expressive mechanism in their newest versions, we would like to mention some efforts such as the XML Declarative Description, which may denote variables flexibly beyond RDFS [25].

[Excerpt with parts (a), (b) and (c); the RDF/XML markup did not survive extraction. Remaining fragments, in order: wfonto:task 1001 48 wfonto:unrecoverable (a) ... (b) “#ClassExpression” ... ... (c)]

Table 1
Mappings between languages

| Workflow ontology | XML prototype | XPDL | PSL |
|---|---|---|---|
| activity class | project, task | Process, Activity | activity |
| Entity | – | – | object |
| human | people | Participant | – |
| agent, application | tool | Application, Data | – |
| Role | – | ParticipantType | – |
| compensator | – | – | Repairable-fluent |
| State | status | Attribute, Parameter | fluent, exists-at |
| Event | – | – | Activity-occurrence |
| TimePoint | – | Valid Date | timepoint |
| Join, Split, Loop | dependent | Transition Restriction | Junctions |
| activityName | id, name | Id, Name | ?variable |
| FinishTime, StartTime | finish date, start date | Valid To, From Date | endof, beginof |
| nextActivity | dependent | From, To | next-activity |
| joinFrom, splitTo | dependent | TransitionRef | Junction |
| participatedBy | username | Responsible, Performer | participate-in |
| Rule | – | Condition, Subflow | Achievement |
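To make the rule for task 1001 more concrete, the following minimal Python sketch evaluates an event-condition-action rule over a small set of triples. It is only an illustration and does not reproduce the RDF syntax of the excerpt: the class EcaRule, the helper functions, the identifier wfonto:task1001, the property wfonto:inactiveHours and the state wfonto:activated are hypothetical names introduced here, whereas terminateException, currentState, the 48 h threshold and wfonto:unrecoverable come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, object]  # (subject, predicate, object)


@dataclass
class EcaRule:
    """Event-condition-action rule; all three parts are supplied by the caller."""
    event: str
    condition: Callable[[Set[Triple], str], bool]
    action: Callable[[Set[Triple], str], Set[Triple]]


def value_of(triples, subject, predicate):
    """Return the object stored for (subject, predicate), or None if absent."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def inactive_too_long(triples, task):
    # Condition: the instance has been inactive for more than 48 h.
    # wfonto:inactiveHours is a hypothetical property used for illustration.
    hours = value_of(triples, task, "wfonto:inactiveHours")
    return hours is not None and hours > 48


def mark_unrecoverable(triples, task):
    # Action: replace the task's currentState with wfonto:unrecoverable.
    kept = {t for t in triples
            if not (t[0] == task and t[1] == "wfonto:currentState")}
    return kept | {(task, "wfonto:currentState", "wfonto:unrecoverable")}


def apply_rule(rule, event, task, triples):
    """Fire the rule only if the event matches and the condition holds."""
    if event == rule.event and rule.condition(triples, task):
        return rule.action(triples, task)
    return triples


rule = EcaRule("wfonto:terminateException", inactive_too_long, mark_unrecoverable)
store = {
    ("wfonto:task1001", "wfonto:currentState", "wfonto:activated"),
    ("wfonto:task1001", "wfonto:inactiveHours", 50),
}
updated = apply_rule(rule, "wfonto:terminateException", "wfonto:task1001", store)
```

Keeping the condition and the action as separate callables mirrors the separation of concerns in an ECA rule; a full solution would still need an RDF query mechanism to evaluate the condition, which the text notes is not yet standardised.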


Fig. 4. Properties of a to-do activity.

4.3. Prototyping

As described elsewhere, we have already used XML files [28], instead of the relational database used in the previous workflow prototype [26,27], as the universally accessible, portable data repository for data integration. Now, with the ontology tool OilEd,15 we can view and implement workflow systems from a different point of view. For example, task 1001 is defined as a ComposedActivity in the RDF-based workflow ontology (Fig. 4). It is followed by an AND-Join transition2 in parallel with task 1101 (see Section 4.1), participatedBy the individual person Jones, who is also an instance of both human and auditor, and executedBy the individual application tool Java. The semantics of workflow entities and relationships thus become clearer and easier to handle. RDF files may be stored as sets of triples in relational databases, parsed by XML/RDF parser engines and analysers, and accessed through URI addresses and anchors.16 We should note that Java-based packages and some open-source toolkits, which are available within RDF special interest groups, offer greater interoperability to our implementation of workflow data integration.
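The idea of storing RDF descriptions as sets of triples in a relational database can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the single-table schema, the wf: prefix and the helper objects_of are hypothetical, Jones's membership of human and auditor is modelled with rdf:type, and the triples merely restate the properties of task 1001 described above rather than the prototype's actual data.

```python
import sqlite3


def build_store(triples):
    """Create an in-memory table with one row per (subject, predicate, object)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def objects_of(conn, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    rows = conn.execute(
        "SELECT o FROM triples WHERE s = ? AND p = ? ORDER BY o",
        (subject, predicate),
    )
    return [row[0] for row in rows]


# Hypothetical identifiers restating the description of task 1001.
triples = [
    ("wf:task1001", "rdf:type", "wf:ComposedActivity"),
    ("wf:task1001", "wf:participatedBy", "wf:Jones"),
    ("wf:task1001", "wf:executedBy", "wf:Java"),
    ("wf:Jones", "rdf:type", "wf:human"),
    ("wf:Jones", "rdf:type", "wf:auditor"),
]
conn = build_store(triples)
participants = objects_of(conn, "wf:task1001", "wf:participatedBy")
jones_types = objects_of(conn, "wf:Jones", "rdf:type")
```

A single three-column table is the simplest possible layout; a real repository would probably add indexes and separate tables for namespaces or literals.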

15 http://www.ontoknowledge.org/oil.
16 http://www.w3.org/RDF/Interest.

5. Discussions

The two case studies in this paper address the knowledge representation requirements of cutting-edge distributed cooperative systems: an agent-based network management information model and a process ontology for workflow data integration.

In the first case study, we have based our work upon SMIng, a prospective SMI proposed by the IRTF’s network management research group as a next-generation information model for network management. However, aiming at the deployment of multi-agent systems in network management environments, we have been trying to establish a novel lightweight knowledge model based on RDF and its extensions. With the SMIng namespace acting as a seed, mappings from SMIng modules and related MIBs onto RDFS definitions of classes, properties and related descriptions have become feasible. Then, the elements of the MKB, especially the rule bases and the action scripts, can be described by extensions of RDF or RDFS, such as RDF Context, FIPA-RDF and OIL. Moreover, specific RDF query languages are necessary to support the selection and manipulation of rules. Finally, our implementation model integrates Java-based tools at different levels in order to coordinate agents more easily. However, an abstract knowledge model, namely a network management ontology, is merely a small step towards intelligent management. To meet the future requirements of knowledge sharing and processing among managing agents, we need not only to fix the management information model but also to absorb results from the semantic web and knowledge management areas, from which RDF stems.

On the other hand, due to the limitations of the XML solution to data integration for data sharing and exchange in workflow systems, we believe that data integration should incorporate ontology engineering. In comparison with other similar research work, we have focused on innovative RDF-based descriptions of entities and relationships for workflow processes. Our framework clearly shows the corresponding issues and relations among data integration levels. The concept space that we have sketched out at this stage may form a basis for the construction of the most common and conventional workflow ontologies, although some constructs, such as parts of ECA rules, remain ambiguous. In our prototype, diverse tools have been integrated to support ontology development and analysis. With our framework, mappings among different process description languages are relatively easy to implement.

As shown in this paper, RDF-based languages and tools are promising options for data integration in a distributed cooperative environment, such as the construction of a workflow ontology. RDF mechanisms have been incorporated into existing workflow management systems in order to improve interoperability and knowledge sharing among peer entities. RDF schemas may also bring together different vocabularies and semantics from different e-commerce or cooperation areas.
With the wide-ranging deployment of web services and the semantic web, applying RDF to data integration in workflow systems becomes inevitable. In the future, we need to refine our ontology and its representation language, especially for unambiguous descriptions of complex elements such as rules and time points. We also need to investigate data-centric application integration further for workflow systems.

In summary, there are different ways to build RDF-based applications. One way is to extend an existing standard information model directly, with direct mappings from its entities onto concepts. Up to this stage, the role of RDF is quite similar to that of XML. Beyond that, in order to describe more complex predicates, relations and roles, RDF schemas offer more powerful definition mechanisms for establishing a common knowledge environment. The other way is to integrate a set of existing description languages. The developers need to select a basic common vocabulary in advance, so that feasible mappings between the languages can be supported through this interlingua. It then becomes more natural to describe ontology elements.

An important issue is to distinguish the abstraction levels among overall entities and concepts, as well as roles and rules. The inherent class-instance relations defined by RDFS and RDF guarantee a well-structured framework to outsource metadata and data. In our cases, network management information modules describe meta-level variables based on the RDFS class structure, so that dynamic updates and queries upon MIB data can be extended to complex processing and inference on MKB knowledge based on the RDF syntax. Similarly, workflow engines parse process descriptions to extract process logic, and activate task enactment through organising and exchanging RDF documents. Both network management agents and workflow engines only need to reason upon RDFS definitions and can disregard individual instance activations.
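The second approach described above, integrating existing description languages through a common vocabulary, can be sketched as a two-step lookup via the workflow ontology. The few entries below are taken from Table 1; the dictionary layout and the functions to_ontology and translate are assumptions made for illustration, and multi-term cells are kept as single strings.

```python
from typing import Dict, Optional

# Workflow-ontology term -> corresponding term in each language.
# Entries are taken from Table 1; None marks a cell with no counterpart.
MAPPINGS: Dict[str, Dict[str, Optional[str]]] = {
    "State": {"XML": "status", "XPDL": "Attribute, Parameter", "PSL": "fluent, exists-at"},
    "TimePoint": {"XML": None, "XPDL": "Valid Date", "PSL": "timepoint"},
    "nextActivity": {"XML": "dependent", "XPDL": "From, To", "PSL": "next-activity"},
    "compensator": {"XML": None, "XPDL": None, "PSL": "Repairable-fluent"},
}


def to_ontology(term: str, language: str) -> Optional[str]:
    """Find the ontology concept whose entry for `language` equals `term`."""
    for concept, by_language in MAPPINGS.items():
        if by_language.get(language) == term:
            return concept
    return None


def translate(term: str, source: str, target: str) -> Optional[str]:
    """Translate a term via the ontology; None if either step has no mapping."""
    concept = to_ontology(term, source)
    if concept is None:
        return None
    return MAPPINGS[concept].get(target)


psl_term = translate("status", "XML", "PSL")
```

With the ontology as the pivot, each language needs only one mapping table instead of one per language pair, which is the practical benefit of choosing the common vocabulary in advance.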

6. Conclusions

The rich semantic power and the extensions of RDF(S) allow us to describe knowledge and ontologies in much broader areas. In next-generation distributed environments, knowledge sharing and data interoperability become critical issues. Traditional languages and tools, such as XML, lack the essential description capabilities for complex relations and rules between computing entities.

The major contribution of this paper lies in its investigation of methods for novel RDF applications. In order to construct knowledge modules or ontology repositories, we should first distinguish and select a basic information model and its vocabularies. By incorporating new logical expression and predicate property elements, we may establish a foundational description framework for each resource domain: a single complete set of concepts and relations, including explicit semantics. Then, according to the abstraction levels, different RDF or RDFS descriptions for different entities should be determined, making use of diverse RDF extensions. In general, an implementation object with an identifier is an individual instance of a meta-level class. Finally, system prototyping helps us take advantage of effective manipulation of RDF-based data.

Our experience in extending RDF applications suggests that RDF(S) languages are promising tools for data and knowledge integration in the future. In both application areas addressed in this paper, our efforts are comparatively innovative and effective. At present, we are implementing an RDF-based data repository within a peer-to-peer workflow system, where knowledge exchange between peers is critical.

Acknowledgements

We wish to thank the anonymous referees for their suggestions to improve this paper. This work is partly supported by the Swinburne Vice Chancellor’s Strategic Research Initiative Fund 2002–2004. We are also grateful to many members of CICEC for constructive discussions.

Appendix A

For the ‘notification’ statement, with objectIdentifier-i corresponding to the name of the ith object variable, the description looks like the following.

[RDF/XML listing; the markup did not survive extraction. Remaining fragments: resource="#objectIdentifier-1" ... resource="objectIdentifier-n", status word]

The description of a whole table, including rows and columns, is in the following format.

[RDF/XML listing; the markup did not survive extraction. Remaining fragments: yes; noaccess, notifyonly, readonly, readwrite]

Appendix B

The RDF description of instances of columnar objects looks like the following.

[RDF/XML listing; the markup did not survive extraction. Remaining fragments: a time stamp of the form yy:mm:dd:hh:mm:ss and value entries]

Appendix C

The following format is a representation of the logic expression

¬((pred1, Variable1, obj1) ∨ ((pred2, Variable2, obj2) ∧ (pred3, Variable3, obj3)))

[RDF/XML listing; the markup did not survive extraction. Only the leaf values of the three statements, from Variable1 pred1 obj1 to Variable3 pred3 obj3, remain.]
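The nested structure of this expression can also be sketched outside RDF. The Python fragment below builds the negation of a disjunction whose second operand is a conjunction, and evaluates it against a set of facts. The classes Atom, Not, And and Or, the holds method and the sample fact sets are assumptions for illustration and do not reproduce the RDF format of the listing.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Statement = Tuple[str, str, str]  # (predicate, variable, object), as in the formula


@dataclass(frozen=True)
class Atom:
    statement: Statement

    def holds(self, facts: FrozenSet[Statement]) -> bool:
        # An atomic statement holds if it is among the known facts.
        return self.statement in facts


@dataclass(frozen=True)
class Not:
    operand: object

    def holds(self, facts: FrozenSet[Statement]) -> bool:
        return not self.operand.holds(facts)


@dataclass(frozen=True)
class And:
    left: object
    right: object

    def holds(self, facts: FrozenSet[Statement]) -> bool:
        return self.left.holds(facts) and self.right.holds(facts)


@dataclass(frozen=True)
class Or:
    left: object
    right: object

    def holds(self, facts: FrozenSet[Statement]) -> bool:
        return self.left.holds(facts) or self.right.holds(facts)


t1 = ("pred1", "Variable1", "obj1")
t2 = ("pred2", "Variable2", "obj2")
t3 = ("pred3", "Variable3", "obj3")

# NOT (t1 OR (t2 AND t3)), matching the nesting of the expression above.
expression = Not(Or(Atom(t1), And(Atom(t2), Atom(t3))))

holds_when_only_t2 = expression.holds(frozenset({t2}))
```

Each node class corresponds to one logical operator, so the tree mirrors the bracketing of the formula; an RDF encoding would nest descriptions in the same way.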

References

[1] W.M.P. van der Aalst, H.M.W. Verbeek, A. Kumar, XRL/Woflan: verification of an XML/Petri-net based language for inter-organizational workflows, in: K. Altinkemer, K. Chari (Eds.), Proceedings of the Sixth INFORMS Conference on Information Systems and Technology (CIST-2001), Linthicum, MD, 2001, pp. 30–45.
[2] A. Bajaj, S. Ram, SEAM: a state–entity–activity-model for a well-defined workflow development methodology, IEEE Trans. Knowl. Data Eng. 14 (2) (2002) 415–431.
[3] T. Berners-Lee, R. Fielding, L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, IETF, 1998.
[4] A. Bieszczad, Mobile agents for network management, IEEE Commun. Surv. 1 (1998). http://www.comsoc.org/pubs/surveys.
[5] C. Breugst, T. Magedanz, Mobile agents - enabling technology for active intelligent network implementation, IEEE Network 12 (5) (1998) 53–60.
[6] J. Case, K. McCloghrie, M. Rose, S. Waldbusser, Structure of management information for version 2 of the Simple Network Management Protocol, RFC 1902, IETF, 1996.
[7] D. Connolly, F. van Harmelen, I. Horrocks, D.L. McGuinness, L.A. Stein, DAML+OIL reference description, World Wide Web Consortium Note, 2001. http://www.w3.org/TR/daml+oil-reference.
[8] S. Decker, S. Melnik, F. van Harmelen, D. Fensel, M. Klein, J. Broekstra, M. Erdmann, I. Horrocks, The semantic web: the roles of XML and RDF, IEEE Internet Comput. 4 (5) (2000) 63–74.
[9] J. Fan, K. Barker, B. Porter, P. Clark, Representing roles and purpose, in: Proceedings of the First International Conference on Knowledge Capture (K-Cap’01), ACM Press, 2001, pp. 38–43.
[10] T. Finin, R. Fritzson, D. Mckay, R. McEntire, KQML as an agent communication language, in: Proceedings of the Third International Conference on Information and Knowledge Management (CIKM’94), ACM Press, 1994, pp. 456–463.
[11] FIPA Content Language Library, FIPA 99 Specification, v 0.2, Part 18, Foundation for Intelligent Physical Agents, Geneva, Switzerland, 1999. http://www.fipa.org.
[12] I. Horrocks, D. Fensel, J. Broekstra, S. Decker, M. Erdmann, C. Goble, F. van Harmelen, M. Klein, S. Staab, R. Studer, E. Motta, The ontology inference layer - OIL, Technical Report of On-To-Knowledge Project in the IST Program, 2000. http://www.ontoknowledge.org/oil.
[13] M. Hsu (Ed.), Special issue on workflow systems, Bulletin of the Technical Committee on Data Engineering, IEEE Comput. Soc. 18 (1) (1995).
[14] G. Kappel, P. Lang, S. Rausch-Schott, W. Retschitzegger, Workflow management based on objects, rules, and roles, in: M. Hsu (Ed.), Data Engineering Bulletin: Special Issue on Workflow Systems, vol. 18, No. 1, 1995, pp. 10–17.
[15] G. Klyne, Context for RDF information modeling, 2000. http://public.research.mimesweeper.com/RDF/RDFContexts.html.
[16] B. Omelayenko, D. Fensel, A two-layered integration approach for product information in B2B electronic commerce, in: Proceedings of the Second International Conference on Electronic Commerce and Web Technologies (EC WEB 2001), LNCS vol. 2115, Springer, Berlin, 2001, pp. 226–239.
[17] F. Riempp, Wide area workflow management: creating partnerships for 21st century, in: D. Diaper, C. Sanger (Eds.), Series of Computer Supported Cooperative Work, Springer, London, 1998, pp. 78–83.
[18] C. Schlenoff, M. Gruninger, F. Tissot, J. Valois, J. Lubell, J. Lee, The Process Specification Language (PSL): Overview and Version 1.0 Specification, NISTIR 6459, National Institute of Standards and Technology, Gaithersburg, MD, 2000.
[19] J. Schönwälder, J. Quittek, C. Kappler, Building distributed management applications with the IETF Script MIB, IEEE J. Selected Areas Commun. 18 (5) (2000) 702–714.
[20] J. Schönwälder, F. Strauß, Next generation structure of management information for the Internet, in: Proceedings of the 10th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM’99), LNCS vol. 1700, Springer, Berlin, 1999, pp. 93–106.
[21] J. Shen, Y. Yang, RDF-based data integration for workflow systems, in: Proceedings of the 13th Australasian Conference on Information Systems (ACIS 2002), Melbourne, Australia, December 2002, pp. 379–390.
[22] J. Shen, Y. Yang, RDF-based knowledge model for network management, in: Proceedings of the Eighth IFIP/IEEE International Symposium on Integrated Network Management (IM 2003), Colorado Springs, CO, Kluwer Academic Publishers, Dordrecht, March 2003, pp. 123–126.
[23] M. Uschold, M. Gruninger, Ontologies: principles, methods and applications, Knowl. Eng. Rev. 11 (2) (1996) 96–137.
[24] Workflow Management Coalition, Workflow Process Definition Interface-XML Process Definition Language, WfMC-TC-1025, v 0.03, Lighthouse Point, FL, 2001.
[25] V. Wuwongse, C. Anutariya, K. Akama, E. Nantajeewarawat, XML declarative description: a language for the semantic web, IEEE Intell. Syst. 16 (3) (2001) 54–65.
[26] Y. Yang, An architecture and the related mechanisms for web-based global cooperative teamwork support, Int. J. Comput. Inform. 24 (1) (2000) 13–19.
[27] Y. Yang, Enabling cost-effective light-weight disconnected workflow for web-based teamwork support, J. Appl. Syst. Stud., Camb., England 3 (2) (2002) 437–453.
[28] Y. Yang, Tool interfacing mechanisms for programming-for-the-large and programming-for-the-small, in: Proceedings of the Ninth Asia Pacific Software Engineering Conference (APSEC02), Gold Coast, Australia, IEEE Computer Society Press, December 2002, pp. 359–365.
[29] H. Zhuge, A knowledge flow model for peer-to-peer team knowledge sharing and management, Expert Syst. Appl. 23 (1) (2002) 23–30.
[30] H. Zhuge, A knowledge grid model and platform for global knowledge sharing, Expert Syst. Appl. 22 (4) (2002) 313–320.
[31] H. Zhuge, Clustering soft-devices in the semantic grid, IEEE Comput. Sci. Eng. 4 (6) (2002) 60–62.

Jun Shen is a Post-Doctoral Research Fellow of the Center for Internet Computing and E-Commerce (CICEC) at Swinburne University of Technology, Melbourne, Australia. His current research interests include ontology engineering, workflow systems, Petri net, network management and semantic web. His efforts are mainly on RDF applications in different areas, especially workflow ontology and network management. He is now a Member of Australian Computer Society, IEEE Computer Society and ACM.

Yun Yang is an Associate Professor and Foundation Director of the CICEC at Swinburne University of Technology, Melbourne, Australia. He is also the Deputy Head (Research) of the School of Information Technology at Swinburne University of Technology. He received a PhD degree in Computer Science from the University of Queensland, Brisbane, Australia. His current research areas include software technology, workflow systems, Internet computing applications, CSCW and e-business processes. He has edited one book and co-authored about 70 papers in international journals and conferences. He is now a Member of IEEE and the IEEE Computer Society.
