Semantic Digital Library Services
Nabonita Guha
Documentation Research & Training Centre, Bangalore (India)

Abstract
To a great extent, 'semantics' depends upon the context in which the user is seeking information. The proposed model aims at delivering information in the light of ontology-based document annotation, user annotation and domain ontology. The present study addresses the problem of semantic interoperability in a digital library environment. In a heterogeneous environment, a mediation-based service is of immense help in resolving ontology-level, schema-level and service-level ambiguity, for which the WSMO framework has been chosen. This study is confined to bringing interoperability at the schema and ontology levels for the domain of agriculture. Further studies can be done on service-level interoperability among various digital library services.
INTRODUCTION
Faster and more meaningful information retrieval has been the driving aim of information retrieval systems since the beginning of automated information retrieval. Database information systems, artificial intelligence and other computer-aided retrieval systems made a very optimistic start. In the Internet era, the Semantic Web came as a model of semantic retrieval in the web environment. Traditional libraries are in a stage of transition towards a library without boundaries, with global access through the Internet. Many information storage and retrieval systems were used for meaningful retrieval in print-based libraries. In a Web environment, the traditional means and techniques for information storage and retrieval need to be modified to suit the changed needs. The classification systems used for book classification have evolved into ontologies that represent domain knowledge in machine-processable form; the cataloguing codes have taken the shape of metadata schemas for the description of web resources. The model of the Semantic Web goes further, to inferencing and proof, which are essential for information retrieval in the web environment. With all these components (XML, RDF, ontology, inferencing, proof, etc.) [1], the vision of meaningful retrieval on the Web seems quite practical. Still, the following two core aspects have to be looked into to achieve meaningful retrieval:
• User's context; and
• Document context
The vision of semantic retrieval can be accomplished with an ontological model of the user's interest areas and a model of the context of the information dealt with in the document. This will make it easier to match the user's context with the document context at the search stage. Due to the distributed and heterogeneous nature of the Web, interoperability at the semantic level has become a great challenge for system developers.
Hence the present study aims at addressing both challenges using the tools and technologies available to implement Semantic Web-based services. The present work intends to bring semantic retrieval to digital libraries, using Semantic Web technologies for key tasks such as ontology-based annotation, semantic mediation and inferencing.

CONTEXT-BASED SERVICE MODEL
[Figure 1 components, with flow labels 1 to 5: Domain Ontology; Classaurus; Inference Engine & Rule Base; Digital Docs with Metadata; Onto-based Document Annotation; Onto-based User Annotation; Search Interface; Context Based Index]
Fig. 1: Proposed Architecture of Context-based Service Model

Semantic Annotation: In the model above, the ontology is used not only for domain knowledge modeling, but also for annotating documents and the registered users of the repository. The context of the information dealt with in a document can be made explicit by document annotation, which can be represented in OWL or WSMO. Similarly, user profiles can be represented as ontology-based annotations.

Classaurus [2]: It is a faceted thesaurus used as a vocabulary control mechanism for automated permuted indexing in traditional bibliographic databases.

METADATA INTEROPERABILITY FOR SEMANTIC RETRIEVAL
There is a wide variety of metadata schemas available for different kinds of digital resources. In their classification table for metadata, Kashyap and Sheth [3] have categorized the various types of metadata under the following broad categories:
• Content independent metadata
• Content dependent metadata
  o Direct content-based metadata
  o Content descriptive metadata
    - Domain independent metadata
    - Domain specific metadata
This categorization makes it quite clear that, to bring semantic interoperability among bibliographic repositories, various types of metadata schemas have to be considered. The JeromeDL [4] project has made an effort to bring semantic interoperability among digital repositories using different bibliographic metadata standards such as Dublin Core, BibTeX and MARC21. The WSMO framework is a well-defined model to bring semantic interoperability among
heterogeneous automated retrieval systems through its key components, such as the Web Service Modeling Ontology (WSMO) [5] and the Web Service Execution Environment (WSMX) [6], including various mediators.

VIRTUAL DOCUMENT DELIVERY SERVICE
The above two approaches are planned to be incorporated into a virtual document delivery service. The proposed service aims at a context-based service that is interoperable at the metadata level. The domain of agriculture has been chosen for implementing this service among various repositories. Various agricultural metadata schemas and thesauri used by these repositories have been studied.
[Figure 2 components: Domain Ontology; Classaurus; Inference Engine & Rule Base; Onto-based Document Annotation; Onto-based User Annotation; Context Matching; Mediation; Digital Docs with Metadata; CAB Thesaurus; AGROVOC; ASFA Thesaurus; WSMX; Service Registry]
Fig. 2: Ontology Mediation and Context Matching for Semantic Document Retrieval

METHODOLOGY
In the above model of virtual document delivery (Fig. 2), the key tasks are service discovery, ontology mediation and context matching. The semantic digital library searches the service registry [7, 8] to locate appropriate repositories. After service discovery, ontology mediators are used to map the ontologies and metadata schemas. The ontology mediator converts the mapped ontology to the native syntax. The retrieved document descriptions are used to generate a context-based index. This context-based index is then matched with the user query and annotation.

RELATED WORKS
Context-based retrieval: Context-based information retrieval has been a major focus of research. The Context Ontology Language (CoOL) [11] is an ontology-based context modeling approach, which uses the Aspect-Scale-Context (ASC) model, where each aspect (e.g. spatial distance) can have several scales (e.g. kilometer scale or mile scale) to express some context information (e.g. 20). Chen et al. [12] propose a context broker architecture (CoBrA) using an ontology to describe persons, places and intentions. Less emphasis is put on the notion of
services and related aspects, such as user interfaces and the mobile devices on which these services are deployed.

Semantic Annotation: To make the content and context of information explicit to the system, many tools and techniques have been developed for semantic annotation of web resources. The KIM semantic annotation platform and the KIM plug-in are among the useful tools available for digital resource annotation. Maedche and Staab [10] have proposed semi-automatic acquisition of ontologies from domain texts. My proposed study aims not only to annotate digital resources semantically but also to interpret annotations described in different annotation languages.

Semantic interoperability in Digital Libraries: The JeromeDL [13] project developed software aiming at bringing personalized services to users, with a search algorithm based on user profiles. My present study also considers the user's interest areas as the key component of the retrieval system for deciding the context of the retrieved information, but here the domain ontology, the user's annotation and the document annotation are supported by the Classaurus mechanism, which makes communication among all three components easier. In the metadata interoperability aspect, too, my study does not aim at introducing any new ontology language but at bringing interoperability among existing ontology languages, thesauri and metadata schemas.

Ontology based mediation in digital libraries: The MarcOnt Initiative [14] is one of the ongoing projects aiming at creating a new bibliographic description standard (MarcOnt) and mediation services to support different legacy bibliographic formats. However, the outcome of the project proposal is yet to come.

FUTURE RESEARCH AND CONCLUSION
Interoperability in a heterogeneous environment is a broad notion encompassing syntactic as well as semantic interoperability. Interoperability at the semantic level is a challenging task.
Context-sensitive query processing over heterogeneous information resources requires the matching of concepts. Vocabularies, semantic relationships and mappings are information objects themselves; their life cycle (creation, acquisition, collection, modeling, identification, integration, mediation, search, use, maintenance, preservation, etc.) is of primary importance and a necessary prerequisite to improved semantic interoperability [9]. Steps should be taken in future research in this regard.
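The concept matching referred to above can be illustrated with a minimal Python sketch. It is an illustration only, under stated assumptions: the thesaurus names, the terms, the shared concept identifiers (`c:corn`, etc.) and the functions `mediate`, `context_score` and `rank` are hypothetical, and the Jaccard overlap used for scoring is a stand-in measure chosen for simplicity, not the matching method of the proposed model. The sketch treats the vocabulary mapping as an explicit object, translates local terms from two vocabularies into shared concepts, and ranks documents by overlap with the user's context.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mappings from local thesaurus terms to shared concept ids.
# Terms and ids are invented for illustration; they are not taken from
# AGROVOC, the CAB Thesaurus or the ASFA Thesaurus.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "thesaurus_a": {"maize": "c:corn", "paddy": "c:rice"},
    "thesaurus_b": {"corn": "c:corn", "rice": "c:rice", "wheat": "c:wheat"},
}


def mediate(source: str, terms: Iterable[str]) -> Set[str]:
    """Translate local terms into shared concepts; unmapped terms are dropped."""
    table = MAPPINGS[source]
    return {table[t.lower()] for t in terms if t.lower() in table}


def context_score(user_concepts: Set[str], doc_concepts: Set[str]) -> float:
    """Jaccard overlap between user context and document context (assumed measure)."""
    union = user_concepts | doc_concepts
    if not union:
        return 0.0
    return len(user_concepts & doc_concepts) / len(union)


def rank(
    user_concepts: Set[str],
    documents: List[Tuple[str, str, List[str]]],
) -> List[Tuple[str, float]]:
    """Rank (doc_id, source_vocabulary, terms) records by descending score."""
    scored = [
        (doc_id, context_score(user_concepts, mediate(source, terms)))
        for doc_id, source, terms in documents
    ]
    # Sort by score (highest first), then by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    # A user annotated with a term from one vocabulary...
    user = mediate("thesaurus_a", ["Maize"])
    # ...is matched against documents indexed with either vocabulary.
    docs = [
        ("d1", "thesaurus_b", ["corn", "wheat"]),
        ("d2", "thesaurus_b", ["rice"]),
        ("d3", "thesaurus_a", ["maize"]),
    ]
    for doc_id, score in rank(user, docs):
        print(doc_id, round(score, 2))
```

Keeping the mapping table separate from the matching function reflects the point made above that mappings are information objects with their own life cycle: the table can be created, maintained and replaced without touching the matching logic.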
REFERENCES
[1] Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. May 17, 2001.
[2] G. Bhattacharya. Fundamentals of subject indexing languages. In: Proceedings of Third International Study Conference on Classification Research. Bombay, Jan. 6-11, 1975. Bangalore: DRTC, 1979, pp. 86-98.
[3] Vipul Kashyap and Amit Sheth. Semantic heterogeneity in global information systems: The role of metadata, context and ontologies. In: Cooperative Information Systems. Academic Press: San Diego, 1998, pp. 139-178.
[4] Sebastian Ryszard Kruk, Stefan Decker, and Lech Zieborak. JeromeDL: Reconnecting digital libraries and the Semantic Web. WWW2005, May 10-14, 2005, Chiba, Japan. http://www.marcont.org/marcont/pdf/www2005_jeromedl.pdf
[5] John Domingue, Dumitru Roman, and Michael Stollberg. Web Service Modeling Ontology (WSMO) - An ontology for Semantic Web services. Position paper at the W3C Workshop on Frameworks for Semantics in Web Services, June 9-10, 2005, Innsbruck, Austria. http://www.w3.org/2005/04/FSWS/Submissions/1/wsmo_position_paper.html
[6] Emilia Cimpian and Michal Zaremba (Eds). Web Service Execution Environment (WSMX). W3C Member Submission 3 June 2005. http://www.w3.org/Submission/WSMX/
[7] Ann Apps. A middleware registry for the discovery of collections and services. In: The First International Conference on e-Social Science, Manchester, UK, 22-24 June 2005. http://epub.mimas.ac.uk/papers/ncess2005/apps-ncess2005.pdf
[8] Eric Lease Morgan, Jeremy Frumkin and Edward A. Fox. The OCKHAM Initiative: Building component-based digital library services and collections. D-Lib Magazine, 10 (11), Nov. 2004. http://dlib.org/dlib/november04/11inbrief.html#FOX
[9] Manjula Patel et al. Semantic interoperability in digital libraries. Task 3: Semantic Interoperability. WP5: Knowledge Extraction and Semantic Interoperability. DELOS2 Network of Excellence in Digital Libraries, July 2004 - June 2005. Project no. 507618, UKOLN, University of Bath.
[10] Alexander Maedche and Steffen Staab. Semi-automatic engineering of ontologies from text. In: Proceedings of the Twelfth International Conference on Software Engineering and Knowledge Engineering (SEKE'2000). Chicago, July 6-8, 2000.
[11] T. Strang, C. Linnhoff-Popien, and K. Frank. CoOL: A Context Ontology Language to enable Contextual Interoperability. In: J.B. Stefani, I. Demeure, D. Hagimont (Eds). Proceedings of the 4th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS2003). Volume 2893 of Lecture Notes in Computer Science (LNCS). Springer Verlag: Paris, 2003, pp. 236-247.
[12] H. Chen, T. Finin, and A. Joshi. An ontology for context-aware pervasive computing environments. Special Issue on Ontologies for Distributed Systems, Knowledge Engineering Review, 2003.
[13] Sebastian Ryszard Kruk, Stefan Decker, and Lech Zieborak. JeromeDL: Reconnecting Digital Libraries and the Semantic Web. http://www.marcont.org/marcont/pdf/www2005_jeromedl.pdf
[14] Sebastian Kruk, Marcin Synak and Kerstin Zimmermann. MarcOnt Initiative Mediation services for digital libraries. http://library.deri.ie/servlet/showDoc?docId=http://library.deri.ie/pages/show.jsp?id=51c4c7ff&chapter=0&view=pdf