
Light-Weight Communal Digital Libraries

K. Maly, M. Nelson, M. Zubair, A. Amrou, S. Kothamasa, and L. Wang
Old Dominion University, Norfolk VA 23592 USA

Rick Luce
Los Alamos National Laboratory Research Library, Los Alamos NM 87544 USA



providers, and on the other hand overcome the reluctance of authors to publish into a digital library (and instead put their work on their website) through user-friendly publishing tools and total control through having all files and metadata reside on the author’s personal machine. This latter characteristic has serious implications regarding reliability. As a demonstration, we provided an initial registration service and a service provider at Old Dominion University. Once an archivelet registered with our registration service, the service provider could harvest metadata from it. The Kepler Group Digital Library that harvests, indexes and provides end-user discovery services is based on the popular Arc system [2].

ABSTRACT We describe Kepler, a collection of light-weight utilities that allow for simple and quick digital library construction. Kepler bridges the gap between established, organization-backed digital libraries and groups of researchers who wish to publish their findings under their own control, anytime and anywhere, yet have the advantage of their personal libraries. The personal libraries, or “archivelets”, are Open Archives Initiative (OAI) compliant and thus available for harvesting by OAI service providers. An author can install a Kepler archivelet on a personal machine in a matter of minutes, and a Kepler group server can be installed in a matter of hours.

We faced a number of issues during the initial deployment of Kepler: the software could not be flexibly customized and deployed for a community, and archivelets were often installed behind firewalls, making it difficult for the service provider to harvest them. Building on our experience with the initial Kepler distribution, we developed a new Kepler system that is easily customized for specific community needs, can be easily populated and managed, and is “open” for development of future services. For the test deployment, we are working with the US Geological Survey (USGS), Los Alamos National Laboratory (LANL), and the Open Language Archives Community (OLAC).

Categories and Subject Descriptors H.3.7 [Information Storage and Retrieval]: Digital Libraries

General Terms Documentation, Performance, Design

1. INTRODUCTION One of the largest obstacles to information dissemination within a user community is that many digital libraries use different, proprietary technologies that inhibit interoperability. The Open Archives Initiative (OAI) addresses interoperability by using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to facilitate the discovery of content stored in distributed archives [1]. Realizing the benefits of OAI, a number of communities are interested in an out-of-the-box solution that will help them deploy OAI-based digital libraries consisting of publishing tools, archives, and search services.

2. ARCHIVELET ARCHITECTURE The original Kepler archivelet implementation [3] supported only Dublin Core (DC) and could not support other metadata formats. We have re-engineered the architecture to be extensible with regard to formats and functions. The new design has a well-defined API specification that defines the functions every module implements and makes available to other modules. Support for a new metadata format requires only the implementation of a metadata driver module. In the Kepler software documentation, we provide developer guidelines for fast and easy implementation of a metadata driver. The following sub-sections describe the features shown in Figure 1.

Building a communal digital library is severely hampered by a lack of tools and software that are easy to use and address the diverse requirements of different communities. In particular, metadata, as the codification of the worldviews that define a community, needs to accommodate varying formats, uses, encodings and pedigrees. Creating a system for communal digital libraries poses a number of challenging research issues: handling community-specific requirements, maintaining ease of use, and providing high reliability.

2.1 Webserver This module creates a server socket and listens for OAI-PMH requests and full-text document requests. If the request is for a document, the document is retrieved from the data folder and served to the requester. If it is an OAI-PMH request, it is parsed, the appropriate method from the OAI-PMH API of the Metadata Manager is invoked, and the results are returned. The server is controllable by the user; at any time the user can turn it on or off. This allows users to control when their collection is available for harvesting.
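The dispatch logic described for the Webserver module can be sketched as follows. This is a minimal illustration rather than Kepler's actual code: the class name `Webserver`, the `/oai` path, the in-memory `documents` dictionary standing in for the data folder, and the HTTP-style status codes are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs


class Webserver:
    """Hypothetical stand-in for the archivelet Webserver module."""

    def __init__(self, oai_handler, documents):
        self.oai_handler = oai_handler  # callable(verb, args) -> response text
        self.documents = documents      # name -> bytes, stands in for the data folder
        self.running = False            # the user can switch the server on or off

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def handle(self, url):
        """Return (status, body) for a document request or an OAI-PMH request."""
        if not self.running:
            return 503, b"server is off"
        parts = urlsplit(url)
        if parts.path == "/oai":
            # OAI-PMH request: parse the query and hand it to the OAI-PMH API.
            args = {key: values[0] for key, values in parse_qs(parts.query).items()}
            verb = args.pop("verb", None)
            return 200, self.oai_handler(verb, args).encode("utf-8")
        # Otherwise treat the path as a full-text document name.
        name = parts.path.lstrip("/")
        if name in self.documents:
            return 200, self.documents[name]
        return 404, b"not found"
```

Keeping the on/off state in a single flag mirrors the point that the user decides when the collection is available for harvesting; a harvester that calls while the server is off simply gets no data.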

Kepler gives publication control to individual publishers, supports rapid dissemination, and addresses interoperability. In Kepler, the OAI-PMH was used to support “personal data providers” or “archivelets”. Archivelets are meant to be “personal pocket libraries” that are on the one hand OAI-PMH compliant data

Copyright 2004 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by a contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. JCDL’04, June 7–11, 2004, Tucson, Arizona, USA. Copyright 2004 ACM 1-58113-832-6/04/0006…$5.00.


Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries (JCDL’04) 1-58113-832-6/04 $ 20.00 © 2004 ACM

the user clicks on publish). These configuration files contain regular expressions for the various metadata elements, values for drop-down lists for elements that use predefined options (e.g., language), and the mandatory/optional status of every metadata element. The Validation module API is invoked whenever the user publishes or edits metadata and ensures that the metadata meet the constraints specified in the configuration files.
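A small sketch may make the three kinds of constraint concrete. The paper says the configuration files are managed on the group server; the Python dictionary below is only a stand-in for such a file, and the element names, patterns, option values, and the `validate` function are assumptions chosen for illustration.

```python
import re

# Hypothetical configuration: one rule per metadata element.
CONFIG = {
    "title":    {"required": True,  "pattern": r".+"},
    "date":     {"required": False, "pattern": r"\d{4}-\d{2}-\d{2}"},
    "language": {"required": True,  "options": ["en", "fr", "de"]},
}


def validate(metadata, config=CONFIG):
    """Return a list of error messages; an empty list means the metadata pass."""
    errors = []
    for element, rule in config.items():
        value = metadata.get(element, "")
        if not value:
            # Empty or absent values only matter for mandatory elements.
            if rule.get("required"):
                errors.append(f"{element}: missing mandatory element")
            continue
        if "options" in rule and value not in rule["options"]:
            errors.append(f"{element}: not one of the predefined options")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{element}: does not match the expected format")
    return errors
```

Returning every violation at once, rather than stopping at the first, suits a publishing form where the user can then correct all fields in one pass.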

2.2 Kepler User Interface This module creates the main archivelet user interface which allows the user to publish/edit metadata, view a list of published items, view full-text documents, start/stop the OAI-PMH server, and do other configuration tasks such as changing the server port and registering with Kepler groups. It communicates with the metadata manager through the UI API.

2.6 Repository This module provides access to the local collection. Whenever the user publishes new metadata or edits existing metadata, the repository module writes this metadata to the collection as XML and also uploads the full-text document specified by the user. There are two implementations: a File-System Repository and a Database Repository. The File-System Repository is used in the traditional archivelet and the Database Repository is used in the server-side archivelet. For the Database Repository, only the metadata is stored in the database and the full text is uploaded to a folder on the server machine. For the File-System Repository, both the metadata and the full text are stored in a folder in the file system.
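The difference between the two repository implementations can be illustrated with a short sketch. It assumes a shared `store` method, uses SQLite as a stand-in for whatever database the server-side archivelet actually uses, and invents the table layout and file naming; none of these details come from the Kepler distribution.

```python
import shutil
import sqlite3
from pathlib import Path


class FileSystemRepository:
    """Metadata (as XML) and full text both live in one folder."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)

    def store(self, identifier, metadata_xml, fulltext_path):
        (self.folder / f"{identifier}.xml").write_text(metadata_xml, encoding="utf-8")
        shutil.copy(fulltext_path, self.folder / Path(fulltext_path).name)


class DatabaseRepository:
    """Metadata go into a database; full text is uploaded to a server folder."""

    def __init__(self, db_path, upload_folder):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, xml TEXT)"
        )
        self.upload_folder = Path(upload_folder)
        self.upload_folder.mkdir(parents=True, exist_ok=True)

    def store(self, identifier, metadata_xml, fulltext_path):
        # INSERT OR REPLACE covers both publishing new metadata and editing it.
        self.db.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?)", (identifier, metadata_xml)
        )
        self.db.commit()
        shutil.copy(fulltext_path, self.upload_folder / Path(fulltext_path).name)
```

Because both classes expose the same `store` signature, a metadata driver written against that method would not need to know which kind of archivelet it is running in.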

3. CONCLUSIONS We have done extensive testing of the installation process and the publishing tasks. The testing was done by project participants and persons within the participating institutions: LANL, CERN, UPenn, USGS, and ODU. The original publication and resource discovery interfaces are similar to the current interfaces; most of the changes are behind the scenes or are new features. All of the new features (described in [4]), such as the server-side archivelet, export/import, persistent URLs, and validation, are the result of needs perceived by the project participants. This new version achieves our original goal of minutes for installation by individuals and hours for a group administrator. Performance is also well within the norm for standard Web applications (less than a second response time), and harvesting is totally automated and runs reliably. Kepler’s ease of installation distinguishes it from other OAI-based repository projects, such as those covered in [5].

Figure 1. Archivelet Architecture

2.3 Metadata Manager This module is responsible for instantiating the various metadata drivers for the system. It also implements the OAI-PMH API that provides a method for each of the six OAI-PMH verbs. OAI-PMH requests received by the Webserver module are forwarded to the Driver Manager, which decides what metadata drivers are involved and invokes these drivers to get partial responses from each. The Driver Manager then constructs the whole response from these partial responses. The Driver Manager also implements a User Interface API. This API contains methods that are invoked in response to user interactions with the main interface. For example, when the user clicks “publish”, the Driver Manager brings up a simple GUI that allows the user to select which metadata format she wants to use, and then the Driver Manager invokes the appropriate Driver to display the appropriate publishing tool.
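The way partial responses from several drivers are combined can be sketched as follows. Only two of the six OAI-PMH verbs are shown, responses are plain Python values rather than OAI-PMH XML, and the `Driver` and `DriverManager` classes with their method names are assumptions for illustration. The error codes `badVerb` and `cannotDisseminateFormat` are taken from the OAI-PMH specification.

```python
class Driver:
    """Hypothetical metadata driver holding records for one metadata format."""

    def __init__(self, prefix, records):
        self.prefix = prefix    # metadataPrefix, e.g. "oai_dc"
        self.records = records  # identifier -> metadata XML

    def formats(self):
        return [self.prefix]

    def identifiers(self):
        return sorted(self.records)


class DriverManager:
    """Routes an OAI-PMH request to the drivers involved and merges the results."""

    def __init__(self, drivers):
        self.drivers = {driver.prefix: driver for driver in drivers}

    def handle(self, verb, args):
        if verb == "ListMetadataFormats":
            # Every driver contributes a partial response.
            partials = [driver.formats() for driver in self.drivers.values()]
            return [fmt for partial in partials for fmt in partial]
        if verb == "ListIdentifiers":
            # Only the driver for the requested format is involved.
            driver = self.drivers.get(args.get("metadataPrefix"))
            if driver is None:
                return "cannotDisseminateFormat"
            return driver.identifiers()
        return "badVerb"
```

For example, a manager built with `Driver("oai_dc", ...)` and `Driver("olac", ...)` answers `ListMetadataFormats` by concatenating both drivers' lists, while `ListIdentifiers` with `metadataPrefix=oai_dc` consults the Dublin Core driver alone.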

4. ACKNOWLEDGEMENTS We thank Herbert Van de Sompel for his contributions to Kepler’s Persistent URLs. This work is supported by NSF grant 0205486.

5. REFERENCES
[1] Lagoze, C. and Van de Sompel, H. The Open Archives Initiative: building a low-barrier interoperability framework. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2001) (Roanoke VA, 2001). ACM Press, New York, NY, 2001, 54-62.
[2] Liu, X., Maly, K., Zubair, M. and Nelson, M. Arc – An OAI Service Provider for Federation. D-Lib Magazine, 7, 4 (April 2001).
[3] Maly, K., Zubair, M. and Liu, X. Kepler – An OAI Service/Data Provider for Individuals. D-Lib Magazine, 7, 4 (April 2001).
[4] Maly, K. et al. Kepler – a communal digital library. Technical Report, Old Dominion University, 2004.
[5] Brogan, M. A Survey of Digital Aggregation Services. Technical Report, Digital Library Federation, 2003.

2.4 Metadata Driver This module implements the OAI-PMH processing and the user interface functions such as publishing tools for the specific metadata format that the Driver handles. The publishing interface has dynamic field types (mandatory or optional), which are determined by a configuration file, based on XML schema, managed by the group server administrator. The Driver invokes the Validation module whenever new metadata is published to validate the metadata against the constraints specified in the configuration file and uses the repository API to store metadata and files.
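The publish path through a driver (validate first, then store through the repository API) can be sketched with an abstract base class. The names `MetadataDriver`, `DublinCoreDriver`, `MemoryRepository`, `to_xml`, and `publish` are hypothetical, the XML layout is simplified, and the repository here stores metadata only; Kepler's own developer guidelines define the real interface.

```python
from abc import ABC, abstractmethod
from xml.sax.saxutils import escape


class MetadataDriver(ABC):
    """Hypothetical base class: one subclass per metadata format."""

    prefix = ""

    def __init__(self, validator, repository):
        self.validator = validator    # callable(metadata) -> list of error strings
        self.repository = repository  # object with store(identifier, xml)

    @abstractmethod
    def to_xml(self, metadata):
        """Serialize the metadata in this driver's format."""

    def publish(self, identifier, metadata):
        # Validate against the configured constraints before anything is stored.
        errors = self.validator(metadata)
        if errors:
            return errors
        self.repository.store(identifier, self.to_xml(metadata))
        return []


class DublinCoreDriver(MetadataDriver):
    prefix = "oai_dc"

    def to_xml(self, metadata):
        fields = "".join(
            f"<dc:{name}>{escape(value)}</dc:{name}>" for name, value in metadata.items()
        )
        return f"<record>{fields}</record>"


class MemoryRepository:
    """In-memory stand-in for a repository, used only for this example."""

    def __init__(self):
        self.items = {}

    def store(self, identifier, xml):
        self.items[identifier] = xml
```

Under this arrangement, supporting a new format would amount to writing one more subclass that overrides `to_xml`, which matches the claim that a new metadata format needs only a new driver module.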

2.5 Validation This module exists in every driver and is responsible for downloading the configuration files from the Kepler group server periodically (in this implementation we download the file whenever

