A Tool for Converting from MARC to FRBR
Trond Aalberg¹, Frank Berg Haugen² and Ole Husby³
¹ The Norwegian University of Science and Technology (NTNU), IDI
² The National Library of Norway
³ BIBSYS, Norway
Abstract. The FRBR model is considered by many to be an important contribution to the next generation of bibliographic catalogues, but a major challenge for the library community is how to apply this model to existing MARC-based bibliographic catalogues. This requires a solution for the interpretation and conversion of MARC records, and a tool for this kind of conversion has been developed as part of the Norwegian BIBSYS FRBR project. The tool is based on a systematic approach to the interpretation and conversion process and is designed to be adaptable to the rules applied in different catalogues.
1 Motivation
The Entity Relationship (ER) model defined in the Functional Requirements for Bibliographic Records [7] aims to enable libraries to meet the broad range of increasing user expectations and needs, and the model is considered by many to be a major contribution to the next generation of bibliographic catalogues. The core of the model is the use of the entities work, expression, manifestation and item to model the products of intellectual and artistic endeavour at different levels of abstraction.
Due to the very large number of existing bibliographic records in the MARC format, a major challenge for the library community is to apply the FRBR model to existing bibliographic catalogues. This is commonly referred to as "frbrization" and requires a solution for the automatic interpretation and conversion of existing information.
A tool for this purpose has been developed in a joint frbrization project between the Norwegian University of Science and Technology, the Norwegian bibliographic database BIBSYS and the National Library of Norway⁴. The tool supports the full conversion of a MARC-based bibliographic catalogue into a format that directly reflects the FRBR model. A systematic approach to the conversion process is used in the design of the tool, and the conditions that govern the conversion process are stored externally. This approach facilitates easier definition and consistent maintenance of the conversion rules and additionally enables reuse of the tool across catalogues that differ in cataloguing practice and in their use of the MARC format.
⁴ The project was funded by the Norwegian Archive, Library and Museum Authority.
2 Interpreting and Processing MARC Records
At the most generic level the process of interpreting a MARC record in the context of the FRBR model consists of (1) identifying the various entities described in the record, (2) selecting the MARC fields that describe each entity and (3) finding the correct relationships between the entities. Each FRBR entity may appear in a bibliographic record with different roles; e.g. persons can be found as the creator of a work or as a subject of the work, and with many other roles as well. To account for all the combinations of entities and their roles, a conversion process needs to be able to deal with a rather large number of cases. The need for selecting and assigning MARC fields to the correct entities and establishing the proper relationships between entities further complicates this problem.
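The three steps above can be illustrated with a minimal Python sketch. It is not the tool's implementation (which uses XSLT): the record is a plain dictionary, the `CASES` table, the `Entity` class and the relationship labels are hypothetical, and MARC 21 tags are used purely for illustration rather than the BIBSYS-MARC format. The sample record is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str        # FRBR entity type, e.g. "work" or "person"
    role: str        # role the entity plays in this record
    fields: dict     # MARC fields assigned to this entity
    relations: list = field(default_factory=list)  # (label, target type)


# Hypothetical cases, one row per entity/role combination:
# (entity type, role, tags signalling presence, tags to assign, relationships)
CASES = [
    ("work", "main", ["245"], ["240", "245"], []),
    ("expression", "main", ["245"], ["041"], [("realizes", "work")]),
    ("manifestation", "main", ["245"], ["245", "260"], [("embodies", "expression")]),
    ("person", "creator", ["100"], ["100"], [("is_creator_of", "work")]),
    ("person", "subject", ["600"], ["600"], [("is_subject_of", "work")]),
]


def interpret(record):
    """Turn one MARC-like record (tag -> list of values) into entities."""
    entities = []
    for kind, role, condition, assign, relations in CASES:
        # (1) Identify: the entity is present if any condition tag occurs.
        if not any(tag in record for tag in condition):
            continue
        # (2) Select: keep only the fields that describe this entity.
        selected = {tag: record[tag] for tag in assign if tag in record}
        # (3) Relate: attach the relationships defined for this case.
        entities.append(Entity(kind, role, selected, list(relations)))
    return entities


# Made-up sample record for illustration only.
record = {
    "100": ["Ibsen, Henrik"],
    "245": ["Peer Gynt"],
    "260": ["Oslo : Gyldendal, 1996"],
}
entities = interpret(record)
for entity in entities:
    print(entity.kind, entity.role, sorted(entity.fields), entity.relations)
```

Because every entity/role combination is one row in the table, adding a new case (for example a corporate body as publisher) means adding a row rather than new control logic.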
3 The Conversion Tool
The process outlined above is implemented in our conversion tool using XSLT, the W3C language for transforming XML. Each entity case is coded as a template following the same control structure, and XPath expressions are used for the various conditions and selections that need to be applied. The tool reads MARC records encoded in the MarcXchange XML format [4] and produces a record for each entity in a format that extends MarcXchange with FRBR type attributes and a relationship element.
The basic structure of each template consists of an outer for-each loop with an entity condition that tests for the existence of data fields or other information that indicates the presence of an entity (or multiple entities). The code that creates the record is inside this loop and includes the generation of an identifying key, the code that selects and copies the appropriate data fields from the source record, and the code that creates the proper relationships to other entities.
The tool is further generalized by the use of a database for storing the data that varies between templates. This includes the entity conditions, the mapping of entities from MARC to FRBR and the conditions and targets for relationships. The various templates needed in the conversion are generated by a program written for this purpose. Once the templates have been created, the conversion is run using code that iterates over a collection of records and applies all entity templates to each record. This solution enables reuse of the tool even across catalogues that use the MARC format differently. Rather than implementing a conversion tool ad hoc for each catalogue, the same tool can be used with different rules.
The process outlined so far is only concerned with the interpretation and conversion of each record into a corresponding set of interrelated FRBR entities.
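As a rough illustration of generating templates from externally stored rules, the following Python sketch fills a fixed XSLT skeleton (an outer for-each with the entity condition, followed by key generation, field copying and a relationship element) from rule rows. The rule rows, the XPath expressions, and the output element and attribute names (`record`, `type`, `key`, `relationship`) are all assumptions made for this example; they are not the tool's actual rule database or output format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical rule rows standing in for the external rule database.
RULES = [
    {"name": "person-creator", "type": "person",
     "condition": "datafield[@tag='100']",  # entity condition (XPath)
     "key": "subfield[@code='a']",          # value used for the entity key
     "copy": "subfield",                    # content copied to the new record
     "relationship": "is-creator-of"},
    {"name": "person-subject", "type": "person",
     "condition": "datafield[@tag='600']",
     "key": "subfield[@code='a']",
     "copy": "subfield",
     "relationship": "is-subject-of"},
]

# Shared control structure: every template differs only in the rule values.
TEMPLATE = """<xsl:template name="{name}" xmlns:xsl="{ns}">
  <xsl:for-each select="{condition}">
    <record type="{type}">
      <xsl:attribute name="key"><xsl:value-of select="{key}"/></xsl:attribute>
      <xsl:copy-of select="{copy}"/>
      <relationship type="{relationship}"/>
    </record>
  </xsl:for-each>
</xsl:template>"""


def attr(value):
    """Escape a rule value for use inside a double-quoted XML attribute."""
    return escape(value, {'"': "&quot;"})


def build_template(rule):
    """Fill the shared skeleton with the values of one rule row."""
    values = {name: attr(text) for name, text in rule.items()}
    return TEMPLATE.format(ns=XSL_NS, **values)


templates = [build_template(rule) for rule in RULES]
for text in templates:
    ET.fromstring(text)  # raises ParseError if a template is not well-formed
print(templates[0])
```

Keeping the control structure in one skeleton means that a change to the way keys or relationships are emitted is made once and then regenerated for every entity case.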
In bibliographic catalogues the same entity is often duplicated across a number of records, and identical entities must be merged to create a normalized frbrization with a consistent set of relationships. The merging process is performed after the actual conversion and is based on the use of descriptive entity keys. Each entity is identified by keys that are created from a set of identifying field values. Equal keys will be generated for entities that are described by the same set of identifying field values, but a tolerant key-matching algorithm can be applied to allow for certain variations in the data.
In addition to the core process of creating FRBR entities and relationships, a complete conversion typically needs to include both a preprocessing and a postprocessing step. Preprocessing is often needed for preparations that are more conveniently carried out in advance than during the conversion. A postprocessing step is included to support processing of the final results into other formats or enriching the result with additional data. Various preprocessing and postprocessing tasks can easily be added to the conversion tool by including additional XSLT templates that process either the input before the frbrization or the output from the conversion.
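The key-based merging described in this section can be sketched in Python as follows. The normalization shown (case folding, removal of diacritics and punctuation) is only one assumed way of making key matching tolerant; the paper does not specify the actual algorithm, and the function names and sample data are hypothetical.

```python
import re
import unicodedata


def entity_key(kind, values):
    """Build a descriptive key from an entity's identifying field values."""
    parts = []
    for value in values:
        # Decompose accented characters and drop the combining marks.
        value = unicodedata.normalize("NFKD", value)
        value = "".join(c for c in value if not unicodedata.combining(c))
        # Lower-case and collapse punctuation and whitespace to single spaces.
        value = re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()
        parts.append(value)
    return kind + ":" + "|".join(parts)


def merge_entities(entities):
    """Merge entities with equal keys and union their relationships.

    Each entity is a tuple (kind, identifying values, relationships).
    """
    merged = {}
    for kind, values, relations in entities:
        key = entity_key(kind, values)
        entry = merged.setdefault(
            key, {"kind": kind, "values": values, "relations": set()}
        )
        entry["relations"].update(relations)
    return merged


# Made-up entities as they might come out of two different records.
converted = [
    ("person", ["Ibsen, Henrik"], {("is_creator_of", "work:peer gynt")}),
    ("person", ["IBSEN, Henrik."], {("is_creator_of", "work:et dukkehjem")}),
    ("person", ["Hamsun, Knut"], {("is_creator_of", "work:sult")}),
]
merged = merge_entities(converted)
for key, entry in merged.items():
    print(key, sorted(entry["relations"]))
```

Stricter or looser matching can be obtained by changing only `entity_key`, which keeps the merging step itself independent of the matching policy.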
4 Status, Related Work and Future Issues
The conversion tool has been used successfully to convert the 4 million records in the BIBSYS bibliographic database, and the result has been used in a prototype catalogue available on the web. This project has verified that the conversion tool is able to capture and apply all the different conditions and rules that govern the interpretation of the BIBSYS-MARC format.
The tool can be used in a variety of settings. Many libraries currently want to carry out frbrization as an experimental project, either to learn about the FRBR model or to test how well their records can be converted, and the tool may serve this need. A full conversion to a FRBR-based data model is probably not yet realistic for most libraries, due to the limited knowledge about the actual usability of the model and the lack of standardized formats for FRBR. The conversion tool can nevertheless be used in real-world applications in other ways, such as creating indexes that support frbrized views on existing catalogues.
The main problems uncovered in the BIBSYS frbrization project are the efficiency of a large-scale conversion based on XSLT and the problem of dealing with inconsistent data. The results from our conversion demonstrate that a perfect set of FRBR entities and relationships can be produced if the initial records contain sufficient and consistent information. For other sets of records the conversion tool creates duplicate entities and erroneous relationships, but these problems can usually be ascribed to inconsistencies and errors in the initial records rather than to the conversion tool.
The conversion tool is influenced by comparable work by others, such as the identification of FRBR entities in bibliographic catalogues, which has been explored in [1–3, 5]. Our mapping of MARC fields to FRBR entities and attributes follows the same pattern as the mapping that is defined for the MARC 21 format in [9], but additionally includes conditions that enable a more detailed definition.
A few other tools for experimenting with frbrization already exist, such as the FRBR display tool made available by the Library of Congress Network Development and MARC Standards Office [8] and the workset algorithm developed by OCLC [6]. The key-based matching algorithm applied in our tool is comparable to the workset algorithm of OCLC but is applied to all entities. Compared to the work of others, the main contribution of our tool is the systematic approach to the conversion process and the resulting support for creating a complete frbrization of MARC records.
The tool is being further developed at NTNU and is planned to be made freely available. Large-scale conversions of other MARC formats are needed to verify the reusability of the system. Further work on the tool includes the implementation of support utilities and services that can be used to solve the more difficult problems of identifying comparable entities, e.g. by the use of external authority files and special-purpose extension functions for comparing entity keys. Additional future work includes conversion to other formats; in particular, the use of RDF combined with a formal FRBR ontology, e.g. in OWL, will be explored.
References
1. Marie-Louise Ayres. Case studies in implementing Functional Requirements for Bibliographic Records [FRBR]: AustLit and MusicAustralia. ALJ: the Australian Library Journal, 54(1):43–54, February 2005. http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html.
2. Knut Hegna and Eeva Murtomaa. Data Mining MARC to Find: FRBR? BIBSYS/HUL, 2002. http://folk.uio.no/knuthe/dok/frbr/datamining.pdf.
3. Thomas B. Hickey, Edward T. O’Neill, and Jenny Toves. Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR). D-Lib Magazine, 8(9), September 2002. http://www.dlib.org/dlib/september02/hickey/09hickey.html.
4. International Organization for Standardization TC46/SC4. Information and Documentation: MarcXchange. Draft standard ISO/CD 25577, ISO, 2005. http://www.bs.dk/marcxchange/.
5. Christian Mönch and Trond Aalberg. Automatic conversion from MARC to FRBR. In Research and Advanced Technology for Digital Libraries, ECDL 2003, number 2769 in Lecture Notes in Computer Science, pages 405–411. Springer-Verlag, 2003.
6. OCLC. FRBR work-set algorithm. http://www.oclc.org/research/projects/frbr/algorithm.htm.
7. IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional requirements for bibliographic records: final report, volume 19 of UBCIM Publications: New Series. K. G. Saur, Munich, 1998. http://www.ifla.org/VII/s13/frbr/frbr.pdf.
8. The Library of Congress’ Network Development and MARC Standards Office. FRBR Display Tool. http://www.loc.gov/marc/marc-functional-analysis/tool.html.
9. The Library of Congress’ Network Development and MARC Standards Office. Functional analysis of the MARC 21 bibliographic and holdings formats. http://www.loc.gov/marc/marc-functional-analysis.