Deliverable: D6.4
Data model (first final version)
Issue: 1.0
Date of issue: 17 November 2000
RENARDUS: PROJECT DELIVERABLE
Project Number:
IST-1999-10562
Project Title:
Reynard - Academic Subject Gateway Service Europe
Deliverable Type:
Internal
Deliverable Number:
D6.4
Contractual Date of Delivery:
30 September 2000
Actual Date of Delivery:
17 November 2000
Title of Deliverable:
Data model (first final version 1.0)
Workpackage contributing to the Deliverable:
WP6
Nature of the Deliverable:
Report
URL:
http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/index.html (restricted access)
http://renardus.sub.uni-goettingen.de/ (public access)
Authors:
Hans Jürgen Becker, Frank Klaproth, Heike Neuroth
Contributions: Michael Day (UKOLN, text); Anders Ardo and Traugott Koch (DTV/NetLab, discussions).
Contact Details:
Platz der Göttinger Sieben 1 37073 Göttingen Germany email:
[email protected]
Abstract
This report provides an introduction to the development of a Renardus Application Profile. It is a reference to the partners' answers to the D6.4 questionnaire developed by SUB. The answers lead into the development of several data models: a data model of the Renardus prototype pilot system, a first version of the data model for the operational pilot system, and a data model for the administrative database. Besides the mapping tables for cross-browsing, this database contains tables for the conversion of some codes to the defined Renardus codes, and the collection description of each subject gateway. Finally, this report contains some upgrade recommendations for partners' metadata information.
Keywords
data model, data flow, subject gateway, metadata, profile, application profile, namespace, Renardus, Reynard
Reynard IST-1999-10562
Distribution List:
All partners
Issue:
1.0
Reference:
IST-1999-10562 / D6.4 / 1.0
Total Number of Pages:
62
TABLE OF CONTENTS
PART I - TITLE PAGE
1 RESULTS
1.1 Agreement on eight elements
1.2 Results of the second questionnaire developed for D6.4
1.2.1 Eight Elements for Cross-Searching
1.2.1.1 General (0)
1.2.1.2 Title (1)
1.2.1.2.1 Title/Title.Alternative (1.1 – 1.6)
1.2.1.3 Creator (2)
1.2.1.3.1 Creator: general (2.1)
1.2.1.3.2 Creator: rules (2.2 – 2.9)
1.2.1.3.3 Creator: additional information (2.10 – 2.16)
1.2.1.4 Description (3)
1.2.1.4.1 Description: general (3.1)
1.2.1.4.2 Description: description + keywords (3.2 – 3.5)
1.2.1.4.3 Description: multilinguality (3.6)
1.2.1.5 Subject (4)
1.2.1.5.1 Subject: keywords – general (4.1 – 4.2)
1.2.1.5.2 Subject: form of keywords (4.3 – 4.7)
1.2.1.5.3 Subject: keywords – multilinguality (4.8)
1.2.1.5.4 Subject: keywords – rules (4.9)
1.2.1.5.5 Subject: classification – general (4.10 – 4.15)
1.2.1.5.6 Subject: classification system - cross-search with regard to a special subject classification (4.16 – 4.20)
1.2.1.5.7 Subject: classification systems – multilinguality (4.21)
1.2.1.6 Identifier (5)
1.2.1.6.1 Identifier: general - regarding resources in several languages (5.1 – 5.2)
1.2.1.6.2 Identifier: general - regarding mirrored/copied resources (5.3 – 5.5)
1.2.1.6.3 Identifier: Qualifier (5.6 – 5.9)
1.2.1.7 Language (6)
1.2.1.7.1 Language: general (6.1)
1.2.1.7.2 Language: code (6.2 – 6.4)
1.2.1.8 Country (7)
1.2.1.8.1 Country: general (7.1 – 7.3)
1.2.1.8.2 Country: code (7.4 – 7.5)
1.2.1.9 Type (8)
1.2.1.9.1 Type: general (8.1 – 8.5)
1.2.2 Future Elements
1.2.2.1 Rights (9.1 – 9.7)
1.2.2.2 Publisher (10)
1.2.3 Additional Elements
1.2.4 Administrative Elements
1.2.4.1 Subject Gateway ID (IV A)
1.2.4.2 Unique Record Number (IV B)
1.2.4.3 Record Creator (IV C)
1.2.4.4 SBIG ID (IV D)
1.2.4.5 Record Last Checked Date (IV E)
1.2.4.6 Other (IV F)
1.3 Subject Gateways in the UK
1.3.1 RDN
1.3.2 Individual RDN hubs
1.3.2.1 BIOME
1.3.2.2 EEVL
1.3.2.3 Humbul
1.3.2.4 PSIgate
1.3.2.5 SOSIG
2 DATA MODEL AND DATA FLOW
2.1 Data model for the prototype Renardus pilot system
2.1.1 Dublin Core Elements
2.1.1.1 DC.Title and DC.Title.Alternative
2.1.1.2 DC.Creator
2.1.1.3 DC.Description
2.1.1.4 DC.Subject: classification system(s) and keywords
2.1.1.5 DC.Identifier
2.1.1.6 DC.Language
2.1.1.7 DC.Type
2.1.2 Non Dublin Core element
2.1.2.1 Country
2.1.3 Administrative Renardus elements
2.1.3.1 Full Record URL
2.1.3.2 SBIG ID
2.2 Preliminary version of data model for the operational Renardus pilot system
2.2.1 Dublin Core Elements
2.2.1.1 DC.Title and DC.Title.Alternative
2.2.1.2 DC.Creator and DC.Creator.AdditionalInformation
2.2.1.3 DC.Description
2.2.1.4 DC.Subject: classification system(s) and keywords
2.2.1.5 DC.Identifier
2.2.1.6 DC.Language
2.2.1.7 DC.Type
2.2.2 Non Dublin Core element
2.2.2.1 Country
2.2.3 Administrative Renardus elements
2.2.3.1 Full Record URL
2.2.3.2 SBIG ID
2.3 Data model of the administrative database: Collection Level Description (CLD)
2.4 Data flow
3 Appendix A: Questionnaire. Renardus questionnaire D6.4: Data model and data flow (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/questionnaires/all.html)
4 Appendix B: Responses. Questionnaire: Responses from the partners (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/index.html)
5 Appendix C: Comments of Partners
6 Appendix D: Summary. Summary of responses (matrix): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/summary_d6_4.pdf
7 Appendix E: Data Model and Data Flow. Data model and data flow, draft version 0.3 (4 September 2000): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/data_model.pdf
8 BIBLIOGRAPHY
9 REFERENCES
PART II - MANAGEMENT OVERVIEW
DOCUMENT CONTROL
Issue | Date of Issue | Comments
0.1 | 10 May 2000 | First draft, presented to partners at the Bath meeting (Excel sheet)
0.2 | 12 May 2000 | Second draft, presented at the first SCHEMAS workshop in Bath
0.3 | 8 September 2000 | Third draft, for review by project partners at the Paris meeting
0.4 | 6/7 November 2000 | Fourth draft, for review by project partners at the Göttingen meeting
1.0 | 17 November 2000 | First final version
EXECUTIVE SUMMARY
The objective of the Renardus project is to establish an academic subject gateway service in Europe. The pilot system will be based on a generic broker architecture and data model that will allow the integrated searching and browsing of distributed resource collections.
This report provides background information about the development of the Renardus data model and data flow. It is a reference to the partners' answers to the D6.4 questionnaire developed by SUB. Michael Day (UKOLN) presents background information about RDN and the individual hubs. The answers lead into a data model of the Renardus prototype pilot system and a first version of the data model for the operational pilot system.
The questionnaire was provided to the following ten partners: DutchESS (The Netherlands), NOVAGate (Nordic countries), EELS (Sweden), DEF fagportal (Denmark), DAINet (Germany), FVL (Finland), Les Signets (France), RDN (United Kingdom), DDB (Germany) and SSG-FI (Germany).
The answers of the partners are summarized in the following list; only those responses with the highest priority (required, strongly recommended and recommended) are considered:
Title/Title.Alternative: The main Title should not be repeatable, the Title.Alternative element should be repeatable, and both Title and Title.Alternative should be cross-searchable. Title should be provided in the language of the resource, and additional titles (translated title, acronym, etc.) should be provided in repeatable Title.Alternative elements.
Creator: Creator should be a repeatable element.
Description: The Description element should be repeatable in case the description is provided in more than one language. Each Subject Gateway should provide either an English version of Description or an English version of Keywords for every resource (besides other languages).
Subject: Keywords should be browsable and repeatable. All forms of the repeatable element Keyword (free, controlled, thesaurus based) should be provided, and the form of the Keywords should be indicated to the user. The Subject Gateways should be browsable via a common classification system; Renardus should use an existing common classification system, and this system should be DDC (all partners map their system to DDC). The classification system should be provided in several European languages. The verbal description of each notation (caption) should be indexed together with the keywords, so that users can search both. Besides the common classification system, Renardus should provide subject-specific classification systems such as MSC and Ei, likewise cross-searchable via notations and captions.
Identifier: Identifier should be repeatable and searchable if the resource is provided in more than one language with different URLs. Renardus should integrate URLs, ISBNs, ISSNs, and PURLs in Identifier elements with different qualifiers.
Language: The Language element should be repeatable and the language code should be ISO 639 (three-letter code).
Country: Country should reflect the publisher country and the country code should be ISO 3166 (two-letter code).
Types: Renardus should develop a common, controlled list of Types, and this list should be based on the Dublin Core type list.
Future Elements: Renardus should support the Rights element in the future (in the sense of IPRs; Rights should contain information about access conditions/restrictions of the resource as well as copyright/IPR information). Rights should be a repeatable element for different kinds of information (access conditions/restrictions, subscription information, copyright, IPR, etc.). Renardus should use the element Rights with different qualifiers for different kinds of information. Renardus should also support a Publisher element in the future.
On the basis of the partners' answers several data models have been developed. The Renardus broker system will consist of two databases:
1) Renardus decentral content database, which contains records extracted from each individual Service Provider (which can consist of several Subject Gateways). The data model for this database consists of seven well-defined metadata elements based on Dublin Core, one non-DC metadata element (Country), and two administrative elements (Full Record URL and SBIG ID). There are two versions of the data model: one for the prototype pilot system and one for the operational pilot system. The following tables list the Renardus metadata elements for these two systems. Obligation: M = mandatory, R = strongly recommended, O = optional. Repeatable: NR = not repeatable, R = repeatable. LQ = Language Qualifier.
Prototype Pilot System:
Metadata Element | Obligation | Repeatable | LQ | Comments
DC.Title | M | NR | possible |
DC.Title.Alternative | O | R | possible |
DC.Creator | R | R | no | Last name and first name should be clearly distinguishable.
DC.Description | M | R | possible | For cross-search reasons the field description must contain free text.
DC.Subject | M | R | possible | In the prototype system there will be no further distinction between the several kinds of subject (keywords, classification system).
DC.Subject:DDC | M | R | no | DDC 21: adapted DDC version for cross-browsing purposes. Only captions and not notations will be displayed.
DC.Identifier | M | R | no | In the prototype system no distinction will be made between resource URL, mirrored or copied resource URL(s), and URL(s) for archive reasons.
DC.Language | R | R | no | The language code is the ISO 639-2 three-letter code.
DC.Type | R | R | no | Subject Gateways should provide their original types without encoding scheme.
DC.Type.DCT1 | R | R | no |
Country | R | NR | no | ISO 3166-1 (two-letter code)
Full Record URL | R | NR | no | A URL that leads to a detailed display of each record at the originating service site.
SBIG ID | M | NR | no | A stable unique acronym, also well defined in the Collection Level Description.
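As a rough illustration, the prototype element list can be written down as a small lookup table together with a check for mandatory and non-repeatable elements. This is only a sketch: the function name `check_record`, the dictionary layout, and the sample record are assumptions made for illustration and are not part of the Renardus specification.

```python
# Prototype pilot data model as read from the element table.
# Tuple layout (an assumption of this sketch): (obligation, repeatable),
# where obligation is "M" (mandatory), "R" (strongly recommended) or "O".
PROTOTYPE_MODEL = {
    "DC.Title": ("M", False),
    "DC.Title.Alternative": ("O", True),
    "DC.Creator": ("R", True),
    "DC.Description": ("M", True),
    "DC.Subject": ("M", True),
    "DC.Subject:DDC": ("M", True),
    "DC.Identifier": ("M", True),
    "DC.Language": ("R", True),
    "DC.Type": ("R", True),
    "DC.Type.DCT1": ("R", True),
    "Country": ("R", False),
    "Full Record URL": ("R", False),
    "SBIG ID": ("M", False),
}


def check_record(record, model=PROTOTYPE_MODEL):
    """Return a list of problems; record maps element name -> list of values."""
    problems = []
    for element, (obligation, repeatable) in model.items():
        values = record.get(element, [])
        if obligation == "M" and not values:
            problems.append(f"missing mandatory element: {element}")
        if not repeatable and len(values) > 1:
            problems.append(f"element is not repeatable: {element}")
    for element in record:
        if element not in model:
            problems.append(f"unknown element: {element}")
    return problems


# Hypothetical record, invented for illustration only.
example = {
    "DC.Title": ["Example resource", "Second title"],
    "DC.Description": ["Free-text description of the resource."],
    "DC.Subject": ["metadata"],
    "DC.Identifier": ["http://www.example.org/"],
    "SBIG ID": ["EXAMPLE"],
}
print(check_record(example))
```

The sample record is flagged twice: DC.Title carries two values although it is not repeatable, and the mandatory DC.Subject:DDC element is absent. Strongly recommended elements are deliberately not reported, since a gateway lacking one of them is not excluded.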
Operational Pilot System:
Metadata Element | Obligation | Repeatable | LQ | Comments
DC.Title | M | NR | yes | Title should be the original title. It is strongly recommended to provide only one version of the title in this field.
DC.Title.Alternative | O | R | yes |
DC.Creator | R | R | no | Last name and first name should be clearly distinguishable.
DC.Creator.Additional.Information | O | R | no | Additional information like e-mail, URL, organisational information.
DC.Description | M | R | yes | For cross-search reasons the field description must contain free text. Strongly recommended: each SG should provide either an English version of the description or an English version of the keywords for every resource (besides other languages).
DC.Subject | M | R | yes | In the operational system a distinction will be made between the several kinds of subject (keywords, classification system). For the final system the provision of keywords is required.
DC.Subject:DDC | M | R | no | DDC 21: adapted DDC version for cross-browsing purposes. Only captions and not notations will be displayed.
DC.Identifier | M | R | no | In the operational system a distinction will be made between resource URL, mirrored or copied resource URL(s), and URL(s) for archive reasons.
DC.Identifier.Mirror | O | R | no |
DC.Identifier.Archive | O | NR | no |
DC.Language | R | R | no | The language code is the ISO 639-2 three-letter code.
DC.Type | R | R | no | Subject Gateways should provide their original types without encoding scheme.
DC.Type.DCT1 | R | R | no |
DC.Type.DCT2 | O | R | no | The possibility and usability of a mapping to DCT2 will be investigated in WP 7.
Country | R | NR | no | ISO 3166-1 (two-letter code)
Full Record URL | R | NR | no | A URL that leads to a detailed display of each record at the originating service site.
SBIG ID | M | NR | no | A stable unique acronym, also well defined in the Collection Level Description.
2) Renardus administrative database, which contains the collection description of each subject gateway, the mapping tables for cross-browsing the metadata via the common classification system DDC, and tables for converting some codes (probably language, country, and type) to the defined Renardus codes. The metadata elements for this kind of database are based on the RSLP collection description schema. The aims of the collection description are to support the selection of subject gateway(s) for searching, to provide background information about the participating subject gateways for human and machine users, and to promote/register the individual subject gateway(s) as high-quality resources on the Internet. The following list provides the elements of the Renardus Collection Level Description schema:
Title: The name of the collection.
Identifier: An unambiguous reference to the collection within a given context.
Description: An account of the content of the collection.
Language: The main language(s) of the metadata in the collection with quantitative indication.
Publisher: An entity responsible for making the collection available.
Format.Extent: The size of the collection.
Date.Issued: Date of formal issuance (e.g. publication) of the collection.
Subject: The topic of the content of the collection.
Subject Notation: The topic of the content of the collection.
Relation: A reference to a related resource.
Country: The country in which the collection is physically located.
Acronym: The acronym of the collection.
Resource Language: Language(s) of the described resources.
DDC mapping URL: URL of local DDC mapping information in Renardus format.
Z39.50 Location: The online location of the Z39.50 server of the subject gateway.
Logo URL: The URL of the logo (image) of the subject gateway.
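The code conversion tables planned for the administrative database could, for language codes, look roughly like the sketch below. It assumes that a gateway delivers ISO 639-1 two-letter codes and that the target is the bibliographic variant of ISO 639-2; the report fixes neither point, and the table name, function name, and selection of entries are hypothetical.

```python
# Hypothetical conversion table: ISO 639-1 two-letter codes to
# ISO 639-2 (bibliographic) three-letter codes. Only a few entries shown.
ISO639_1_TO_2 = {
    "da": "dan",
    "de": "ger",
    "en": "eng",
    "fi": "fin",
    "fr": "fre",
    "nl": "dut",
    "sv": "swe",
}


def to_three_letter_language(code):
    """Convert a gateway's language code to a three-letter code, or None."""
    code = code.strip().lower()
    if len(code) == 3 and code.isalpha():
        # Assumed to be an ISO 639-2 code already; not checked against a list.
        return code
    return ISO639_1_TO_2.get(code)


print(to_three_letter_language("DE"))
print(to_three_letter_language("eng"))
print(to_three_letter_language("xx"))
```

Codes that are missing from the table come back as `None` instead of being guessed, so that unmapped values can be reviewed by hand.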
Some recommendations for upgrading partners' metadata information are provided. If the element Keyword is not yet part of a partner's data model for the normalization process, it is recommended to provide this element first. In the future it will be required that the title is provided in its original version; other forms of the title can be given in the Title.Alternative field. It is still undecided whether it will be required in the future to provide an English version of the title, either in the Title field or in the Title.Alternative field. Considering that all partners should support an element, it is further recommended that all partners support the country element: it seems to be easier to extract the country code from the domain of a URL than to support a language code. In conclusion, if partners have to upgrade their metadata information, it is strongly recommended to include keywords first, then country, followed by type and language. All three data models will be updated in the future; over the coming months they will lead into a final version of the Renardus Application Profile, which will be described in the public report D6.5, to be delivered in June 2001.
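The remark that a country code can be extracted from the domain of a URL can be sketched as follows. This is a minimal illustration, not a recommended procedure: the function name `country_from_url` and the exception table are assumptions. Note also that a country-code top-level domain tells where a domain is registered, which need not be the publisher country, and that it does not always equal the ISO 3166-1 code (".uk" versus "GB").

```python
from urllib.parse import urlparse

# The country-code TLD is not always the ISO 3166-1 code; "uk" vs "GB"
# is the well-known case. Further exceptions would have to be added.
CCTLD_EXCEPTIONS = {"uk": "GB"}


def country_from_url(url):
    """Guess an ISO 3166-1 two-letter code from a URL's top-level domain."""
    host = urlparse(url).hostname
    if not host or "." not in host:
        return None
    tld = host.rsplit(".", 1)[-1].lower()
    if len(tld) != 2 or not tld.isalpha():
        # Generic TLDs (.org, .com) and IP addresses give no hint.
        return None
    return CCTLD_EXCEPTIONS.get(tld, tld.upper())


print(country_from_url("http://www.sub.uni-goettingen.de/"))
print(country_from_url("http://www.kb.nl/"))
print(country_from_url("http://www.example.org/"))
```

Resources under generic domains yield no result, so a gateway relying on this approach would still need a manual fallback for part of its records.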
SCOPE STATEMENT
This report is the second internal deliverable (besides two public deliverables, D6.1 and D6.2) to be issued by WP6 (Data model and data flow) of the Renardus project. The objective of WP6 is to develop the data model that will underpin the Renardus system. The aim of the questionnaire gateway survey was to analyse the gateway structures and formats of the Renardus partners. These should lead to the setup of a generic service profile that is needed to record all types of information about a gateway service. The inventory of the participating services is necessary for the specification of the functional requirements of the data model (D6.3) and for building the data model (D6.4/D6.5). This report also provides important features for WP 1 (functional model) and WP 2 (design and implementation).
PART III - DELIVERABLE CONTENT
INTRODUCTION
This report provides background information about the development of the Renardus data model and data flow. It is a reference to the partners' answers to the second questionnaire. These answers lead into a data model of the Renardus prototype pilot system and a first version of the data model for the operational pilot system. The Appendix contains the data provided by the partners, the dynamically generated metadata mapping, and overviews of keywords and classification systems (dynamically generated Access databases). The data model and data flow will be extended in line with the discussions in the Dublin Core community (e.g. at the 8th Dublin Core Workshop), e.g. those related to agent. Throughout the runtime of the project corrections and additions will be incorporated, so that the data model and data flow will always be up to date.
The report is divided into two main chapters. The first chapter provides an overview of the results of the second questionnaire. The second chapter introduces the data model for the Renardus prototype pilot system as well as for the operational pilot system and for the administrative database (collection description), and presents a first overview of the data flow.
GLOSSARY
AHRB Arts and Humanities Research Board.
ALUH Viikki Science Library, University of Helsinki, Finland.
BIOME The RDN hub for medicine, health and the life sciences.
BNF Bibliothèque Nationale de France (National Library of France).
CLD Collection Level Description.
DAINet Deutsches Agrarinformationsnetz, Germany.
DC Dublin Core.
DCMES Dublin Core Metadata Element Set.
DCMI Dublin Core Metadata Initiative.
DDB Die Deutsche Bibliothek (National Library of Germany).
DDC Dewey Decimal Classification system.
DEF Danmarks Elektroniske Forskningsbibliotek. Denmark's Electronic Research Library - a virtual library for researchers, students, lecturers and other users of Danish research institutions, Denmark.
DNER Distributed National Electronic Resource - the JISC's concept of a managed environment for accessing heterogeneous, quality-assured information resources on the Internet.
DTV Technical Knowledge Centre and Library of Denmark.
Dublin Core An initiative - sometimes known as the Dublin Core Metadata Initiative (DCMI) - to develop a core metadata element set to facilitate the discovery of digital (networked) resources. Developments in the element set are defined on the basis of international consensus.
DutchESS Dutch Electronic Subject Service, The Netherlands.
EELS Engineering Electronic Library, Sweden.
EEVL Edinburgh Engineering Virtual Library - one of the eLib-funded Internet information gateways.
eLib The Electronic Libraries Programme - a series of UK higher education-based networking projects, funded by the JISC.
ESRC Economic and Social Research Council.
EULER European Libraries and Electronic Resources in Mathematical Sciences - a project funded by the European Union.
FVL The Finnish Virtual Library - Virtuaalikirjasto, Finland.
HUB Hubs provide data for RDN. Hubs may be individual organisations or (more frequently) consortia of prominent library, academic, research and professional organisations.
HUMBUL The RDN hub for the arts and humanities.
ISO International Organisation for Standardization.
JISC Joint Information Systems Committee - a strategic advisory committee working on behalf of the funding bodies for higher and further education in England, Scotland, Wales and Northern Ireland. Its mission is to promote the innovative application and use of information systems and information technology in higher and further education across the UK.
JyU Finnish Virtual Library Project, Jyväskylä University Library, Finland.
KB Koninklijke Bibliotheek, National Library of the Netherlands.
LCSH Library of Congress Subject Headings.
MSC Mathematics Subject Classification.
NetLab NetLab, Lund University, Sweden.
NOVAGate Nordic Gateway to Information in Forestry, Veterinary and Agricultural Sciences, Finland.
OMNI Organising Medical Networked Information - one of the eLib-funded Internet information gateways. Now part of the BIOME RDN Hub.
PSIgate RDN hub for physical sciences. The service is still under development.
RDN The Resource Discovery Network - the RDN is a co-operative network dedicated to providing access to high-quality Internet resources for the learning, teaching and research community in the UK. The RDN is coordinated by a team based at UKOLN and King's College London.
ROADS Resource Organisation and Discovery in Subject-oriented services - originally a UK project funded by JISC under eLib, ROADS is an open-source software toolkit for Internet subject gateways.
RSLP Research Support Libraries Programme.
SG Subject Gateway in the sense of quality-controlled subject gateway, sometimes also called SBIG (Subject Based Information Gateway).
SOSIG Social Science Information Gateway - one of the eLib-funded Internet information gateways, now an RDN hub.
SSG-FI SonderSammelGebiets-FachInformationsführer (Special Subject Gateways), SUB Göttingen, Germany.
SUB Niedersächsische Staats- und Universitätsbibliothek Göttingen (Lower Saxony State and University Library Göttingen), Germany.
UKOLN UK Office for Library and Information Networking, University of Bath, UK.
URN Uniform Resource Name.
ZADI Zentralstelle für Agrardokumentation und -information, Germany.
Z39.50 An ANSI/NISO protocol for search and retrieval. Version 3 of the protocol has also been accepted as an ISO standard - ISO 23950.
Z39.85 Draft Standard Z39.85-200X: The Dublin Core Metadata Element Set.
1 RESULTS
This chapter is divided into three parts: the first part gives a short overview of the agreements made at the technical meeting in Bath (also recorded in the minutes); the second part summarizes the partners' answers to the second questionnaire, which asked about further details of the data model and data flow such as rules, codes, and standards; and the third part provides a short outlook on RDN and the individual hubs. The numbers in brackets behind the subheadings refer to the corresponding questions in the questionnaire. The comments of partners on each section of questions can be found in Appendix C.
1.1 Agreement on eight elements
After finishing the “Evaluation report of partner subject gateways” (see public version D6.1), partners agreed on eight elements (at a technical meeting in Bath on 10 May), without further discussion about rules, codes, standards, and qualifiers. They also agreed that partner subject gateways will have to support most of these elements (e.g. if one Subject Gateway supports only seven of the elements this would be no reason to exclude it), but this needed more detailed discussion. They agreed further that the data model is based on Dublin Core. These eight elements are:
- DC.Title - probably title.alternative is repeatable
- DC.Creator - repeatable
- DC.Description - repeatable in case descriptions in several languages are provided
- DC.Identifier: URI - possibly repeatable for mirror sites, but this needs further discussion
- DC.Subject - repeatable and in need of a common classification system (either “home-grown” or mapped to a general system)
- DC.Language - repeatable (needs a common code like ISO 639)
- DC.Type - repeatable: partners will either map their types to Dublin Core types, use DC types with Renardus-specific extensions, or develop a “home-grown” list of types with the most common ones
- Country Code - a clear definition is needed, e.g. the publisher country or the country in which the server is located (a common code like ISO 3166 is also needed)
Several recommendations were formulated for two further elements, to be considered after the development of the prototype pilot system:
- DC.Publisher: possibly include in the future? Will probably not be included in the pilot system
- DC.Rights: possibly include this element in the future, e.g. to give information about copyright, access/restriction conditions (could also be necessary if print materials etc. will be included)
- Rights in the sense of IPRs: probably included so that the SGs keep their copyrights of the metadata records after they are gathered by the broker service
These results were presented by SUB at two conferences: at the first SCHEMAS workshop on 12 May and at the CULTURAL HERITAGE – CONCERTATION EVENT on 30 June. In order to specify common rules, codes, standards, and qualifiers for these elements which can be supported by all Renardus partners, SUB developed a more detailed questionnaire. In this questionnaire partners were asked for an evaluation of several proposals to qualify the metadata elements.
1.2 Results of the second questionnaire developed for D6.4
The main purpose of this questionnaire is to gather information about the qualifiers, rules, standards, and codes of the elements which are supported by the Renardus prototype and the operational pilot system. As the Bath meeting led only to a basic agreement on eight elements, this questionnaire was intended to provide deeper insight into how to use them. The results lead into the development of the data model. The questionnaire was sent out on 3 July and partners were asked to send it back to SUB before 14 July. Because of holidays the last responses arrived at SUB on 24 August. Two partners (DTV and NetLab) filled in the questionnaire together. Because of the ongoing discussion at RDN about a centralized structure (RDNC) it was not possible to get common (and official) information from UKOLN, RDN or the single hubs. SUB and UKOLN are trying to obtain detailed information on the basis of the two questionnaires (D6.1 and D6.4) from all RDN hubs. The results will be presented in an updated version of D6.4. Some partners did not fill in the questionnaire completely; where no evaluation was given (e.g. only 'no'), the answers have not been incorporated into the analysis (see also Appendix C) and are not considered in this report.
The following Renardus partners filled in the questionnaire:
Name | Acronym | URL
National Library of the Netherlands | KB | http://www.kb.nl/
National Library of France | BNF | http://www.bnf.fr/
National Library of Germany | DDB | http://www.ddb.de/
Finnish Virtual Library Project | JyU | http://www.jyu.fi/library/english/index.htm
NetLab, Lund University, Sweden | NetLab | http://www.lub.lu.se/netlab/
together with Technical Knowledge Centre and Library of Denmark | DTV | http://www.dtv.dk/
Niedersächsische Staats- und Universitätsbibliothek, Göttingen, Germany | SUB | http://www.sub.uni-goettingen.de/
Viikki Science Library, University of Helsinki, Finland | ALUH | http://helix.helsinki.fi/infokeskus/lib/
Zentralstelle für Agrardokumentation und -information, Germany | ZADI | http://www.dainet.de/zadi/
The answers of UKOLN and SOSIG will not be considered here. As mentioned above, SUB and UKOLN will prepare a common view of these issues and present the results in an updated version of D6.4. A short overview is given in chapter 1.3. For the questionnaire and the answers provided by each partner, see Appendices A and B. Partners could answer each question by giving one of the following evaluations: required (1), strongly recommended (2), recommended (3), desirable (4), not necessary (5), definitely not (6).
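The evaluation scale, and the rule stated in the executive summary that only responses with the highest priority (1 to 3) are considered, can be sketched as below. The report does not describe how individual answers were aggregated, so the function name `high_priority`, the use of `None` for a missing evaluation, and the partner labels are assumptions made for illustration.

```python
# Evaluation scale as given in the questionnaire.
RATING_LABELS = {
    1: "required",
    2: "strongly recommended",
    3: "recommended",
    4: "desirable",
    5: "not necessary",
    6: "definitely not",
}


def high_priority(answers):
    """Keep answers rated 1-3; answers maps partner -> rating or None."""
    return {
        partner: RATING_LABELS[rating]
        for partner, rating in answers.items()
        if rating is not None and rating <= 3
    }


# Hypothetical answers to a single question; None = no evaluation given.
answers = {"Partner A": 1, "Partner B": 4, "Partner C": None, "Partner D": 3}
print(high_priority(answers))
```

Answers without an evaluation are dropped in the same step, mirroring the statement that incomplete responses were not incorporated into the analysis.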
Most questions also asked whether partner subject gateways will support the mentioned rule, code etc. now or in future. This information will help to find common Renardus metadata element refinements and encoding schemes. The numbers in brackets behind the subheadings refer to the corresponding questions in the questionnaire. For each result, the number of SGs which support the proposal now or in future is given in brackets.
1.2.1 Eight Elements for Cross-Searching
In this chapter the results of the questionnaire are presented as detailed information about rules, codes etc. for the eight elements (title, creator, description, subject, identifier, language, country, type).
1.2.1.1 General (0)
Partner subject gateways have to support most of the agreed eight elements. To gather information which elements are required for (future) subject gateways and must be supported, partners were asked to mark these elements. The results are summarised in figure 1:
[Figure 1: bar chart "Elements that have to be supported by each SBIG". One bar per Renardus element (Identifier, Description, Title, Classification, Keywords, Creator, Country, Type, Language); scale: number of supporting partners, 0 to 8 (max. 8).]
Figure 1: Evaluation about requirements of Renardus metadata elements.
The following metadata elements must be supported by (future) partners: title, description, subject: keywords, subject: classification system, and identifier. If a subject gateway provides no keywords, it could be allowed to generate keywords automatically from the description field. When keywords are generated in this way, the Renardus quality standards have to be observed, e.g. with regard to stop words and control of the automated program. The following metadata elements are strongly recommended: creator, language, country and type. Partners have to bear in mind that most of these elements must be supported. However, if a subject gateway cannot provide one of the eight Renardus elements, this is no reason to exclude the subject gateway from the broker system. Each case has to be negotiated with the Renardus team.
1.2.1.2 Title (1)
1.2.1.2.1 Title/Title.Alternative (1.1 – 1.6)
Partners handle the title field in different ways (see public report D6.1): some partners provide the original title and the translated title in the main title field (e.g. DutchESS), while others use the Title.Alternative field to provide translated titles or acronyms (e.g. RDN). Another open issue is the language of the title with regard to cross-searching this field.
Required:
- Title and Title.Alternative should be cross-searchable (supported by all partners)
Strongly recommended:
- The main title should not be repeatable (supported by seven partners)
- Title should be provided in the language of the resource and additional titles (translated title, acronym, etc.) should be provided in repeatable Title.Alternative fields (supported by six partners)
Strongly recommended/recommended:
- The Title.Alternative field should be repeatable (supported by five partners)
Not necessary:
- The main title should be provided in English (for cross-searching) and additional titles (translated title, acronym, etc.) should be provided in repeatable Title.Alternative fields (supported by one partner)
- If no English title is provided on the server side, Renardus should provide an English version of the title (produced by an automatic translation program)
1.2.1.3 Creator (2)
Currently the Creator, Contributor and Publisher elements (collectively called Agent elements) are being discussed within the DC community. At the moment the proposed agent qualifiers are: Type, Name, Affiliation, Role, and Identifier (see DC Working Draft - 10 December 1999; http://www.mailbase.ac.uk/lists/dc-agents/files/wdagent-qual.html). SUB will follow the Agents discussion. Changes will be incorporated in further deliverables.
1.2.1.3.1 Creator: general (2.1)
It is strongly recommended that creator should be a repeatable field (supported by all partners).
1.2.1.3.2 Creator: rules (2.2 – 2.9)
Results of the questionnaire with regard to creator rules are:
Recommended/desirable:
- Syntax of creator should be last name, first name in one field, separated by a special character (supported by four partners)
- Renardus should reuse existing authority files (PND – Germany, LoC authority file, other)
Not necessary:
- Cataloging rules like AACR2 (supported by two partners)
- Syntax of creator should be last name, first name in separate fields (supported by three partners)
- Renardus should provide authority files or develop a home-grown authority file
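The one-field creator syntax favoured above (last name, first name, separated by a special character) can be illustrated with a short sketch. The report does not name the separator character, so the comma used here is an assumption, as are the function names.

```python
SEPARATOR = ","  # assumed separator; the report does not specify the character


def format_creator(last_name, first_name, sep=SEPARATOR):
    """Build the single-field form 'Last name, First name'."""
    return f"{last_name.strip()}{sep} {first_name.strip()}"


def split_creator(value, sep=SEPARATOR):
    """Split a single-field creator into (last name, first name).

    A value without the separator (e.g. a corporate body) is returned
    with an empty first name.
    """
    last, found, first = value.partition(sep)
    return last.strip(), (first.strip() if found else "")


# Creator is a repeatable field, so a record carries a list of values.
creators = [format_creator("Neuroth", "Heike"), format_creator("Koch", "Traugott")]
```

Keeping both name parts in one field means that a gateway which stores them separately only has to join them on export, whereas the reverse direction depends on the separator being used consistently.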
1.2.1.3.3 Creator: additional information (2.10 – 2.16)
Results of the questionnaire with regard to additional information in the creator field are:
Desirable:
- Additional information should be provided in extra Renardus database fields (supported by one partner)
- Email information of the creator, the URL of the creator (e.g. homepage) and organizational information of the creator should be provided (each part is supported by three partners)
Not necessary:
- Additional information should be provided in one Renardus database field, separated by special characters (supported by one partner)
- Address information of the creator should be provided in the form of a vCard (no partner supports this)
1.2.1.4 Description (3)
1.2.1.4.1 Description: general (3.1)
It is strongly recommended that the description field is repeatable in case the description is provided in more than one language. Some subject gateways provide the description in their native language as well as in English (e.g. NOVAGate, ZADI, FVL) (supported by four partners).
1.2.1.4.2 Description: description + keywords (3.2 – 3.5)
This part of the questionnaire was important for cross-search issues. Partners were asked how strongly they feel that subject gateways must provide description and/or keywords in English.
Recommended:
- Each SG should provide either an English version of the description or an English version of the keywords for every resource (besides other languages) (supported by seven partners)
Desirable:
- Each SG should provide an English version of the keywords for every resource (besides other languages) (supported by five partners)
- Each SG should provide an English version of the description for every resource (besides other languages) (supported by five partners)
Not necessary:
- Each SG should provide an English version of the description and an English version of the keywords for every resource (besides other languages) (supported by four partners)
1.2.1.4.3 Description: multilinguality (3.6)
If a SG provides no English description, an automatic translation of the main words of the description into English by the Renardus system is not necessary, but three of the eight partners regard this as desirable for the future.
1.2.1.5 Subject (4)
This chapter summarizes the results of the questions related to keywords as well as classification systems.
1.2.1.5.1 Subject: keywords – general (4.1 – 4.2)
It is recommended that keywords are browsable (condition: each SG must have its own keyword index). Only one partner considers this not necessary; the other seven partners rate it between strongly recommended and desirable (supported by seven partners).
It is more or less strongly recommended that this field should be repeatable in case keywords (controlled lists, thesaurus based, free keywords) are provided in several languages (supported by five partners).
1.2.1.5.2 Subject: form of keywords (4.3 – 4.7)
Strongly recommended/recommended:
- All forms of keywords (free, controlled, thesaurus based) should be provided
- The form of keywords should be indicated for the user, e.g. if he/she only wants to search for thesaurus based keywords in his/her scientific area (supported by six partners)
- Repeatable field for each form of keywords in one language (several thesauri, controlled lists, free keywords) (supported by four partners)
Not necessary/definitely not:
- Only controlled (home grown list and/or thesaurus based) keywords should be provided (no free keywords)
- Only thesaurus based keywords should be provided (no free keywords, no controlled lists)
1.2.1.5.3 Subject: keywords – multilinguality (4.8)
An automatic translation of keywords into English, in case a SG provides no English keywords, is rated desirable by four partners, not necessary by one partner and definitely not by two partners. In general this feature will not be necessary in Renardus.
1.2.1.5.4 Subject: keywords – rules (4.9)
Partners were asked whether they use rules for keywords, e.g. for geographic names or proper names. Most partners use thesauri rules (DTV/NetLab, SUB, BnF, FVL, DDB). ZADI also uses special thesauri for subjects, objects, and geographical regions.
1.2.1.5.5 Subject: classification – general (4.10 – 4.15)
Required/strongly recommended:
- The SGs should be browsable via a common classification system
- Renardus should use an existing common classification system
Strongly recommended/recommended:
- The common classification system should be DDC (all partners map their system to DDC) (supported by six partners)
Recommended/desirable:
- Renardus should construct a common classification system
Not necessary:
- The common classification system should be a home grown one (a construction of all the partners' classification systems)
- The common classification system should be a general classification system, other than DDC (all partners map their system to this general system)
1.2.1.5.6 Subject: classification system - cross-search with regard to a special subject classification (4.16 – 4.20)
Recommended:
- The verbal description of each notation should be indexed together with the keywords, so that users can search both
Recommended/desirable:
- Besides the common classification system, Renardus should provide subject classification systems like MSC, Ei: cross-search via notation
- Besides the common classification system, Renardus should provide subject classification systems like MSC, Ei: cross-search via verbal description of the notation
Desirable:
- Besides the common classification system, Renardus should provide all other SG specific classification systems (local, national): cross-search via verbal description of the notation
- Besides the common classification system, Renardus should provide all other SG specific classification systems (local, national): cross-search via notation
1.2.1.5.7 Subject: classification systems – multilinguality (4.21)
It is strongly recommended by partners that the common classification system should be provided in several European languages.
1.2.1.6 Identifier (5)
At several Renardus meetings there were intense discussions about the handling of the identifier field, e.g. in case several URLs are provided for one resource. Some partners provide more than one URL if the resource has, for example, several titles in different languages. On the other hand, some partners stated that each record should have only one unique URL, in accordance with the one-to-one principle. To reach a common view, partners had to answer several questions on this topic. Furthermore, there are open questions about how to handle mirrored or copied sites.
1.2.1.6.1 Identifier: general - regarding resources in several languages (5.1 – 5.2)
Recommended:
- Repeatable if the resource is provided in more than one language with different URLs (supported by five partners)
- If repeatable, this field should also be searchable by the Renardus system (supported by six partners)
1.2.1.6.2 Identifier: general - regarding mirrored/copied resources (5.3 – 5.5)
Strongly recommended/recommended:
- If this field is repeatable, it should also be searchable by the Renardus system (supported by five partners)
Desirable:
- Repeatable in the field identifier with a special Renardus scheme
Not necessary:
- Repeatable in DC.Relation (e.g. with a special Renardus scheme) (supported by two partners)
1.2.1.6.3 Identifier: Qualifier (5.6 – 5.9)
Recommended:
- Renardus should integrate URLs, ISBNs, ISSNs in Identifier fields with different qualifiers (supported by six partners)
- Renardus should integrate URIs, PURLs, and URNs (supported by five or six partners, depending on the identifier type)
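How identifier values might be tagged with different qualifiers can be sketched as follows. This is a heuristic illustration only: the qualifier labels, the function name `identifier_scheme` and the rule that a host name beginning with "purl." marks a PURL are assumptions, and no check digits are verified.

```python
import re
from urllib.parse import urlparse


def identifier_scheme(value):
    """Guess a qualifier for one Identifier value (heuristic only)."""
    text = value.strip()
    lowered = text.lower()
    if lowered.startswith("urn:"):
        return "URN"
    if lowered.startswith(("http://", "https://", "ftp://")):
        host = urlparse(text).hostname or ""
        # Assumption: resolver hosts such as purl.org start with "purl."
        return "PURL" if host.startswith("purl.") else "URL"
    compact = re.sub(r"[-\s]", "", text).upper()
    if re.fullmatch(r"\d{7}[\dX]", compact):
        return "ISSN"   # eight characters, last one may be X
    if re.fullmatch(r"\d{9}[\dX]", compact):
        return "ISBN"   # ten-character form only
    return "unknown"


def qualify(identifiers):
    """Pair each value of a repeatable Identifier field with its qualifier."""
    return [(identifier_scheme(v), v) for v in identifiers]
```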
1.2.1.7 Language (6)
1.2.1.7.1 Language: general (6.1)
It is strongly recommended that the language element is repeatable, with separate fields, in case several languages are provided (supported by six partners).
1.2.1.7.2 Language: code (6.2 – 6.4)
It is strongly recommended that Renardus should support the three-letter ISO 639 code (supported by six partners) and not the two-letter ISO 639 code (rated not necessary; supported by four partners). There is no need to use other codes.
1.2.1.8 Country (7)
Although this element is not a Dublin Core element, partners decided to support it. One of the open questions was the definition of this field. The country code could reflect the country of the publisher or the country in which the server is located. In the latter sense, Renardus users could select or sort hits by European country. Another possibility would be to reduce the hits returned by a search by filtering on country, e.g. to select the nearest copy when a resource has duplicates.
1.2.1.8.1 Country: general (7.1 – 7.3)
Strongly recommended:
- The country code should reflect the publisher country (supported by seven partners)
Not necessary:
- The country code should reflect the server country (supported by two partners)
- Renardus should support both, publisher and server country (e.g. country with a Renardus scheme publisher and another scheme server) (supported by two partners)
1.2.1.8.2 Country: code (7.4 – 7.5)
It is more or less strongly recommended by partners that the country code should be the two-letter ISO 3166 code (supported by six partners). There is no need to use another code.
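Where a gateway holds no country information, a candidate two-letter code might be read off the domain of the resource URL. The sketch below is a heuristic under stated assumptions: the code table is a small excerpt rather than the full ISO 3166 list, the function name is illustrative, and the result reflects the domain registration, which need not be the publisher country preferred above.

```python
from urllib.parse import urlparse

# Small excerpt of ISO 3166 two-letter codes; a real table would be complete.
ISO_3166_ALPHA2 = {"DE", "DK", "FI", "FR", "GB", "NL", "SE"}

# Country-code top-level domains that differ from the ISO code.
TLD_EXCEPTIONS = {"UK": "GB"}


def country_from_url(url):
    """Return a two-letter country code guessed from the URL's domain.

    Returns None for generic domains (.org, .com, ...) and for codes
    that are not in the table.
    """
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].upper()
    code = TLD_EXCEPTIONS.get(tld, tld)
    return code if code in ISO_3166_ALPHA2 else None
```

The exception table matters for UK gateways: their domains end in .uk, whereas the ISO 3166 two-letter code for the United Kingdom is GB.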
1.2.1.9 Type (8)
Not all partners support this element, and those partners that do support it use different “controlled lists”, some of which are based on Dublin Core. To reach a common view on the handling of this field, partners were asked several questions.
1.2.1.9.1 Type: general (8.1 – 8.5)
Recommended:
- Renardus should develop a common list of types (controlled list)
- The common list of types should be based on the Dublin Core type list (supported by five partners)
Not necessary:
- The common list of types should be a home grown one (mixture of all types of partners' SGs)
- The common list of types should be based on a type list other than Dublin Core (e.g. type list in MARC21, in Germany: Working Group "Codes", etc.)
Five partners want to specify the common type list by "qualifiers/subcategories" like document.theses.habilitation etc.; three partners do not want this.
1.2.2 Future Elements
With regard to future elements, at the technical meeting in Bath (11 May) some partners expressed a more or less strong wish to support further elements after the test installation of the Renardus prototype.
1.2.2.1 Rights (9.1 – 9.7)
Recommended/Desirable:
- Renardus should support the Rights element in the future (supported by four partners)
- Renardus should support the Rights element in the sense of IPRs (SGs keep their copyrights of the metadata records after they are gathered by the broker service) (supported by four partners)
- Rights should contain information about access conditions/restrictions of the resource (e.g. technical/software requirements, subscription information) (supported by four partners)
- Rights should contain copyright/IPR information of the resource (supported by three partners)
- Rights should be a repeatable element for different kinds of information (access conditions/restrictions, subscription information, copyright, IPR, etc.)
- Rights should contain information about access conditions/restrictions negotiated by the SG (by the library or institution maintaining the SG respectively) (supported by three partners)
- Renardus should use the element Rights with different qualifiers for different kinds of information
1.2.2.2 Publisher (10)
It is strongly recommended to support a publisher element in future (five partners rate this as required, and this field is supported by six partners).
1.2.3 Additional Elements
Some partners want to support a DC.Relation element (SUB, BnF) or a DC.Format element (SUB, DDB: there might even be format preferences for the display of different MIME types) in the future. One partner stated definitely not (FVL: the system will become too complicated) and one partner referred to the Bath decision (DTV/NetLab). One partner (ZADI) mentioned a general interest in supporting additional elements in the future. This issue might be discussed again after the prototype installation of the Renardus broker.
1.2.4 Administrative Elements
For the separate administrative database, Renardus needs some further administrative metadata elements.
1.2.4.1 Subject Gateway ID (IV A)
It is strongly recommended that Renardus should support an element like Subject Gateway ID with the name and URL of the SG, so that the user can restrict a search to particular gateways.
1.2.4.2 Unique Record Number (IV B)
It is recommended that Renardus should support an element like a Unique Record Number as an unambiguous Renardus identifier.
1.2.4.3 Record Creator (IV C)
It is not necessary that Renardus should provide information about the record creator (with last name, first name, email, organisation etc.).
1.2.4.4 SBIG ID (IV D)
It is more or less recommended that Renardus should support a SBIG ID (= record source) with the syntax: name of information provider/name of Subject Gateway:internal ID of the record in the SG database. With this SBIG ID it is possible to update a record in the Renardus database from the SG database.
1.2.4.5 Record Last Checked Date (IV E)
It is recommended or desirable that Renardus should support something like a "Record Last Checked Date" element, which gives the date of the last verification or update of the metadata record.
1.2.4.6 Other (IV F)
FVL stated that the participating gateways may need an administrative field that determines whether or not a record is suitable for Renardus purposes. DDB stated that we should consider whether there should be separate sets representing the subject gateways, with elements describing their particular subject competences (for instance expressed by DDC notations), thereby enabling the system to route the user queries. Other elements might be system administrators etc.
1.3 Subject Gateways in the UK
One of the gateway initiatives associated with the Renardus project is the UK's Resource Discovery Network (RDN). The RDN is a service funded by the Joint Information Systems Committee (JISC) of the UK higher education funding councils with support from the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Board (AHRB). The RDN builds upon the experiences of the subject gateway activity carried out under the JISC's Electronic Libraries (eLib) Programme.
The RDN provides resource discovery services through a network of Internet information gateways that are clustered together in subject-based 'hubs' (see chapter 1.3.2). These are co-ordinated by a team based in the JISC's DNER Office at King's College London and at UKOLN. The hubs are essentially independent service providers who provide one or more Internet resource catalogues or gateways that can be accessed at a variety of levels. In addition, hubs have also developed, and linked to, a wide range of other information and related services (Dempsey, 2000, p. 19). Furthermore, in the context of the JISC's concept of a Distributed National Electronic Resource (DNER), the RDN hubs are being encouraged to provide additional service layers, brokering access to heterogeneous services through protocols like Z39.50. These services are referred to as DNER Portals. Dempsey (2000, p. 19) has said, in this context, that "the 'subject gateway' or resource catalogue is one component in a network of communicating services which may be assembled to meet particular business and user needs."
In the RDN context, the contents of gateways can be accessed at a variety of levels:
- Individual gateways or Internet resource catalogues. Where hubs are comprised of more than one gateway, each will have its own Web interface. For example, the BIOME hub, which covers subjects in the health and life sciences, is made up of five distinct gateways. Each one has its own interface that allows searching and browsing within that particular gateway.
- Hubs. Each RDN 'hub' will have an interface that allows for all of its component Internet resource catalogues to be searched (and possibly browsed) together. For more information on RDN hubs, see chapter 1.3.2.
- The RDN. The RDN is responsible for providing an interface to all of the services developed by hubs, including services that will be able to cross-search, through the ResourceFinder, all of the Internet resource catalogues developed by RDN hubs.
The RDN hubs are independent service providers. They can (and do) use a wide variety of different software types and metadata formats. In order to support the central services that are offered by the RDN, it is strongly recommended that hubs are able to provide a minimum set of metadata that - as currently defined - is a sub-set of the Dublin Core elements. The six elements (Title, Subject, Description, Type, Identifier and Language) are defined (with brief content rules) in the RDN Cataloguing Guidelines (Day and Cliff, 2000). In this distributed scenario, it is unlikely that all RDN hubs would have a common single view of the Renardus data model. As new hubs (and Internet resource catalogues) become part of the RDN, it is possible that there could be even more diversity.
1.3.1 RDN
Michael Day from the RDN support team at UKOLN filled in the D6.4 questionnaire. He pointed out (in an email of 14 August) that the answers and comments on the questionnaire were mainly his own, but were in part based on the RDN Cataloguing Guidelines and other internal discussions.
"Because the RDN is a federation of a number of gateways it is difficult to say whether RDN "supports" anything specific in the questionnaire, now or in the future. It is likely that parts of RDN will support some things, while the RDN as a whole may not. For example, ROADS gateways can record v-card-type information about creators or administrators, but the RDN ResourceFinder will not be able to search this. On the other hand, both the RDN and gateways will be certainly interested in things like developing a common classification system for cross-browsing. Many of the replies are fairly neutral ('desirable' or 'not necessary') because they are issues that have not been widely considered in an RDN context, e.g. the repeatability of some fields, descriptions in multiple languages, etc. Also, RDN allows gateways to do much their own thing and they do. Some (e.g. SOSIG) are based on ROADS, others (EEVL, the new OMNI) are not. Some use ROADS templates, others use something more DC-like. The RDN mandatory elements (Title, Subject, Description, Type, Identifier (URI), Language) are based on a subset of DC."
The RDN Cataloguing Guidelines define content rules for all fifteen DCMES elements. Definitions were taken from the Reference Description of DCMES version 1.1. Schemes are used in four of the six 'minimum set' elements.
- Title. No particular scheme is defined in the guidelines, although AACR2 practice with regard to capitalisation and punctuation is recommended.
- Subject. The guidelines do not mandate the use of any particular subject scheme, but if a scheme is used, a shortened version of the scheme should be added as a value qualifier.
- Description. No particular scheme is defined in the guidelines.
- Type. The guidelines suggest that resource type should be taken from either the draft list of Dublin Core Types (Dublin Core Type Working Group, 1999) or the list of types defined by the RDN (Cliff, 2000).
- Identifier. If no value qualifier is present, the identifier must be a URI.
- Language. This should be a language code either based on the three letter codes defined in ISO 639-2:1998 or the two letter codes recommended by RFC 1766. If required, RDN may need to provide some conversion tools to map between the two schemes.
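A conversion tool of the kind mentioned under Language could be as simple as a lookup table. The sketch below is an illustration only: the table is a short excerpt, the three-letter codes are the bibliographic ISO 639-2 forms, and the function name and the handling of RFC 1766 subtags are assumptions.

```python
# Excerpt only: ISO 639-2 (bibliographic) three-letter codes mapped to the
# two-letter codes used as primary tags in RFC 1766 language tags.
THREE_TO_TWO = {
    "eng": "en",
    "ger": "de",
    "fre": "fr",
    "dut": "nl",
    "fin": "fi",
    "swe": "sv",
    "dan": "da",
}
TWO_TO_THREE = {two: three for three, two in THREE_TO_TWO.items()}


def to_three_letter(code):
    """Normalise a two- or three-letter code to the three-letter form."""
    code = code.strip().lower()
    if code in THREE_TO_TWO:
        return code
    # RFC 1766 tags may carry a subtag such as "en-GB"; keep the primary tag.
    primary = code.split("-", 1)[0]
    if primary in TWO_TO_THREE:
        return TWO_TO_THREE[primary]
    raise ValueError(f"unknown language code: {code!r}")
```

Raising an error for unknown codes, rather than passing them through, makes gaps in the table visible during normalisation.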
All RDN Internet resource catalogues should be able to provide records broadly in accordance with these general guidelines. They would be able, therefore, to support most of the eight elements defined in the Renardus data model.
http://www.rdn.ac.uk/
1.3.2 Individual RDN hubs
The RDN does not specify the software and metadata formats in use by each of the hubs. Most use their own metadata formats, although these tend to have some kind of relationship with ROADS/IAFA templates or the DCMES. The following sections attempt to explain the metadata formats in use within each of the RDN's current hubs, to note their relationship with the 'minimum set' recommended by the RDN itself, and to note content standards in use where these have been published.
1.3.2.1 BIOME
The BIOME health and life sciences hub is currently made up of five separate gateways that cover health and medicine (OMNI), animal health (VetGate), biological and biomedical science (BioResearch), the natural world (Natural Selection) and agriculture, food and forestry (AgriFor). A new gateway for nursing, midwifery and allied health professions (NMAHP) will soon be added. BIOME provides its own cataloguing rules based on the RTNG resource description template structure (Gray, 2000). These include versions of all six of the RDN's 'minimum set' of elements ('Title', 'Add subject descriptor', 'Add keywords', 'Description', 'Category', 'Main URI' and 'Main Language'), but also an element ('UK based') that will indicate whether the resource being described is based in the UK.
- The type element ('Category') uses a scheme defined by BIOME.
- The language element ('Main Language') is left blank if English is the main language. Other languages are entered according to the MARC three letter language code (based on ISO 639-2:1998).
- For the subject classification element ('Add subject descriptor'), the National Library of Medicine and the Library of Congress classification schemes are used in OMNI, NMAHP, VetGate and BioResearch; the Dewey Decimal Classification (DDC) scheme in AgriFor and Natural Selection. Controlled vocabulary schemes ('Add keywords') in use within BIOME include Medical Subject Headings (MeSH) for OMNI and BioResearch, MeSH and the RCN (Royal College of Nursing) thesaurus for NMAHP, the CAB thesaurus for AgriFor and VetGate, and Library of Congress Subject Headings (LCSH) for Natural Selection.
http://www.biome.ac.uk/
1.3.2.2 EEVL
EEVL (the Edinburgh Engineering Virtual Library) is currently the RDN service that covers engineering. EEVL uses its own metadata format of 22 attributes that includes five of the RDN 'minimum set' of elements ('Title', 'Classification', 'Description', 'Resource type' and 'URL'); i.e., all elements except 'Language' (MacLeod, Kerr and Guyon, 1998, pp. 209-210). The subject classification scheme adopted by EEVL is an in-house scheme that is loosely based on the Ei Classification Scheme developed by Engineering Information Inc. EEVL is part of a hub that will expand to cover the mathematical sciences (MathGate) and computing (Computing). The MathGate and Computing gateways are still under development.
http://www.eevl.ac.uk/
1.3.2.3 Humbul
The Humbul service covers the arts and humanities. The gateway has developed its own software and uses an element set based on the Dublin Core. The service publishes some draft cataloguing guidelines, Describing and cataloguing resources in Humbul, that are broadly based on the RDN guidelines and AACR2 (Humbul, 2000). Versions of all the RDN 'minimum set' of elements are 'required' elements, as are several other elements, including 'Author' and 'Publisher'. The main subject scheme in use is the Library of Congress Subject Headings (LCSH). Types are defined using the draft list of Dublin Core Types, the RDN-defined list of types and an additional set of types defined by Humbul itself. The 'Language' element uses the three letter code defined in ISO 639-2:1998.
http://www.humbul.ac.uk/
1.3.2.4 PSIgate
The PSIgate hub will cover the physical sciences. The service is still under development.
http://www.psigate.ac.uk/
1.3.2.5 SOSIG
The SOSIG service covers the social sciences, business and law. The gateway uses the ROADS software, and resources are described using ROADS/IAFA templates. These include equivalents of all RDN 'minimum set' elements ('Title', 'Subject-Descriptor'/'Subject-Descriptor-Scheme', 'Description', 'Category', 'URI' and 'Language'). The browse structure is based on the Universal Decimal Classification (UDC). A thesaurus searching option is also available which uses a thesaurus derived from HASSET (the Humanities And Social Sciences Electronic Thesaurus).
http://www.sosig.ac.uk/
2 DATA MODEL AND DATA FLOW
Very early in the discussion of a Renardus data model it was clear that the data model should be based on Dublin Core as far as possible. Only one Renardus element, Country, is neither a DC element nor a DC based element. All other elements and qualifiers (element refinement and value encoding scheme) are based on Dublin Core where possible. Where no encoding scheme or refinement from Dublin Core can be used, the definition is a Renardus qualifier. It is also part of this workpackage to develop a Renardus namespace with a defined Renardus Metadata Element Set (RMES). The final Renardus application profile will be ready in June 2001 (the public deliverable D6.5).
The Renardus broker will consist of the content databases (decentralised: Z39.50), with the agreed eight elements and two administrative elements, and the Collection Level Description database. In this report the content database is at first based on the data model for the prototype Renardus pilot system (see 2.1) and later on, after the test installation of the prototype, on the preliminary version of the data model for the operational Renardus pilot system (see 2.2). The content database will contain the metadata records extracted from the individual Service Providers' databases in accordance with the Renardus data model. The Collection Level Description database will contain the collection description of each subject gateway and the mapping tables (e.g. for DDC, probably also for Language, Type, or Country code) (see 2.3).
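How mapping tables might be applied when records are extracted into the content database can be sketched as follows. All names and table entries here are hypothetical (the gateway "ExampleGate", the local codes and the function `normalise`); only the type value document.theses.habilitation is taken from the questionnaire discussion in 1.2.1.9.1.

```python
# Hypothetical mapping tables, keyed by element, then by gateway,
# then by the local value used in that gateway.
MAPPING_TABLES = {
    "type": {
        "ExampleGate": {
            "diss": "document.theses",
            "hab": "document.theses.habilitation",
        },
    },
    "language": {
        "ExampleGate": {"de": "ger", "en": "eng"},
    },
}


def normalise(record, gateway, tables=MAPPING_TABLES):
    """Return a copy of record with local codes replaced by Renardus codes.

    record maps element names to lists of values. Values without a
    mapping entry are kept unchanged so that nothing is silently lost.
    """
    result = dict(record)
    for element, per_gateway in tables.items():
        mapping = per_gateway.get(gateway, {})
        if element in result:
            result[element] = [mapping.get(v, v) for v in result[element]]
    return result
```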
Cross-search, cross-browse and filter issues: The main basic index will allow a search across the elements Title, Description and Subject. It is therefore necessary, firstly, that the Subject Gateways provide free text in the description field and not, for example, a URL and, secondly, that the Subject Gateways deliver some kind of subject information. It is still an open question whether DDC captions will also be included in the basic index. The cross-browsing structure will be realized through a mapping of each partner’s classification system to the Dewey Decimal Classification (DDC). The DDC element is mandatory. The elements Country, Language, and Type make some filter processes possible. Together with the element Creator, these elements could also be displayed in the result list.
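The interplay of the basic index and the filter elements can be illustrated with a minimal in-memory sketch. It assumes simple case-insensitive substring matching and exact matching of filter values; the real broker searches distributed Z39.50 databases, and the record layout, function names and sample records are hypothetical.

```python
BASIC_INDEX_ELEMENTS = ("title", "description", "subject")
FILTER_ELEMENTS = ("country", "language", "type")


def matches(record, term, **filters):
    """True if term occurs in the basic index and all filters hold.

    record maps element names to lists of strings (elements are repeatable).
    """
    needle = term.lower()
    in_index = any(
        needle in value.lower()
        for element in BASIC_INDEX_ELEMENTS
        for value in record.get(element, [])
    )
    for element, wanted in filters.items():
        if element not in FILTER_ELEMENTS:
            raise ValueError(f"not a filter element: {element}")
        if wanted not in record.get(element, []):
            return False
    return in_index


def search(records, term, **filters):
    """Return the records that match term and the given filters."""
    return [r for r in records if matches(r, term, **filters)]


# Hypothetical records from two gateways.
records = [
    {"title": ["Soil science portal"], "subject": ["soil"],
     "language": ["eng"], "creator": ["Example, Author"]},
    {"title": ["Bodenkunde"], "description": ["Soil research in Germany"],
     "language": ["ger"], "country": ["DE"]},
]
```

In this sketch Creator is carried in the record for display but is not part of the basic index, so a search for a creator name alone returns nothing.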
Upgrade priority for partners’ metadata information: If keyword is not yet an element in a partner’s data model for the normalization process, it is recommended first of all to provide the element keyword. For the future it is required that the title is provided in the original version; other forms of the title can be given in the Title.Alternative field. It is still undecided whether it will in future be required to provide an English version of the title, either in the Title field or in the Title.Alternative field. Considering that all partners should support an element, it is further recommended that all partners support the country element. It seems to be easier to extract the country code from the domain of a URL than to support a language code. In conclusion, if partners have to upgrade their metadata information, it is strongly recommended to include keywords first, then country, followed by type and language.
2.1 Data model for the prototype Renardus pilot system
The data model is mainly based on two Dublin Core documents:
[DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description, http://purl.oclc.org/dc/documents/rec-dces-19990702.htm
[DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers, http://purl.org/dc/documents/rec/dcmesqualifiers-20000711.htm
Format of entries:
Name
Name of Metadata field
Qualified DC name
Qualified Dublin Core name
Namespace
DCMES version 1.1, DCMES Qualifiers (2000-07-11) or Renardus Metadata Element Set = RMES version 0.1
Refinement(s)
Element Refinements used in Renardus: These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope
DC Encoding Scheme(s)
These qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader
R Encoding Scheme(s)
Renardus encoding scheme, see above
Form of Obligation
In the Renardus data model the obligation can be: mandatory (M), strongly recommended (R) or optional (O). Mandatory ensures that some of the elements are always supported. An element with a mandatory obligation must have a value. The strongly recommended and the optional elements should be filled with a value if the information is appropriate to the given resource or provided by a Subject Gateway, but if not, they can be left blank.
Repeatable
Metadata field is repeatable: yes or no
LQ "LANG"
Language Qualifier "LANG": to give information about the language of the content of a metadata field (ISO Code 639, two letter), yes, no, or possible
DC Definition
Dublin Core Definition of metadata field
DC Comment
Dublin Core comments to this metadata field
R Definition
Renardus definition of metadata field
R Comment
Renardus comments to this metadata field
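The obligation levels defined above lend themselves to a simple completeness check. The following is a minimal sketch, not part of the Renardus software: the obligation values are copied from the element tables of the prototype data model in this section, while the dictionary layout and the function name `missing_elements` are assumptions for illustration. The DDC-encoded subject is left out because it is created during normalization rather than supplied by the gateway.

```python
# Obligation per element in the prototype data model:
# M = mandatory, R = strongly recommended, O = optional.
OBLIGATION = {
    "DC.Title": "M",
    "DC.Title.Alternative": "O",
    "DC.Creator": "R",
    "DC.Description": "M",
    "DC.Subject": "M",
    "DC.Identifier": "M",
    "DC.Language": "R",
    "DC.Type": "R",
    "Country": "R",
    "Full Record URL": "R",
    "SBIG ID": "M",
}


def missing_elements(record, level="M"):
    """List elements of the given obligation level that have no value.

    `record` is a hypothetical dict mapping element names to values;
    empty strings and empty lists count as missing.
    """
    return sorted(name for name, obligation in OBLIGATION.items()
                  if obligation == level and not record.get(name))
```

A record is acceptable when `missing_elements(record)` returns an empty list; calling it with `level="R"` shows which strongly recommended elements a gateway could still add.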
2.1.1
Dublin Core Elements
2.1.1.1
DC.Title and DC.Title.Alternative
Name
Title
Qualified DC name
DC.Title
Namespace
DCMES version 1.1
Refinement(s)
Alternative
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
M
Repeatable
no
LQ "LANG"
possible
DC Definition
A name given to the resource
DC Comment
Typically, a title will be a name by which the resource is formally known
R Definition
-
R Comment
-
Name
Title ¦ Alternative
Qualified DC name
DC.Title.Alternative
Namespace
DCMES Qualifiers (2000-07-11)
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
O
Repeatable
yes
LQ "LANG"
possible
DC Definition
Any form of the title used as a substitute or alternative to the formal title of the resource
DC Comment
This qualifier can include Title abbreviations as well as translations
R Definition
-
R Comment
-
2.1.1.2
DC.Creator
Name
Creator
Qualified DC name
DC.Creator
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
-
R Encoding Scheme(s)
For personal names: last name and first name in separate tags
Obligation
R
Repeatable
yes
LQ "LANG"
no
DC Definition
An entity primarily responsible for making the content of the resource.
DC Comment
Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
R Definition
Creators are the persons responsible for the intellectual content of the document(s); webmasters, for example, are not creators.
R Comment
If this field is applicable, it is strongly recommended to provide the creator. For the Renardus normalization process it is strongly recommended that last name and first name are clearly distinguishable.
2.1.1.3
DC.Description
Name
Description
Qualified DC name
DC.Description
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
M
Repeatable
yes
LQ "LANG"
possible
DC Definition
An account of the content of the resource.
DC Comment
Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
R Definition
-
R Comment
For the Renardus normalization process it is not enough to provide only a URL; for cross-search reasons the description field must contain free text.
2.1.1.4
DC.Subject: classification system(s) and keywords
Name
Subject
Qualified DC name
DC.Subject
Namespace
DCMES Qualifiers (2000-07-11) and RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
LCSH, MESH, DDC, LCC, UDC
R Encoding Scheme(s)
all other encoding schemes used by the partners
Obligation
M
Repeatable
yes
LQ "LANG"
possible
DC Definition
The topic of the content of the resource.
DC Comment
Typically, a subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
R Definition
-
R Comment
This is the place for all subject information used by partners, such as controlled keywords, free keywords, classification system(s) and/or captions. In the prototype system there will be no further distinction between the several kinds of subject. In the prototype system the provision of keywords is strongly recommended; in the final system it is required.
Name
Subject ¦ DDC
Qualified DC name
DC.Subject
Namespace
DCMES Qualifiers (2000-07-11) and RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
DDC
R Encoding Scheme(s)
Ren-DDC for normalization; DDC 21 can be extended by Renardus-specific captions
Obligation
M
Repeatable
yes
LQ "LANG"
no
DC Definition
Dewey Decimal Classification, see also: http://www.oclc.org/dewey/index.htm
DC Comment
-
R Definition
DDC 21: adapted DDC version for cross-browsing purposes.
R Comment
This field is created in the Renardus normalization process via mapping tables from the particular Subject Gateway classification scheme. Each partner has to map its own classification system to DDC. A mapping guideline for DDC will be prepared in the context of WP 7. Only captions, not notations, will be displayed.
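The mapping step described in this comment can be sketched as two lookup tables: one from a gateway's local classification codes to DDC notations, and one from DDC notations to the captions that are displayed. All codes and captions in this sketch are illustrative placeholders (they are not taken from a partner's scheme or quoted from DDC 21), and the function name is an assumption.

```python
# Hypothetical mapping table: local classification code -> DDC notation(s).
LOCAL_TO_DDC = {
    "AGR-01": ["630"],
    "GEO-10": ["550", "910"],
}

# Hypothetical caption table: DDC notation -> caption shown to the user.
DDC_CAPTIONS = {
    "630": "Agriculture and related technologies",
    "550": "Earth sciences",
    "910": "Geography and travel",
}


def ddc_captions(local_codes):
    """Map local codes to DDC and return the captions for display.

    Unmapped local codes are skipped; duplicate captions appear once,
    in the order in which they are first encountered.
    """
    captions = []
    for code in local_codes:
        for notation in LOCAL_TO_DDC.get(code, []):
            caption = DDC_CAPTIONS[notation]
            if caption not in captions:
                captions.append(caption)
    return captions
```

Keeping notations internal and returning only captions mirrors the rule that only captions, not notations, are displayed.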
2.1.1.5
DC.Identifier
Name
Identifier
Qualified DC name
DC.Identifier
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
URI
R Encoding Scheme(s)
-
Obligation
M
Repeatable
yes, for translated sites and/or mirrored, copied sites
LQ "LANG"
no
DC Definition
An unambiguous reference to the resource within a given context.
DC Comment
Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
R Definition
-
R Comment
URI means URL, URN, DOI, ISBN, ISSN etc. For the Renardus normalization process DOI, ISBN and ISSN must be given in URN syntax. In the prototype system no distinction will be made between the resource URL, URL(s) of mirrored or copied resources, and URL(s) for archive purposes.
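The requirement to give DOI, ISBN and ISSN in URN syntax could be met by a small conversion step during normalization. The sketch below assumes the general `urn:<namespace>:<value>` form; the namespace identifiers `isbn`, `issn` and `doi` and the function name `to_urn` are assumptions for illustration and are not specified in this report.

```python
# Assumed URN namespace identifiers per identifier scheme (illustrative only).
URN_NAMESPACES = {"isbn": "isbn", "issn": "issn", "doi": "doi"}


def to_urn(scheme, value):
    """Return `value` in URN syntax for DOI, ISBN and ISSN.

    URLs and values that are already URNs are passed through unchanged.
    An unknown scheme raises KeyError so that it is not silently accepted.
    """
    scheme = scheme.lower()
    value = value.strip()
    if scheme in ("url", "urn"):
        return value
    return f"urn:{URN_NAMESPACES[scheme]}:{value}"
```

Since the prototype does not distinguish between kinds of URL, the function treats every URL alike.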
2.1.1.6
DC.Language
Name
Language
Qualified DC name
DC.Language
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
ISO 639-2
R Encoding Scheme(s)
-
Obligation
R
Repeatable
yes
LQ "LANG"
-
DC Definition
A language of the intellectual content of the resource.
DC Comment
Recommended best practice for the values of the Language element is defined by RFC 1766 which includes a two-letter Language Code (taken from the ISO 639 standard), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard). For example, en for English, fr for French, or en-uk for English used in the United Kingdom
R Definition
-
R Comment
The language code is the ISO 639-2, three letter code. SUB will provide a mapping between the two letter and three letter language code but this will also be found on the LoC site – ISO 639-2: http://lcweb.loc.gov/standards/iso639-2/englangn.html
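A two-letter to three-letter mapping of the kind mentioned in this comment can be sketched as a lookup table. The excerpt below covers only a handful of languages and uses the bibliographic ISO 639-2 codes; the table name and function name are assumptions, and the complete list should be taken from the LoC site cited above.

```python
# Excerpt of a two-letter (ISO 639-1) to three-letter (ISO 639-2,
# bibliographic) mapping; deliberately incomplete.
ISO639_1_TO_2 = {
    "en": "eng",
    "de": "ger",
    "fr": "fre",
    "nl": "dut",
    "da": "dan",
    "fi": "fin",
    "sv": "swe",
}


def to_iso639_2(tag):
    """Normalize a language tag to a three-letter ISO 639-2 code.

    Accepts RFC 1766 style tags such as "en-uk" (only the primary
    subtag is used). Three-letter input is returned as-is without
    validation; unknown two-letter codes yield None.
    """
    primary = tag.strip().lower().split("-")[0]
    if len(primary) == 3:
        return primary
    return ISO639_1_TO_2.get(primary)
```

Returning `None` for unknown codes leaves the decision to the caller, which fits an element whose obligation is only "strongly recommended".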
2.1.1.7
DC.Type
Name
Type ¦ DCMI Type (DCT1)
Qualified DC name
DC.Type
Namespace
DCMES Qualifiers (2000-07-11)
Refinement(s)
-
DC Encoding Scheme(s)
DCMI Type Vocabulary (DCT1)
R Encoding Scheme(s)
-
Obligation
R
Repeatable
yes
LQ "LANG"
no
DC Definition
The nature or genre of the content of the resource.
DC Comment
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.
R Definition
-
R Comment
SUB will provide a mapping of all types used in partners’ subject gateways to DCT1 (probably except for ZADI). The feasibility and usefulness of a mapping to DCT2 will be investigated in the context of WP 7.
Name
Type
Qualified DC name
DC.Type
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
-
R Encoding Scheme(s)
-
Obligation
R
Repeatable
yes
LQ "LANG"
no
DC Definition
The nature or genre of the content of the resource.
DC Comment
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.
R Definition
-
R Comment
Subject Gateways should provide their original types without encoding scheme.
2.1.2
Non Dublin Core element
2.1.2.1
Country
Name
Country
Qualified DC name
-
Namespace
RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
-
R Encoding Scheme(s)
ISO 3166-1 (two letter code) http://www.din.de/gremien/nas/nabd/iso3166ma/
Obligation
R
Repeatable
no
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
Country in which the publisher of the resource is located or the country which represents the cultural context of the resource. Code for the representation of names of countries.
R Comment
-
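The upgrade recommendations in this report note that the country code can often be extracted from the domain of a URL. A minimal sketch of that idea follows. It assumes that a two-letter top-level domain is a country-code domain; the function name and the exception table are assumptions. The one exception handled here is `.uk`, whose ISO 3166-1 code is GB. Generic domains such as `.org` give no result, and the publisher's country may of course differ from the domain.

```python
from urllib.parse import urlparse

# Country-code top-level domains that differ from the ISO 3166-1 code.
CCTLD_EXCEPTIONS = {"uk": "GB"}


def country_from_url(url):
    """Guess an ISO 3166-1 two-letter country code from a URL's domain.

    Returns None when the URL has no host or the top-level domain is
    not a two-letter (country-code) domain.
    """
    host = urlparse(url).hostname
    if not host:
        return None
    tld = host.rsplit(".", 1)[-1].lower()
    if len(tld) != 2 or not tld.isalpha():
        return None
    return CCTLD_EXCEPTIONS.get(tld, tld.upper())
```

A gateway could use such a guess as a default value and let cataloguers override it where the cultural context of the resource points to another country.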
2.1.3
Administrative Renardus elements
Two administrative elements are used in Renardus for practical reasons: “Full Record URL” and “SBIG ID”.
2.1.3.1
Full Record URL
Name
Full Record URL
Qualified DC name
-
Namespace
RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
-
R Encoding Scheme(s)
URL
Obligation
R
Repeatable
no
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
A URL that leads to a detailed display of each record at the originating service site.
R Comment
Because some partners generate their records dynamically it might be a problem to provide a URL to the full record display.
2.1.3.2
SBIG ID
Name
SBIG ID
Qualified DC name
-
Namespace
RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
-
R Encoding Scheme(s)
Acronym of Subject Gateway
Obligation
M
Repeatable
no
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
A stable unique acronym also well defined in the Collection Level Description.
R Comment
Must be the same acronym as used in the Renardus Collection Level Description schema field “Acronym”.
2.2
Preliminary version of data model for the operational Renardus pilot system
This data model reflects the current status of discussion. Some changes are likely, e.g. with regard to the obligation of an element or further qualifiers, as are future additions, e.g. support for further elements such as publisher, rights, format and relation, and some more comments. In contrast to the data model for the prototype system, this preliminary data model contains further qualifiers, some more language tags for the elements and some changes in the obligation of an element. The data model is mainly based on two Dublin Core documents:
[DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description, http://purl.oclc.org/dc/documents/rec-dces-19990702.htm
[DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers, http://purl.org/dc/documents/rec/dcmesqualifiers-20000711.htm
Format of entries:
Name
Name of Metadata field
Qualified DC name
Qualified Dublin Core name
Namespace
DCMES version 1.1, DCMES Qualifiers (2000-07-11) or Renardus Metadata Element Set = RMES version 0.1
Refinement(s)
Element Refinements used in Renardus: These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope
DC Encoding Scheme(s)
These qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader
R Encoding Scheme(s)
Renardus encoding scheme, see above
Form of Obligation
In the Renardus data model the obligation can be: mandatory (M), strongly recommended (R) or optional (O). Mandatory ensures that some of the elements are always supported. An element with a mandatory obligation must have a value. The strongly recommended and the optional elements should be filled with a value if the information is appropriate to the given resource or provided by a Subject Gateway, but if not, they can be left blank.
Repeatable
Metadata field is repeatable: yes or no
LQ "LANG"
Language Qualifier "LANG": to give information about the language of the content of a metadata field (ISO Code 639, two letter), yes or no
DC Definition
Dublin Core Definition of metadata field
DC Comment
Dublin Core comments to this metadata field
R Definition
Renardus definition of metadata field
R Comment
Renardus comments to this metadata field
2.2.1
Dublin Core Elements
2.2.1.1
DC.Title and DC.Title.Alternative
Name
Title
Qualified DC name
DC.Title
Namespace
DCMES version 1.1
Refinement(s)
Alternative
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
M
Repeatable
no
LQ "LANG"
yes
DC Definition
A name given to the resource
DC Comment
Typically, a title will be a name by which the resource is formally known
R Definition
Title should be the original title; other forms of the title should be provided in the Title.Alternative field.
R Comment
It is strongly recommended to provide only one version of title in this field (and not also e.g. translated titles).
Name
Title ¦ Alternative
Qualified DC name
DC.Title.Alternative
Namespace
DCMES Qualifiers (2000-07-11)
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
O
Repeatable
yes
LQ "LANG"
yes
DC Definition
Any form of the title used as a substitute or alternative to the formal title of the resource
DC Comment
This qualifier can include Title abbreviations as well as translations
R Definition
-
R Comment
-
2.2.1.2
DC.Creator and DC.Creator.AdditionalInformation
Name
Creator
Qualified DC name
DC.Creator
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
For personal names: last name, first name in separate tags
Obligation
R
Repeatable
yes
LQ "LANG"
no
DC Definition
An entity primarily responsible for making the content of the resource.
DC Comment
Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
R Definition
Creators are the persons responsible for the intellectual content of the document(s); webmasters, for example, are not creators.
R Comment
If this field is applicable, it is strongly recommended to provide the creator. For the Renardus normalization process it is strongly recommended that last name and first name are clearly distinguishable.
It is not yet clear whether the Renardus data model will support the refinement “Additional Information” of creator. This also depends on the agent discussion in Dublin Core and on how DC will support this kind of information in future.
- Formally, each kind of “Additional Information” (Email, URL, Organizational Information) requires an extra definition table sheet -
Name
Creator ¦ AdditionalInformation
Qualified DC name
(see Agent discussion: http://www.mailbase.ac.uk/lists/dc-agents/files/wd-agent-qual.html)
Namespace
RMES version 0.1
Refinement(s)
RMES version 0.1 (for Additional Information)
DC Encoding Scheme(s)
(see Agent discussion: http://www.mailbase.ac.uk/lists/dc-agents/files/wd-agent-qual.html)
R Encoding Scheme(s)
Email, URL, OrgInf
Obligation
O
Repeatable
yes
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
Additional information like Email, URL, Organisational Information with regard to creator.
R Comment
-
2.2.1.3
DC.Description
Name
Description
Qualified DC name
DC.Description
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
M
Repeatable
yes
LQ "LANG"
yes
DC Definition
An account of the content of the resource.
DC Comment
Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
R Definition
-
R Comment
For the Renardus normalization process it is not enough to provide only a URL; for cross-search reasons the description field must contain free text. Strongly recommended: each SG should provide either an English version of the description or an English version of the keywords for every resource (besides other languages).
2.2.1.4
DC.Subject: classification system(s) and keywords
- Formally, each partner's classification system (captions and notations of thematic, subject, general, or local classifications: FAO/AGRIS, Ei, NLM, BK etc.), each kind of keywords (thesaurus-based and/or controlled keywords, free keywords: AGROVOC Thesaurus, AGRIFOREST, Danish Agricultural Thesaurus, Ei Thesaurus, GEFO Thesaurus, HASSET Thesaurus, CAREDATA, IBSS Thesaurus, Thesaurus of Geoscience, Geo Ref Thesaurus etc.) and each DC encoding scheme requires an extra definition table sheet -
Name
Subject
Qualified DC name
DC.Subject
Namespace
DCMES Qualifiers (2000-07-11) and RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
LCSH, MeSH, DDC, LCC, UDC
R Encoding Scheme(s)
all other encoding schemes used by the partners
Obligation
M
Repeatable
yes
LQ "LANG"
yes
DC Definition
The topic of the content of the resource.
DC Comment
Typically, a subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
R Definition
-
R Comment
This is the place for all subject information used by partners, such as controlled keywords, free keywords, classification system(s) and/or captions. In the preliminary version of the data model for the operational Renardus pilot system a distinction will be made between the several kinds of subject. For the final system the provision of keywords is required.
Name
Subject ¦ DDC
Qualified DC name
DC.Subject
Namespace
DCMES Qualifiers (2000-07-11) and RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
DDC
R Encoding Scheme(s)
Ren-DDC for normalization; DDC 21 can be extended by Renardus-specific captions
Obligation
M
Repeatable
yes
LQ "LANG"
no
DC Definition
Dewey Decimal Classification, see also: http://www.oclc.org/dewey/index.htm
DC Comment
-
R Definition
DDC 21: adapted DDC version for cross-browsing purposes.
R Comment
This field is created in the Renardus normalization process via mapping tables from the particular Subject Gateway classification scheme. Each partner has to map its own classification system to DDC. A mapping guideline for DDC will be prepared in the context of WP 7. Only captions, not notations, will be displayed.
2.2.1.5
DC.Identifier
Name
Identifier
Qualified DC name
DC.Identifier
Namespace
DCMES Qualifiers (2000-07-11) and RMES version 0.1
Refinement(s)
Mirror, Archive
DC Encoding Scheme(s)
URI
R Encoding Scheme(s)
-
Obligation
M
Repeatable
yes, for translated sites
LQ "LANG"
no
DC Definition
An unambiguous reference to the resource within a given context.
DC Comment
Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
R Definition
-
R Comment
URI means URL, URN, DOI, ISBN, ISSN etc. For the Renardus normalization process DOI, ISBN and ISSN must be given in URN syntax. In the preliminary version of the data model for the operational Renardus pilot system a distinction will be made between the resource URL, URL(s) of mirrored or copied resources, and URL(s) for archive purposes.
Name
Identifier ¦ Mirror
Qualified DC name
DC.Identifier
Namespace
RMES version 0.1
Refinement(s)
Mirror
DC Encoding Scheme(s)
URI
R Encoding Scheme(s)
-
Obligation
O
Repeatable
yes
LQ "LANG"
no
DC Definition
An unambiguous reference to the resource within a given context.
DC Comment
Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
R Definition
-
R Comment
URI means URL, URN, DOI, ISBN, ISSN etc. For the Renardus normalization process DOI, ISBN and ISSN must be given in URN syntax.
Name
Identifier ¦ Archive
Qualified DC name
DC.Identifier
Namespace
RMES version 0.1
Refinement(s)
Archive
DC Encoding Scheme(s)
URI (? to ask DDB)
R Encoding Scheme(s)
-
Obligation
O
Repeatable
no
LQ "LANG"
no
DC Definition
An unambiguous reference to the resource within a given context.
DC Comment
Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
R Definition
-
R Comment
-
2.2.1.6
DC.Language
Name
Language
Qualified DC name
DC.Language
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
ISO 639-2
R Encoding Scheme(s)
-
Obligation
R
Repeatable
yes
LQ "LANG"
-
DC Definition
A language of the intellectual content of the resource.
DC Comment
Recommended best practice for the values of the Language element is defined by RFC 1766 which includes a two-letter Language Code (taken from the ISO 639 standard), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard). For example, en for English, fr for French, or en-uk for English used in the United Kingdom
R Definition
-
R Comment
The language code is the ISO 639-2, three letter code. SUB will provide a mapping between the two letter and three letter language code but this will also be found on the LoC site – ISO 639-2: http://lcweb.loc.gov/standards/iso639-2/englangn.html
2.2.1.7
DC.Type
Name
Type ¦ DCMI Type (DCT1)
Qualified DC name
DC.Type
Namespace
DCMES Qualifiers (2000-07-11)
Refinement(s)
-
DC Encoding Scheme(s)
DCMI Type Vocabulary (DCT1)
R Encoding Scheme(s)
-
Obligation
R
Repeatable
yes
LQ "LANG"
no
DC Definition
The nature or genre of the content of the resource.
DC Comment
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.
R Definition
-
R Comment
SUB will provide a mapping of all types used in partners’ subject gateways to DCT1 (probably except for ZADI).
Name
Type ¦ DCMI Type (DCT2)
Qualified DC name
DC.Type
Namespace
DCT2: Dublin Core Type Vocabulary: Subtypes, Working Draft, http://lcweb.loc.gov/marc/dc/subtypes-20000928.html
Refinement(s)
-
DC Encoding Scheme(s)
DCMI Type Vocabulary (DCT2) as soon as it is fixed!
R Encoding Scheme(s)
-
Obligation
O
Repeatable
yes
LQ "LANG"
no
DC Definition
The nature or genre of the content of the resource.
DC Comment
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.
R Definition
A list of subtypes used to categorize the nature or genre of the content of the resource, a more specific list of resource types than available in the DCT1 Type Vocabulary.
R Comment
The possibility and usability of a mapping to DCT2 will be investigated in the context of WP 7.
Name
Type
Qualified DC name
DC.Type
Namespace
DCMES version 1.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
-
Obligation
R
Repeatable
yes
LQ "LANG"
no
DC Definition
The nature or genre of the content of the resource.
DC Comment
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.
R Definition
-
R Comment
Subject Gateways should provide their original types without encoding scheme.
2.2.2
Non Dublin Core element
2.2.2.1
Country
Name
Country
Qualified DC name
-
Namespace
RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
ISO 3166-1 (two letter code) http://www.din.de/gremien/nas/nabd/iso3166ma/
Obligation
R
Repeatable
no
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
Country in which the publisher of the resource is located or the country which represents the cultural context of the resource. Code for the representation of names of countries.
R Comment
-
2.2.3
Administrative Renardus elements
Two administrative elements are used in Renardus for practical reasons: “Full Record URL” and “SBIG ID”.
2.2.3.1
Full Record URL
Name
Full Record URL
Qualified DC name
-
Namespace
RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
URL
Obligation
R
Repeatable
no
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
A URL that leads to a detailed display of each record at the originating service site.
R Comment
Because some partners generate their records dynamically it might be a problem to provide a URL to the full record display.
2.2.3.2
SBIG ID
Name
SBIG ID
Qualified DC name
-
Namespace
RMES version 0.1
Refinement(s)
-
DC Encoding Scheme(s)
none
R Encoding Scheme(s)
Acronym of Subject Gateway
Obligation
M
Repeatable
no
LQ "LANG"
no
DC Definition
-
DC Comment
-
R Definition
A stable unique acronym also well defined in the Collection Level Description.
R Comment
Must be the same acronym as used in the Renardus Collection Level Description schema field “Acronym”.
2.3
Data model of the administrative database: Collection Level Description (CLD)
In the administrative database the participating Subject Gateways and brokers will make available collection management descriptions and mapping tables for DDC. Each Renardus participant is responsible for maintaining and offering information about its collection on a local server and for providing the mapping tables from its local classification system(s) to the agreed classification system, DDC. The collection description part of the data model of the administrative database is based on the RSLP Collection Description Schema. The collection description conforms to the RSLP schema, with some additional elements. A syntax and some content rules for the partners’ Collection Level Description will be provided in due time. Three kinds of elements are used:
- Dublin Core (based) elements (e.g. dc:title)
- Collection Level Description elements based on RSLP schema (e.g. cld:country)
- Renardus specific Collection Level Description elements (e.g. ren-cld:language)
All elements except DC.Relation are mandatory. A guideline for DC.Description will be developed in the context of D6.5 (to be delivered on 30 June 2001) with the goal of a more or less standardized form of description. The aims of the collection description are:
- to support the selection of subject gateway(s) for searching
- to provide background information about the participating subject gateway for human and machine users
- to promote/register the individual subject gateway(s) as high quality resources in the Internet
Renardus Collection Level Description
Attribute
RDF property
Definition
Dublin Core (based) elements:
Title
dc:title
The name of the collection.
Identifier
dc:identifier
An unambiguous reference to the collection within a given context (encoding scheme: URI).
Description
dc:description
An account of the content of the collection. Comment: Renardus will provide a standardized structure of the content of description with information about granularity of collected resources, type of subject indexing, etc. in context of D6.5.
Language
dc:language
The main language(s) of the metadata in the collection with quantitative indication. Syntax: Free text.
Publisher
dc:publisher
An entity responsible for making the collection available. Comment: The organization etc. who is responsible for the intellectual (not technical) distribution of
Reynard IST-1999-10562
50
Deliverable: D6.4
Data model (first final versiont)
Issue: 1.0
Date of issue: 17 Novemberr 2000
the collection. Format.Extent
dc:format dcq:extent
The size of the collection. Comment: It is recommended to provide the number of records as follows: about x records.
Date.Issued
dc:date dcq:issued
Date of formal iisuance (e.g. publication) of the collection.
Subject
dc:subject
The topic of the content of the collection. Syntax: Main DDC captions for the subjects represented in the Subject Gateway.
Subject Notation
dc:subject
The topic of the content of the collection. Syntax: Main DDC notations and captions for the subjects represented in the Subject Gateway: DDC notation1 – DDC caption1; DDC notation2 – DDC caption2 etc. Comment: Element content not displayed in human readable Collection Level Descriptions.
Relation
dc:relation dcq:hasPart dcq:isPartOf
A reference to a related resource. Syntax: Acronym followed by empty character must precede other describing text for every related subject gateway. Comment: At the moment only used by RDN and its member Subject Gateways.
Collection Level Description elements based on RSLP schema: Country
cld:country
The country in which the collection is physically located. Syntax: Free text.
Renardus specific Collection Level Description elements: Acronym
ren-cld:acronym
The acronym of the collection.
Resource Language
ren-cld:language
Language(s) of the described resources. Syntax: Free text.
DDC mapping URL
ren-cld:ddcMapping
URL of local DDC mapping information in Renardus format. Comment: Element content not displayed in human readable Collection Level Descriptions.
Z39.50 Location
ren-cld:Z3950Location
The online location of the Z39.50 server of the subject gateway Syntax: machine name; port number; database
Reynard IST-1999-10562
51
Deliverable: D6.4
Data model (first final versiont)
Issue: 1.0
Date of issue: 17 Novemberr 2000
name Comment: Element content not displayed in human readable Collection Level Descriptions. Logo URL
ren-cld:logoURL
The URL of the logo (image) of the subject gateway. Comment: Element content not displayed in human readable Collection Level Descriptions.
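As an illustration of how the rules of the Collection Level Description could be checked mechanically, the following Python sketch tests a record for the mandatory elements (all except Relation) and splits the two structured syntaxes defined above (Z39.50 Location and Subject Notation). It is a minimal sketch under stated assumptions, not part of the Renardus specification: the function names, the use of attribute names as dictionary keys, the acceptance of a plain hyphen in place of the dash, and the sample values are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Attribute names as used in the Collection Level Description table.
# "Relation" is deliberately absent: it is the only optional element.
MANDATORY_ATTRIBUTES = (
    "Title", "Identifier", "Description", "Language", "Publisher",
    "Format.Extent", "Date.Issued", "Subject", "Subject Notation",
    "Country", "Acronym", "Resource Language", "DDC mapping URL",
    "Z39.50 Location", "Logo URL",
)


def missing_mandatory(record: Dict[str, str]) -> List[str]:
    """Return the mandatory attributes that are absent or empty in a record."""
    return [name for name in MANDATORY_ATTRIBUTES
            if not record.get(name, "").strip()]


def parse_z3950_location(value: str) -> Tuple[str, int, str]:
    """Split 'machine name; port number; database name' into its three parts."""
    parts = [part.strip() for part in value.split(";")]
    if len(parts) != 3:
        raise ValueError("expected 'machine name; port number; database name'")
    host, port, database = parts
    return host, int(port), database


def parse_subject_notation(value: str) -> List[Tuple[str, str]]:
    """Split 'notation1 – caption1; notation2 – caption2' into pairs.

    Assumption: a plain hyphen surrounded by spaces is accepted as well as
    the dash. An entry without a separator raises ValueError.
    """
    pairs = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        notation, caption = re.split(r"\s+[–-]\s+", entry, maxsplit=1)
        pairs.append((notation, caption))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(parse_z3950_location("z3950.example.org; 2100; renardus"))
    print(parse_subject_notation("020 – Library and information sciences; 630 – Agriculture"))
    print(missing_mandatory({"Title": "Example Gateway"}))
```

Keying the record by attribute name rather than by RDF property avoids a clash between Subject and Subject Notation, which both use dc:subject.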
2.4
Data flow
The data flow does not depend solely on the chosen data model but also on other aspects. For example, organizational and business issues, as well as the gateway-to-server structures which the participants will choose, are of importance in this context. All these matters are being studied and developed in the current Renardus work: WP3 develops organizational structures for the management of the Renardus service and for collaboration between the participants, and WP8 investigates business issues which have an impact on Renardus (e.g. Intellectual Property Rights, copyright). Interoperability issues (WP7) will also influence the Renardus data flow. A first approach to data flow can therefore only be a general one, based on the Renardus architectural model (see http://www.konbib.nl/coop/reynard/restricted/architecture2.ppt).
For Renardus a distributed system architecture has been chosen (see D2.2 and D2.3). Each participant or group of participants will be required to set up and maintain a Renardus server which will contain a Renardus content database and an administrative database. In order to make data from the participant gateways available and usable in Renardus, a normalization process is needed: data from all participants have to be harmonized. The question is at which step the normalization/harmonization process will be done. It is also of importance to the data flow whether the particular Renardus server holds the data of one single service or of a group of participating services.
The structures underlying the different participating services are heterogeneous. In some cases there is one gateway involved (e.g. DutchESS, DAINet). In others there are distributed broker services involved (RDN) with differently structured records (e.g. RDN’s SOSIG or EEVL), or several gateways with uniform structures held by one institution (e.g. SSG-FI with its four subject guides).
In the case of a single service, the service extracts the relevant data from its database, normalizes them to conform to the agreed data model, and imports the data into the single Renardus server. Where a group of services chooses to maintain one joint Renardus server, each service has to extract and normalize its data in the appropriate way before exporting the data to the joint Renardus server. These conversion processes will most likely differ, because the record structures of the different services will not be the same. The methods of exporting and importing might also differ between the individual services. Normalization can occur either before a service exports its relevant records or after they have been imported into the Renardus server.
Several steps are needed to get the metadata from a Subject Gateway into the Renardus broker. A suggested model for partners to make their content available in a local single Renardus server is described in D2.2 and D2.3:
- extract the appropriate records from the database
- convert/normalize the records
- write the necessary configuration files
- run the Zebra indexer on the generated record files and start the Zebra server
Except for writing the configuration files, these steps have to be repeated each time the metadata content is refreshed.
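The repeated part of this cycle (extract, normalize, index) can be sketched in Python as follows. This is only an illustrative sketch, not the procedure defined in D2.2 or D2.3: the function names, the local field names and the field mapping are hypothetical, and the indexing step is passed in as a callable standing in for a run of the Zebra indexer, which is not invoked here. Writing the configuration files is left out because it is done once rather than on every refresh.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def normalize(record: Record, field_map: Dict[str, str]) -> Record:
    """Rename local fields to data-model element names.

    Fields without an entry in field_map are dropped; values are trimmed.
    """
    return {field_map[name]: value.strip()
            for name, value in record.items()
            if name in field_map}


def refresh(extract: Callable[[], Iterable[Record]],
            field_map: Dict[str, str],
            index: Callable[[List[Record]], None]) -> int:
    """Run one refresh cycle and return the number of records handed over.

    extract: returns the records selected from the gateway's own database.
    index:   stand-in for the indexing step (hypothetical callable).
    """
    normalized = [normalize(record, field_map) for record in extract()]
    index(normalized)
    return len(normalized)


if __name__ == "__main__":
    # Hypothetical local record layout and mapping, for illustration only.
    local_records = [{"titel": " Example resource ", "intern": "not exported"}]
    mapping = {"titel": "DC.Title"}
    indexed: List[Record] = []
    count = refresh(lambda: local_records, mapping, indexed.extend)
    print(count, indexed)
```

Because normalization is an ordinary function applied between extraction and indexing, the same sketch covers both variants discussed above: a service can call it before exporting its records, or the joint server can call it after importing them.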
PART IV – REMAINDER
3
APPENDIX A: QUESTIONNAIRE
Renardus questionnaire D6.4: Data model and data flow (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/questionnaires/all.html)
4
APPENDIX B: RESPONSES
Questionnaire: Responses from the partners (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/index.html)
ALUH: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/novagate.pdf
BNF: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/bnf.pdf
DDB: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/ddb.pdf
DTV and NetLab: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/dtv_netlab.pdf
JyU: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/fvl.pdf
KB: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/dutchess.pdf
SOSIG: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/sosig.pdf
SUB: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/sub.pdf
UKOLN: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/rdn.pdf
ZADI: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/zadi.pdf
5
APPENDIX C: COMMENTS OF PARTNERS
General (0)
DutchESS: I think those elements are the bare minimum required to support Renardus functionality. The other ones are important and should preferably be supported, but not supporting them is no reason to exclude gateways. Gateways that don't support these elements cannot be included in searches based on advanced search functionality, but as it is known from research that c. 90% of searches are simple searches in all fields anyway, I don't think this matters much.
DTV/NetLab: Only one of the subject fields is needed. A SBIG should support at least 6-7 of the 8 elements.
BnF: We have to define the content of the creator field.
FVL: All those elements are important.
DDB: MIME type or document type?
Title/Title.Alternative (1.1 – 1.6)
DutchESS: DutchESS puts titles in various languages in the same title field, separated by "=". I suppose these various versions could be exported to different Renardus title fields by using this "=" separator. In that case we would be able to support some of the above options. Those titles could be exported to one title field and a number of alternative title fields, or to more than one title field. In that way we could support either repeatable or non-repeatable title and alt. title fields. Regarding the Title/Title.Alternative field: either have a non-repeatable title field and a repeatable title.alt field, or have a repeatable title field and no title.alt field.
DTV/NetLab: 1.1: As we mentioned in a previous mail (Doyle, 28/06), we are unclear as to what "repeatable" actually means in the context of the questions (in Renardus or locally in the SG) and how this ultimately affects functionality in the service. Since we are obliged to answer, our answers will only relate to the Renardus service and not the local ones. The main title is the original title of the resource; we don't want to see alternative (other) titles in Renardus, i.e. no repetition of main title and no alternative title.
SUB: 1.2: It is desirable that all SGs support a title.alternative for the future Renardus system. 1.4: It is desirable for the future system that the main title is provided in English. 1.5: In general, this should be an issue for WP 7. If it works, this is desirable. 1.6: This works only with a language tag for title and title alternative (also because of stop words: different meanings of "stop words" in different languages).
FVL: The main title should be provided in the language of the resource. The (repeatable) Title.alternative element could contain the (manually translated, if needed) English title or acronym. (The Title.alternative is not repeatable at this moment in the FVL.) Email 14.08.200: 1.2: This means that e.g. the translated title and acronym could also be provided in the same (non-repeatable) field. At this moment the FVL utilises this practice.
NOVAGate: Title and title.alternative are cross-searchable if you don’t limit the search only to the title field.
Creator: rules (2.2 – 2.9)
DTV/NetLab: Expensive.
SUB: For the interoperability (issue of WP 7) of the Renardus system it might be useful to implement authority files, especially if the amount of data increases, e.g. by extension with OPACs. We should also keep an eye on Dublin Core; they have thought about an implementation of vCard.
BnF: Question 2.5 Syntax: This question is OK for personal names but doesn't concern corporate names. In our point of view, the corporate bodies are more numerous than the personal names. Question 2.7 authority file: Does it mean to create a link to an existing authority file or to create a specific authority file for Renardus? In our point of view, it should be a link to an existing authority file.
Creator: additional information (2.10 – 2.16)
SUB: 2.16: That depends on the agent discussion of Dublin Core. General: We have to keep in mind that it is not feasible to repeat the creator field if we use the HTML standard; with RDF this will be possible!
BnF: Additional information must be added in separate fields.
FVL: Any extra additional information (email address, organisational information) related to the creator should be provided in the same creator field with last name and first name. This is the simplest solution (and maybe suitable for every participating SG).
NOVAGate: All additional information has to be gathered on a voluntary basis.
Description: general (3.1)
SUB: It would be helpful to have a language tag for the repeatable description field in case several descriptions are provided in different languages.
Description: description + keywords (3.2 – 3.5)
DTV/NetLab: Some of description and subject must be in English.
SUB: 3.2: For the future, this should be required because of the cross-search functionality.
BnF: Does it concern keywords extracted out of the description for indexing purposes, or do we have the description in one field and keywords in another field? In our point of view, we should have only one field for Description and one field for Subject Keyword. 3.2: In order to facilitate the handling of other languages for search purposes, we will be able to provide English keywords which are the LCSH equivalents besides the RAMEAU Subject Headings.
Description: multilinguality (3.6)
SUB: This will be an issue of WP 7.
ZADI: It would be good, but at this time it seems to be unrealistic.
Subject: keywords – general (4.1 – 4.2)
DTV/NetLab: For normalisation in Renardus every keyword has to be in an element entity of its own, which naturally does not say anything about how we are to display it.
Subject: form of keywords (4.3 – 4.7)
DTV/NetLab: Keywords must be separable by Renardus. This is done in the export function/normalization process and should take into account different languages.
BnF: Questions 4.3, 4.4, 4.5 and 4.6: In these 4 questions, there is a confusion between the nature of the subjects (free or controlled), their use (in one or more catalogs) and the level of structuring (a single, unstructured list versus a thesaurus). In our point of view, the only significant differences must be: A. free keywords versus controlled keywords, B. specific thesaurus versus general (encyclopedic) thesaurus.
FVL: The form of keywords in different subject fields should be indicated for the user in the search page (advanced search form).
NOVAGate: There are two fields for English keywords: one for thesaurus-based keywords (Agrovoc) and the other for free keywords. All keywords in Nordic languages are in the same field.
Subject: keywords – multilinguality (4.8)
SUB: This will be an issue of WP 7.
ZADI: Desirable in future, but now impossible.
BnF: If it concerns free keywords, we could have an automatic translation. In the case of controlled subjects, we cannot have automatic translation, but we can make a mapping or a "linking" between the subjects in different languages, as we are doing in the MACS project (no evaluation) (http://www.bl.uk/information/finrap3.html).
Subject: classification – general (4.10 – 4.15)
ZADI: Renardus should not use an existing classification system but should be oriented towards a suitable classification, if there is any. DDC for description of document types; not possible for subject descriptions of sources.
BnF: We have to define which level of granularity within the DDC we would like.
FVL: We can test existing systems (UDC, DDC) at a general level. If they aren't suitable, then we can create a home-grown classification.
Subject: classification system - cross-search with regard to a special subject classification (4.16 – 4.20)
DTV/NetLab: Basic field for topical search should combine title, description and subject.
FVL: Cross-searching between main classes is enough at this moment. If the end user wants more exact search functions, Renardus could advise her/him to use the subject-specific database. (FVL evaluates the whole section with "definitely not".)
SUB: With regard to the verbal description of the classification system: it is necessary to provide for each verbal description also the notation of the classification system or the general subject (as a scheme?); otherwise there will be a mixing of all verbal descriptions in the search/metadata browse index and users can’t assign the description to a subject.
Subject: classification systems – multilinguality (4.21)
ZADI: Basis must be an English classification.
FVL: Yes, the common classification system should be provided in several European languages. Renardus needs user interfaces for different languages. Anyway, the English interface has the priority.
Identifier: general - regarding resources in several languages (5.1 – 5.2)
DTV/NetLab: Use one record for each language version of the resource.
BnF: At the BnF, we provide the URL of the site in another language within the description field.
FVL: This field is not an essential element in search.
DDB: Resources in different languages are separate resources with separate metadata sets. There is no reason to have a repeatable field for this case.
Identifier: general - regarding mirrored/copied resources (5.3 – 5.5)
BnF: In your point of view, what could this special Renardus scheme be? We must re-use an existing one and not create a new one. We'd prefer to use the qualifiers "Is version of" and "Has version".
DDB: We should consider that there should be separate fields for URN and URL. In the case of copies or mirrors the resources have only one URN but may have several URLs. The URL field must be repeatable.
Identifier: Qualifier (5.6 – 5.9)
DutchESS: PURLs have the form of a URL and it is not necessary to treat them as a separate category from URLs. URI is a collective category, including URLs, PURLs and URNs.
DTV/NetLab: What do you mean by 'integrate'?
BnF: URL, ISBN, URI, PURL, URN must be in separate fields but in the same index.
FVL: Let's dedicate this field only to URLs. There is no use in making the system too complicated.
DDB: URIs are URNs and URLs. There are already questions for both.
Language: code (6.2 – 6.4)
DutchESS: May support a language code in the future.
DTV/NetLab: Use DC recommendation: 639-2.
FVL: The FVL uses ISO Code 639 with three letters.
DDB: 639 two letters is deducible from 639 three letters.
Country: general (7.1 – 7.3)
DTV/NetLab: How many SBIGs support this?
SUB: The publisher country code as well as the server country code are useful.
FVL: The FVL will add country code in the near future to its records.
Country: code (7.4 – 7.5)
DutchESS: May support a country code in future.
FVL: ISO code with three letters would be better for the FVL.
Type: general (8.1 – 8.5)
DutchESS: Like country and language: we may support a type element in the future.
DTV/NetLab: DC model is DCT1 which should be combined with others.
SUB: See DCT2: Dublin Core Type Vocabulary: Subtypes Working Draft (http://lcweb.loc.gov/marc/dc/subtypes-20000612.html).
ZADI: DC-based is supported in parts; other lists should be checked before a definite decision is made.
FVL: Qualifiers are not needed; a simple type list is the best.
DDB: I hope that DC type will be reconciled with the other code lists.
Rights (9.1 – 9.7)
DTV/NetLab: Local info.
SUB: This element is also important for business models between Subject Gateways and Renardus, between Renardus and other service providers etc.
FVL: The rights field isn't useful for the majority of Internet resources. Anyway: if there is a need for special rights information, you can add it to the description field.
NOVAGate: We don’t have a separate field for rights, but we tell about access restrictions in the description/abstract field.
DDB: 9.1 to 9.7 are no alternatives.
Publisher (10)
BnF: We need to define the content of the publisher field.
FVL: Essential elements in search.
Unique Record Number (IV B)
DTV/NetLab: See question IV D (strongly recommended).
FVL: This could be the unique record number, which is automatically generated by every SG.
DDB: If data is held distributed there is no cause of ambiguity.
Record Creator (IV C)
SUB: This might be important, e.g. if reviews are provided by people well known in the scientific community, users might be interested in their names.
DDB: That's a matter of the special gateway.
SBIG ID (IV D)
DDB: Only reasonable if there is a central database.
Record Last Checked Date (IV E)
DutchESS: Only a "last update date", not a "last checked date", so actual changes are reflected, but not every check which has not resulted in a change.
DTV/NetLab: This is local information and not relevant for Renardus.
SUB: This is an important part of quality check/control.
DDB: That's a matter of the special gateway.
6
APPENDIX D: SUMMARY
Summary of responses (matrix): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/summary_d6_4.pdf
7
APPENDIX E: Data Model and Data Flow
Data model and data flow, draft version 0.3 (4 September 2000): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/data_model.pdf
8
BIBLIOGRAPHY
AACR2 Translation project (http://lcweb.loc.gov/loc/german/AACR2/AACR2translation.html)
BUBL LINK - Browse by Dewey Class (http://bubl.ac.uk/link/ddc.html)
Business issues for Internet information gateways (Michael Day, UKOLN) (http://www.ukoln.ac.uk/metadata/renardus/wp8/issues/)
Cross-browsing in Renardus: Usage of subject vocabularies at Renardus gateways, by Traugott Koch (http://www.lub.lu.se/renardus/class.html)
Dempsey, L., 2000, The subject gateway: experiences and issues based on the emergence of the Resource Discovery Network. Online Information Review, 24 (1), 8-23.
Koch, T., Day, M., 1997, The role of classification schemes in Internet resource description and discovery. DESIRE deliverable D3.2 (3), (http://www.ukoln.ac.uk/metadata/desire/classification/)
MACS project (http://www.bl.uk/information/finrap3.html)
RDN Cataloguing Guidelines (http://www.rdn.ac.uk/publications/cat-guide/)
9
REFERENCES
AACR2 and Seriality (Library of Congress) (http://lcweb.loc.gov/acq/conser/serialty.html)
Cliff, P., 2000, RDN Resource Types, v. 1, (http://www.rdn.ac.uk/publications/cat-guide/types/)
Codes for the Representation of Names of Languages – ISO 639-2 (http://lcweb.loc.gov/standards/iso639-2/englangn.html)
CULTURAL HERITAGE PROJECTS CONCERTATION EVENT (http://www.cscaustria.at/events/concertation.htm)
Day, M., Cliff, P., 2000, RDN Cataloguing Guidelines, v. 1.0, (http://www.rdn.ac.uk/publications/cat-guide/)
DC Agent Qualifiers - DC Working Draft - 10 December 1999 (http://www.mailbase.ac.uk/lists/dcagents/files/wd-agent-qual.html)
[DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description, (http://purl.oclc.org/dc/documents/rec-dces-19990702.htm)
[DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers, (http://purl.org/dc/documents/rec/dcmes-qualifiers20000711.htm)
DCT2: Dublin Core Type Vocabulary: Subtypes Working Draft (http://lcweb.loc.gov/marc/dc/subtypes-20000612.html)
Dempsey, L., 2000, The subject gateway: experiences and issues based on the emergence of the Resource Discovery Network. Online Information Review, 24 (1), 19.
Dewey Decimal Classification (http://www.oclc.org/dewey/about/about_the_ddc.htm)
Dublin Core Type Vocabulary: Subtypes Working Draft (http://lcweb.loc.gov/marc/dc/subtypes-20000612.html)
Dublin Core Type Working Group, 1999, List of Resource Types. Dublin Core Metadata Initiative Working Draft, (http://purl.org/dc/documents/wd-typelist.htm)
First SCHEMAS Workshop on 11/12 May (http://www.schemas-forum.org/workshops/ws1/agenda.html)
Gray, L., 2000, Cataloguing rules for the BIOME Service: a procedural manual (http://biome.ac.uk/guidelines/cat/)
Humbul, 2000, Describing and cataloguing resources in Humbul, v. 0.4a. Draft, 26 October. (http://www.humbul.ac.uk/about/catalogue.html)
ISO 3166 Maintenance Agency (http://www.din.de/gremien/nas/nabd/iso3166ma/)
ISO 639-2 Registration Authority – Library of Congress (http://lcweb.loc.gov/standards/iso639-2/)
ISO 639-2:1998, Codes for representation of names of languages - Part 2: Alpha-3 code. Geneva: International Organization for Standardization.
MacLeod, R., Kerr, L., Guyon, A., 1998, The EEVL approach to providing a subject based information gateway for engineers. Program, 32 (3), 205-223.
Mapping ROADS/IAFA templates to Dublin Core (http://www.ukoln.ac.uk/metadata/interoperability/iafa_dc.html)
Personennamendatei (PND) (http://www.ddb.de/professionell/pnd.htm)
RAMEAU (http://www.bnf.fr/web-bnf/infopro/rameau/)
RFC 1766 Tags for the identification of languages (http://info.internet.isi.edu/in-notes/rfc/files/rfc1766.txt)
Renardus architectural model, (http://www.konbib.nl/coop/reynard/restricted/architecture2.ppt)
RSLP Collection Description (http://www.ukoln.ac.uk/metadata/rslp/)
RSLP Collection Description: Tool (http://www.ukoln.ac.uk/metadata/rslp/tool/)
Simple Collection Description (draft version: 2 August 1999) (http://www.ukoln.ac.uk/metadata/cld/simple/)