Deliverable: D6.4

Data model (first final version)

Issue: 1.0

Date of issue: 17 November 2000

RENARDUS: PROJECT DELIVERABLE

Project Number:

IST-1999-10562

Project Title:

Reynard - Academic Subject Gateway Service Europe

Deliverable Type:

Internal

Deliverable Number:

D6.4

Contractual Date of Delivery:

30 September 2000

Actual Date of Delivery:

17 November 2000

Title of Deliverable:

Data model (first final version 1.0)

Workpackage contributing to the Deliverable:

WP6

Nature of the Deliverable:

Report

URL:

http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/index.html (restricted access) http://renardus.sub.uni-goettingen.de/ (public access)

Authors:

Hans Jürgen Becker, Frank Klaproth, Heike Neuroth

Contributions: Michael Day (UKOLN, text); Anders Ardo and Traugott Koch (DTV/NetLab, discussions).

Contact Details:

Platz der Göttinger Sieben 1 37073 Göttingen Germany email: [email protected]

Abstract

This report provides an introduction to the development of a Renardus Application Profile. It draws on the partners' answers to the D6.4 questionnaire developed by SUB. The answers led to the development of several data models: a data model for the Renardus prototype pilot system, a first version of the data model for the operational pilot system, and a data model for the administrative database. Besides the mapping tables for cross-browsing, this database contains tables for converting some codes to the defined Renardus codes, and the collection description of each subject gateway. Finally, this report contains some recommendations for upgrading the partners' metadata.

Keywords

data model, data flow, subject gateway, metadata, profile, application profile, namespace, Renardus, Reynard

Reynard IST-1999-10562


Distribution List:

All partners

Issue:

1.0

Reference:

IST-1999-10562 / D6.4 / 1.0

Total Number of Pages:

62


TABLE OF CONTENTS

PART I TITLE PAGE

1 RESULTS
1.1 Agreement on eight elements
1.2 Results of the second questionnaire developed for D6.4
1.2.1 Eight Elements for Cross-Searching
1.2.1.1 General (0)
1.2.1.2 Title (1)
1.2.1.2.1 Title/Title.Alternative (1.1 – 1.6)
1.2.1.3 Creator (2)
1.2.1.3.1 Creator: general (2.1)
1.2.1.3.2 Creator: rules (2.2 – 2.9)
1.2.1.3.3 Creator: additional information (2.10 – 2.16)
1.2.1.4 Description (3)
1.2.1.4.1 Description: general (3.1)
1.2.1.4.2 Description: description + keywords (3.2 – 3.5)
1.2.1.4.3 Description: multilinguality (3.6)
1.2.1.5 Subject (4)
1.2.1.5.1 Subject: keywords – general (4.1 – 4.2)
1.2.1.5.2 Subject: form of keywords (4.3 – 4.7)
1.2.1.5.3 Subject: keywords – multilinguality (4.8)
1.2.1.5.4 Subject: keywords – rules (4.9)
1.2.1.5.5 Subject: classification – general (4.10 – 4.15)
1.2.1.5.6 Subject: classification system - cross-search with regard to a special subject classification (4.16 – 4.20)
1.2.1.5.7 Subject: classification systems – multilinguality (4.21)
1.2.1.6 Identifier (5)
1.2.1.6.1 Identifier: general - regarding resources in several languages (5.1 – 5.2)
1.2.1.6.2 Identifier: general - regarding mirrored/copied resources (5.3 – 5.5)
1.2.1.6.3 Identifier: Qualifier (5.6 – 5.9)
1.2.1.7 Language (6)
1.2.1.7.1 Language: general (6.1)
1.2.1.7.2 Language: code (6.2 – 6.4)
1.2.1.8 Country (7)
1.2.1.8.1 Country: general (7.1 – 7.3)
1.2.1.8.2 Country: code (7.4 – 7.5)
1.2.1.9 Type (8)
1.2.1.9.1 Type: general (8.1 – 8.5)
1.2.2 Future Elements
1.2.2.1 Rights (9.1 – 9.7)
1.2.2.2 Publisher (10)
1.2.3 Additional Elements
1.2.4 Administrative Elements
1.2.4.1 Subject Gateway ID (IV A)
1.2.4.2 Unique Record Number (IV B)
1.2.4.3 Record Creator (IV C)
1.2.4.4 SBIG ID (IV D)
1.2.4.5 Record Last Checked Date (IV E)
1.2.4.6 Other (IV F)
1.3 Subject Gateways in the UK
1.3.1 RDN
1.3.2 Individual RDN hubs
1.3.2.1 BIOME
1.3.2.2 EEVL
1.3.2.3 Humbul
1.3.2.4 PSIgate
1.3.2.5 SOSIG
2 DATA MODEL AND DATA FLOW
2.1 Data model for the prototype Renardus pilot system
2.1.1 Dublin Core Elements
2.1.1.1 DC.Title and DC.Title.Alternative
2.1.1.2 DC.Creator
2.1.1.3 DC.Description
2.1.1.4 DC.Subject: classification system(s) and keywords
2.1.1.5 DC.Identifier
2.1.1.6 DC.Language
2.1.1.7 DC.Type
2.1.2 Non Dublin Core element
2.1.2.1 Country
2.1.3 Administrative Renardus elements
2.1.3.1 Full Record URL
2.1.3.2 SBIG ID
2.2 Preliminary version of data model for the operational Renardus pilot system
2.2.1 Dublin Core Elements
2.2.1.1 DC.Title and DC.Title.Alternative
2.2.1.2 DC.Creator and DC.Creator.AdditionalInformation
2.2.1.3 DC.Description
2.2.1.4 DC.Subject: classification system(s) and keywords
2.2.1.5 DC.Identifier
2.2.1.6 DC.Language
2.2.1.7 DC.Type
2.2.2 Non Dublin Core element
2.2.2.1 Country
2.2.3 Administrative Renardus elements
2.2.3.1 Full Record URL
2.2.3.2 SBIG ID
2.3 Data model of the administrative database: Collection Level Description (CLD)
2.4 Data flow
3 Appendix A: Questionnaire - Renardus questionnaire D6.4: Data model and data flow (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/questionnaires/all.html)
4 Appendix B: Responses - Questionnaire: Responses from the partners (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/index.html)
5 Appendix C: Comments of Partners
6 Appendix D: Summary - Summary of responses (matrix): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/summary_d6_4.pdf
7 Appendix E: Data Model and Data Flow - Data model and data flow, draft version 0.3 (4 September 2000): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/data_model.pdf
8 BIBLIOGRAPHY
9 REFERENCES


PART II - MANAGEMENT OVERVIEW

DOCUMENT CONTROL

Issue | Date of Issue | Comments
0.1 | 10 May 2000 | First draft, presented to partners at the Bath meeting (Excel sheet)
0.2 | 12 May 2000 | Second draft, presented at the first SCHEMAS workshop in Bath
0.3 | 8 September 2000 | Third draft, for review by project partners at the Paris meeting
0.4 | 6/7 November 2000 | Fourth draft, for review by project partners at the Göttingen meeting
1.0 | 17 November 2000 | First final version

EXECUTIVE SUMMARY

The objective of the Renardus project is to establish an academic subject gateway service in Europe. The pilot system will be based on a generic broker architecture and data model that will allow integrated searching and browsing of distributed resource collections. This report provides background information about the development of the Renardus data model and data flow. It draws on the partners' answers to the D6.4 questionnaire developed by SUB, and Michael Day (UKOLN) presents background information about RDN and the individual hubs. The answers led to a data model for the Renardus prototype pilot system and a first version of the data model for the operational pilot system.

The questionnaire was sent to the following ten partners: DutchESS (The Netherlands), NOVAGate (Nordic countries), EELS (Sweden), DEF fagportal (Denmark), DAINet (Germany), FVL (Finland), Les Signets (France), RDN (United Kingdom), DDB (Germany) and SSG-FI (Germany). The partners' answers are summarised in the following list; only responses with the highest priority (required, strongly recommended and recommended) are considered:

- Title/Title.Alternative: The main Title should not be repeatable; Title.Alternative should be repeatable. Title and Title.Alternative should both be cross-searchable. Title should be provided in the language of the resource, and additional titles (translated title, acronym, etc.) should be provided in repeatable Title.Alternative elements.
- Creator: Creator should be a repeatable element.
- Description: Description should be repeatable in case the description is provided in more than one language. Each subject gateway should provide either an English version of Description or an English version of Keywords for every resource (besides other languages).
- Subject: Keywords should be browsable and repeatable. All forms of the repeatable element Keyword (free, controlled, thesaurus-based) should be provided, and the form of the keywords should be indicated to the user. The subject gateways should be browsable via a common classification system; Renardus should use an existing classification system for this purpose, and this system should be DDC (all partners map their systems to DDC). The classification system should be provided in several European languages. The verbal description of each notation (caption) should be indexed together with the keywords, so that users can search both. Besides the common classification system, Renardus should provide subject classification systems such as MSC and Ei, which should also be cross-searchable via notations and captions.
- Identifier: Identifier should be repeatable and searchable if the resource is provided in more than one language with different URLs. Renardus should integrate URLs, ISBNs, ISSNs and PURLs in Identifier elements with different qualifiers.
- Language: Language should be repeatable, and the language code should be the three-letter ISO 639 code.
- Country: Country should reflect the country of the publisher, and the country code should be the two-letter ISO 3166 code.
- Type: Renardus should develop a common (controlled) list of types, based on the Dublin Core type list.
- Future elements: Renardus should support the Rights element in the future. In the sense of IPR, Rights should contain information about access conditions/restrictions of the resource as well as its copyright/IPR information. Rights should be a repeatable element, used with different qualifiers for different kinds of information (access conditions/restrictions, subscription information, copyright, IPR, etc.). Renardus should also support a Publisher element in the future.

On the basis of the partners' answers, several data models have been developed. The Renardus broker system will consist of two databases:

1) The Renardus decentralised content database, which contains records extracted from each individual service provider (a service provider can consist of several subject gateways). The data model for this database consists of seven well-defined metadata elements based on Dublin Core, one non-DC metadata element (Country) and two administrative elements (Full Record URL and SBIG ID). There are two versions of the data model: one for the prototype pilot system and one for the operational pilot system. The following tables list the Renardus metadata elements for these two systems (Obligation: M = mandatory, R = strongly recommended, O = optional; Repeatable: NR = not repeatable, R = repeatable; LQ = language qualifier).

Prototype Pilot System:

Metadata Element | Obligation | Repeatable | LQ | Comments
DC.Title | M | NR | possible |
DC.Title.Alternative | O | R | possible |
DC.Creator | R | R | no | Last name and first name should be clearly distinguishable.
DC.Description | M | R | possible | For cross-search reasons the description field must contain free text.
DC.Subject | M | R | possible | In the prototype system there will be no further distinction between the several kinds of subject (keywords, classification system).
DC.Subject:DDC | M | R | no | DDC 21: adapted DDC version for cross-browsing purposes. Only captions, not notations, will be displayed.
DC.Identifier | M | R | no | In the prototype system no distinction will be made between the resource URL, mirrored or copied resource URL(s) and URL(s) for archive purposes.
DC.Language | R | R | no | The language code is the ISO 639-2 three-letter code.
DC.Type | R | R | no | Subject Gateways should provide their original types without an encoding scheme.
DC.Type.DCT1 | R | R | no |
Country | R | NR | no | ISO 3166-1 (two-letter code).
Full Record URL | R | NR | no | A URL that leads to a detailed display of each record at the originating service site.
SBIG ID | M | NR | no | A stable, unique acronym, also well defined in the Collection Level Description.
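The Obligation and Repeatable columns lend themselves to a mechanical check of incoming records. The following Python sketch is an illustration, not part of the Renardus specification: it encodes the prototype table as a rule set and reports missing mandatory elements, missing strongly recommended elements and repeated non-repeatable elements. The record layout (element name mapped to a list of values), the function name `check_record` and the example values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRule:
    obligation: str   # "M" = mandatory, "R" = strongly recommended, "O" = optional
    repeatable: bool  # False corresponds to "NR" in the table


# Prototype pilot system rules, transcribed from the table above.
PROTOTYPE_RULES = {
    "DC.Title": ElementRule("M", False),
    "DC.Title.Alternative": ElementRule("O", True),
    "DC.Creator": ElementRule("R", True),
    "DC.Description": ElementRule("M", True),
    "DC.Subject": ElementRule("M", True),
    "DC.Subject:DDC": ElementRule("M", True),
    "DC.Identifier": ElementRule("M", True),
    "DC.Language": ElementRule("R", True),
    "DC.Type": ElementRule("R", True),
    "DC.Type.DCT1": ElementRule("R", True),
    "Country": ElementRule("R", False),
    "Full Record URL": ElementRule("R", False),
    "SBIG ID": ElementRule("M", False),
}


def check_record(record, rules=PROTOTYPE_RULES):
    """Return (errors, warnings) for a record mapping element names to lists of values."""
    errors, warnings = [], []
    for name, rule in rules.items():
        values = record.get(name, [])
        if not values:
            if rule.obligation == "M":
                errors.append(f"missing mandatory element {name}")
            elif rule.obligation == "R":
                warnings.append(f"missing strongly recommended element {name}")
        elif len(values) > 1 and not rule.repeatable:
            errors.append(f"element {name} is not repeatable")
    # Elements outside the data model are reported but not rejected.
    for name in record:
        if name not in rules:
            warnings.append(f"unknown element {name}")
    return errors, warnings


# Hypothetical record containing only the mandatory elements.
record = {
    "DC.Title": ["Example resource"],
    "DC.Description": ["Free-text description of the resource."],
    "DC.Subject": ["forestry"],
    "DC.Subject:DDC": ["634.9"],
    "DC.Identifier": ["http://www.example.org/"],
    "SBIG ID": ["EXAMPLE"],
}
errors, warnings = check_record(record)
print(errors)         # []
print(len(warnings))  # 6 strongly recommended elements are missing
```

Treating missing strongly recommended elements as warnings rather than errors mirrors the agreement that a gateway lacking one element is not automatically excluded.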

Operational Pilot System:

Metadata Element | Obligation | Repeatable | LQ | Comments
DC.Title | M | NR | yes | Title should be the original title. It is strongly recommended to provide only one version of the title in this field.
DC.Title.Alternative | O | R | yes |
DC.Creator | R | R | no | Last name and first name should be clearly distinguishable.
DC.Creator.Additional.Information | O | R | no | Additional information such as email, URL or organisational information.
DC.Description | M | R | yes | For cross-search reasons the description field must contain free text. Strongly recommended: each SG should provide either an English version of the description or an English version of the keywords for every resource (besides other languages).
DC.Subject | M | R | yes | In the operational system a distinction will be made between the several kinds of subject (keywords, classification system). For the final system the provision of keywords is required.
DC.Subject:DDC | M | R | no | DDC 21: adapted DDC version for cross-browsing purposes. Only captions, not notations, will be displayed.
DC.Identifier | M | R | no | In the operational system a distinction will be made between the resource URL, mirrored or copied resource URL(s) and URL(s) for archive purposes.
DC.Identifier.Mirror | O | R | no |
DC.Identifier.Archive | O | NR | no |
DC.Language | R | R | no | The language code is the ISO 639-2 three-letter code.
DC.Type | R | R | no | Subject Gateways should provide their original types without an encoding scheme.
DC.Type.DCT1 | R | R | no |
DC.Type.DCT2 | O | R | no | The possibility and usability of a mapping to DCT2 will be investigated in WP 7.
Country | R | NR | no | ISO 3166-1 (two-letter code).
Full Record URL | R | NR | no | A URL that leads to a detailed display of each record at the originating service site.
SBIG ID | M | NR | no | A stable, unique acronym, also well defined in the Collection Level Description.
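The strongly recommended rule for DC.Description (an English description or English keywords for every resource) and the ISO 639-2 language code can be checked together. The Python sketch below assumes that language-qualified values are stored as (text, language) pairs, that keywords are held in DC.Subject, and that some partners may still deliver two-letter ISO 639-1 codes needing conversion. The conversion table is a deliberately small, illustrative subset using the bibliographic three-letter forms, and all names are hypothetical.

```python
from typing import NamedTuple, Optional

# Illustrative subset of a conversion table from ISO 639-1 (two-letter)
# codes to ISO 639-2 bibliographic (three-letter) codes.
ISO639_1_TO_2 = {
    "da": "dan", "de": "ger", "en": "eng", "fi": "fin",
    "fr": "fre", "nl": "dut", "sv": "swe",
}


class Value(NamedTuple):
    text: str
    lang: Optional[str] = None  # language qualifier (LQ), if any


def to_iso639_2(code):
    """Normalise a language code to three letters; None if it cannot be converted."""
    code = code.strip().lower()
    if len(code) == 3:
        return code  # assumed to be a valid ISO 639-2 code already
    return ISO639_1_TO_2.get(code)


def has_english_access_point(record):
    """True if any description or keyword carries an English language qualifier."""
    candidates = record.get("DC.Description", []) + record.get("DC.Subject", [])
    return any(v.lang is not None and to_iso639_2(v.lang) == "eng" for v in candidates)


record = {
    "DC.Description": [Value("Eine Sammlung von Forstressourcen", "de")],
    "DC.Subject": [Value("Forstwirtschaft", "ger"), Value("forestry", "en")],
}
print(has_english_access_point(record))  # True
```

This report does not say whether Renardus uses the bibliographic (e.g. `ger`) or the terminology (e.g. `deu`) forms of ISO 639-2; a real conversion table would have to settle on one form.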

2) The Renardus administrative database, which contains the collection description of each subject gateway, the mapping tables for cross-browsing the metadata via the common classification system DDC, and tables for converting some codes (probably language, country and type) to the defined Renardus codes. The metadata elements for this database are based on the RSLP collection description schema. The aims of the collection description are to support the selection of subject gateway(s) for searching, to provide background information about the participating subject gateways for human and machine users, and to promote/register the individual subject gateways as high-quality resources on the Internet. The following list gives the elements of the Renardus Collection Level Description schema:


Title: The name of the collection.
Identifier: An unambiguous reference to the collection within a given context.
Description: An account of the content of the collection.
Language: The main language(s) of the metadata in the collection, with quantitative indication.
Publisher: An entity responsible for making the collection available.
Format.Extent: The size of the collection.
Date.Issued: Date of formal issuance (e.g. publication) of the collection.
Subject: The topic of the content of the collection.
Subject Notation: The topic of the content of the collection.
Relation: A reference to a related resource.
Country: The country in which the collection is physically located.
Acronym: The acronym of the collection.
Resource Language: Language(s) of the described resources.
DDC mapping URL: URL of local DDC mapping information in Renardus format.
Z39.50 Location: The online location of the Z39.50 server of the subject gateway.
Logo URL: The URL of the logo (image) of the subject gateway.
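The collection descriptions and the DDC mapping tables work together when cross-browsing: a user selects a DDC class, and the broker needs to know which gateways hold material mapped to it. The following Python sketch models a small subset of the CLD elements (title, acronym, country, resource language) plus an in-memory local-to-DDC mapping standing in for the file referenced by DDC mapping URL. The field names, the example gateways and the prefix-matching strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionDescription:
    """Illustrative subset of the Renardus CLD elements."""
    title: str
    acronym: str
    country: str  # ISO 3166-1 two-letter code
    resource_languages: list = field(default_factory=list)  # ISO 639-2 codes
    # Local class -> DDC notation (stands in for the file behind "DDC mapping URL").
    ddc_mapping: dict = field(default_factory=dict)


def browse(collections, ddc_prefix, language=None):
    """Return (acronym, local class) pairs whose DDC notation starts with ddc_prefix."""
    hits = []
    for cld in collections:
        if language is not None and language not in cld.resource_languages:
            continue  # gateway does not describe resources in the requested language
        for local_class, ddc in sorted(cld.ddc_mapping.items()):
            if ddc.startswith(ddc_prefix):
                hits.append((cld.acronym, local_class))
    return hits


# Two hypothetical gateways with their local-to-DDC mappings.
gateways = [
    CollectionDescription("Gateway A", "GWA", "FI", ["eng", "fin"],
                          {"Forestry": "634.9", "Agriculture": "630"}),
    CollectionDescription("Gateway B", "GWB", "DE", ["ger"], {"Mathematik": "510"}),
]
print(browse(gateways, "63"))  # [('GWA', 'Agriculture'), ('GWA', 'Forestry')]
```

A string-prefix match is a simplification: DDC hierarchies are not always expressed purely by notation (for example, centred entries that span a range of numbers), so a production mapping would need explicit hierarchy information.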

Some recommendations for upgrading the partners' metadata are provided. If the element Keyword is not yet part of a partner's data model, it is recommended to add this element first for the normalisation process. In future, the title will be required in its original version; other forms of the title can be given in the Title.Alternative field. It is still undecided whether it will be required in future to provide an English version of the title, either in the Title field or in the Title.Alternative field. Since every element should ideally be supported by all partners, it is further recommended that all partners support the Country element; it seems easier to extract the country code from the domain of a URL than to supply a language code. In conclusion, if partners have to upgrade their metadata, it is strongly recommended to add keywords first, then country, followed by type and language. All three data models will be updated; during the next months they will lead to a final version of the Renardus Application Profile, which will be described in the public report D6.5, to be delivered in June 2001.
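The remark that a country code is easier to derive from a URL domain can be illustrated with a simple heuristic: take the top-level domain of the resource URL and treat a two-letter TLD as an ISO 3166-1 code. This Python sketch is an illustration under stated assumptions, not a Renardus procedure. Generic domains (.com, .org, ...) yield no country, `.uk` is mapped to `GB` because that country-code TLD differs from the ISO 3166-1 code, and a TLD shows where a domain is registered, which need not be the publisher's country, so the results would still need checking.

```python
from typing import Optional
from urllib.parse import urlsplit

# Country-code TLDs whose spelling differs from the ISO 3166-1 alpha-2 code.
TLD_TO_ISO3166 = {"uk": "GB"}


def country_from_url(url: str) -> Optional[str]:
    """Guess an ISO 3166-1 alpha-2 code from the top-level domain of a URL."""
    host = urlsplit(url).hostname  # lower-cased, without port; None if absent
    if not host or "." not in host:
        return None
    tld = host.rsplit(".", 1)[1]
    if len(tld) != 2 or not tld.isalpha():
        return None  # generic TLD (.com, .org, ...) or an IP address
    return TLD_TO_ISO3166.get(tld, tld.upper())


for url in ["http://www.kb.nl/", "http://www.sub.uni-goettingen.de/",
            "http://www.example.ac.uk/", "http://www.example.org/"]:
    print(url, country_from_url(url))
```

For the four URLs above the sketch yields NL, DE, GB and None respectively.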

SCOPE STATEMENT

This report is the second internal deliverable (besides two public deliverables, D6.1 and D6.2) to be issued by WP6 (Data model and data flow) of the Renardus project. The objective of WP6 is to develop the data model that will underpin the Renardus system. The aim of the gateway questionnaire survey was to analyse the gateway structures and formats of the Renardus partners. This analysis should lead to a generic service profile capable of recording all types of information about a gateway service. The inventory of the participating services is necessary for specifying the functional requirements of the data model (D6.3) and for building the data model (D6.4/D6.5). This report also provides important input for WP1 (functional model) and WP2 (design and implementation).


PART III - DELIVERABLE CONTENT

INTRODUCTION

This report provides background information about the development of the Renardus data model and data flow. It draws on the partners' answers to the second questionnaire. These answers led to a data model for the Renardus prototype pilot system and a first version of the data model for the operational pilot system. The Appendix contains the data provided by the partners, the dynamically generated metadata mapping, and overviews of keywords and classification systems (dynamically generated access databases). The data model and data flow will be extended in the light of discussions in the Dublin Core community (e.g. at the 8th Dublin Core Workshop), for example those related to agents. Corrections and additions will be incorporated throughout the project, so that the data model and data flow are always up to date. The report is divided into two main chapters: the first chapter provides an overview of the results of the second questionnaire; the second chapter introduces the data models for the Renardus prototype pilot system, the operational pilot system and the administrative database (collection description), and presents a first overview of the data flow.

GLOSSARY

AHRB Arts and Humanities Research Board.

ALUH Viikki Science Library, University of Helsinki, Finland.

BIOME The RDN hub for the medicine, health and the life sciences.

BNF Bibliothèque Nationale de France (National Library of France).

CLD Collection Level Description.

DAINet Deutsches Agrarinformationsnetz, Germany.

DC Dublin Core.

DCMES Dublin Core Metadata Element Set.

DCMI Dublin Core Metadata Initiative.


DDB Die Deutsche Bibliothek (National Library of Germany).

DDC Dewey Decimal Classification system.

DEF Danmarks Elektroniske Forskningsbibliotek. Denmark's Electronic Research Library - a virtual library for researchers, students, lecturers and other users of Danish research institutions, Denmark.

DNER Distributed National Electronic Resource - the JISC's concept of a managed environment for accessing heterogeneous, quality-assured information resources on the Internet.

DTV Technical Knowledge Centre and Library of Denmark.

Dublin Core An initiative - sometimes known as the Dublin Core Metadata Initiative (DCMI) - to develop a core metadata element set to facilitate the discovery of digital (networked) resources. Developments in the element set are defined on the basis of international consensus.

DutchESS Dutch Electronic Subject Service, The Netherlands.

EELS Engineering Electronic Library, Sweden.

EEVL Edinburgh Engineering Virtual Library - one of the eLib-funded Internet information gateways.

eLib The Electronic Libraries Programme - a series of UK higher education-based networking projects, funded by the JISC.

ESRC Economic and Social Research Council.

EULER European Libraries and Electronic Resources in Mathematical Sciences - a project funded by the European Union.


FVL The Finnish Virtual Library - Virtuaalikirjasto, Finland.

HUB Hubs provide data for RDN. Hubs may be individual organisations or (more frequently) consortia of prominent library, academic, research and professional organisations.

HUMBUL The RDN hub for the arts and humanities.

ISO International Organization for Standardization.

JISC Joint Information Systems Committee - a strategic advisory committee working on behalf of the funding bodies for higher and further education in England, Scotland, Wales and Northern Ireland. Its mission is to promote the innovative application and use of information systems and information technology in higher and further education across the UK.

JyU Finnish Virtual Library Project, Jyväskylä University Library, Finland.

KB Koninklijke Bibliotheek, National Library of the Netherlands.

LCSH Library of Congress Subject Headings.

MSC Mathematics Subject Classification.

NetLab NetLab, Lund University, Sweden.

NOVAGate Nordic Gateway to Information in Forestry, Veterinary and Agricultural Sciences, Finland.

OMNI Organising Medical Networked Information - one of the eLib-funded Internet information gateways. Now part of the BIOME RDN Hub.

PSIgate RDN hub for physical sciences. The service is still under development.

RDN The Resource Discovery Network - a co-operative network dedicated to providing access to high-quality Internet resources for the learning, teaching and research community in the UK. The RDN is coordinated by a team based at UKOLN and King's College London.

ROADS Resource Organisation and Discovery in Subject-oriented services - originally a UK project funded by JISC under eLib, ROADS is an open-source software toolkit for Internet subject gateways.


RSLP Research Support Libraries Programme.

SG Subject Gateway in the sense of a quality-controlled subject gateway, sometimes also called SBIG (Subject Based Information Gateway).

SOSIG Social Science Information Gateway - one of the eLib-funded Internet information gateways, now an RDN hub.

SSG-FI SonderSammelGebiets-FachInformationsführer (Special Subject Gateways), SUB Göttingen, Germany.

SUB Niedersächsische Staats- und Universitätsbibliothek Göttingen (Lower Saxony State and University Library Göttingen), Germany.

UKOLN UK Office for Library and Information Networking, University of Bath, UK.

URN Uniform Resource Name.

ZADI Zentralstelle für Agrardokumentation und -information, Germany.

Z39.50 An ANSI/NISO protocol for search and retrieval. Version 3 of the protocol has also been accepted as an ISO standard - ISO 23950.

Z39.85 Draft Standard Z39.85-200X: The Dublin Core Metadata Element Set.


1 RESULTS

This chapter is divided into three parts: the first part gives a short overview of the agreements made at the technical meeting in Bath (also recorded in the minutes); the second part summarises the partners' answers to the second questionnaire, which asked about further details of the data model and data flow such as rules, codes and standards; and the third part provides a short overview of RDN and the individual hubs. The numbers in brackets after the subheadings refer to the corresponding questions in the questionnaire. The partners' comments on each section of questions can be found in Appendix C.

1.1 Agreement on eight elements

After finishing the "Evaluation report of partner subject gateways" (see public version D6.1), partners agreed on eight elements at a technical meeting in Bath on 10 May 2000, without further discussion of rules, codes, standards and qualifiers. They also agreed that partner subject gateways will have to support most of these elements (e.g. if a subject gateway supports only seven of the elements, this would be no reason to exclude it), although this needed more detailed discussion. They further agreed that the data model is based on Dublin Core. The eight elements are:

- DC.Title - Title.Alternative is probably repeatable
- DC.Creator - repeatable
- DC.Description - repeatable in case descriptions in several languages are provided
- DC.Identifier: URI - possibly repeatable for mirror sites, but this needs further discussion
- DC.Subject - repeatable, and with the need for a common classification system (either "home-grown" or mapped to a general system)
- DC.Language - repeatable (a common code such as ISO 639 is needed)
- DC.Type - repeatable: partners will either map their types to Dublin Core types, use DC types with Renardus-specific extensions, or develop a "home-grown" list of the most common types
- Country Code - a clear definition is needed, e.g. the country of the publisher or the country in which the server is located; a common code such as ISO 3166 is also needed

Several recommendations were formulated for two further elements, to be considered after the prototype pilot system has been developed:

- DC.Publisher: possibly include in the future? Will probably not be included in the pilot system
- DC.Rights: possibly include this element in the future, e.g. to give information about copyright and access/restriction conditions (could also be necessary if print materials etc. are included)
- Rights in the sense of IPR: probably included, so that the SGs keep the copyright of their metadata records after the records have been gathered by the broker service


These results were presented by SUB at two conferences: at the first SCHEMAS workshop on 12 May and at the CULTURAL HERITAGE – CONCERTATION EVENT on 30 June. In order to specify common rules, codes, standards and qualifiers for these elements that can be supported by all Renardus partners, SUB developed a more detailed questionnaire, in which partners were asked to evaluate several proposals for qualifying the metadata elements.

1.2 Results of the second questionnaire developed for D6.4

The main purpose of this questionnaire is to gather information about the qualifiers, rules, standards and codes of the elements supported by the Renardus prototype and operational pilot systems. As the Bath meeting led only to a basic agreement on eight elements, this questionnaire was intended to provide deeper insight into how to use them. The results led to the development of the data model. The questionnaire was sent out on 3 July and partners were asked to return it to SUB by 14 July; because of holidays, the last responses arrived at SUB on 24 August. Two partners (DTV and NetLab) filled in the questionnaire together. Because of the ongoing discussion at RDN about a centralised structure (RDNC), it was not possible to obtain common (and official) information from UKOLN, RDN or the individual hubs. SUB and UKOLN are trying to obtain detailed information from all RDN hubs on the basis of the two questionnaires (D6.1 and D6.4); the results will be presented in an updated version of D6.4. Some partners did not fill in the questionnaire completely; where no evaluation was given (e.g. only 'no'), the answers have not been incorporated into the analysis (see also Appendix C) and are not considered in this report.

The following Renardus partners filled in the questionnaire:

Name | Acronym | URL
National Library of the Netherlands | KB | http://www.kb.nl/
National Library of France | BNF | http://www.bnf.fr/
National Library of Germany | DDB | http://www.ddb.de/
Finnish Virtual Library Project | JyU | http://www.jyu.fi/library/english/index.htm
NetLab, Lund University, Sweden | NetLab | http://www.lub.lu.se/netlab/
together with Technical Knowledge Centre and Library of Denmark | DTV | http://www.dtv.dk/
Niedersächsische Staats- und Universitätsbibliothek, Göttingen, Germany | SUB | http://www.sub.uni-goettingen.de/
Viikki Science Library, University of Helsinki, Finland | ALUH | http://helix.helsinki.fi/infokeskus/lib/
Zentralstelle für Agrardokumentation und -information, Germany | ZADI | http://www.dainet.de/zadi/

The answers of UKOLN and SOSIG are not considered here. As mentioned above, SUB and UKOLN will prepare a common view of these issues and present the results in an updated version of D6.4; a short overview is given in chapter 1.3. For the questionnaire and the answers provided by each partner, see Appendices A and B. Partners could answer the questions by giving an evaluation on the following scale: required (1), strongly recommended (2), recommended (3), desirable (4), not necessary (5), definitely not (6).
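To show how ratings on this scale can be condensed into per-question results, the following Python sketch tallies the answers to one question and lists the partners whose rating falls into the three highest priorities (required, strongly recommended, recommended), the subset used in the Executive Summary. Missing evaluations are skipped. The partner names and ratings are invented for illustration and do not reflect actual responses.

```python
from collections import Counter

# The six-point evaluation scale used in the questionnaire.
SCALE = {
    1: "required",
    2: "strongly recommended",
    3: "recommended",
    4: "desirable",
    5: "not necessary",
    6: "definitely not",
}


def summarise(ratings):
    """Tally ratings for one question; ratings maps partner -> 1..6 or None."""
    valid = {partner: r for partner, r in ratings.items() if r in SCALE}
    counts = Counter(SCALE[r] for r in valid.values())
    high_priority = sorted(p for p, r in valid.items() if r <= 3)
    return counts, high_priority


# Invented answers to a single question (None = no evaluation given).
ratings = {"Partner A": 1, "Partner B": 2, "Partner C": 4, "Partner D": None, "Partner E": 1}
counts, high_priority = summarise(ratings)
print(counts["required"])  # 2
print(high_priority)       # ['Partner A', 'Partner B', 'Partner E']
```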

Most questions also asked whether partner subject gateways support the rule, code etc. in question now or will do so in future. This information will help to find common Renardus metadata element refinements and encoding schemes. For each question, the number of SGs that support the practice described now or in future is given in brackets after each result.

1.2.1 Eight Elements for Cross-Searching

In this chapter the results of the questionnaire are used to give detailed information about rules, codes etc. for the eight elements (title, creator, description, subject, identifier, language, country, type).

1.2.1.1 General (0)

Partner subject gateways have to support most of the eight agreed elements. To find out which elements are required for (future) subject gateways and must therefore be supported, partners were asked to mark these elements. The results are summarised in figure 1:


[Chart: "Elements that have to be supported by each SBIG". Number of partners (max. 8) supporting each Renardus element: Identifier, Description, Title, Classification, Keywords, Creator, Country, Type, Language.]

Figure 1: Evaluation about requirements of Renardus metadata elements.

The following metadata elements must be supported by (future) partners: title, description, subject: keywords, subject: classification system, and identifier. If a subject gateway provides no keywords, it could be allowed to generate them automatically from the description field. Keywords generated in this way must meet the Renardus quality standards, e.g. with regard to stop words and quality control of the automated program. The following metadata elements are strongly recommended: creator, language, country and type. Partners should bear in mind that most of these elements must be supported. However, if a subject gateway cannot provide one of the eight Renardus elements, this is no reason to exclude it from the broker system; each case has to be negotiated with the Renardus team.

1.2.1.2 Title (1)

1.2.1.2.1 Title/Title.Alternative (1.1 – 1.6)

Partners handle the title field in different ways (see public report D6.1): some partners provide the original title and the translated title in the main title field (e.g. DutchESS), while others use the Title.Alternative field for translated titles or acronyms (e.g. RDN). Another open issue is the language of the title with regard to cross-searching this field.

Required:

- Title and Title.Alternative should be cross-searchable (supported by all partners)

Strongly recommended:

- The main title should not be repeatable (supported by seven partners)

- Title should be provided in the language of the resource, and additional titles (translated title, acronym, etc.) should be provided in repeatable Title.Alternative fields (supported by six partners)

Strongly recommended/recommended:

- The Title.Alternative field should be repeatable (supported by five partners)

Not necessary:

- The main title should be provided in English (for cross-searching), and additional titles (translated title, acronym, etc.) should be provided in repeatable Title.Alternative fields (supported by one partner)

- If no English title is provided on the server side, should Renardus provide an English version of the title (produced by an automatic translation program)?

1.2.1.3 Creator (2)

The Creator, Contributor and Publisher elements (collectively called Agent elements) are currently being discussed within the DC community. At the moment the proposed agent qualifiers are Type, Name, Affiliation, Role and Identifier (see DC Working Draft, 10 December 1999; http://www.mailbase.ac.uk/lists/dc-agents/files/wdagent-qual.html). SUB will follow the Agents discussion, and any changes will be incorporated in later deliverables.

1.2.1.3.1 Creator: general (2.1)

It is strongly recommended that Creator be a repeatable field (supported by all partners).

1.2.1.3.2 Creator: rules (2.2 – 2.9)

Results of the questionnaire with regard to creator rules:

Recommended/desirable:

- The syntax of creator should be last name, first name in one field, separated by a special character (supported by four partners)

- Renardus should reuse existing authority files (PND – Germany, LoC authority file, other)

Not necessary:

- Cataloguing rules like AACR2 (supported by two partners)

- The syntax of creator should be last name, first name in separate fields (supported by three partners)

- Renardus should provide authority files or develop a home-grown authority file
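The single-field syntax favoured by four partners is easy to produce and to split again, provided the separator never occurs inside a name part. A minimal Python sketch, assuming a comma as the separator (the questionnaire leaves the character open); `CreatorName`, `format_creator` and `parse_creator` are hypothetical names:

```python
from typing import NamedTuple, Optional

SEPARATOR = ","  # assumed; the questionnaire only says "a special character"


class CreatorName(NamedTuple):
    last: str
    first: Optional[str] = None


def format_creator(name: CreatorName) -> str:
    """Write a creator as 'last name, first name' in a single field."""
    if not name.first:
        return name.last
    return f"{name.last}{SEPARATOR} {name.first}"


def parse_creator(value: str) -> CreatorName:
    """Split a single-field creator value at the first separator.

    Values without a separator (e.g. corporate bodies such as "UKOLN")
    are kept whole as the last-name part.
    """
    last, sep, first = value.partition(SEPARATOR)
    if not sep:
        return CreatorName(value.strip())
    return CreatorName(last.strip(), first.strip() or None)
```

Splitting only at the first separator keeps multi-part first names such as "Hans Jürgen" intact; corporate names that themselves contain the separator would need a different character or separate fields.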

1.2.1.3.3 Creator: additional information (2.10 – 2.16)

Results of the questionnaire with regard to additional information in the creator field:

Desirable:

- Additional information should be provided in extra Renardus database fields (supported by one partner)

- The email address of the creator, the URL of the creator (e.g. homepage) and organisational information about the creator should be provided (each part is supported by three partners)

Not necessary:

- Additional information should be provided in one Renardus database field, separated by special characters (supported by one partner)

- Address information of the creator should be provided in the form of a vCard (supported by no partner)

1.2.1.4 Description (3)

1.2.1.4.1 Description: general (3.1)

It is strongly recommended that the description field be repeatable in case the description is provided in more than one language. Some subject gateways provide the description in their native language as well as in English (e.g. NOVAGate, ZADI, FVL) (supported by four partners).

1.2.1.4.2 Description: description + keywords (3.2 – 3.5)

This part of the questionnaire was important for cross-search issues. Partners were asked how strongly they felt that subject gateways must provide descriptions and/or keywords in English.

Recommended:

- Each SG should provide either an English version of the description or an English version of the keywords for every resource (in addition to other languages) (supported by seven partners)

Desirable:

- Each SG should provide an English version of the keywords for every resource (in addition to other languages) (supported by five partners)

- Each SG should provide an English version of the description for every resource (in addition to other languages) (supported by five partners)

Not necessary:

- Each SG should provide both an English version of the description and an English version of the keywords for every resource (in addition to other languages) (supported by four partners)
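Checking these rules is straightforward once each repeatable Description and Subject value carries a language tag. A Python sketch, assuming tags use the three-letter ISO 639 code and hypothetical class and function names; `meets_either_rule` tests the "either/or" variant that seven partners supported:

```python
from dataclasses import dataclass, field

ENGLISH = "eng"  # assumed language tag: three-letter ISO 639 code


@dataclass
class LangValue:
    text: str
    lang: str


@dataclass
class Record:
    # Repeatable fields: one entry per language version.
    descriptions: list[LangValue] = field(default_factory=list)
    keywords: list[LangValue] = field(default_factory=list)


def english_coverage(record: Record) -> tuple[bool, bool]:
    """Return (has English description, has English keyword)."""
    has_desc = any(v.lang == ENGLISH for v in record.descriptions)
    has_kw = any(v.lang == ENGLISH for v in record.keywords)
    return has_desc, has_kw


def meets_either_rule(record: Record) -> bool:
    """English description OR English keywords (the seven-partner variant)."""
    has_desc, has_kw = english_coverage(record)
    return has_desc or has_kw
```

The stricter variants rated desirable or not necessary correspond to checking only one of the two flags, or both together.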

1.2.1.4.3 Description: multilinguality (3.6)

If a SG provides no English description, the Renardus system need not automatically translate the main words of the description into English, although three of eight partners consider this desirable for the future.

1.2.1.5 Subject (4)

This chapter summarizes the results of questions related to keywords as well as to classification systems.

1.2.1.5.1 Subject: keywords – general (4.1 – 4.2)

It is recommended that keywords be browsable (on condition that each SG has its own keyword index). Only one partner rated this as not necessary; the other seven partners rated it between strongly recommended and desirable (supported by seven partners).


It is more or less strongly recommended that this field be repeatable in case keywords (controlled lists, thesaurus-based, free keywords) are provided in several languages (supported by five partners).

1.2.1.5.2 Subject: form of keywords (4.3 – 4.7)

Strongly recommended/recommended:

- All forms of keywords (free, controlled, thesaurus-based) should be provided

- The form of keywords should be indicated to the user, e.g. in case he/she only wants to search for thesaurus-based keywords in his/her scientific area (supported by six partners)

- A repeatable field for each form of keywords in one language (several thesauri, controlled lists, free keywords) (supported by four partners)

Not necessary/definitely not:

- Only controlled (home-grown list and/or thesaurus-based) keywords should be provided (no free keywords)

- Only thesaurus-based keywords should be provided (no free keywords, no controlled lists)

1.2.1.5.3 Subject: keywords – multilinguality (4.8)

An automatic translation of keywords into English, where a SG provides no English keywords, was rated desirable by four partners, not necessary by one and definitely not by two. In general, this will not be necessary in Renardus.

1.2.1.5.4 Subject: keywords – rules (4.9)

Partners were asked whether they use rules for keywords, e.g. for geographic names and proper names. Most partners use thesaurus rules (DTV/NetLab, SUB, BnF, FVL, DDB). ZADI also uses special thesauri for subjects, objects and geographical regions.

1.2.1.5.5 Subject: classification – general (4.10 – 4.15)

Required/strongly recommended:

- The SGs should be browsable via a common classification system

- Renardus should use an existing common classification system

Strongly recommended/recommended:

- The common classification system should be DDC (all partners map their system to DDC) (supported by six partners)

Recommended/desirable:

- Renardus should construct a common classification system

Not necessary:

- The common classification system should be a home-grown one (a construction of all the partners' classification systems)

- The common classification system should be a general classification system other than DDC (all partners map their system to this general system)

1.2.1.5.6 Subject: classification system – cross-search with regard to a special subject classification (4.16 – 4.20)

Recommended:

- The verbal description of each notation should be indexed together with the keywords, so that users can search both at once

Recommended/desirable:

- Besides the common classification system, Renardus should provide subject classification systems like MSC and Ei: cross-search via notation

- Besides the common classification system, Renardus should provide subject classification systems like MSC and Ei: cross-search via the verbal description of the notation

Desirable:

- Besides the common classification system, Renardus should provide all other SG-specific classification systems (local, national): cross-search via the verbal description of the notation

- Besides the common classification system, Renardus should provide all other SG-specific classification systems (local, national): cross-search via notation
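The first recommendation, indexing the verbal description (caption) of each notation together with the keywords, could work roughly as follows. This Python sketch assumes a caption table per classification system; the notations, captions and function names are invented for illustration. A search for a caption word then finds records that were classified but never given that word as a keyword.

```python
import re
from collections import defaultdict

# Hypothetical caption table of one SG-specific classification system.
CAPTIONS = {
    "AG.10": "Soil science",
    "AG.20": "Plant protection",
}


def words(text: str) -> list[str]:
    """Lower-cased word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(records: dict[str, dict]) -> dict[str, set[str]]:
    """Map each word of a record's keywords and notation captions to record ids."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec_id, rec in records.items():
        terms = list(rec.get("keywords", []))
        terms += [CAPTIONS[n] for n in rec.get("notations", []) if n in CAPTIONS]
        for term in terms:
            for w in words(term):
                index[w].add(rec_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Ids of records containing every query word."""
    ws = words(query)
    if not ws:
        return set()
    hits = set(index.get(ws[0], set()))
    for w in ws[1:]:
        hits &= index.get(w, set())
    return hits
```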

1.2.1.5.7 Subject: classification systems – multilinguality (4.21)

Partners strongly recommend that the common classification system be provided in several European languages.

1.2.1.6 Identifier (5)

At several Renardus meetings there were intense discussions about how to handle the Identifier field, e.g. when several URLs are provided for one resource. Some partners provide more than one URL if, for example, the resource has several titles in different languages. Other partners stated that each record should have only one unique URL, in line with the one-to-one principle. To reach a common view, partners were asked several questions on this topic. There are also open questions about how to handle mirrored or copied sites.

1.2.1.6.1 Identifier: general – regarding resources in several languages (5.1 – 5.2)

Recommended:

- Repeatable if the resource is provided in more than one language with different URLs (supported by five partners)

- If repeatable, this field should also be searchable by the Renardus system (supported by six partners)

1.2.1.6.2 Identifier: general – regarding mirrored/copied resources (5.3 – 5.5)

Strongly recommended/recommended:

- If this field is repeatable, it should also be searchable by the Renardus system (supported by five partners)

Desirable:

- Repeatable in the field Identifier with a special Renardus scheme

Not necessary:

- Repeatable in DC.Relation (e.g. with a special Renardus scheme) (supported by two partners)

1.2.1.6.3 Identifier: Qualifier (5.6 – 5.9)

Recommended:

- Renardus should integrate URLs, ISBNs and ISSNs in Identifier fields with different qualifiers (supported by six partners)

- Renardus should integrate URIs, PURLs and URNs (supported by five or six partners, depending on the identifier type)
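Where a gateway delivers identifiers without qualifiers, a broker could assign them heuristically before storing them in repeatable Identifier fields. The following Python sketch is an assumption, not a Renardus rule: the qualifier labels ("URL", "PURL", "URN", "ISSN", "ISBN", "URI") and the pattern tests are illustrative, and the ISBN/ISSN checks look only at the shape of the string, not at check digits.

```python
import re
from urllib.parse import urlparse

ISSN_RE = re.compile(r"^\d{4}-?\d{3}[\dX]$", re.IGNORECASE)
ISBN_RE = re.compile(r"^(?:\d[- ]?){9}[\dX]$", re.IGNORECASE)  # 10-digit form
PURL_HOSTS = {"purl.org", "purl.oclc.org"}


def guess_qualifier(value: str) -> str:
    """Guess a qualifier for one identifier string (heuristic)."""
    v = value.strip()
    if v.lower().startswith("urn:"):
        return "URN"
    parsed = urlparse(v)
    if parsed.scheme in ("http", "https", "ftp"):
        host = (parsed.hostname or "").lower()
        return "PURL" if host in PURL_HOSTS else "URL"
    if ISSN_RE.match(v):
        return "ISSN"
    if ISBN_RE.match(v):
        return "ISBN"
    return "URI" if parsed.scheme else "unknown"


def qualify_all(values: list[str]) -> list[tuple[str, str]]:
    """Turn a repeatable Identifier field into (qualifier, value) pairs."""
    return [(guess_qualifier(v), v.strip()) for v in values]
```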

1.2.1.7 Language (6)

1.2.1.7.1 Language: general (6.1)

It is strongly recommended that the language field be repeatable, as separate fields, in case several languages are provided (supported by six partners).

1.2.1.7.2 Language: code (6.2 – 6.4)

It is strongly recommended that Renardus support the three-letter ISO 639 code (ISO 639-2; supported by six partners) rather than the two-letter ISO 639 code (supported by four partners), which was rated not necessary. There is no need to use other codes.

1.2.1.8 Country (7)

Although Country is not a Dublin Core element, partners decided to support it. One open question was how to define this field: the country code could reflect either the country of the publisher or the country in which the server is located. In the latter case, Renardus users could select or sort hits by European country. Another possibility would be to reduce the number of hits returned by a search by filtering on country, e.g. to select the nearest copy when a resource is duplicated.

1.2.1.8.1 Country: general (7.1 – 7.3)

Strongly recommended:

- The country code should reflect the publisher country (supported by seven partners)

Not necessary:

- The country code should reflect the server country (supported by two partners)

- Renardus should support both publisher and server country (e.g. the Country element with one Renardus scheme for the publisher and another for the server) (supported by two partners)

1.2.1.8.2 Country: code (7.4 – 7.5)

Partners more or less strongly recommend that the country code be the two-letter ISO 3166 code (supported by six partners). There is no need to use another code.
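Because most country-code top-level domains coincide with ISO 3166 two-letter codes, a gateway without a Country element could derive a candidate value from the resource URL. A domain only shows where a name is registered, which approximates but does not guarantee the publisher country preferred above, so derived values are best treated as defaults to be checked. The following Python sketch is illustrative; its exception table lists only the well-known case of .uk (ISO 3166 code GB) and is not complete.

```python
from typing import Optional
from urllib.parse import urlparse

# ccTLDs that differ from the ISO 3166 two-letter code (illustrative, not complete).
TLD_TO_ISO3166 = {"uk": "GB"}


def country_from_url(url: str) -> Optional[str]:
    """Derive an ISO 3166 two-letter code from a URL's top-level domain.

    Returns None for generic domains (.com, .org, .edu, ...) and IP addresses,
    which carry no country information.
    """
    host = urlparse(url).hostname  # already lower-cased, port stripped
    if not host or "." not in host:
        return None
    tld = host.rsplit(".", 1)[1]
    if len(tld) != 2 or not tld.isalpha():
        return None
    return TLD_TO_ISO3166.get(tld, tld.upper())
```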

1.2.1.9 Type (8)

Not all partners support this element, and those that do use different "controlled lists", some of them based on Dublin Core. To reach a common view and handling of this field, partners were asked several questions.

1.2.1.9.1 Type: general (8.1 – 8.5)

Recommended:

- Renardus should develop a common list of types (controlled list)

- The common list of types should be based on the Dublin Core type list (supported by five partners)

Not necessary:

- The common list of types should be a home-grown one (a mixture of the types used in all partners' SGs)

- The common list of types should be based on a type list other than Dublin Core (e.g. the type list in MARC21 or, in Germany, that of the Working Group "Codes", etc.)
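Whichever common list is chosen, each gateway's local types have to be mapped to it, for instance via mapping tables of the kind the report foresees for the Collection Level Description database. A Python sketch of such a table, assuming a common list drawn from the Dublin Core types as five partners recommended; the gateway names and local type labels are hypothetical. Reporting unmapped values makes gaps in the table visible before records reach the broker.

```python
from typing import Optional

# Hypothetical mapping table: (gateway, local type) -> common Renardus type.
TYPE_MAP = {
    ("SG-A", "Zeitschrift"): "Text",
    ("SG-A", "Datenbank"): "Dataset",
    ("SG-B", "e-journal"): "Text",
    ("SG-B", "program"): "Software",
}


def common_type(gateway: str, local_type: str) -> Optional[str]:
    """Look up the common type for one local value (None if unmapped)."""
    return TYPE_MAP.get((gateway, local_type.strip()))


def map_types(gateway: str, local_types: list[str]) -> tuple[list[str], list[str]]:
    """Map a batch of local types; return (common types, unmapped local types)."""
    mapped, unmapped = [], []
    for t in local_types:
        c = common_type(gateway, t)
        if c is None:
            unmapped.append(t)
        else:
            mapped.append(c)
    return mapped, unmapped
```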

Five partners want to specify the common type list by "qualifiers/subcategories" like document.theses.habilitation etc.; three partners don't want this.

1.2.2 Future Elements

With regard to future elements, some partners expressed a fairly strong wish at the technical meeting in Bath (11 May) to support further elements after the prototype test installation of Renardus.

1.2.2.1 Rights (9.1 – 9.7)

Recommended/desirable:

- Renardus should support the Rights element in the future (supported by four partners)

- Renardus should support the Rights element in the sense of IPRs (SGs keep the copyright of their metadata records after the records have been gathered by the broker service) (supported by four partners)

- Rights should contain information about access conditions/restrictions of the resource (e.g. technical/software requirements, subscription information) (supported by four partners)

- Rights should contain copyright/IPR information of the resource (supported by three partners)

- Rights should be a repeatable element for different kinds of information (access conditions/restrictions, subscription information, copyright, IPR, etc.)

- Rights should contain information about access conditions/restrictions negotiated by the SG (or by the library or institution maintaining the SG) (supported by three partners)

- Renardus should use the element Rights with different qualifiers for different kinds of information

1.2.2.2 Publisher (10)

It is strongly recommended that a Publisher element be supported in future (five partners rated this as required, and six partners support this field).

1.2.3 Additional Elements

Some partners want to support a DC.Relation element (SUB, BnF) or a DC.Format element (SUB, DDB: there might even be format preferences for the display of different MIME types) in the future. One partner stated "definitely not" (FVL: the system would become too complicated), and one partner referred to the Bath decision (DTV/NetLab). One partner (ZADI) mentioned a general interest in supporting additional elements in the future. This issue should be discussed anew after the prototype installation of the Renardus broker.

1.2.4 Administrative Elements

For its separate administrative database, Renardus needs some further administrative metadata elements.

1.2.4.1 Subject Gateway ID (IV A)

It is strongly recommended that Renardus support an element such as a Subject Gateway ID, containing the name and URL of the SG, so that users can restrict their searches to particular gateways.

1.2.4.2 Unique Record Number (IV B)

It is recommended that Renardus support an element such as a Unique Record Number as an unambiguous Renardus identifier.

1.2.4.3 Record Creator (IV C)

It is not necessary for Renardus to record information about the record creator (last name, first name, email, organisation, etc.).

1.2.4.4 SBIG ID (IV D)

It is more or less recommended that Renardus support an SBIG ID (= record source) with the syntax <name of information provider>/<name of Subject Gateway>:<internal ID of the record in the SG database>. This SBIG ID makes it possible to update a record in the Renardus database from the SG database.

1.2.4.5 Record Last Checked Date (IV E)

It is recommended or desirable that Renardus support something like a "Record Last Checked Date" element, giving the date on which the metadata record was last verified or updated.

1.2.4.6 Other (IV F)

FVL stated that the participating gateways may need an administrative field that determines whether a record is suitable for Renardus purposes. DDB suggested considering separate sets representing the subject gateways, with elements describing their particular subject competences (for instance expressed as DDC notations), which would enable the system to route user queries. Other elements might include system administrators, etc.

1.3 Subject Gateways in the UK

One of the gateway initiatives associated with the Renardus project is the UK's Resource Discovery Network (RDN). The RDN is a service funded by the Joint Information Systems Committee (JISC) of the UK higher education funding councils with support from the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Board (AHRB). The RDN builds upon the experiences of the subject gateway activity carried out under the JISC's Electronic Libraries (eLib) Programme.


The RDN provides resource discovery services through a network of Internet information gateways that are clustered together in subject-based 'hubs' (see chapter 1.3.2). These are co-ordinated by a team based in the JISC's DNER Office at King's College London and at UKOLN. The hubs are essentially independent service providers who provide one or more Internet resource catalogues or gateways that can be accessed at a variety of levels. In addition, hubs have also developed, and linked to, a wide range of other information and related services (Dempsey, 2000, p. 19). Furthermore, in the context of the JISC's concept of a Distributed National Electronic Resource (DNER), the RDN hubs are being encouraged to provide additional service layers, brokering access to heterogeneous services through protocols like Z39.50. These services are referred to as DNER Portals. Dempsey (2000, p. 19) has said, in this context, that "the 'subject gateway' or resource catalogue is one component in a network of communicating services which may be assembled to meet particular business and user needs." In the RDN context, the contents of gateways can be accessed at the following levels:

- Individual gateways or Internet resource catalogues. Where hubs comprise more than one gateway, each will have its own Web interface. For example, the BIOME hub, which covers subjects in the health and life sciences, is made up of five distinct gateways, each with its own interface that allows searching and browsing within that particular gateway.

- Hubs. Each RDN 'hub' will have an interface that allows all of its component Internet resource catalogues to be searched (and possibly browsed) together. For more information on RDN hubs, see chapter 1.3.2.

- The RDN. The RDN is responsible for providing an interface to all of the services developed by hubs, including services that will use the ResourceFinder to cross-search all of the Internet resource catalogues developed by RDN hubs.

The RDN hubs are independent service providers. They can (and do) use a wide variety of different software types and metadata formats. In order to support the central services offered by the RDN, it is strongly recommended that hubs be able to provide a minimum set of metadata that, as currently defined, is a subset of the Dublin Core elements. The six elements (Title, Subject, Description, Type, Identifier and Language) are defined (with brief content rules) in the RDN Cataloguing Guidelines (Day and Cliff, 2000). In this distributed scenario, it is unlikely that all RDN hubs would share a single common view of the Renardus data model. As new hubs (and Internet resource catalogues) become part of the RDN, there could be even more diversity.

1.3.1 RDN

Michael Day from the RDN support team at UKOLN filled in the D6.4 questionnaire. He pointed out (in an email of 14 August) that the answers/comments on the questionnaire were mainly his own, but were in part based on the RDN Cataloguing Guidelines and other internal discussions. "Because the RDN is a federation of a number of gateways it is difficult to say whether RDN 'supports' anything specific in the questionnaire, now or in the future. It is likely that parts of RDN will support some things, while the RDN as a whole may not. For example, ROADS gateways can record v-card-type information about creators or administrators, but the RDN ResourceFinder will not be able to search this. On the other hand, both the RDN and gateways will be certainly interested in things like developing a common classification system for cross-browsing. Many of the replies are fairly neutral ('desirable' or 'not necessary') because they are issues that have not been widely considered in an RDN context, e.g. the repeatability of some fields, descriptions in multiple languages, etc. Also, RDN allows gateways to do much their own thing and they do. Some (e.g. SOSIG) are based on ROADS, others (EEVL, the new OMNI) are not. Some use ROADS templates, others use something more DC-like. The RDN mandatory elements (Title, Subject, Description, Type, Identifier (URI), Language) are based on a subset of DC." The RDN Cataloguing Guidelines define content rules for all fifteen DCMES elements. Definitions were taken from the Reference Description of DCMES version 1.1. Schemes are used in four of the six 'minimum set' elements.


- Title. No particular scheme is defined in the guidelines, although AACR2 practice as regards capitalisation and punctuation is recommended.

- Subject. The guidelines do not mandate the use of any particular subject scheme, but if a scheme is used, a shortened version of the scheme should be added as a value qualifier.

- Description. No particular scheme is defined in the guidelines.

- Type. The guidelines suggest that resource type should be taken from either the draft list of Dublin Core Types (Dublin Core Type Working Group, 1999) or the list of types defined by the RDN (Cliff, 2000).

- Identifier. If no value qualifier is present, the identifier must be a URI.

- Language. This should be a language code based either on the three-letter codes defined in ISO 639-2:1998 or on the two-letter codes recommended by RFC 1766. If required, RDN may need to provide some conversion tools to map between the two schemes.
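Such a conversion tool amounts to a lookup table in each direction. The Python sketch below covers only a handful of languages for illustration; a real table would be built from the full ISO 639-2 list. Where ISO 639-2 defines both a bibliographic (B) and a terminology (T) code, the sketch assumes the B code as output (as in the MARC codes used by BIOME) and accepts both as input. For RFC 1766 tags such as "en-GB", only the primary subtag is used.

```python
from typing import Optional

# Excerpt: two-letter ISO 639 code -> ISO 639-2 codes (B code first, then T code).
TWO_TO_THREE = {
    "da": ("dan",),
    "de": ("ger", "deu"),
    "en": ("eng",),
    "fi": ("fin",),
    "fr": ("fre", "fra"),
    "nl": ("dut", "nld"),
    "sv": ("swe",),
}
# Reverse table accepting both B and T codes.
THREE_TO_TWO = {three: two for two, threes in TWO_TO_THREE.items() for three in threes}


def to_three_letter(tag: str) -> Optional[str]:
    """Map an RFC 1766 tag ('en-GB', 'de') or a three-letter code to the B code."""
    primary = tag.strip().split("-", 1)[0].lower()
    two = primary if primary in TWO_TO_THREE else THREE_TO_TWO.get(primary)
    return TWO_TO_THREE[two][0] if two else None


def to_two_letter(code: str) -> Optional[str]:
    """Map an ISO 639-2 code (B or T) to the two-letter code, if known."""
    return THREE_TO_TWO.get(code.strip().lower())
```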

All RDN Internet resource catalogues should be able to provide records broadly in accordance with these general guidelines. They would be able, therefore, to support most of the eight elements defined in the Renardus data model. http://www.rdn.ac.uk/

1.3.2 Individual RDN hubs

The RDN does not specify the software and metadata formats used by each of the hubs. Most hubs use their own metadata formats, although these tend to have some kind of relationship with ROADS/IAFA templates or the DCMES. The following sections attempt to explain the metadata formats in use within each of the RDN's current hubs, to note their relationship with the 'minimum set' recommended by the RDN itself, and to note content standards in use where these have been published.

1.3.2.1 BIOME

The BIOME health and life sciences hub is currently made up of five separate gateways that cover health and medicine (OMNI), animal health (VetGate), biological and biomedical science (BioResearch), the natural world (Natural Selection) and agriculture, food and forestry (AgriFor). A new gateway for nursing, midwifery and allied health professions (NMAHP) will soon be added. BIOME provides its own cataloguing rules based on the RTNG resource description template structure (Gray, 2000). These include versions of all six of the RDN's 'minimum set' of elements ('Title', 'Add subject descriptor', 'Add keywords', 'Description', 'Category', 'Main URI' and 'Main Language'), but also an element ('UK based') that will indicate whether the resource being described is based in the UK.

- The type element ('Category') uses a scheme defined by BIOME.

- The language element ('Main Language') is left blank if English is the main language. Other languages are entered according to the MARC three-letter language code (based on ISO 639-2:1998).

- For the subject classification element ('Add subject descriptor'), the National Library of Medicine and Library of Congress classification schemes are used in OMNI, NMAHP, VetGate and BioResearch, and the Dewey Decimal Classification (DDC) scheme in AgriFor and Natural Selection. Controlled vocabulary schemes ('Add keywords') in use within BIOME include Medical Subject Headings (MeSH) for OMNI and BioResearch, MeSH and the RCN (Royal College of Nursing) thesaurus for NMAHP, the CAB thesaurus for AgriFor and VetGate, and Library of Congress Subject Headings (LCSH) for Natural Selection.

http://www.biome.ac.uk/

1.3.2.2 EEVL

EEVL (the Edinburgh Engineering Virtual Library) is currently the RDN service that covers engineering. EEVL uses its own metadata format of 22 attributes that includes five of the RDN 'minimum set' of elements ('Title', 'Classification', 'Description', 'Resource type' and 'URL'); i.e., all elements except 'Language' (MacLeod, Kerr and Guyon, 1998, pp. 209-210). The subject classification scheme adopted by EEVL is an in-house scheme that is loosely based on the Ei Classification Scheme developed by Engineering Information Inc. EEVL is part of a hub that will expand to cover the mathematical sciences (MathGate) and computing (Computing). The MathGate and Computing gateways are still under development. http://www.eevl.ac.uk/

1.3.2.3 Humbul

The Humbul service covers the arts and humanities. The gateway has developed its own software and uses an element set based on the Dublin Core. The service publishes draft cataloguing guidelines, Describing and cataloguing resources in Humbul, which are broadly based on the RDN guidelines and AACR2 (Humbul, 2000). Versions of all the RDN 'minimum set' elements are 'required' elements, as are several other elements, including 'Author' and 'Publisher'. The main subject scheme in use is the Library of Congress Subject Headings (LCSH). Types are defined using the draft list of Dublin Core Types, the RDN-defined list of types and an additional set of types defined by Humbul itself. The 'Language' element uses the three-letter code defined in ISO 639-2:1998. http://www.humbul.ac.uk/

1.3.2.4 PSIgate

The PSIgate hub will cover the physical sciences. The service is still under development. http://www.psigate.ac.uk/

1.3.2.5 SOSIG

The SOSIG service covers the social sciences, business and law. The gateway uses the ROADS software, and resources are described using ROADS/IAFA templates. These include equivalents of all RDN 'minimum set' elements ('Title', 'Subject-Descriptor'/'Subject-Descriptor-Scheme', 'Description', 'Category', 'URI' and 'Language'). The browse structure is based on the Universal Decimal Classification (UDC). A thesaurus searching option is also available which uses a thesaurus derived from HASSET (the Humanities And Social Sciences Electronic Thesaurus). http://www.sosig.ac.uk/

2 DATA MODEL AND DATA FLOW

Very early in the discussion of a Renardus data model it was clear that the data model should be based on Dublin Core as far as possible. Only one Renardus element, Country, is neither a DC element nor based on one. All other elements and qualifiers (element refinements and value encoding schemes) are based on Dublin Core where possible; where no Dublin Core encoding scheme or refinement can be used, a Renardus qualifier is defined. Developing a Renardus namespace with a defined Renardus Metadata Element Set (RMES) is also part of this workpackage. The final Renardus application profile will be ready in June 2001 (public deliverable D6.5).


The Renardus broker will consist of the content databases (decentralised; Z39.50), holding the eight agreed elements plus two administrative elements, and the Collection Level Description database. In this report, the content database is initially based on the data model for the prototype Renardus pilot system (see 2.1) and later, after the test installation of the prototype, on the preliminary version of the data model for the operational Renardus pilot system (see 2.2). The content database will contain the metadata records extracted from the individual service providers' databases in accordance with the Renardus data model. The Collection Level Description database will contain the collection description of each subject gateway and the mapping tables (e.g. for DDC, and probably also for Language, Type and Country codes) (see 2.3).
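Keeping the content database in step with the service providers' databases requires a stable key per harvested record; the SBIG ID proposed in section 1.2.4.4 (name of information provider/name of Subject Gateway:internal ID) could serve this purpose. The Python sketch below parses that syntax and uses it for insert-or-replace updates. The in-memory class merely stands in for a real content database, and the ID "SUB/SSG-FI:1234" is a made-up example.

```python
from typing import NamedTuple, Optional


class SbigId(NamedTuple):
    provider: str
    gateway: str
    internal_id: str

    def __str__(self) -> str:
        return f"{self.provider}/{self.gateway}:{self.internal_id}"


def parse_sbig_id(value: str) -> SbigId:
    """Parse 'provider/gateway:internal ID' (split at the first ':', then the first '/')."""
    source, colon, internal_id = value.partition(":")
    provider, slash, gateway = source.partition("/")
    if not (colon and slash and provider and gateway and internal_id):
        raise ValueError(f"not an SBIG ID: {value!r}")
    return SbigId(provider, gateway, internal_id)


class ContentDatabase:
    """In-memory stand-in for a Renardus content database (illustration only)."""

    def __init__(self) -> None:
        self._records: dict[SbigId, dict] = {}

    def upsert(self, sbig_id: str, record: dict) -> bool:
        """Insert or replace the record harvested under sbig_id; True if new."""
        key = parse_sbig_id(sbig_id)
        is_new = key not in self._records
        self._records[key] = dict(record)
        return is_new

    def get(self, sbig_id: str) -> Optional[dict]:
        return self._records.get(parse_sbig_id(sbig_id))
```

Re-harvesting a changed record under the same SBIG ID replaces the stored copy instead of creating a duplicate.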

Cross-search, cross-browse and filter issues: The main basic index will allow searching across the elements Title, Description and Subject. It is therefore necessary, first, that the subject gateways provide free text in the description field (and not, e.g., a URL) and, second, that they deliver some kind of subject information. It is still an open question whether DDC captions will also be included in the basic index. The cross-browsing structure will be realised by mapping each partner's classification system to the Dewey Decimal Classification (DDC). The DDC element is mandatory. The elements Country, Language and Type make some filtering possible; together with Creator, these elements could also be displayed in the result list.
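Cross-browsing via DDC, combined with the Country/Language/Type filters, can be pictured as follows. In this Python sketch the table (gateway, local notation) → DDC notation stands in for the mapping tables to be held in the Collection Level Description database; the gateways, local notations and records are hypothetical, and browsing matches a DDC class exactly rather than following the DDC hierarchy.

```python
from typing import Optional

# Hypothetical mapping table: (gateway, local notation) -> DDC notation.
DDC_MAP = {
    ("SG-A", "AG.10"): "631.4",
    ("SG-B", "Soils"): "631.4",
    ("SG-B", "Crops"): "633",
}


def to_ddc(gateway: str, notation: str) -> Optional[str]:
    return DDC_MAP.get((gateway, notation))


def browse(records: list[dict], ddc: str, **filters: str) -> list[str]:
    """Ids of records whose local notations map to `ddc`.

    Keyword filters compare element values exactly, e.g. country="DE" or
    language="ger"; unmapped notations are simply ignored.
    """
    hits = []
    for rec in records:
        mapped = {to_ddc(rec["gateway"], n) for n in rec.get("notations", [])}
        if ddc in mapped and all(rec.get(k) == v for k, v in filters.items()):
            hits.append(rec["id"])
    return hits
```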

Upgrade priority for partners’ metadata information: Where keywords are not yet an element in a partner’s data model, providing keywords is the first recommendation for the normalization process. In future, the title must be provided in its original form; other forms of the title can be given in the Title.Alternative field. It is still undecided whether an English version of the title will be required in future, either in the Title field or in the Title.Alternative field. Since an element should be supported by all partners, it is further recommended that all partners support the Country element; extracting the country code from the domain of a URL seems easier than supplying a language code. In conclusion, partners who upgrade their metadata information are strongly recommended to add keywords first, then Country, followed by Type and Language.

2.1

Data model for the prototype Renardus pilot system

The data model is mainly based on two Dublin Core documents: 

[DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description, http://purl.oclc.org/dc/documents/rec-dces-19990702.htm



[DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers, http://purl.org/dc/documents/rec/dcmesqualifiers-20000711.htm

Format of entries:

Name: name of the metadata field

Qualified DC name: qualified Dublin Core name

Namespace: DCMES version 1.1, DCMES Qualifiers (2000-07-11) or Renardus Metadata Element Set (RMES) version 0.1

Refinement(s): element refinements used in Renardus. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope.

DC Encoding Scheme(s): these qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader.

R Encoding Scheme(s): Renardus encoding scheme (see above)

Form of Obligation: in the Renardus data model the obligation can be mandatory (M), strongly recommended (R) or optional (O). Mandatory obligation ensures that certain elements are always supported: an element with a mandatory obligation must have a value. Strongly recommended and optional elements should be filled with a value if the information is appropriate to the given resource or provided by a Subject Gateway; otherwise they can be left blank.

Repeatable: whether the metadata field is repeatable (yes or no)

LQ "LANG": Language Qualifier "LANG", giving the language of the content of a metadata field (ISO 639 two-letter code): yes, no, or possible

DC Definition: Dublin Core definition of the metadata field

DC Comment: Dublin Core comments on the metadata field

R Definition: Renardus definition of the metadata field

R Comment: Renardus comments on the metadata field

2.1.1

Dublin Core Elements

2.1.1.1

DC.Title and DC.Title.Alternative

Name

Title

Qualified DC name

DC.Title

Namespace

DCMES version 1.1

Refinement(s)

Alternative


DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

M

Repeatable

no

LQ "LANG"

possible

DC Definition

A name given to the resource

DC Comment

Typically, a title will be a name by which the resource is formally known

R Definition

-

R Comment

-

Name

Title ¦ Alternative

Qualified DC name

DC.Title.Alternative

Namespace

DCMES Qualifiers (2000-07-11)

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

O

Repeatable

yes

LQ "LANG"

possible

DC Definition

Any form of the title used as a substitute or alternative to the formal title of the resource

DC Comment

This qualifier can include Title abbreviations as well as translations

R Definition

-

R Comment

-

2.1.1.2

DC.Creator

Name

Creator

Qualified DC name

DC.Creator


Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

-

R Encoding Scheme(s)

For personal names: last name and first name in separate tags

Obligation

R

Repeatable

yes

LQ "LANG"

no

DC Definition

An entity primarily responsible for making the content of the resource.

DC Comment

Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.

R Definition

Creator(s) are the person(s) responsible for the intellectual content of the document(s); webmasters, for example, are not creators.

R Comment

Where applicable, providing the creator is strongly recommended. For the Renardus normalization process, it is strongly recommended that last name and first name be clearly distinguishable.
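The requirement that last name and first name be kept in separate tags can be sketched in Python as follows. The PersonalName type and the helper split_inverted_name are hypothetical; the helper only handles the inverted "Last, First" form and deliberately leaves other forms whole for manual review rather than guessing.

```python
from typing import NamedTuple, Optional


class PersonalName(NamedTuple):
    """Creator as a person, with last and first name in separate fields."""
    last: str
    first: Optional[str] = None


def split_inverted_name(value: str) -> PersonalName:
    """Split a name given in inverted 'Last, First' form.

    Without a comma the whole string is kept as the last name: splitting
    'First Last' reliably is not possible in general (compound surnames,
    particles such as 'van' or 'von'), so such values need manual review.
    """
    if "," in value:
        last, first = value.split(",", 1)
        return PersonalName(last.strip(), first.strip() or None)
    return PersonalName(value.strip())
```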

2.1.1.3

DC.Description

Name

Description

Qualified DC name

DC.Description

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

M

Repeatable

yes

LQ "LANG"

possible

DC Definition

An account of the content of the resource.

DC Comment

Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.


R Definition

-

R Comment

For the Renardus normalization process it is not enough to provide only a URL: for cross-searching, the Description field must contain free text.
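A normalization step could flag records whose Description is nothing but a URL. This Python sketch assumes that Description values arrive as plain strings; the regular expression and the function name are illustrative, not part of the Renardus specification.

```python
import re

# A value consisting of nothing but a single URL (http, https or ftp).
URL_ONLY = re.compile(r"^\s*(?:https?|ftp)://\S+\s*$", re.IGNORECASE)


def has_free_text_description(descriptions):
    """True if at least one Description value is free text, not a bare URL."""
    return any(d.strip() and not URL_ONLY.match(d) for d in descriptions)
```

A record that fails this check would need manual attention before it can take part in cross-searching.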

2.1.1.4

DC.Subject: classification system(s) and keywords

Name

Subject

Qualified DC name

DC.Subject

Namespace

DCMES Qualifiers (2000-07-11) and RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

LCSH, MeSH, DDC, LCC, UDC

R Encoding Scheme(s)

all other encoding schemes used by the partners

Obligation

M

Repeatable

yes

LQ "LANG"

possible

DC Definition

The topic of the content of the resource.

DC Comment

Typically, a subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

R Definition

-

R Comment

This field holds all subject information used by partners, such as controlled keywords, free keywords, classification system(s) and/or captions. In the prototype system no further distinction will be made between the different kinds of subject. Providing keywords is strongly recommended in the prototype system and required in the final system.

Name

Subject ¦ DDC

Qualified DC name

DC.Subject

Namespace

DCMES Qualifiers (2000-07-11) and RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

DDC


R Encoding Scheme(s)

Ren-DDC for normalization: DDC 21, which can be extended by Renardus-specific captions

Obligation

M

Repeatable

yes

LQ "LANG"

no

DC Definition

Dewey Decimal Classification, see also: http://www.oclc.org/dewey/index.htm

DC Comment

-

R Definition

DDC 21, adapted for cross-browsing purposes.

R Comment

This field is created in the Renardus normalization process via mapping tables from the particular Subject Gateway’s classification scheme. Each partner has to map its own classification system to DDC. Mapping guidelines for DDC will be prepared in the context of WP 7. Only captions, not notations, will be displayed.
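The mapping-table approach can be sketched as a simple lookup. The local notations (GEO-01 and so on) and the table itself are hypothetical; the real tables will follow the WP 7 mapping guidelines. The DDC classes shown are standard DDC classes with their captions, and only the captions are passed on for display.

```python
# Hypothetical excerpt of one partner's mapping table:
# local class notation -> (DDC notation, DDC caption)
LOCAL_TO_DDC = {
    "GEO-01": ("550", "Earth sciences"),
    "AGR-10": ("630", "Agriculture and related technologies"),
    "ECO-02": ("577", "Ecology"),
}


def ddc_subjects(local_notations, table=LOCAL_TO_DDC):
    """Derive the DDC Subject values for one record.

    Unmapped local notations are skipped and duplicates are dropped,
    keeping the order of first occurrence.
    """
    result = []
    for notation in local_notations:
        mapped = table.get(notation)
        if mapped is not None and mapped not in result:
            result.append(mapped)
    return result


def display_captions(ddc_values):
    """Only captions, not notations, are shown to the user."""
    return [caption for _notation, caption in ddc_values]
```

Keeping the notation alongside the caption internally still allows cross-browsing by DDC class while the user sees only captions.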

2.1.1.5

DC.Identifier

Name

Identifier

Qualified DC name

DC.Identifier

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

URI

R Encoding Scheme(s)

-

Obligation

M

Repeatable

yes, for translated sites and/or mirrored, copied sites

LQ "LANG"

no

DC Definition

An unambiguous reference to the resource within a given context.

DC Comment

Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

R Definition

-

R Comment

URI means URL, URN, DOI, ISBN, ISSN, etc. For the Renardus normalization process, DOI, ISBN and ISSN must be given in URN syntax. In the prototype system no distinction will be made between the resource URL, URL(s) of mirrored or copied resources, and URL(s) kept for archiving purposes.
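For ISBN and ISSN, URN namespaces are defined in RFC 3187 (urn:isbn:) and RFC 3044 (urn:issn:). The Python sketch below, with a hypothetical helper to_urn, only prefixes the identifier and strips a leading label; it does not validate check digits. DOIs are left out because this report does not specify which URN form is expected for them.

```python
import re


def to_urn(kind: str, value: str) -> str:
    """Express an ISBN or ISSN in URN syntax (urn:isbn:..., urn:issn:...).

    Only a leading 'ISBN'/'ISSN' label is removed; check digits are not
    validated. DOIs are deliberately not handled in this sketch.
    """
    kind = kind.lower()
    if kind not in ("isbn", "issn"):
        raise ValueError(f"no URN form defined in this sketch for {kind!r}")
    cleaned = re.sub(rf"^\s*{kind}[:\s]*", "", value, flags=re.IGNORECASE)
    return f"urn:{kind}:{cleaned.strip()}"
```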

2.1.1.6

DC.Language

Name

Language

Qualified DC name

DC.Language

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

ISO 639-2

R Encoding Scheme(s)

-

Obligation

R

Repeatable

yes

LQ "LANG"

-

DC Definition

A language of the intellectual content of the resource.

DC Comment

Recommended best practice for the values of the Language element is defined by RFC 1766 which includes a two-letter Language Code (taken from the ISO 639 standard), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard). For example, en for English, fr for French, or en-uk for English used in the United Kingdom

R Definition

-

R Comment

The language code is the ISO 639-2 three-letter code. SUB will provide a mapping between the two-letter and three-letter language codes; this mapping can also be found on the LoC site (ISO 639-2): http://lcweb.loc.gov/standards/iso639-2/englangn.html
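Until SUB’s mapping is available, the conversion can be sketched as a lookup table. ISO 639-2 defines separate bibliographic (B) and terminology (T) codes for some languages (e.g. ger/deu for German); which set Renardus expects is not stated here, so the sketch defaults to the bibliographic codes as an assumption. The table is only an excerpt, and the function name is hypothetical.

```python
# Excerpt only: ISO 639-1 code -> (ISO 639-2/B, ISO 639-2/T).
ISO639_1_TO_2 = {
    "da": ("dan", "dan"),
    "de": ("ger", "deu"),
    "en": ("eng", "eng"),
    "fi": ("fin", "fin"),
    "fr": ("fre", "fra"),
    "nl": ("dut", "nld"),
    "sv": ("swe", "swe"),
}


def to_iso639_2(code: str, terminology: bool = False) -> str:
    """Map a two-letter code (optionally with a subtag, e.g. 'en-uk')
    to a three-letter ISO 639-2 code; raises KeyError if unmapped."""
    primary = code.strip().lower().split("-")[0]
    bibliographic, term = ISO639_1_TO_2[primary]
    return term if terminology else bibliographic
```

Taking only the primary subtag lets values in the RFC 1766 form recommended by Dublin Core (such as en-uk) be converted as well.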

2.1.1.7

DC.Type

Name

Type ¦ DCMI Type (DCT1)

Qualified DC name

DC.Type

Namespace

DCMES Qualifiers (2000-07-11)

Refinement(s)

-


DC Encoding Scheme(s)

DCMI Type Vocabulary (DCT1)

R Encoding Scheme(s)

-

Obligation

R

Repeatable

yes

LQ "LANG"

no

DC Definition

The nature or genre of the content of the resource.

DC Comment

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.

R Definition

-

R Comment

SUB will provide a mapping of all types used in partners’ subject gateways to DCT1 (probably with the exception of ZADI). The possibility and usefulness of a mapping to DCT2 will be investigated in the context of WP 7.

Name

Type

Qualified DC name

DC.Type

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

-

R Encoding Scheme(s)

-

Obligation

R

Repeatable

yes

LQ "LANG"

no

DC Definition

The nature or genre of the content of the resource.

DC Comment

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.

R Definition

-

R Comment

Subject Gateways should provide their original types without encoding scheme.
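The two Type entries above suggest a simple normalization pattern: map each local type to a DCT1 term where possible, and pass the original values on unchanged as the unencoded DC.Type. In this Python sketch the local type labels and their mappings are hypothetical; Text, Dataset, Service, Image and Software are terms of the DCMI Type Vocabulary.

```python
from typing import List, Tuple

# Hypothetical excerpt of a partner's type mapping: local type -> DCT1 term.
LOCAL_TYPE_TO_DCT1 = {
    "journal": "Text",
    "e-journal": "Text",
    "database": "Dataset",
    "mailing list": "Service",
    "image archive": "Image",
    "software": "Software",
}


def map_types(local_types: List[str]) -> Tuple[List[str], List[str]]:
    """Return (DCT1-encoded values, original values without scheme)."""
    dct1: List[str] = []
    for local in local_types:
        term = LOCAL_TYPE_TO_DCT1.get(local.strip().lower())
        if term is not None and term not in dct1:
            dct1.append(term)
    return dct1, list(local_types)
```

Local types without a mapping still survive in the unencoded DC.Type values, so no information from the gateway is lost.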


2.1.2

Non-Dublin Core element

2.1.2.1

Country

Name

Country

Qualified DC name

-

Namespace

RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

-

R Encoding Scheme(s)

ISO 3166-1 (two letter code) http://www.din.de/gremien/nas/nabd/iso3166ma/

Obligation

R

Repeatable

no

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

Country in which the publisher of the resource is located or the country which represents the cultural context of the resource. Code for the representation of names of countries.

R Comment

-
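The upgrade recommendations in the introduction note that a country code is easier to derive from the domain of a URL than a language code. A minimal Python sketch of that heuristic follows. It assumes the resource URL is available from DC.Identifier; the function name is hypothetical. Some country-code domains differ from ISO 3166-1 (.uk versus GB), generic domains such as .org yield no country, and a domain only indicates where a site is registered, so the derived value is best treated as a default that partners can correct.

```python
from typing import Optional
from urllib.parse import urlparse

# Country-code top-level domains whose ISO 3166-1 alpha-2 code differs.
# Only the best-known case is listed in this sketch.
CCTLD_EXCEPTIONS = {"uk": "GB"}


def country_from_url(url: str) -> Optional[str]:
    """Guess an ISO 3166-1 two-letter code from the URL's top-level domain."""
    host = urlparse(url).hostname
    if not host or "." not in host:
        return None
    tld = host.rsplit(".", 1)[-1].lower()
    if len(tld) != 2 or not tld.isalpha():
        return None  # generic domains (.org, .com, ...) carry no country
    return CCTLD_EXCEPTIONS.get(tld, tld.upper())
```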

2.1.3

Administrative Renardus elements

Two administrative elements are used in Renardus for practical reasons: “Full Record URL” and “SBIG ID”.

2.1.3.1

Full Record URL

Name

Full Record URL

Qualified DC name

-

Namespace

RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

-


R Encoding Scheme(s)

URL

Obligation

R

Repeatable

no

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

A URL that leads to a detailed display of each record at the originating service site.

R Comment

Because some partners generate their records dynamically, it may be difficult for them to provide a URL to the full record display.

2.1.3.2

SBIG ID

Name

SBIG ID

Qualified DC name

-

Namespace

RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

-

R Encoding Scheme(s)

Acronym of Subject Gateway

Obligation

M

Repeatable

no

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

A stable unique acronym also well defined in the Collection Level Description.

R Comment

Must be the same acronym as used in the Renardus Collection Level Description schema field “Acronym”.
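The obligation and repeatability values of section 2.1 can be checked mechanically. The following Python sketch is a minimal validator under stated assumptions: records are dictionaries from element names to lists of values, already normalized (so the DDC field exists), and the keys DC.Subject.DDC, DC.Type.DCT1, FullRecordURL and SBIG_ID are hypothetical labels invented here to tell apart entries that share a qualified DC name or have none. Missing strongly recommended elements are only reported as warnings, since the data model allows them to be left blank.

```python
from typing import Dict, List, Tuple

# (obligation, repeatable) per element of the prototype data model (2.1).
# Keys such as "DC.Subject.DDC" and "SBIG_ID" are hypothetical labels used
# only to tell the entries apart in this sketch.
PROTOTYPE_MODEL: Dict[str, Tuple[str, bool]] = {
    "DC.Title": ("M", False),
    "DC.Title.Alternative": ("O", True),
    "DC.Creator": ("R", True),
    "DC.Description": ("M", True),
    "DC.Subject": ("M", True),
    "DC.Subject.DDC": ("M", True),
    "DC.Identifier": ("M", True),
    "DC.Language": ("R", True),
    "DC.Type.DCT1": ("R", True),
    "DC.Type": ("R", True),
    "Country": ("R", False),
    "FullRecordURL": ("R", False),
    "SBIG_ID": ("M", False),
}


def check_record(record: Dict[str, List[str]]) -> List[str]:
    """Report obligation and repeatability problems for one record."""
    problems = []
    for element, (obligation, repeatable) in PROTOTYPE_MODEL.items():
        values = [v for v in record.get(element, []) if v.strip()]
        if not values:
            if obligation == "M":
                problems.append(f"error: {element} is mandatory")
            elif obligation == "R":
                problems.append(f"warning: {element} is strongly recommended")
        elif len(values) > 1 and not repeatable:
            problems.append(f"error: {element} is not repeatable")
    for element in record:
        if element not in PROTOTYPE_MODEL:
            problems.append(f"warning: {element} is not part of the data model")
    return problems
```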


2.2

Preliminary version of data model for the operational Renardus pilot system

This data model reflects the current state of discussion. Some changes are likely, e.g. with regard to the obligation of an element or further qualifiers, as are additions, e.g. support for further elements such as Publisher, Rights, Format and Relation, and further comments. In contrast to the data model for the prototype system, this preliminary data model contains further qualifiers, language tags for more elements and some changes in the obligation of elements. The data model is mainly based on two Dublin Core documents:

[DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description, http://purl.oclc.org/dc/documents/rec-dces-19990702.htm



[DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers, http://purl.org/dc/documents/rec/dcmesqualifiers-20000711.htm

Format of entries:

Name: name of the metadata field

Qualified DC name: qualified Dublin Core name

Namespace: DCMES version 1.1, DCMES Qualifiers (2000-07-11) or Renardus Metadata Element Set (RMES) version 0.1

Refinement(s): element refinements used in Renardus. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope.

DC Encoding Scheme(s): these qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader.

R Encoding Scheme(s): Renardus encoding scheme (see above)

Form of Obligation: in the Renardus data model the obligation can be mandatory (M), strongly recommended (R) or optional (O). Mandatory obligation ensures that certain elements are always supported: an element with a mandatory obligation must have a value. Strongly recommended and optional elements should be filled with a value if the information is appropriate to the given resource or provided by a Subject Gateway; otherwise they can be left blank.

Repeatable: whether the metadata field is repeatable (yes or no)

LQ "LANG": Language Qualifier "LANG", giving the language of the content of a metadata field (ISO 639 two-letter code): yes or no

DC Definition: Dublin Core definition of the metadata field

DC Comment: Dublin Core comments on the metadata field

R Definition: Renardus definition of the metadata field

R Comment: Renardus comments on the metadata field

2.2.1

Dublin Core Elements

2.2.1.1

DC.Title and DC.Title.Alternative

Name

Title

Qualified DC name

DC.Title

Namespace

DCMES version 1.1

Refinement(s)

Alternative

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

M

Repeatable

no

LQ "LANG"

yes

DC Definition

A name given to the resource

DC Comment

Typically, a title will be a name by which the resource is formally known

R Definition

Title should be the original title; other forms of the title should be provided in the Title.Alternative field.

R Comment

It is strongly recommended to provide only one version of title in this field (and not also e.g. translated titles).

Name

Title ¦ Alternative

Qualified DC name

DC.Title.Alternative

Namespace

DCMES Qualifiers (2000-07-11)

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

O

Repeatable

yes


LQ "LANG"

yes

DC Definition

Any form of the title used as a substitute or alternative to the formal title of the resource

DC Comment

This qualifier can include Title abbreviations as well as translations

R Definition

-

R Comment

-

2.2.1.2

DC.Creator and DC.Creator.AdditionalInformation

Name

Creator

Qualified DC name

DC.Creator

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

For personal names: last name, first name in separate tags

Obligation

R

Repeatable

yes

LQ "LANG"

no

DC Definition

An entity primarily responsible for making the content of the resource.

DC Comment

Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.

R Definition

Creator(s) are the person(s) responsible for the intellectual content of the document(s); webmasters, for example, are not creators.

R Comment

Where applicable, providing the creator is strongly recommended. For the Renardus normalization process, it is strongly recommended that last name and first name be clearly distinguishable.

It is not yet clear whether the Renardus data model will support the refinement “Additional Information” of Creator. This also depends on the Dublin Core agent discussion and on how DC will support this kind of information in future.


Formally, an extra definition table sheet is needed for each kind of “Additional Information”, such as Email, URL and Organisational Information.

Name

Creator ¦ AdditionalInformation

Qualified DC name

(see Agent discussion: http://www.mailbase.ac.uk/lists/dc-agents/files/wd-agent-qual.html)

Namespace

RMES version 0.1

Refinement(s)

RMES version 0.1 (for Additional Information)

DC Encoding Scheme(s)

(see Agent discussion: http://www.mailbase.ac.uk/lists/dc-agents/files/wd-agent-qual.html)

R Encoding Scheme(s)

Email, URL, OrgInf

Obligation

O

Repeatable

yes

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

Additional information like Email, URL, Organisational Information with regard to creator.

R Comment

-

2.2.1.3

DC.Description

Name

Description

Qualified DC name

DC.Description

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

M

Repeatable

yes

LQ "LANG"

yes


DC Definition

An account of the content of the resource.

DC Comment

Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.

R Definition

-

R Comment

For the Renardus normalization process it is not enough to provide only a URL: for cross-searching, the Description field must contain free text. Strongly recommended: each Subject Gateway (SG) should provide either an English description or English keywords for every resource (in addition to other languages).
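The English-language recommendation can be checked per record once values carry their LANG qualifier. This Python sketch assumes each Description and keyword value is paired with its two-letter LANG code, or None when untagged; untagged values are not counted, which is a deliberately conservative choice of the sketch rather than a Renardus rule. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# A value together with its LANG qualifier (ISO 639 two-letter code) or None.
LangValue = Tuple[str, Optional[str]]


def has_english_description_or_keywords(descriptions: List[LangValue],
                                        keywords: List[LangValue]) -> bool:
    """True if some non-empty Description or keyword value is tagged 'en'."""
    return any((lang or "").lower() == "en" and value.strip()
               for value, lang in descriptions + keywords)
```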

2.2.1.4

DC.Subject: classification system(s) and keywords

Formally, an extra definition table sheet is needed for each partner’s classification system (captions and notations of thematic, subject, general or local classifications: FAO/AGRIS, Ei, NLM, BK, etc.), for each kind of keywords (thesaurus-based and/or controlled keywords, free keywords: AGROVOC Thesaurus, AGRIFOREST, Danish Agricultural Thesaurus, Ei Thesaurus, GEFO Thesaurus, HASSET Thesaurus, CAREDATA, IBSS Thesaurus, Thesaurus of Geoscience, GeoRef Thesaurus, etc.) and for each DC encoding scheme.

Name

Subject

Qualified DC name

DC.Subject

Namespace

DCMES Qualifiers (2000-07-11) and RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

LCSH, MeSH, DDC, LCC, UDC

R Encoding Scheme(s)

all other encoding schemes used by the partners

Obligation

M

Repeatable

yes

LQ "LANG"

yes

DC Definition

The topic of the content of the resource.

DC Comment

Typically, a subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

R Definition

-

R Comment

This field holds all subject information used by partners, such as controlled keywords, free keywords, classification system(s) and/or captions. In the preliminary data model for the operational Renardus pilot system, a distinction will be made between the different kinds of subject. Keywords are required in the final system.

Name

Subject ¦ DDC

Qualified DC name

DC.Subject

Namespace

DCMES Qualifiers (2000-07-11) and RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

DDC

R Encoding Scheme(s)

Ren-DDC for normalization: DDC 21, which can be extended by Renardus-specific captions

Obligation

M

Repeatable

yes

LQ "LANG"

no

DC Definition

Dewey Decimal Classification, see also: http://www.oclc.org/dewey/index.htm

DC Comment

-

R Definition

DDC 21, adapted for cross-browsing purposes.

R Comment

This field is created in the Renardus normalization process via mapping tables from the particular Subject Gateway’s classification scheme. Each partner has to map its own classification system to DDC. Mapping guidelines for DDC will be prepared in the context of WP 7. Only captions, not notations, will be displayed.

2.2.1.5

DC.Identifier

Name

Identifier

Qualified DC name

DC.Identifier

Namespace

DCMES Qualifiers (2000-07-11) and RMES version 0.1

Refinement(s)

Mirror, Archive

DC Encoding Scheme(s)

URI

R Encoding Scheme(s)

-

Obligation

M


Repeatable

yes, for translated sites

LQ "LANG"

no

DC Definition

An unambiguous reference to the resource within a given context.

DC Comment

Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

R Definition

-

R Comment

URI means URL, URN, DOI, ISBN, ISSN, etc. For the Renardus normalization process, DOI, ISBN and ISSN must be given in URN syntax. In the preliminary data model for the operational Renardus pilot system, a distinction will be made between the resource URL, URL(s) of mirrored or copied resources, and URL(s) kept for archiving purposes.

Name

Identifier ¦ Mirror

Qualified DC name

DC.Identifier

Namespace

RMES version 0.1

Refinement(s)

Mirror

DC Encoding Scheme(s)

URI

R Encoding Scheme(s)

-

Obligation

O

Repeatable

yes

LQ "LANG"

no

DC Definition

An unambiguous reference to the resource within a given context.

DC Comment

Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

R Definition

-

R Comment

URI means URL, URN, DOI, ISBN, ISSN, etc. For the Renardus normalization process, DOI, ISBN and ISSN must be given in URN syntax.

Name

Identifier ¦ Archive


Qualified DC name

DC.Identifier

Namespace

RMES version 0.1

Refinement(s)

Archive

DC Encoding Scheme(s)

URI (to be confirmed with DDB)

R Encoding Scheme(s)

-

Obligation

O

Repeatable

no

LQ "LANG"

no

DC Definition

An unambiguous reference to the resource within a given context.

DC Comment

Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

R Definition

-

R Comment

-

2.2.1.6

DC.Language

Name

Language

Qualified DC name

DC.Language

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

ISO 639-2

R Encoding Scheme(s)

-

Obligation

R

Repeatable

yes

LQ "LANG"

-

DC Definition

A language of the intellectual content of the resource.

DC Comment

Recommended best practice for the values of the Language element is defined by RFC 1766 which includes a two-letter Language Code (taken from the ISO 639 standard), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard). For example, en for English, fr for French, or en-uk for English used in the United Kingdom

R Definition

-

R Comment

The language code is the ISO 639-2 three-letter code. SUB will provide a mapping between the two-letter and three-letter language codes; this mapping can also be found on the LoC site (ISO 639-2): http://lcweb.loc.gov/standards/iso639-2/englangn.html

2.2.1.7

DC.Type

Name

Type ¦ DCMI Type (DCT1)

Qualified DC name

DC.Type

Namespace

DCMES Qualifiers (2000-07-11)

Refinement(s)

-

DC Encoding Scheme(s)

DCMI Type Vocabulary (DCT1)

R Encoding Scheme(s)

-

Obligation

R

Repeatable

yes

LQ "LANG"

no

DC Definition

The nature or genre of the content of the resource.

DC Comment

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.

R Definition

-

R Comment

SUB will provide a mapping of all types used in partners’ subject gateways to DCT1 (probably with the exception of ZADI).

Name

Type ¦ DCMI Type (DCT2)

Qualified DC name

DC.Type

Namespace

DCT2: Dublin Core Type Vocabulary: Subtypes (Working Draft), http://lcweb.loc.gov/marc/dc/subtypes-20000928.html

Refinement(s)

-

DC Encoding Scheme(s)

DCMI Type Vocabulary (DCT2), once it has been finalised

R Encoding Scheme(s)

-

Obligation

O

Repeatable

yes

LQ "LANG"

no

DC Definition

The nature or genre of the content of the resource.

DC Comment

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.

R Definition

A list of subtypes used to categorize the nature or genre of the content of the resource, a more specific list of resource types than available in the DCT1 Type Vocabulary.

R Comment

The possibility and usability of a mapping to DCT2 will be investigated in the context of WP 7.

Name

Type

Qualified DC name

DC.Type

Namespace

DCMES version 1.1

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

-

Obligation

R

Repeatable

yes

LQ "LANG"

no

DC Definition

The nature or genre of the content of the resource.

DC Comment

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of DCMI Types). To describe the physical or digital manifestation of the resource, use the Format element.

R Definition

-

R Comment

Subject Gateways should provide their original types without encoding scheme.


2.2.2

Non-Dublin Core element

2.2.2.1

Country

Name

Country

Qualified DC name

-

Namespace

RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

ISO 3166-1 (two letter code) http://www.din.de/gremien/nas/nabd/iso3166ma/

Obligation

R

Repeatable

no

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

Country in which the publisher of the resource is located or the country which represents the cultural context of the resource. Code for the representation of names of countries.

R Comment

-

2.2.3

Administrative Renardus elements

Two administrative elements are used in Renardus for practical reasons: “Full Record URL” and “SBIG ID”.

2.2.3.1

Full Record URL

Name

Full Record URL

Qualified DC name

-

Namespace

RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

none


R Encoding Scheme(s)

URL

Obligation

R

Repeatable

no

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

A URL that leads to a detailed display of each record at the originating service site.

R Comment

Because some partners generate their records dynamically, it may be difficult for them to provide a URL to the full record display.

2.2.3.2

SBIG ID

Name

SBIG ID

Qualified DC name

-

Namespace

RMES version 0.1

Refinement(s)

-

DC Encoding Scheme(s)

none

R Encoding Scheme(s)

Acronym of Subject Gateway

Obligation

M

Repeatable

no

LQ "LANG"

no

DC Definition

-

DC Comment

-

R Definition

A stable, unique acronym, also defined in the Collection Level Description.

R Comment

Must be the same acronym as used in the Renardus Collection Level Description schema field “Acronym”.
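The two administrative elements behave differently at export time: SBIG ID is mandatory (M) and must match the CLD “Acronym” field, whereas the Full Record URL (R) may be missing when a gateway builds its record displays dynamically. The sketch below illustrates that distinction, assuming a gateway with a stable URL pattern; the acronym set, the `url_template` parameter and the example host are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical set of acronyms taken from the "Acronym" field of the
# partners' Collection Level Descriptions (illustrative values only).
CLD_ACRONYMS = {"DutchESS", "FVL", "SOSIG"}


def admin_elements(sbig_id: str, record_id: str,
                   url_template: Optional[str] = None) -> Dict[str, str]:
    """Build the two administrative Renardus elements for one record.

    SBIG ID is mandatory and must equal a CLD acronym. Full Record URL is
    only added when the gateway has a stable URL pattern for its full
    record display; url_template is a hypothetical parameter such as
    "http://gateway.example/record?id={id}".
    """
    if sbig_id not in CLD_ACRONYMS:
        raise ValueError(f"SBIG ID {sbig_id!r} does not match any CLD acronym")
    elements = {"SBIG ID": sbig_id}
    if url_template is not None:
        elements["Full Record URL"] = url_template.format(id=record_id)
    return elements


print(admin_elements("SOSIG", "42", "http://gateway.example/record?id={id}"))
print(admin_elements("FVL", "7"))  # no stable URL pattern: SBIG ID only
```

Checking the SBIG ID against the CLD acronyms at export time catches mismatches before records reach the Renardus server.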

2.3

Data model of the administrative database: Collection Level Description (CLD)

In the administrative database the participating Subject Gateways and brokers will make available collection management descriptions and mapping tables for DDC. Each Renardus participant is responsible for maintaining and offering information about its collection on a local server, and for providing the mapping tables from its local classification system(s) to the agreed classification system, DDC. The collection description part of the Renardus data model for the administrative database is based on the RSLP Collection Description Schema. The collection description conforms to the RSLP schema, with some additional elements. A syntax and some content rules for the partners’ Collection Level Descriptions will be provided in due time. Three kinds of elements are used:

- Dublin Core (based) elements (e.g. dc:title)
- Collection Level Description elements based on the RSLP schema (e.g. cld:country)
- Renardus-specific Collection Level Description elements (e.g. ren-cld:language)

All elements except DC.Relation are mandatory. A guideline for DC.Description will be developed in the context of D6.5 (due on 30 June 2001), with the goal of a largely standardized form of description. The aims of the collection description are:

- to support the selection of subject gateway(s) for searching
- to provide background information about the participating subject gateways for human and machine users
- to promote/register the individual subject gateway(s) as high-quality resources on the Internet

Renardus Collection Level Description Attribute | RDF property | Definition

Dublin Core (based) elements:

Title | dc:title | The name of the collection.

Identifier | dc:identifier | An unambiguous reference to the collection within a given context (encoding scheme: URI).

Description | dc:description | An account of the content of the collection. Comment: In the context of D6.5, Renardus will provide a standardized structure for the content of the description, with information about the granularity of the collected resources, the type of subject indexing, etc.

Language | dc:language | The main language(s) of the metadata in the collection, with quantitative indication. Syntax: free text.

Publisher | dc:publisher | An entity responsible for making the collection available. Comment: The organization etc. responsible for the intellectual (not technical) distribution of the collection.

Format.Extent | dc:format dcq:extent | The size of the collection. Comment: It is recommended to provide the number of records as follows: about x records.

Date.Issued | dc:date dcq:issued | Date of formal issuance (e.g. publication) of the collection.

Subject | dc:subject | The topic of the content of the collection. Syntax: main DDC captions for the subjects represented in the Subject Gateway.

Subject Notation | dc:subject | The topic of the content of the collection. Syntax: main DDC notations and captions for the subjects represented in the Subject Gateway: DDC notation1 – DDC caption1; DDC notation2 – DDC caption2 etc. Comment: Element content not displayed in human-readable Collection Level Descriptions.

Relation | dc:relation dcq:hasPart dcq:isPartOf | A reference to a related resource. Syntax: for every related subject gateway, its acronym followed by a space must precede any other descriptive text. Comment: At the moment only used by RDN and its member Subject Gateways.

Collection Level Description elements based on the RSLP schema:

Country | cld:country | The country in which the collection is physically located. Syntax: free text.

Renardus-specific Collection Level Description elements:

Acronym | ren-cld:acronym | The acronym of the collection.

Resource Language | ren-cld:language | Language(s) of the described resources. Syntax: free text.

DDC mapping URL | ren-cld:ddcMapping | URL of local DDC mapping information in Renardus format. Comment: Element content not displayed in human-readable Collection Level Descriptions.

Z39.50 Location | ren-cld:Z3950Location | The online location of the Z39.50 server of the subject gateway. Syntax: machine name; port number; database name. Comment: Element content not displayed in human-readable Collection Level Descriptions.

Logo URL | ren-cld:logoURL | The URL of the logo (image) of the subject gateway. Comment: Element content not displayed in human-readable Collection Level Descriptions.
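Three of the CLD elements above carry a small machine-readable syntax (Z39.50 Location, Subject Notation and the acronym prefix of Relation). A minimal parsing sketch follows. It assumes the values are written exactly as the syntax rules state, with ";" between parts and an en dash between DDC notation and caption. All function and class names are hypothetical, and the sample values (host name, database name) are invented.

```python
from typing import List, NamedTuple, Tuple


class Z3950Location(NamedTuple):
    host: str
    port: int
    database: str


def parse_z3950_location(value: str) -> Z3950Location:
    """Parse ren-cld:Z3950Location: 'machine name; port number; database name'."""
    parts = [p.strip() for p in value.split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected three ';'-separated parts in {value!r}")
    host, port, database = parts
    return Z3950Location(host, int(port), database)


def parse_subject_notation(value: str) -> List[Tuple[str, str]]:
    """Parse 'DDC notation1 – DDC caption1; DDC notation2 – DDC caption2'.

    Splits each item on the first en dash only, so captions may contain
    further dashes.
    """
    pairs = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        notation, sep, caption = item.partition("\u2013")  # en dash
        if not sep:
            raise ValueError(f"no en dash separator in {item!r}")
        pairs.append((notation.strip(), caption.strip()))
    return pairs


def relation_acronym(value: str) -> str:
    """Return the acronym that must start a dc:relation value (text before the first space)."""
    return value.strip().split(" ", 1)[0]


print(parse_z3950_location("z3950.gateway.example; 210; default"))
print(parse_subject_notation("300 \u2013 Social sciences; 500 \u2013 Natural sciences and mathematics"))
print(relation_acronym("EEVL Engineering subject gateway"))
```

Partners who type a hyphen instead of an en dash would get a ValueError here. Raising an error rather than guessing keeps such records visible for correction.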

2.4

Data flow

The data flow does not depend solely on the chosen data model but also on other aspects. For example, organizational and business issues, as well as the gateway-to-server structures the participants choose, are important in this context. All these matters are being studied and developed in the current Renardus work: WP3 develops organizational structures for the management of the Renardus service and for collaboration between the participants, and WP8 investigates business issues that have an impact on Renardus (e.g. Intellectual Property Rights, copyright). Interoperability issues (WP7) will also influence the Renardus data flow. A first approach to the data flow can therefore only be a general one, based on the Renardus architectural model (see http://www.konbib.nl/coop/reynard/restricted/architecture2.ppt).

For Renardus a distributed system architecture has been chosen (see D2.2 and D2.3). Each participant or group of participants will be required to set up and maintain a Renardus server containing a Renardus content database and an administrative database. In order to make data from the participant gateways available and usable in Renardus, a normalization process is needed: data from all participants have to be harmonized. The open question is at which step the normalization/harmonization will be done. It also matters for the data flow whether a particular Renardus server holds the data of a single service or of a group of participating services. The structures underlying the participating services are heterogeneous. In some cases a single gateway is involved (e.g. DutchESS, DAINet). In others, distributed broker services are involved (RDN) with differently structured records (e.g. RDN’s SOSIG or EEVL), or several gateways with uniform structures are held by one institution (e.g. SSG-FI with its four subject guides).
In the case of a single service, the service extracts the relevant data from its database, normalizes them to conform to the agreed data model, and imports the data into its own Renardus server. Where a group of services chooses to maintain one joint Renardus server, each service has to extract and normalize its data appropriately before exporting them to the joint Renardus server. These conversion processes will most likely differ, since the record structures of the different services are not the same. The methods of exporting and importing may also differ between services. Normalization can take place either before a service exports its relevant records or after they have been imported into the Renardus server. Several steps are needed to get the metadata from a Subject Gateway into the Renardus broker. A suggested model for partners making their content available in a local single Renardus server is described in D2.2 and D2.3:

- extracting the appropriate records from the database
- record conversion/normalization
- writing the necessary configuration files
- running the Zebra indexer on the generated records/files and starting the Zebra server

Except for writing the configuration files, these steps have to be repeated each time the metadata content is refreshed.
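The repeatable part of this cycle (extraction followed by normalization) can be sketched as below. This is a minimal sketch under stated assumptions: the local field names, the `FIELD_MAP` table and the `fake_extract` stand-in for a database export are all hypothetical. Writing configuration files, running the Zebra indexer and starting the Zebra server are deliberately left out.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical mapping from one gateway's local field names to Renardus
# element names; each partner would need its own table.
FIELD_MAP = {"ttl": "Title", "abs": "Description", "kw": "Subject", "url": "Identifier"}


def normalise(record: Record, sbig_id: str) -> Record:
    """Conversion/normalization step: rename mapped fields, drop the rest,
    and add the mandatory SBIG ID."""
    out = {FIELD_MAP[name]: value for name, value in record.items() if name in FIELD_MAP}
    out["SBIG ID"] = sbig_id
    return out


def refresh(extract: Callable[[], Iterable[Record]], sbig_id: str) -> List[Record]:
    """Repeatable part of the cycle: extract records, then normalise them.

    Configuration files are written once and are not touched here; the
    caller would then re-run the indexer on the output.
    """
    return [normalise(record, sbig_id) for record in extract()]


def fake_extract() -> List[Record]:
    # Stand-in for a real database export (hypothetical data).
    return [{"ttl": "Forest ecology portal", "kw": "forestry", "internal_id": "17"}]


print(refresh(fake_extract, "FVL"))
```

Passing the extraction function in as a parameter mirrors the point that export methods differ between services while the normalization target stays the same.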


PART IV – REMAINDER

3 APPENDIX A: QUESTIONNAIRE

Renardus questionnaire D6.4: Data model and data flow (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/questionnaires/all.html)

4 APPENDIX B: RESPONSES

Questionnaire: Responses from the partners (http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/index.html)

ALUH: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/novagate.pdf
BNF: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/bnf.pdf
DDB: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/ddb.pdf
DTV and NetLab: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/dtv_netlab.pdf
JyU: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/fvl.pdf
KB: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/dutchess.pdf
SOSIG: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/sosig.pdf
SUB: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/sub.pdf
UKOLN: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/rdn.pdf
ZADI: http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/zadi.pdf

5 APPENDIX C: COMMENTS OF PARTNERS

General (0)

DutchESS: I think those elements are the bare minimum required to support Renardus functionality. The other ones are important and should preferably be supported, but not supporting them is no reason to exclude gateways. Gateways that don't support these elements cannot be included in searches based on advanced search functionality, but since research shows that c. 90% of searches are simple searches in all fields anyway, I don't think this matters much.

DTV/NetLab: Only one of the subject fields is needed. An SBIG should support at least 6-7 of the 8 elements.

BnF: We have to define the content of the creator field.

FVL: All those elements are important.

DDB: MIME type or document type?

Title/Title.Alternative (1.1 – 1.6)


DutchESS: DutchESS puts titles in various languages in the same title field, separated by "=". I suppose these various versions could be exported to different Renardus title fields by using this "=" separator. In that case we would be able to support some of the above options. Those titles could be exported to one title field and a number of alternative title fields, or to more than one title field. In that way we could support either repeatable or non-repeatable title and alternative title fields. Regarding the Title/Title.Alternative field: either have a non-repeatable title field and a repeatable Title.Alternative field, OR have a repeatable title field and no Title.Alternative field.

DTV/NetLab: 1.1: As we mentioned in a previous mail (Doyle 28/06), we are unclear as to what "repeatable" actually means in the context of the questions (in Renardus or locally in the SG) and how this ultimately affects functionality in the service. Since we are obliged to answer, our answers relate only to the Renardus service and not to the local ones. The main title is the original title of the resource; we don't want to see alternative (other) titles in Renardus, i.e. no repetition of the main title and no alternative title.

SUB: 1.2: It is desirable that all SGs support a Title.Alternative for the future Renardus system. 1.4: It is desirable for the future system that the main title is provided in English. 1.5: In general, this should be an issue for WP7. If it works, this is desirable. 1.6: This works only with a language tag for title and alternative title (also because of stop words, which differ between languages).

FVL: The main title should be provided in the language of the resource. The (repeatable) Title.Alternative element could contain the English title (manually translated, if needed) and the acronym. (Title.Alternative is not repeatable in the FVL at this moment.) Email 14.08.200: 1.2: This means that, e.g., a translated title and an acronym could also be provided in the same (non-repeatable) field. At this moment the FVL uses this practice.

NOVAGate: Title and Title.Alternative are cross-searchable if you don't limit the search to the title field only.
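The DutchESS proposal of exporting a combined title field by splitting it on "=" could look roughly like the following. It is a minimal sketch, assuming the first segment is the main title and the rest become alternative titles. The function name is hypothetical, and a title that itself contains "=" would be split wrongly.

```python
from typing import List, Tuple


def split_titles(field: str, separator: str = "=") -> Tuple[str, List[str]]:
    """Split a combined title field into (main title, alternative titles).

    The first non-empty segment becomes the main title; any further
    segments become alternative titles. A title that itself contains the
    separator would be split wrongly, so the output needs spot checks.
    """
    parts = [part.strip() for part in field.split(separator)]
    parts = [part for part in parts if part]
    if not parts:
        raise ValueError("empty title field")
    return parts[0], parts[1:]


main, alternatives = split_titles("Nederlandse titel = English title")
print(main)          # Nederlandse titel
print(alternatives)  # ['English title']
```

The same output supports both options discussed above: one title plus repeatable alternative titles, or all segments exported as repeated title fields.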

Creator: rules (2.2 – 2.9)

DTV/NetLab: Expensive.

SUB: For the interoperability of the Renardus system (an issue for WP7) it might be useful to implement authority files, especially if the amount of data increases, e.g. through extension with OPACs. We should also keep an eye on Dublin Core, where implementing vCard has been considered.

BnF: Question 2.5, syntax: This question is OK for personal names but does not cover corporate names. In our view, corporate bodies are more numerous than personal names. Question 2.7, authority file: Does it mean creating a link to an existing authority file or creating a specific authority file for Renardus? In our view, it should be a link to an existing authority file.

Creator: additional information (2.10 – 2.16)

SUB: 2.16: That depends on the Dublin Core agent discussion. In general, we have to keep in mind that it is not feasible to repeat the creator field if we use the HTML standard; with RDF this will be possible!

BnF: Additional information must be added in separate fields.

FVL: Any additional information (email address, organisational information) related to the creator should be provided in the same creator field as the last name and first name. This is the simplest solution (and may be suitable for every participating SG).

NOVAGate: All additional information has to be gathered on a voluntary basis.


Description: general (3.1)

SUB: It would be helpful to have a language tag for the repeatable description field in case several descriptions are provided in different languages.

Description: description + keywords (3.2 – 3.5)

DTV/NetLab: Some of the description and subject must be in English.

SUB: 3.2: For the future, this should be required because of the cross-search functionality.

BnF: Does it concern keywords extracted from the description for indexing purposes, or do we have the description in one field and keywords in another field? In our view, we should have only one field for Description and one field for Subject Keyword. 3.2: In order to facilitate the handling of other languages for search purposes, we will be able to provide English keywords (the LCSH equivalents) besides the RAMEAU subject headings.

Description: multilinguality (3.6)

SUB: This will be an issue for WP7.

ZADI: It would be good, but at this time it seems unrealistic.

Subject: keywords – general (4.1 – 4.2)

DTV/NetLab: For normalisation in Renardus every keyword has to be in an element entity of its own, which naturally says nothing about how the keywords are to be displayed.

Subject: form of keywords (4.3 – 4.7)

DTV/NetLab: Keywords must be separable by Renardus. This is done in the export function/normalization process and should take different languages into account.

BnF: Questions 4.3, 4.4, 4.5 and 4.6: These four questions confuse the nature of the subjects (free or controlled), their use (in one or more catalogues) and the level of structuring (a single unstructured list versus a thesaurus). In our view, the only significant differences must be: A. free keywords versus controlled keywords; B. specific thesaurus versus general (encyclopedic) thesaurus.

FVL: The form of keywords in different subject fields should be indicated to the user on the search page (advanced search form).

NOVAGate: There are two fields for English keywords: one for thesaurus-based keywords (Agrovoc) and the other for free keywords. All keywords in Nordic languages are in the same field.
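The DTV/NetLab requirement that every keyword sit in its own element implies a splitting step in the export/normalization process. The following is a minimal sketch of such a step, assuming a gateway separates keywords with ";" by default. The delimiter convention and the function name are hypothetical; each gateway's actual delimiter would need to be checked, and controlled headings that contain commas should not be split on ",".

```python
import re
from typing import List


def split_keywords(field: str, delimiters: str = ";") -> List[str]:
    """Split a keyword string so that each keyword can go into its own element.

    Trims whitespace, drops empty entries and case-insensitive duplicates,
    and keeps the original order.
    """
    pattern = "[" + re.escape(delimiters) + "]"
    seen = set()
    keywords = []
    for keyword in re.split(pattern, field):
        keyword = keyword.strip()
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords


print(split_keywords("Forestry; forest ecology; forestry;"))
```

Keeping the delimiter as a parameter reflects the NOVAGate and BnF comments: free keywords and controlled headings may need different handling per field.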

Subject: keywords – multilinguality (4.8)

SUB: This will be an issue for WP7.

ZADI: Desirable in future, but impossible now.

BnF: For free keywords, we could have an automatic translation. For controlled subjects, we cannot have automatic translation, but we can make a mapping or "linking" between the subjects in different languages, as we are doing in the MACS project (no evaluation) (http://www.bl.uk/information/finrap3.html).

Subject: classification – general (4.10 – 4.15)

ZADI: Renardus should not use an existing classification system but should be oriented towards a suitable classification, if there is one. DDC is usable for the description of document types, but not for subject descriptions of sources.

BnF: We have to define which level of granularity within DDC we would like.

FVL: We can test existing systems (UDC, DDC) at a general level. If they aren't suitable, then we can create a home-grown classification.

Subject: classification system - cross-search with regard to a special subject classification (4.16 – 4.20)

DTV/NetLab: The basic field for topical search should combine title, description and subject.

FVL: Cross-searching between main classes is enough at this moment. If end users want more exact search functions, Renardus could advise them to use the subject-specific database. (FVL answers the whole section with "definitely not".)

SUB: With regard to the verbal description of the classification system: for each verbal description it is necessary to also provide the notation of the classification system or the general subject (as a scheme?); otherwise all verbal descriptions will be mixed together in the search/metadata browse index and users cannot assign a description to a subject.

Subject: classification systems – multilinguality (4.21)

ZADI: The basis must be an English classification.

FVL: Yes, the common classification system should be provided in several European languages. Renardus needs user interfaces in different languages. In any case, the English interface has priority.

Identifier: general - regarding resources in several languages (5.1 – 5.2)

DTV/NetLab: Use one record for each language version of the resource.

BnF: At the BnF, we provide the URL of the site in another language within the description field.

FVL: This field is not an essential element for searching.

DDB: Resources in different languages are separate resources with separate metadata sets. There is no reason to have a repeatable field for this case.

Identifier: general - regarding mirrored/copied resources (5.3 – 5.5)


BnF: In your view, what could this special Renardus scheme be? We must re-use an existing one and not create a new one. We would prefer to use the qualifiers "Is version of" and "Has version".

DDB: We should consider having separate fields for URN and URL. In the case of copies or mirrors, the resources have only one URN but may have several URLs. The URL field must be repeatable.

Identifier: Qualifier (5.6 – 5.9)

DutchESS: PURLs have the form of a URL and it is not necessary to treat them as a separate category from URLs. URI is a collective category, including URLs, PURLs and URNs.

DTV/NetLab: What do you mean by 'integrate'?

BnF: URL, ISBN, URI, PURL and URN must be in separate fields but in the same index.

FVL: Let's dedicate this field to URLs only. There is no use in making the system too complicated.

DDB: URIs comprise URNs and URLs. There are already questions for both.
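The DDB suggestion (separate URN and URL fields, with a repeatable URL field for mirrors) combined with the DutchESS point that PURLs are simply URLs could be implemented by sorting identifiers on their scheme prefix. This is a minimal sketch, assuming only "urn:", "http://", "https://" and "ftp://" prefixes occur. The function name and the sample identifiers are hypothetical, and other identifier types such as ISBNs are rejected rather than guessed.

```python
from typing import Dict, List


def sort_identifiers(identifiers: List[str]) -> Dict[str, List[str]]:
    """Put URNs and URLs into separate fields; both lists may hold several values.

    PURLs have the form of URLs and therefore end up in the URL field.
    """
    fields: Dict[str, List[str]] = {"URN": [], "URL": []}
    for ident in identifiers:
        value = ident.strip()
        lowered = value.lower()
        if lowered.startswith("urn:"):
            fields["URN"].append(value)
        elif lowered.startswith(("http://", "https://", "ftp://")):
            fields["URL"].append(value)
        else:
            raise ValueError(f"unrecognised identifier: {value!r}")
    return fields


print(sort_identifiers([
    "urn:nbn:de:0000-example",        # one URN for the resource
    "http://site.example/resource",   # original location
    "http://mirror.example/resource", # mirror: a second URL
]))
```

A resource with one URN and several mirror URLs then yields one URN entry and several URL entries, which is the case the DDB comment describes.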

Language: code (6.2 – 6.4)

DutchESS: May support a language code in the future.

DTV/NetLab: Use the DC recommendation: ISO 639-2.

FVL: The FVL uses three-letter ISO 639 codes.

DDB: The ISO 639 two-letter codes can be derived from the three-letter codes.
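Deriving two-letter language codes from ISO 639-2 codes, as the DDB comment suggests, is a table lookup. Two caveats apply: some languages have separate bibliographic (B) and terminologic (T) three-letter codes, and many three-letter codes have no two-letter equivalent at all. The sketch below illustrates this, assuming a small hand-maintained table; the table is an illustrative subset and the function name is hypothetical.

```python
from typing import Optional

# Small illustrative subset of ISO 639-2 codes. Where a language has both a
# bibliographic (B) and a terminologic (T) code, both are listed.
ISO639_2_TO_2_LETTER = {
    "eng": "en",
    "fin": "fi",
    "fre": "fr", "fra": "fr",
    "ger": "de", "deu": "de",
    "dut": "nl", "nld": "nl",
    "swe": "sv",
}


def to_two_letter(code: str) -> Optional[str]:
    """Return the two-letter code for a three-letter code, if one is known.

    Many ISO 639-2 languages have no two-letter equivalent, so None is a
    legitimate result rather than an error.
    """
    return ISO639_2_TO_2_LETTER.get(code.strip().lower())


print(to_two_letter("GER"))  # de
print(to_two_letter("ang"))  # None (Old English has no two-letter code)
```

Because the mapping only works in one direction for part of the code list, keeping ISO 639-2 as the exchange format (as DTV/NetLab recommends) loses less information than the reverse.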

Country: general (7.1 – 7.3)

DTV/NetLab: How many SBIGs support this?

SUB: The publisher country code as well as the server country code are useful.

FVL: The FVL will add a country code to its records in the near future.

Country: code (7.4 – 7.5)

DutchESS: May support a country code in the future.

FVL: A three-letter ISO code would be better for the FVL.

Type: general (8.1 – 8.5)

DutchESS: As with country and language, we may support a type element in the future.

DTV/NetLab: The DC model is DCT1, which should be combined with others.

SUB: See DCT2: Dublin Core Type Vocabulary: Subtypes Working Draft (http://lcweb.loc.gov/marc/dc/subtypes-20000612.html).

ZADI: DC-based types are supported in part; other lists should be checked before a definite decision is made.

FVL: Qualifiers are not needed; a simple type list is best.

DDB: I hope that DC Type will be reconciled with the other code lists.

Rights (9.1 – 9.7)

DTV/NetLab: Local info.

SUB: This element is also important for business models between Subject Gateways and Renardus, between Renardus and other service providers, etc.

FVL: The rights field isn't useful for the majority of Internet resources. In any case, if there is a need for special rights information, it can be added to the description field.

NOVAGate: We don't have a separate field for rights, but we mention access restrictions in the description/abstract field.

DDB: 9.1 to 9.7 are not alternatives.

Publisher (10)

BnF: We need to define the content of the publisher field.

FVL: Essential element for searching.

Unique Record Number (IV B)

DTV/NetLab: See question IV D (strongly recommended).

FVL: This could be the unique record number that is automatically generated by every SG.

DDB: If data is held in a distributed way, there is no cause for ambiguity.

Record Creator (IV C)

SUB: This might be important; e.g. if reviews are provided by people well known in the scientific community, users might be interested in their names.

DDB: That's a matter for the individual gateway.

SBIG ID (IV D)

DDB: Only reasonable if there is a central database.


Record Last Checked Date (IV E)

DutchESS: Only a "last update date", not a "last checked date", so that actual changes are reflected but not every check that has not resulted in a change.

DTV/NetLab: This is local information and not relevant for Renardus.

SUB: This is an important part of quality checking/control.

DDB: That's a matter for the individual gateway.

6 APPENDIX D: SUMMARY

Summary of responses (matrix): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/summary_d6_4.pdf

7 APPENDIX E: Data Model and Data Flow

Data model and data flow, draft version 0.3 (4 September 2000): http://www.sub.uni-goettingen.de/ssgfi/reynard/wp6/d6.4/data_model.pdf

8 BIBLIOGRAPHY

AACR2 Translation project (http://lcweb.loc.gov/loc/german/AACR2/AACR2translation.html)

BUBL LINK - Browse by Dewey Class (http://bubl.ac.uk/link/ddc.html)

Business issues for Internet information gateways (Michael Day, UKOLN) (http://www.ukoln.ac.uk/metadata/renardus/wp8/issues/)

Cross-browsing in Renardus: Usage of subject vocabularies at Renardus gateways, by Traugott Koch (http://www.lub.lu.se/renardus/class.html)

Dempsey, L., 2000, The subject gateway: experiences and issues based on the emergence of the Resource Discovery Network. Online Information Review, 24 (1), 8-23.

Koch, T., Day, M., 1997, The role of classification schemes in Internet resource description and discovery. DESIRE deliverable D3.2 (3), (http://www.ukoln.ac.uk/metadata/desire/classification/)

MACS project (http://www.bl.uk/information/finrap3.html)

RDN Cataloguing Guidelines (http://www.rdn.ac.uk/publications/cat-guide/)


9 REFERENCES

AACR2 and Seriality (Library of Congress) (http://lcweb.loc.gov/acq/conser/serialty.html)

Cliff, P., 2000, RDN Resource Types, v. 1, (http://www.rdn.ac.uk/publications/cat-guide/types/)

Codes for the Representation of Names of Languages – ISO 639-2 (http://lcweb.loc.gov/standards/iso6392/englangn.html)

CULTURAL HERITAGE PROJECTS CONCERTATION EVENT (http://www.cscaustria.at/events/concertation.htm)

Day, M., Cliff, P., 2000, RDN Cataloguing Guidelines, v. 1.0, (http://www.rdn.ac.uk/publications/cat-guide/)

DC Agent Qualifiers - DC Working Draft - 10 December 1999 (http://www.mailbase.ac.uk/lists/dcagents/files/wd-agent-qual.html)

[DCMES version 1.1] Dublin Core Metadata Element Set, Version 1.1: Reference Description, (http://purl.oclc.org/dc/documents/rec-dces-19990702.htm)

[DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers, (http://purl.org/dc/documents/rec/dcmes-qualifiers20000711.htm)

DCT2: Dublin Core Type Vocabulary: Subtypes Working Draft (http://lcweb.loc.gov/marc/dc/subtypes-20000612.html)

Dempsey, L., 2000, The subject gateway: experiences and issues based on the emergence of the Resource Discovery Network. Online Information Review, 24 (1), 8-23.

Dewey Decimal Classification (http://www.oclc.org/dewey/about/about_the_ddc.htm)

Dublin Core Type Working Group, 1999, List of Resource Types. Dublin Core Metadata Initiative Working Draft, (http://purl.org/dc/documents/wd-typelist.htm)

First SCHEMAS Workshop on 11/12 May (http://www.schemas-forum.org/workshops/ws1/agenda.html)

Gray, L., 2000, Cataloguing rules for the BIOME Service: a procedural manual (http://biome.ac.uk/guidelines/cat/)

Humbul, 2000, Describing and cataloguing resources in Humbul, v. 0.4a. Draft, 26 October. (http://www.humbul.ac.uk/about/catalogue.html)

ISO 3166 Maintenance Agency (http://www.din.de/gremien/nas/nabd/iso3166ma/)

ISO 639-2 Registration Authority – Library of Congress (http://lcweb.loc.gov/standards/iso639-2/)

ISO 639-2:1998, Codes for representation of names of languages - Part 2: Alpha-3 code. Geneva: International Organization for Standardization.

MacLeod, R., Kerr, L., Guyon, A., 1998, The EEVL approach to providing a subject based information gateway for engineers. Program, 32 (3), 205-223.


Mapping ROADS/IAFA templates to Dublin Core (http://www.ukoln.ac.uk/metadata/interoperability/iafa_dc.html)

Personennamendatei (PND) (http://www.ddb.de/professionell/pnd.htm)

RAMEAU (http://www.bnf.fr/web-bnf/infopro/rameau/)

RFC 1766 Tags for the identification of languages (http://info.internet.isi.edu/in-notes/rfc/files/rfc1766.txt)

Renardus architectural model, (http://www.konbib.nl/coop/reynard/restricted/architecture2.ppt)

RSLP Collection Description (http://www.ukoln.ac.uk/metadata/rslp/)

RSLP Collection Description: Tool (http://www.ukoln.ac.uk/metadata/rslp/tool/)

Simple Collection Description (draft version: 2 August 1999) (http://www.ukoln.ac.uk/metadata/cld/simple/)
