Slavic Digital Text Workshop 2006
The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment
Muriel Foulonneau ([email protected])
Grainger Engineering Library, University of Illinois at Urbana-Champaign
UIUC June 2006
Outline
Improving resource discoverability
Interoperability
Metadata and protocols
The Open Archives Initiative Protocol for Metadata Harvesting
Hidden Web, portals and distributed digital libraries
The protocol, examples of services and repositories
Issues for digital libraries of distributed objects
[email protected] University of Illinois at UC
2
June 15th, 2006
Improving resource discoverability
Sharing content
New services, new representations of the content, new audiences
Bring your content to the attention of new users outside your immediate community
37% of visits to images of the State Library of New South Wales came from the PictureAustralia portal in 2002/3
Integrated Access to CIC Metadata
http://cicharvest.grainger.uiuc.edu/
Thematic access to resources
Russian Publics collection at UIUC
On the CIC metadata portal
Search on Google
Multiple services use different features
[Figure: services labeled by what each uses: full text; metadata and resources; metadata; collection descriptions]
Interoperability
Content and services
Building services
[Diagram labels: service, collection]
=> New services need content with similar features
What is interoperability?
Interoperability is the capacity for different systems to talk to each other
Example: "01-04-04"
- "01-04-04"
- this is a month
- 01 = "Jan"
I need
A standard language
An interpreter
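The "01-04-04" example can be made concrete with a small sketch. It parses the same string under three conventions a receiving system might plausibly assume; the convention names and the `interpret` helper are illustrative assumptions, not part of the presentation.

```python
from datetime import datetime

raw = "01-04-04"

# Three plausible conventions a receiving system might assume (illustrative).
conventions = {
    "month-day-year": "%m-%d-%y",
    "day-month-year": "%d-%m-%y",
    "year-month-day": "%y-%m-%d",
}

def interpret(value: str) -> dict:
    """Parse the same string under each convention and return ISO 8601 dates."""
    return {
        name: datetime.strptime(value, fmt).date().isoformat()
        for name, fmt in conventions.items()
    }

readings = interpret(raw)
for name, iso in readings.items():
    print(f"{name}: {iso}")
```

Each convention yields a different calendar date, which is why the sender and the receiver need either a shared standard or an interpreter between them.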
Various types of interoperability
Technical
Protocols, hardware, … Mac/PC, Netscape/IE …
Organizational
Who is in charge? Competence? Politics? Update? Rules
Content-related = metadata
What do you talk about?
- The "item" = granularity and nature of the object
- Semantic: date… Created? Published?
- Syntactical: 04 January 2004
- Linguistic: 04 Enero 2004
Metadata
Are used to
- Manage
- Provide information
- Retrieve
- Preserve
- Define rights and conditions of use
- Describe structure
⇒ Descriptive, administrative, structural
A metadata format
Is a set of elements or pieces of information, mandatory or not, applied together in order to reach one of the above-mentioned objectives
Standard
- As a text
- As a DTD in SGML
- As an XML Schema in XML
=> MARC, EAD, MODS, Dublin Core, LOM, MPEG-7, MyHomeCookedSchema …
The Dublin Core Metadata Element Set
15 elements
Content: Coverage, Description, Relation, Type, Source, Title, Subject
Intellectual property: Rights, Contributor, Publisher, Creator
Instantiation: Language, Identifier, Format, Date
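As a minimal sketch of how the 15 elements can be used, the snippet below groups them as on the slide and builds a flat XML record. The namespace URI is the standard Dublin Core one; the `record` wrapper element, the `build_record` helper and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements, grouped as on the slide.
DC_ELEMENTS = {
    "content": ["coverage", "description", "relation", "type",
                "source", "title", "subject"],
    "intellectual property": ["rights", "contributor", "publisher", "creator"],
    "instantiation": ["language", "identifier", "format", "date"],
}
ALL_ELEMENTS = {name for group in DC_ELEMENTS.values() for name in group}

def build_record(fields):
    """Build a flat XML record from (element, value) pairs.

    Every element is optional and may be repeated.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        if name not in ALL_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root

# Hypothetical sample values.
record = build_record([
    ("title", "Sample photograph"),
    ("creator", "Unknown photographer"),
    ("date", "1940"),
    ("type", "Image"),
])
print(ET.tostring(record, encoding="unicode"))
```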
Where metadata reside
"Internal": Webpage
Embedded: TEI, EAD
External: Catalogs, XML records …
- Includes a link to the resource
- => Third-party metadata
[Example shown: "Library of Congress home page", "The Library of Congress"]
Sharing metadata: Federated search
My user wants "mills", wherever they come from
[Diagram: the federated search service sends the query "Mill?" to each resource (three boxes labeled "My resource 04")]
E.g. Z39.50, SRU/SRW, WAIS
Sharing metadata: Data aggregation
The portal gathers metadata (and resources?)
[Diagram: the query "Mill?" is answered from the gathered copy of "My resource 04"]
E.g. search engines, union catalogs, OAI
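The difference between the two approaches can be sketched with a toy in-memory model. The repositories, their titles and all function names below are invented for illustration; a real federated search would go through a protocol such as Z39.50 or SRU, and a real aggregation would harvest over OAI.

```python
# Three hypothetical repositories, each holding a few titles.
REPOSITORIES = {
    "repo_a": ["Wind mills of Holland", "Village church"],
    "repo_b": ["Water mill on the river", "Harvest scene"],
    "repo_c": ["Portrait of a miller"],
}

def search_one(records, term):
    """Case-insensitive substring match, standing in for a repository's own search."""
    return [r for r in records if term.lower() in r.lower()]

def federated_search(term):
    """Send the query to every repository at search time and merge the answers."""
    hits = []
    for name, records in REPOSITORIES.items():
        hits += [(name, r) for r in search_one(records, term)]
    return hits

def aggregate():
    """Gather all metadata once, ahead of time, into a single local index."""
    return [(name, r) for name, records in REPOSITORIES.items() for r in records]

def aggregated_search(index, term):
    """Search the local copy only; no repository is contacted at query time."""
    return [(name, r) for name, r in index if term.lower() in r.lower()]

index = aggregate()
print(federated_search("mill"))
print(aggregated_search(index, "mill"))
```

Both functions return the same hits here. The trade-off is in when the repositories are contacted: at every query for federated search, once per harvest for aggregation.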
OAI divides the world between data providers and service providers
The OAI framework
[Diagram: service provider (harvester); data providers (repositories); aggregator]
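A harvester on the service provider side can be sketched as a loop over `ListIdentifiers` requests. The sketch relies on the flow-control mechanism of OAI-PMH 2.0, which the slides do not cover: a long list is returned in chunks, each ending with a `resumptionToken` to send back. The base URL, the `fake_fetch` transport and the identifiers are hypothetical, so the example runs without any network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def http_fetch(url):
    """Default transport: a plain HTTP GET returning the response body."""
    with urlopen(url) as response:
        return response.read()

def harvest_identifiers(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Yield record identifiers, following resumptionTokens until the list ends."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}

# Hypothetical two-page repository used instead of a live server.
PAGES = {None: ("oai:example:1", "tok1"), "tok1": ("oai:example:2", "")}

def fake_fetch(url):
    """Return a canned response chosen by the resumptionToken in the URL."""
    query = parse_qs(urlparse(url).query)
    token = query.get("resumptionToken", [None])[0]
    identifier, next_token = PAGES[token]
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"<header><identifier>{identifier}</identifier></header>"
        f"<resumptionToken>{next_token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )

harvested = list(harvest_identifiers("http://repository.example/oai", fetch=fake_fetch))
print(harvested)
```

Passing the transport as a parameter keeps the loop testable; swapping `fake_fetch` for the default `http_fetch` would point the same loop at a real repository.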
OAI repositories can be organized in sets
What do sets represent?
Journals: issues
EPrint Archives: Subject, Publication Status
Institutional repositories: Departments, research centers, etc.
Cultural Heritage Repositories: Collections with Intent
Set representations may be constrained by the software package used.
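A service provider discovers a repository's sets through the `ListSets` verb. The sketch below parses such a response into a mapping from `setSpec` to `setName`; the element names and namespace follow OAI-PMH 2.0, while the sample response and its set names are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListSets response; the set names are invented.
SAMPLE_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>emblems</setSpec><setName>Emblem books</setName></set>
    <set><setSpec>maps</setSpec><setName>Historical maps</setName></set>
  </ListSets>
</OAI-PMH>"""

def list_sets(xml_text):
    """Map each setSpec (the machine identifier) to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("oai:setSpec", namespaces=NS): s.findtext("oai:setName", namespaces=NS)
        for s in root.findall(".//oai:set", NS)
    }

sets = list_sets(SAMPLE_SETS)
for spec, name in sets.items():
    print(f"{spec}: {name}")
```

A harvester can then pass a chosen `setSpec` as the `set` argument of `ListRecords` or `ListIdentifiers` to harvest only that part of the repository.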
Multiple representations of an object
MARC record in XML
Dublin Core record in XML
Qualified Dublin Core record in XML
MODS record in XML
Example: Honoré Daumier lithograph (Brandeis University)
OAI is based on standards
HTTP protocol
XML
XML Schemas
Dublin Core
OAI supports 6 verbs
Identify
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
ListSets
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
ListRecords
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
ListMetadataFormats
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats
ListIdentifiers
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
GetRecord
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
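Since every request is an HTTP URL with a `verb` argument, the six requests can be generated by one small helper. This is a sketch: the base URL is the one from the slide, the required arguments listed are those OAI-PMH 2.0 asks for on a first (non-resumed) request, and the `oai_request` function itself is a hypothetical name.

```python
from urllib.parse import urlencode

BASE_URL = "http://aerialphotos.grainger.uiuc.edu/oai.asp"

# Arguments each verb requires on a first (non-resumed) request.
REQUIRED = {
    "Identify": set(),
    "ListSets": set(),
    "ListMetadataFormats": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}

def oai_request(verb, **args):
    """Build a request URL, rejecting unknown verbs and missing arguments."""
    if verb not in REQUIRED:
        raise ValueError(f"unknown verb: {verb}")
    missing = REQUIRED[verb] - set(args)
    if missing:
        raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    # safe=":" keeps the colons of OAI identifiers readable, as on the slide.
    return f"{BASE_URL}?{urlencode({'verb': verb, **args}, safe=':')}"

print(oai_request("Identify"))
print(oai_request("ListRecords", metadataPrefix="oai_dc"))
print(oai_request(
    "GetRecord",
    identifier="oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940",
    metadataPrefix="oai_dc",
))
```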
An OAI response
[XML record; the tags were lost in extraction, the values survive]
Header
- identifier: oai:images.library.uiuc.edu:emblems/324
- datestamp: 2003-10-22
- setSpec: emblems
Metadata (element names lost)
- Müller, Johann Heinrich Traugott, 1631-1675
- http://images.library.uiuc.edu:8081/u?/emblems,324
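To show how a harvester reads such a response, the sketch below rebuilds a plausible `GetRecord` response around the values on the slide and parses it. The header structure (identifier, datestamp, setSpec) and the namespaces follow OAI-PMH 2.0; placing the personal name in `dc:creator` and the URL in `dc:identifier` is an assumption, since the original element names are not visible.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Reconstructed response; the dc element names are assumptions.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
        <datestamp>2003-10-22</datestamp>
        <setSpec>emblems</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
          <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

def parse_record(xml_text):
    """Extract the header fields and the Dublin Core (element, value) pairs."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        # Strip the "{namespace}" prefix from each tag to get the bare element name.
        "dc": [(child.tag.split("}")[1], child.text) for child in dc],
    }

parsed = parse_record(SAMPLE_RESPONSE)
print(parsed["identifier"], parsed["datestamp"], parsed["sets"])
for element, value in parsed["dc"]:
    print(f"{element}: {value}")
```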
Examples of repositories
Library of Congress http://memory.loc.gov/cgi-bin/oai2_0
ContentDM at UIUC http://images.library.uiuc.edu:8081/cgi-bin/oai.exe
Ohio State Knowledge Bank https://kb.osu.edu/dspace-oai/request
Examples of services
http://oaister.umdl.umich.edu
http://www.americansouth.org/
http://cicharvest.grainger.uiuc.edu/
http://nsdl.org/
http://www.pictureaustralia.org/
http://imlsdcc.grainger.uiuc.edu/
http://www.language-archives.org/
Turnkey systems and modules
CWIS: http://scout.wisc.edu/Projects/CWIS/
ContentDM: http://contentdm.com/
Digitool: http://www.exlibrisgroup.com/digitool.htm
DSpace: http://www.dspace.org/
EPrints: http://software.eprints.org/
DLXS: http://www.dlxs.org/
OAICat: http://www.oclc.org/research/software/oai/cat.htm
XMLFile: http://www.dlib.vt.edu/projects/OAi/software/xmlfile/xmlfile.html
DLESE OAI software: http://dlese.org/oai/index.jsp
Useful tools
UIUC OAI registry http://gita.grainger.uiuc.edu/registry/
OAI repository explorer http://re.cs.uct.ac.za/
Errol http://errol.oclc.org/
Digital libraries of distributed objects
Metadata shareability issues
Granularity
Loss of context
Completeness
DLF-NSDL Best practices on shareable metadata http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents
What is behind URLs
Conveying actionable URLs
http://rama.grainger.uiuc.edu/assetactions/
View
Resize
Select
Share
Annotate
Conclusions
Interoperability is technical, content-related and organizational; OAI is the easy part
Works even better for particular communities with similar organizational structures and metadata formats
Extensions of the protocol for:
- Objects
- Actionable URLs
References and useful material
The Open Archives Website
http://www.openarchives.org/OAI/2.0/guidelines.htm
DLF/NSDL best practices for OAI and shareable metadata
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents
OAForum Tutorial
http://www.oaforum.org/tutorial/
Getting a Leg Up on OAI
http://nsdl.comm.nsdl.org/meeting/session_docs/2004/2620_National_Science_Digital_Library_Conference.doc