Digital Libraries and Digitised Maps: An Early Overview of the DIGMAP Project

José Borbinha¹, Gilberto Pedrosa¹, João Gil¹, Bruno Martins¹, Nuno Freire², Milena Dobreva³, and Alberto Wyttenbach⁴

¹ IST – Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal
[email protected], [email protected], [email protected], [email protected]
² BNP – Biblioteca Nacional de Portugal, Campo Grande, 1749-081 Lisboa, Portugal
[email protected]
³ IMI – Institute of Mathematics and Informatics, 1113 Sofia, Bulgaria
[email protected]
⁴ UPM – ETSI Topografía, Geodesia y Cartografía, 28031 Madrid, Spain
[email protected]

Abstract. DIGMAP is a project to find solutions for digital library scenarios focused on digitised historical maps. The main service will reuse metadata from European national libraries and other relevant third-party metadata sources to provide discovery of and access to content. It will also include a proof of concept for reusing and enriching these metadata through automatic processes that attempt to extract relevant indexing information from the images of the digitised maps, as well as from any associated text.

1 Introduction

DIGMAP¹ is a project co-funded by the European Community programme eContentplus². DIGMAP stands for “Discovering our Past World with Digitized Historical Maps”, but could also stand for “digging on maps”! Historical maps contain important details that make them very difficult (and therefore expensive) to describe, classify and index in “hand-made” descriptive metadata structures. With this in mind, we propose to develop a modular software solution enabling flexible services that support the description and indexing, as well as the searching and browsing, of collections of digitized historical maps.

Maps have a special characteristic: they always represent a limited physical space in the real world. Therefore, we propose to develop technology for services that make it easy to classify, index, search and browse maps by their geographic boundaries, with the support of multilingual geographic thesauri or gazetteers, and also by their decorative and stylistic details.

This paper describes the main DIGMAP use cases, followed by the architecture of the system.

¹ http://www.digmap.eu
² http://ec.europa.eu/information_society/activities/econtentplus/index_en.htm

D.H.-L. Goh et al. (Eds.): ICADL 2007, LNCS 4822, pp. 383–386, 2007. © Springer-Verlag Berlin Heidelberg 2007
2 DIGMAP Use Cases

The final results of the project will consist of a set of services available on the Internet, according to the main use cases presented in Figure 1. A User is an anonymous person, a Registered User is an actor that has been recognized by the system, and an External System is a machine. A Cataloguer will be able to update the metadata, while a Geographer is a specialist who manages Thesaurus data types. The Administrator will manage all the services and infrastructure.

Users will be able to access resources in two ways: Searching Resources (simple and complex searches in the metadata) or Browsing Resources (browsing in indexes built from the metadata). Users will also be able to propose resources for inclusion in the system, through the use case Proposing Resources. Through Asking an Expert it will be possible to submit questions to a DIGMAP Forum, to be answered by experts with specific knowledge in the area. The specialists will answer and manage those questions through the use case Dispatching Questions.

There will be three data management use cases in DIGMAP. Updating Catalogue will feed the Catalogue with metadata from remote data providers or through a local cataloguing interface. At any time, it might also be necessary to perform Updating Authorities, to ensure better browsing and searching. The use case Updating Gazetteer exists for the same supporting purpose.

[Use case diagram. Components: User Interface, Thesauri, Catalogue, DIGMAP Forum, Indexer, External Service Interface. Actors: User, Registered User, Cataloguer, Geographer, Administrator, External System. Use cases: Searching Resources, Browsing Resources, Proposing Resources, Asking an Expert, Dispatching Questions, Updating Catalogue, Cataloguing Online Resources, Updating Authorities, Updating Gazetteer, Indexing Resources, Searching Remote, Managing Services.]

Fig. 1. DIGMAP Use Cases
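
As an illustration only, the difference between Searching Resources (queries over the metadata) and Browsing Resources (walking indexes built from the metadata) can be sketched in Python. The record fields, the sample values and the function names `search` and `build_index` are invented for this sketch and do not describe the DIGMAP implementation.

```python
from collections import defaultdict

# Invented sample records; the field names loosely follow Dublin Core.
RECORDS = [
    {"id": "rec-1", "title": "Example map of a harbour",
     "creator": "Cartographer A", "date": "1650"},
    {"id": "rec-2", "title": "Example atlas sheet",
     "creator": "Cartographer B", "date": "1650"},
    {"id": "rec-3", "title": "Example harbour chart",
     "creator": "Cartographer A", "date": "1702"},
]


def search(records, query, fields=("title", "creator")):
    """Simple search: case-insensitive substring match over chosen fields."""
    needle = query.casefold()
    return [
        record["id"]
        for record in records
        if any(needle in record.get(field, "").casefold() for field in fields)
    ]


def build_index(records, field):
    """Browse index: map each distinct value of `field` to record ids."""
    index = defaultdict(list)
    for record in records:
        index[record.get(field, "")].append(record["id"])
    # Sort the keys so the index can be presented as an ordered browse list.
    return dict(sorted(index.items()))


if __name__ == "__main__":
    print(search(RECORDS, "harbour"))
    print(build_index(RECORDS, "creator"))
```

A search answers a question the user already has, whereas a browse index exposes the values that exist in the collection, which is why both are kept as separate use cases in this sketch.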
3 DIGMAP Architecture

All the software solutions produced in DIGMAP will be based on standard and open data models and will be released as open source, so the results will be useful for local digital libraries of maps, as standalone systems, or as interoperable components of wider, distributed systems (for example, portal management systems).

A Resource is an information object relevant to the scope of the service, registered in the system by descriptive and indexing metadata. Examples are maps, books or web sites. When a Resource is a map, it must be possible to register its geographic information (geographic boundaries, scales, details of sub-maps inside a wider generic map, etc.).

The Catalogue is the service responsible for managing the records describing the Resources. It must support the definition of collections, which group the Resources according to any desired criteria (type or genre of the resource, the source or provenance of the record, etc.). It will be possible to register Resources through a local user interface, or by importing records through OAI-PMH³. The local user interface of the Catalogue must make it possible to edit any existing record. The Catalogue must be able to maintain the descriptions of authorities and of the maps in multiple metadata formats, especially UNIMARC⁴, MARC 21⁵ and Dublin Core⁶.

The Thesauri is the generic component of the system that registers ancillary information for the purpose of indexing, searching and browsing for Resources. It will be developed as a multilingual semantic information system, to manage and provide access to information relating to concepts, geographic coordinates, names of places, areas and persons, and related historical events (with dates or time intervals). The Thesauri will be made up of two major sub-systems (not detailed in the diagram): the Gazetteer (as defined by the OGC⁷) and the Authority File (to maintain and provide easy access to information identifying authors, and to disambiguate similar and duplicate author entries).

The User Interface is how users will interact with the system. Besides the traditional OPAC interface, the DIGMAP service will offer a browsing environment for human users, exploring paradigms inspired by Google Maps⁸, Virtual Earth⁹, TimeMap¹⁰, etc. Specific functions will also be provided, such as timelines and other indexes, for powerful but easy and pragmatic retrieval.

The External Service Interface will provide services to external systems through Z39.50¹¹, SRW/SRU¹² and OAI-PMH.

³ OAI-PMH – http://www.openarchives.org
⁴ UNIMARC Forum – http://www.unimarc.net/
⁵ MARC Standards – http://www.loc.gov/marc/
⁶ Dublin Core Metadata Initiative – http://dublincore.org/
⁷ Open Geospatial Consortium – http://www.opengeospatial.org
⁸ Google Maps – http://maps.google.com
⁹ MSN Virtual Earth – http://virtualearth.msn.com
¹⁰ TimeMap Open Source Consortium – http://www.timemap.net
¹¹ Z39.50 Maintenance Agency – http://www.loc.gov/z3950/agency
¹² SRW – Search/Retrieve Web Service – http://www.loc.gov/z3950/agency/zing/srw
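
To make the import of records through OAI-PMH more concrete, the following Python sketch builds a `ListRecords` request and reads Dublin Core titles from a response. The verb, the `metadataPrefix=oai_dc` argument and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the sample response and the function names are assumptions made for illustration, and the sketch ignores resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (hypothetical repository and record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:map-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example map</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract the identifier and Dublin Core titles of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text or "" for t in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai", set_spec="maps"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and repeat the request while the response carries a resumption token; that loop is left out here to keep the sketch short.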
The Indexer presents a specific challenge. Its purpose is to support the automatic indexing of maps. That indexing can be done from multiple perspectives: geographic (the bounding area represented by the map); semantic features (graphical scales, ornaments, etc.); and technical features (histogram, etc.). To that end, several algorithms have been considered, based on ideas from [1], [2], [3], [4], [5] and others. Concerning applicability to old maps, the most effective automatic methods appear to be geometry-based vectorization and text classification. These approaches will be seriously considered for further image processing work in the DIGMAP project. In conclusion, any DIGMAP map processing software should weigh the benefits and costs of introducing human interaction into its workflow.
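
The geographic perspective, in which each map is indexed by the bounding area it represents, lends itself to a small sketch. The Python code below assumes that bounding areas are stored as rectangles in decimal degrees that do not cross the 180° meridian; the class `BoundingBox`, the function `maps_covering` and the sample coordinates are hypothetical and are not taken from the DIGMAP software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in decimal degrees (assumed not to cross the antimeridian)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True when the two rectangles share at least one point."""
        return (
            self.west <= other.east
            and other.west <= self.east
            and self.south <= other.north
            and other.south <= self.north
        )


def maps_covering(index, query):
    """Return the ids of all maps whose bounding box overlaps the query."""
    return sorted(
        map_id for map_id, box in index.items() if box.intersects(query)
    )


# Invented index: map id -> bounding area represented by the map.
INDEX = {
    "map-a": BoundingBox(west=-10.0, south=36.0, east=-6.0, north=42.0),
    "map-b": BoundingBox(west=2.0, south=49.0, east=7.0, north=52.0),
}

if __name__ == "__main__":
    area_of_interest = BoundingBox(west=-9.5, south=38.5, east=-9.0, north=39.0)
    print(maps_covering(INDEX, area_of_interest))
```

A linear scan is enough for a sketch; a larger collection would more plausibly rely on a spatial index, a choice the source does not discuss.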

4 Conclusions

The project started in October 2006 and will run for 24 months. It is not a pure research project, but an attempt to develop innovative services reusing existing software and metadata. It will deliver a proof of concept reusing and enriching the contents from the National Library of Portugal (BNP), the Royal Library of Belgium (KBR/BRB), the National Library of Italy in Florence (BNCF), and the National Library of Estonia (NLE). In the future we expect it to be complemented with contents and references from other libraries, archives and information sources, namely from other European national libraries that are members of TEL – The European Library¹³. In case of success, an ultimate goal of DIGMAP will be to become a service fully integrated in TEL; in this sense the project is fully aligned with the “European Digital Library” vision, as expressed in the “i2010 digital libraries” initiative of the European Commission.

References

1. Arbeláez, P.: Boundary Extraction in Natural Images Using Ultrametric Contour Maps. In: POCV 2006. Proceedings 5th IEEE Workshop on Perceptual Organization in Computer Vision, New York, USA (June 2006)
2. Frischknecht, S., Kanani, E.: Automatic Interpretation of Scanned Topographic Maps: A Raster-Based Approach. In: Tombre, K., Chhabra, A.K. (eds.) GREC 1997. LNCS, vol. 1389, pp. 207–220. Springer, Heidelberg (1998)
3. Ganesan, A.: Integration of Surveying and Cadastral GIS: From Field-to-Fabric & Land Records-to-Fabric. In: 22nd ESRI User Conference, Redlands, CA, USA (2002)
4. Levachkine, S.: Raster to vector conversion of color cartographic maps. In: Lladós, J., Kwon, Y.-B. (eds.) GREC 2003. LNCS, vol. 3088, pp. 49–60. Springer, Heidelberg (2004)
5. Liu, W., Dori, D.: Genericity in Graphics Recognition Algorithms, Graphics Recognition: Algorithms and Systems. In: Tombre, K., Chhabra, A.K. (eds.) GREC 1997. LNCS, vol. 1389, pp. 9–21. Springer, Heidelberg (1998)

¹³ The European Library – http://www.theeuropeanlibrary.org/