This article was peer reviewed by two independent and anonymous reviewers
SPATIAL METADATA AUTOMATION: A NEW APPROACH
Mohsen Kalantari (1), Abbas Rajabifard (2), Hamed Olfat (3)
Centre for SDI and Land Administration, Department of Geomatics, The University of Melbourne
(1) PhD, Research Fellow
(2) Associate Professor
(3) PhD Student
ABSTRACT
Metadata play a key role in facilitating access to up-to-date spatial information and are an important aspect of finding and delivering high-quality spatial information services to users. In particular, metadata are an important element in the functioning of spatial data infrastructure (SDI) initiatives. With huge amounts of spatial information being generated, a spatial application must be sufficiently flexible to extract and update spatial metadata automatically. In current applications, by contrast, the extraction and update process is undertaken manually, making changes to spatial metadata relatively difficult and expensive. This paper explores different approaches to automating spatial metadata and the impacts of these approaches on critical components of metadata, such as standards and data models, so that the process of updating or extracting spatial metadata becomes automatic where feasible. This approach differs from existing methods by emphasising new technologies such as Web 2.0. Within a metadata application, different data modelling approaches, including an integrated spatial data and metadata model, user-generated tags and metadata standards, are presented and discussed.
INTRODUCTION
Metadata is information about data. It is a vital tool for the management and discovery of spatial data. Users of spatial information need to know who created it, who maintains it, its scale, accuracy and quality, and more (Rajabifard et al. 2005). Metadata not only provides users of spatial data with information about the purpose, quality, actuality and accuracy of spatial datasets; it also performs vital functions that make spatial data interoperable, that is, capable of being shared between systems and, in many cases, within an SDI environment, which aims to facilitate data sharing and data integration.
In this context, achieving optimum access to and use of spatial data requires knowledge about the management of data, metadata and web services and their related processes (Williamson et al. 2003). With this in mind, access to up-to-date metadata is vital to delivering high-quality spatial information services across their many application areas. However, current metadata models and standards are complex and very difficult to handle. Also, metadata for spatial datasets is often missing or incomplete and is acquired in heterogeneous ways. It is commonly viewed by organisations as an overhead and an extra cost. Typically it is acquired after the spatial data itself, through lengthy, complex efforts. Metadata is usually created and stored separately from the actual dataset it relates to, and is often managed by people with limited knowledge of its value. Separate storage creates two independent datasets that must be managed and updated: spatial data and metadata. These are often redundant and inconsistent, so the reliability of the spatial information and the extent to which it can be used are unclear. The spatial industry has already identified the major questions about metadata, including: What is the quality of the metadata? How can the process of updating metadata be automated and integrated with the actual datasets in order to streamline metadata usage? What metadata is most important to users? How can metadata be made more understandable and usable for the general public? (Phillips et al. 1998; Crompvoets et al. 2004; Najar et al. 2007). To respond to these questions, this paper, which is based on an ongoing research project on 'Spatial Metadata Automation', aims to develop and demonstrate an approach for extracting, recording, updating and delivering metadata in an automated and integrated fashion.
Kalantari, M., A. Rajabifard and H. Olfat (2009). Spatial metadata automation : a new approach. In: Ostendorf B., Baldock, P., Bruce, D., Burdett, M. and P. Corcoran (eds.), Proceedings of the Surveying & Spatial Sciences Institute Biennial International Conference, Adelaide 2009, Surveying & Spatial Sciences Institute, pp. 629-635. ISBN: 9780-9581366-8-6.
M Kalantari, A Rajabifard, H Olfat
METADATA AUTOMATION
Research on automatic spatial metadata generation is rooted in the automatic indexing, abstracting and classification of spatial data content, which began with the need to organise increasing amounts of spatially related data and the inability of human-authored methods to cope with huge volumes of spatial metadata (Rajabifard et al. 2009). Today, automatic metadata generation should move beyond subject representation to encompass the production of author, title, date, format, spatial extent and many other types of metadata.
In addition, thousands of spatial databases are now networked via the Internet, and information resources are frequently encoded in open, interoperable standards (e.g., eXtensible Markup Language, XML). These developments should enable automatic metadata generation systems to work on far larger spatial data directories. Research on automatic spatial metadata generation is new in the spatial arena, and no conceptual framework for it has yet been defined. The next section introduces a framework for spatial metadata automation concepts through the automatic creation, update and enrichment of spatial metadata (Figure 1).
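As a minimal sketch of generating metadata beyond subject representation, the Python snippet below derives a title, a format and a spatial extent directly from a dataset. It assumes the dataset is available as a file name plus a list of (x, y) coordinate pairs; `extract_basic_metadata`, `BoundingBox` and the `FORMATS` table are hypothetical names introduced for illustration, and a real tool would read these properties through a GIS library instead.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Iterable, Optional, Tuple

# Illustrative extension-to-format table (an assumption, not an authoritative registry).
FORMATS = {
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
    ".gml": "GML",
    ".tif": "GeoTIFF",
}


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float


def extract_basic_metadata(file_name: str,
                           coords: Iterable[Tuple[float, float]]) -> dict:
    """Hypothetical helper: derive title, format and spatial extent for a new record."""
    path = PurePath(file_name)
    points = list(coords)
    extent: Optional[BoundingBox] = None
    if points:
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        # Smallest axis-aligned box containing every coordinate
        # (this sketch does not handle data crossing the antimeridian).
        extent = BoundingBox(min(xs), min(ys), max(xs), max(ys))
    return {
        "title": path.stem,  # file name used as a placeholder title
        "format": FORMATS.get(path.suffix.lower(), "unknown"),
        "extent": extent,
    }
```

For example, `extract_basic_metadata("parcels.geojson", [(144.96, -37.81), (145.0, -37.85)])` yields the format "GeoJSON" and an extent with west 144.96, east 145.0, south -37.85 and north -37.81. Author and date would need other sources, such as file-system attributes or the data custodian.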
Figure 1: Spatial Metadata Automation (Create, Update, Enrich)
Automatic update
Automatic spatial metadata update, or synchronisation, is a process by which properties of a spatial dataset are read from the dataset and written into its spatial metadata. Some software vendors, such as ESRI, conceptualise metadata automation as synchronising the metadata content when values in the spatial data change. In this type of model, when a spatial data property such as the projection changes, for instance, the metadata are updated with the new information. Synchronisation is accomplished using synchronizers specific to each metadata standard. For example, three synchronizers are provided with ArcGIS Desktop: an FGDC synchronizer, an ISO synchronizer and a Geography Network synchronizer. Automatic update or synchronisation should be accompanied by metadata editing facilities. The metadata editors and synchronizers should work together so that automatic updates do not overwrite information that has been typed in by a person.
Automatic creation
While automatic update and synchronisation are suitable for updating an existing metadata record, other methods are needed when no metadata is associated with the spatial data. Humans create metadata by writing descriptions of resources in either a structured or an unstructured form. Computer applications can extract certain information from a resource or its context. This may involve simply capturing information that is already available, such as the format of the file, or running an algorithm to determine the subject of a textual resource by counting keywords or by checking and analysing pointers to the resource. Some applications use complex algorithms to increase the accuracy of machine-generated metadata. The Google search engine is an example of a system that creates and uses machine-generated metadata. Several automatic metadata extraction methods have been studied, e.g. hand-coded rule-based parsers and machine learning (Han et al. 2003).
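The cooperation between synchronizers and editors described under Automatic update can be sketched as follows. This is not ESRI's synchronizer API; it is a hypothetical Python model in which each metadata field carries a `source` flag (an assumption of this sketch), so that `synchronise` refreshes values it wrote itself but skips anything last entered through the editor.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataField:
    value: str
    source: str  # "auto" = written by the synchronizer, "manual" = typed by a person


@dataclass
class MetadataRecord:
    fields: Dict[str, MetadataField] = field(default_factory=dict)

    def edit(self, name: str, value: str) -> None:
        """Editor entry point: store a value typed in by a person."""
        self.fields[name] = MetadataField(value, "manual")


def synchronise(record: MetadataRecord, dataset_properties: Dict[str, str]) -> List[str]:
    """Write properties read from the dataset into the record.

    Fields last set by a person are left untouched. Returns the names of
    the fields that were created or changed.
    """
    changed: List[str] = []
    for name, value in dataset_properties.items():
        current = record.fields.get(name)
        if current is not None and current.source == "manual":
            continue  # never overwrite human-entered information
        if current is None or current.value != value:
            record.fields[name] = MetadataField(value, "auto")
            changed.append(name)
    return changed
```

If a person edits the title and the dataset is later reprojected, running `synchronise` again updates only the projection field and returns `["projection"]`, leaving the typed title in place.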
For highly structured tasks, rule-based methods are easy to implement. However, the resulting rule system is usually domain-specific and cannot easily be translated for use in other domains. Machine learning, on the other hand, is more robust and efficient (Han et al. 2003). Several learning models are available; among the most popular are the Naïve Bayes model (NB), the Hidden Markov Model (HMM), Support Vector Machines (SVM) and Expectation Maximization (EM). Supervised machine learning (SML) algorithms use training data and self-correct based on errors in the machine's performance against the training set (Greenberg et al. 2006).
Automatic enrichment
Automatic enrichment involves improving the content of metadata by monitoring the tags that users employ to find datasets. A tag is a non-hierarchical keyword or term assigned to a piece of information (such as an internet bookmark, digital image or computer file). Tagging was popularised by websites associated with Web 2.0 and is an important feature of many Web 2.0 services (Mika 2005). This kind of spatial metadata can help describe an item and allow it to be found again by browsing or searching. Spatial tags are chosen informally and personally by the spatial data creator or by its users, depending on their use. If many users are allowed to tag many spatial datasets in a spatial data directory, this collection of tags can become a spatial folksonomy: a method of collaboratively creating and managing metadata to annotate and categorise spatial data. Spatial data users can select different tags to describe the same item: for example, items related to scale may be tagged "500", "1:500" or "1/500". This flexibility allows users to classify their collections of items in the way that they find useful, but the personalised variety of terms can make it difficult for others to find comprehensive information about a subject; to catch every relevant item, they may have to search several times using different keywords. Users also have to decide whether each tagged item is actually relevant to what they are looking for. A disadvantage of a spatial tagging system is that there is no information about the meaning or semantics of each tag. For example, the spatial tag "Melbourne" might refer to the Central Business District of Melbourne or to metropolitan Melbourne itself, and this lack of semantic distinction can lead to inappropriate connections between spatial data.
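A minimal sketch of how a spatial data directory might reduce the tag-variety problem is shown below: it maps the scale tags "500", "1:500" and "1/500" onto one canonical form and indexes datasets by the normalised tag, so that a single search finds all three. The normalisation rule and the names `normalise_tag` and `build_folksonomy` are assumptions for illustration. The sketch only merges spelling variants; it does not resolve semantic ambiguity such as the two meanings of "Melbourne".

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed rule: "N", "1:N" and "1/N" all denote the map scale 1:N.
# Deliberately crude: any bare number (e.g. a year) would also be read as a scale.
SCALE_TAG = re.compile(r"^(?:1\s*[:/]\s*)?(\d[\d,]*)$")


def normalise_tag(tag: str) -> str:
    """Map equivalent user tags onto one canonical form."""
    cleaned = tag.strip().lower()
    match = SCALE_TAG.match(cleaned)
    if match:
        return "1:" + match.group(1).replace(",", "")
    return cleaned


def build_folksonomy(tagged: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Index dataset identifiers by canonical tag, so one search finds all variants."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for dataset_id, tags in tagged.items():
        for tag in tags:
            index[normalise_tag(tag)].add(dataset_id)
    return dict(index)
```

For instance, datasets tagged "500", "1:500" and "1/500" all end up under the key "1:500", and "Melbourne" and "melbourne " collapse to "melbourne".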
However, larger-scale spatial folksonomies can address some of the problems of tagging, because users of spatial tagging systems tend to notice the current use of "tag terms" within these systems and thus reuse existing tags to form connections to related items easily. In this way, spatial folksonomies collectively develop a partial set of metadata standards through the ongoing involvement of non-expert spatial users.
METADATA AUTOMATION IMPACTS
As described in the previous section, there are different methods of automating metadata, both for metadata extraction and for updating. However, these methods have impacts on different aspects of handling spatial metadata, as presented below.
Detached spatial data and metadata data models
Traditionally, spatial metadata acquisition and management have played a less important role in many organisations. Metadata are usually acquired long after the spatial data have been captured and are stored in separate repositories. Consequently, in this method of recording metadata, there are two independent datasets to manage and update: spatial
data and metadata. These are often redundant and inconsistent, as it is not always clear which information is metadata and which is spatial data. Automating spatial metadata stored under these data models requires synchronisation between two separate databases: the metadata database and the spatial database. With huge amounts of spatial data in different formats, structures and Database Management Systems (DBMS), automation of spatial metadata will be closely tied to the issue of interoperability.
Integrated spatial data and metadata data models
To respond to the issue of detached data models, recent research has focused on optimising metadata management by integrating metadata and spatial data in a common file or database. This common metadata-spatial dataset can be considered 'comprehensive spatial data'. The concept of metadata-spatial data integration enables spatial data to carry their own metadata description with them. The approach distinguishes between existing spatial data models, which have to be extended, and newly planned data models and datasets, which can manage spatial data and metadata together from the beginning (Najar et al. 2007). When automating spatial metadata, the issue of interoperability between metadata and spatial databases can be addressed by using an integrated database. This method is particularly in line with the Synchronisation and Learning Object methods of spatial metadata automation because of the structured and strict way of handling metadata imposed by the integrated data model. However, this inflexibility limits the ability of users to enrich the spatial metadata using spatial tagging methods.
Standards
The peak organisation for standards development across business, government and society is the International Organization for Standardization (ISO).
Through its Technical Committee for Geographic information/Geomatics (ISO/TC 211), ISO has published an International Standard for metadata (ISO 19115:2003). ISO 19115:2003 specifies the schema required for describing geographic information and services. It provides information about the identification, extent, quality, spatial and temporal schema, spatial reference and distribution of digital geographic data (Moellering et al. 2005). Similarly, the Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardised set of conventions for describing things online in ways that make them easier to find. Dublin Core is widely used to describe digital materials such as video, sound, image, text and composite media like web pages, and can also be used for spatial data indexing. As an established norm or requirement, metadata standards impose uniform engineering and technical criteria for structuring and recording metadata records. This ensures that documenting, recording and searching spatial information is cohesive and coherent across spatial data producers, custodians and users. However, because of the complexity of metadata standards, they are not usually well understood by non-expert spatial data
users. This consequently limits how efficiently spatial metadata can help users find the right spatial data (Waugh 1998). The spatial tagging method, in which users create spatial metadata, could complement spatial metadata standards and help make them more useful for non-expert spatial users. This paper has discussed different approaches to metadata automation and investigated their impact on the critical components of metadata. Having explored the advantages and disadvantages of each approach, future research will investigate utilising all these approaches under one umbrella.
CONCLUSIONS
Consistent description of the content and use of spatial data requires approaches that automate the attributes of spatial metadata and regularly reconsider the metadata schema or models. This means that spatial metadata can be used both for human interpretation of datasets and for computer processing and use in search engines. Metadata is usually considered to be structured when it complies with a set of rules or specifications for data entry and/or data structures. The structure of the elements and their attributes can be either simple or complex. This concept of spatial metadata allows metadata applications to synchronise the content of metadata and automatically extract metadata from spatial data. However, some information that is useful in managing resources is available through spatial data users. This brings a new approach to spatial metadata automation: spatial tagging by users. Spatial tagging can enrich the standard approach to handling spatial metadata.
ACKNOWLEDGEMENT
The authors wish to acknowledge the support of the Australian Research Council for funding this research, and the members of the Centre for Spatial Data Infrastructures and Land Administration at the Department of Geomatics, the University of Melbourne, for their help in the preparation of this article and the associated research. However, the views expressed in the article are those of the authors and do not necessarily reflect those of these groups.
REFERENCES
Crompvoets, J., Bregt, A., Rajabifard, A., and Williamson, I. (2004). Assessing the worldwide developments of national spatial data clearinghouses. International Journal of Geographical Information Science, 18(7): 665-689.
Greenberg, J., Spurgin, K., and Crystal, A. (2006). Functionalities for automatic metadata generation applications: a survey of metadata experts' opinions. International Journal of Metadata, Semantics and Ontologies, 1(1): 3-20.
Han, H., Giles, C. L., Manavoglu, E., Zha, H., Zhang, Z., and Fox, E. A. (2003). Automatic document metadata extraction using support vector machines. In Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 37-48.
Mika, P. (2005). Ontologies are us: a unified model of social networks and semantics. In Proceedings of the International Semantic Web Conference 2005, pp. 522-536.
Moellering, H., Aalders, H. J., and Crane, A. (2005). World Spatial Metadata Standards. Elsevier.
Najar, C., Rajabifard, A., Williamson, I., and Giger, C. (2007). A framework for comparing spatial data infrastructures: an Australian-Swiss case study. In H. J. Onsrud (ed.), Research and Theory in Advancing Spatial Data Infrastructure Concepts, Redlands, California, USA: ESRI Press, pp. 201-213. Available at: http://gsdidocs.org/gsdiconf/GSDI-9/papers/TS24.1paper.pdf
Phillips, A., Williamson, I. P., and Ezigbalike, D. (1998). The importance of metadata engines in spatial data infrastructures. Proceedings of AURISA '98, Perth, Australia.
Rajabifard, A., Binns, A., and Williamson, I. (2005). Development of a Virtual Australia utilizing an SDI enabled platform. Proceedings of FIG Working Week/GSDI-8, Cairo, Egypt.
Rajabifard, A., Kalantari, M., and Binns, A. (2009). SDI and metadata entry and updating tools. In B. van Loenen, J. W. J. Besemer, and J. A. Zevenbergen (eds.), SDI Convergence, Netherlands Geodetic Commission, Delft, pp. 121-138.
Shekhar, S., and Chawla, S. (2003). Spatial Databases: A Tour. Prentice Hall.
Waugh, A. (1998). Specifying metadata standards for metadata tool configuration. Computer Networks and ISDN Systems, 30(1-7): 23-32.
Williamson, I., Rajabifard, A., and Feeney, M.-E. F. (eds.) (2003). Developing Spatial Data Infrastructures: From Concept to Reality. London, UK: Taylor & Francis.
BRIEF BIOGRAPHY OF PRESENTER
Mohsen Kalantari has been working as a research fellow on the ARC Linkage project on spatial metadata automation at the Centre for SDI and Land Administration, the University of Melbourne, since December 2008.
The aim of this project is to investigate the automation of spatial metadata creation, update and enrichment. Mohsen is also currently the ePlan project coordinator at Land Victoria, Department of Sustainability and Environment. He completed his PhD at the University of Melbourne in 2008. Mohsen holds a bachelor's degree in surveying engineering and a master's degree in GIS engineering.