
An Infrastructure for Digital Standardisation in Physical Anthropology

Felix Engel, Stefan Schlager, Ursula Wittwer-Backofen

∗

18 September 2015

Abstract

Standardisation and digital processing of osteological data are a growing concern in physical anthropology. Coding standards (e. g. Buikstra & Ubelaker 1994, GHHP Code Book 2006, Harbeck 2014) are being developed and implemented in software applications (e. g. OsteoWare, AnthroBook). These infrastructures progressively allow for large-scale analyses (e. g. Global History of Health Project). Obstacles in this development are the limited extensibility of existing standards and incompatibilities between them. We are developing a framework for modelling data from research on skeletal collections that can be used to map different data standards onto each other, to extend them and to build new ones. It will help researchers to pool data for large-scale projects. Skeletal collections and research institutions will be able to define their own lab protocols for in-house purposes as well as for external collaborations. Scholars will be able to use the framework for adding data derived from new methods. Our approach is based on the Resource Description Framework (RDF). It has a modular structure, with a core ontology providing stability and compatibility, while research protocol extensions allow for flexible development and adoption of new methods. Implemented in a semantic web application, our framework acts as a data model, so that changes to it are reflected in the user interface. This way, researchers will be able to configure the software to create their own working environments. The project is funded by the German Research Foundation (DFG) through a scheme promoting development of digital standards for scientific collections [1].

∗ This paper was presented in a podium session at the 11th meeting of the Society for Anthropology in Munich.

[1] http://www.dfg.de/download/pdf/foerderung/programme/lis/uebersicht_projekte_objektausschreibung_2013.pdf; last accessed on 8 August 2015.

Figure 1: Existing data standards and corresponding software in physical anthropology (major developments only). Timeline entries: 1994 Buikstra & Ubelaker; 2004 BABAO/IFA Guidelines; 2006 GHHP Code Book and project software; 2011 Osteoware; 2014 Harbeck; 2015 AnthroBook.

1 Data Standards in Physical Anthropology

In 2014, the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG [2]) issued a funding scheme to foster standardisation and digitalisation of data from scientific collections. Three reasons led the Biological Anthropology Department to apply for a grant related to collections of human skeletal remains:

1. There was some experience with large bodies of research data from participation in the Global History of Health Project [3].
2. A student project had developed some sketches for a research database.
3. The historical Alexander Ecker Collection, housed at the Freiburg University Archive, had recently received some repatriation claims, leading to related research. The resulting data could serve as a use case in software development.

Since May 2015, a digital standard for anthropological research data has been under development, together with a database application that enables researchers to code data according to this standard. Before dealing with these undertakings themselves, let us take a look at the underlying motivations. Figure 1 gives an overview of major developments in standardised data acquisition.

Some existing data codebooks, like Brickley and McKinley (2004), provide sets of paper forms to put their standards into practice. In most cases, however, specialised software was developed for this task.

Buikstra and Ubelaker (1994) emerged from US legislation governing repatriation claims for skeletal remains from native populations. A dedicated work group at the Smithsonian Institution (Washington, USA) developed the database application OsteoWare, which was eventually published as free shareware in 2011 [4] (Wilczak and Dudar 2011; Wilczak and Jones 2011). It records observations on a range of skeletal markers, covering age and sex estimation, osteometrics and pathology. The resulting data sets are designed to preserve information from skeletons that might not be indefinitely available for research.

The Global History of Health Project is a large-scale research project, collecting thousands of skeletal data sets from all over Europe. Their codebook (Steckel et al. 2006) is a compromise between American and European research traditions and comprises a range of items that are believed to give an idea of an individual's health status. Specialised software was designed to collect, pool and process datasets from a large number of contributors.

The Bavarian State Collection for Anthropology and Palaeoanatomy in Munich [5] recently developed their own codebook for human osteological data (Harbeck 2014). It will be implemented in a new application [6] of the XBook series [7], providing integration with standardised data from animal osteology and archaeology. It is compulsory for researchers working with the State Collection's material to use the software.

All of these projects have their specific motivations for developing data standards, which determine the ways in which these are implemented in working routines. Other scenarios can be imagined that might necessitate development of additional standards. For instance, as more and more anthropological work is carried out by freelance osteologists, specifications for contracted analyses have to be formulated. Also, the creation of a unified database of material in skeletal collections would aid in setting up research designs for large-scale projects like the Global History of Health Project.

The DFG funding programme called for a new, all-encompassing codebook that could be applied to all kinds of skeletal collections. But it is questionable whether this would bring an additional benefit, given the number of standards that are already in existence (cf. figure 2). While standardised data acquisition is undoubtedly essential to scientific work, there are a number of reasons why researchers might rather develop their own standard than stick with some existing scheme.

[2] http://dfg.de/en/index.jsp; last accessed on 22 September 2015.
[3] http://global.sbs.ohio-state.edu/; last accessed on 22 September 2015.

Research projects have their specific research interests and requirements. Even minor deviations from existing standards or software routines make them unattractive, even though there might be a large overlap. Researchers working on new methods or novel research objectives might want to extend existing standards. Also, researchers do not always agree on how to analyse certain features and require alternative standards that suit their individual ideas. Scientific progress will always require a re-formulation of standards. A data standard should be formulated with some purpose in mind, or it will become outdated before it has ever been used.

[4] http://osteoware.si.edu/; last accessed on 22 September 2015.
[5] http://www.sapm.mwn.de/index.php/de/; last accessed on 22 September 2015.
[6] http://xbook.vetmed.uni-muenchen.de/wiki/AnthroBook; last accessed on 22 September 2015.
[7] http://xbook.vetmed.uni-muenchen.de/wiki/Main_Page; last accessed on 22 September 2015.

Figure 2: Standards. xkcd no. 927 by Randall Munroe (https://xkcd.com/927/; reproduced without alterations).

Our approach embraces competing standards, rather than trying to replace them. It intends to formulate data standards in a way that caters for specific research requirements but also allows compatible data from different standards to be pooled. In building their own standards, researchers should be able to draw on data items from existing standards and combine them with their own. They should be able to state whether two items are compatible, incompatible, or whether their values can be translated into each other by means of a specific data transformation. Finally, while the standard is to be written in a formal, widely accepted way, it should be independent of any kind of software.

Various developers should be able to implement it in their applications and there should be means to import data from existing data sources (figure 3). Such a system would remove the necessity to sell a new standard to a large user base. It would offer the opportunity to modify standards and adapt them to specific research projects. This would create an environment for productive competition between standards, leading to their optimisation according to evolutionary principles. A standard will live as long as it has its own ecological niche. It will be supplanted by standards that are fitter for the task.

Figure 3: Suggested solution for managing research data acquired through different standards. Diagram labels: Data Pool; Standard 1 (Data Items 1.1 to 1.4); Standard 2 (Data Items 2.1 to 2.4); data transformation; DB (Data Base); RDMS (Research Data Management System); SDS (Skeletal Documentation Software).
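The item-level mapping between standards described above (compatible, incompatible, or translatable through a data transformation) could be recorded along the lines of the following minimal Python sketch. The standard prefixes `std1` and `std2`, the item names, the unit conversion and the `pool` helper are hypothetical illustrations and are not taken from the framework or from any published codebook.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Relation(Enum):
    COMPATIBLE = "compatible"        # values can be pooled as they are
    INCOMPATIBLE = "incompatible"    # values must not be pooled
    TRANSFORMABLE = "transformable"  # values can be pooled after conversion


@dataclass(frozen=True)
class Mapping:
    target: str                                   # item in the target standard
    relation: Relation
    transform: Optional[Callable[[float], float]] = None


# Hypothetical mappings from items of "std1" to items of "std2".
MAPPINGS = {
    "std1:femur_length_mm": Mapping("std2:femur_length_mm", Relation.COMPATIBLE),
    "std1:femur_length_cm": Mapping(
        "std2:femur_length_mm", Relation.TRANSFORMABLE, lambda v: v * 10.0
    ),
    "std1:wear_score": Mapping("std2:wear_score", Relation.INCOMPATIBLE),
}


def pool(item: str, value: float) -> Optional[Tuple[str, float]]:
    """Return (target item, value) for the data pool, or None if not poolable."""
    mapping = MAPPINGS.get(item)
    if mapping is None or mapping.relation is Relation.INCOMPATIBLE:
        return None
    if mapping.relation is Relation.TRANSFORMABLE:
        return mapping.target, mapping.transform(value)
    return mapping.target, value
```

In this sketch an item without a declared mapping is treated like an incompatible one, so nothing enters the pool unless someone has stated how it relates to the target standard.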

2 An RDF Approach to Modelling Anthropological Research Data

This concept can be realised by means of the Resource Description Framework (RDF; Allemang and Hendler 2011; Dengel 2012; Hitzler et al. 2008). Its basic idea is to structure data according to the semantic pattern of language, rather than in tables and relations. All information is broken down into simple statements in the form of subject, predicate and object (figure 4a). This representation is understood by humans, but also by computers. Computers can infer from it a data model which allows them to make simple deductions (figure 4b). The classes of this model form a hierarchy, where subclasses share the qualities of higher-ranking classes. They are related by properties, which take over the predicate function and also form a hierarchy. Research data is entered as instances of classes. Their relations mimic the network graph defined in the data model. Computers can infer information along the lines of object restrictions, properties and hierarchical relations. Because of this, the database expects certain data items to be entered in relation to other items.

Figure 4: How the Resource Description Framework works. (a) Information as simple language statements: "mental eminence / is a / assay"; "mandibula / is specified input / mental eminence"; "scale 1-5 / is specified output / mental eminence"; "bone163 / is a / mandibula". (b) Information as a network graph, with the nodes Assay, MentalEminence, Mandibula, Scale1-5 and bone163 and the edges subClassOf, isSpecifiedInput, isSpecifiedOutput and type.

Advantages for the formulation of data standards are evident. RDF allows for an organic extension of the data model, without the need to define new tables and relations. Changes to definitions in the data model also affect what can be inferred about instances, even if these were created before the changes were made. The output consists of a simple text file, containing both the research data and the specifications of the data structure. Also, existing data models can be easily integrated or referenced.

Of course, RDF statements cannot be made at random, but have to respect a previously defined vocabulary. In terms of data modelling (and contrary to the philosophical term), such a vocabulary is referred to as an ontology. Basically, what we are developing is an ontology, a specialised language for talking about skeletal research (figure 5).
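The subject-predicate-object statements and the simple deductions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain tuples stand in for an RDF library, the inference is limited to following `subClassOf` and `type`, and the class `SkeletalElement` is a hypothetical addition that does not appear in figure 4.

```python
# Statements from figure 4 as (subject, predicate, object) triples.
TRIPLES = {
    ("MentalEminence", "subClassOf", "Assay"),
    ("Mandibula", "isSpecifiedInput", "MentalEminence"),
    ("Scale1-5", "isSpecifiedOutput", "MentalEminence"),
    ("bone163", "type", "Mandibula"),
    # Hypothetical extra statement, added only to show inheritance.
    ("Mandibula", "subClassOf", "SkeletalElement"),
}


def superclasses(cls, triples):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(instance, triples):
    """Return the stated classes of an instance plus all inherited ones."""
    direct = {o for s, p, o in triples if s == instance and p == "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


def expected_assays(instance, triples):
    """Return the assays that accept one of the instance's classes as input."""
    classes = types_of(instance, triples)
    return {o for s, p, o in triples if p == "isSpecifiedInput" and s in classes}
```

Asking for `expected_assays("bone163", TRIPLES)` follows the chain from the instance to its class and on to the assay, which is the kind of deduction that lets a database expect certain data items to be entered in relation to others.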

The ontology follows existing data standards, enhancing compatibility with other schemes from adjacent disciplines. Skeletal investigations, for example, are modelled according to specifications put out by the Open Biomedical Ontologies (OBO) Foundry [8]. Adhering to elements from the Europeana Data Model (EDM [9]) helps skeletal collections in uploading their data into the Europeana network [10]. Other related ontologies include the Organization Ontology (ORG [11]) and the PROV Ontology [12].

Researchers can use the ontology to express what they have been working on. For some elements, which are tinted orange in figure 5, they are allowed to extend the language by defining subclasses. Such extensions can be stored as RDF data, published and distributed among peer researchers.
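To illustrate what storing an extension as RDF data might look like, the following Python sketch writes two statements of a hypothetical research protocol extension in the line-based N-Triples syntax. Both `example.org` namespaces and the class names are assumptions made for this example. Only the two RDF Schema property IRIs are real, and the literal escaping covers quotes and backslashes only.

```python
# Hypothetical namespaces for a core ontology and one extension.
CORE = "http://example.org/osteo-core#"
EXT = "http://example.org/osteo-extension#"

# Property IRIs defined by RDF Schema.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a simple string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(statements):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in statements)


# The extension declares a new assay as a subclass of a core class.
extension = [
    (iri(EXT + "MentalEminenceAssay"), iri(RDFS_SUBCLASS_OF), iri(CORE + "Assay")),
    (iri(EXT + "MentalEminenceAssay"), iri(RDFS_LABEL), literal("mental eminence")),
]
text = to_ntriples(extension)
```

The resulting string is plain text, so an extension of this kind could be shared as a small file alongside the data it describes.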

The RDF rule that subclasses inherit qualities from higher classes ensures some degree of compatibility between extensions (figure 6). Skeletal elements are referenced according to the Foundational Model of Anatomy (FMA [13]) and, occasionally, the OpenGalen [14] anatomical model. Analyses of the data can draw on the anatomical relations defined in these models (e. g. tendon attachments, vessels), which therefore do not have to be recreated in our ontology. Similarly, the GeoNames Ontology [15] is used to reference geographic entities.

[8] http://obofoundry.org/; last accessed on 22 September 2015.
[9] http://pro.europeana.eu/share-your-data/data-guidelines/edm-documentation; last accessed on 22 September 2015.
[10] http://www.europeana.eu; last accessed on 22 September 2015.
[11] http://www.w3.org/TR/2014/REC-vocab-org-20140116/; last accessed on 22 September 2015.
[12] http://www.w3.org/TR/prov-o/; last accessed on 22 September 2015.

Figure 5: Main classes of the ontology. Properties are not labelled for the sake of clarity. Legend: CRID; external ontologies (PROV-O, ORG, VIVO, EDM, OBO); central class; to be extended by research protocols. Class labels in the diagram: CRID registry, Membership, ProvidedCHO, Role, Collection, PhysicalThing, Organization, Curation, Participation, Person, Project, Document, investigation, specimen collection, specimen, planning, study design execution, plan, device, study design, protocol, interpreting data, documenting, assay, measurement datum, data transformation, conclusion, data item.

Figure 6: General structure of the data standard. Diagram labels: core ontology; interface model; research protocol extensions; research data model; chronological model; geographical model; anatomical model; Europeana Data Model; VIVO Ontology; Ontology for Biomedical Investigations; GeoNames; FMA; OpenGalen.

Figure 7: Data form from VIVO.

3 Implementation in a Web Application for End Users

While the construction of an ontology and related extensions might seem rather abstract to many, the practical application to osteological work is surprisingly simple. Figure 7 shows a screenshot from the semantic web application VIVO [16] (Börner et al. 2012). Its purpose is to display data on academic staff, institutions, research, meetings, publications etc. These data are organised according to a data model derived from the VIVO Ontology [17], which resides within the application.

[13] http://sig.biostr.washington.edu/projects/fm/; last accessed on 22 September 2015.
[14] http://opengalen.org/; last accessed on 22 September 2015.
[15] http://www.geonames.org/ontology/documentation.html; last accessed on 22 September 2015.
[16] http://vivoweb.org/; last accessed on 22 September 2015.

Figure 8: Data form showing elements from a newly added ontology.

After loading an additional ontology into the programme, the same interface can be used to enter data on cranial osteology (figure 8), a topic that is not covered by the original VIVO ontology. This is possible because the application is built on the VITRO framework [18], which takes on the structure and contents of ontologies that are fed into it. Extensions to the core ontology would be reflected in the same way when loaded into a semantic web application concerned with coding human skeletal remains.
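The principle that the data form follows whatever ontology is loaded can be sketched as below. This is not how VITRO is implemented. It is a minimal Python illustration in which the class names, property names and the `form_fields` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str         # label shown next to the input field
    domain: str       # class the property applies to
    value_range: str  # kind of value expected in the field


def form_fields(cls, properties, superclass_of):
    """List the input fields for cls, including those inherited from superclasses."""
    lineage = [cls]
    # Walk up the class hierarchy; stop at the top or if a cycle is detected.
    while lineage[-1] in superclass_of and superclass_of[lineage[-1]] not in lineage:
        lineage.append(superclass_of[lineage[-1]])
    return [p.name for p in properties if p.domain in lineage]


# Hypothetical model before any extension is loaded.
superclass_of = {"Mandibula": "SkeletalElement"}
properties = [Property("inventory number", "SkeletalElement", "text")]
before = form_fields("Mandibula", properties, superclass_of)

# Loading an extension only adds statements to the model.
properties.append(Property("mental eminence score", "Mandibula", "scale 1-5"))
after = form_fields("Mandibula", properties, superclass_of)
```

The form-building function is untouched between the two calls. Only the model changes, and the additional input field appears as a consequence.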

4 Conclusion

The presented approach opens up perspectives for research on various levels. Institutions holding collections of skeletal remains can install the software on their own web server and consequently keep control of the accumulating data. By developing dedicated extensions they can provide their members and project partners with standardised data forms for conducting research according to the institution's specifications. They can also deploy extensions that are developed by others, either inside or outside the institution, for their users to choose from. As the application also works as a web portal, they can use it to make their data, or selected parts of it, public. For instance, if catalogue data were made public, researchers could select material in preparation of a study and subsequently apply for more detailed data sets.

[17] https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology; last accessed on 22 September 2015.
[18] https://github.com/vivo-project/Vitro; last accessed on 22 September 2015.

Researchers can use the software to apply published data standards in their analyses, but also to design their own standards and data input forms. In doing so, they can reuse parts of existing standards or modify these to suit their specific needs. Standards implementing new methodology or research topics can be put into standardised code that can be published as an appendix to research papers. Even in scenarios where researchers would or could not work via the internet, the software can be operated in a local area network or on a single machine, via localhost.

Development of both the ontology and the database application is now in progress. We encourage all institutions or individual researchers that are interested in working with the software to contact us now. You can bring your particular demands as an input and we are willing to help you to develop your own standard to be employed as soon as the application is finished. You can find contact information on our webpage [19] or write directly to Felix Engel [20].

References

Dean Allemang and James Hendler. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Elsevier, Amsterdam etc., second edition, 2011.

Katy Börner, Michael Conlon, Jon Corson-Rikert, and Ying Ding, editors. VIVO: A Semantic Web Approach to Scholarly Networking and Discovery. Synthesis Lectures on Semantic Web: Theory and Technology. Morgan and Claypool, San Rafael, 2012. doi: 10.2200/S00428ED1V01Y201207WEB002.

Megan Brickley and Jacqueline I. McKinley, editors. Guidelines to the Standards for Recording Human Remains. Number 7 in Technical and Professional Practice Papers. Institute for Field Archaeologists, 2004. URL http://www.babao.org.uk/HumanremainsFINAL.pdf.

Jane E. Buikstra and Douglas H. Ubelaker, editors. Standards for Data Collection from Human Skeletal Remains, volume 44 of Arkansas Archaeological Survey Research Series. Arkansas Archaeological Survey, Fayetteville, 1994.

Andreas Dengel, editor. Semantische Technologien: Grundlagen - Konzepte - Anwendungen. Spektrum, Heidelberg, 2012.

Michaela Harbeck. Anleitung zur standardisierten Skelettdokumentation in der Staatssammlung für Anthropologie und Paläoanthropologie München. Staatssammlung für Anthropologie und Paläoanatomie München, München, December 2014. URL http://www.sapm.mwn.de/attachments/article/249/AnleitungSkelettdokumentation2014.pdf.

Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, and York Sure. Semantic web. Berlin, Heidelberg, 2008.

Richard H. Steckel, Clark Spencer Larsen, Paul W. Sciulli, and Philip L. Walker. Data collection codebook. May 2006. URL http://global.sbs.ohio-state.edu/new_docs/Codebook-01-24-11-em.pdf.

Cynthia A. Wilczak and J. Christopher Dudar. Osteoware Software Manual Volume I. Smithsonian Institution, Washington, February 2012 edition, 2011. URL https://osteoware.si.edu/sites/default/files/content-pdfs/Osteoware_Vol-1_Feb2012.pdf.

Cynthia A. Wilczak and Erica B. Jones. Osteoware Software Manual Volume II: Pathology Module. Smithsonian Institution, Washington, February 2012 edition, 2011. URL https://osteoware.si.edu/sites/default/files/content-pdfs/Osteoware_Vol-2_Feb2012.pdf.

[19] https://www.uniklinik-freiburg.de/anthropology/research/osteologic-databaseproject.html; last accessed on 22 September 2015.
[20] [email protected]
