An Infrastructure for Digital Standardisation in Physical Anthropology∗
Felix Engel, Stefan Schlager, Ursula Wittwer-Backofen
18 September 2015
Abstract
Standardisation and digital processing of osteological data is a growing concern in physical anthropology. Coding standards (e. g. Buikstra & Ubelaker 1994, GHHP Code Book 2006, Harbeck 2014) are being developed and implemented in software applications (e. g. OsteoWare, AnthroBook). These infrastructures progressively allow for large-scale analyses (e. g. Global History of Health Project). Obstacles to this development are the limited extensibility of existing standards and incompatibilities between them. We are developing a framework for modelling data from research on skeletal collections that can be used to map different data standards onto each other, to extend them and to build new ones. It will help researchers to pool data for large-scale projects. Skeletal collections and research institutions will be able to define their own lab protocols for in-house purposes as well as for external collaborations. Scholars will be able to use the framework to add data derived from new methods. Our approach is based on the Resource Description Framework (RDF). It has a modular structure, with a core ontology providing stability and compatibility, while research protocol extensions allow for flexible development and adoption of new methods. Implemented in a semantic web application, our framework acts as a data model, to the effect that changes to it will be reflected in the user interface. This way, researchers will be able to configure the software to create their own working environments. The project is funded by the German Research Foundation (DFG) through a scheme promoting the development of digital standards for scientific collections¹.

∗ This paper was presented in a podium session at the 11th meeting of the Society for Anthropology in Munich.

1 http://www.dfg.de/download/pdf/foerderung/programme/lis/uebersicht_projekte_objektausschreibung_2013.pdf; last accessed on 8 August 2015.
Figure 1: Existing data standards and corresponding software in physical anthropology (major developments only). Timeline, 1990 to 2015: Buikstra & Ubelaker (1994); BABAO/IFA Guidelines (2004); GHHP Code Book and Project Software (2006); Osteoware (2011); Harbeck (2014); AnthroBook (2015).
1 Data Standards in Physical Anthropology
In 2014, the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG²) issued a funding scheme to foster standardisation and digitalisation of data from scientific collections. Three reasons led the Biological Anthropology Department to apply for a grant related to collections of human skeletal remains:
1. There was some experience with large bodies of research data from participation in the Global History of Health Project³.
2. A student project had developed some sketches for a research database.
3. The historical Alexander Ecker Collection, housed at the Freiburg University Archive, had recently received some repatriation claims, leading to related research. The resulting data could serve as a use case in software development.
Since May 2015, a digital standard for anthropological research data has been under development, together with a database application that enables researchers to code data according to this standard. Before turning to these undertakings themselves, let us look at the underlying motivations. Figure 1 gives an overview of major developments in standardised data acquisition.
2 http://dfg.de/en/index.jsp; last accessed on 22 September 2015.
3 http://global.sbs.ohio-state.edu/; last accessed on 22 September 2015.

Some existing data codebooks, like Brickley and McKinley (2004), provide sets of paper forms to put their standards into practice. In most cases, however, specialised software was developed for this task. Buikstra and Ubelaker (1994) emerged from US legislation governing repatriation claims for skeletal remains from native populations. A dedicated work group at the Smithsonian Institution (Washington, USA) developed the database application OsteoWare⁴, which was eventually released as freeware in 2011 (Wilczak and Dudar 2011; Wilczak and Jones 2011). It records observations on a range of skeletal markers, covering age and sex estimation, osteometrics and pathology. The resulting data sets are designed to preserve information from skeletons that might not be available for research indefinitely. The Global History of Health Project is a large-scale research project, collecting thousands of skeletal data sets from all over Europe.
Their codebook (Steckel et al. 2006) is a compromise between American and European research traditions and comprises a range of items that are believed to give an idea of an individual's health status. Specialised software was designed to collect, pool and process data sets from a large number of contributors. The Bavarian State Collection for Anthropology and Palaeoanatomy in Munich⁵ recently developed its own codebook for human osteological data (Harbeck 2014). It will be implemented in AnthroBook⁶, a new application of the XBook series⁷, providing integration with standardised data from animal osteology and archaeology. It is compulsory for researchers working with the State Collection's material to use the software. All of these projects have their specific motivations for developing data standards, which determine the ways in which these are implemented in working routines. Other scenarios can be imagined that might necessitate the development of additional standards. For instance, as more and more anthropological work is carried out by freelance osteologists, specifications for contracted analyses have to be formulated. Also, a unified database of material in skeletal collections would aid in setting up research designs for large-scale projects like the Global History of Health Project. The DFG funding programme called for a new, all-encompassing codebook that could be applied to all kinds of skeletal collections. But it is questionable whether this would bring any additional benefit, given the number of standards that are already in existence (cf. figure 2). While standardised data acquisition is undoubtedly essential to scientific work, there are a number of reasons why researchers might rather develop their own standard than stick with an existing scheme.
4 http://osteoware.si.edu/; last accessed on 22 September 2015.
5 http://www.sapm.mwn.de/index.php/de/; last accessed on 22 September 2015.
6 http://xbook.vetmed.uni-muenchen.de/wiki/AnthroBook; last accessed on 22 September 2015.
7 http://xbook.vetmed.uni-muenchen.de/wiki/Main_Page; last accessed on 22 September 2015.

Figure 2: Standards. xkcd no. 927 by Randall Munroe (https://xkcd.com/927/; reproduced without alterations).

Research projects have their specific research interests and requirements. Even minor deviations from existing standards or software routines make these unattractive, even though there might be a large overlap. Researchers working on new methods or novel research objectives might want to extend existing standards. Also, researchers do not always agree on how to analyse certain features and require alternative standards that suit their individual ideas. Scientific progress will always require standards to be re-formulated. A data standard should be formulated with a concrete purpose in mind, or it will become outdated before it has ever been used.

Our approach embraces competing standards rather than trying to replace them. It aims to formulate data standards in a way that caters for specific research requirements but also allows compatible data from different standards to be pooled. In building their own standards, researchers should be able to draw on data items from existing standards and combine them with their own. They should be able to state whether two items are compatible or incompatible, or whether their values can be translated into each other by means of a specific data transformation. Finally, while the standard was to be written in a formal, widely accepted way, it should be independent of any particular software.
Various developers should be able to implement it in their applications, and there should be means to import data from existing data sources (figure 3). Such a system would remove the need to sell a new standard to a large user base. It would offer the opportunity to modify standards and adapt them to specific research projects. This would create an environment for productive competition between standards, leading to their optimisation according to evolutionary principles: a standard will live as long as it has its own ecological niche, and it will be supplanted by standards that are fitter for the task.
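To make the notion of compatible, incompatible and translatable data items more concrete, here is a minimal Python sketch of pooling records from one standard into another, in the spirit of figure 3. The two standards "A" and "B", their item names, and the `Mapping` and `pool` helpers are hypothetical; in the framework described here such mappings would themselves be expressed as RDF statements rather than Python objects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class Relation(Enum):
    COMPATIBLE = "compatible"      # values can be pooled unchanged
    TRANSLATABLE = "translatable"  # values are converted by a transformation
    INCOMPATIBLE = "incompatible"  # values must not be pooled


@dataclass(frozen=True)
class Mapping:
    source: str                    # data item in the contributing standard
    target: str                    # data item in the pooled standard
    relation: Relation
    transform: Optional[Callable[[Any], Any]] = None

    def __post_init__(self) -> None:
        if self.relation is Relation.TRANSLATABLE and self.transform is None:
            raise ValueError(f"{self.source}: a translatable item needs a transform")


def pool(records: list[dict], mappings: list[Mapping]) -> list[dict]:
    """Rewrite records coded in one standard into the items of another."""
    by_source = {m.source: m for m in mappings}
    pooled = []
    for record in records:
        row = {}
        for item, value in record.items():
            m = by_source.get(item)
            if m is None or m.relation is Relation.INCOMPATIBLE:
                continue  # unmapped or incompatible items stay out of the pool
            if m.relation is Relation.COMPATIBLE:
                row[m.target] = value
            else:
                row[m.target] = m.transform(value)
        pooled.append(row)
    return pooled


# Hypothetical items of two standards: "A" (the pool) and "B" (a contributor).
mappings = [
    Mapping("B:sex", "A:sex", Relation.COMPATIBLE),
    Mapping("B:femur_length_mm", "A:femur_length_cm", Relation.TRANSLATABLE,
            lambda mm: mm / 10),
    Mapping("B:cribra_orbitalia_grade", "A:cribra_orbitalia_present",
            Relation.TRANSLATABLE, lambda grade: grade > 0),
    Mapping("B:dental_wear_x", "A:dental_wear_y", Relation.INCOMPATIBLE),
]

records_b = [{"B:sex": "female", "B:femur_length_mm": 432.0,
              "B:cribra_orbitalia_grade": 2, "B:dental_wear_x": 4}]
print(pool(records_b, mappings))
# [{'A:sex': 'female', 'A:femur_length_cm': 43.2, 'A:cribra_orbitalia_present': True}]
```

In this sketch, incompatible items are left out of the pool rather than forced into the target scheme, so each contributor keeps full detail in its own standard while only the mapped items are shared.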
Figure 3: Suggested solution for managing research data acquired through different standards. Data items 1.1 to 1.4 (Standard 1) and 2.1 to 2.4 (Standard 2), linked by a data transformation, feed into a common data pool; components shown: database (DB), Research Data Management System (RDMS), Skeletal Documentation Software (SDS).
2 An RDF Approach to Modelling Anthropological Research Data
This concept can be realised by means of the Resource Description Framework (RDF; Allemang and Hendler 2011; Dengel 2012; Hitzler et al. 2008). Its basic idea is to structure data according to the semantic pattern of language, rather than in tables and relations. All information is broken down into simple statements of the form subject, predicate and object (figure 4a). This representation can be understood by humans, but also by computers, which can derive from it a data model that allows them to make simple deductions (figure 4b). The classes of this model form a hierarchy, in which subclasses share the qualities of higher-ranking classes. Classes are related by properties, which take over the predicate function and also form a hierarchy. Research data is entered as instances of classes, and their relations mimic the network graph defined in the data model. Computers can infer information along the lines of object restrictions, properties and hierarchical relations. Because of this, the database expects certain data items to be entered in relation to other items. The advantages for the formulation of data standards are evident.
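The kind of deduction described above can be illustrated with a small Python sketch that stores statements as (subject, predicate, object) triples and applies two standard RDFS entailment rules: transitivity of `rdfs:subClassOf`, and inheritance of class membership along it. The `ex:` identifiers loosely follow figure 4 but are hypothetical, as is the subclass link from `ex:Assay` to `ex:PlannedProcess`; a real system would use an RDF store and reasoner instead of this naive fixpoint loop.

```python
# Triples are (subject, predicate, object); identifiers are hypothetical.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply two RDFS rules until no new triple appears:
    rdfs11 (subClassOf is transitive) and rdfs9 (an instance of a class
    is also an instance of its superclasses)."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS}
        derived = {(a, SUBCLASS, d) for a, b in sub for c, d in sub if b == c}
        derived |= {(x, TYPE, sup) for x, p, cls in closure if p == TYPE
                    for c, sup in sub if c == cls}
        if derived <= closure:
            return closure
        closure |= derived


def instances_of(cls: str, triples: set[tuple[str, str, str]]) -> list[str]:
    return sorted(s for s, p, o in triples if p == TYPE and o == cls)


graph = {
    ("ex:MentalEminence", SUBCLASS, "ex:Assay"),
    ("ex:Assay", SUBCLASS, "ex:PlannedProcess"),
    # Class-level links as drawn in figure 4 (a simplification).
    ("ex:Mandibula", "ex:isSpecifiedInputOf", "ex:MentalEminence"),
    ("ex:Scale1-5", "ex:isSpecifiedOutputOf", "ex:MentalEminence"),
    ("ex:bone163", TYPE, "ex:Mandibula"),
    ("ex:scoring7", TYPE, "ex:MentalEminence"),  # one scoring of bone163
}

closed = rdfs_closure(graph)
print(instances_of("ex:Assay", closed))  # ['ex:scoring7']
print(("ex:MentalEminence", SUBCLASS, "ex:PlannedProcess") in closed)  # True
```

The mental eminence scoring is found when asking for assays in general, although it was only ever entered as an instance of `ex:MentalEminence`.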
Figure 4: How the Resource Description Framework works. (a) Information as simple language statements: mental eminence is an assay; mandibula is specified input of mental eminence; scale 1-5 is specified output of mental eminence; bone163 is a mandibula. (b) The same information as a network graph, with the nodes Assay, MentalEminence, Mandibula, Scale1-5 and bone163 linked by subClassOf, isSpecifiedInput, isSpecifiedOutput and type.

The RDF allows for an organic extension of the data model, without the need to define new tables and relations. Changes to definitions in the data model carry over to the level of instances, so that the conclusions drawn about them change accordingly, even for instances created before the changes were made. The output consists of a simple text file containing both the research data and the specifications of the data structure. Also, existing data models can easily be integrated or referenced. Of course, RDF statements cannot be made at random, but have to respect a previously defined vocabulary.
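Because RDF statements are plain text, research data and the definitions they rely on can travel in the same file. The following Python sketch writes one schema triple and one data triple as N-Triples, the simplest line-based RDF syntax, and reads them back. The `http://example.org/anthro#` namespace and the class names are hypothetical, and the parser handles only IRI terms (no literals or blank nodes); for real data an RDF library such as rdflib would be the safer choice.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/anthro#"  # hypothetical namespace for this sketch

Triple = tuple[str, str, str]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialise IRI-only triples as N-Triples: one '<s> <p> <o> .' per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


def from_ntriples(text: str) -> list[Triple]:
    """Parse the IRI-only subset produced above; literals are not supported."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 4 or parts[3] != ".":
            raise ValueError(f"unsupported line: {line!r}")
        s, p, o = (term.strip("<>") for term in parts[:3])
        triples.append((s, p, o))
    return triples


# One statement from the data model and one from the research data.
schema = [(EX + "MentalEminence", RDFS_SUBCLASS, EX + "Assay")]
data = [(EX + "bone163", RDF_TYPE, EX + "Mandibula")]

document = to_ntriples(schema + data)
print(document, end="")
```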
In terms of data modelling (as opposed to the philosophical sense of the term), such a vocabulary is referred to as an ontology. Basically, what we are developing is an ontology: a specialised language for talking about skeletal research (figure 5).
It follows existing data standards, enhancing compatibility with other schemes from adjacent disciplines. Skeletal investigations, for example, are modelled according to specifications put out by the Open Biomedical Ontologies (OBO) Foundry⁸. Adhering to elements from the Europeana Data Model (EDM⁹) helps skeletal collections upload their data into the Europeana network¹⁰. Other related ontologies include the Organization Ontology (ORG¹¹) and the PROV Ontology¹².
Researchers can use the ontology to express what they have been working on. For some elements, which are tinted orange in figure 5, they are allowed to extend the language by defining subclasses. Such extensions can be stored as RDF data, published and distributed among peer researchers.
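A research protocol extension could be checked mechanically before it is distributed. The Python sketch below accepts an extension only if it consists of subclass statements whose new classes ultimately specialise an extensible core class. The core class names, the contents of `EXTENSIBLE` (a stand-in for the orange-tinted classes in figure 5) and the single-superclass simplification are all assumptions made for illustration.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical fragment of a core ontology: class -> superclass (None = root).
CORE_PARENT = {
    "core:PlannedProcess": None,
    "core:Assay": "core:PlannedProcess",
    "core:Person": None,
}
# Hypothetical stand-in for the classes marked as extensible in figure 5.
EXTENSIBLE = {"core:Assay"}


def ancestors(cls: str, parent: dict) -> list[str]:
    """Walk up a single-inheritance hierarchy; stops at roots and on cycles."""
    chain = []
    current = parent.get(cls)
    while current is not None and current not in chain:
        chain.append(current)
        current = parent.get(current)
    return chain


def validate_extension(extension: list[tuple[str, str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the extension is accepted."""
    parent = dict(CORE_PARENT)
    new_classes, errors = [], []
    for s, p, o in extension:
        if p != SUBCLASS:
            errors.append(f"{s}: only subclass statements are allowed here")
        elif s in CORE_PARENT:
            errors.append(f"{s}: core classes cannot be redefined")
        else:
            parent[s] = o
            new_classes.append(s)
    for cls in new_classes:
        if not EXTENSIBLE.intersection(ancestors(cls, parent)):
            errors.append(f"{cls}: does not specialise an extensible core class")
    return errors


protocol = [
    ("ext:MentalEminence", SUBCLASS, "core:Assay"),
    ("ext:MentalEminenceRevised", SUBCLASS, "ext:MentalEminence"),
]
print(validate_extension(protocol))  # []
print(validate_extension([("ext:Osteologist", SUBCLASS, "core:Person")]))
# ['ext:Osteologist: does not specialise an extensible core class']
```

Because every accepted class descends from an extensible core class, data recorded under it is, by subclass inheritance, also an instance of that core class, which is what keeps independently developed extensions comparable.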
8 http://obofoundry.org/; last accessed on 22 September 2015.
9 http://pro.europeana.eu/share-your-data/data-guidelines/edm-documentation; last accessed on 22 September 2015.
10 http://www.europeana.eu; last accessed on 22 September 2015.
11 http://www.w3.org/TR/2014/REC-vocab-org-20140116/; last accessed on 22 September 2015.
12 http://www.w3.org/TR/prov-o/; last accessed on 22 September 2015.

Figure 5: Main classes of the ontology. Properties are not labelled for the sake of clarity. Legend: central class; external ontologies (PROV-O, ORG, VIVO, EDM, OBO, CRID); classes to be extended by research protocols. Classes shown include Collection, ProvidedCHO, PhysicalThing, Organization, Membership, Role, Person, Project, Participation, Curation, Document, CRID registry, investigation, specimen collection, specimen planning, study design execution plan, study design protocol, device, assay, measurement datum, data transformation, interpreting data, documenting, conclusion and data item.

Figure 6: General structure of the data standard. Components: core ontology; Europeana Data Model; interface model; VIVO Ontology; research data model; Ontology for Biomedical Investigations; research protocol extensions; chronological, geographical and anatomical models; GeoNames; FMA; OpenGalen.

Figure 7: Data form from VIVO.

The RDF rule that subclasses inherit the qualities of higher classes ensures some degree of compatibility between extensions (figure 6). Skeletal elements are referenced according to the Foundational Model of Anatomy (FMA¹³) and, occasionally, the OpenGalen¹⁴ anatomical model. Analyses of the data can draw on the anatomical relations defined in these models (e. g. tendon attachments, vessels), which therefore do not have to be recreated in our ontology.
Similarly, the GeoNames Ontology¹⁵ is used to reference geographic entities.
3 Implementation in a Web Application for End Users
While the construction of an ontology and related extensions might seem rather abstract to many, its practical application to osteological work is surprisingly simple. Figure 7 shows a screenshot from the semantic web application VIVO¹⁶ (Börner et al. 2012). Its purpose is to display data on academic staff, institutions, research, meetings, publications etc. These data are organised according to a data model derived from the VIVO Ontology¹⁷, which resides within the application. After loading an additional ontology into the programme, the same interface can be used to enter data on cranial osteology (figure 8), a topic that is not covered by the original VIVO ontology. This is possible because the application is built on the VITRO framework¹⁸, which takes on the structure and contents of the ontologies that are fed into it. Extensions to the core ontology would be reflected in the same way when loaded into a semantic web application concerned with coding human skeletal remains.

13 http://sig.biostr.washington.edu/projects/fm/; last accessed on 22 September 2015.
14 http://opengalen.org/; last accessed on 22 September 2015.
15 http://www.geonames.org/ontology/documentation.html; last accessed on 22 September 2015.
16 http://vivoweb.org/; last accessed on 22 September 2015.

Figure 8: Data form showing elements from a newly added ontology.
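The principle that the interface follows the data model can be sketched in a few lines of Python: form fields are derived from the properties whose domain is the selected class or one of its superclasses, so loading an extension adds fields without changing any code. This is not how VITRO is implemented; the property names, the `form_fields` helper and the widget mapping are hypothetical.

```python
# Hypothetical ontology fragments: property -> (domain class, range).
CORE_PROPERTIES = {
    "core:hasSpecifiedInput": ("core:Assay", "core:Specimen"),
    "core:assayDate": ("core:Assay", "xsd:date"),
}
EXTENSION_PROPERTIES = {
    "ext:mentalEminenceScore": ("ext:MentalEminence", "xsd:integer"),
}
EXTENSION_SUPERCLASSES = {"ext:MentalEminence": "core:Assay"}

# Datatype ranges get an input widget; any other range is treated as a class
# whose existing individuals are offered in a selection list.
WIDGETS = {"xsd:date": "date", "xsd:integer": "number", "xsd:string": "text"}


def form_fields(cls: str, properties: dict, superclasses: dict) -> list[tuple[str, str]]:
    """Return (property, widget) pairs for all properties whose domain is the
    class itself or one of its superclasses."""
    lineage, current = [], cls
    while current is not None and current not in lineage:
        lineage.append(current)
        current = superclasses.get(current)
    return [(prop, WIDGETS.get(rng, "select"))
            for prop, (domain, rng) in properties.items()
            if domain in lineage]


core_form = form_fields("core:Assay", CORE_PROPERTIES, {})
print(core_form)
# [('core:hasSpecifiedInput', 'select'), ('core:assayDate', 'date')]

# "Loading" the extension only adds data; form_fields itself is unchanged.
loaded = {**CORE_PROPERTIES, **EXTENSION_PROPERTIES}
ext_form = form_fields("ext:MentalEminence", loaded, EXTENSION_SUPERCLASSES)
print(ext_form)
# [('core:hasSpecifiedInput', 'select'), ('core:assayDate', 'date'), ('ext:mentalEminenceScore', 'number')]
```

The extended form keeps the fields inherited from the core class, which mirrors how a skeletal documentation form could grow with a research protocol extension while staying comparable to the core.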
4 Conclusion
The presented approach opens up perspectives for research on various levels. Institutions holding collections of skeletal remains can install the software on their own web server and consequently keep control of the accumulating data. By developing dedicated extensions, they can provide their members and project partners with standardised data forms for conducting research according to the institution's specifications. They can also deploy extensions developed by others, either inside or outside the institution, for their users to choose from. As the application also works as a web portal, they can use it to make their data, or selected parts of it, public. For instance, if catalogue data were made public, researchers could select material in preparation for a study and subsequently apply for more detailed data sets. Researchers can use the software to apply published data standards in their analyses, but also to design their own standards and data input forms. In doing so, they can reuse parts of existing standards or modify these to suit their specific needs. Standards implementing new methods or research topics can be encoded in this standardised form and published as appendices to research papers. Even where researchers cannot or do not want to work via the internet, the software can be operated in a local area network or on a single machine via localhost. Development of both the ontology and the database application is now in progress.

17 https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology; last accessed on 22 September 2015.
18 https://github.com/vivo-project/Vitro; last accessed on 22 September 2015.
We encourage all institutions and individual researchers interested in working with the software to contact us now. You can bring in your particular requirements, and we are willing to help you develop your own standard, to be employed as soon as the application is finished. You can find contact information on our webpage¹⁹ or write directly to Felix Engel²⁰.
19 https://www.uniklinik-freiburg.de/anthropology/research/osteologic-databaseproject.html; last accessed on 22 September 2015.
20 [email protected]

References

Dean Allemang and James Hendler. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Elsevier, Amsterdam etc., second edition, 2011.

Katy Börner, Michael Conlon, Jon Corson-Rikert, and Ying Ding, editors. VIVO: A Semantic Web Approach to Scholarly Networking and Discovery. Synthesis Lectures on Semantic Web: Theory and Technology. Morgan and Claypool, San Rafael, 2012. doi: 10.2200/S00428ED1V01Y201207WEB002.

Megan Brickley and Jacqueline I. McKinley, editors. Guidelines to the Standards for Recording Human Remains. Number 7 in Technical and Professional Practice Papers. Institute for Field Archaeologists, 2004. URL http://www.babao.org.uk/HumanremainsFINAL.pdf.

Jane E. Buikstra and Douglas H. Ubelaker, editors. Standards for Data Collection from Human Skeletal Remains, volume 44 of Arkansas Archaeological Survey Research Series. Arkansas Archaeological Survey, Fayetteville, 1994.

Andreas Dengel, editor. Semantische Technologien: Grundlagen - Konzepte - Anwendungen. Spektrum, Heidelberg, 2012.

Michaela Harbeck. Anleitung zur standardisierten Skelettdokumentation in der Staatssammlung für Anthropologie und Paläoanthropologie München. Staatssammlung für Anthropologie und Paläoanatomie München, München, December 2014. URL http://www.sapm.mwn.de/attachments/article/249/AnleitungSkelettdokumentation2014.pdf.

Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, and York Sure. Semantic Web. Berlin, Heidelberg, 2008.

Richard H. Steckel, Clark Spencer Larsen, Paul W. Sciulli, and Philip L. Walker. Data Collection Codebook. May 2006. URL http://global.sbs.ohio-state.edu/new_docs/Codebook-01-24-11-em.pdf.

Cynthia A. Wilczak and J. Christopher Dudar. Osteoware Software Manual Volume I. Smithsonian Institution, Washington, February 2012 edition, 2011. URL https://osteoware.si.edu/sites/default/files/content-pdfs/Osteoware_Vol-1_Feb2012.pdf.

Cynthia A. Wilczak and Erica B. Jones. Osteoware Software Manual Volume II: Pathology Module. Smithsonian Institution, Washington, February 2012 edition, 2011. URL https://osteoware.si.edu/sites/default/files/content-pdfs/Osteoware_Vol-2_Feb2012.pdf.