
Automation of metadata processing

CLARIN Conference in Wrocław, Poland, 15-17 October 2015

Except where otherwise noted, content on this poster is licensed under a Creative Commons Attribution 4.0 International license.

17.10.2015

Dennis Zielke, Daniel Jettka Humboldt-Universität zu Berlin, Universität Hamburg


Introduction: Repositories

• Introduction
• HZSK-Repository (Daniel Jettka)
• LAUDATIO-Repository (Dennis Zielke)
• Open-source technologies
• Generalized model of the data ingest process
• Role of standardized metadata in the import process
• Validation of data
• Modelling import formats and data structures
• Indexing of metadata


Introduction: HZSK-Repository

• is based on the software triad Fedora, Islandora, and Drupal
• currently contains 19 corpora of transcribed spoken language
• stored research data includes texts, transcripts, audio and video data, images, metadata, and other data types
• is connected to the CLARIN-D infrastructure on several levels, e.g. the central services Virtual Language Observatory (for metadata search) and the CLARIN Federated Content Search (for search directly in the content)


Introduction: LAUDATIO-Repository

• is an open access environment for persistent storage of historical texts and their annotations
• currently contains historical corpora from various disciplines with a total of 2000 texts that contain about two million word forms
• the main focus lies on German historical texts and their linguistic annotations, including all dialects and time periods from the 9th to the 19th century


Introduction: LAUDATIO-Repository (technical)

• the technical repository infrastructure is based on generalizable software modules, such as the graphical user interface and the data exchange module between the research data and the Fedora REST API
• the metadata search for indexing and faceting is based on the Lucene-based technology ElasticSearch
• the imported corpora are stored in their original structure in a permanent and unchangeable version
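To illustrate how a faceted metadata search can be expressed in ElasticSearch, the following sketch builds a search body that combines a full-text match with a terms aggregation (the facet). The field names `title` and `corpus_language`, the search term, and the helper `build_facet_query` are hypothetical and are not taken from the LAUDATIO index; the resulting body would be sent to the `_search` endpoint of an index.

```python
import json


def build_facet_query(text, facet_field, size=10):
    """Build a search body: full-text match plus one facet (terms aggregation).

    The field names are illustrative assumptions, not the LAUDATIO mapping.
    """
    return {
        "query": {"match": {"title": text}},
        "aggs": {
            # Counts the most frequent values of `facet_field` among the hits.
            "facet": {"terms": {"field": facet_field, "size": size}}
        },
    }


body = build_facet_query("Kräuterbuch", "corpus_language")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Keeping the query and the aggregation in one request means the facet counts always refer to the current result set.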


LAUDATIO: Open-source technologies used (1)

• CakePHP 2.4 as MVC web framework (PHP 5)
• Authorization and authentication in the user management via Access Control List
• Fedora 3.6 for data storage, with a REST API for data exchange
• ElasticSearch as search engine, with a REST API for data exchange
• Implemented a customized and versioned IndexMapping
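An access control list maps a requester and a resource to the actions that are permitted. The sketch below shows the general idea in plain Python; it is not CakePHP's ACL component, and the role names, the resource name, and the helper `is_allowed` are hypothetical.

```python
# Hypothetical ACL: (role, resource) -> set of permitted actions.
ACL = {
    ("admin", "corpus"): {"create", "read", "update", "delete"},
    ("editor", "corpus"): {"create", "read", "update"},
    ("guest", "corpus"): {"read"},
}


def is_allowed(role, resource, action):
    """Return True if the ACL grants `action` on `resource` to `role`.

    Anything that is not listed explicitly is denied.
    """
    return action in ACL.get((role, resource), set())


print(is_allowed("editor", "corpus", "update"))  # True
print(is_allowed("guest", "corpus", "delete"))   # False
```

Denying by default keeps unknown roles and resources from gaining access by accident.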


LAUDATIO: Open-source technologies used (2)

• External PID web service (EPIC API version 2) to assign the persistent identifiers
• Third-party open-source libraries on GitHub
• http://tinyurl.com/lf26u97
• Flat design (HTML5, CSS3) (coming soon)


LAUDATIO: Appropriated data structure

TEI XML P5

Description of the corpus data structure using the TEI metadata standards
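As a sketch of how metadata from a TEI P5 header might be read during import, the following example extracts the title and authors from a `teiHeader` with Python's standard library and rejects a document whose `titleStmt` is missing. The sample document and the helper `extract_metadata` are made up for illustration; only the TEI namespace and the element names come from the TEI P5 standard, and the actual LAUDATIO import may work differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Minimal, made-up TEI P5 header used only for this illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example corpus</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_metadata(tei_xml):
    """Return title and authors from a TEI header, or raise ValueError."""
    root = ET.fromstring(tei_xml)
    title_stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)
    if title_stmt is None:
        # Simple structural validation before the data is processed further.
        raise ValueError("missing teiHeader/fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("tei:title", default="", namespaces=TEI_NS),
        "authors": [a.text or "" for a in title_stmt.findall("tei:author", TEI_NS)],
    }


print(extract_metadata(SAMPLE))
```

A dictionary of this kind could then be passed on to the indexing step.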


LAUDATIO: View/Index Mapping ElasticSearch


LAUDATIO: Examples ElasticSearch for indexing

[Figures: IndexMapping, ViewMapping]
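To give an impression of what a versioned index mapping could look like, the following sketch builds a mapping body together with an index name that carries a version number. The field names, the version counter, and the naming scheme `laudatio_v<N>` are hypothetical and not taken from the slides; the field types `text` and `keyword` are those of current ElasticSearch releases, whereas releases from 2015 used `string`.

```python
import json

MAPPING_VERSION = 2  # hypothetical version counter


def build_index_mapping(version):
    """Return (index_name, body) for a versioned index.

    Field names are illustrative assumptions, not the LAUDATIO mapping.
    """
    index_name = f"laudatio_v{version}"
    body = {
        "mappings": {
            "properties": {
                "title": {"type": "text"},               # analyzed for full-text search
                "corpus_language": {"type": "keyword"},  # exact values, usable as facet
                "publication_date": {"type": "date"},
            }
        }
    }
    return index_name, body


name, mapping = build_index_mapping(MAPPING_VERSION)
print(name)
print(json.dumps(mapping, indent=2))
```

Putting the version into the index name allows a changed mapping to be built in a new index while the old one stays searchable.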


LAUDATIO: Fedora object model, using the RIDGES corpus as an example


LAUDATIO: Schema config stored in Fedora


If you have questions, please contact us:

Dennis Zielke, Humboldt-Universität zu Berlin, e-mail: [email protected]
Daniel Jettka, Hamburg Centre of spoken language corpora, e-mail: [email protected]
