NeuroLOG WP1 Sharing Data & Metadata

Software technologies for integration of process, data and knowledge in medical imaging


NeuroLOG WP1: Sharing Data & Metadata
Franck MICHEL
Paris, May 18th, 2010

NeuroLOG ANR-06-TLOG-024

http://neurolog.polytech.unice.fr


• Definitions
  – Data: image files
  – Metadata: “data about data”, that is, any data related to image files
    • Type of image, modality, processing
    • Examination, acquisition equipment
    • Subject: age, gender, pathology…


NeuroLOG WP1 - Paris, May 18th, 2010


• Partner sites own specific databases
  – Specific database providers, OS…
  – Specific databases should be comparable
    • Same concerns: they manage the same major entities
    • Heterogeneous: differences in the database design (schema)
• WP1 goals
  – Define a way to share a common view: a cornerstone
    • Definition of a federated relational schema
    • Data Federator maps site-specific schemas to the federated schema
    • This raises the need to ensure global coherency
  – Define a way to share image files described by metadata
    • Files distributed over distant sites
    • Heterogeneous file systems, resource storage units…
  – Each site needs to keep control over its data = weak coupling
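To make the idea of mapping a site-specific schema onto the federated schema concrete, here is a minimal sketch in which a site table is exposed under a federated relation through an SQL view. Everything in it is an assumption for illustration: the `patient` table, its columns and codes, the federated `subject` relation and the `siteA:` identifier prefix are hypothetical, and SQLite only stands in for Data Federator, whose own mapping tooling is not reproduced here.

```python
import sqlite3

# Hypothetical site-specific table plus a view exposing it as a hypothetical
# federated "subject" relation. SQLite only stands in for Data Federator here.
SITE_SCHEMA = """
CREATE TABLE patient (pid TEXT PRIMARY KEY, sexe TEXT, age_years INTEGER);

CREATE VIEW subject AS
SELECT 'siteA:' || pid AS subject_id,   -- prefix keeps identifiers unique across sites
       CASE sexe WHEN 'M' THEN 'male'
                 WHEN 'F' THEN 'female'
                 ELSE NULL
       END AS sex,                      -- recode site codes into the shared vocabulary
       age_years AS age                 -- rename the column to the federated name
FROM patient;
"""


def build_site_db() -> sqlite3.Connection:
    """Create an in-memory site database with two made-up patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SITE_SCHEMA)
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?)",
        [("p1", "F", 52), ("p2", "M", 38)],
    )
    return conn


def federated_subjects(conn: sqlite3.Connection) -> list[tuple]:
    """Query the site only through the federated view, never the raw table."""
    return conn.execute(
        "SELECT subject_id, sex, age FROM subject ORDER BY subject_id"
    ).fetchall()


print(federated_subjects(build_site_db()))
# [('siteA:p1', 'female', 52), ('siteA:p2', 'male', 38)]
```

Because clients only see the view, the site keeps full control over its own table, which matches the weak-coupling goal above.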


• WP1: Data Module (III), for performing transversal searches of information across a set of local repositories by using specific adapters to local information (I)

[Figure: NeuroLOG architecture. Components shown: User; Client Application (Authentication, Query Interface, Visualizer, Computing Interface for data sets and workflows); II Semantic Repository, with the Semantic Queries Engine (CORESE) and METAmorphoses (SQL↔RDF); MOTEUR Workflow Engine; Grid Application Service Wrapper; other optimization and context-aware services; V Grid Interface; III Data Federator (DF) with SQL access and authorization; at the sites, Site-Specific DBs (SsDB) and NeuroLOG DBs (NL DB: metadata + image files), each behind a Data Federator; IV Grid Storage (images).]


• Design process
  – Definition of the ontology
  – Definition of the federated relational schema
    • derived from the ontology
    • all sites will align their own databases with this schema
  – Definition of site-specific Data Federator mappings
  – Definition of NeuroLOG services exposed to the client application by the NeuroLOG server


• Design cycle
[Figure: iterative cycle between the Ontology, the NeuroLOG schema and the site-specific databases, with the steps “Adapt ontology”, “Map site-specific database” and “Adapt site-specific database”.]


• How relevant is the federated view of a site-specific table?
  – It may come up with relations that do not exist in the site-specific database
    • Are they consistent with the semantics of the site-specific database?
  – A mapping may lose information, i.e. narrow a concept
    • e.g. Left/Right vs. Left/Right/Converted Left/Ambidextrous
    • Is the loss acceptable?
  – A mapping may broaden a concept
    • Make sure not to come up with inconsistent data
  – Ensure consistency with data from other sites
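The narrowing case can be sketched as a value mapping whose information loss is checked mechanically. The four site values come from the handedness example above; which federated value each one maps to (here Converted Left folded into Right, Ambidextrous left unmapped) is an assumption for illustration, as are the function names.

```python
from typing import Optional

# Federated vocabulary (two values) and one site's richer vocabulary (four values),
# taken from the slide's example. The mapping choices below are assumptions.
FEDERATED_VALUES = {"Left", "Right"}
SITE_TO_FEDERATED: dict[str, Optional[str]] = {
    "Left": "Left",
    "Right": "Right",
    "Converted Left": "Right",  # narrowing: no longer distinguishable from Right
    "Ambidextrous": None,       # no federated counterpart at all
}


def map_handedness(site_value: str) -> Optional[str]:
    """Translate a site-specific handedness value into the federated vocabulary."""
    if site_value not in SITE_TO_FEDERATED:
        raise ValueError(f"unknown site value: {site_value!r}")
    target = SITE_TO_FEDERATED[site_value]
    # Guard against a mapping that invents values the federated schema lacks.
    if target is not None and target not in FEDERATED_VALUES:
        raise ValueError(f"{target!r} is not a federated value")
    return target


def information_loss(mapping: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """List federated targets that merge several site values, plus unmapped values."""
    groups: dict[str, list[str]] = {}
    for source, target in mapping.items():
        groups.setdefault(target or "<unmapped>", []).append(source)
    return {t: s for t, s in groups.items() if len(s) > 1 or t == "<unmapped>"}


print(information_loss(SITE_TO_FEDERATED))
# {'Right': ['Right', 'Converted Left'], '<unmapped>': ['Ambidextrous']}
```

Whether such a loss is acceptable remains a judgment call for each site; the check only makes it visible.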

Metadata Schema Distribution

• Distribution comes with issues…
  – Keeping multiple databases coherent may be challenging
    • Internally to a site: the site-specific DB is managed independently from the middleware
    • Externally: each site is autonomous
  – Need to achieve a compromise between distributed coherency and site autonomy
• Need to handle
  – Cross-references between entities
  – Entity replication: some entities are unique (e.g. Image) while others may be found on several sites (e.g. Subject, Study)
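One way to make the replication rule checkable is sketched below, assuming each site exports (entity type, global identifier, site) triples. The triple format, the global identifiers and the function name are hypothetical; only the split between unique entities (Image) and replicable ones (Subject, Study) comes from the slide.

```python
from collections import defaultdict

# Entity types allowed to appear on several sites (from the slide); any other
# type, e.g. Image, is assumed to be unique across the federation.
REPLICABLE = {"Subject", "Study"}


def check_replication(rows):
    """Group (entity_type, global_id, site) rows and flag illegal replicas.

    Returns a mapping (entity_type, global_id) -> set of sites, and a sorted
    list of keys for non-replicable entities found on more than one site.
    """
    sites = defaultdict(set)
    for entity_type, global_id, site in rows:
        sites[(entity_type, global_id)].add(site)
    violations = sorted(
        key for key, where in sites.items()
        if key[0] not in REPLICABLE and len(where) > 1
    )
    return dict(sites), violations


rows = [
    ("Subject", "S1", "GIN"), ("Subject", "S1", "IRISA"),  # legal replica
    ("Image", "I1", "GIN"), ("Image", "I1", "I3S"),        # illegal replica
    ("Image", "I2", "IFR49"),
]
located, problems = check_replication(rows)
print(problems)  # [('Image', 'I1')]
```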


NeuroLOG WP1 - Paris, Jan. 5th to 7th, 2010


• Get access to files distributed over distant sites
  – Browse, retrieve and store local or remote data files, either on NeuroLOG sites or on the EGEE Grid infrastructure
  – Provide file transfer services between NeuroLOG servers, or between NeuroLOG servers and clients
• Deal with heterogeneous file systems and access protocols
  – Standard protocols: local file, ftp/sftp, http/https
  – Grid protocols: GridFTP, LFC
  – Provide “some sort” of virtual file system
• Enforce a security policy
  – Leverage the Security Layer
    • Check user authorizations to access files
    • Secure transfer
  – Rely on the Grid security infrastructure
    • Grid certificates, Grid proxy
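A thin dispatch layer is one way to provide “some sort” of virtual file system without building a full one: each file URL is routed to a protocol family according to its scheme. This is a sketch under assumptions. The scheme names `gsiftp` (commonly used for GridFTP URLs) and `lfn` (assumed here for LFC logical file names), the family labels and the `resolve` function are illustrative, not the NeuroLOG Data Manager's interface.

```python
from urllib.parse import urlparse

# Illustrative routing table: URL scheme -> protocol family from the slide.
PROTOCOL_FAMILIES = {
    "file": "local",
    "ftp": "standard", "sftp": "standard",
    "http": "standard", "https": "standard",
    "gsiftp": "grid",  # GridFTP
    "lfn": "grid",     # LFC logical file name (assumed scheme)
}


def resolve(url: str) -> tuple[str, str, str]:
    """Return (protocol family, host, path) for a file URL.

    A bare path without a scheme is treated as a local file. Unknown schemes
    raise ValueError instead of being silently ignored.
    """
    parsed = urlparse(url)
    scheme = parsed.scheme or "file"
    family = PROTOCOL_FAMILIES.get(scheme)
    if family is None:
        raise ValueError(f"unsupported protocol: {scheme}")
    return family, parsed.netloc, parsed.path


for u in ("/data/t1.nii", "sftp://site.example.org/img/t1.nii",
          "lfn:/grid/biomed/t1.nii"):
    print(resolve(u))
```

A real implementation would then hand the parsed host and path to a protocol-specific client; keeping that step behind one function is what avoids building a full virtual file system.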


• Need to provide an interface for managing files
  – In the manner of a virtual file system
  – But without building yet another full virtual file system
[Figure: the NeuroLOG Client and the Data Managers of the NeuroLOG sites, connected to local storage resources and to a GRID Controller.]

Files exposed to the client for direct transfer
[Figure: the client (NeuroLOG certificate) requires an accessible copy of a file from a Data Manager, which delegates the request, including credentials, to the server of the site owning the file; that site's Data Manager checks authorizations and makes a temporary copy of the file on an accessible file server, from which the client gets the file by URL. Files on a Grid Storage Element are retrieved through GridFTP with a Grid certificate. Sites shown: Site 1 and Site 2, with local storage resources (file, http, ftp…).]
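The request flow in this figure can be modelled in a few lines to show where delegation and the authorization check happen. This is a toy, in-memory sketch: the `DataManager` class, its fields, the user names and the `example.org` URL are hypothetical; forwarding the user name stands in for forwarding credentials, and returning a URL stands in for the temporary copy on an accessible file server.

```python
from dataclasses import dataclass, field


@dataclass
class DataManager:
    """Toy model of a site Data Manager; all names are hypothetical."""
    site: str
    files: dict[str, str] = field(default_factory=dict)        # file id -> local path
    readers: dict[str, set[str]] = field(default_factory=dict)  # file id -> allowed users
    owner_of: dict[str, str] = field(default_factory=dict)     # file id -> owning site
    peers: dict[str, "DataManager"] = field(default_factory=dict)  # site -> Data Manager

    def request_accessible_copy(self, file_id: str, user: str) -> str:
        owner = self.owner_of.get(file_id, self.site)
        if owner != self.site:
            # Delegate to the owning site's server, forwarding the caller's identity.
            return self.peers[owner].request_accessible_copy(file_id, user)
        if file_id not in self.files:
            raise FileNotFoundError(file_id)
        if user not in self.readers.get(file_id, set()):
            raise PermissionError(f"{user} may not read {file_id}")
        # Stand-in for making a temporary copy on an accessible file server.
        return f"https://{self.site}.example.org/tmp/{file_id}"


site2 = DataManager("site2", files={"img1": "/data/img1.nii"}, readers={"img1": {"alice"}})
site1 = DataManager("site1", owner_of={"img1": "site2"}, peers={"site2": site2})
print(site1.request_accessible_copy("img1", "alice"))
# https://site2.example.org/tmp/img1
```

The authorization decision is taken only by the owning site, which is consistent with each site keeping control over its data.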

Example: processing remote files
[Figure: the client (NeuroLOG certificate) requires processing of a file on the grid; the request reaches the Processing Tools, and a Data Manager delegates it, including credentials, to the server of the site owning the file; that site's Data Manager checks authorizations and sends a copy of the file to a grid storage resource (Grid Storage Element, accessed with a Grid certificate); result files are returned to the client. Sites shown: Site 1 and Site 2.]

5 sites deployed: ASCLEPIOS, GIN, I3S, IFR49, IRISA
[Figure: deployment of the NeuroLOG services over a federated metadata view. The sites run NeuroLOG servers and Data Federators returning results; site-specific systems shown: GIN-DMS, CAC, Shanoir and InriaNeuroTK.]

Thank you

Any engineer position open? I am available from January 1st, 2011.


Backup slides: Data & Metadata GUI


• NeuroLOG Server exposes a set of services
  – Web Services interface
  – Query federated metadata
    • Search by criteria (subjects, studies, datasets…)
    • Browse through federated metadata
    • E.g. get datasets for subjects older than 40, produced in studies started after 2007…
  – Download dataset files based on the global sharing policy
  – Save downloaded datasets to the customer directory
  – Query processing tools using datasets selected from metadata
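The example query can be written as plain SQL once a federated view exists. The sketch below assumes a deliberately minimal federated schema (three tables with invented column names) and uses SQLite only so that it runs standalone; how the NeuroLOG server actually exposes such a query through its Web Services interface is not shown.

```python
import sqlite3

# Hypothetical, drastically simplified federated schema (not the NeuroLOG one).
SCHEMA = """
CREATE TABLE subject (id TEXT PRIMARY KEY, age INTEGER);
CREATE TABLE study   (id TEXT PRIMARY KEY, start_date TEXT);  -- ISO 8601 date
CREATE TABLE dataset (id TEXT PRIMARY KEY,
                      subject_id TEXT REFERENCES subject(id),
                      study_id   TEXT REFERENCES study(id));
"""

# Datasets of subjects older than a given age, produced in studies started
# on or after a given date ("after 2007" read as "from 2008-01-01").
QUERY = """
SELECT d.id
FROM dataset AS d
JOIN subject AS s ON s.id = d.subject_id
JOIN study   AS t ON t.id = d.study_id
WHERE s.age > ? AND t.start_date >= ?
ORDER BY d.id
"""


def find_datasets(conn, older_than=40, started_from="2008-01-01"):
    return [row[0] for row in conn.execute(QUERY, (older_than, started_from))]


def demo_connection():
    """In-memory database filled with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subject VALUES (?, ?)", [("s1", 45), ("s2", 30)])
    conn.executemany("INSERT INTO study VALUES (?, ?)",
                     [("st1", "2008-03-01"), ("st2", "2006-05-10")])
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)",
                     [("d1", "s1", "st1"), ("d2", "s1", "st2"), ("d3", "s2", "st1")])
    return conn


print(find_datasets(demo_connection()))  # ['d1']
```

Reading "after 2007" as "from 2008-01-01 onwards" is itself an interpretation; keeping both bounds as parameters makes it a one-argument change.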


• NeuroLOG client GUI: querying metadata
  – What for: gather datasets in a cart
  – Use the datasets of the cart as inputs to:
    • Visualization tools (Viscioscopie)
    • Processing workflows
    • Download for further local processing…
• Several ways of querying metadata in the client GUI
  – Fill in the parameters of multi-criteria predefined queries
    • To be defined based on user needs
  – Browse through metadata
    • Browsing follows the branches of a browsing tree
    • The browsing tree is likely to evolve along with user feedback
    • Designed in an easy-to-maintain way
  – Explore metadata from a given root

NeuroLOG demonstration, Paris, September 7, 2009


• Current browsing tree
[Figure: browsing tree from root entities down to leaves, over the entities Investigator, Centre, Subject, Experimental group of subjects, Study and Dataset; Dataset nodes are tree leaves, labelled either “input of study” or “result of study”.]
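Since the browsing tree is expected to change with user feedback, one easy-to-maintain option is to keep it as data and derive the browsing paths from it. The branches in this sketch are an illustrative, assumed subset, not a transcription of the NeuroLOG tree; `BROWSING_TREE` and `branches` are hypothetical names.

```python
# Illustrative browsing tree kept as data: parent entity -> child entities.
# Changing the tree means editing this table, not the GUI code.
# The table is assumed to be acyclic.
BROWSING_TREE = {
    "Investigator": ["Study"],
    "Centre": ["Subject"],
    "Experimental group of subjects": ["Subject"],
    "Subject": ["Study"],
    "Study": ["Dataset (input of study)", "Dataset (result of study)"],
}


def branches(node, prefix=()):
    """Yield every path from `node` down to a leaf (an entity with no children)."""
    path = prefix + (node,)
    children = BROWSING_TREE.get(node, [])
    if not children:
        yield path
        return
    for child in children:
        yield from branches(child, path)


for path in branches("Centre"):
    print(" > ".join(path))
```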


[Screenshots: walkthrough of the NeuroLOG client GUI]
• Start
• Search studies
• View details
• Explore metadata
• Search subjects involved in the selected studies
• Search datasets related to the selected subjects, produced in the selected study
• Download datasets from the cart
• View downloaded datasets

• Authorization
  – If the user has no right to read the requested dataset, the user should subscribe to the appropriate role
  – The administrator of the site that manages the role gets the request
  – The administrator grants the user the requested role
  – The user can restart the download
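The subscribe, grant and retry sequence above behaves like a small state machine. The sketch below models it in memory; the `RoleRegistry` class, its methods, the role name and the return strings are hypothetical and say nothing about how NeuroLOG stores roles or notifies administrators.

```python
from dataclasses import dataclass, field


@dataclass
class RoleRegistry:
    """In-memory model of role-based dataset access (hypothetical names)."""
    dataset_roles: dict[str, str]                                 # dataset -> required role
    granted: dict[str, set[str]] = field(default_factory=dict)    # user -> roles held
    pending: list[tuple[str, str]] = field(default_factory=list)  # (user, role) requests

    def can_read(self, user: str, dataset: str) -> bool:
        return self.dataset_roles[dataset] in self.granted.get(user, set())

    def subscribe(self, user: str, role: str) -> None:
        # Queue the request for the administrator of the site managing the role.
        if (user, role) not in self.pending:
            self.pending.append((user, role))

    def grant(self, user: str, role: str) -> None:
        # Administrator decision: accept the pending request.
        self.pending.remove((user, role))
        self.granted.setdefault(user, set()).add(role)


def download(registry: RoleRegistry, user: str, dataset: str) -> str:
    """Return 'downloaded', or subscribe the user to the missing role."""
    if registry.can_read(user, dataset):
        return "downloaded"
    registry.subscribe(user, registry.dataset_roles[dataset])
    return "pending"


reg = RoleRegistry(dataset_roles={"ds1": "study-A-reader"})
print(download(reg, "alice", "ds1"))   # pending
reg.grant("alice", "study-A-reader")
print(download(reg, "alice", "ds1"))   # downloaded
```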


• Apache Tomcat

• Metro JAX-WS
