Software technologies for integration of process, data and knowledge in medical imaging
NeuroLOG WP1: Sharing Data & Metadata
Franck MICHEL
Paris, May 18th, 2010
NeuroLOG ANR-06-TLOG-024
http://neurolog.polytech.unice.fr
• Definitions
  – Data: image files
  – Metadata: “data about data”, that is, any data related to image files
    • Type of image, modality, processing
    • Examination, acquisition equipment
    • Subject: age, gender, pathology…
• Partner sites own specific databases
  – Specific database providers, OS…
  – Specific databases should be comparable
    • Same concerns: they manage the same major entities
    • Heterogeneous: differences in the database design (schema)
• WP1 goals
  – Define a way to share a common view: a cornerstone
    • Definition of a federated relational schema
    • Data Federator to map specific schemas to the federated schema
    • Comes with the need to bring up global coherency
  – Define a way to share image files described by metadata
    • Files distributed over distant sites
    • Heterogeneous file systems, resource storage units…
  – Need for each site to keep control over its data = weak coupling
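The idea of mapping site-specific schemas onto one federated schema can be illustrated with a minimal sketch. This is not how Data Federator is configured; the site names, column names and value encodings below are all hypothetical and only show the principle: each site keeps its own table layout, and a per-site mapping projects its rows onto shared columns.

```python
# Hypothetical site-specific rows: each site names and encodes things its own way.
SITE_A_ROWS = [{"patient_id": 12, "sex": "M", "birth_year": 1961}]
SITE_B_ROWS = [{"subj": "s-7", "gender": "female", "yob": 1975}]

# Per-site mappings: federated column -> function extracting it from a local row.
# Column names on both sides are illustrative, not the actual NeuroLOG schema.
MAPPINGS = {
    "site_a": {
        "subject_id": lambda r: f"site_a:{r['patient_id']}",
        "gender": lambda r: {"M": "male", "F": "female"}[r["sex"]],
        "birth_year": lambda r: r["birth_year"],
    },
    "site_b": {
        "subject_id": lambda r: f"site_b:{r['subj']}",
        "gender": lambda r: r["gender"],
        "birth_year": lambda r: r["yob"],
    },
}


def to_federated(site, rows):
    """Project site-specific rows onto the shared (federated) columns."""
    mapping = MAPPINGS[site]
    return [{col: extract(row) for col, extract in mapping.items()} for row in rows]


def federated_view(sources):
    """Union of the mapped rows of every site; the local tables are left untouched."""
    view = []
    for site, rows in sources.items():
        view.extend(to_federated(site, rows))
    return view


subjects = federated_view({"site_a": SITE_A_ROWS, "site_b": SITE_B_ROWS})
```

Because the mappings live outside the site databases, a site can change its local schema and only its own mapping has to follow, which is one way to read the "weak coupling" requirement.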
• WP1: Data Module (III) for performing transversal search of information through a set of local repositories by using specific adapters to local information
[Figure: NeuroLOG architecture diagram. Labels: User; Client Application (I): Authentication, Query Interface, Visualizer, Computing Interface (data sets and workflows); Semantic Repository (II); Semantic Queries Engine (CORESE); METAmorphoses (SQL ↔ RDF); MOTEUR Workflow Engine; Grid Application Service Wrapper; other optimization and context-aware services; Grid Interface (V); Data Federator (DF) (III): SQL, authorization; Site-Specific DB (SsDB); NeuroLOG Data Base (NL DB): metadata + image files; Grid Storage (images) (IV).]
• Design process
  – Definition of the ontology
  – Definition of the federated relational schema
    • derived from the ontology
    • all sites will align their own database on this schema
  – Definition of site-specific Data Federator mappings
  – Definition of NeuroLOG services exposed to the client application by the NeuroLOG server
• Design cycle
[Figure: design cycle diagram. Labels: Ontology; NeuroLOG schema; Map site-specific database; Adapt site-specific database; Adapt ontology.]
• How relevant is the federated view of a specific table?
  – A mapping may come up with relations that do not exist in the specific database
    • Consistent with the semantics of the site-specific database?
  – A mapping may lose information or narrow a concept
    • e.g.: Left/Right vs. Left/Right/Converted Left/Ambidextrous
    • Acceptable loss?
  – A mapping may broaden a concept
    • Make sure not to come up with inconsistent data
  – Ensure consistency with data from other sites
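The handedness example can be made concrete with a small sketch. It assumes, for illustration only, that the federated schema keeps Left/Right and that the richer site values have no faithful equivalent; the mapping table and function name are hypothetical. The point is that a mapping can report what it loses instead of silently forcing a value.

```python
FEDERATED_HANDEDNESS = {"Left", "Right"}

# Hypothetical mapping from a richer site-specific vocabulary to the federated one.
# None marks values that have no faithful federated equivalent.
SITE_TO_FEDERATED = {
    "Left": "Left",
    "Right": "Right",
    "Converted Left": None,
    "Ambidextrous": None,
}

# Sanity check: the mapping never produces a value outside the federated vocabulary.
assert all(t is None or t in FEDERATED_HANDEDNESS for t in SITE_TO_FEDERATED.values())


def map_handedness(values):
    """Map site values; return (mapped values, set of values whose meaning was lost)."""
    mapped, lost = [], set()
    for value in values:
        target = SITE_TO_FEDERATED[value]
        if target is None:
            lost.add(value)
        mapped.append(target)
    return mapped, lost


mapped, lost = map_handedness(["Left", "Ambidextrous", "Right", "Converted Left"])
```

Whether the entries collected in `lost` are an acceptable loss is exactly the question raised above; the sketch only makes the loss visible.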
Metadata Schema Distribution
• Distribution comes with issues…
  – Keeping multiple databases coherent may be challenging
    • Internally to a site: the site-specific DB is managed independently from the middleware
    • Externally: each site is autonomous
  – Need to reach a compromise between distributed coherency and site autonomy
• Need to handle
  – Cross-references between entities
  – Entity replication: some entities are unique (e.g. Image) while others may be found on several sites (e.g. Subject, Study)
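One way to picture entity replication is to merge per-site results and track which sites hold a replica. The sketch below assumes a cross-site key, here called `global_id`; the slides do not say how replicated entities are identified, so this key and the record layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-site query results; "global_id" is an assumed cross-site key.
RESULTS = {
    "site_1": [{"global_id": "subj-1", "gender": "female"}],
    "site_2": [
        {"global_id": "subj-1", "gender": "female"},
        {"global_id": "subj-2", "gender": "male"},
    ],
}


def merge_replicated(results):
    """Group records by global_id, remembering which sites hold a replica."""
    merged = {}
    holders = defaultdict(list)
    conflicts = set()
    for site, records in results.items():
        for record in records:
            key = record["global_id"]
            holders[key].append(site)
            if key in merged and merged[key] != record:
                conflicts.add(key)  # replicas disagree: a coherency issue to report
            merged.setdefault(key, record)
    return merged, dict(holders), conflicts


merged, holders, conflicts = merge_replicated(RESULTS)
```

A non-empty `conflicts` set would be the kind of distributed coherency problem that has to be weighed against site autonomy.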
NeuroLOG WP1 - Paris, Jan. 5th to 7th, 2010
• Get access to files distributed over distant sites
  – Browse, retrieve and store local or remote data files, either on NeuroLOG sites or on the EGEE Grid infrastructure
  – Provide file transfer services between NeuroLOG servers, or between NeuroLOG servers and clients
• Deal with heterogeneous file systems and access protocols
  – Standard protocols: local file, ftp/sftp, http/https
  – Grid protocols: GridFTP, LFC
  – Provide “some sort” of virtual file system
• Enforce a security policy
  – Leverage the Security Layer
    • Check user authorizations to access files
    • Secure transfer
  – Rely on the Grid security infrastructure
    • Grid certificates, Grid proxy
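A "sort of" virtual file system can start as a simple dispatch on the URL scheme. The sketch below is only an illustration of that idea: the function name is hypothetical, and `gsiftp` and `lfn` are assumed here as the URL schemes for GridFTP and LFC entries.

```python
from urllib.parse import urlparse

# Protocol families named on the slide; the backend names are hypothetical.
STANDARD_SCHEMES = {"file", "ftp", "sftp", "http", "https"}
GRID_SCHEMES = {"gsiftp", "lfn"}  # assumed URL schemes for GridFTP and LFC entries


def pick_backend(url):
    """Return which kind of transfer backend should handle this URL."""
    scheme = urlparse(url).scheme.lower()
    if scheme in STANDARD_SCHEMES:
        return "standard"
    if scheme in GRID_SCHEMES:
        return "grid"
    raise ValueError(f"unsupported protocol: {scheme!r}")
```

For example, `pick_backend("sftp://site1.example.org/data/img.nii")` selects the standard backend, while a `gsiftp://` URL selects the grid one; the host names are placeholders.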
• Need to provide an interface for managing files
  – In the manner of a virtual file system
  – But do not build yet another full virtual file system
[Figure: Data Manager diagram. Labels: NeuroLOG Client; Data Manager; Local storage resources; GRID Controller.]
[Figure: retrieving a file owned by another site. Labels: Site 1; Site 2; NeuroLOG certificate; Grid certificate; Require accessible file copy; Data Manager: delegate request to owning site server (incl. credentials); Data Manager: check authorizations, make temporary copy of file on accessible file server; Files exposed to client for direct transfer; Get file by URL; Get file through GridFTP; Grid Storage Element; Local storage resources (file, http, ftp…).]
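The request flow in this diagram (delegate to the owning site, check authorizations, expose a temporary copy, hand back a URL) can be sketched as follows. This is a toy model under stated assumptions: the class, its attributes and the returned URL pattern are hypothetical, and no file is actually copied.

```python
class DataManager:
    """Toy stand-in for a site's Data Manager; all names here are hypothetical."""

    def __init__(self, site, files, readers):
        self.site = site
        self.files = files      # file id -> local path, for files this site owns
        self.readers = readers  # file id -> set of users allowed to read it
        self.peers = {}         # site name -> DataManager of that site

    def request_copy(self, user, owner_site, file_id):
        """Entry point used by the client: return a URL to an accessible copy."""
        if owner_site != self.site:
            # Delegate to the owning site, passing the user's identity along.
            return self.peers[owner_site].request_copy(user, owner_site, file_id)
        if file_id not in self.files:
            raise FileNotFoundError(file_id)
        if user not in self.readers.get(file_id, set()):
            raise PermissionError(f"{user} may not read {file_id}")
        # A real server would copy the file to an exposed file server here.
        return f"https://{self.site}.example.org/tmp/{file_id}"


site1 = DataManager("site1", {}, {})
site2 = DataManager("site2", {"img42": "/data/img42.nii"}, {"img42": {"alice"}})
site1.peers["site2"] = site2
url = site1.request_copy("alice", "site2", "img42")
```

The authorization check runs on the owning site, so each site keeps control over who reads its files even when the request arrives through another server.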
Example: processing remote files
[Figure: processing a remote file on the grid. Labels: Site 1; Site 2; NeuroLOG certificate; Grid certificate; Require processing of a file on the grid; Data Manager: delegate request to owning site server (incl. credentials); Data Manager: check authorizations, send copy of file to grid storage resource; Grid Storage Element; Require processing; Processing Tools; Return result files to client.]
• 5 sites deployed: ASCLEPIOS, GIN, I3S, IFR49, IRISA
[Figure: deployment diagram. Labels: NeuroLOG services; Metadata federated view; per site (GIN, I3S, IRISA, ASCLEPIOS, IFR49): NeuroLOG server, Data Federator, Results; other labels: InriaNeuroTK, GIN-DMS, CAC, Shanoir.]
Thank you
Any engineer position? Available January 1st, 2011.
Backup slides: Data & Metadata GUI
• NeuroLOG Server exposes a set of services
  – Web Services interface
  – Query federated metadata
    • Search by criteria (subjects, studies, datasets…)
    • Browse through federated metadata
    • E.g. get datasets for subjects older than 40, produced in studies started after 2007…
  – Download dataset files based on global sharing policy
  – Save downloaded datasets to customer directory
  – Query processing tools using datasets selected from metadata
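The example query (datasets for subjects older than 40, produced in studies started after 2007) might look like the following once expressed against a relational view. This is a sketch using an in-memory SQLite database; the table and column names are invented for illustration and are not the NeuroLOG federated schema.

```python
import sqlite3

# Tiny stand-in for a federated view; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE study   (id INTEGER PRIMARY KEY, start_year INTEGER);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT,
                          subject_id INTEGER REFERENCES subject(id),
                          study_id INTEGER REFERENCES study(id));
    INSERT INTO subject VALUES (1, 52), (2, 31);
    INSERT INTO study   VALUES (10, 2008), (11, 2005);
    INSERT INTO dataset VALUES (100, 't1-a', 1, 10), (101, 't1-b', 2, 10),
                               (102, 't1-c', 1, 11);
""")

# Multi-criteria query: the two thresholds are passed as bound parameters.
QUERY = """
    SELECT d.name
    FROM dataset d
    JOIN subject s ON s.id = d.subject_id
    JOIN study  st ON st.id = d.study_id
    WHERE s.age > ? AND st.start_year > ?
    ORDER BY d.name
"""
names = [row[0] for row in conn.execute(QUERY, (40, 2007))]
```

Only the dataset whose subject is over 40 and whose study started after 2007 is returned; a predefined multi-criteria query in a client would expose the two thresholds as form fields.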
• NeuroLOG client GUI - Querying metadata
  – What for: gather datasets in a cart
  – Use datasets of cart as inputs to:
    • Visualization tools (Viscioscopie)
    • Processing workflows
    • Download for further local processing…
• Several ways of querying metadata in the client GUI
  – Fill parameters of multi-criteria predefined queries
    • To be defined on a user-needs basis
  – Browse through metadata
    • Browsing follows branches of a browsing tree
    • Browsing tree likely to evolve along with user feedback
    • Designed in an easy-to-maintain way
  – Explore metadata from a given root
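A browsing tree that is easy to maintain can be kept as plain data, so that changing it does not require touching the browsing code. The sketch below assumes a simple "entity to next entities" table; the particular shape shown is illustrative and is not the actual NeuroLOG browsing tree.

```python
# Each entity lists the entities one can browse to next; an empty list is a leaf.
# This particular shape is illustrative, not the actual NeuroLOG browsing tree.
BROWSING_TREE = {
    "Study": ["Subject", "Dataset"],
    "Subject": ["Dataset"],
    "Dataset": [],
}


def branches(tree, root):
    """Enumerate every root-to-leaf browsing path starting at `root`."""
    children = tree[root]
    if not children:
        return [[root]]
    return [[root] + path for child in children for path in branches(tree, child)]


paths = branches(BROWSING_TREE, "Study")
```

Adding a new entry point or branch then only means editing `BROWSING_TREE`, which fits a tree expected to evolve with user feedback.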
NeuroLOG demonstration, Paris, September 7, 2009
• Current browsing tree
[Figure: browsing tree diagram. Node labels: root; Investigator; Centre; Study; Subject; Experimental group of subjects; Entity; Dataset; Dataset (input of study); Dataset (result of study). Legend: tree leaf.]
[Screenshots: client GUI walkthrough, one caption per step]
• Start
• Search studies
• View details
• Explore metadata
• Search subjects involved in selected studies
• Search subjects involved in selected study
• Search datasets related to selected subjects, produced in the selected study
• Download datasets from the cart
• View downloaded datasets
• Authorization
  – In case the user has no right to read the requested dataset:
  – Then the user should subscribe to the appropriate role
• The administrator of the site that manages the role gets the request:
• The administrator grants the requested role to the user
• The user can restart the download
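The authorization steps above (refused download, role subscription, administrator grant, retried download) can be summarized in a small sketch. The class, method and role names are hypothetical and only model the sequence, not the actual NeuroLOG security layer.

```python
class RoleRegistry:
    """Toy model of role-based read access; names are hypothetical."""

    def __init__(self):
        self.granted = set()  # (user, role) pairs currently granted
        self.pending = []     # requests waiting for the site administrator

    def download(self, user, dataset, required_role):
        """Return a confirmation string, or refuse if the role is missing."""
        if (user, required_role) not in self.granted:
            raise PermissionError(f"{user} needs role {required_role!r}")
        return f"downloaded {dataset}"

    def subscribe(self, user, role):
        """User side: ask for a role."""
        self.pending.append((user, role))

    def grant(self, user, role):
        """Administrator side: approve a pending request."""
        self.pending.remove((user, role))
        self.granted.add((user, role))


registry = RoleRegistry()
try:
    registry.download("alice", "ds1", "study-reader")
except PermissionError:
    registry.subscribe("alice", "study-reader")  # user asks for the role
registry.grant("alice", "study-reader")          # administrator approves
result = registry.download("alice", "ds1", "study-reader")
```

The download is attempted twice on purpose: the first call fails and triggers the subscription, the second succeeds once the role has been granted.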
• Apache Tomcat
• Metro JAX-WS