A prototype infrastructure for the secure aggregation of imaging and pathology data for colorectal cancer care

Mark Slaymaker, Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom
Mike Brady, Department of Engineering Science, Parks Road, Oxford OX1 3PJ, United Kingdom
Andrew Simpson, Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom
David Gavaghan, Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom
Fiona Reddington, NCRI Informatics Coordination Unit, 561 Lincoln’s Inn Fields, London WC2A 3PX, United Kingdom
Phil Quirke, Department of Pathology, Leeds General Infirmary, Leeds LS1 3EX, United Kingdom

Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06) 0-7695-2517-1/06 $20.00 © 2006 IEEE

Abstract

In recent years, a significant number of developments across a broad range of disciplines have allowed researchers and clinicians to start to build up a picture of cancer development. In this paper we report upon the development of a prototype of a secure distributed infrastructure that links imaging data from pathology and radiology. The intention is that a fully-developed system will be capable of supporting studies that will examine whether prognostic and diagnostic features which are apparent in histopathological sections and clinical scans are related. Further, these studies will consider whether these features can be meaningfully linked into a diagnostic or predictive profile. The project in which the prototype is being developed naturally involves a large degree of cooperation across various disciplines. The focus of this paper is primarily on the development of the underlying prototype infrastructure.

1 Introduction

The figures pertaining to colorectal cancer within the United Kingdom make interesting reading. For example, in [1], the Office for National Statistics reported that about 27 800 colorectal cancer cases were registered in England in 2003, with Cancer Research UK detailing approximately 16 000 deaths in the same year [2]. Similar patterns are being seen elsewhere: for example, in [3] the American Cancer Society estimated that in 2005 about 145 290 people would be diagnosed with colorectal cancer and that about 56 290 people would die of the disease in the same year.

Research conducted within the MERCURY (Magnetic Resonance Imaging and Rectal Cancer Equivalence Study) project [4], which involved eleven centres, suggests that the use of MRI can improve the outcome of patients with colorectal cancer. For example, in [5] the authors establish that MRI predictions can avoid the need for potentially toxic preoperative therapy. Patient outcome might be further improved by helping to improve the interpretation of MRI scans and by giving a more quantitative measure of surgical quality. Furthermore, by linking different image types, communication between radiologists and pathologists may be enhanced.

To this end, the National Cancer Research Institute (NCRI) Informatics Initiative has facilitated funding for a demonstrator project to bring together researchers from various fields (in particular, radiology, pathology, image analysis, software engineering, and numerical analysis) to determine the feasibility of developing a platform to facilitate such linking of data to enhance patient care. Potential benefits in this application could include improved staging and prediction of prognosis of cancer, characterisation of novel high resolution imaging techniques in relation to microscopy, and the application of functional imaging to investigation of the mechanisms of novel therapeutics. Broader benefits will accrue by taking these first steps towards developing an informatics platform that can be adapted for use with different types of data across the full range of cancer research.

The project, which is due to last 9 months, started in Autumn 2005. The project involves collaborators from four university departments (Computer Science at University College, London; Engineering Science and the Computing Laboratory at the University of Oxford; and the Pathology department at Leeds University) and the Royal Marsden Hospital, with project management coming from the NCRI.

In this paper we report upon the development of the underlying computing infrastructure to support this linking of data from disparate sources. The ultimate success or failure of the overall demonstrator project will be dependent upon the ability to provide the correct data, in a secure and ethical fashion, to the various algorithms that will make use of it. To this end, we give particular attention to issues of security.

The structure of the remainder of this paper is as follows. In Section 2 we provide an overview of the patient journey through diagnosis with a view to providing the context in which the work described is being undertaken. Then, Section 3 provides an outline of the aforementioned NCRI demonstrator project. Next, in Section 4, we detail the architecture of the proposed solution, which has been informed by work previously undertaken within the eDiaMoND project [6, 7]. Finally, we summarise the contribution of this paper.

2 The process of diagnosing and treating colorectal cancer

In this section we consider the process of diagnosing and treating colorectal cancer. We start by discussing the roles played by MRI, CT, and PET, before providing an overview of the overall process. We then consider the role played by macro slides of excised samples.

2.1 The role of MRI, CT, and PET

Colorectal cancer is first diagnosed using endoscopy and confirmed by histopathology. It is then staged radiologically in order to determine the extent of local and distant disease. Currently, this is best done using Magnetic Resonance Imaging (MRI) to assess local disease extent. The primary tumour site is assessed to determine if the tumour has extended into the adjacent fat and involved the adjacent tissue planes, including the mesorectum. This gives the T stage of the tumour. The detection of lymph nodes and their possible infiltration by the tumour is also crucial. Distant metastases, for example to the liver or lungs, are assessed separately using Computed Tomography (CT) and in some patients using Positron Emission Tomography (PET).

2.2 An overview of the process

The following describes a possible workflow for a patient suspected to be suffering from colorectal cancer (CRC). The patient, feeling discomfort, visits his or her GP. The GP then either reassures the patient or refers the patient to the local hospital, where they are seen by an oncologist and/or a radiologist. Typically, the oncologist sends the patient to a radiologist to have a CT scan, which is a 3D X-ray image of the region of interest. These images can reveal suspicious dense areas that could be cancer; in the case of CRC, they may also give early evidence of metastasis, most often to the liver. On the basis of the CT examination, the following patient management options are available.

1. The patient proceeds straight to surgery. During the surgery the surgeon ‘resects’ (cuts out) the tumour, on the basis of the information provided by the CT scan.
2. The decision is taken to send the patient for ‘palliative care’. This is the case for patients whose tumours are considered too advanced to benefit from either surgery or therapy, or for patients who are judged too weak to undergo therapy or surgery.
3. The patient may be sent for MRI scans, which provide information that is complementary to that provided by CT, primarily contrast between different soft tissues. The MRI scans are used to confirm the diagnosis and identify other areas affected.
4. The patient is given a course of chemotherapy or radiotherapy, which typically lasts one month, with the primary aim being to prevent metastasis prior to surgery.

Note that one set of MRI images may be taken prior to therapy, a second set may be taken prior to surgery (to assess the effect of therapy), and a third set may be taken post-operatively to assess the effect of surgery. Within the context of the project described in this paper, we are concerned only with the first of these possibilities: surgery based on MRI and CT scans.


Figure 1. Macroscopic slices

2.3 Macro slides of excised samples

The extracted tumour, together with some surrounding flesh, is photographed from the front and the back, and is then divided into slices of 3mm thickness (see Figure 1). Each slice is captured in slides at two different resolutions: a low-resolution image is taken at about x20 zoom (macroscopic resolution), and a high-resolution image is taken at in the order of x140 zoom (microscopic resolution). The microscopic slides are analysed by histopathologists to assess the type and stage of a cancer. This analysis consists of considering the distribution, shape variation, and staining of cells visible at the higher resolution. Note that histopathology does not play a role in this project, although it would certainly feature in any follow-on project. From now on, we shall use the term pathology images to refer to both microscopic and macroscopic images.

3 A demonstrator project

In this section we provide an overview of the project in which the work of Section 4 is being undertaken. We first describe the motivation for, and structure of, the project, before introducing the data used within the project.

3.1 Project structure

The aim of the NCRI Informatics Initiative is to enable the development of an “informatics platform in the UK that facilitates access to and movement of, data generated from research funded by NCRI Partner organisations, across the spectrum from genomics to clinical trials” [8]. The objective of this project is to demonstrate the utility of the multi-scale, multi-disciplinary approach to enhancing the information that can be derived from data collected within a clinical trial, not least by facilitating its linkage to data of the same type from previous studies and to data of different types.

To limit the scope of the 9-month project, but equally to form the context for any possible follow-on project, the intention is to develop a prototype grid-based system to relate MRI images to the consequent macro images, and to demonstrate the application of medical image analysis (developed for radiological scale images) to the macro slides, for example to quantify the quality of surgical resection. The development of a secure computing infrastructure to link disparate sites is being undertaken at Oxford University Computing Laboratory; the development of a prototype viewing application is the responsibility of the Department of Engineering Science at Oxford; the development of appropriate ontologies and meta-models for longer-term work is being undertaken by the Department of Computer Science at University College, London; and data collection is being undertaken by the Pathology Department at Leeds University and the Royal Marsden Hospital. Finally, the task of coordinating all of this activity is the responsibility of the NCRI Informatics Coordination Unit.

3.2 The data

As part of the MERCURY project (see, for example, [4]), the Royal Marsden Hospital has collected MRI volumes of colorectal tumours from numerous sites. The MRI volumes are all taken prior to chemotherapy/radiotherapy and prior to surgery. They are anonymised and each case has its own MERCURY identifier. These volumes are also accompanied by descriptive metadata such as the MRI position, the extent of the tumour, the surgical plan, etc.

The other data resource that the project draws upon is a collection of macroscopic and microscopic slides in digital form, scanned using ScanScope™ scanners at Leeds University. The ScanScope™ scanner generates TIFF data that is immediately viewable after scanning using Aperio’s ImageScope™ viewer. The ImageScope viewer can also be used to annotate these micro images, with the annotations stored in an XML format. Currently, each micro image is stored in a custom file format (SVS) created by the scanner and can be displayed only in its associated viewer.

The project will deliver a visualisation/teaching tool, which will display both the MRI volume, with its region of interest (ROI), and the corresponding pathology 2D and 3D images. This would be useful for comparing the two views of the same area, and could potentially be used as a training tool for radiologists and pathologists. Readily available tools will be used for viewing the various image types.

Having considered the structure of the project and the data to be utilised, we now consider the underlying architecture that is being developed to support the proposed activity.


4 An architecture for the secure aggregation of imaging and pathology data

4.1 The demonstrator

The main goal of the demonstrator project is to develop a prototype tool to integrate and display the pathology and radiology images (the MRI volume taken prior to chemotherapy and surgery) of a colorectal cancer case, using available grid technology at the database federation level. It is essential to verify that both the MRI and pathology images come from the same patient case, and that these two sets of data can be related (in both technical and ethical senses). If this is the case, we can then relate and integrate the pathology and radiology images of a case in two ways. The first is via the integration of the full images as entities: given a pathology image, one can retrieve it together with its associated MRI image/report. The second is via the association of their corresponding regions of interest (ROIs): given an ROI in an MRI volume, one should be able to retrieve its associated area in the 3D pathology images. Typically, each MRI volume can have more than one ROI, because individual lymph nodes could be annotated as well as the main tumour, although only one of these annotations would apply to any one lesion. Thus, a flexible database schema that can effectively describe all of the above data is necessary. All the data is stored in a federated database that is distributed over several sites, with all of the data servers being exposed via web service interfaces.
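The two ways of relating the data described above (whole images as entities, and ROI to ROI) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names, field names, and identifiers are hypothetical and are not taken from the project's actual database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RegionOfInterest:
    """An annotated region on one image (hypothetical record layout)."""
    roi_id: str
    image_id: str  # the image the region is drawn on
    label: str     # e.g. "main tumour" or "lymph node"


@dataclass
class Case:
    """One anonymised case with its images and annotations (hypothetical)."""
    case_id: str
    mri_image_ids: List[str] = field(default_factory=list)
    pathology_image_ids: List[str] = field(default_factory=list)
    rois: Dict[str, RegionOfInterest] = field(default_factory=dict)
    # Maps an MRI ROI id to the id of its associated pathology ROI.
    roi_links: Dict[str, str] = field(default_factory=dict)

    def add_roi(self, roi: RegionOfInterest) -> None:
        """Register an ROI, rejecting images that do not belong to this case."""
        known = set(self.mri_image_ids) | set(self.pathology_image_ids)
        if roi.image_id not in known:
            raise ValueError(f"image {roi.image_id} is not part of case {self.case_id}")
        self.rois[roi.roi_id] = roi

    def link_rois(self, mri_roi_id: str, pathology_roi_id: str) -> None:
        """Associate an ROI on an MRI image with an ROI on a pathology image."""
        if self.rois[mri_roi_id].image_id not in self.mri_image_ids:
            raise ValueError(f"{mri_roi_id} is not drawn on an MRI image")
        if self.rois[pathology_roi_id].image_id not in self.pathology_image_ids:
            raise ValueError(f"{pathology_roi_id} is not drawn on a pathology image")
        self.roi_links[mri_roi_id] = pathology_roi_id

    def mri_for_pathology_image(self, pathology_image_id: str) -> List[str]:
        """Whole-image association: MRI images of the case owning this image."""
        if pathology_image_id not in self.pathology_image_ids:
            return []
        return list(self.mri_image_ids)

    def pathology_roi_for(self, mri_roi_id: str) -> Optional[RegionOfInterest]:
        """ROI-level association: the pathology region linked to an MRI ROI."""
        linked = self.roi_links.get(mri_roi_id)
        return self.rois[linked] if linked is not None else None
```

Keeping the ROI links in a separate mapping means that an MRI volume can carry several annotations (for example, lymph nodes as well as the main tumour) while only some of them have a pathology counterpart.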

4.2 Architecture

The infrastructure that is underpinning the demonstrator is based on the output of the eDiaMoND project (see, for example, [6]). The patient and image data are stored at the site at which they originated, with the data from each location reflecting the specialisation of that site. The selection of MRI and pathology data pertaining to a single case will be achieved by federating data between multiple sites. This relies on a well-designed, flexible architecture, an overview of which is shown in Figure 2. The architecture is based on that of [9].

Figure 2 shows a virtual organisation (VO). For clarity only two hospitals are shown, but in general a VO can be made up of any number of hospitals. The components within each hospital are: externally facing services (E); internal services (I); access control policies (P); workstations (WS); and the data. A user at a workstation makes a request to access data. This request is sent to the externally facing service, which sends the request to other sites if necessary via communication with those other sites’ externally facing services. The relevant external services then pass appropriate requests to their corresponding internal services, which then use the local access control policy to decide if the user has the necessary access. If access is granted, the data is then returned to the querying user.

Figure 2. Architecture (a virtual organisation containing Hospital 1 and Hospital 2, each with an externally facing service E, an internal service I, an access control policy (P1, P2), a workstation (ws1, ws2), and data)

This is a refinement of the architecture developed within the eDiaMoND project. Here, though, instead of using Globus GT3 [10], it utilises open source web services technologies and standards, with access control policies being captured in terms of XACML (eXtensible Access Control Markup Language) [11]. A further key requirement of the infrastructure, after security, is that it should be lightweight. That is, it should require the minimum number of components to operate effectively. Further, where proprietary solutions have been used, a simple design has been adopted so as to facilitate easy migration to another platform.
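The request flow described in this subsection can be sketched as follows. This is an illustrative in-process model only: the class names, method names, and the representation of a policy as a plain function are assumptions made for the example, whereas the prototype itself uses web services and XACML policies.

```python
from typing import Callable, Dict, List

# A policy decides whether a user may access a case: (user, case_id) -> bool.
# This callable is a hypothetical stand-in for a site's XACML policy.
Policy = Callable[[str, str], bool]


class InternalService:
    """Holds a site's data and enforces that site's local policy."""

    def __init__(self, data: Dict[str, List[str]], policy: Policy) -> None:
        self.data = data      # case_id -> names of images held at this site
        self.policy = policy

    def fetch(self, user: str, case_id: str) -> List[str]:
        # Data is released only if the local policy permits this user.
        if case_id not in self.data or not self.policy(user, case_id):
            return []
        return list(self.data[case_id])


class ExternalService:
    """Externally facing service: the only entry point to a site."""

    def __init__(self, site: str, internal: InternalService) -> None:
        self.site = site
        self.internal = internal
        self.peers: List["ExternalService"] = []

    def handle_remote(self, user: str, case_id: str) -> List[str]:
        # A request arriving from another site goes to the internal service.
        return self.internal.fetch(user, case_id)

    def query(self, user: str, case_id: str) -> Dict[str, List[str]]:
        # A workstation request is answered locally and forwarded to peers.
        results = {self.site: self.internal.fetch(user, case_id)}
        for peer in self.peers:
            results[peer.site] = peer.handle_remote(user, case_id)
        return results
```

In this sketch each site applies its own policy, so a user who is authorised at one hospital but not at another receives only the data that the permitting site holds.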

4.3 The database

A ‘real-life’ deployment of the prototype system would potentially involve huge quantities of data, consisting of a large number of relatively small MRI image files along with a smaller number of small macro images and a few very large (of the order of several GB) micro images. The database underpinning the project is based on the work described in [7], which describes how a relational structure for images stored in the standard DICOM (Digital Imaging and Communications in Medicine) format [12] was developed. The types of data which are being federated across multiple centres are: pre-operative MRI data, post-operative macro slices, and post-operative stained micro slices. The MRI images are the only ones in DICOM form, which means that there is a good match between the data pertaining to them and the database schema from the eDiaMoND project, with only minor adaptations being necessary.


As the pathology images are not in DICOM format, a different method of dealing with them is required. The two formats that need to be handled are JPG for the macroscopic images and SVS for the microscopic images. We have chosen to employ a method that automatically generates suitable DICOM wrappers for each image file. This has the benefit of allowing maximum flexibility while requiring minimal changes to the existing underlying schema. Another advantage of using DICOM-style wrapping is that it handles collections of images well. A single set of macro images would form a DICOM series and hence be addressed together. This is also the case with the micro images. Furthermore, the two series (macro and micro) can be collected together as a single study. This provides a great deal of flexibility when processing queries.

In addition to the data requirement, it is also necessary for the system to facilitate the remote execution of various software components, such as 3D reconstruction algorithms. Initially, this is being achieved by the redeployment of technology developed within eDiaMoND.
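The grouping of wrapped pathology files into series and a study can be sketched as below. This is a minimal illustration, not the wrapper generator used in the project: it produces plain records rather than real DICOM objects, and the identifier scheme, function name, and field names are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Dict, List

# File suffix -> series kind, following the two formats named in the text.
SERIES_BY_SUFFIX = {".jpg": "macro", ".jpeg": "macro", ".svs": "micro"}


@dataclass(frozen=True)
class WrappedImage:
    """DICOM-style addressing for one pathology file (hypothetical layout)."""
    study_id: str         # shared by every pathology image of the case
    series_id: str        # shared by all macro (or all micro) images
    instance_number: int  # position of the file within its series
    path: str             # the original JPG or SVS file, left unchanged


def wrap_pathology_files(case_id: str, paths: List[str]) -> Dict[str, List[WrappedImage]]:
    """Group a case's pathology files into a macro series and a micro series."""
    study_id = f"{case_id}.pathology"  # illustrative identifier, not a DICOM UID
    series: Dict[str, List[WrappedImage]] = {}
    for path in sorted(paths):
        kind = SERIES_BY_SUFFIX.get(PurePath(path).suffix.lower())
        if kind is None:
            raise ValueError(f"unsupported pathology file format: {path}")
        members = series.setdefault(kind, [])
        members.append(
            WrappedImage(study_id, f"{study_id}.{kind}", len(members) + 1, path)
        )
    return series
```

Because every record carries both a study and a series identifier, a query can address a single slide, one whole series, or the complete pathology study of a case.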

4.4 Challenges

The challenges facing the project team can be characterised in a number of ways:

• into those that pertain to the short-term, i.e., those associated only with the prototype, and those that pertain to the longer-term, i.e., those that would have to be solved if the prototypical solution developed within the project is ever going to be deployed in a ‘real-world’ environment;
• into those which are generic to health grid projects and those which are specific to this particular domain; and
• into those which are technical and those which are non-technical.

It is not our intention to provide an exhaustive list of such challenges here for two reasons. First, previous papers (see, for example, [13]) have discussed such issues with respect to related projects. Second, this short paper is concerned primarily with the underlying infrastructure to support the project. Rather, we consider those which are of most relevance to this paper.

First, the primary challenge facing the infrastructure team is the provision of a robust federated image repository: it is important that the images are available to both the processing algorithms and the viewing software. Commercial solutions are available that provide the requisite robustness but would have the potential to restrict future developments; open source solutions are less robust, but have the benefit of affording future extensibility. To this end, we are adopting as pragmatic and flexible an approach as possible.

Second, due to the nature of the data under consideration, it is important to ensure that the data only goes to those who are authorised to use it; this applies to both algorithms and people. The processing location also needs to be considered when making a decision to allow access. As such, despite the prototype status of the system being developed, security is a key concern. We briefly discuss this in the next subsection.

4.5 Security

It will be necessary to demonstrate that only approved users have access to the project’s data. This is necessary even though all the data has been anonymised. Granting access to appropriate people only is an important part of any system operating on sensitive personal data. The approach being adopted within the prototype is aligned with the architecture of [9].

For the purpose of this project, access to the data is being restricted to project partners, with each project partner being able to access data as required. A number of more restricted users will also be set up to demonstrate the access control aspects of the project. Authentication of users is achieved via certificate technology. A certificate authority has been set up to issue certificates to members of the project only. In addition to the issuing of certificates, a mechanism for propagating the revocation of a certificate efficiently is currently being developed.

For the prototype, we are intending to keep the authorisation process simple. A user will have one of three levels of access: none, read, or read/write. The majority will have read-only access. A small number will have read/write access so that they can add new cases to the database. In the longer term, it is intended that the technology emerging from the GIMI (Generic Infrastructure for Medical Informatics) project [14] will be utilised to offer fine-grained flexible access control across resources.
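The three-level authorisation scheme can be sketched as follows. This is an assumption-laden illustration: the class and method names are hypothetical, users are identified by an opaque certificate identifier, and revocation is modelled as a simple local set rather than the propagation mechanism under development.

```python
from enum import IntEnum
from typing import Dict, Set


class Access(IntEnum):
    """The three access levels named in the text, ordered by privilege."""
    NONE = 0
    READ = 1
    READ_WRITE = 2


class AccessRegistry:
    """Maps certificate identifiers to access levels (hypothetical design)."""

    def __init__(self) -> None:
        self._levels: Dict[str, Access] = {}
        self._revoked: Set[str] = set()

    def grant(self, certificate_id: str, level: Access) -> None:
        self._levels[certificate_id] = level

    def revoke(self, certificate_id: str) -> None:
        # A revoked certificate loses all access, whatever level it held.
        self._revoked.add(certificate_id)

    def level_of(self, certificate_id: str) -> Access:
        # Unknown and revoked certificates both default to no access.
        if certificate_id in self._revoked:
            return Access.NONE
        return self._levels.get(certificate_id, Access.NONE)

    def can_read(self, certificate_id: str) -> bool:
        return self.level_of(certificate_id) >= Access.READ

    def can_write(self, certificate_id: str) -> bool:
        return self.level_of(certificate_id) == Access.READ_WRITE
```

Defaulting to `Access.NONE` for any certificate that is unknown or revoked keeps the check deny-by-default, which matches the requirement that only approved users reach the data.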

5 Conclusions

We have provided an overview of an NCRI Informatics demonstrator project that is developing a prototype system that links data from pathology and radiology. The long-term objective is that a deployable system will be developed that builds upon the work undertaken within this pilot project. We have concentrated on the underlying infrastructure, which builds upon our experiences from the eDiaMoND project. In addition, ongoing work within the GIMI project will influence developments with respect to security.


Although the project aims only to deliver a prototype solution, we fully intend to demonstrate the feasibility of secure and ethical aggregation of data within a healthcare delivery context with a view to improving patient care.

6 Acknowledgements

The authors wish to acknowledge the financial support provided by the Medical Research Council, the Wellcome Trust, and the Department of Health.

References

[1] Cancer: number of new cases 2003, by sex and age. www.statistics.gov.uk/statbase//Expodata/Spreadsheets/D9096.xls.

[2] Large bowel (colorectal) cancer factsheet April 2005. www.cancerresearchuk.org/aboutcancer/statistics/statsmisc/pdfs/factsheet bowel apr05.pdf, April 2005.

[3] Colorectal cancer facts and figures. www.cancer.org/downloads/STT/CAFF2005CR4PWSecured.pdf.

[4] The MERCURY project. www.pelicancentre.com//researchprojects/mercury.html.

[5] I. R. Daniels. Colorectal 13-22. British Journal of Surgery, 91(S1):24–28, 2004.

[6] J. M. Brady, D. J. Gavaghan, A. C. Simpson, M. Mulet-Parada, and R. P. Highnam. eDiaMoND: A grid-enabled federated database of annotated mammograms. In F. Berman, G. C. Fox, and A. J. G. Hey, editors, Grid Computing: Making the Global Infrastructure a Reality, pages 923–943. Wiley Series, 2003.

[7] D. J. Power, E. A. Politou, M. A. Slaymaker, S. Harris, and A. C. Simpson. An approach to the storage of DICOM files for grid-enabled medical imaging databases. In Proceedings of the ACM Symposium on Applied Computing, pages 272–279, 2004.

[8] NCRI. Strategic framework for the development of cancer research informatics in the UK. www.cancerinformatics.org.uk/Documents/NCRI Informatics Strategic Framework 31+July.pdf, July 2003.

[9] D. J. Power, E. A. Politou, M. A. Slaymaker, and A. C. Simpson. Towards secure grid-enabled healthcare. Software Practice and Experience, 35:857–871, 2005.

[10] I. Foster and C. Kesselman. Globus: a metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115–128, 1997.

[11] T. Moses. OASIS eXtensible Access Control Markup Language (XACML) version 2.0. Committee specification, February 2005.

[12] DICOM. http://medical.nema.org.

[13] S. Lloyd and A. C. Simpson. The utilisation of clinical data in research health grids: eDiaMoND as a case study. In Proceedings of HealthCare ’06, 2006.

[14] A. C. Simpson, D. J. Power, M. A. Slaymaker, S. L. Lloyd, and E. A. Politou. GIMI: Generic infrastructure for medical informatics. In IEEE Computer-Based Medical Systems, 2005.
