High Performance Computing (HPC) in Medical Image Analysis (MIA) at the Surgical Planning Laboratory (SPL)

Ron Kikinis, M.D., Simon Warfield, Ph.D., Carl-Fredrik Westin, Ph.D.
Surgical Planning Laboratory, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA

Abstract

This paper outlines some of the usage and applications of HPC in the medical image analysis field. As opposed to traditional HPC work that focuses on developing new optimization strategies and improving the implementation of existing environments, the work reported here focuses strongly on the utilization of HPC technology (both commercial and public domain) in an application driven clinical and research environment. Work performed at the Surgical Planning Laboratory (SPL) of Brigham and Women's Hospital and Harvard Medical School in Boston is used as an example of this type of activity.

1 Introduction

1.1 General

Postprocessing of digital diagnostic imaging data allows the extraction of quantitative measures and the generation of complex visualizations. This can be used for monitoring of disease progression, diagnosis, preoperative planning, and intraoperative guidance and monitoring. Postprocessing adds value to medical images. However, successful postprocessing requires complex and optimized processing systems. Development of such systems is challenging and requires multiyear interdisciplinary collaborations for a successful outcome. In order to achieve working and robust solutions, both the medical problems and the data acquisitions have to be carefully selected and optimized. Research in Medical Image Analysis (MIA) is currently pursued by a small, international community, which holds an annual meeting, the MICCAI conference. Some of the results of this research have already found their way into commercial products, and several additional concepts are currently being developed into products. Many new applications are being researched right now and hold promise for a continuing stream of successful clinical applications.

1.2 High Performance Computing (HPC) in MIA

MIA is not a traditional field of application for HPC. The cultures of HPC and MIA are very different. In HPC, the application scientists typically access the supercomputer centers and have to bring their own funding for the work on the HPC machines. Some centers donate the CPU time to the application scientists, but the effort to develop the HPC application is typically funded from the application side. In contrast to this, MIA typically happens in a medical environment, because access to physicians and patients is the most limiting step in this type of research. This requires computer scientists to spend multiple years in cross disciplinary training and in building collaborations with medical personnel. In this environment, unscheduled access to resources is of critical importance. Because the field is relatively young, a lot of the initial work did not require very sophisticated and computationally expensive algorithms. The maturation of the field is now changing this. Furthermore, the amount of data produced by the different scanners was relatively moderate until recently. Several developments are now moving the field of MIA towards the use of HPC techniques. Specifically, this is because of the increase in computational requirements, in data volume, and in the intensity of electronic utilization of the data. Table 1 provides an overview of trends in these three areas. During the development phase of an algorithm, the only aspect that is of relevance is the computational demand of a given algorithm. Since it takes several years to develop working analysis systems in this field, it becomes necessary to make assumptions about the hardware that will be at the right price-performance point at the time when the software system will be ready for widespread clinical use. This is one of the motivations for using HPC hardware and software for research in this field.

Table 1: Evolution of demands on the computational resources. Each row lists the progression over time.

Volume of imaging data: 10-20 MB per diagnostic patient scan -> 20-120 MB in routine, 1-4 GB in research mode -> gigabytes per routine scan

Computational demands by algorithms: supervised statistical classification -> self adaptive iterative statistical classification -> self adaptive classification modulated by anatomical and pathological knowledge

Digital accessibility: proof of concept demonstration projects, not economic -> economic in certain niche applications (ORs, remote reading) -> standard everywhere

The following text outlines some of the usage and applications of HPC in the medical image analysis field. As opposed to traditional HPC work that focuses on developing new optimization strategies and improving the implementation of existing environments, the work reported here focuses strongly on the utilization of HPC technology (both commercial and public domain) in an application driven clinical and research environment. Work performed at the Surgical Planning Laboratory (SPL) of Brigham and Women's Hospital and Harvard Medical School in Boston is used as an example of this type of activity. The text is organized as follows: the Material and Methods section summarizes the environment in which the reported work was done; the Results section uses a number of concrete examples to demonstrate the principles outlined in the Material and Methods section; and finally, in the Discussion, the results and the concepts introduced are put into a wider framework.

2 Evolution of algorithms

In most cases, algorithm evolution in medical image analysis begins with a concrete imaging problem. A computer scientist will assemble a library of test cases and will begin development of a processing pipeline or network that solves the problem. The initial testing is typically done on small subsets of the data so that testing cycles are relatively short. The code generated is typically "messy" and not optimized. Once the basic algorithm concept is developed, it is necessary to further validate and test the robustness of the algorithm by applying it to a larger number of cases. This typically requires adding I/O libraries and improving the speed through algorithmic accelerations such as approximations and speed-ups. Finally, the inner loops get optimized and threaded. Where necessary, cluster oriented capabilities are added. When the algorithm is proven in such a fashion, the usage shifts to routine use. At this point, it becomes necessary to run batches of jobs with the data and algorithm.

In addition to the basic mechanism discussed above, there is another evolution: the scientists who are developing new algorithms become used to having higher performance available. This results in algorithms that have more robustness and self-adaptive behavior at the expense of higher computational requirements. In some cases, the progression will continue to implementation on dedicated hardware and finally revert back to generic desktop machines at significantly higher performance levels (due to the time that has passed between the initial development and the widespread deployment). Some processing pipelines are not suited for hardware acceleration, so not every algorithm goes through all of these steps, but many do.

Table 2: Migration of algorithms from research to clinical routine.
(Columns: Stage | Hardware | Code | Data | User)

Concept development | Workstation | Simple code | Small subsets (2D or subvolumes) | Computer scientist
Small batch testing | Workstation / Server | Optimization / Threading | Full data, small series | Computer scientist / Application scientist
Clinical studies | Server / Cluster | MPI, LSF | Larger series / clinical research / routine | Application scientist / Technicians
First products | Dedicated hardware | Specific API | Clinical routine | Technicians
"Mass-market" products | Workstation | Highly optimized code | Routine use | Technicians

3 Material and Methods

3.1 The environment at the SPL

Because of the necessity of unscheduled access to large resources, which can occur in some situations, the traditional mode of operation of HPC groups doesn't work for MIA applications. The extreme example is surgical planning, where surgical procedures sometimes occur on very short notice without the possibility of scheduling CPU time on a compute server. In such a situation it has to be possible to interrupt other, properly scheduled, activities. This is in stark contrast to the way most computer centers work. This is one of the justifications for building up an HPC environment inside a hospital. Such an environment then allows the majority of the development cycle (see Table 2) to take place in that research lab. The personnel of such a facility have to be experienced in medical image analysis and in the different HPC techniques. Typically, this requires academic computer scientists with multi-year exposure to this type of research.

3.1.1 Workstations

The main part of the computer resources in the SPL is based on workstations from Sun Microsystems. Currently, over 65 workstations and compute servers are available in the lab; among them are 8 SPARC-10, 12 SPARC-20, 5 UltraSPARC-1, 1 UltraSPARC-2 and 3 UltraSPARC-30 machines. The majority of our 450 GB of hard disk storage is connected to a file server, a 4-CPU SPARCcenter 1000 with 256 MB of RAM. In addition, the SPL has a dedicated web server and firewall system.

3.1.2 General Purpose Network

The SPL has its own dedicated network that is separated from the department and is designed to accommodate the high bandwidth requirements of the image processing activities. The three sites of the SPL are connected by multi-mode fiber which was installed by the Brigham and Women's Hospital. Each site has a Xyplex Enterprise Hub with a 655 Mbps backplane and seven slots that can hold either conventional Ethernet modules, switched Ethernet modules with 8 ports, or ATM modules with two-port daughter cards for either fiber based or UTP 5 based 155 Mbps transfer rates. The hubs can provide managed, switched, bridging and routing services to the ports. The hubs, as well as consulting and engineering services for installation and integration into the existing networking environment, were donated by Whitaker-Xyplex. Using this dedicated hardware, the three sites are connected by OC-3 ATM fiber optic backbones running at 155 Mbps. Each of the three sites has switched Ethernet and conventional Ethernet available for high throughput performance. The general purpose network is currently being updated to an architecture based on a Gigabit Ethernet backbone and switched 100 base T Fast Ethernet.

3.2 HPC infrastructure

3.2.1 Legacy hardware

In the SPL, a massively parallel CM-200 (Thinking Machines Corporation) with 16k processors has been used for interactive segmentation and fast volume rendering of image volumes since 1990. A Model 3 Power Visualization System (PVS) from IBM with 32 i860 processors and 512 MB of RAM has been used for some of the segmentation work since 1992. The programs used on both systems were developed in the SPL and allowed us to gain some experience with parallelization and HPC issues in our domain. However, in both cases, the code had to be optimized to the hardware for the necessary performance and is therefore not portable. This experience formed the basis for the current work.

3.2.2 Current HPC hardware

In September of 1996 the SPL acquired two Sun Microsystems Ultra HPC 5000s. Each of these machines is currently configured with eight 167 MHz UltraSPARC CPUs and two gigabytes of shared memory, and is equipped with a graphics accelerator and approximately 150 gigabytes of local hard disks in disk arrays connected via Fibre Channel. At the end of 1997, the lab acquired an additional Sun HPC 6000 compute server equipped with 20 250 MHz UltraSPARC-II CPUs, five gigabytes of RAM, and 120 gigabytes of local hard disk connected via Fibre Channel. For the purpose of cluster computing, the three machines are interconnected via a hybrid network. The main network is Fast Ethernet. The two HPC 5000s are also connected via a private SCI network. The HPC 6000 will in the near future be connected to the 5000s via 1 Gbps Ethernet. Cluster traffic will be consolidated onto a pure SCI network, using a high performance four-port SCI switch. Most of the results reported here were obtained with the network described above. We are now in the process of establishing a higher performance dedicated networking infrastructure for the cluster computing environment. This network will be based on an SCI switch which will connect all three Sun HPC servers in a low latency network with gigabit performance.

Figure 1: Segmentation paradigm

3.3 Software / Developers environment

We are primarily using the Sun Workshop environment and GNU compilers and editors for most of the initial code development. For distributed memory applications running on our cluster of SMPs, we use MPI. The development environment consists of the Sun Parallel Development Environment (PDE), and we use Sun's RTE (Run Time Environment) to execute MPI jobs. This software is part of Sun's HPC 2.0 package.
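As a concrete illustration of this mode of working, the sketch below shows the typical skeleton of such an MPI job: each rank works on a contiguous chunk of voxels and a reduction collects a global result. This is a minimal, generic example, not SPL code; the voxel count and the per-voxel work are placeholders, and only standard MPI calls are used.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long nvox = 1L << 24;          /* total number of voxels (illustrative) */
    long lo = nvox * rank / size;        /* this rank's contiguous chunk          */
    long hi = nvox * (rank + 1) / size;

    double local = 0.0;                  /* stand-in for per-voxel computation    */
    for (long i = lo; i < hi; i++)
        local += (double)(i % 256);

    double global = 0.0;                 /* combine the partial results on rank 0 */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global result = %g\n", global);

    MPI_Finalize();
    return 0;
}
```

A program of this form is compiled with the MPI compiler wrapper (e.g. mpicc) and launched through the run time environment on the desired number of processes.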

3.4 The future of MIA HPC in the context of service offices for PACS

In the medical field, there is a trend toward storage and viewing of digital imaging data (such as CT and MRI) on workstations. Workstation display capability and price-performance have reached a point where this begins to make economic sense even in clinical environments, and the importance of picture archiving and communication systems (PACS) is growing. Postprocessing will become increasingly easy as the data is available digitally and the required computational performance is available. However, many of the analysis systems are still too complex for use by casual operators. It is therefore likely that the postprocessing will be outsourced to large processing centers, with data transmission over the net. Such centers will operate in a way comparable to clinical laboratories, where blood samples are centrally processed in large facilities. For one of our applications, multiple sclerosis (MS), we are in the process of establishing such a center today.

4 Results

4.1 Overview

The core of algorithmic activity for medical image processing centers on the issues of segmentation and registration. We approach the segmentation problem as a control theory design problem. We seek to understand images with signal processing techniques that enhance important features of the image, and have designed a feedback control system to generate the desired segmentation (Figure 1). Results of the segmentation get aligned to other data acquisitions and to the actual patient during procedures [Jolesz, 1997]. The components of this segmentation approach are image acquisition, adaptive filtering, statistical classification and explicit anatomical modeling. Finally, the results of the segmentation are visualized using different rendering methods. The majority of the processing modules in our processing approach are computationally demanding and require some form of parallelization for optimal usability. Table 3 below lists the different modules and gives an overview of the parallelization strategies used.

4.2 Feature Enhancement

In order to reduce the noise level and to emphasize image structures of interest, the image data is filtered prior to segmentation. We have clinical applications involving segmentation of MR images which routinely use anisotropic diffusion for enhancing the gray-level image structure prior to segmentation [Gerig, 1992]. By smoothing along structures and not across them, the noise level can be reduced without severely blurring the image.

Table 3: Overview of algorithms and their parallelization and application.

Feature Enhancement
  Idea: Enhance selected characteristics.
  Method: Spatial and frequency domain filtering: convolutions.
  Parallelization: SMP and MPI style for Fourier transforms [Frigo, 1997] and convolutions.
  Application: Noise reduction [Gerig, 1992], removal of partial volume artefacts [Westin, 1997].

Classification: k-NN, Parzen window
  Idea: Classify an unknown voxel based on prototypes.
  Method: Nonparametric supervised statistical classification [Duda, 1973], [Cover, 1967], [Cover, 1968], [Clarke, 1993], [Warfield, 1996], [Friedman, 1975].
  Parallelization: Each voxel treated separately [Friedman, 1975]; SMP for core, MPI.
  Application: Classification in different areas of the body [Kikinis, 1992], [Huppi, 1998], [Warfield, 1995], [Warfield, 1996].

Classification: EM
  Idea: Increase robustness of the statistical approach through adaptive behavior.
  Method: Iterates between statistical classification and intensity prediction/correction [Wells, 1996].
  Parallelization: Classification step as in k-NN; intensity correction [Wells, 1986]: convolutions, SMP.
  Application: Classification primarily of brain MRI [Morocz, 1995], [Kikinis, 1997], [Iosifescu, 1997].

Linear Registration: intra-subject
  Idea: Use inherent contrast similarity to align images.
  Method: Requires entropy and joint entropy computation [Wells, 1996a].
  Parallelization: Joint histogram computation, parallelized by computing the histogram of data chunks; joint entropy from the histogram. SMP, MPI.
  Application: Registration of slices for multichannel analysis [Huppi, 1998], [Nakajima, 1997].

Linear Registration: inter-subject
  Idea: Measure mismatch of alignment of two subjects by counting the number of voxel labels that don't match.
  Method: Multiresolution alignment using XOR function [Warfield, 1998].
  Parallelization: First the data is resampled, then a tissue label count is used to calculate the registration measure. MPI, SMP.
  Application: Initial alignment for template driven segmentation [Warfield, 1996].

Nonlinear Registration
  Idea: Use a rubber-sheet transform to align two data sets from different subjects.
  Method: Multiresolution approach with fast local similarity measurement and a simplified regularization model.
  Parallelization: Low pass filter, upsampling, downsampling, arithmetic operations, solving systems of equations. SMP.
  Application: Template driven segmentation [Warfield, 1996].

Visualization: Surface Model Generation
  Idea: Generate highly optimized triangle surface models.
  Method: Pipeline of marching cubes [Lorensen, 1987], triangle reduction [Schroeder, 1992], and triangle smoothing [Taubin, 1995].
  Parallelization: Distributed computation of triangle models for each structure of a data set (up to 300). LSF.
  Application: Visualization for surgical applications and for presentation purposes [Ozlen, 1998], [Chabrerie, 1998], [Chabrerie, 1998a], [Kikinis, 1996].

Visualization: Volume Rendering
  Idea: Direct visualization of volume data without prior processing.
  Method: Shear warp algorithm [Ylä-Jääski, 1997], [Lacroute, 1994], [Saiviroonporn, 1998].
  Parallelization: Render subvolumes separately. SMP, MPI.
  Application: Visualization of data before segmentation, interactive editing.

For this purpose we use a parallel implementation of the anisotropic diffusion algorithm running on our CM-200. Recently, a multi-threaded adaptive filtering scheme was implemented in C which takes advantage of the parallelism available on our Sun SMP machines. This algorithm is based on steerable filters which conform to the local structure adaptively [Granlund, 1995]. One of the applications that uses this filtering technology is the segmentation of bone from Computed Tomography (CT) images, see section 4.9 below. Convolution involves multiplication and summation of filter kernel coefficients with signal voxels over the local area that the filter supports. Since the result in each voxel can be calculated independently, these calculations can be done in parallel, and thus the speedup for convolution is linear in the number of CPUs. It should be noted, however, that for large filter kernels (e.g. 9x9x9 voxels) it is in general more efficient to calculate the result of a convolution using the Discrete Fourier Transform (DFT), unless the calculations are performed on a massively parallel machine. When performing filtering using the Fourier transform we take advantage of a software package developed at the LCS at MIT, "FFTW" [Frigo, 1997]. FFTW is a C subroutine library for performing the Discrete Fourier Transform (DFT) in one or more dimensions. We run FFTW under Solaris using POSIX threads. Performing a Fourier transform of a typically sized CT data set, 512x512x100 voxels, using SMP on our 20-CPU Sun HPC server takes about 1 minute. An MPI version of the FFTW routines is available, which makes it possible to perform the FFT calculation on distributed memory machines in addition to shared-memory architectures.
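To make the voxel-parallel convolution strategy concrete, the following is a minimal sketch of a slab-parallel 3D convolution using POSIX threads. It is illustrative only: the data layout, the zero-padded border handling and all identifiers (conv_job_t, conv_worker, convolve3d_threaded) are assumptions made for this example and do not reproduce the SPL implementation.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    const float *in;      /* input volume, nx*ny*nz, x fastest */
    float       *out;     /* output volume, same layout        */
    const float *kernel;  /* (2r+1)^3 filter coefficients      */
    int nx, ny, nz, r;    /* volume size and kernel radius     */
    int z0, z1;           /* slab of output slices [z0, z1)    */
} conv_job_t;

static void *conv_worker(void *arg)
{
    conv_job_t *j = (conv_job_t *)arg;
    int k = 2 * j->r + 1;
    for (int z = j->z0; z < j->z1; z++)
        for (int y = 0; y < j->ny; y++)
            for (int x = 0; x < j->nx; x++) {
                double acc = 0.0;
                for (int dz = -j->r; dz <= j->r; dz++)
                    for (int dy = -j->r; dy <= j->r; dy++)
                        for (int dx = -j->r; dx <= j->r; dx++) {
                            int xx = x + dx, yy = y + dy, zz = z + dz;
                            if (xx < 0 || xx >= j->nx || yy < 0 || yy >= j->ny ||
                                zz < 0 || zz >= j->nz)
                                continue;            /* zero-pad the border */
                            acc += j->in[(zz * j->ny + yy) * j->nx + xx] *
                                   j->kernel[((dz + j->r) * k + (dy + j->r)) * k + (dx + j->r)];
                        }
                j->out[(z * j->ny + y) * j->nx + x] = (float)acc;
            }
    return NULL;
}

/* Split the volume into one slab of slices per CPU; since each output voxel
   is independent, the speedup is close to linear in the number of threads. */
void convolve3d_threaded(const float *in, float *out, const float *kernel,
                         int nx, int ny, int nz, int r, int nthreads)
{
    pthread_t *tid  = malloc(nthreads * sizeof(pthread_t));
    conv_job_t *job = malloc(nthreads * sizeof(conv_job_t));
    for (int t = 0; t < nthreads; t++) {
        job[t] = (conv_job_t){ in, out, kernel, nx, ny, nz, r,
                               nz * t / nthreads, nz * (t + 1) / nthreads };
        pthread_create(&tid[t], NULL, conv_worker, &job[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    free(tid);
    free(job);
}
```

For large kernels an FFT-based implementation (e.g. via FFTW) is preferable, as noted above; the thread-per-slab pattern, however, is the same one that applies to the other voxel-independent steps described in the following sections.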

4.3 Classification

4.3.1 k-NN Classification

Classification is a technique for the segmentation of medical images. The k-Nearest Neighbor (k-NN) classification rule is a technique for nonparametric supervised pattern classification. An excellent description of k-NN classification and its properties is provided in [Duda, 1973]. Each voxel is labeled with a tissue class selected from a set of possible tissue classes. The possible tissue classes are described, in k-NN classification, by selecting a set of typical voxels (prototypes) for each tissue type. Voxels of an unknown class are then classified by comparing the voxel intensity characteristics with those of the prototypes, and selecting the class that occurs most frequently amongst the k nearest prototypes.

The classification of each voxel is independent of neighboring voxels. As such, the most straightforward parallelization strategy is to apply the k-NN classification rule to several voxels at the same time, up to the number of CPUs available for computation. Speedup is linear in the number of CPUs. Our SMP implementation uses a POSIX threads based `work pile' to distribute the classification of chunks of the voxel data to each of the CPUs. We have applied this technique to the segmentation of MR scans of patients with brain tumors, MR scans of baby brains, MR scans of the knee, and MR scans of patients with multiple sclerosis.
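The per-voxel rule itself is simple; the sketch below shows it for a two-channel intensity feature vector. The structure names, the choice of two channels, and the brute-force search over all prototypes are assumptions made for the example; the production code uses optimized search (see [Warfield, 1996]) and distributes voxels through the thread work pile described above.

```c
#include <string.h>
#include <float.h>

typedef struct {
    float pd, t2;   /* two-channel intensity (e.g. PD- and T2-weighted) */
    int   label;    /* tissue class of this prototype                   */
} prototype_t;

/* Classify one voxel by majority vote among its k nearest prototypes,
   using squared Euclidean distance in the intensity feature space.
   Assumes k <= nproto and labels in [0, nclasses). */
int knn_classify(float pd, float t2, const prototype_t *proto, int nproto,
                 int k, int nclasses)
{
    char used[nproto];
    int  votes[nclasses];
    memset(used, 0, sizeof used);
    memset(votes, 0, sizeof votes);

    for (int n = 0; n < k; n++) {          /* pick the k nearest, one by one */
        int   best = -1;
        float best_d = FLT_MAX;
        for (int i = 0; i < nproto; i++) {
            if (used[i]) continue;
            float dpd = pd - proto[i].pd, dt2 = t2 - proto[i].t2;
            float d = dpd * dpd + dt2 * dt2;
            if (d < best_d) { best_d = d; best = i; }
        }
        if (best < 0) break;               /* fewer prototypes than k */
        used[best] = 1;
        votes[proto[best].label]++;
    }
    int win = 0;                           /* majority vote over the k neighbors */
    for (int c = 1; c < nclasses; c++)
        if (votes[c] > votes[win]) win = c;
    return win;
}
```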

4.3.2 EM Segmentation

EM segmentation is a method that iterates between conventional tissue classification and the estimation of intensity inhomogeneity to correct for imaging artifacts. We model intra- and inter-scan MRI intensity inhomogeneities with a spatially varying factor, called the gain field, that multiplies the intensity data. The application of a logarithmic transformation to the intensities allows the artifact to be modeled as an additive bias field [Wells, 1996]. If the gain field is known, then it is relatively easy to estimate the tissue class by applying a conventional intensity-based segmenter to the corrected data. Similarly, if the tissue classes are known, then it is straightforward to estimate the gain field by comparing predicted intensities and observed intensities. It may be problematic, however, to determine either the gain or the tissue type without knowledge of the other. We have shown that it is possible to estimate both using an iterative algorithm (which typically converges in five to ten iterations). The EM algorithm consists of a conventional classification step, an intensity prediction step, and an intensity correction step. Classification is parallelized by classifying different voxels simultaneously, as above. The same is done with the intensity prediction step. Intensity correction primarily involves low pass filtering. This is implemented with a parallel unity gain filtering step [Wells, 1986] that costs only two multiplies per voxel per axis, independent of filter length. We have applied this technique to the segmentation of MR scans of patients with schizophrenia [Iosifescu, 1997], multiple sclerosis [Kikinis, 1997], and normal volunteers [Morocz, 1995].
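The sketch below illustrates the structure of this iteration on a flat array of log-transformed intensities, with fixed Gaussian class models and a naive moving-average low pass standing in for the unity gain filter cascade of [Wells, 1986]. It is a simplified, hypothetical rendering of the scheme, not the SPL code; all names and the choice of smoothing are assumptions.

```c
#include <math.h>
#include <stdlib.h>

/* Naive moving-average low pass; a stand-in for the cascaded uniform filters. */
static void lowpass(const double *in, double *out, int n, int radius)
{
    for (int i = 0; i < n; i++) {
        double s = 0.0; int cnt = 0;
        for (int d = -radius; d <= radius; d++)
            if (i + d >= 0 && i + d < n) { s += in[i + d]; cnt++; }
        out[i] = s / cnt;
    }
}

/* y: log intensities; bias: estimated additive bias field (output);
   mean/var: per-class Gaussian models (nclass <= 16 assumed); niter iterations. */
void em_bias_correct(const double *y, double *bias, int n,
                     const double *mean, const double *var,
                     int nclass, int niter)
{
    double *resid = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) bias[i] = 0.0;

    for (int it = 0; it < niter; it++) {        /* typically 5-10 iterations */
        for (int i = 0; i < n; i++) {
            /* E-step: posterior class weights for the bias-corrected intensity. */
            double w[16], wsum = 0.0, pred = 0.0;
            for (int c = 0; c < nclass; c++) {
                double d = y[i] - bias[i] - mean[c];
                w[c] = exp(-0.5 * d * d / var[c]) / sqrt(var[c]);
                wsum += w[c];
            }
            for (int c = 0; c < nclass; c++)
                pred += (w[c] / (wsum + 1e-30)) * mean[c];
            /* Residual between observed and predicted intensity. */
            resid[i] = y[i] - pred;
        }
        /* Smooth the residual to obtain the new, slowly varying bias estimate. */
        lowpass(resid, bias, n, 15);
    }
    free(resid);
}
```

Both the posterior computation and the residual computation are independent per voxel, so they parallelize exactly like the classification step above.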

4.4 Linear Registration

Linear registration algorithms are typically used for the purpose of aligning several data sets of the same subject that contain complementary information (e.g. a CT and an MRI scan), see Figure 2. Another application is the initial alignment of a canonical data set and the data from a specific subject, as a preliminary step before non-linear registration (see Figure 3). The different algorithms that have been published in the literature [Warfield, 1998], [West, 1997] typically trade off speed (e.g. through feature extraction or subsampling) against robustness and capture range (e.g. simulated annealing). We have developed two different forms of linear registration, one suited to inter-patient registration and one suited to intra-patient registration.

4.5 Intra-patient Registration

The algorithm described here works with the concept of subsampling of the gray scale data for speed-up. Entropy calculations are performed in a histogram feature space. The algorithm is relatively fast and doesn't require any preprocessing of the data. However, it requires a start pose that is relatively close to the final result. In practical terms, the operator will pick three paired landmarks and the algorithm will then calculate an alignment to subvoxel accuracy. Alignment is assessed by using inherent contrast similarity to directly measure image alignment. The algorithm requires entropy and joint entropy computation. Mutual information is defined in terms of entropy [Wells, 1996a]. The first term is the entropy in the reference volume. The second term is the entropy of the part of the test volume into which the reference volume projects; it encourages transformations that project the reference volume into complex parts of the test volume. The third term, the (negative) joint entropy of the reference and test volume, contributes when they are functionally related. We use a histogram based density estimate for the joint entropy estimation. The joint histogram computation is parallelized by dividing the data into chunks, computing the histogram of each chunk, and then adding the histograms together. The joint entropy can then be calculated by a loop over the histogram. This algorithm is used for the registration of acquisitions with different contrasts into multichannel data sets for better segmentation and visualization. Examples include image analysis in neonates (T2/PD - SPGR) and surgical planning (MRA, SPECT, fMRI, MRI, CT; see Table 3 for references), see Figure 2.
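A minimal sketch of the joint-histogram entropy estimate is shown below. The bin count, the assumption that intensities are pre-scaled to [0, 1), and the serial accumulation are illustrative choices; in the parallel version each chunk of samples is histogrammed independently (per thread or MPI rank) and the partial histograms are summed before the entropy loop.

```c
#include <math.h>
#include <string.h>

#define NBINS 32

/* a, b: co-registered intensity samples scaled to [0, 1), n samples.
   Returns the joint entropy H(A,B) in nats. */
double joint_entropy(const float *a, const float *b, int n)
{
    long hist[NBINS][NBINS];
    memset(hist, 0, sizeof hist);

    for (int i = 0; i < n; i++) {
        int ia = (int)(a[i] * NBINS), ib = (int)(b[i] * NBINS);
        if (ia < 0) ia = 0;                  /* clamp out-of-range samples */
        if (ib < 0) ib = 0;
        if (ia >= NBINS) ia = NBINS - 1;
        if (ib >= NBINS) ib = NBINS - 1;
        hist[ia][ib]++;                      /* accumulate the 2D histogram */
    }
    double h = 0.0;
    for (int ia = 0; ia < NBINS; ia++)
        for (int ib = 0; ib < NBINS; ib++)
            if (hist[ia][ib] > 0) {
                double p = (double)hist[ia][ib] / n;
                h -= p * log(p);             /* H = -sum p log p */
            }
    return h;
}
```

The marginal entropies needed for the mutual information can be computed from the row and column sums of the same histogram, so one pass over the data suffices per candidate transform.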

4.6 Inter-patient Registration

The basic assumption of the MI algorithm described in the previous section is that we have two data sets containing the same structure. In a situation where we are trying to align two data sets of different subjects, this assumption is not valid. The metrics used conventionally for assessing the quality of alignment of two data sets, such as error minimization in a sparse subsampling of a data set, do not work satisfactorily. Dense feature comparison turns out to be more robust than sparse feature comparison in such situations. Parallelization is used to speed up the dense feature comparisons, making the application of this technique practical in a clinical context, see Figure 3. The idea is to generate segmentations of the patient scans to be aligned, measure the mismatch of alignment by counting the number of voxels whose labels don't match, and find the transform that minimizes the mismatch. Figure 3 shows a flowchart describing the registration process.

Figure 2: Merging of two data acquisitions from the same subject. Left: brain surface, viewed from the back, as extracted from a T1 weighted MR scan. Right: enlarged detail from the center of the image. The vessels, which are represented in a dark color, have been merged using the MI algorithm. They fit very well into the existing grooves in the brain surface.

Figure 4: Atlas to patient initial alignment. Even after successful execution of the registration algorithm there is a remnant of misalignment, which is due to differences in shape.

Each scan to be registered is classified and a multiresolution pyramid of the classified scan is constructed. An initial alignment is selected as either the identity transform or the transform identified with the process described below. For each level of the pyramid, the optimum alignment is determined by minimizing the mismatch of corresponding tissue labels. Each evaluation of this mismatch is computed in parallel on a cluster of SMPs. The evaluation of a particular transform involves the comparison of aligned data in a two step process.

First, the moving data set is resampled into the frame of the stationary data set. Second, the label values are compared voxel by voxel. Each of these steps can be parallelized by carrying out the operations simultaneously on different subsets of voxels in the frame of the stationary data set. This algorithm was initially developed for inter-patient registration, such as the initial alignment for template driven segmentation (TDS). TDS is used in many applications, such as the quantitative analysis of MS, brain development, schizophrenia, and rheumatoid arthritis. More recently, we have begun to utilize the algorithm for intra-patient alignment when a large capture range is needed.
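The cost function being minimized is easy to state in code. The sketch below evaluates the label mismatch for one candidate transform using nearest-neighbour resampling; the 4x4 homogeneous matrix convention, the identifier names, and the serial voxel loop are assumptions for illustration. In the cluster implementation the fixed volume is split into chunks that are evaluated on different CPUs or nodes and the partial counts are summed.

```c
#include <math.h>

typedef struct { double m[4][4]; } xform_t;   /* maps fixed coords to moving coords */

/* Count the voxels whose tissue labels disagree after resampling the moving
   label map into the frame of the fixed (stationary) label map. */
long label_mismatch(const unsigned char *fixed, const unsigned char *moving,
                    int nx, int ny, int nz, const xform_t *T)
{
    long mismatches = 0;
    for (int z = 0; z < nz; z++)
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++) {
                /* Map the fixed-frame voxel centre into the moving volume. */
                double mx = T->m[0][0]*x + T->m[0][1]*y + T->m[0][2]*z + T->m[0][3];
                double my = T->m[1][0]*x + T->m[1][1]*y + T->m[1][2]*z + T->m[1][3];
                double mz = T->m[2][0]*x + T->m[2][1]*y + T->m[2][2]*z + T->m[2][3];
                int ix = (int)floor(mx + 0.5), iy = (int)floor(my + 0.5),
                    iz = (int)floor(mz + 0.5);
                unsigned char mlabel = 0;                  /* background outside */
                if (ix >= 0 && ix < nx && iy >= 0 && iy < ny && iz >= 0 && iz < nz)
                    mlabel = moving[(iz * ny + iy) * nx + ix];
                if (mlabel != fixed[(z * ny + y) * nx + x])
                    mismatches++;                          /* XOR-style disagreement count */
            }
    return mismatches;
}
```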

Figure 3: Flowchart describing the registration process. The imaging data is converted to a multiresolution pyramid of tissue labels, and at each level of the pyramid a registration transform is estimated. The computation of the mismatch between tissue labels is implemented on a cluster of SMPs.

Figure 5: Example of the use of nonlinear registration for surgical procedures. The left image shows a large tumor adjacent to the brain area that controls the motor functions of the body (the so-called motor cortex). The right image shows the corticospinal tract extracted from a digital brain atlas [Kikinis, 1996] and warped into the expected position. This can be used by a surgeon to assess the location of critical areas during the planning of a surgical procedure.

4.7 Non-linear registration

Local shape differences between data sets can be identified by finding a 3D deformation field that alters the coordinate system of one data set to maximize the similarity of local intensities with the other. Elastic matching aims to match a template, describing the anatomy expected to be present, to a particular patient scan, so that the information associated with the template can be projected directly onto the patient scan on a voxel to voxel basis. The template can be an atlas of normal anatomy (deterministic or probabilistic), a scan from a different modality, or a scan from the same modality. The template can contain information typically found in anatomical textbooks, but unlike normal textbooks, it can be linked to any form of relevant digital information. For elastic matching, we are using an approach that is similar in concept to the work reported by Bajcsy and Kovacic [Bajcsy and Kovacic, 1989] and [Collins and Evans, 1992]. However, our implementation uses several algorithmic improvements to speed up the processing, including a multiresolution approach with fast local similarity measurement and a simplified regularization model for the elastic membrane [Dengler and Schmidt, 1988]. Our matcher, which is implemented in C, is based on essentially the same algorithm as that implemented by Dengler in APL, with a few improvements and modifications. Our implementation uses algorithms parallelized for SMP, such as low pass filtering, upsampling and downsampling, arithmetic operations, and solving systems of equations. Nonlinear registration is primarily used for incremental alignment in TDS, following the linear alignment step; it is an integral part of the TDS processing network. In addition, non-linear registration has a role in intra-patient registration where the patient's anatomy has moved (change of position), see Figure 5.

4.8 Visualization

4.8.1 Surface model generation

Visualizing the surface of structures by simulating light reflection requires the generation of models by segmentation. The data is segmented into binary label maps, and a surface model generation pipeline is applied, consisting of the marching cubes algorithm for triangle model generation, followed by triangle decimation and triangle smoothing to reduce the triangle count. The algorithm is parallelized by distributed computation of the triangle models for each structure of a data set. Efficient triangle model generation has been used for the visual verification of segmentation procedures and for visualization for surgical planning and navigation [Nakajima, 1997], [Nakajima, 1997a], [Kikinis, 1996].

4.8.2 Volume rendering

Visualization of structures without the need for the extensive preprocessing required by the surface model approach can be done using volume rendering. This is of benefit if the structures to be visualized are constantly changing. Ray casting and shear warp algorithms are among the most popular approaches for volume rendering. Among others, we have used a shear-warp algorithm implemented on a CM-200 [Saiviroonporn, 1998]. The algorithm is parallelized by applying the light transmission model simultaneously to different sections of the data associated with different screen pixels. Applications include visualization of data before segmentation, visualization of the magnitude of vector fields, and interactive editing of volume data.

Figure 6: Flowchart showing a network of processing modules.

4.9 Clinical Applications of HPC in MIA

In the majority of clinical cases, the algorithms that have been discussed are not deployed in isolation, but rather as an iterative network, as displayed in Figure 1.

4.9.1 Segmentation of scans of patients with multiple sclerosis

A specific example for the application of the technology discussed above is the quantitative analysis of MRI in patients with multiple sclerosis (MS). MS is a disease of the white matter of the brain and spine which affects over 300,000 patients in the US alone (ca. 1 per 1000). The patients suffer from this disease for decades. Typically, the patient will have periods of relatively little change which are followed by periods where they perceive more symptoms. While we don't have a good way to treat the cause of the illness, there are potent medications available to treat individual attacks of the disease. Unfortunately, these treatments all have severe side effects and cannot be applied on a permanent basis. The physicians are therefore faced with the sometimes difficult decision as to when to apply those treatments. MRI offers a direct visualization of the lesions caused by MS in the white matter of the brain. Quantitative measures based on the analysis of MRIs of MS lesions are therefore an objective measure for the state of the disease. Such quantitative measures can be used to assess the progression or regression of the MS lesions under treatment.

While radiologists have no trouble recognizing lesions, they use anatomical knowledge to identify the white matter and then look at the changes in signal intensity in the white matter of the brain. The problem that image processing algorithms are facing is the fact that gray matter of the brain and white matter lesions have overlapping signal intensity properties. In order to achieve the same approach using image processing methods, we need to generate a mask of the white matter. We have developed an algorithm that can achieve this by mapping a digital atlas from a normal subject into the patient's scan [Warfield, 1996]. Figure 6 provides an overview of the network of processing modules that were used to obtain the automated segmentation which is displayed in Figure 7. For a detailed description see [Warfield, 1996]. First, the data gets filtered with a feature enhancing filter for noise reduction (see above, [Gerig, 1992]). Then, an initial classification is performed, based on signal intensity properties, using the EM algorithm (see above). A generic digital brain atlas [Kikinis, 1996] is warped into the patient data set using our non-linear warping algorithm (see above). A fast region growing algorithm uses different criteria to reliably identify the neocortical gray matter and the deep gray matter. Criteria for the neocortex are the probability of location of neocortical gray matter from the warped atlas, signal intensity properties from the classification step, and the fact that the neocortical gray matter has the topology of a crumpled sheet. This allows us to generate a mask of the white matter of the brain and to search for white matter lesions in that area. The final result of this processing system is a quantitative measure of disease progression derived from imaging data. To date, we have applied this system to over 1500 MRI scans.

Figure 7: Single slice out of a brain-covering MR acquisition in a patient with MS. The left image is a proton density weighted image, the center image is T2 weighted. The right image shows the results of the segmentation system: the skin and skull have been removed, the white matter of the brain is a bright yellow, the gray matter is a dark gray, the lesions are colorized in a reddish hue, and the cerebro-spinal fluid is represented in two different shades of blue. This result was obtained automatically.

4.9.2 Bone segmentation from CT

Surgery of the musculoskeletal system is the fourth largest surgical procedure category. Computer aided image guidance for the planning prior to such surgical procedures and intraoperative navigation during intervention is of increasing importance. To successfully leverage the higher quality and quantity of imaging in minimally invasive scenarios, image information must be provided to the surgeon in a non-overwhelming manner, and without increasing the demand on the surgeon's navigation skills [Jolesz, 1992]. A prerequisite for full-fledged image guidance is the availability of accurate and robust methods for the segmentation of bone. The current state of the art for the identification of bone in clinical practice is thresholding, a method which is simple and fast. Unfortunately, thresholding also produces many artifacts. This problem is particularly severe for thin bones such as in the sinus area of the skull. Another area where current techniques often fail is the automatic, reliable and robust identification of individual bones, which requires precise separation of the joint spaces.

3D renderings of bone that are based on thresholding are currently available on most state-of-the-art CT consoles. As mentioned, using only thresholding leads to suboptimal and unsatisfying results in the vast majority of cases. It is seldom possible to automatically separate the different bones adjacent to a given joint (e.g. femur and pelvis) or different fracture fragments. Thus, it is not possible to display the different anatomical or pathoanatomical components of a joint or fracture separately. A 3D visualization of a fractured wrist, for example, is useless unless each bone and each fragment can be displayed and evaluated separately. Similar problems exist in areas with very thin bone, such as the paranasal sinuses and around the orbits. However, the signal intensities are reversed in the two example scenarios (joint spaces are dark and thin bone is bright in CT data). Accordingly, 3D reconstruction for craniofacial surgery will benefit from improved segmentation results.

Here we use local 3D structure for segmentation [Westin, 1997]. A tensor descriptor is estimated for the neighborhood of each voxel in the data set [Knutsson, 1989]. The tensors are created from a combination of the outputs from a set of 3D quadrature filters. The shape of the tensor describes locally the structure of the neighborhood in terms of how much it is like a plane, a line, and a sphere. Traditional methods are based purely on gray-level value discrimination and have difficulties in recovering thin bone structures due to so called partial voluming, a problem which is present in all such sampled data. The segmentation is based on the degree to which a given 3D image neighborhood is planar. Sampling theory shows us that partial voluming artifacts can be overcome by resampling the image using a new signal basis, one which more closely resembles the signal. A tensor description formed by combining the outputs of a set of orientation selective quadrature filters provides us with just such a signal decomposition. In 3D, we can interpret three simple neighborhoods from the symmetric tensor: a planar, a linear and an isotropic case. This analysis, when used to perform adaptive thresholding of CT data, has been very effective in recovering thin bone structures which would otherwise be lost, see Figure 8.

Figure 8: Result of segmentation of CT. Top left shows a cut through the skull indicating the location of the slice of interest. Top right shows the gray-level image. Lower left shows the segmentation result from simple thresholding. Lower right shows the result from adaptive thresholding using local shape information. Note that many of the thin bone structures in the sinus areas, which disappear with simple thresholding, can be recovered [Westin, 1997].
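As a hedged illustration of the final step, the sketch below applies a shape-adaptive threshold: where a precomputed planarity measure indicates a plane-like neighborhood (thin bone attenuated by partial voluming), the CT threshold is lowered. The function and parameter names are hypothetical, and the computation of the planarity measure from the quadrature-filter tensor field [Westin, 1997] is assumed to happen elsewhere.

```c
#include <stddef.h>

/* ct:       CT intensities, nvox voxels
   c_plane:  planarity measure per voxel in [0, 1], assumed precomputed
             from the local structure tensor (higher = more plane-like)
   bone_mask: output, 1 where the voxel is accepted as bone
   t_high:   threshold used in isotropic neighborhoods
   t_low:    relaxed threshold used in strongly planar neighborhoods */
void adaptive_threshold(const float *ct, const float *c_plane,
                        unsigned char *bone_mask, size_t nvox,
                        float t_high, float t_low)
{
    for (size_t i = 0; i < nvox; i++) {
        /* Interpolate the threshold between t_high and t_low according to
           how plane-like the local neighborhood is (thin bone sheets). */
        float t = t_high - c_plane[i] * (t_high - t_low);
        bone_mask[i] = ct[i] >= t;
    }
}
```

Since the decision is again independent per voxel, this step parallelizes with the same slab or work-pile strategies used for the filtering and classification steps.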

4.10 Speedups achieved

The speedups obtained depend on many different issues: whether an algorithm runs on a single machine (SMP) or on a cluster, whether the data fits into memory, and how intensive the communication aspects are. We have found that the combination of threading and MPI is currently not well supported by commercial environments (although this will change soon). Nevertheless, we are able to achieve close to linear speedups in most cases. In some communication intensive applications (e.g. the interpatient linear registration algorithm [Warfield, 1998]) we will need to wait for an improved network setup before we can achieve optimal utilization of our cluster.

5 Discussion

As was demonstrated in the examples taken from work at the SPL, HPC has many uses in MIA. These uses go far beyond the simple speed-up of algorithms that could be explored on workstations. We have discussed the use of different parallelization approaches such as threading, cluster computing using MPI, and more specialized implementations on SIMD and NUMA architectures. The individual algorithms can be put together into processing networks which can be run automatically or in a supervised fashion. We have made extensive use of HPC concepts and technology for many years, beginning with work on SIMD machines and progressing over time to portable code that incorporates both threading and clustering concepts. In our experience, HPC is an enabling technology that has allowed us to ask questions that would not have been possible otherwise. Beyond its use as a tool for research, HPC will become an important way to make the results of image processing techniques available to a larger percentage of the medical community. The development of concepts and algorithms in medical image analysis and their reduction into practice requires multiple years of work. Conversion of research results into commercial applications requires several years in addition. This is due to safety issues and regulatory requirements specific to the medical field. For commercial applications it is typically necessary to rewrite the software completely for compliance with FDA requirements. It takes on the order of 5-10 years until the performance of a given HPC machine becomes available in a desktop computer. Doing the initial work in an HPC environment allows us to explore the potential of the desktop hardware of tomorrow with today's tools. In this context, HPC has the potential to be an enabling technology for the development of crucial software tools for the analysis of imaging data.

6 Conclusion

As we have discussed and demonstrated using examples, there are several ways in which HPC can be used in medical image analysis. There is the global trend towards computationally more demanding algorithms, more data and more people interested in this type of work (see Table 1), and there is the evolution of individual algorithms from conception to routine clinical use (see Table 2). Both of these trends open opportunities for the application of HPC. We strongly believe that, in the near future, the use of HPC techniques will increase significantly in the field of medical image analysis. Availability and accessibility of HPC infrastructure and applications will be of critical importance in this development.

7 Acknowledgements

We would like to thank Marianna Jakab for editorial help. Charles Guttmann and Marianna Jakab provided the MS images. S. Wells and D. Gering provided the linear registration examples. M. Kaus provided the elastic registration examples. R. K. was supported in part by the following grants: NIH: RO1 CA 46627-08, PO1 CA67165-01A1, PO1 AG04953-14, NSF: BES 9631710 Darpa: F41624-96-2-0001, S. W. was in part funded by The National Multiple Sclerosis Society and C-F W. by the Wenner-Gren Foundation.

8 References

Bajcsy, R., and S. Kovacic. 1989. Multiresolution Elastic Matching. Computer Vision, Graphics, and Image Processing 46:1-21.

Chabrerie, A., F. Ozlen, S. Nakajima, M. E. Leventon, H. Atsumi, E. Grimson, E. Keeve, S. Helmers, J. Riviello Jr, G. Holmes, F. Duffy, F. Jolesz, R. Kikinis, P. McL. Black. 1998. Three-Dimensional Reconstruction and Surgical Navigation in Pediatric Epilepsy Surgery. In press in Pediatric Neurosurgery.

Chabrerie, A., F. Ozlen, S. Nakajima, M. E. Leventon, H. Atsumi, E. Grimson, F. Jolesz, R. Kikinis, P. McL. Black. 1998a. Three-dimensional Reconstruction for Low-grade Glioma Surgery. Neurosurg. Focus 4(4).

Clarke, L. P., R. P. Velthuizen, S. Phuphanich, J. D. Schellenberg, J. A. Arrington, and M. Silbiger. 1993. MRI: Stability of Three Supervised Segmentation Techniques. Magnetic Resonance Imaging, Vol. 11, pp. 95-106.

Collins, D., T. Peters, and others. 1992. Model Based Segmentation of Individual Brain Structures from MRI Data. In SPIE: Visualization in Biomedical Computing, 1808.

Cover, T. M. and P. E. Hart. 1967. Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, Vol. IT-13, No. 1, pp. 21-27.

Cover, T. M. 1968. Estimation by the Nearest Neighbor Rule. IEEE Transactions on Information Theory, Vol. IT-14, No. 1, pp. 50-55.

Dengler, Joachim, and Markus Schmidt. 1988. The Dynamic Pyramid - A Model for Motion Analysis with Controlled Continuity. International Journal of Pattern Recognition and Artificial Intelligence 2(2):275-286.

Duda, R. O. and P. E. Hart. 1973. Pattern Classification and Scene Analysis. John Wiley & Sons, Inc.

Friedman, J. H., F. Baskett and L. J. Shustek. 1975. An Algorithm for Finding Nearest Neighbors. IEEE Transactions on Computers, Vol. C-24, No. 10, pp. 1000-1006.

Frigo, Matteo and Steven G. Johnson. 1997. The Fastest Fourier Transform in the West. MIT-LCS-TR-728. To appear in the Proceedings of the 1998 International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, May 12-15, 1998.

Gerig, G., O. Kuebler, R. Kikinis, and F. A. Jolesz. 1992. Nonlinear Anisotropic Filtering of MRI Data. IEEE Trans. Med. Imaging 11(2):221-232.

Granlund, G. H. and H. Knutsson. 1995. Signal Processing for Computer Vision. Kluwer Academic Publishers. ISBN 0-7923-9530-1.

Huppi, P. S., S. Warfield, R. Kikinis, P. Barnes, G. P. Zientara, F. A. Jolesz, M. K. Tsuji, and J. J. Volpe. 1998. 3D Visualization and Quantitation of the Developing Human Brain In Vivo. Annals of Neurology, to appear.

Iosifescu, D., Martha E. Shenton, Simon K. Warfield, Ron Kikinis, Joachim Dengler, Ferenc A. Jolesz, and Robert W. McCarley. 1997. An Automated Measurement of Subcortical Brain MR Structures in Schizophrenia. Neuroimage, Vol. 6, pp. 13-25.

Jolesz, F. A. and F. Shtern. 1992. The Operating Room of the Future. Report of the National Cancer Institute Workshop, Imaging-Guided Stereotactic Tumor Diagnosis and Treatment. Investigative Radiology, Vol. 27(4):326-328.

Jolesz, F. A. 1997. Image-guided Procedures and the Operating Room of the Future. Radiology 204:601-612.

Kikinis, R., M. Shenton, F. Jolesz, G. Gerig, J. Martin, M. Anderson, D. Metcalf, C. Guttmann, R. W. McCarley, W. Lorensen, and H. Cline. 1992. Quantitative Analysis of Brain and Cerebrospinal Fluid Spaces with MR Imaging. JMRI 2:619-629.

Kikinis, R., P. L. Gleason, T. M. Moriarty, M. R. Moore, E. Alexander, P. E. Stieg, M. Matsumae, W. E. Lorensen, H. E. Cline, P. M. Black, and F. A. Jolesz. 1996. Computer Assisted Interactive Three-dimensional Planning for Neurosurgical Procedures. Neurosurgery 38(4):640-651.

Kikinis, R., M. E. Shenton, D. V. Iosifescu, R. W. McCarley, P. Saiviroonporn, H. H. Hokama, A. Robatino, D. Metcalf, C. G. Wible, C. M. Portas, R. Donnino, F. A. Jolesz. 1996. A Digital Brain Atlas for Surgical Planning, Model Driven Segmentation and Teaching. IEEE Transactions on Visualization and Computer Graphics, Vol. 2, No. 3.

Kikinis, R., C. R. G. Guttmann, D. Metcalf, W. M. Wells, G. J. Ettinger, H. L. Weiner, and F. A. Jolesz. 1997. Quantitative Follow-up of Patients with Multiple Sclerosis Using MRI. Part I: Technical Aspects. Radiology.

Knutsson, H. 1989. Representing Local Structure Using Tensors. In Proceedings of the 6th Scandinavian Conference on Image Analysis, pages 244-251, Oulu, Finland.

Lacroute, P., and M. Levoy. 1994. Fast Volume Rendering Using a Shear-Warp Factorization of the Viewing Transformation. Paper read at Annual Conference Series, ACM SIGGRAPH, Orlando, Florida.

Lorensen, W. E., and H. E. Cline. 1987. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. Computer Graphics 21(3):163-169.

Morocz, I. A., H. Gudbjartsson, T. Kapur, G. P. Zientara, S. Smith, S. Muza, T. Lyons, and F. A. Jolesz. 1995. Quantification of Diffuse Brain Edema in Acute Mountain Sickness Using 3D MRI. Paper read at Society of Magnetic Resonance, Nice, France.

Nakajima, Shin, Hideki Atsumi, Abhir H. Bhalerao, Ferenc A. Jolesz, Ron Kikinis, Toshiki Yoshimine, Thomas M. Moriarty, and Philip E. Stieg. 1997. Computer-assisted Surgical Planning for Cerebrovascular Neurosurgery. Neurosurgery 41(2):403-409.

Nakajima, S., H. Atsumi, R. Kikinis, T. M. Moriarty, D. C. Metcalf, F. A. Jolesz, P. McL. Black. 1997a. Use of Cortical Surface Registration for Image-Guided Neurosurgery. Neurosurgery, Vol. 40, No. 6, p. 1209.

Ozlen, Fatma, Shin Nakajima, Alexandra Chabrerie, Michael E. Leventon, Eric Grimson, Ron Kikinis, Ferenc Jolesz, Peter McL. Black. 1998. The Excision of Cortical Dysplasia in the Language Area with a Surgical Navigator: A Case Report. Accepted for publication in Epilepsia.

Saiviroonporn, Pairash, Andre Robatino, Janos Zahajszky, Ron Kikinis, Ferenc A. Jolesz. 1998. Real Time Interactive 3D-Segmentation. Acad. Radiol., Vol. 5, pp. 49-56.

Schroeder, W., J. Zarge, and W. Lorensen. 1992. Decimation of Triangle Meshes. Computer Graphics 26(2):65-70.

Taubin, G. 1995. A Signal Processing Approach to Fair Surface Design. Paper read at Computer Graphics.

Warfield, S., J. Dengler, J. Zaers, C. R. G. Guttmann, W. M. Wells, G. J. Ettinger, J. Hiller, and R. Kikinis. 1995. Automatic Identification of Grey Matter Structures from MRI to Improve the Segmentation of White Matter Lesions. Paper read at Proc. MRCAS '95, Baltimore, MD.

Warfield, Simon, Joachim Dengler, Joachim Zaers, Charles R. G. Guttmann, William M. Wells III, Gil J. Ettinger, John Hiller, and Ron Kikinis. 1996. Automatic Identification of Grey Matter Structures from MRI to Improve the Segmentation of White Matter Lesions. Journal of Image Guided Surgery 1(6):326-338. (http://www.igs.wiley.com/synopses/syn16.htm)

Warfield, S. 1996. Fast k-NN Classification for Multichannel Image Data. Pattern Recognition Letters, Vol. 17, No. 7, pp. 713-721.

Warfield, S., Ferenc Jolesz and Ron Kikinis. 1998. A High Performance Computing Approach to the Registration of Medical Imaging Data. Parallel Computing, to appear. URL: http://www.elsevier.nl/locate/parco.

Wells, W. M. 1986. Efficient Synthesis of Gaussian Filters by Cascaded Uniform Filters. IEEE Trans. Pattern Anal. Mach. Intell. 8:234-239.

Wells, W. M., W. E. L. Grimson, R. Kikinis, and F. A. Jolesz. 1996. Adaptive Segmentation of MRI Data. IEEE Transactions on Medical Imaging 15(4):429-442.

Wells, W. M., P. Viola, H. Atsumi, S. Nakajima, R. Kikinis. 1996a. Multi-Modal Volume Registration by Maximization of Mutual Information. Medical Image Analysis, Vol. 1, No. 1, pp. 35-51.

West, J., J. M. Fitzpatrick, M. Y. Wang, B. M. Dawant, C. R. Maurer, R. M. Kessler, R. J. Maciunas, C. Barillot, D. Lemoine, A. Collignon, F. Maes, P. Suetens, D. Vandermeulen, P. A. van den Elsen, S. Napel, T. S. Sumanaweera, B. Harkness, P. F. Hemler, D. L. Hill, D. J. Hawkes, C. Studholme, J. B. Maintz, M. A. Viergever, G. Malandain and R. P. Woods. 1997. Comparison and Evaluation of Retrospective Intermodality Brain Image Registration Techniques. J. Comput. Assist. Tomogr., Vol. 21, No. 4, pp. 554-566.

Westin, C.-F., A. Bhalerao, H. Knutsson and R. Kikinis. 1997. Using Local 3D Structure for Segmentation of Bone from Computer Tomography Images. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'97), San Juan, Puerto Rico.

Ylä-Jääski, J., F. Klein, O. Kübler. 1991. Fast Direct Display of Volume Data for Medical Diagnosis. CVGIP: Graphical Models and Image Processing, Vol. 53, No. 1, pp. 7-18.