A Matlab User Interface for the Statistically-Assisted Fluid Registration Algorithm and Tensor-Based Morphometry
Fernando Yepes-Calderon (b,c), Caroline Brun, Paul Thompson (a,*) and Natasha Lepore (a,b,*)
a University of Southern California, 900 W 34th St, Los Angeles, CA, USA; b Children’s Hospital Los Angeles, 4650 Sunset Blvd, Los Angeles, CA, USA; c Universidad de Barcelona, Carrer de Casanova 143, Barcelona, Spain; * equal senior author contribution
ABSTRACT
Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis are a nonlinear registration that aligns each individual scan to a common space, and a subsequent statistical analysis that determines morphometric differences, or differences in fiber structure, between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm, or SAFIRA [1], which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows statistical priors extracted from the studied populations to be included as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, all the more so because SAFIRA was initially conceived for command-line operation. Here, we introduce a new, intuitive, easy-to-use, Matlab-based graphical user interface for SAFIRA’s multivariate TBM. The interface also offers different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix and comparison of the full deformation tensors [2]. The interface will be freely disseminated to the neuroimaging research community.
Keywords: Image processing, non-linear registration, Lagrangian framework, statistically assisted registration
1. INTRODUCTION
In TBM, every brain in a data set is nonlinearly registered to a common space, which is either an individual image from the data set or an average of several of them. After registration, statistics are most commonly computed on the Jacobian determinant of the deformation fields. The result is a map showing where in the brain changes occur, together with the statistical significance of these changes. The purpose of these maps is to identify brain regions that differ between clinical groups, genders, age groups, etc., or regions that are genetically determined.
The two main parts of TBM are the non-linear image registration and the statistical analysis. Non-linear registration accurately matches subjects’ images to a common template and ensures that the same anatomical structures can be compared across subjects, down to the voxel level. Moreover, the amount of deformation required to match a subject’s image to the template is a measure of the deviation of that subject’s anatomy from the template. Such information is encoded in the Jacobian of the transformation at each voxel.
Recently, we have presented two main improvements to TBM. First, we introduced a new non-linear registration procedure named Statistically-Assisted Fluid Registration Algorithm, or SAFIRA [1]. SAFIRA brings the following capabilities to TBM: 1) fluid image registration, which allows diffeomorphic, large deformations; 2) minimization of the deformation tensors from the registration, so that the differences found can be attributed to dissimilarities between populations; and 3) use of prior information, which makes it possible to include information from the subject population in order to improve the quality of the registration. This is done through an energy dissipation term that varies across the image, depending on covariance matrices of the deformation obtained from an initial registration step.
Second, standard TBM analyses are traditionally performed on the determinant of the Jacobian matrix, which reflects local volume changes. A few years ago, we developed statistical analyses that take into account the full deformation tensors [2], which encode both the direction and magnitude of deformation, thus gaining anatomical insights previously unavailable. This increase in statistical power in TBM is of utmost importance for diagnosing diseases early, when changes in brain anatomy are still small, and for detecting the positive effects or toxicity of a drug on the brain as early as possible in drug trials. Here we implement a new, easy-to-use graphical user interface (GUI) for SAFIRA-based, multivariate TBM for dissemination to the neuroimaging research community.
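The two quantities discussed above, the Jacobian determinant and the full deformation tensor, can be illustrated with a short Python sketch. It is not part of SAFIRA or MASTIP: it assumes a 2D displacement field stored as a NumPy array, takes the Jacobian as J = I + grad(u), and uses S = (J^T J)^(1/2) as the deformation tensor, which is an assumption about the definition intended in [2]. The function name `jacobian_maps` is hypothetical.

```python
import numpy as np

def jacobian_maps(disp, spacing=1.0):
    """disp has shape (2, H, W): displacement components along axis 0 and axis 1.
    Returns the Jacobian determinant (H, W) and the tensor S = sqrt(J^T J) (H, W, 2, 2)."""
    dims = disp.shape[0]
    # grad[i, j] approximates d u_i / d x_j by finite differences
    grad = np.stack(
        [np.stack(np.gradient(disp[i], spacing), axis=0) for i in range(dims)],
        axis=0,
    )
    # Put the two matrix axes last and add the identity: J = I + grad(u)
    J = np.moveaxis(grad, (0, 1), (-2, -1)) + np.eye(dims)
    det_j = np.linalg.det(J)
    # Matrix square root of the symmetric matrix J^T J via its eigendecomposition
    w, v = np.linalg.eigh(np.swapaxes(J, -1, -2) @ J)
    root = np.sqrt(np.clip(w, 0.0, None))
    S = (v * root[..., None, :]) @ np.swapaxes(v, -1, -2)
    return det_j, S

# Toy field: a uniform 10% expansion along both axes
yy, xx = np.mgrid[0:8, 0:8].astype(float)
det_j, S = jacobian_maps(np.stack([0.1 * yy, 0.1 * xx]))
```

For this toy field every voxel has J = 1.1 I, so the determinant map is constant and S is isotropic; an anisotropic deformation would change S even where the determinant stays the same, which is the extra information the multivariate statistics exploit.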
2. SUMMARY OF SAFIRA
SAFIRA is based on the viscous fluid registration paradigm [3], in which the template is treated as a viscous fluid and flowed into agreement with the study. The advantage of fluid registration over the more commonly used elastic registration, where the template is treated as an elastic medium and pulled into agreement with the study, is that it allows large transformations of the images without shearing or tearing the image medium. This is useful in regions where the template is quite different from the study, as is the case when comparing brain images from different subjects in cross-sectional studies.
Our statistical analysis is ultimately performed on the deformation tensors that express the amount of distortion at a voxel due to the registration [2]. To obtain a registration algorithm that is consistent with this statistical analysis, in SAFIRA [1] we replaced the standard fluid regularizer, based on a Navier-Stokes equation, with a new fluid regularizer that regularizes over the deformation tensors and hence minimizes distortions in volume and shape at each voxel.
Another problem with these methods is that, in the absence of diffusion data or other information about the structure of the white matter, we do not have a physical model of how the brain deforms from one individual to another. Therefore, one common criticism of TBM and similar analyses based on non-linear registration of T1-weighted data is that the results may in part be the product of the particular registration code that was used. To remedy this problem, SAFIRA was implemented as a statistical registration algorithm that includes priors on the data. More precisely, we developed SAFIRA using a Lagrangian formalism for registration in which, as shown in [1], the standard registration equations are derived from an action that is the integral of the sum of a conservative Lagrangian plus non-conservative work:

$$
\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial q'} + F\cdot\frac{\partial r}{\partial q}
\equiv \underbrace{\nabla_q \mathrm{Cost}}_{V \,\simeq\, \frac{\partial L}{\partial q}}
- \underbrace{\tfrac{1}{2}\beta\,\|q'\|^2}_{T \,\simeq\, \frac{d}{dt}\frac{\partial L}{\partial q'_j}}
+ \underbrace{\alpha\,\mathrm{Reg}(q') + \tfrac{1}{2}\beta\,\|q'\|^2}_{NC \,=\, \alpha\nabla_{q'}\mathrm{Reg}(q') - \beta q'} = 0 \tag{1}
$$

where L is the Lagrangian, q the position, its derivative q' the velocity, T the kinetic energy, V the potential energy, and α and β two user-defined registration constants. In addition, the term F models a non-conservative force (NC) acting over the virtual displacement ∂r.
All subjects are first registered to the common space using the non-statistical version of SAFIRA. The covariances of the displacements and of the deformation fields are then computed at each voxel, and this information is reintroduced into the regularizer. A second registration is then performed with the new regularizer, starting from the original, non-registered images. The new statistical regularizer registers faster or slower depending on the local variability of the data. In this way the system can control the energy dissipation according to how likely a particular registration direction is. See [1] for more details of the method and its implementation.
The original version of our GUI uses the squared difference between intensities as a cost function, though other cost functions exist, such as mutual information or cross-correlation, which we will implement in the near future. The squared intensity difference works well for subjects of similar ages and for data acquired with a single scanner and acquisition protocol.
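The per-voxel statistics described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SAFIRA's implementation: the array layout, the function name `voxelwise_inverse_covariance` and the small ridge term `eps` added before inversion are all assumptions made for the example.

```python
import numpy as np

def voxelwise_inverse_covariance(disps, eps=1e-6):
    """disps has shape (N, H, W, D): the D-dimensional displacement of each of
    N subjects at every voxel. Returns inverse covariances of shape (H, W, D, D)."""
    n = disps.shape[0]
    centered = disps - disps.mean(axis=0)
    # Sample covariance at each voxel: average outer product over subjects
    cov = np.einsum("nhwi,nhwj->hwij", centered, centered) / (n - 1)
    # Assumption: a small ridge keeps the matrix invertible where the sample is degenerate
    cov += eps * np.eye(disps.shape[-1])
    return np.linalg.inv(cov)

# Toy data: 20 subjects, 4 x 4 voxels, 2D displacements
rng = np.random.default_rng(0)
inv_cov = voxelwise_inverse_covariance(rng.normal(size=(20, 4, 4, 2)))
```

Voxels where the population displacements vary little yield a large inverse covariance, which is the kind of quantity a statistical regularizer could use to dissipate more or less energy locally.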
3. METHODS: MASTIP ARCHITECTURE AND FEATURES
While the SAFIRA algorithm has been used successfully in the past [1], previous versions were difficult for non-expert users to run. The algorithm offers four choices of registration, depending on the statistical prior to be applied:
1. No statistics.
2. Statistics that include the inverse of the covariance matrix of the displacements.
3. Statistics that include the inverse of the covariance matrix of the strain tensors.
4. Both 2 and 3.
In addition, there is flexibility in the choice of similarity criterion for the cost function (intensity similarity, mutual information, etc.). Finally, in some applications one might prefer speed over accuracy, or wish to create a modified version of the regularization. To allow all of these possibilities within a simple framework, we have created MASTIP, a Matlab-based testing graphical user interface for SAFIRA-based TBM (see e.g. Fig. 1).
MASTIP exploits the ease of interpretation, indexing capabilities and extensibility of JavaScript Object Notation (JSON) structures. With the JSON parser, five structures are created:
1. The imListJObj object, which records paths and labels for all the image resources to be used in the system.
2. The spectsFunJObj object, which records the path and the input and output specifications of each registered function.
3. The pipelJObj object, which saves the order in which functions must be called in the pipeline and the list of variables the user must provide for execution.
4. The callseqJObj object, which records the calling sequences and the resources used, whether the pipeline fails or succeeds.
5. The globalsJObj object, which is used to create entry forms whose values are used at run time.
None of these structures stores graphics or complex data; their contents are only strings defining the types of data and absolute references to resources, so the structures are lightweight and transportable. Moreover, they can be parsed by any third-party script that follows the JSON standard. The user can create a portable version of the pipeline, which compresses the functions used, the JSON files, and a user-defined number of resources that serve as samples.
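To make the role of the five structures concrete, the Python sketch below builds one possible set of them and round-trips it through a standard JSON parser. Only the five object names come from the text; every field name, path and value is a hypothetical placeholder, since the actual MASTIP schema is not given here.

```python
import json

# Hypothetical contents: only the five top-level names are taken from the paper.
structures = {
    "imListJObj": [
        {"label": "subject01", "path": "/data/study/subject01.nii"},
    ],
    "spectsFunJObj": [
        {"name": "register", "path": "/code/register.m",
         "inputs": ["image", "template"], "outputs": ["warped"]},
    ],
    "pipelJObj": {"order": ["register"], "user_variables": ["alpha", "beta"]},
    "callseqJObj": [],
    "globalsJObj": {"alpha": {"type": "double", "default": "1.0"}},
}

# Everything is a string, list or mapping, so serialization is lossless
text = json.dumps(structures, indent=2)
restored = json.loads(text)
```

Because the structures hold only strings and references, any JSON-aware tool can read them back unchanged, which is what makes them easy to transport between machines and scripts.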
Figure 1. MASTIP running a SAFIRA registration process. The image manager allows multiple images to be loaded and labeled. The functions manager lets the user order functions in the pipeline and view the parameters of the current and previous functions. The output administration panel allows results to be visualized and saved. The user entries panel is dynamically loaded from the globalsJObj structure.
Figure 2. MASTIP’s flexibility is built around JSON structures, which are available throughout the whole pipelining operation. The functional blocks included in this simplified model ensure fault tolerance, pipeline provenance and argument coherence among the functions included in the pipeline. Some JSON updating functions have been omitted.
MASTIP has been created to test SAFIRA under different conditions and to provide the flexibility to modify all its variables and rapidly assess results. To this end, we use JSON-like containers to keep track of all the resources, user-defined variables and input information such as the covariance matrices. MASTIP inherits the flexibility, high degree of structure and extensibility of JSON to suit the needs of the SAFIRA implementation.
3.1 Pipeline specifications and MASTIP strategies
Several free and commercial packages assist in the creation of algorithmic pipelines. They can be classified into graphical-composition and command-line applications. The LONI pipeline [4], VisTrails [5], Triana [6], Galaxy [7], Taverna [8], Pegasus [9] and Kepler [10] are good examples of the first group, while packages such as Nipype [11] and matlabbatch∗ were initially conceived for developers who use a terminal and thus belong to the second group. All of the applications mentioned above clearly aim to enable parallel computing and efficient execution on clusters, which is why most of the effort invested in them concerns job synchronization and multicore processing. MASTIP instead targets usability as its primary goal; consequently, it is not directly comparable with any of these tools. Nevertheless, it is still a pipeline-assisting tool and can therefore be compared against the common specifications for this kind of framework. These requirements are listed below, together with the strategies MASTIP uses to meet them.
3.1.1 Jobs encoding
Job encoding defines the mechanism used to call functions into the pipeline and to ensure a high degree of coherence between the inputs and outputs of the current and preceding functions; this concept is better known as execution dependency. Solutions like DAGman† and Soma-Workflow [12] require the user to define dependencies explicitly. PSOM, Swift [13] and Nipype are based on futures [14], which create a list of dependencies at run time. In applications like Kepler, Triana, Taverna, VisTrails, Galaxy and the LONI pipeline, all the dependencies are defined by graphical abstractions. These applications also include nesting capabilities to encapsulate complex behaviors and keep the execution line as simple as possible.
In MASTIP, all the arguments are mapped to variables with the mnemonics vari[Fidx][arg#] and varo[Fidx][arg#]. For instance, the first and fifth arguments of function entry 1 in the pipelJObj are mapped to the variables vari11 and vari15, while the first output value of the same function is mapped to varo11, all in the same JSON object. This is done automatically by reading the function definition and the number of elements after the return clause in each job. In the same manner, user-defined entries are mapped to variables user[#], and the initial image is always mapped to the variable data.
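The naming scheme above can be sketched in Python as follows. This is an illustration under stated assumptions rather than MASTIP's code: the regular expression is a simplified reading of a Matlab function definition line, and the names `SIGNATURE`, `count_arguments` and `map_arguments` are hypothetical.

```python
import re

# Simplified pattern for "function [o1, o2] = name(i1, i2)",
# "function o = name(i)" and "function name(i)".
SIGNATURE = re.compile(
    r"^\s*function\s+"
    r"(?:\[?\s*(?P<outs>[^\]=]*?)\s*\]?\s*=\s*)?"
    r"(?P<name>\w+)\s*"
    r"(?:\((?P<ins>[^)]*)\))?"
)

def count_arguments(signature_line):
    """Return (number of inputs, number of outputs) declared on a definition line."""
    match = SIGNATURE.match(signature_line)
    if match is None:
        raise ValueError("not a Matlab function definition")
    def split(text):
        return [s for s in re.split(r"[,\s]+", text or "") if s]
    return len(split(match.group("ins"))), len(split(match.group("outs")))

def map_arguments(func_index, n_inputs, n_outputs):
    """Build the vari[Fidx][arg#] and varo[Fidx][arg#] names for one pipeline entry."""
    inputs = [f"vari{func_index}{k}" for k in range(1, n_inputs + 1)]
    outputs = [f"varo{func_index}{k}" for k in range(1, n_outputs + 1)]
    return inputs, outputs

n_in, n_out = count_arguments("function [warped, field] = register(image, template, alpha)")
inputs, outputs = map_arguments(1, n_in, n_out)
```

Here the hypothetical `register` function sits at entry 1 of the pipeline, so its three inputs become vari11 to vari13 and its two outputs varo11 and varo12.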
∗ http://sourceforge.net/apps/trac/matlabbatch/wiki
† http://research.cs.wisc.edu/condor/dagman/
Once this mnemonic scheme is in place, the system checks the coherence between the number of inputs required by the current function and those available in the framework. The feeding follows positional correspondence, and these relations are recorded in the pipelJObj object. The correctness of this argument dependency between contiguous functions is verified by the arg-coherence block in Fig. 2. Additionally, the mnemonic scheme allows experienced users to create indirect dependencies using the JSON editor, since the variable names make it easy to identify a function's origin, position and context.
3.1.2 Fault tolerance
Most of the solutions mentioned above provide at least two levels of fault tolerance. The first level takes care of error notifications, while the second allows the pipeline to be restarted at the job where the execution failed. A third level consists of testing whether the outputs are correctly generated and launching a job several times before labeling it as problematic. In MASTIP, error handling is kept simple by inheriting the notification capabilities of Matlab. As we do not deal with clustering or multi-core processing, errors are due to local problems and not to communication difficulties or lack of synchronization. MASTIP exploits this local scope of errors by adding debugging and step-by-step output capabilities. Initially, the spectsFunJObj object is automatically assigned a STOP label at the function entry where the argument coherence routine finds either an error or the end of the pipeline. The user can move the STOP label up, but is not allowed to move it down. The system runs the pipeline until it finds the STOP label, whether set automatically or manually, and then activates the visualization and saving functions for the outputs of the last function successfully evaluated in the pipeline.
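The coherence check and the STOP label can be sketched together in Python. The sketch assumes a deliberately simple model in which each job consumes the outputs of the previous job by position; the job dictionaries and the names `find_stop` and `run_until_stop` are hypothetical and do not reproduce MASTIP's internal layout.

```python
def find_stop(jobs, n_initial=1):
    """Index of the first job that needs more inputs than are available,
    or len(jobs) when the whole pipeline is coherent."""
    available = n_initial
    for idx, job in enumerate(jobs):
        if job["n_in"] > available:
            return idx
        available = job["n_out"]
    return len(jobs)

def run_until_stop(jobs, data, user_stop=None):
    """Run jobs in order up to the STOP label and return (stop index, last outputs)."""
    stop = find_stop(jobs)
    if user_stop is not None:
        stop = min(stop, user_stop)  # the label may move up, never down
    values = (data,)
    for job in jobs[:stop]:
        result = job["func"](*values[:job["n_in"]])
        values = result if isinstance(result, tuple) else (result,)
    return stop, values

def split(x):
    return x + 1, x * 2

def merge(a, b):
    return a + b

jobs = [
    {"func": split, "n_in": 1, "n_out": 2},
    {"func": merge, "n_in": 2, "n_out": 1},
    {"func": merge, "n_in": 2, "n_out": 1},  # incoherent: only one value is available
]
stop, outputs = run_until_stop(jobs, 3)
```

In this toy pipeline the third job asks for two inputs when only one is produced upstream, so the automatic STOP lands on it and the outputs of the second job are what the user would inspect or save.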
3.1.3 Pipeline Provenance
Tracking provenance is the cornerstone of the scientific method. Through this specification, the developers of any pipeline-assisting tool are encouraged to ensure reproducibility. MASTIP was designed with this requirement in mind from the start, and the callseqJObj was created for this purpose. In this object, all attempts are saved as text calling sequences, together with any errors that occurred. Note that this log is kept even when the pipeline fails, unlike similar provenance-aware software, where tracking is done only on success. This is possible due to the "fat-free" property of the JSON structures. When the "create portable" function is launched, all the JSON structures are compressed together with the outputs and resources folders. These two folders are filled with as much data as the user specifies for future support purposes.
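A minimal Python sketch of such a provenance log is given below. It records every attempt, successful or not, as a text calling sequence; the entry fields and the helper name `run_logged` are assumptions chosen for the example and are not the actual callseqJObj layout.

```python
import json

def run_logged(name, func, args, log):
    """Call func(*args) and append a text record of the attempt to log."""
    call = f"{name}({', '.join(repr(a) for a in args)})"
    entry = {"call": call, "status": "success", "error": ""}
    try:
        result = func(*args)
    except Exception as exc:
        # Failures are recorded too, so the log covers every attempt
        entry["status"] = "fail"
        entry["error"] = f"{type(exc).__name__}: {exc}"
        result = None
    log.append(entry)
    return result

def divide(a, b):
    return a / b

callseq = []
run_logged("divide", divide, (6, 3), callseq)
run_logged("divide", divide, (1, 0), callseq)
serialized = json.dumps(callseq, indent=2)
```

Since each entry is plain text, the log stays small and can be written to disk after every attempt, including the failed second call in this example.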
3.2 Execution sequence without exceptions
MASTIP is an end-user-oriented tool that presents all its controls on a single screen. This was done to shorten the learning curve while keeping the design simple. To start, the user must define a path that MASTIP will use as its repository. The user is then prompted to load the data, which can be either in a folder on the local computer or on a remote site to which access has been granted. Once a path to the data is set, the controls associated with the pipeline are activated. Here, the user can either use the predefined defaults or load functions, change the execution order, or define where the system must stop the execution. If at least one function is listed in the functions manager, the system enables the controls for execution. MASTIP verifies the input-output scheme of each listed function and whether it is coherent with the following or preceding functions. It also reads the user-defined values, for which entry fields are dynamically created from the globalsJObj. When a pipeline completes successfully, the system enables both the visualization and the saving of results. All these actions write to the JSON structures in the background, unless an expert user does so explicitly by activating the JSON editing functions. The "create portable" function creates a backup [timestamp] folder with as many samples as the user defines, together with the resulting outputs, the functions in the pipeline and modified versions of the JSON structures in which the absolute paths are replaced by relative paths. In the case of SAFIRA, MASTIP has to run two pipelines: one in which the target is created from the population data, and a second that runs the methods described in Section 3. Saved pipelines are loaded transparently by MASTIP, provided that the JSON structures exist and the resources recorded in those structures are available.
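The path rewriting performed when a portable copy is created can be sketched in Python as below. The sketch assumes POSIX-style absolute paths stored as plain strings and rewrites only those located under the repository root; the function name `make_portable` and the example paths are hypothetical.

```python
import posixpath

def make_portable(obj, root):
    """Return a copy of a JSON-like structure in which every string path
    located under root is rewritten relative to root."""
    if isinstance(obj, dict):
        return {key: make_portable(value, root) for key, value in obj.items()}
    if isinstance(obj, list):
        return [make_portable(value, root) for value in obj]
    if isinstance(obj, str) and obj.startswith(root.rstrip("/") + "/"):
        return posixpath.relpath(obj, root)
    return obj  # labels, types and other strings are left untouched

image_list = {
    "imListJObj": [
        {"label": "subject01", "path": "/home/user/study/resources/subject01.nii"},
    ]
}
portable = make_portable(image_list, "/home/user/study")
```

After this step the structure refers to resources/subject01.nii, so the backup folder can be unpacked anywhere and the recorded resources are still found.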
4. CONCLUSIONS
MASTIP is our response to the difficulties faced when trying to make a specialized utility available to users at all levels of expertise. Normally, when testing all the possibilities in SAFIRA regarding its statistical priors or the cost function, the user might need to go into the source code and modify its structure according to specific needs. MASTIP enables sequential testing, and its simple yet powerful debugging scheme allows the user to easily explore and find the SAFIRA version that best suits the needs of the project. This is particularly useful in registration tasks, where there is no rule of thumb for deciding which registration setup is better. MASTIP has become increasingly general, and we are now in the process of implementing our other registration algorithms through it. It is comparable to similar approaches such as the Pipeline System for Octave and Matlab (PSOM) [15], the Pipeline tool for Diffusion MRI (PANDA) [16] and many others found on the NITRC site (search string: matlab pipelines); however, those solutions prioritize bulk processing efficiency, while our software focuses on intuitiveness and code re-usability. We conceived MASTIP with the following core criteria: 1. Intuitive GUI, in order to support non-programmers; 2. JSON-based, so as to have great organizational flexibility and easy integration with other platforms; and 3. Intrusive, meaning that the platform keeps track of the calling sequences and outcomes. Moreover, the portable function paradigm allows for consistent packaging, so additional users receive a coherently arranged group of files and fully executable code with all the needed resources.
REFERENCES
[1] Brun, C. C., Lepore, N., Pennec, X., Chou, Y.-Y., Lee, A. D., Barysheva, M., de Zubicaray, G. I., McMahon, K. L., Wright, M. J., and Thompson, P. M., “Statistically assisted fluid image registration algorithm - SAFIRA,” 1, 2 (2010).
[2] Lepore, N., Chou, Y.-Y., Lopez, O. L., Aizenstein, H. J., Becker, J. T., Toga, A. W., and Thompson, P. M., “Fast 3D fluid registration of brain magnetic resonance images,” Proc. SPIE 6916, 69160Z–69160Z–8 (2008).
[3] Christensen, G. E., Rabbitt, R. D., and Miller, M. I., “Deformable templates using large deformation kinematics - image processing,” IEEE Transactions on Image Processing 5(10), 1435–47 (1996).
[4] Dinov, I., Van Horn, J., Lozev, K., Magsipoc, R., Petrosyan, P., Liu, Z., MacKenzie-Graham, A., Eggert, P., Parker, D. S., and Toga, A. W., “Efficient, distributed and interactive neuroimaging data analysis using the LONI pipeline,” Frontiers in Neuroinformatics 3(22) (2009).
[5] Callahan, S. P., Freire, J., Santos, E., Scheidegger, C. E., Silva, C. T., and Vo, H. T., “VisTrails: Visualization meets data management,” in [Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data], SIGMOD ’06, 745–747, ACM, New York, NY, USA (2006).
[6] Harrison, A., Taylor, I., Wang, I., and Shields, M., “WS-RF workflow in Triana,” Int. J. High Perform. Comput. Appl. 22, 268–283 (Aug. 2008).
[7] Goecks, J., Nekrutenko, A., and Taylor, J., “Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences,” Genome Biol. 1 (Aug. 2010).
[8] Oinn, T., Greenwood, M., Addis, M., Alpdemir, M. N., Ferris, J., Glover, K., Goble, C., Goderis, A., Hull, D., Marvin, D., Li, P., Lord, P., Pocock, M. R., Senger, M., Stevens, R., Wipat, A., and Wroe, C., “Taverna: Lessons in creating a workflow environment for the life sciences: Research articles,” Concurr. Comput.: Pract. Exper. 18, 1067–1100 (Aug. 2006).
[9] Deelman, E., Singh, G., Su, M.-H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G. B., Good, J., Laity, A., Jacob, J. C., and Katz, D. S., “Pegasus: A framework for mapping complex scientific workflows onto distributed systems,” Sci. Program. 13, 219–237 (July 2005).
[10] Ludascher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E. A., Tao, J., and Zhao, Y., “Scientific workflow management and the Kepler system: Research articles,” Concurr. Comput.: Pract. Exper. 18, 1039–1065 (Aug. 2006).
[11] Gorgolewski, K., Burns, C., Madison, Clark, D., Halchenko, Y., Waskom, M., and Ghosh, S., “Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in Python,” Front Neuroinform 22 (Aug. 2011).
[12] Laguitton, S., Rivière, D., Vincent, T., Fischer, C., Geffroy, D., Souedet, N., Denghien, I., and Cointepas, Y., “Soma-workflow: a unified and simple interface to parallel computing resources,” (Sept. 2011).
[13] Stef-Praun, T., Clifford, B., Foster, I., Hasson, U., Hategan, M., Small, S. L., Wilde, M., and Zhao, Y., “Accelerating medical research using the Swift workflow system,”
[14] Baker, H. G. and Hewitt, C. E., “The incremental garbage collection of processes,” ACM SIGPLAN Notices 12(8) (1980).
[15] Bellec, P., Lavoie-Courchesne, S., Dickinson, P., and Lerch, J. P., “The pipeline system for Octave and Matlab (PSOM),” Frontiers in Neuroinformatics 6 (2012).
[16] Cui, Z., Zhong, S., Xu, P., He, Y., and Gong, G., “PANDA: a pipeline toolbox for analyzing brain diffusion images,” (2013).