Journal of Structural Biology 178 (2012) 139–151
Dynamo: A flexible, user-friendly development tool for subtomogram averaging of cryo-EM data in high-performance computing environments

Daniel Castaño-Díez *, Mikhail Kudryashev, Marcel Arheit, Henning Stahlberg

Center for Cellular Imaging and Nano Analytics (C-CINA), Biozentrum, University of Basel, Mattenstrasse 26, CH-4058 Basel, Switzerland
Article history: Available online 8 January 2012

Keywords: Subtomogram averaging; Single Particle Tomography; High-performance computing; GPU computing; Classification
Abstract

Dynamo is a new software package for subtomogram averaging of cryo Electron Tomography (cryo-ET) data with three main goals: first, Dynamo allows user-transparent adaptation to a variety of high-performance computing platforms such as GPUs or CPU clusters. Second, Dynamo implements user-friendliness through GUI interfaces and scripting resources. Third, Dynamo offers user-flexibility through a plugin API. Besides the alignment and averaging procedures, Dynamo includes native tools for visualization and analysis of results and data, as well as support for third-party visualization software, such as UCSF Chimera or EMAN2. As a demonstration of these functionalities, we studied bacterial flagellar motors and showed automatically detected classes with absent and present C-rings. Subtomogram averaging is a common task in current cryo-ET pipelines, which requires extensive computational resources and follows a well-established workflow. However, due to the diversity of the data, many existing packages offer slight variations of the same algorithm to improve results. One of the main purposes behind Dynamo is to provide explicit tools that allow the user to insert custom-designed procedures – or plugins – to replace or complement the native algorithms in the different steps of the processing pipeline for subtomogram averaging, without the burden of handling parallelization. Custom scripts that implement new approaches devised by the user are integrated into the Dynamo data management system, so that they can be controlled by the GUI or the scripting capacities. Dynamo executables do not require licenses for third-party commercial software. Sources, executables and documentation are freely distributed on http://www.dynamo-em.org.

© 2012 Elsevier Inc. All rights reserved.
1. Introduction

1.1. Cryo Electron Tomography and subtomogram averaging

Cryo Electron Tomography (cryo-ET) images the three-dimensional (3d) structure of cells (Frank, 2006; Lucic et al., 2005). Samples are kept as close as possible to native conditions during their imaging in the Transmission Electron Microscope (TEM), and the final computational model that describes the three-dimensional structure of the cell is constructed by integrating different views of the same sample. Although electron microscopy at typical working conditions resolves features at the nanometer scale, the image quality of the computed 3d model is strongly degraded by different artifacts, such as the missing wedge, and by the low signal-to-noise ratio caused by the limited dose used to avoid radiation damage. As a consequence, while coarse features of the cell – such as membranes and filaments – can be recognized with the naked eye, a quantitative description of the structure of individual macromolecules cannot be reliably recovered without further processing.
The subtomogram averaging technique (Forster and Hegerl, 2007; Winkler, 2007) aims at overcoming these limitations by identifying and averaging the common signal that can be found in different noisy copies of the same macromolecular compound. The conceptual similarity with Single Particle techniques also shows in the use of cross-correlation as the main tool to align data particles to references of increasing quality. Subtomogram averaging is therefore also often referred to as "Single Particle Tomography" when the technique is applied to isolated particles (Schmid, 2011).

1.2. Software for subtomogram averaging

Subtomogram averaging is becoming an increasingly common tool for cryo-ET studies (Al-Amoudi et al., 2007; Beck et al., 2004; Brandt et al., 2009; Briggs et al., 2009; Liu et al., 2008; Medalia et al., 2002; Ortiz et al., 2010). Comprehensive software packages that cover a wide range of aspects in the analysis of tomographic data – such as acquisition, reconstruction, visualization, and particle picking – often incorporate support for subtomogram averaging, giving users the possibility to perform the full processing pipeline in the same environment. In this sense, the TOM package (Nickell et al., 2005) provides a framework for the tool AV3 (Forster and
Hegerl, 2007), and subtomogram averaging utilities of the package PEET (Nicastro et al., 2006) are available from Etomo, part of the IMOD package (Kremer et al., 1996). The jsubtomo package (Huiskonen et al., 2010) has been developed on the basis of Bsoft (Heymann et al., 2008), which incorporates its own subtomogram averaging utility (bfind), as does the Protomo package (Winkler, 2007). SPIDER's technique RAMOS (Rath et al., 2003; Renken et al., 2009) is also available. Other packages are specific to subtomogram averaging and address new algorithmic approaches. Bartesaghi et al. (2008) use a spherical harmonics formulation to accelerate numerically demanding computations, and Amat et al. (2010) use noise statistics models to increase the attained resolution. Finally, well-established packages that originated in the Single Particle field now include extensions for subtomogram averaging tasks. XMIPP includes the ml_tomo tool (Scheres et al., 2009), which implements a maximum likelihood approach for simultaneous alignment and classification of subtomograms (an approach also followed in Stolken et al., 2011), and EMAN2 (Tang et al., 2007) is now distributed with a specific module for Single Particle Tomography based on the algorithms described in Schmid and Booth (2008).

1.3. Dynamo overview

Many of the packages mentioned implement the same or a very similar pipeline to perform subtomogram averaging. However, they are not always easy to use, or they are not ready to benefit from available high-performance computing architectures. Finally, the best-performing processing approach may differ from sample to sample, so that it would be helpful to be able to try different algorithms easily. With this in mind, we created Dynamo as a new tool. Dynamo performs subtomogram averaging computations and supports the development of new algorithms. Dynamo combines graphical interfaces with command line capacities to address three main goals:

1.3.1. User friendliness

Dynamo offers users an easy way to design a full subtomogram averaging experiment. First-time users will get their first average in minutes. Experienced users can concentrate on the physical aspects of the design of the experiment, as the cumbersome operational and technical aspects are transparently managed by the package.

1.3.2. High performance

Dynamo provides a compact platform to design or run experiments in the whole range of high-performance computing (HPC) environments that are nowadays becoming available to an increasingly larger number of users. The software package allows easy installation and operation in clusters of parallel processors, multicore desktops or servers with one or several graphics processing units (GPUs), and also clusters of GPUs. The algorithm implementation has been adapted in each case in order to adjust the resource management to the different computing architectures and device communication protocols.

1.3.3. Flexibility

Dynamo is intended to be used not only as a closed package for subtomogram averaging, but also as a development tool. The package is structured in a modular form, allowing easy insertion or replacement of algorithmic steps. Specific software tools are provided to ensure that this adaptability does not come at the price of reduced performance.

Each of the following sections is devoted to one of these goals. Section 2 presents the software package, leaving for Section 3
and the Supplementary materials a more mathematically oriented introduction to the particularities of the algorithm. Section 4 discusses the performance of the package in different computing environments. Section 5 then details the mechanisms that allow a natural embedding of algorithms designed by the user into the native structure of the software. Finally, a case study on real data is presented in Section 6.

2. Software concept

Fig. 1 depicts the data processing steps that are required before beginning a subtomogram averaging experiment: alignment of tilt series, tomographic reconstruction and identification of approximate positions of particles of interest in the tomograms. The additional use of CTF correction procedures might be necessary towards high resolution (Fernandez et al., 2006; Kudryashev et al., 2012; Xiong et al., 2009). Dynamo assumes that these steps were undertaken previously by the user, who prepares a folder with all particle data, i.e., the subvolumes cropped from the tomograms.

2.1. Procedure through the GUI

The main GUI presents a unified view on all aspects of a project. It is shown in Fig. 2 along with a sketch of the basic procedure to compute an average from a set of particles. Dynamo divides user intervention into three operational steps (project setup, experiment design, and execution), which we briefly discuss in the following. Concrete guidelines and data formats are detailed in the manual, along with a walkthrough on an example synthetic data set intended to get the user started with Dynamo operations.

2.1.1. Project setup

The minimal input required from the user is the location in the file system of a folder that contains the particle data. Optionally, files containing other basic data (such as an initial reference, masks, or an initial guess for the alignment parameters) can be indicated by the user. If no such files are defined, the corresponding information will automatically be taken from default values.

2.1.2. Experiment design

The user indicates the number of desired iterations, and specifies parameters that determine exactly which operations will be carried out at each group of iterations. These parameters might control the refinement loop on all particles (angular range and sampling to scan, frequency filtering, symmetrization or resizing of particles), or might modify the subsequent operations (particle selection, classification, averaging) required to construct an updated reference. Additionally, parameters for user-defined functions inserted in the Dynamo flow can also be passed through the main GUI.

2.1.3. Execution

The user selects the computing environment intended to perform the calculations, detailing how many CPUs or GPUs will be dedicated to the project. With this information, Dynamo automatically converts the project designed by the user into a command script, which can then be submitted to the computing environment for execution, or stored as part of a more complex project to be launched later.

2.2. Command line procedure

The main GUI is the first contact point of new users with the software, providing a simple way to design and supervise an
Fig. 1. Preprocessing for subtomogram averaging. Processing of TEM data before beginning subtomogram averaging. Imaging the same object in a Transmission Electron Microscope from different orientations yields a tilt series. This is a stack of projection images that needs to be aligned and then reconstructed into a tomogram to recover three-dimensional information. The tomogram is then analyzed (visually or computationally) to locate the approximate positions of the copies of the macromolecular compound intended for averaging. The red boxes in the right panel indicate subvolumes of the original tomogram centered on the estimated positions, which will be cropped out of the tomogram for further analysis. A typical subtomogram averaging project might require identifying such data particles from a large number of tomograms. The set of all extracted subvolumes is the input data for a Dynamo project. Additional a priori information, for instance a rough estimate of the particle orientations, can also be passed to improve the performance of the algorithm.
experiment. As the user becomes more experienced, she or he is likely to devise increasingly complex experiments, which might not be suited for exclusive handling through the main GUI. This is the case for systematic tasks, such as designing and running similar projects that differ only in the values of some parameters. For this reason, the main functionalities of Dynamo are also provided as independent commands (MATLAB functions or Linux executables), which can be invoked by the user in his or her own scripts according to the command syntax specified in the manual.

2.3. Complementary utilities

Some additional utilities are accessible both from the main GUI and the command line.

2.3.1. Project I/O

Using the GUI, projects can be saved, browsed, reloaded, and/or edited. Once designed in the GUI, a stored project becomes a separate entity that is viewed by the system as an independent command. Specific tools are provided for later editing and insertion into more complex script constructions.

2.3.2. Error checking

Errors committed by a user while feeding a project with input parameters might crash the software at runtime, requiring analysis and debugging. Even worse, the program might complete the computation without crashing, which would then lead to wrong or unphysical results. To avoid unnecessary waste of user effort and computing resources, projects are automatically checked for different kinds of incorrect or incoherent formats or contents. A deeper, more time-consuming level of checking can be manually requested by the user. The different levels of error checking are described in the user's manual.

2.3.3. Time planning

The user can ask for an estimate of the total expected computing time. This allows adapting the project parameters to the available computing resources for efficient time management.

2.3.4. Help topics

Short descriptions of command functions and the meaning of parameters are available for all items in the GUI through its built-in message console. In the command line, the system inherits
the help function of the MATLAB Toolbox, in the form of a table of contents and linked help topics.

2.3.5. Classification tools

As compensation for the missing wedge/pyramid in Fourier space has been reported to determine the performance of classification methods (Bartesaghi et al., 2008; Forster et al., 2008; Heumann et al., 2011; Winkler et al., 2009), Dynamo adapts the set of MATLAB native tools for hierarchical clustering (Ward, 1963) and Principal Component Analysis (PCA) (Lebart et al., 1984) to the particular Fourier sampling geometry characteristic of tomography projects. These adapted tools are inserted into the pipeline of Dynamo projects.

2.3.6. Preprocessing tools

The package includes interactive tools for manual alignment of data sets, allowing the user to create, whenever possible, rough estimates of the initial positions and orientations of the particles.

2.3.7. Visualization tools

Graphical analysis of intermediate results is crucial for assessing the progress of an iterative algorithm and also for the correct interpretation of the final results. The importance of visual inspection has long been recognized in the field of Single Particle analysis, where it constitutes a well-established part of the typical working procedure, especially in the early stages of the iteration. Similarity of class averages and reprojections or angular coverage of the identified particles are regularly checked by the user to assess the quality of the last computed step and to gather information to decide on the next course of action. Dynamo includes different visualization tools specifically adapted to the needs of subtomogram averaging. The gallery view is the basic tool for simultaneous visualization of groups of density maps. Here the user can interactively execute different operations on selected sets of particles or averaged volumes: symmetrization, alignment of data particles according to the results of a given iteration, filtering, and simultaneous visualization of projections of selected ortho-slices along user-defined orientations. Conceptually, this tool combines a functionality typically required in Single Particle analysis, i.e., the visual exploration of sets of two-dimensional images in gallery form (as implemented by the v2 or v4 viewers in EMAN or the xmipp_show command in Xmipp), with the wider span of 3d operations required in the analysis of single tomograms
Fig. 2. Dynamo workflow. The Dynamo GUI. Double-headed arrows indicate how different parameter input areas in the GUI relate to the three basic operational steps: (area 1) defines a project, detailing the location of the involved density maps in the file system; (area 2) designs the experiment, with the set of parameters that determine the numerical treatment of the data, including user-defined plugins; (area 3) formats and executes the project in the intended computing platform. The area inside the green dashed rectangle connects the GUI with different viewing and analysis options. Other functionalities accessible from the GUI include time profiling, checking of project consistency, management of stored projects and access to onscreen help.
(as for instance implemented in the different viewers in the Etomo and TOM packages).
A further graphical tool, linked with the particle gallery viewer, facilitates the visual exploration and statistical analysis of the large
sets of numerical values generated during the iterative procedure. This tool exposes the interrelation between computed cross-correlation coefficients, classification results, particle orientations and other user-defined quantities in different depiction options such as scatter plots, linear or spherical histograms, or spatial distributions according to the positions in the original tomograms. Finally, Dynamo also includes wrappers for functionalities that are well covered by free software of common use in the EM community. In particular, UCSF Chimera (Pettersen et al., 2004) and EMAN2 provide fast and robust tools for isosurface representation of volumes. Dynamo generates macros that automate the use of those tools for visualization tasks that arise frequently in subtomogram averaging projects, such as a gallery depiction of the averages produced in successive iterations, or the location of particles in the tomograms.

2.3.8. Data management

The analysis of a single data set by subtomogram averaging techniques may produce large amounts of files, describing averages, geometrical configurations, intermediate results and other information such as FSC values, initial data and parameters, process logs or possibly classification result data sets. Typically, the user will run several experiments with different parameters and will then extract different pieces of information from them, which need to be related and analyzed to design further actions. Keeping an overview of all actions performed on a data set and the obtained results can be cumbersome. Dynamo organizes its file structure through a database, and includes tools that allow users to access files by explicit operations on this database. In interactive sessions, this means that all files can be readily accessed by a graphical browser or by command line tools. The database tools are most useful in the context of scripting, as they provide the user with a robust interface to handle data I/O when inserting his or her own scripts into Dynamo's native workflow.
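As a purely illustrative sketch of this concept (the function, file and field names below are hypothetical and do not reproduce Dynamo's actual database syntax), a user script could query the database for items of a given project and iteration instead of hard-coding file paths:

% Hypothetical database queries (illustration of the concept only):
% fetch the average and the alignment table produced at iteration 8 of project 'motor'
avg   = db_get('motor', 'average',   'iteration', 8);   % hypothetical call returning a 3d volume
table = db_get('motor', 'alignment', 'iteration', 8);   % hypothetical call returning per-particle parameters
% the same identifiers could then be used to feed results of a custom script back into the workflow
db_put('motor', 'selected_particles', find(table.cc > 0.3));   % hypothetical call and field name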
3. Basic algorithm

3.1. Iterative refinement

Iterative refinement based on cross-correlation optimization is the most common procedure in subtomogram averaging packages, and it is also the basis of Dynamo. We introduce some notation to discuss the general idea of this approach: the noise-free volumetric density map of the sought molecular structure p is considered to be a common underlying signal in the particles {d_i} of the experimental data set. In the most general case, each d_i is considered to be the result of applying to p: (i) an unknown rigid-body transformation (shift and rotation), (ii) some form of noise, and (iii) the loss of those Fourier components that were not experimentally available while reconstructing the particle (e.g. due to the so-called "missing wedge" effect). An iterative subtomogram averaging method creates a series of estimations q^(0), q^(1), ..., which approximate with increasing precision the underlying signal p. A generic single iteration of such a method is typically divided into (up to) three parts:

3.1.1. Refinement loop

All particles are visited independently by the algorithm and compared to the starting reference of the iteration. For each particle, the parameters of the geometrical transformation that optimizes its similarity to the reference are determined and stored. This step orients each particle so that it exhibits the components of a signal that is potentially coherent across the set.
3.1.2. Selection This step is necessary when data particles of lesser quality are to be discarded, or when the experimentalist does not have full certainty about the nature of the data sets and needs to cope with possible conformational heterogeneity or the presence of different particles. The goal of this part is the identification of those particles that do contribute coherently to a common signal.
3.1.3. Averaging Here, the goal is the enhancement of the signal through the summation of the coherent contributions that have been enforced and identified in the previous steps.
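The following MATLAB-style pseudocode is a minimal sketch of such a generic iteration. It is not Dynamo's implementation: rotate_volume, cc_with_shift and align_particle are placeholder helpers, and the selection rule is an arbitrary median threshold used only for illustration.

% Sketch of one generic refinement iteration (illustration only).
% particles: cell array of 3d volumes; ref: current estimation q^(k);
% angles: list of Euler triplets to scan.
nParticles = numel(particles);
bestScore  = -inf(nParticles, 1);
bestParams = cell(nParticles, 1);

% (3.1.1) Refinement loop: each particle is treated independently
for i = 1:nParticles
    for a = 1:size(angles, 1)
        rotRef = rotate_volume(ref, angles(a, :));              % placeholder: rigid-body rotation
        [score, shift] = cc_with_shift(particles{i}, rotRef);   % placeholder: FFT-based cross-correlation
        if score > bestScore(i)
            bestScore(i)  = score;
            bestParams{i} = struct('eulers', angles(a, :), 'shift', shift);
        end
    end
end

% (3.1.2) Selection: keep the particles that contribute coherently (simple threshold here)
keep = bestScore >= median(bestScore);

% (3.1.3) Averaging: sum the aligned, selected particles to update the reference
newRef = zeros(size(ref));
for i = find(keep)'
    newRef = newRef + align_particle(particles{i}, bestParams{i}); % placeholder: apply inverse transform
end
newRef = newRef / nnz(keep);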
3.2. Dynamo implementation

Dynamo's primary algorithm is described above. In addition to these basic operational principles, which are well known in the field of subtomogram averaging, some other, less widely used elements have been incorporated into the software's workflow. Their rationale stems from hands-on experiments and from qualitative principles derived from the mathematical formulation presented in the Supplementary material. Here, we just list and briefly discuss the most relevant points.

1. Use of locally normalized correlation. By default, Dynamo uses Roseman's fast scheme (Roseman, 2003) to constrain the computation domain to a chosen region of interest when comparing template and data particles.

2. Search for global optima at the refinement steps. During each refinement loop, Dynamo attempts to align the particles to the template by searching for the alignment parameters that deliver an optimum of the similarity between template and particle. In the context of the whole iterative procedure, this policy presents different numerical properties compared to approaches that skip optimality constraints and rather aim to improve at each iteration the results of the previous one.

3. Particle selection through double-thresholding or classification. Recognition and characterization of structural variability in the data set and identification of incoherent elements ("bad particles") is a frequent requirement to ensure a good alignment. Inversely, classification algorithms will perform better with well-aligned data, a chicken-and-egg problem well known in structural studies on electron microscopy data, both in Single Particle and Tomography applications. Dynamo reflects this interlacing of alignment and classification by incorporating in its workflow the computation of the covariance matrix of the aligned particles at the end of each alignment iteration. The use of this matrix as a basis for classification (using k-means on the PCA components) is also available in the pipeline of a single iteration. The user can then apply the algorithm customization tools of Dynamo to design a policy that uses the results of this pipeline to select, reject or regroup particles for the next iteration.

4. Use of different spatial masks for alignment and classification. While deeply intertwined, alignment and classification are different tasks, which exploit the signal in different ways and need to be driven differently. Dynamo incorporates this distinction in its workflow, distinguishing between regions of interest (classically defined by volumetric masks) relevant for alignment, and regions of interest relevant for classification.

These points are indicated in their operational context in Fig. 3, which depicts the flowchart of one single iteration of the algorithm.
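As a rough illustration of the idea behind point 1 (a sketch only, assuming circular shifts and ignoring the missing-wedge handling that a real implementation must add), a locally normalized cross-correlation map between a particle f and an already rotated template t, restricted to a binary mask m, can be obtained with three FFT-based correlations:

% Locally normalized cross-correlation over all circular shifts (illustrative sketch).
% f: particle volume, t: rotated template, m: binary mask defining the region of interest.
corrfft = @(a, b) real(ifftn(fftn(a) .* conj(fftn(b))));   % circular cross-correlation

M    = sum(m(:));                                    % number of voxels inside the mask
mu_t = sum(t(:) .* m(:)) / M;                        % template mean under the mask
sg_t = sqrt(sum(m(:) .* (t(:) - mu_t).^2) / M);      % template standard deviation under the mask
tp   = m .* (t - mu_t) / sg_t;                       % masked, normalized template

num  = corrfft(f, tp);                               % numerator for every shift
mu_f = corrfft(f, m) / M;                            % local mean of f under the shifted mask
sg_f = sqrt(max(corrfft(f.^2, m) / M - mu_f.^2, 0)); % local standard deviation of f

lncc = num ./ (M * sg_f + eps);                      % locally normalized correlation map
[bestScore, bestShift] = max(lncc(:));               % best score and linear index of the best shift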
4. High-performance computation
4.1. Motivation

Three-dimensional image processing for Electron Microscopy tends to demand large computational resources (Fernandez, 2008), and subtomogram averaging is undoubtedly a good example. This technique inherently relies on computationally intensive tasks, since large numbers (typically at least several hundreds) of three-dimensional density maps need to be analyzed, and the analysis of each one comprises a series of rotations and Fourier transforms of volumetric data. Moreover, the frequent requirement to complement the refinement procedure with classification techniques yields even more computationally intensive algorithms. Fortunately, many of these tasks can be parallelized to reduce the execution time. The need for high-performance computation (HPC) also stems from the necessity of interactive evaluation during a subtomogram averaging process. Since a completely objective approach is not yet available, the technique is still unlikely to produce results in a totally unsupervised manner, and correct interpretation of results typically requires comparing the output of several runs of the software on the data set with different parameters and algorithmic variations. This holds true especially when analyzing proteins embedded in a membrane, in a crowded cytosol, or stemming from thick cells, where degraded imaging conditions and overlap of objects can cause artifacts that require careful control of the results. For sample preparations containing isolated copies of the same macromolecule in vitrified ice, the situation is closer to the scenario of Single Particle techniques, where the need for user intervention is reduced, but certainly not absent.

4.2. Categories of high-performance computing systems

The popularity of HPC systems has grown constantly in recent years. The continuous increase of the computational power of individual processors (Moore's Law) and of the number of computing units typically integrated in large-scale computation installations has been accompanied by the introduction of new possibilities for high-performance computation, such as the GPU or the Cell processor (Xu and Thulasiraman, 2011). Moreover, access to these devices or facilities is becoming increasingly common, and for some of them the availability is almost universal:
4.2.1. Multicore machines

Desktops with twelve cores are becoming common, and eight-core machines are widely used. Laptops with four cores are becoming standard for new mid-range machines.

4.2.2. CPU clusters

Research institutions and universities frequently host parallel computation clusters of varying size. In addition, in many countries researchers can apply for computing time in large computation facilities that are supported by national research councils.

4.2.3. GPU and multiGPU accelerators

Regular desktops are customarily equipped with graphics cards that can be used to accelerate computations via Application Programming Interfaces (APIs) like CUDA (http://www.nvidia.com) or OpenCL. In addition, vendors also deliver GPU devices exclusively dedicated to computation. Newest generation GPUs include a full range of computing capabilities, aiming to bring the features of GPU-based numerical applications close to the familiar ones offered by CPUs, both at the
Fig. 3. Algorithm. Flowchart of a single iteration. The process of updating a reference involves three main stages. The first step is a computationally intensive refinement loop, where each particle is separately aligned to the reference. This individual alignment is performed by an internal loop on a set of Euler angles, comparing the particle against different rigid-body transformations of the reference. Upon completion of the refinement loop, a selection step interprets the results of the computed alignment to identify particles of higher quality for the last step, the update of the reference by averaging the selected particles. In the figure, elements with a colored underlay showcase particular aspects of the Dynamo flow that are extensively discussed in the text, along with the notation used.
hardware level (e.g. double-precision floating point calculation capacity) and the software level (e.g. extensive math libraries). The variety of such devices includes multi-GPU configurations, in which a single machine controls several GPUs. In the recent past, GPU computation has attracted significant attention as a cheap alternative to centralized CPU clusters. In Electron Microscopy, this interest has already translated into several packages, notably for Single Particle techniques (Li et al., 2010; Schmeisser et al., 2009; Tagare et al., 2010; Zhang and Zhou, 2010), but also in tomography for procedures such as iterative reconstruction (Castaño-Díez et al., 2007; Palenstijn et al., 2011; Xu et al., 2010; Zheng et al., 2011), markerless alignment of tilt series (Castano-Diez et al., 2010) and other generic three-dimensional image processing tasks (Castano-Diez et al., 2008; Gipson et al., 2011). In spite of this progress, the currently available range of GPU-enabled software packages is still modest in comparison to the CPU-based counterparts.

4.2.4. GPU clusters

Some supercomputing centers are complementing their "classical" CPU cores with additional cores attached to one or several GPUs. This interesting progress not only boosts the potential speed-up for an application that can be distributed among several GPUs, but also makes high-end GPU devices available to users who otherwise would not have the resources to buy or manage such devices.

The availability of high computation resources is rarely a limiting factor for their application. For practical purposes, a more common limitation is the availability and usability of adequate tools for particular tasks, as software has to be specifically adapted to each platform. This adaptation might include major code rewriting to adjust the original algorithms to the different architecture requirements, as well as the integration of communication protocols in the code.

4.3. Dynamo implementation

Dynamo follows different strategies to adapt to all supercomputing platforms described above. This divergence is transparent
to the user, who operates a unified interface for all computation targets. Internally, the package exploits two different ways in which the iterative algorithm described previously lends itself to acceleration; see the scheme in Fig. 4.

4.4. Coarse-grained parallelization: independent processing of particles

The loop on all particles to be aligned against a given reference falls into the category of "embarrassingly parallelizable" algorithms. This means that the bulk of the algorithm is composed of tasks that can be carried out without any communication among them. When starting a loop, Dynamo distributes the particles marked for further processing into equally populated sets, so that each set can be assigned to a different processing unit. This approach is valid for distributing tasks among different CPUs in a large-scale cluster, among the different cores in a server, among different GPUs in a "multiGPU system" (different GPUs mounted on a single node), or among different GPUs mounted on different nodes in a cluster (Xu and Thulasiraman, 2011).

For large-scale computing clusters of CPUs or GPUs, a syntax-parsing device has been provided to guide the user when adapting Dynamo to the particular syntax used by the queuing system that operates his or her cluster. This device analyzes a typical submission script formatted after the queuing system requirements and informs Dynamo about how to create its scripts accordingly. Examples and concrete steps are detailed in the Dynamo manual and the FAQ (Frequently Asked Questions) section of the Dynamo website. This device has been tested for PBS and SGE queuing systems. However, the syntax of the queuing system commands is very dependent on design decisions of the system administrator, which might translate into a requirement for minor adjustments to the parsing device. After completion of these steps, task distribution will happen transparently to the user. Internally, multicore computations are driven by the OpenMP interface (Chandra, 2001), while task distribution among the nodes of a cluster is harnessed through MPI (Message Passing Interface; Pacheco, 1997).
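A minimal sketch of this kind of static task splitting is shown below (illustration only, not Dynamo's actual code; align_subset is a placeholder for the worker-side alignment routine):

% Static distribution of particle indices into equally populated sets,
% one per processing unit (illustrative sketch).
nParticles = 1200;                          % particles marked for processing in this iteration
nWorkers   = 8;                             % CPU cores, cluster nodes or GPUs available

idx    = 1:nParticles;
owner  = mod(idx - 1, nWorkers) + 1;        % round-robin assignment; set sizes differ by at most one
subset = cell(1, nWorkers);
for w = 1:nWorkers
    subset{w} = idx(owner == w);            % indices handled by processing unit w
end

% Each subset can now be processed without communicating with the others,
% e.g. one MPI rank, one OpenMP thread or one GPU per subset:
% results{w} = align_subset(particles(subset{w}), ref, angles);   % placeholder call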
Fig. 4. Parallelization. Two setups for acceleration. At the beginning of a refinement loop, the data are distributed into sets that will be aligned separately. Dynamo uses different computational devices to control this level of parallel execution: MPI (large-scale clusters of CPUs or GPUs), OMP (multicore) or Pthreads (multiGPUs). Additionally, if GPU devices are available, they can be used to accelerate the alignment of each particle, typically by a factor of 10x–20x compared to an analogous computation on a single CPU.
4.5. Fine-grained parallelization: FFT acceleration

Further significant speed-up can be gained by using the GPU to accelerate the operations required to align a single particle. Different steps are involved in the process of rotating a reference and comparing it to a particle, but the most intensive computation is the three-dimensional Fast Fourier Transform (FFT3D), followed by the rotation step. The effective maximally attainable speed-up for the whole algorithm is thus given by the maximal speed-up that the combination of hardware accelerator, driver and algorithm can deliver for the FFT3D. While minor steps of the computation (maximum finding, scalar products, normalizations) can attain speed-up factors of up to 100x when moving the computation from the CPU of a desktop computer to its GPU card (Castaño-Díez et al., 2008), current FFT3D implementations in the CUFFT library delivered by NVIDIA for its CUDA computing interface typically attain speed-ups in the range of 7x–20x for the volume sizes typically found in subtomogram averaging applications (volumes with side lengths between 32 and 128 pixels), imposing an upper limit on the acceleration attainable in a single currently available GPU system. Particle volumes with a side length of 256 pixels or more are rarely reported.

Table 1 shows acceleration factors obtained using a multiGPU setting for particle boxes with side lengths of 32, 64 and 128 pixels, for one alignment round on 120 particles and different numbers of scanned orientations. The table reflects a general feature of the performance of a GPU application applied to data of different sizes. For small data sizes, the measured performance tends to decrease as the overhead of initiating the GPU computation grows comparable to the computations themselves. On the other hand, a large data size can also dampen the performance, as large amounts of data create difficulties when the application tries to lay out the GPU internal memory in an optimal way. In our tests under realistic conditions, the best speed-up factors arise for a side length of 64 pixels with the current GPU hardware and GPU interfacing protocols. From the developer's point of view, the design of the GPU application focuses on ensuring that the data flow among different parts of the algorithm does not create a bottleneck that would prevent the actual realization of the nominal 7x–20x acceleration factor (Castano-Diez et al., 2010). The Dynamo implementation recovers this speed-up factor for single GPUs and scales almost linearly with the total number of available GPUs, both for multiGPU devices (several GPUs mounted on a single server and connected through the CUDA thread management system) and for clusters of GPUs (different GPUs mounted on different computing nodes connected through MPI protocols).

Table 1. Speed-up factors for one single iteration on a data set of 120 particles, for different angular samplings and different particle sizes. L is the side length in pixels of each particle. The multiGPU times were obtained on a machine with three Nvidia C1060 GPUs; the CPU times were measured for a single thread on an AMD Phenom 9950 Quad-Core.

             200 orientations                   500 orientations                   4500 orientations
             CPU          MultiGPU    Speed-up  CPU          MultiGPU     Speed-up  CPU          MultiGPU     Speed-up
L = 32       7 min 2 s    16 s        26x       36 min 45 s  61 s         37x       2 h 38 min   4 min 11 s   39x
L = 64       1 h 2 min    1 min 8 s   54x       5 h 1 min    5 min 15 s   57x       22 h 28 min  23 min 32 s  57x
L = 128      6 h 38 min   17 min 15 s 23x       1 day 7 h    1 h 24 min   22x       6 days       6 h 23 min   23x

4.6. Classification

After completing a refinement loop on all the particles, Dynamo offers an option to compute and store the correlation matrix between the particles aligned according to the computed parameters. This matrix can be used for the different classification methods integrated in the Dynamo flow (PCA and Hierarchical Ascendant classification), or for the user's custom approaches. The classification part of the algorithm can become very computationally intensive, depending on the experiment parameters. The number of independent elements in the correlation matrix scales with the square of the number of particles: for a set of $N$ particles, the correlation matrix contains $(N^2+N)/2$ independent elements. If $\tau_{\mathrm{correlation}}$ is the time needed to compute a single element, the total time consumed by the computation of the correlation matrix scales as $T_{\mathrm{correlation}} \propto \frac{N^2}{2}\,\tau_{\mathrm{correlation}}$. For comparison, the total time consumed during the refinement loop is $T_{\mathrm{alignment}} = |K|\,N\,\tau_{\mathrm{alignment}}$, where $\tau_{\mathrm{alignment}}$ is the time needed to compute the similarity value between the reference and one particle for one triplet of Euler angles (translational alignment is computed through the Fourier correlation theorem), and $|K|$ is the total number of Euler triplets to be tested for one particle. For large particle sets, if $N$ approaches the order of magnitude of $2|K|\,\tau_{\mathrm{alignment}}/\tau_{\mathrm{correlation}}$, $T_{\mathrm{correlation}}$ becomes comparable to or larger than $T_{\mathrm{alignment}}$. This means that any acceleration gained at the refinement loop stage would become irrelevant, as the matrix computation stage would bottleneck the algorithm. To avoid this, Dynamo offers the possibility of carrying out the computation of the correlation matrix in parallel. The approach is similar to the one discussed previously for the loop parallelization, although slightly more complex, as the entities to distribute in equal amounts among the processing units to ensure a good load balance are matrix elements related to particle pairs, not individual particles. This entails a small degree of task cooperation that was not required during the refinement loop.
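As a minimal sketch of this pair-wise distribution (illustrative only; cc_pair stands for a placeholder routine computing the correlation between two aligned particles), the independent entries of the matrix can be enumerated and dealt out evenly among the processing units:

% Distributing the independent entries of an N x N correlation matrix
% (upper triangle including the diagonal) evenly among workers (illustrative sketch).
N        = 500;                                         % number of aligned particles
nWorkers = 16;

[jj, ii] = meshgrid(1:N, 1:N);
pairs    = [ii(ii <= jj), jj(ii <= jj)];                % all (i, j) with i <= j: (N^2+N)/2 rows
owner    = mod((1:size(pairs, 1))' - 1, nWorkers) + 1;  % round-robin over pairs, not particles

% Worker w then computes its share of matrix entries:
% for k = find(owner == w)'
%     C(pairs(k, 1), pairs(k, 2)) = cc_pair(particles{pairs(k, 1)}, particles{pairs(k, 2)});  % placeholder
% end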
5. Algorithm customization

5.1. Motivation

Beyond its primary algorithm, Dynamo also provides users with a flexible tool to test and use their own approaches for subtomogram averaging. Practical features such as the use of the GUI for experiment design or the transparent distribution of tasks among processors in a parallel cluster are thus directly inherited, without requiring from the user any specific effort or technical knowledge in GUI design or parallel computation. This makes Dynamo a suitable framework for users who want to design and test their own ideas. In general, assessing the real resolution gain yielded by slight algorithm variations requires the user to insert his or her modifications into a full subtomogram averaging procedure. Writing a full package from scratch for this goal would be a disproportionate effort. Even the insertion of such modifications into existing third-party code can be problematic. Non-experts in computation are excluded from this approach, and even experienced software developers will need additional knowledge of the internal flow of the initial software. Finally, changes intended by the user might not be compatible with the internal structure of the initial package. Dynamo incorporates specific tools for user-friendly algorithm extension to overcome these limitations.

5.2. Plugins

Obviously, not every possible course of action devised by potential users can be incorporated as an option in a given algorithm. Even less can a GUI provide support for passing the corresponding parameters, especially if a compact visual presentation of the experiment design is to be preserved.
Fig. 5. Plugins. Dynamo plugins. Plugins allow the use of custom scripts adapted to particular experiments. In any designed experiment, each iteration can be separately modified in three different ways: (left) a custom defined similarity criterion can replace the native locally normalized cross correlation inside the refinement loop; (middle) additional post-processing steps can be performed after completion of the refinement loop and averaging of the aligned particles; (right) a full iteration can be completely substituted with a different approach. While similarity and post-processing plugins can be combined in the same round, iteration plugins exclude the use of any other plugin in the same round.
Instead, Dynamo's approach is to define insertion points in its flow. These are selected parts of the processing pipeline of a single iteration that can be replaced or complemented with custom functions. When designing a complete experiment, which will typically embrace several iterations, the user has full control over which custom modifications will be applied in which iterations. Parameters for custom functions are passed as native Dynamo parameters, either from the GUI or through scripting. When creating a custom plugin, the user is required to prepare executables that read and write parameters and results from and to locations in the file system that are consistent with the general Dynamo workflow. The concrete formats and conventions needed for this insertion are indicated in the help functionality of the GUI and extensively detailed in the manual. These executables do not have any design restriction other than compliance with the I/O syntax of Dynamo, so that they can be written in whichever language the user feels comfortable with (MATLAB, C++, Fortran, etc.), and can include tools from other packages such as SPIDER, Xmipp, Bsoft or TOM. The current distribution of Dynamo already includes some plugins. They are intended not only to extend the capacities of the primary algorithm, but also to provide examples to guide users in the construction of their own plugins. Fig. 5 schematizes the different insertion points foreseen in Dynamo.

5.2.1. Similarity

This insertion point allows testing different approaches for the treatment of the missing wedge, different algorithms for the numerical optimization of the orientational and translational search, alternative ways to apply direct spatial masks, or even similarity measures not based on cross-correlation.

5.2.2. Post-processing

After the completion of one whole iteration, i.e., after all the data particles have been visited in the alignment loop and a new template has been created, the user might still wish to experiment with the introduction of some post-processing step. As an example, plugins for enforcing helical symmetrization of the newly computed average are provided. This insertion point is also the natural framework to embed a classification scheme into the iteration. Here the user can insert his or her own classification approach and make use of the natively provided option for computation of the cross-correlation matrix of the aligned particles. The provided plugin examples perform this classification based on PCA and hierarchical clustering.
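To illustrate the kind of computation such a post-processing step could perform (a sketch only; it does not follow Dynamo's plugin I/O conventions, the file and variable names are hypothetical, and kmeans requires the MATLAB Statistics Toolbox), one could start from a stored cross-correlation matrix of the aligned particles, embed the particles in a low-dimensional PCA space and group them by k-means:

% PCA + k-means classification from a precomputed cross-correlation matrix (illustrative sketch).
S = load('ccmatrix_ite008.mat');                     % hypothetical file written after the refinement loop
C = (S.ccmatrix + S.ccmatrix') / 2;                  % hypothetical variable name; enforce symmetry

[V, D]   = eig(C);                                   % eigen-decomposition of the matrix
[ev, ix] = sort(diag(D), 'descend');                 % sort eigenvalues in decreasing order
nComp    = 5;                                        % number of principal components to keep
coords   = V(:, ix(1:nComp)) * diag(sqrt(max(ev(1:nComp), 0)));  % particle coordinates in PCA space

labels = kmeans(coords, 2, 'Replicates', 10);        % two classes, several random seeds
save('class_labels_ite008.mat', 'labels');           % hypothetical output read by the next iteration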
5.2.3. Iteration

This insertion point provides a deeper level of algorithmic flexibility: a whole iteration can be performed with algorithms completely diverging from the primary Dynamo approach. Dynamo includes some example plugins that insert invocations of other subtomogram averaging packages, formatting their input and output to fit the general Dynamo flow.

5.3. Performance of user plugins

Similarity plugins are cleanly managed by the MPI controllers of Dynamo without loss of performance. In practical terms, this feature allows the user to test her or his own algorithms for particle alignment on large-scale parallel clusters without any knowledge of MPI. GPU computations offer less flexibility. Dynamo internally uses two completely different implementations of the refinement loop. One version runs CPU operations on all computing platforms: the same set of scripts is used to perform computations on single-core and multicore machines (shared memory systems) or parallel clusters (distributed memory systems), changing only the API that harnesses the distribution of the different data subsets and their parallel treatment. This version is MATLAB-based and highly modular. The currently available GPU technologies were not able to support this level of flexibility without incurring a severe performance penalty. For this reason, although post-processing plugins are unaffected, the currently delivered GPU version does not allow the direct insertion of similarity plugins. However, source code and a compilation guide are provided for GPU-experienced users who wish to reuse the code to implement their own algorithmic variations.

6. Case study: Borrelia flagellar motors

In order to illustrate the performance of Dynamo on real data, we present here a case study on flagellar motors of Borrelia spirochetes, which are used by the bacteria to generate the rotation of the flagellar filaments that leads to bacterial swimming (Kudryashev et al., 2010). All steps of the analysis reported here were performed with Dynamo tools: preprocessing, alignment, classification, visualization and final volume rendering.

6.1. Data collection

The data set contained 138 particles extracted from tomograms as described previously (Kudryashev et al., 2009). Briefly, three microliters of Borrelia spirochetes in serum- and gelatin-free BSK medium
were transferred onto EM carbon grids and incubated for 5–10 min. After removing excess liquid, grids were rapidly plunged into liquid ethane. Grids were mounted in a Gatan cryo-holder and imaged in a Philips CM300 electron microscope equipped with a field emission gun and a Gatan post-column energy filter. Tilt series with 2-degree increments, totaling 60–65 low-dose images and corresponding to a cumulative dose of under 20,000 e/nm², filtered at zero energy loss, were recorded on a 2k × 2k pixel Gatan CCD camera at a magnification of 43,000x (0.82 nm/pixel) and an objective lens underfocus of 10 microns. Tilt series were aligned by fiducial gold markers and tomograms were generated by weighted back projection with Etomo (Kremer et al., 1996). Particles were manually extracted from non-binned tomograms into boxes of 128 × 128 × 128 pixels.

6.2. Alignment and classification

The previously mentioned studies had hinted at a possible population diversity in this data set, suggesting that the C-ring of the motor could be absent in a significant fraction of the particles. We set off to confirm this observation, repeating de novo a part of the original analysis and complementing it with additional processing steps. Here, the full procedure comprised three steps:

i. Joint alignment of the entire data set

Particles were manually aligned by orienting their axes normal to the cell membrane. These roughly aligned particles were added together to provide an initial template used as seed for iterative alignment cycles. In this procedure, we used a total of 16 iterations divided into two rounds. The first eight iterations operated on binned particles (resulting in cubes of 64 × 64 × 64 voxels). The correlation between particle and rotated template was scanned for 700 Euler triplets, distributed in six grid levels of increasing angular resolution as explained in the Supplementary material. The coarsest angular grid allowed the particle axis to drift a maximum of 60° from the previous position. The second round of eight iterations was performed on non-binned particles (128 × 128 × 128 voxels), scanning a finer set of 800 Euler triplets, where the coarsest grid allowed for an angular drift of at most 5°. Eight iterations were completed for this finer distribution, although the improvement was marginal after the second iteration. No thresholding was applied during this procedure. C16 symmetry was imposed on the averages obtained at the end of each alignment iteration in order to use them as seeds for the next round. For the construction of the final structure after these 16 iterations, the 50% of the particles that showed the best correlation coefficients were averaged. Fig. 6a is a representation of the main elements of the obtained density after imposition of C16 symmetry. The stator is shown in red (around the plane marked as z_stator) and the C-ring in green (around the plane labeled as z_C-ring). The faint black density represents the membrane.

ii. Classification of the aligned particles

Visual inspection of the average obtained in the previous step suggested that the density associated with the C-ring is less sharply defined than the stator. This is shown in the first column of Fig. 6b. There, the top row is a central y-section of the symmetrized average (depicted as y_central in Fig. 6a). The middle and bottom rows in this column show the density distribution found at the slices z_stator and z_C-ring. No symmetrization has been used for this representation.
In view of these results, and in agreement with the observation described in Kudryashev et al. (2010), we ran a classification on the aligned particles. This process comprised three steps:
a. Computation of the covariance matrix of the particles (taking into account the different missing wedges). Individual particles were symmetrized for the computation of the matrix entries. Further, a local classification mask was used in order to constrain the cross-correlation computation for each pair of particles to the area around the C-ring.

b. Principal Component Analysis on the basis of the obtained matrix. In view of the steep decay of the eigenvalues (not shown), only five eigenvolumes were computed. Each particle was represented by its five coefficients in the eigenvector basis.

c. K-means classification of the data represented by their expansion coefficients. This computationally inexpensive step was repeated several times in order to evaluate the results yielded by different random seeds. Qualitative inspection of the class averages invariably showed classes with a more or less sharply defined intensity in the C-ring, and classes where the C-ring became blurred in comparison with the joint average obtained in step i. During this step, we explicitly checked that class membership did not correlate with the orientation of the particles. This is important to ensure that the classification algorithm was actually grouping particles in accordance with their structural content and not with their missing wedge.

iii. Independent alignment of relevant classes

In order to confirm this observation, two subsets of particles were extracted from the data and aligned independently. The first subset comprised particles from the classes whose class averages seemed to keep a C-ring. From these classes we selected the 40 particles with the best cross-correlation coefficient against the average produced in step i. The second subset was constructed in an analogous way for the classes without C-ring. We then allowed each subset to evolve iteratively without the interference of the possibly incoherent signal from the other subset. The respective class average was provided as starting reference, and eight refinement iterations were then performed. The results are shown in the central and right columns of Fig. 6b. The central column shows that the class that kept the C-ring was able to slightly enhance the signal around it, as can be appreciated in the top row. The right column shows the refinement obtained with particles assigned to the class with a blurred C-ring. The C-ring has in fact disappeared, while the rest of the structure remains in place.

6.3. Numerical performance

Computations related to the alignment were performed on a multiGPU system comprising three Tesla C1060 GPUs. The computing time required for alignment of the whole data set in step i was 1 h 02 min for the eight iterations on the binned particles and 20 h for the following eight iterations on the unbinned particles. With these angular sampling settings, the computation that a single CPU would require to align the whole data set was extrapolated using the actual time measured when aligning a single particle with the same angular settings. This yielded an estimate of 65 h for the binned particles and 21 days for the unbinned particles, in close agreement with the speed-up factors indicated in Table 1. This agreement was repeated for step iii, where the two classes were refined iteratively, consuming 9 h on the multiGPU system (or an estimated 8 days on a single CPU). The full process thus took 30 h on the 3x Tesla C1060 multiGPU system, compared with a total estimate of 34 days of computation on a single CPU.
It is worth mentioning that no specific stopping criterion was implemented, and the iterative procedure was
Fig. 6. Application example: Borrelia data set. Performance of alignment and classification tasks was tested on a real data set. (a) is an isosurface representation of a (C16-symmetrized) Borrelia motor. Each marked slice (y_central, z_stator, z_C-ring) corresponds in this order to a row in panel (b). There, the first column represents the average obtained prior to any classification step. The second and third columns represent the densities obtained by refining the two classes yielded by classification. The class in the central column seems to gather particles with C-rings, while the class in the right column has lost its C-ring while apparently keeping the rest of the structure. Note that the view along y (upper row) has been symmetrized, while in the two z-views (central and bottom rows) the features of interest are recognizable without symmetrization.
computing for more iterations than actually needed to obtain qualitatively similar results.

7. Distribution and portability

Licenses for commercial third-party software are not required. Although the backbone of Dynamo is written in MATLAB, Dynamo is distributed both as MATLAB scripts and as precompiled executables for Linux or OSX platforms. No licensing is required for launching and operating Dynamo from the Linux/OSX shell. Users who want to use Dynamo from a MATLAB environment (Linux, OSX or Windows platforms) will need a license for MATLAB itself. For some functionalities, these users would also need licenses for the Image Processing and the Curve Fitting toolboxes.
As MATLAB, or the freely distributed MATLAB Compiler Runtime (MCR) for Linux-based operation, provides support for a vast range of auxiliary tasks, dependence on external libraries is reduced to a strict minimum, ensuring a high degree of portability and stability of the software on different platforms. Dependencies relate only to libraries needed for harnessing acceleration devices: MPI for clusters, OpenMP for multicore computations and CUDA (version 3.2) for GPUs. The software is freely available upon request through the website http://www.dynamo-em.org.

8. Summary and outlook

Dynamo provides a comprehensive tool for subtomogram averaging. Its design facilitates easy operation on all commonly available
Dynamo's plugin system represents a large growth potential for the package. It makes it possible to adapt its full set of capacities to new algorithmic developments or specific tasks without major programming effort and without backward compatibility issues. New extensions of the package will be released as libraries of utilities that do not require reinstallation of the software. Because of this combination of algorithmic flexibility and package robustness, Dynamo can provide a valuable tool to promote the exchange and testing of ideas among users.

While operation protocols and the algorithmic backbone are intended to remain stable, implementation details for different computing platforms might be modified in the future in order to tune the software's performance to new technical developments. The mechanism for coarse-grained parallelization of the refinement loop implemented at present is adapted to the needs of the current algorithm, where the time consumed by each particle is expected to be roughly equal, as the same number of angles is tested for each particle. Under this assumption, pre-assigning the same number of particles to each available processing unit is an appropriate strategy for load balancing. If further algorithmic developments result in methods for particle alignment that introduce a load imbalance, so that different particles require different computing times (e.g., if some particles needed to be tested against a higher number of angles than others to attain a given tolerance), then a different approach would become necessary to use all processors efficiently during the computation. Such an approach might then benefit more from multicore systems with shared memory, where the current task distribution strategies seem to work sub-optimally. The use of other parallelization devices, for example POSIX threads (Butenhof, 1997), might also improve the performance of Dynamo on this kind of platform.

The software architecture of Dynamo's GPU functionalities will likely see large changes in the future. While the grounds for parallel computation on CPUs appear to be stably set by current APIs, GPU computation is a quickly evolving field with a fast turnover of tools and interfaces. In particular, MATLAB is in the process of incorporating full GPU support through CUDA. We expect that future developments will prepare the ground for increased flexibility of the GPU tools in Dynamo, allowing their full integration into the plugin system without decreasing their current performance. This maintenance and development of further GPU tools is important from a practical point of view, as our case study shows that the use of graphics acceleration devices reduces the computation time for the analysis of typical, fairly sized subtomogram averaging problems from days to hours.

Acknowledgments

We thank Fernando Amat and Nicola Abreschia for careful reading of the manuscript, extended discussions on the algorithm and testing of the package. Thanks are also given to Juha Huiskonen for his guidance on the jsubtomo package and clarifications on its mathematical background. Thanks also go to Carlos Oscar Sorzano and Roberto Marabini for fruitful in-depth discussions and help with the installation of the xmipp package and its subtomogram averaging functionalities. Cluster computations were performed at the Swiss National Supercomputing Center (project 234) and at the Computing Center of the Biozentrum of the University of Basel, Switzerland.
We thank Martin Jaquot for continuous technical assistance. Visualization of spherical histograms uses the MATLAB package "MATLAB and Octave Functions for Computer Vision and Image Processing" written by Peter Kovesi, which is distributed in Dynamo under the copyright terms of an MIT-type license.
This work was in part supported by the Swiss initiative for Systems Biology (SystemsX.ch, grant CINA), the Swiss National Supercomputing Centre (CSCS, grant under project ID 274-2011), the Swiss National Science Foundation (Grant 205321_126490), the NCCR TransCure, the NCCR Nano, and the NCCR Structural Biology.
Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jsb.2011.12.017.
References

Al-Amoudi, A., Diez, D.C., Betts, M.J., Frangakis, A.S., 2007. The molecular architecture of cadherins in native epidermal desmosomes. Nature 450, 832–837.
Amat, F., Comolli, L.R., Moussavi, F., Smit, J., Downing, K.H., Horowitz, M., 2010. Subtomogram alignment by adaptive Fourier coefficient thresholding. J. Struct. Biol. 171, 332–344.
Bartesaghi, A., Sprechmann, P., Liu, J., Randall, G., Sapiro, G., Subramaniam, S., 2008. Classification and 3D averaging with missing wedge correction in biological electron tomography. J. Struct. Biol. 162, 436–450.
Beck, M., Forster, F., Ecke, M., Plitzko, J.M., Melchior, F., Gerisch, G., Baumeister, W., Medalia, O., 2004. Nuclear pore complex structure and dynamics revealed by cryoelectron tomography. Science 306, 1387–1390.
Brandt, F., Etchells, S.A., Ortiz, J.O., Elcock, A.H., Hartl, F.U., Baumeister, W., 2009. The native 3D organization of bacterial polysomes. Cell 136, 261–271.
Briggs, J.A., Riches, J.D., Glass, B., Bartonova, V., Zanetti, G., Krausslich, H.G., 2009. Structure and assembly of immature HIV. Proc. Natl. Acad. Sci. USA 106, 11090–11095.
Butenhof, D.R., 1997. Programming with POSIX threads. Addison-Wesley, Reading, Mass.
Castano Diez, D., Mueller, H., Frangakis, A.S., 2007. Implementation and performance evaluation of reconstruction algorithms on graphics processors. J. Struct. Biol. 157, 288–295.
Castano-Diez, D., Moser, D., Schoenegger, A., Pruggnaller, S., Frangakis, A.S., 2008. Performance evaluation of image processing algorithms on the GPU. J. Struct. Biol. 164, 153–160.
Castano-Diez, D., Scheffer, M., Al-Amoudi, A., Frangakis, A.S., 2010. Alignator: a GPU powered software package for robust fiducial-less alignment of cryo tilt-series. J. Struct. Biol. 170, 117–126.
Chandra, R., 2001. Parallel programming in OpenMP. Morgan Kaufmann Publishers, San Francisco, CA.
Fernandez, J.J., 2008. High performance computing in structural determination by electron cryomicroscopy. J. Struct. Biol. 164, 1–6.
Fernandez, J.J., Li, S., Crowther, R.A., 2006. CTF determination and correction in electron cryotomography. Ultramicroscopy 106, 587–596.
Forster, F., Hegerl, R., 2007. Structure determination in situ by averaging of tomograms. Methods Cell Biol. 79, 741–767.
Forster, F., Pruggnaller, S., Seybert, A., Frangakis, A.S., 2008. Classification of cryo-electron sub-tomograms using constrained correlation. J. Struct. Biol. 161, 276–286.
Frank, J. (Ed.), 2006. Electron tomography: methods for three-dimensional visualization of structures in the cell, second ed. Springer.
Gipson, B.R., Masiel, D.J., Browning, N.D., Spence, J., Mitsuoka, K., Stahlberg, H., 2011. Automatic recovery of missing amplitudes and phases in tilt-limited electron crystallography of two-dimensional crystals. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 84, 011916.
Heumann, J.M., Hoenger, A., Mastronarde, D.N., 2011. Clustering and variance maps for cryo-electron tomography using wedge-masked differences. J. Struct. Biol. 175, 288–299.
Heymann, J.B., Cardone, G., Winkler, D.C., Steven, A.C., 2008. Computational resources for cryo-electron tomography in Bsoft. J. Struct. Biol. 161, 232–242.
Huiskonen, J.T., Hepojoki, J., Laurinmaki, P., Vaheri, A., Lankinen, H., Butcher, S.J., Grunewald, K., 2010. Electron cryotomography of Tula hantavirus suggests a unique assembly paradigm for enveloped viruses. J. Virol. 84, 4889–4897.
Kremer, J.R., Mastronarde, D.N., McIntosh, J.R., 1996. Computer visualization of three-dimensional image data using IMOD. J. Struct. Biol. 116, 71–76.
Kudryashev, M., Cyrklaff, M., Baumeister, W., Simon, M.M., Wallich, R., Frischknecht, F., 2009. Comparative cryo-electron tomography of pathogenic Lyme disease spirochetes. Mol. Microbiol. 71, 1415–1434.
Kudryashev, M., Cyrklaff, M., Wallich, R., Baumeister, W., Frischknecht, F., 2010. Distinct in situ structures of the Borrelia flagellar motor. J. Struct. Biol. 169, 54–61.
Kudryashev, M., Stahlberg, H., Castano-Diez, D., 2012. Assessing the benefits of focal pair cryo-electron tomography. J. Struct. Biol. 178, 88–97.
Lebart, L., Morineau, A., Warwick, K.M., 1984. Multivariate descriptive statistical analysis: correspondence analysis and related techniques for large matrices. Wiley, New York.
Li, X., Grigorieff, N., Cheng, Y., 2010. GPU-enabled FREALIGN: accelerating single particle 3D reconstruction and refinement in Fourier space on graphics processors. J. Struct. Biol. 172, 407–412.
Liu, J., Bartesaghi, A., Borgnia, M.J., Sapiro, G., Subramaniam, S., 2008. Molecular architecture of native HIV-1 gp120 trimers. Nature 455, 109–113.
Lucic, V., Forster, F., Baumeister, W., 2005. Structural studies by electron tomography: from cells to molecules. Annu. Rev. Biochem. 74, 833–865.
Medalia, O., Weber, I., Frangakis, A.S., Nicastro, D., Gerisch, G., Baumeister, W., 2002. Macromolecular architecture in eukaryotic cells visualized by cryoelectron tomography. Science 298, 1209–1213.
Nicastro, D., Schwartz, C., Pierson, J., Gaudette, R., Porter, M.E., McIntosh, J.R., 2006. The molecular architecture of axonemes revealed by cryoelectron tomography. Science 313, 944–948.
Nickell, S., Forster, F., Linaroudis, A., Net, W.D., Beck, F., Hegerl, R., Baumeister, W., Plitzko, J.M., 2005. TOM software toolbox: acquisition and analysis for electron tomography. J. Struct. Biol. 149, 227–234.
Ortiz, J.O., Brandt, F., Matias, V.R., Sennels, L., Rappsilber, J., Scheres, S.H., Eibauer, M., Hartl, F.U., Baumeister, W., 2010. Structure of hibernating ribosomes studied by cryoelectron tomography in vitro and in situ. J. Cell Biol. 190, 613–621.
Pacheco, P.S., 1997. Parallel programming with MPI. Morgan Kaufmann Publishers, San Francisco, CA.
Palenstijn, W.J., Batenburg, K.J., Sijbers, J., 2011. Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs). J. Struct. Biol. 176, 250–253.
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin, T.E., 2004. UCSF Chimera – a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.
Rath, B.K., Hegerl, R., Leith, A., Shaikh, T.R., Wagenknecht, T., Frank, J., 2003. Fast 3D motif search of EM density maps using a locally normalized cross-correlation function. J. Struct. Biol. 144, 95–103.
Renken, C., Hsieh, C.E., Marko, M., Rath, B., Leith, A., Wagenknecht, T., Frank, J., Mannella, C.A., 2009. Structure of frozen-hydrated triad junctions: a case study in motif searching inside tomograms. J. Struct. Biol. 165, 53–63.
Roseman, A.M., 2003. Particle finding in electron micrographs using a fast local correlation algorithm. Ultramicroscopy 94, 225–236.
Scheres, S.H., Melero, R., Valle, M., Carazo, J.M., 2009. Averaging of electron subtomograms and random conical tilt reconstructions through likelihood optimization. Structure 17, 1563–1572.
Schmeisser, M., Heisen, B.C., Luettich, M., Busche, B., Hauer, F., Koske, T., Knauber, K.H., Stark, H., 2009. Parallel, distributed and GPU computing technologies in single-particle electron microscopy. Acta Crystallogr. D Biol. Crystallogr. 65, 659–671.
Schmid, M.F., 2011. Single-particle electron cryotomography (cryoET). Adv. Protein Chem. Struct. Biol. 82, 37–65.
Schmid, M.F., Booth, C.R., 2008. Methods for aligning and for averaging 3D volumes with missing data. J. Struct. Biol. 161, 243–248.
Stolken, M., Beck, F., Haller, T., Hegerl, R., Gutsche, I., Carazo, J.M., Baumeister, W., Scheres, S.H., Nickell, S., 2011. Maximum likelihood based classification of electron tomographic data. J. Struct. Biol. 173, 77–85.
Tagare, H.D., Barthel, A., Sigworth, F.J., 2010. An adaptive Expectation-Maximization algorithm with GPU implementation for electron cryomicroscopy. J. Struct. Biol. 171, 256–265.
Tang, G., Peng, L., Baldwin, P.R., Mann, D.S., Jiang, W., Rees, I., Ludtke, S.J., 2007. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46.
Ward, J.H., 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236.
Winkler, H., 2007. 3D reconstruction and processing of volumetric data in cryoelectron tomography. J. Struct. Biol. 157, 126–137.
Winkler, H., Zhu, P., Liu, J., Ye, F., Roux, K.H., Taylor, K.A., 2009. Tomographic subvolume alignment and subvolume classification applied to myosin V and SIV envelope spikes. J. Struct. Biol. 165, 64–77.
Xiong, Q., Morphew, M.K., Schwartz, C.L., Hoenger, A.H., Mastronarde, D.N., 2009. CTF determination and correction for low dose tomographic tilt series. J. Struct. Biol. 168, 378–387.
Xu, M., Thulasiraman, P., 2011. Mapping iterative medical imaging algorithm on cell accelerator. Int. J. Biomed. Imaging 2011, 843924.
Xu, W., Xu, F., Jones, M., Keszthelyi, B., Sedat, J., Agard, D., Mueller, K., 2010. High-performance iterative electron tomography reconstruction with long-object compensation using graphics processing units (GPUs). J. Struct. Biol. 171, 142–153.
Zhang, X., Zhou, Z.H., 2010. Low cost, high performance GPU computing solution for atomic resolution cryoEM single-particle reconstruction. J. Struct. Biol. 172, 400–406.
Zheng, S.Q., Branlund, E., Kesthelyi, B., Braunfeld, M.B., Cheng, Y., Sedat, J.W., Agard, D.A., 2011. A distributed multi-GPU system for high speed electron microscopic tomographic reconstruction. Ultramicroscopy 111, 1137–1143.