COMMUNICATION TO THE EDITOR
IsoDesign: A Software for Optimizing the Design of 13C-Metabolic Flux Analysis Experiments

Pierre Millard,1–3 Serguei Sokol,1–3 Fabien Letisse,1–3 Jean-Charles Portais1–3

1 MetaSys, Université de Toulouse; INSA, UPS, INP, LISBP, 135 Avenue de Rangueil, F-31077, Toulouse, France; telephone: +33 (0) 561 559 689; fax: +33 (0) 561 559 689; e-mail: [email protected]
2 INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400, Toulouse, France
3 CNRS, UMR5504, F-31400, Toulouse, France

ABSTRACT: The growing demand for 13C-metabolic flux analysis (13C-MFA) in the field of metabolic engineering and systems biology is driving the need to rationalize expensive and time-consuming 13C-labeling experiments. Experimental design is a key step in improving both the number of fluxes that can be calculated from a set of isotopic data and the precision of flux values. We present IsoDesign, a software that enables these parameters to be maximized by optimizing the isotopic composition of the label input. It can be applied to 13C-MFA investigations using a broad panel of analytical tools (MS, MS/MS, 1H-NMR, 13C-NMR, etc.) individually or in combination. It includes a visualization module to intuitively select the optimal label input depending on the biological question to be addressed. Applications of IsoDesign are described, with an example of the entire 13C-MFA workflow from the experimental design to the flux map including important practical considerations. IsoDesign makes the experimental design of 13C-MFA experiments more accessible to a wider biological community. IsoDesign is distributed under an open source license at http://metasys.insa-toulouse.fr/software/isodes/
Biotechnol. Bioeng. 2014;111: 202–208. © 2013 Wiley Periodicals, Inc.

KEYWORDS: 13C-metabolic flux analysis; experimental design; MS; NMR; isotope labeling experiments
Conflict of Interest: none declared.
Correspondence to: J.-C. Portais
Contract grant sponsor: INRA (Institut National de la Recherche Agronomique) [Program CJS]
Contract grant sponsor: French National Research Agency project ALGOMICS; Contract grant number: ANR-08-BIOE-002
Received 15 March 2013; Revision received 14 June 2013; Accepted 3 July 2013
Accepted manuscript online 24 July 2013; Article first published online 18 September 2013 in Wiley Online Library (http://onlinelibrary.wiley.com/doi/10.1002/bit.24997/abstract). DOI 10.1002/bit.24997

Stationary 13C-metabolic flux analysis (13C-MFA) is used to quantify metabolic fluxes, that is, the actual rates of biochemical reactions under given physiological conditions.
Biotechnology and Bioengineering, Vol. 111, No. 1, January, 2014
Measuring metabolic fluxes is crucial for a thorough understanding of metabolism and its regulation. In the last decade, 13C-MFA has been increasingly used to provide novel biological insights in systems biology (Ishii et al., 2007; Nicolas et al., 2007) and to improve industrial biotechnology processes (Iwatani et al., 2008). In practice, 13C-MFA relies on complex, expensive, and time-consuming investigations that combine 13C-labeling experiments with mathematical tools to calculate fluxes from the labeling patterns of metabolites measured by mass spectrometry (MS) and/or nuclear magnetic resonance (NMR) spectroscopy (Wiechert et al., 2001). The design step plays a key role in maximizing information on the fluxes at minimal cost and effort, depending on the biological purpose of the study. This step consists of optimizing the isotopic composition of the label input in the 13C-labeling experiments to maximize the number of fluxes that can be calculated from a set of measurable isotopic data, to increase the precision of flux values, or both. The design of 13C-labeling experiments must be optimized according to the isotopic data that can be made available. The optimal label input may differ significantly depending on the metabolites that are analyzed and, above all, on the analytical tool available, since NMR and MS provide different types of isotopic information (e.g., positional vs. non-positional information). Various methods (meaning here the theoretical principles underlying the optimization process) have recently been developed to determine the optimal label input for both MS-based (Chen et al., 2007; Crown and Antoniewicz, 2012; Crown et al., 2012; Metallo et al., 2009; Walther et al., 2012) and NMR-based (alone or combined with MS) 13C-MFA experiments (Libourel et al., 2007; Mollney et al., 1999; Weitzel et al., 2013).
Although these theoretical approaches are of great interest for the widespread use of 13C-MFA, only a few tools (that is, software implementing some of these design approaches) are available to date, namely OpenFlux (Quek et al., 2009), Metran (Yoo et al., 2008), 13C-Flux
(Wiechert et al., 2001), and its recent update 13C-Flux2 (Weitzel et al., 2013). Current tools vary considerably in terms of the number and nature of optimization criteria, data visualization, and user-friendliness (see Supplementary Table I). Briefly, OpenFlux and Metran allow the design of MS-based 13C-MFA experiments via a graphical user interface. 13C-Flux is a major tool for 13C-MFA in general, and has established some key formalisms in this field, including the FTBL format used in this work. It is a command-line tool able to deal with both MS and NMR data. Regarding experimental design, 13C-Flux2 allows optimization of label inputs according to a range of optimality criteria computed from linear statistics, such as the A-, D-, E-, or M-optimality criteria, thereby optimizing the label input when an overall picture of metabolism is sought.

Here, we present IsoDesign, a software for the experimental design of 13C-MFA, whose originality lies in being generic and flexible with regard to both the isotopic data it can handle and the optimization criteria, as well as being user-friendly and open source (see Supplementary Table I). IsoDesign includes the following features:
- It can determine the optimal composition of label inputs containing up to six different isotopic forms.
- All kinds of isotopic data can be included (MS, MS/MS, 1H-NMR, 13C-NMR, etc.), individually or in combination, to cover the current and future panel of analytical tools dedicated to quantitative isotopic analysis.
- Optimal label inputs depend on the biological question, for example, on whether an overall picture of metabolism is required or only particular fluxes are of interest. Various scoring criteria can be defined by the user to compare inputs according to the specific biological question to be addressed.
- The non-linearity of metabolic systems studied using 13C-MFA is taken into account in the design process by applying both linear and non-linear statistical methods, thereby increasing the accuracy and robustness of the scores calculated for each input.
- The user can define lower and upper bounds for each isotopic form to be included in the label input. This helps limit costs (e.g., by fixing upper bounds for the proportions of expensive isotopic forms).
- The output of IsoDesign is a sensitivity landscape, which displays in an intuitive way the flux information of interest provided by the different label inputs. This makes the choice of the optimal input easier for the investigator.

General Strategy. The general strategy for selecting the optimal label input in a 13C-MFA experiment is presented in Figure 1. It consists of calculating the sensitivity of flux values for an assumed set of fluxes with respect to the composition of the label input. Briefly, a series of models of the same metabolic network with different label inputs is generated. The labeling patterns of metabolites that can be measured experimentally are simulated for each model, and a set of statistical tools is applied to the complete set of simulated data to determine the precision (i.e., the standard deviation) of each flux. Labeling simulations and statistics are calculated by the software influx_s (Sokol et al., 2012) distributed with
Figure 1. Workflow implemented in IsoDesign to determine the optimal label input in a 13C-MFA experiment. Blue, orange, and green boxes detail steps performed by the calculation module, the visualization module and via a spreadsheet program, respectively.
Millard et al.: IsoDesign: Experimental Design for 13C-MFA
IsoDesign. The results of these simulations are used to calculate a score for each label input according to either a single criterion or multiple criteria defined by the investigator. The optimal input is selected visually based on this score. The IsoDesign workflow was implemented in two distinct modules, dedicated to calculation and data visualization, respectively, developed in the Python programming language (http://python.org) (Fig. 2). IsoDesign was tested under both Linux (Ubuntu 12 and Mandriva 2011) and Windows (XP and 7). A brief overview of the software and its features is first presented in the following sections, headed Input data, Estimation of Flux Sensitivities, and Selection of the Optimal Input. We then describe some applications of IsoDesign, including important practical considerations. Complete user instructions for IsoDesign are given in the tutorial provided with the software.

Figure 2. Screenshot of the calculation (A) and visualization (B) modules of IsoDesign.

Input data. The tracer evaluation process requires three inputs:
- The topology of the metabolic network to be investigated, including the set of reactions and the carbon atom transitions of each reaction.
- The assumed flux distribution in the physiological conditions to be examined. This can be obtained from
the literature for similar conditions, strains or organisms, or computed with appropriate tools such as flux balance analysis. Although assuming a flux distribution may look restrictive, in practice a label input that is optimal for one flux set remains optimal for a wide range of flux sets (Mollney et al., 1999).
- The nature and precision of the labeling data that can be measured experimentally. Since the optimal label input can vary depending on the nature of the isotopic data measured, the investigator needs to define which measurements are available.

This information must be provided in an input text file, using the FTBL format developed by Wiechert (2001) (for details, please refer to the tutorial). Various FTBL files are available to the community and can be used as a scaffold to adapt the network to the purpose of the study concerned, which saves considerable time compared with starting from scratch. For new networks to be investigated, the most tedious part of model construction is mapping carbon atoms from substrates into products. Tools such as KEGG RPAIR (Kanehisa et al., 2006) or the ARM project (Arita, 2003) can assist the experimenter in the identification of carbon transitions in metabolic networks. To design labeling experiments for eukaryotic organisms, where the metabolic reactions are compartmentalized, an appropriate FTBL file defining compartments must be provided.

Estimation of Flux Sensitivities. After running the calculation module and loading the FTBL file, the user can adjust parameters independently for each isotopic form of the compound to be mixed in the label input. IsoDesign generates a list of label inputs and assembles information concerning the precision of each flux for each input calculated by influx_s (Sokol et al., 2012).
This software uses detailed isotopomer balancing (including both net and exchange fluxes) to describe the distribution of the 13C label within biochemical networks and implements non-linear statistical methods to estimate flux precision, thereby addressing the non-linearity problems of 13C-MFA. Influx_s options can be adapted by the user. Thus, IsoDesign benefits from all influx_s features, including the highly stable, accurate, and rapid NLSIC algorithm. Two methods of calculation are available to determine the precision of individual flux values: (i) a linear method, based on covariance between labeling data and fluxes, which is fast (from minutes to hours depending on the number of label inputs and on the metabolic network to investigate) but provides overestimated confidence intervals on the fluxes (Antoniewicz et al., 2006), and (ii) a Monte Carlo method, which gives accurate confidence intervals (i.e., close to the true flux uncertainty) (Antoniewicz et al., 2006; Quek et al., 2009) but requires more computational time (from hours to days). However, both the Monte Carlo and linear methods should provide the same optimal label input when an overall picture of the metabolism is sought (Mollney et al., 1999). After calculation, for each label input IsoDesign generates an output file containing (i) the standard deviation of each flux
that can be calculated and (ii) the list of fluxes that cannot be estimated from the experimental conditions defined. This plain text file can be easily edited and is the input for the visualization module.

Selection of the Optimal Input. After running the visualization module, the user can load the output file generated by the calculation module and select a criterion that is used to calculate a score for each label input. Available scoring criteria include (i) the number of fluxes identified (i.e., fluxes that can be estimated), (ii) the sum of the standard deviations of the fluxes, and the number of fluxes whose precision is (iii) higher or (iv) lower than a threshold defined by the user. A valuable original feature of IsoDesign is that the score can be calculated for the entire set of fluxes, or for a selected subset of (net and/or exchange) fluxes. The choice of the criteria and of the (set of) fluxes for which the score is calculated makes it possible to finely optimize the label input with respect to the specific biological question to be addressed (e.g., to maximize precision on a pathway of interest or to resolve the maximum number of fluxes). The “Optimum” button displays inputs with the highest and lowest scores. Inputs can also be compared visually using this module (option available for input mixtures containing up to four isotopic forms). The output is a 2D- or 3D-sensitivity landscape, that is, a plot of the score value as a function of the isotopic composition of the label input, from which the user can easily identify the optimal input. When the label input has been optimized for a particular target flux (or subset of fluxes), the user is encouraged to check whether this input also gives satisfactory results for the overall metabolic picture.

Example of Application.
To illustrate the simplicity and robustness of IsoDesign, we used it to design a 13C-MFA experiment to determine the flux distribution in Escherichia coli K-12 MG1655 grown on acetate, a C2-carbon source for which the four existing isotopic forms are all commercially available. The input file was created based on (i) the literature for assumptions concerning the metabolic network and the flux distribution and (ii) experimental (metabolomics) data to identify the isotopic measurements that are available to us using our LC–MS/MS system (for details, see the Materials and Methods Section). IsoDesign was run varying the proportions of each of the four isotopic forms of acetate in the input from 0% (no isotopic form) to 100% (pure isotopic form). More formally, we evaluated all the mixtures containing a%(12C-acetate) + b%(1-13C-acetate) + c%(2-13C-acetate) + d%(U-13C-acetate), where a%, b%, c%, and d% were varied from 0% to 100% in 10% increments, with the sum a% + b% + c% + d% being constant (100%). A total of 286 isotopic mixtures were investigated. For each input, the precision of the fluxes was calculated using the Monte Carlo method (100 iterations). In total, 28,600 flux fittings were performed in 13 h on an 8-core computer. Figures 3A–G show examples of sensitivity landscapes generated by IsoDesign using different scoring parameters. Comparing these sensitivity landscapes revealed that both the number of identifiable fluxes (Fig. 3A) and the precision of
Figure 3. Sensitivity landscapes generated by IsoDesign, where each square represents the score obtained for a single label input. Standard deviations (SDs) are in logarithmic scale. P(#00), P(#10), P(#01), and P(#11) refer to the relative proportions of 12C-, 1-13C-, 2-13C-, and U-13C-acetate in the label input, respectively. These sensitivity landscapes display results for a subset (66) of all the label inputs evaluated (286), where P(#10) was set to 0 for visualization purposes, thus P(#00) = 1 - P(#01) - P(#11) for each input. Considering 50 MS isotopic data from 9 metabolites: (A) number of unidentifiable fluxes, (B) number of fluxes with a SD < 0.05, (C) sum of SDs of net fluxes, (D–F) sum of SDs of fluxes of particular pathways: (D) gluconeogenesis, (E) TCA cycle, (F) pentose phosphate pathway, and (G) SD on the pps flux. H: Sensitivity landscape for the PEP synthase (pps) flux considering 12 additional MS isotopic data from 3 metabolites. I: Sum of SDs on net fluxes calculated from 92 isotopic data that can be obtained with a 2D-HSQC NMR experiment on proteinogenic amino acids. These landscapes contain the optimal mixture identified by IsoDesign (using the Optimum button) from all the label inputs evaluated, and are representative of the overall results.
the entire set of fluxes (Fig. 3B–D) were maximal when the label input was pure 2-13C-acetate. This label input also gave the best precision on fluxes through gluconeogenic (Fig. 3D) and TCA cycle (Fig. 3E) reactions. In contrast, if the investigator is interested in maximizing the precision on fluxes through the pentose phosphate pathway in particular, the optimal input consists of a mixture of 60% 2-13C-acetate and 40% U-13C-acetate (Fig. 3F). To assess the consistency of flux precisions predicted by IsoDesign using experimental data, we performed a 13C-MFA experiment using the optimal label input identified by IsoDesign (for full details on the materials and methods
please refer to supplementary material). The flux distribution obtained (Fig. 4A) is in agreement with previously published results (Zhao and Shimizu, 2003). If the measured fluxes differ significantly from the assumed ones, the precision estimates could in turn be invalidated. In this case, the experimenter should run a new simulation with more realistic fluxes (taken from the first 13C-MFA experiment) to determine a better label input for a new 13C-MFA experiment. Fortunately, this iterative process rapidly converges. In our case, the assumed fluxes were close enough to the measured fluxes that a single iteration was sufficient. The experimental flux precisions were consistently in agreement with IsoDesign predictions (Fig. 4B). This example provides a step-by-step illustration of how to use IsoDesign to efficiently set up a 13C-MFA experiment.

Figure 4. A: Experimental (net) flux distribution in the central carbon metabolism of E. coli K-12 MG1655 grown on acetate, obtained using the optimal label input identified using IsoDesign (pure 2-13C-acetate). B: Comparison between predicted (by IsoDesign) and experimentally observed standard deviations.

Besides the optimal design of label inputs, IsoDesign can also be used to optimize other aspects of 13C-MFA experiments, for instance to evaluate the potential benefit of additional isotopic measurements. One example is provided for acetate-grown cells, for which sensitivity landscapes were calculated for a particular flux (PEP synthase, pps) according to the MS dataset considered above without (Fig. 3G) or with (Fig. 3H) the addition of MS data for pyruvate, oxaloacetate, and glyoxylate. The data showed that the sensitivity landscape is substantially modified when the additional data are included, with a significant increase in the number of label inputs that give a precise pps flux. Thus, IsoDesign makes it possible to improve the design of the label input when biologically important fluxes are not initially well determined, and to make decisions before engaging in novel (and costly) methodological developments to access new isotopic data.
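The scoring criteria used in this example (number of identified fluxes, sum of SDs, and number of fluxes whose SD lies below or above a threshold, computed either for all fluxes or for a subset such as pps) can be sketched as follows. This is only an illustrative sketch and not the IsoDesign implementation: the function name score_input, the dictionary layout, the criterion names, the flux names other than pps, and all numerical values are assumptions made up for the illustration.

```python
import math

def score_input(flux_sds, criterion, subset=None, threshold=0.05):
    """Score one label input from its predicted flux standard deviations.

    flux_sds maps a flux name to its SD, or to None when the flux cannot be
    estimated with this input (hypothetical data layout, for illustration).
    """
    names = list(flux_sds) if subset is None else list(subset)
    sds = [flux_sds[name] for name in names]
    identified = [sd for sd in sds if sd is not None]
    if criterion == "n_identified":
        return len(identified)
    if criterion == "sum_sd":
        # Assumption: an input that leaves any selected flux unresolved gets
        # an infinite sum, so it is never favored by a low partial sum.
        return math.inf if len(identified) < len(sds) else sum(identified)
    if criterion == "n_sd_below":  # fluxes more precise than the threshold
        return sum(1 for sd in identified if sd < threshold)
    if criterion == "n_sd_above":  # fluxes less precise than the threshold
        return sum(1 for sd in identified if sd > threshold)
    raise ValueError(f"unknown criterion: {criterion}")

# Made-up SDs for three fluxes and three label inputs (not results from the paper).
predicted = {
    "100% 2-13C": {"pps": 0.02, "icd": 0.01, "zwf": 0.30},
    "60% 2-13C + 40% U-13C": {"pps": 0.04, "icd": 0.03, "zwf": 0.04},
    "100% 12C": {"pps": None, "icd": None, "zwf": None},
}

for label, sds in predicted.items():
    print(label,
          score_input(sds, "n_identified"),
          score_input(sds, "sum_sd"),
          score_input(sds, "n_sd_below"),
          score_input(sds, "sum_sd", subset=["pps"]))
```

With these invented numbers, the second mixture has the lower overall sum of SDs, whereas the first input is better when the score is restricted to pps alone. This is the kind of difference that scoring on a subset of fluxes is meant to reveal.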
IsoDesign can also be used to compare information on fluxes provided by different analytical tools dedicated to isotopic measurements (e.g., LC–MS/MS vs. NMR analyses). An example is shown in Figure 3I for acetate-grown cells. This sensitivity landscape displays the sum of SDs of net fluxes when isotopic data were measured on proteinogenic amino acids by 2D-HSQC NMR (Massou et al., 2007). In contrast to the results obtained for MS data, here the optimal label input was not a pure isotopic form but a mixture of 40% U-13C-acetate, 30% 2-13C-acetate, and 30% 12C-acetate. This type of simulation is especially valuable for selecting the most relevant analytical platform(s) to be used to collect appropriate isotopic data.
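The grid of label inputs evaluated in the acetate example (four isotopic forms, 10% increments, proportions summing to 100%) can be enumerated with a few lines of Python. The sketch below is not taken from IsoDesign: the function name enumerate_mixtures and the bounds argument, which mimics the user-defined lower and upper bounds per isotopic form, are hypothetical, and bounds are assumed to be multiples of the step.

```python
from itertools import product

def enumerate_mixtures(forms, step=10, bounds=None):
    """List all mixtures (form -> percent) whose proportions sum to 100%.

    bounds optionally maps a form to (lower, upper) percentages; both are
    assumed to be multiples of step (hypothetical interface).
    """
    bounds = bounds or {}
    # Grid of candidate percentages for every form except the last one.
    grids = []
    for form in forms[:-1]:
        lower, upper = bounds.get(form, (0, 100))
        grids.append(range(lower, upper + 1, step))
    last_lower, last_upper = bounds.get(forms[-1], (0, 100))
    mixtures = []
    for combo in product(*grids):
        remainder = 100 - sum(combo)  # share left for the last form
        if last_lower <= remainder <= last_upper:
            mixtures.append(dict(zip(forms, combo + (remainder,))))
    return mixtures

forms = ["12C", "1-13C", "2-13C", "U-13C"]
all_inputs = enumerate_mixtures(forms)
no_1_13c = enumerate_mixtures(forms, bounds={"1-13C": (0, 0)})
print(len(all_inputs))        # 286 mixtures, as in the text
print(len(no_1_13c))          # 66 mixtures with P(#10) = 0, as in Figure 3
print(len(all_inputs) * 100)  # 28600 fittings for 100 Monte Carlo iterations
```

The counts follow from the number of ways to distribute ten 10% increments among the isotopic forms: C(13, 3) = 286 for four forms and C(12, 2) = 66 when one form is fixed at 0%.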
Materials and Methods
To design the 13C-MFA experiment to determine the intracellular flux distribution in E. coli K-12 MG1655 grown on acetate, the topology of the metabolic network and the
assumed flux distribution were taken from the literature (Zhao and Shimizu, 2003). The isotopic data used for the simulations were the isotopologue abundances of intracellular metabolites that can be measured using the sensitive LC–MS/MS method of Kiefer et al. (2007). The actual intracellular metabolite contents were checked by LC–MS/MS analysis of samples collected from E. coli cells grown on unlabeled acetate. In total, a dataset of 50 isotopic measurements from nine central metabolites (glucose-6-phosphate, fructose-6-phosphate, ribose-5-phosphate, sedoheptulose-7-phosphate, phosphoenolpyruvate, combined pools of 2- and 3-phosphoglycerate, malate, citrate, and succinate) was used in this design process, and their precision was set to 1% (Ruhl et al., 2012). All these data are contained in the sample input file provided with IsoDesign. IsoDesign was run with the parameters detailed in the main section of this manuscript. Full details on the materials and methods relative to the 13C-MFA experiment and flux calculations are provided in Supplementary Material.

The authors are grateful to the members of the MetaSys team (LISBP, Toulouse, France) for fruitful discussions. The PhD fellowship of P.M. was supported by INRA (Institut National de la Recherche Agronomique) [Program CJS]. This work was supported by the French National Research Agency project ALGOMICS (ANR-08-BIOE-002).
References
Antoniewicz MR, Kelleher JK, Stephanopoulos G. 2006. Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements. Metab Eng 8(4):324–337.
Arita M. 2003. In silico atomic tracing by substrate-product relationships in Escherichia coli intermediary metabolism. Genome Res 13(11):2455–2466.
Chen J, Zheng H, Liu H, Niu J, Liu J, Shen T, Rui B, Shi Y. 2007. Improving metabolic flux estimation via evolutionary optimization for convex solution space. Bioinformatics 23(9):1115–1123.
Crown SB, Antoniewicz MR. 2012. Selection of tracers for 13C-metabolic flux analysis using elementary metabolite units (EMU) basis vector methodology. Metab Eng 14(2):150–161.
Crown SB, Ahn WS, Antoniewicz MR. 2012. Rational design of 13C-labeling experiments for metabolic flux analysis in mammalian cells. BMC Syst Biol 6(1):43.
Ishii N, Nakahigashi K, Baba T, Robert M, Soga T, Kanai A, Hirasawa T, Naba M, Hirai K, Hoque A, Ho PY, Kakazu Y, Sugawara K, Igarashi S, Harada S, Masuda T, Sugiyama N, Togashi T, Hasegawa M, Takai Y, Yugi K, Arakawa K, Iwata N, Toya Y, Nakayama Y, Nishioka T, Shimizu K, Mori H, Tomita M. 2007. Multiple high-throughput analyses monitor the response of E. coli to perturbations. Science 316(5824):593–597.
Iwatani S, Yamada Y, Usuda Y. 2008. Metabolic flux analysis in biotechnology processes. Biotechnol Lett 30(5):791–799.
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. 2006. From genomics to chemical genomics: New developments in KEGG. Nucleic Acids Res 34(Database issue):D354–D357.
Kiefer P, Nicolas C, Letisse F, Portais JC. 2007. Determination of carbon labeling distribution of intracellular metabolites from single fragment ions by ion chromatography tandem mass spectrometry. Anal Biochem 360(2):182–188.
Libourel IG, Gehan JP, Shachar-Hill Y. 2007. Design of substrate label for steady state flux measurements in plant systems using the metabolic network of Brassica napus embryos. Phytochemistry 68(16–18):2211–2221.
Massou S, Nicolas C, Letisse F, Portais JC. 2007. NMR-based fluxomics: Quantitative 2D NMR methods for isotopomers analysis. Phytochemistry 68(16–18):2330–2340.
Metallo CM, Walther JL, Stephanopoulos G. 2009. Evaluation of 13C isotopic tracers for metabolic flux analysis in mammalian cells. J Biotechnol 144(3):167–174.
Mollney M, Wiechert W, Kownatzki D, de Graaf AA. 1999. Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol Bioeng 66(2):86–103.
Nicolas C, Kiefer P, Letisse F, Kromer J, Massou S, Soucaille P, Wittmann C, Lindley ND, Portais JC. 2007. Response of the central metabolism of Escherichia coli to modified expression of the gene encoding the glucose-6-phosphate dehydrogenase. FEBS Lett 581(20):3771–3776.
Quek LE, Wittmann C, Nielsen LK, Kromer JO. 2009. OpenFLUX: Efficient modelling software for 13C-based metabolic flux analysis. Microb Cell Fact 8:25.
Ruhl M, Rupp B, Noh K, Wiechert W, Sauer U, Zamboni N. 2012. Collisional fragmentation of central carbon metabolites in LC-MS/MS increases precision of 13C metabolic flux analysis. Biotechnol Bioeng 109(3):763–771.
Sokol S, Millard P, Portais JC. 2012. Influx_s: Increasing numerical stability and precision for metabolic flux analysis in isotope labeling experiments. Bioinformatics 28(5):687–693.
Walther JL, Metallo CM, Zhang J, Stephanopoulos G. 2012. Optimization of 13C isotopic tracers for metabolic flux analysis in mammalian cells. Metab Eng 14(2):162–171.
Weitzel M, Noh K, Dalman T, Niedenfuhr S, Stute B, Wiechert W. 2013. 13CFLUX2 high-performance software suite for 13C-metabolic flux analysis. Bioinformatics 29(1):143–145.
Wiechert W. 2001. 13C metabolic flux analysis. Metab Eng 3(3):195–206.
Wiechert W, Mollney M, Petersen S, de Graaf AA. 2001. A universal framework for 13C metabolic flux analysis. Metab Eng 3(3):265–283.
Yoo H, Antoniewicz MR, Stephanopoulos G, Kelleher JK. 2008. Quantifying reductive carboxylation flux of glutamine to lipid in a brown adipocyte cell line. J Biol Chem 283(30):20621–20627.
Zhao J, Shimizu K. 2003. Metabolic flux analysis of Escherichia coli K12 grown on 13C-labeled acetate and glucose using GC-MS and powerful flux calculation method. J Biotechnol 101(2):101–117.
Supporting Information
Additional supporting information may be found in the online version of this article at the publisher’s web-site.
Table SI. Overview on the main features of publicly available software for experimental design of 13C-MFA experiments.
Table SII. Metabolic reactions and experimental flux values and standard deviations in acetate-growing E. coli cells.