This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE/ACM TRANSACTION ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Fast computation of minimal cut sets in metabolic networks with a Berge algorithm that utilizes binary bit pattern trees

Christian Jungreuthmayer, Marie Beurton-Aimar, Jürgen Zanghellini

Abstract—Minimal cut sets are a valuable tool for analyzing metabolic networks and for identifying optimal gene intervention strategies by eliminating unwanted metabolic functions and keeping desired functionality. Minimal cut sets rely on the concept of elementary flux modes which are sets of indivisible metabolic pathways under steady state condition. However, the computation of minimal cut sets is non-trivial, as even medium sized metabolic networks with just 100 reactions easily have several hundred million elementary flux modes. We developed a minimal cut set tool that implements the well known Berge algorithm and utilizes a novel approach to significantly reduce the program run time by using binary bit pattern trees. By using the introduced tree approach the size of metabolic models that can be analyzed and optimized by minimal cut sets is pushed to new and considerably higher limits.

Index Terms—Elementary mode analysis, minimal cut sets, gene knockout, bit pattern, tree code
1 INTRODUCTION
Elementary flux mode (EFM) analysis is a well-established method for the unbiased decomposition of (metabolic) networks into unique, indecomposable steady-state pathways [1], [2]. The indecomposability of EFMs implies that the deletion of any single reaction contributing to an EFM completely disables it. A cut set (CS) is defined as a set of reaction deletions that eliminates all target EFMs. A CS is called a minimal cut set (MCS) if none of its proper subsets is a CS [3]. MCSs are of special interest for metabolic engineering, as they require the least effort when implemented biologically.

It has been shown that CSs are especially useful for two types of tasks: (i) The calculation of all MCSs of an unconstrained system, which means that every determined MCS kills all EFMs of the investigated system. Numerous applications of this method have been suggested, e.g. evaluating structural robustness and fragility, identifying targets in rational drug design, and predicting phenotypes [4]. (ii) EFMs can be utilized to split the functional units of a metabolic system into two groups: (a) desired functions and (b) unwanted functions [5]. A constrained cut set (cCS) is defined as a set of reaction deletions that eliminates all unwanted EFMs and keeps - at least some of - the desired EFMs. Such a strategy can be used to optimize microbiological production hosts to efficiently produce substances of interest, e.g. ethanol [6]. This is achieved by assigning all EFMs that are involved in efficient ethanol production to the set of desired EFMs. All other EFMs are assigned to the set of unwanted EFMs. Consequently, the biological implementation of the computed MCSs of such a system results in an optimized microorganism. Therefore, MCSs are a valuable tool to identify and realize optimal gene intervention strategies, as has been shown by Trinh et al. [7].

The MCS computation of small metabolic networks is simple and computationally undemanding. For larger networks, however, it suffers from a major disadvantage: the number of EFMs of metabolic networks grows combinatorially with the system size [8]. Even medium-sized networks with approximately 100 reactions can easily have several hundred million EFMs. The huge number of EFMs makes the computation of MCSs a challenge and results in very long run times that do not allow the use of MCSs for medium-scale or large-scale metabolic models.

Recently, several methods have been developed to compute MCSs efficiently using hitting set algorithms [3], [4], [5], [9], [10] and binary linear programming [11]. The performance evaluation of MCS computation algorithms is non-trivial and currently no methods are known that are output-polynomial [10].

In the present article we introduce an improved version of the hitting set approach that was originally reported by Haus et al. [3]. The approach of Haus et al. significantly increased the performance of MCS programs by utilizing Berge's algorithm [12]. The Berge algorithm mainly processes three data sets: (i) the set of all EFMs to be killed, (ii) the set of preminimal cut sets (preMCSs) which kill all EFMs that have already been processed, and (iii) the new preMCS candidates. The Berge algorithm executes two nested loops (see Figure 1). The outer loop iterates through all EFMs that have to be killed. The inner loop runs over all existing preMCSs. Inside the inner loop it is tested whether the current preMCS kills the EFM that is being analyzed. If the EFM is killed, the preMCS is valid and the next preMCS is tested. However, if the EFM is not eliminated by the tested preMCS, the preMCS is invalid. In this case the preMCS is used to generate new preMCS candidates and is then removed from the set of preMCSs. New candidates are created by taking the invalid preMCS and adding a single reaction to it, one candidate for each reaction in which the processed EFM carries a flux. For example, if an insufficient preMCS contains two reactions ({R1,R7}) and the EFM that caused the preMCS to fail the test has four flux-carrying reactions ({R4,R6,R9,R11}), four new candidates are created with three reactions each ({R1,R7,R4},{R1,R7,R6},{R1,R7,R9},{R1,R7,R11}). After the last EFM has been processed, all remaining preMCSs are MCSs that kill all EFMs that have to be deleted.

Fig. 1. Flow chart of the essential parts of the Berge algorithm.

However, the Berge algorithm suffers from a bottleneck which is caused by the need to remove preMCS candidates that are supersets of already computed preMCSs [3]. Consequently, a superset test is required to guarantee that all CSs are MCSs. Using a linear search approach to find subsets in the set of existing preMCSs results in a search time of O(n^2). Various other subset search approaches that scale better than the linear approach have been reported, e.g. the methods in [13] and [14] can identify subsets in time O(n^2 / log n).

In order to reduce the runtime spent in the subset search of Berge's algorithm we implemented a binary bit pattern tree approach [15] that can be run in parallel on multiple CPU cores. The concept of binary bit pattern trees is not new, but we are not aware that they have been used in the context of MCS computation. By using our tree approach the size of metabolic models that can be analyzed and genetically optimized by MCSs is pushed to new and significantly higher limits.

Besides the importance of MCSs in metabolic engineering, the computation of MCSs is also an essential problem in discrete mathematics with many applications in computer science, artificial intelligence and game theory [16]. These fields may also benefit from the idea of using binary bit pattern trees to speed up the computation of MCSs. In the present study we explain the main concept of our tree method and compare the performance of the tree method with a linear search algorithm.

C. Jungreuthmayer and J. Zanghellini are with the Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria, and with the Metabolic Modeling Group, Austrian Centre of Industrial Biotechnology (ACIB), Muthgasse 11/DG, 1190 Vienna, Austria.
M. Beurton-Aimar is with the Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, Talence, France, EU.
Manuscript received April 4, 2013.
Digital Object Identifier 10.1109/TCBB.2013.116
1545-5963/13/$31.00 © 2013 IEEE

2 METHODS
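As background for the method described in this section, the two nested loops of the Berge algorithm outlined in the introduction can be sketched in a few lines. This is only an illustrative sketch in Python, not the authors' C implementation: the function name `berge_mcs`, the representation of EFMs as sets of reaction names, and the toy EFMs are assumptions made for this example, and the superset test is the plain linear search.

```python
def berge_mcs(efms):
    """Return the minimal cut sets that kill every EFM in `efms`.

    Each EFM is given as a set of the reactions that carry a flux.
    (Hypothetical helper for illustration only.)
    """
    pre_mcs = [frozenset()]              # the empty set kills no EFM yet
    for efm in efms:                     # outer loop: EFMs to be killed
        valid, invalid = [], []
        for pre in pre_mcs:              # inner loop: existing preMCSs
            (valid if pre & efm else invalid).append(pre)
        for pre in invalid:
            # one new candidate per flux-carrying reaction of the EFM
            for reaction in sorted(efm):
                candidate = pre | {reaction}
                # linear superset test against the preMCSs kept so far
                if not any(kept <= candidate for kept in valid):
                    valid.append(candidate)
        pre_mcs = valid
    return set(pre_mcs)


# Toy system (assumed for this example): two EFMs sharing reaction R2.
toy_efms = [{"R1", "R2"}, {"R2", "R3"}]
print(berge_mcs(toy_efms))
```

For the toy system the sketch yields the two cut sets {R2} and {R1,R3}. The call to `any(...)` is the linear search whose cost the tree approach of this article is meant to reduce.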
Our MCS calculator utilizes the Berge algorithm [12] as demonstrated by Haus et al. [3]. In Berge's algorithm the majority of the computation time is spent on the procedure that checks whether or not a new MCS candidate is a superset of an already determined preMCS [3] (see Table 2). If this is the case, the new preMCS candidate is dismissed. Otherwise, the candidate is added to the list of preMCSs.

A simple approach to perform this superset test is to sequentially access each preMCS of the complete set of existing preMCSs and to compare it with the new preMCS candidate. As soon as a subset is found for a new candidate, the search procedure is stopped and the next candidate is tested. However, this approach scales badly with an increasing number of preMCSs. In particular, this is true if no subset exists for a candidate, in which case the complete set of existing preMCSs is examined. Note that the number of preMCSs can be huge and, in general, is much larger than the final number of computed MCSs (see Figure 2, which shows the number of preMCSs as a function of processed EFMs for the presented benchmark system).

Fig. 2. Number of preMCS over number of processed EFMs for the system used throughout this article. [Plot axes: number of processed EFMs (x-axis), number of preMCS [x10^3] (left y-axis), number of preMCS / number of final MCS (right y-axis).]

Typically, CSs are expressed in the form of bit patterns, where '1' stands for a reaction that is knocked out by the CS and '0' means that the corresponding reaction is not affected. A new candidate is a superset of an existing preMCS if the result of a boolean AND-operation of the candidate and the preMCS is equal to the preMCS itself (see Table 1). This implies that a candidate is not a superset if the bit pattern of a preMCS contains a '1' at a bit position where the candidate is '0'. This fact can be used to speed up the superset check of the Berge algorithm by organizing the set of existing preMCSs in a favorable way as a binary tree and by traversing this tree in an intelligent fashion. Thereby, a significant number of superset tests that would have to be performed in the case of a linear search can be avoided.

TABLE 1
Implementation of the superset test utilizing a boolean AND-operation

                      superset    no superset
candidate             100110      100110
pre-cutset            000110      010010
AND                   000110      000010
AND vs. pre-cutset    equal       not equal

As shown in Figure 3, our implementation of the Berge algorithm uses a binary tree to structure the preMCSs. A leaf node of the tree is always the bit pattern of a preMCS and is represented by a rectangle with rounded corners. Whether a preMCS is attached left or right to its parent node is determined by the bit value of the preMCS at a certain bit position. In Figure 3 a preMCS is attached to the left side if the bit at the inspected bit position is set and to the right side if it is not set. For example, the bit position of the root node is 3. Consequently, all preMCSs with a '1' at position 3 are attached to the subtree at the left side of the root node, whereas all preMCSs on the right side of the root node are '0' at bit position 3. The bit position that is used at each level of the tree (e.g. in Figure 3 at level 0 bit number 3, at level 1 bit number 4, at level 2 bit number 2, and so on) is determined during a pre-processing step before the Berge algorithm is started and has a significant influence on the runtime of the program. The bit pattern value of a non-leaf node is created by boolean AND-operations of the bit patterns of the node's subtrees. This AND-operation has the effect that '1's that are set in all sub-nodes are propagated from the bottom of the tree (leaf nodes) to the top (root node).

Fig. 3. Binary bit pattern tree of all existing preMCS. Leaf nodes containing bit patterns of preMCS are represented by rectangles with rounded corners. The bit patterns of the non-leaf nodes are built by boolean AND-operations of the node's subtrees.

If a candidate is checked, the superset test is started at the root node. The left subtree of a node is always investigated first. If no subset is found in the left subtree, the right subtree is explored. However, a subtree is only entered if the tested candidate does not contain a '0' where all sub-nodes and, hence, all preMCSs of that subtree have a '1'. Such a case is shown in Figure 3, where the candidate '100101' is tested (blue-colored dash-dotted line). As the bit pattern of the root node ('000000') does not contain any '1's, the left subtree of the root node is entered. The bit pattern of the left child node ('001000') indicates that in all nodes and, hence, in all preMCSs attached to it the bit at position 3 is set. As bit number 3 of the tested candidate '100101' is not set, none of the attached preMCSs can be a subset of the candidate and this subtree is not investigated further. Therefore, the right subtree of the root node is entered next. In this subtree all leaf nodes are accessed to find a subset. As no subset is found, the candidate is accepted and can be added to the set of preMCSs.

A different situation is illustrated by the candidate '011101' (red-colored dashed line). Since the bit at position 3 is set, the left subtree of the non-leaf node '001000' must also be checked. However, this check does not return a valid subset. Hence, the right subtree is investigated. In this subtree a subset is detected ('011001'). Consequently, the check procedure is immediately terminated and the candidate is dismissed.

Before the Berge algorithm is started, extensive pre-processing of the set of provided EFMs is performed.
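Before the individual pre-processing steps are listed, the superset test of Table 1 and the tree traversal of Figure 3 can be made concrete with a small sketch. It is written in Python rather than the authors' C, and several details are assumptions made for illustration: the names `Node`, `is_superset`, `build_tree` and `contains_subset`, the list of five preMCS patterns (chosen to be consistent with the node patterns quoted in the text), the bit order `[3, 4, 2, 1, 5, 6]`, and the rule that a bit which does not separate the patterns of a subtree is skipped. Bit positions are counted from the left, starting at 1, as in the text.

```python
class Node:
    """A leaf holds one preMCS; an inner node holds the AND of its subtrees."""

    def __init__(self, pattern, left=None, right=None):
        self.pattern = pattern
        self.left = left      # preMCSs with the inspected bit set
        self.right = right    # preMCSs with the inspected bit not set


def is_superset(candidate, pre_mcs):
    """Table 1: superset if (candidate AND preMCS) equals the preMCS."""
    return (candidate & pre_mcs) == pre_mcs


def build_tree(patterns, bit_order, width):
    """Recursively split distinct bit patterns along the given bit order."""
    if len(patterns) == 1:
        return Node(patterns[0])
    for i, position in enumerate(bit_order):
        mask = 1 << (width - position)       # position 1 is the leftmost bit
        ones = [p for p in patterns if p & mask]
        zeros = [p for p in patterns if not p & mask]
        if ones and zeros:                   # skip bits that do not split
            rest = bit_order[i + 1:]
            left = build_tree(ones, rest, width)
            right = build_tree(zeros, rest, width)
            return Node(left.pattern & right.pattern, left, right)
    raise ValueError("patterns must be distinct")


def contains_subset(node, candidate):
    """Return True if a preMCS below `node` is a subset of the candidate."""
    if node.pattern & ~candidate:
        return False   # a '1' shared by all preMCSs here is '0' in the candidate
    if node.left is None:
        return True    # a leaf that passes the test is a subset
    return contains_subset(node.left, candidate) or contains_subset(node.right, candidate)


# Assumed preMCS patterns, in the spirit of Figure 3.
pre_mcs = [int(s, 2) for s in ("001110", "011001", "001011", "010100", "000111")]
tree = build_tree(pre_mcs, bit_order=[3, 4, 2, 1, 5, 6], width=6)
print(contains_subset(tree, int("100101", 2)))   # no subset found: candidate accepted
print(contains_subset(tree, int("011101", 2)))   # subset 011001 found: candidate dismissed
```

With these patterns the root node carries '000000' and its left child '001000', so the candidate '100101' skips the whole left subtree, as described above. The sketch gives the same answer as a linear scan with `is_superset`; only the number of comparisons differs.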
The pre-processing comprises the following steps: (i) eliminating reactions which must not be knocked out, as this would result in the deletion of (too many) wanted modes, (ii) removing duplicate modes, (iii) removing modes that are supersets of other modes, and (iv) combining duplicate reactions. The pre-processing step (iv) is illustrated in Figure 1 as the step 'compress EFMs', which is the reverse operation of the step 'expand MCS'. For details we refer to [17].

Our MCS calculator is written in C and supports multi-threaded execution. Multi-threading was implemented by utilizing the POSIX thread library. Each started thread gets an equally sized set of preMCS candidates which it tests against all existing preMCSs. The existing preMCSs are stored in a global memory region that can be accessed by all threads. Note that only read-access to the preMCS data is required by the threads during the superset test. The tree management is done by a single thread and, hence, does not benefit from multi-threaded execution. A Makefile is provided with the source code to allow compilation of the program. The program was developed and tested only on Linux and Mac OS X platforms. Other operating systems are currently not supported. The software is available from the authors on request.

In order to benchmark the performance of our bit pattern tree algorithm, we used a system with 114,614 EFMs. The EFMs were calculated with the open source program regEfmtool [18], [19]. The network used has been described in [20]. It models the central carbon metabolism of generic plant cells and consists of 78 reactions (33 reversible and 45 irreversible) and 55 internal metabolites. The network describes the following main pathways: TCA cycle, glycolysis, pentose phosphate, respiration, sucrose and starch synthesis. All of them belong to one of these 4 compartments: vacuole, mitochondria, plastid and cytosol. Transporters have been added for intercompartmental exchange.

For the benchmark we computed MCSs of the unconstrained system, which means that all EFMs of the system have to be killed. An unconstrained system has the effect that the pre-processing steps (i) to (iii) mentioned earlier cannot be applied and, hence, the size of the system is only marginally reduced during pre-processing. Consequently, using an unconstrained system results in a high computational workload. The total number of MCSs of the chosen system is 2,815,375, where the minimum and maximum number of knockouts is 4 and 18, respectively (see Figure 5). However, because of pre-processing step (iv), which combines duplicate reactions, the Berge algorithm only needs to compute 93,009 MCSs. In a post-processing step these 93,009 MCSs are expanded to the final number of 2,815,375.

Fig. 5. Number of MCSs as a function of number of knockouts for the metabolic system used in the presented benchmark. [Plot axes: number of knocked out reactions per MCS (x-axis), number of MCSs on a logarithmic scale (y-axis).]

The concept of CS expansion is illustrated in Figure 4, which shows two networks. Network (a) depicts the original toy system that has two EFMs and 7 MCSs that remove both EFMs: {R1},{R4},{R5},{R2, R6},{R3, R6},{R2, R7}, and {R3, R7}. Network (b) is the toy system after combining duplicate reactions, which has only three MCSs: {R1},{R45}, and {R23, R67}. However, in a post-processing step the three MCSs can be expanded in order to obtain the full set of MCSs of the original network. Using the compressed network instead of the full network results in a strong reduction of the program's total execution time [17].

Fig. 4. Network (a) is the original system and network (b) the system after combining duplicate reactions (R2 and R3 → R23; R4 and R5 → R45; R6 and R7 → R67). Both networks have two EFMs. However, network (a) has 7 MCSs, whereas network (b) has only 3 MCSs. In a post-processing step the 3 MCSs of the compressed network can be expanded to the 7 MCSs of the original network. Using the compressed network to do the MCS computation results in a strong reduction of the total runtime.
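The expansion step for the toy system of Figure 4 can be sketched as follows. The sketch is in Python and assumes that every combined reaction in a compressed MCS may be replaced by any single one of its original reactions, which is what the toy example suggests; the function name `expand_mcs` and the dictionary `members` are hypothetical and not taken from the authors' program.

```python
from itertools import product


def expand_mcs(compressed_mcs, members):
    """Expand MCSs of the compressed network to MCSs of the original network.

    `members` maps each reaction of the compressed network to the list of
    original reactions it stands for (assumed data layout).
    """
    expanded = set()
    for cut_set in compressed_mcs:
        choices = [members[reaction] for reaction in sorted(cut_set)]
        # pick one original reaction for every combined reaction
        for pick in product(*choices):
            expanded.add(frozenset(pick))
    return expanded


# Toy system of Figure 4: network (b) and its combined reactions.
members = {"R1": ["R1"], "R23": ["R2", "R3"], "R45": ["R4", "R5"], "R67": ["R6", "R7"]}
compressed = [{"R1"}, {"R45"}, {"R23", "R67"}]
for mcs in sorted(expand_mcs(compressed, members), key=sorted):
    print(sorted(mcs))
```

For the three MCSs of network (b) this produces the seven MCSs listed for network (a): one from {R1}, two from {R45}, and four from {R23, R67}.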
3 RESULTS

The results of the benchmark are listed in Table 2. The table clearly shows that the tree approach is superior to the linear search algorithm, e.g. in single-thread mode the tree algorithm is approximately 30 times faster than the linear search algorithm. The performance gain is even higher if the MCSs of even bigger systems are computed (data not shown). The benchmark also shows that most of the time is spent on the superset test (at least 84%), which is consistent with the observations of others [3]. The time spent managing the bit pattern tree amounted to approximately 40 seconds. A part of these 40 seconds is spent in the candidate generation procedure, where invalid preMCSs are eliminated and, hence, must be removed from the tree. The other task of the tree management is to add valid candidates to the preMCS tree.

TABLE 2
Comparison of MCS computation runs with (a) linear search, (b) tree search with random bit order, and (c) tree search with a bit order derived from EFM properties for a system with 114,614 EFMs. The results of test (b) were obtained by averaging over 5 individual runs, each with a varying order of bit positions. The results of (a) and (c) were averaged over 3 identical runs. Note that the tree management mainly comprises two tasks: (i) removing insufficient preMCS and (ii) adding new valid candidates to the preMCS tree. The tree management time listed in this table shows the sum of both tasks. However, the time spent for removing insufficient preMCS also shows up in the candidate generation time.

num. threads                 1              5               10

(a) linear
comparison events            20.4·10^12
time tree management         -              -               -
time candidate generation    34.3s          32.2s           31.4s
time superset test           5h 5m 27.8s    1h 11m 27.7s    42m 24.6s
total run time               5h 6m 08.5s    1h 12m 06.2s    43m 04.4s

(b) tree - random
comparison events            97.2·10^9
time tree management         42.4s          44.7s           42.9s
time candidate generation    52.3s          56.4s           57.0s
time superset test           32m 27.3s      18m 28.7s       14m 58.7s
total run time               33m 26.7s      19m 22.2s       16m 03.0s

(c) tree - EFM
comparison events            30.9·10^9
time tree management         39.2s          38.7s           40.3s
time candidate generation    51.4s          50.7s           53.3s
time superset test           8m 57.6s       6m 21.7s        5m 01.5s
total run time               9m 53.2s       7m 20.1s        5m 58.9s

As shown in Table 2, the bit position used to create the tree (see Figure 3) has a strong influence on the total run time (compare test case (b) and test case (c)). In our benchmark the best performance was achieved when the order of the bit positions was determined by the frequency of active reactions in the set of EFMs to be killed. The more often a certain reaction was carrying a flux in the EFMs, the higher up in the tree the reaction was used to split the set of preMCSs. For the example given in Figure 3 this means that the reaction represented by bit position 3, which is used at the root node, occurs most frequently in the EFMs to be killed. The results obtained by using this EFM statistics approach to determine the bit order are shown in test case (c) tree - EFM. Test case (b) tree - random used a random bit order. For test case (b) 5 runs with varying bit orders were used and the results averaged. In order to obtain comparable results the same random bit order was used for each set of runs with varying numbers of threads. Using the bit order derived from the reaction frequency resulted in a performance gain of approximately a factor of 3 compared to the random bit order.

The achieved performance improvement can be explained by the number of comparison events that have to be performed (see Table 2). In the linear case 20.4·10^12 subset tests were done in total, whereas only 30.9·10^9 were required for the tree approach shown in test case (c). Although 660 times more comparison events occurred in test case (a) than in test case (c), the run time improvement was much lower than the factor 660. This is mainly caused by the additional complexity of the tree code and by unavoidable effects such as an increased number of cache misses, which occur more frequently if memory is accessed randomly instead of linearly. Note that Table 2 also illustrates that the linear search scales much better with an increasing number of threads than the tree approach, as the total run time for the linear search was reduced by a factor of 7.1 when 10 threads were used, whereas the tree implementation only gained a factor of 1.7. Analyzing and improving the multi-threaded tree code implementation will be the subject of future work.

Results of MCS computations for five metabolic networks are listed in Table 3. The table clearly shows that the performance gain grows with the number of preMCSs that have to be determined during the computation. The number of EFMs is not necessarily indicative of the number of MCSs. Our P. pastoris model has 15 million EFMs and roughly 1.8 million MCSs, while our A. thaliana model contains only 1.7 million EFMs but has 3.7 billion MCSs. Note that we terminated the linear search runs of the two large systems, as the execution time was extraordinarily high. Based on the average time spent to search for a single candidate in the set of preMCSs, an extrapolated runtime was estimated. Table 3 shows that the tree approach outperforms the linear search algorithm. Moreover, the tree approach allows the study of metabolic systems which could not be analyzed with a linear search.
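The frequency-based bit order used in test case (c) might be derived along the following lines. This is a minimal sketch in Python, not the authors' pre-processing code: the function name `bit_order_by_frequency`, the representation of EFMs as sets of 1-based bit positions, the tie-breaking by position, and the three toy EFMs are all assumptions made for illustration.

```python
from collections import Counter


def bit_order_by_frequency(efms, num_reactions):
    """Order bit positions by how often the reaction carries a flux.

    `efms` is a list of sets of 1-based bit positions; the most frequent
    reaction comes first and would be used at the root of the tree.
    Ties are broken by bit position (an arbitrary choice).
    """
    counts = Counter(position for efm in efms for position in efm)
    return sorted(range(1, num_reactions + 1), key=lambda p: (-counts[p], p))


# Toy EFMs (assumed): reaction 3 is active in all three modes.
toy_efms = [{1, 3}, {3, 4}, {2, 3, 4}]
print(bit_order_by_frequency(toy_efms, num_reactions=4))   # [3, 4, 1, 2]
```

In this toy case reaction 3 is placed at the root, followed by reaction 4, which mirrors the idea that frequently active reactions split the set of preMCSs high up in the tree.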
TABLE 3
Comparison of MCS computation with linear search and with tree search for different metabolic networks using 10 parallel threads. The model P. pastoris (a) contains the core metabolism and pathways for amino acid, nucleotide, and riboflavin production. The network P. pastoris (b) also contains the core and amino acid metabolism, but any cofactor coupling has been removed from the model. Consequently, P. pastoris (b) is only a numerical test network and not a proper biological model.
† extrapolated data, case terminated after 77h at 14,900 of 1,406,983 iterations.
‡ extrapolated data, case terminated after 121h at 98 of 1,720,557 iterations.

Organism                  S. cerevisiae   E. coli     P. pastoris (a)   P. pastoris (b)      A. thaliana
num. EFMs                 136,086         429,276     15,082,361        1,406,983            1,720,557
num. MCSs (compressed)    16,695          41,989      127,761           3,819,275            75,900,274
num. MCSs (expanded)      104,940         1,473,564   1,754,491         15,844,753           3,708,404,636
max. preMCSs              18,928          134,931     393,686           6,117,123            367,879,975
runtime (linear)          55s             9m 33s      17h 38m 15s       † 69d 03h 39m 59s    ‡ 302y 229d 13h 54m 49s
runtime (tree)            39s             4m 07s      7h 10m 42s        11h 43m 37s          27d 13h 26m 15s
speedup factor            1.4             2.3         2.5               † 141.5              ‡ 4008.0

4 CONCLUSION

We presented a program that computes minimal cut sets (MCSs) by utilizing a hitting set algorithm originally invented by Berge [12]. The bottleneck of the Berge algorithm is the search procedure that is required to guarantee that all computed MCSs are truly minimal. This is done by performing a test that verifies that a new MCS candidate is not a superset of any of the already determined preMCSs [5]. A simple but slow approach to test a new candidate against all already existing preMCSs is to linearly iterate through the complete set of existing preMCSs and check if this set contains a subset of the candidate. In order to speed up the superset test we stored the existing preMCSs in a binary tree structure. This tree structure can be used to skip a significant number of the comparison events between a candidate and the existing preMCSs that would otherwise be necessary. Consequently, the total runtime for the computation of the complete set of MCSs is strongly reduced and, hence, numerical analyses and gene intervention strategies of much bigger systems can be calculated by our approach.

Note that the determination of the MCSs requires the calculation of the EFMs first. Even though significant progress has been made in recent years on the efficient computation of EFMs [15], [21], [22], in many situations the EFM computation is the limiting factor of MCS analyses. However, recently alternative approaches have been reported which are based on the duality of EFMs and MCSs [3], [23]. These approaches allow the computation of MCSs directly from the stoichiometric matrix without having to calculate the EFMs beforehand.

The Berge algorithm is well suited to compute MCSs of systems with a moderate number of columns/reactions (up to several hundred) and a high number of rows/modes (several million). However, systems with other characteristics (e.g. a high number of columns and a low number of rows) cannot be investigated easily with the Berge algorithm - even if a tree approach is used - as the number of preMCSs grows dramatically with each processed row/mode. The huge number of preMCSs results in a tremendous memory consumption and a very slow execution, which make the Berge algorithm unsuitable for this type of system.

Besides the importance of MCSs in metabolic engineering, the calculation of MCSs is also an essential problem in discrete mathematics with numerous applications in computer science, artificial intelligence and game theory [16]. These fields might also benefit from the idea of using binary bit pattern trees to speed up the computation of MCSs.
ACKNOWLEDGMENT

This work has been supported by the Federal Ministry of Economy, Family and Youth (bmwfj), the Federal Ministry of Traffic, Innovation and Technology (bmvit), the Styrian Business Promotion Agency SFG, the Standortagentur Tirol and ZIT - Technology Agency of the City of Vienna through the COMET-Funding Program managed by the Austrian Research Promotion Agency FFG.

REFERENCES

[1] S. Schuster, D. A. Fell, and T. Dandekar, "A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks," Nat Biotech, vol. 18, no. 3, pp. 326–332, Mar. 2000.
[2] S. Schuster, T. Dandekar, and D. A. Fell, "Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering," Trends in Biotechnology, vol. 17, no. 2, pp. 53–60, Feb. 1999.
[3] U.-U. Haus, S. Klamt, and T. Stephen, "Computing knock-out strategies in metabolic networks," Journal of Computational Biology, vol. 15, no. 3, pp. 259–268, Apr. 2008.
[4] S. Klamt and E. D. Gilles, "Minimal cut sets in biochemical reaction networks," Bioinformatics, vol. 20, no. 2, pp. 226–234, Jan. 2004.
[5] O. Hädicke and S. Klamt, "Computing complex metabolic intervention strategies using constrained minimal cut sets," Metabolic Engineering, vol. 13, no. 2, pp. 204–213, Mar. 2011.
[6] C. T. Trinh, P. Unrean, and F. Srienc, "Minimal Escherichia coli cell for the most efficient production of ethanol from hexoses and pentoses," Applied and Environmental Microbiology, vol. 74, no. 12, pp. 3634–3643, Jun. 2008. [Online]. Available: http://aem.asm.org/content/74/12/3634.abstract
[7] C. T. Trinh, R. Carlson, A. Wlaschin, and F. Srienc, "Design, construction and performance of the most efficient biomass producing E. coli bacterium," Metabolic Engineering, vol. 8, no. 6, pp. 628–638, Nov. 2006.
[8] S. Klamt and J. Stelling, "Combinatorial complexity of pathway analysis in metabolic networks," Molecular Biology Reports, vol. 29, no. 1, pp. 233–236, Mar. 2002.
[9] S. Klamt, "Generalized concept of minimal cut sets in biochemical networks," Biosystems, vol. 83, no. 2-3, pp. 233–247, Feb. 2006.
[10] M. Hagen, "Lower bounds for three algorithms for transversal hypergraph generation," Discrete Applied Mathematics, vol. 157, pp. 1460–1468, 2009.
[11] C. Jungreuthmayer and J. Zanghellini, "Designing optimal cell factories: Integer programming couples elementary mode analysis with regulation," BMC Systems Biology, vol. 6, p. 103, 2012.
[12] C. Berge, Hypergraphs, Volume 45: Combinatorics of Finite Sets, 1st ed. North Holland, Aug. 1989.
[13] H. Shen and D. J. Evans, "Fast sequential and parallel algorithms for finding extremal sets," International Journal of Computer Mathematics, vol. 61, pp. 195–211, 1996.
[14] P. Pritchard, "On computing the subset graph of a collection of sets," Journal of Algorithms, vol. 33, pp. 187–203, 1999.
[15] M. Terzer and J. Stelling, "Large-scale computation of elementary flux modes with bit pattern trees," Bioinformatics, vol. 24, no. 19, pp. 2229–2235, Oct. 2008.
[16] T. Eiter, K. Makino, and G. Gottlob, "Computational aspects of monotone dualization: A brief survey," Discrete Applied Mathematics, vol. 156, no. 11, pp. 2035–2049, Jun. 2008.
[17] C. Jungreuthmayer, G. Nair, S. Klamt, and J. Zanghellini, "Comparison and improvement of algorithms for computing minimal cut sets," BMC Bioinformatics, submitted 2013.
[18] C. Jungreuthmayer, D. Ruckerbauer, and J. Zanghellini, "regEfmtool: Speeding up elementary flux mode calculation using transcriptional regulatory rules in the form of three-state logic," BioSystems, submitted 2013.
[19] University of Natural Resources and Life Sciences, Vienna, Institute of Applied Microbiology, Metabolic Modelling Group, "regEfmtool - Regulatory Elementary Flux Mode Tool," http://www.biotec.boku.ac.at/regulatoryelementaryfluxmode.html, 2012.
[20] M. Beurton-Aimar, B. Beauvoit, A. Monier, F. Vallée, M. Dieuaide-Noubhani, and S. Colombié, "Comparison between elementary flux modes analysis and 13C-metabolic fluxes measured in bacterial and plant cells," BMC Systems Biology, vol. 5, no. 95, 2011. [Online]. Available: http://www.biomedcentral.com/1752-0509/5/95
[21] D. Jevremović, C. T. Trinh, F. Srienc, C. P. Sosa, and D. Boley, "Parallelization of nullspace algorithm for the computation of metabolic pathways," Parallel Computing, vol. 37, no. 6-7, pp. 261–278, 2011, DOI:10.1016/j.parco.2011.04.002.
[22] C. Jungreuthmayer, D. Ruckerbauer, and J. Zanghellini, "Utilizing gene regulatory information to speed up the calculation of elementary flux modes," arXiv, p. 1208.1853v1, 2012. [Online]. Available: http://arxiv.org/abs/1208.1853
[23] K. Ballerstein, A. von Kamp, S. Klamt, and U.-U. Haus, "Minimal cut sets in a metabolic network are elementary modes in a dual network," Bioinformatics, vol. 28, no. 3, pp. 381–387, Dec. 2011.
Christian Jungreuthmayer received his PhD from the Vienna University of Technology in 2005. He is a scientist at the Austrian Centre of Industrial Biotechnology (ACIB). His research background is high performance computing and metabolic modeling. His current research interests include numerical modeling and computational simulations of metabolic networks.
Marie Beurton-Aimar received her PhD from the University of Bordeaux in 2001. She is a scientist at the Laboratoire Bordelais de Recherche en Informatique (LaBRI) in Bordeaux. Currently, her research focus lies on modeling metabolic networks and the simulation of biological processes with multi-agent systems.
Jürgen Zanghellini received his PhD from the Vienna University of Technology in 2004. He is currently heading a group on metabolic modeling at the Austrian Centre of Industrial Biotechnology (ACIB). His current research interests are geared toward numerical methods in biology with a special focus on computational systems biology.