Exact and approximation algorithms for the inversion distance between two chromosomes

John Kececioglu 1 and David Sankoff 2

1 Computer Science Department, University of California at Davis, Davis, CA 95616
2 Centre de recherches mathématiques, Université de Montréal, C.P. 6128, succ. A, Montréal, Québec H3C 3J7
Abstract. Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation into another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements and reverses their order. For this problem we develop two algorithms: a greedy approximation algorithm that finds a solution provably close to optimal in O(n^2) time and O(n) space for an n-element permutation, and a branch and bound exact algorithm that finds an optimal solution in O(mL(n, n)) time and O(n^2) space, where m is the size of the branch and bound search tree and L(n, n) is the time to solve a linear program of n variables and n constraints. The greedy algorithm is the first to come within a constant factor of the optimum, and guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch and bound algorithm are a novel application of maximum weight matchings, shortest paths, and linear programming. In a series of experiments we study the performance of an implementation. For random permutations we find that the average difference between the upper and lower bounds is less than 3 reversals for n < 50. Due to the tightness of these bounds we can solve to optimality random permutations on 30 elements in a few minutes of computer time.
1 Introduction
Much research has been devoted to efficient algorithms for the edit distance between two strings, that is, the minimum number of insertions, deletions, and substitutions to transform one string into another. Motivation comes in large part from computational biology: at the level of individual characters, genetic sequences mutate by these operations, so edit distance is a useful measure of evolutionary distance. At the chromosome level, however, genetic sequences mutate by more global operations such as the reversal of a substring (inversion), the deletion and subsequent reinsertion of a substring far from its original site (transposition), duplication of a substring, and the exchange of prefixes or suffixes of two sequences in the same organism (translocation). An inversion, which takes a substring of unrestricted size and replaces it by its reverse in one operation, has the effect of reversing the
order of the genes contained within the substring. It is perhaps the most common of these operations [17, pages 174-175], especially in organisms with one chromosome. To take an example, the only major difference between the gene orders of two of the most well-known bacteria, Escherichia coli and Salmonella typhimurium, is an inversion of a long substring of the chromosomal sequence [13]. In plants, Palmer, Osorio and Thompson [14] modeled the evolution of part of the pea chloroplast genome, which is also a single chromosome, in terms of five successive overlapping inversions. And in the fruit fly, inversions are a far more frequent reflection of differences both between and within species than either translocation or transposition [4, page 155]. The importance of inversion in these examples suggests that algorithmic study of gene rearrangement by inversions is a worthwhile first step in the study of evolutionary distance at the level of the chromosome.
In the mathematical problem that we consider, we are given the order of n genes in two related single-chromosome organisms or two related organelles, which we represent by permutations σ = (σ1 σ2 ··· σn) and τ = (τ1 τ2 ··· τn). 3 In this notation σi denotes σ(i). Such gene orders often come from genetic maps that are the distillation of the work of many experimental geneticists. In current practice, the positions of the genes are increasingly found by sequence comparison or DNA hybridization, as opposed to the mapping experiments of traditional genetics.
We model an inversion by the reversal of an interval of elements. Formally, a reversal of interval [i, j] is the permutation 4
    ρ = ( 1  ···  i−1    i    i+1  ···  j−1   j    j+1  ···  n  )
        ( 1  ···  i−1    j    j−1  ···  i+1   i    j+1  ···  n  )

Applying ρ to σ by the composition 5 σ · ρ has the effect of reversing the order of genes σi, σ_{i+1}, ..., σj. Our problem is the following.

Definition 1. The reversal distance problem on permutations is, given permutations σ and τ, find a series of reversals ρ1, ρ2, ..., ρd such that

    σ · ρ1 · ρ2 ··· ρd = τ,

and d is minimum. □
We call d the reversal distance 6 between σ and τ. Like edit distance, it satisfies the axioms of a metric. Reversal distance measures the amount of evolution that must have taken place at the chromosome level, assuming evolution proceeded by inversions.
3 Genes in one organism may be missing in the other. We assume however that such genes can be removed from the analysis, and that gene insertions and deletions can be analyzed separately.
4 This notation is shorthand for ρ(i) = j, ρ(i + 1) = j − 1, and so on. Outside interval [i, j], ρ leaves elements unchanged.
5 The composition of permutations σ and ρ indicated by σ · ρ is a permutation π where π(i) = σ(ρ(i)).
6 We sometimes refer to d informally as the inversion distance.
Notice that the reversal distance between σ and τ is equal to the reversal distance between τ⁻¹ · σ and the identity permutation ι, where τ⁻¹ denotes the inverse of τ. 7 Hence we can take as input the permutation π = τ⁻¹ · σ and compute its distance from ι. We call this formulation of the problem sorting by reversals. Note also that any algorithm for the reversal distance between two strings that does not exploit a bounded-size alphabet must, as a special case, solve the reversal distance problem on permutations.
Little is known about reversal distance: even its computational complexity is open. The only reference to an algorithm appears to be Watterson, Ewens, Hall, and Morgan [20], which gives the first definition of the problem and a heuristic for computing reversal distance, which we describe in Section 2. From the perspective of edit distance, Wagner [19] and Tichy [18] consider variations on the problem of editing one string into another where the operations include swapping of adjacent characters, or copying of substrings. More recently Schöniger and Waterman [16] present a heuristic for computing edit distance when non-overlapping inversions are allowed.
From the perspective of sorting, Gates and Papadimitriou [7] is interesting. They consider the problem of sorting a permutation by prefix reversals, 8 which are reversals of the form [1, i], and derive bounds on the diameter of the problem. The diameter of the prefix reversal problem, which we denote by d_prefix(n), is the maximum of the minimum number of prefix reversals over all permutations on n elements. Gates and Papadimitriou show that d_prefix(n) ≤ (5/3)n + 5/3, and that for infinitely many n, d_prefix(n) ≥ (17/16)n. Under the requirement that each element is reversed an even number of times, which may be appropriate if elements have an orientation, they show (3/2)n − 1 ≤ d_prefix(n) ≤ 2n + 3. In other work, Aigner and West [1] consider the diameter of sorting when the operation is reinsertion of the first element, and Amato, Blum, Irani, and Rubinfeld [2] consider a variation inspired by the problem of reversing trains on a track.
From the perspective of group theory, Even and Goldreich [5] and Jerrum [9] are interesting. Even and Goldreich show that, given a set of generators of a permutation group G and a permutation π, determining the shortest product of generators that equals π is NP-hard. Their reduction implies that the problem remains NP-hard even if every generator is its own inverse, as is the case in our problem. Jerrum established that the problem is PSPACE-complete, and remains so when restricted to two generators. In our problem the generator set is fixed. Thus, while these complexity results give a feel for the problem, they do not imply the intractability of sorting by reversals. Nevertheless, we believe sorting by reversals is NP-complete.
In the next section we present an approximation algorithm for sorting by reversals and show it never exceeds the minimum number by more than a factor of 2. Section 3 develops an exact algorithm using the branch and bound technique. The lower bound uses a relaxation to maximum weight matchings and linear programming. Section 4 presents results from experiments with an implementation of these algorithms on biological data and random permutations.
7 The identity permutation ι is (1 2 ··· n), and the inverse of permutation τ is the permutation τ⁻¹ satisfying τ⁻¹ · τ = ι.
8 This is also known as the pancake flipping problem.
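As a concrete illustration of the reduction to sorting by reversals described above, the following short Python sketch (ours, not part of the paper; permutations are 0-indexed tuples) applies a reversal by composition and forms π = τ⁻¹ · σ.

    # A permutation on n elements is a tuple of the values 0..n-1;
    # perm[i] is the value at position i.
    def reverse(perm, i, j):
        # Apply the reversal of the (0-indexed, inclusive) interval [i, j];
        # this equals the composition perm . rho for the reversal rho of footnote 4.
        return perm[:i] + tuple(reversed(perm[i:j+1])) + perm[j+1:]

    def compose(a, b):
        # Composition a . b of footnote 5: (a . b)(i) = a(b(i)).
        return tuple(a[b[i]] for i in range(len(a)))

    def inverse(a):
        inv = [0] * len(a)
        for position, value in enumerate(a):
            inv[value] = position
        return tuple(inv)

    sigma = (2, 0, 1, 3)
    tau = (0, 2, 1, 3)
    pi = compose(inverse(tau), sigma)
    # sigma . rho_1 ... rho_d = tau holds exactly when pi . rho_1 ... rho_d is the
    # identity, so computing the reversal distance of pi to the identity suffices.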
2 An approximation algorithm
Perhaps the most natural algorithm for sorting by reversals, suggested by Watterson, Ewens, Hall and Morgan [20], is to bring element 1 into place, then element 2, and so on up to element n. Formally, at step i perform reversal [i, π⁻¹(i)] if πi ≠ i. Once step n − 1 is completed element n must be in position n, so this sorts any n-element permutation in at most n − 1 reversals. While it is conjectured that there are permutations for every n that require n − 1 reversals [8], which if true means this algorithm is worst-case optimal, for specific instances the algorithm can perform arbitrarily poorly. Consider for example the permutation (n 1 2 ··· n−1). Bringing 1 into place, then 2, and so on, uses n − 1 reversals, yet the permutation can be sorted in two steps: reverse [1, n], then [1, n−1]. Thus this natural algorithm can produce a solution (n − 1)/2 times longer than the shortest, for arbitrarily large n.
Using the idea of a breakpoint, also introduced in Watterson et al. [20], we show there is a simple algorithm guaranteed to use no more than twice the minimum number of reversals. To the best of our knowledge, this is the first constant-factor approximation for sorting by reversals.
In order to describe the algorithm, we first define some terminology. A breakpoint of a permutation π is a pair of adjacent positions (i, i + 1) such that |π_{i+1} − π_i| ≠ 1. In other words, (i, i + 1) forms a breakpoint unless values π_i and π_{i+1} are consecutively increasing or decreasing. To handle the boundaries we let π_0 have the value 0, π_{n+1} have the value n + 1, and allow i to range from 0 to n in the definition. Thus (0, 1) is a breakpoint if π_1 ≠ 1, and (n, n + 1) is a breakpoint if π_n ≠ n. Notice that the identity permutation has no breakpoints, any other permutation has some breakpoint, and the number of breakpoints is at most n + 1. When |π_{i+1} − π_i| = 1, we say values π_{i+1} and π_i are adjacent, and write π_{i+1} ∼ π_i. A strip of π is an interval [i, j] such that (i − 1, i) and (j, j + 1) are breakpoints, and no breakpoint lies between them. In other words, a strip is a maximal run of increasing or decreasing elements.
A reversal ρ affects the breakpoints of π only at the endpoints of ρ. (In the interior, ρ only makes an increasing pair (π_i, π_{i+1}) decreasing, and vice versa.) Let us write Φ(π) for the number of breakpoints in π, and for a given reversal ρ, let ΔΦ(π) = Φ(π) − Φ(π · ρ). Since a reversal [i, j] changes the adjacency of only two points, namely (i − 1, i) and (j, j + 1), the only values ΔΦ(π) can have are between −2 and 2. Since a solution must decrease the number of breakpoints from Φ(π) to zero, a greedy strategy is to choose a reversal that achieves the greatest decrease, or maximizes ΔΦ(π). As any π ≠ ι has a reversal with ΔΦ(π) ≥ 0, we can always achieve a decrease of 2, 1, or 0.
Figure 1 specifies our greedy algorithm. With the rule of favoring reversals that leave decreasing strips, we are able to show it exceeds the minimum by at most a factor of 2. In the following, a strip [i, j] is decreasing iff π_i, π_{i+1}, ..., π_j is decreasing. We consider a strip of one element to be decreasing, except for π_0 and π_{n+1}, which are always increasing. Thus the identity permutation forms one increasing strip extending from 0 to n + 1.
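The breakpoint and strip machinery just defined is easy to compute; the sketch below (our own illustration, with values 1..n and the padding π_0 = 0, π_{n+1} = n + 1 as in the text) is reused by the later sketches.

    def breakpoints(pi):
        # Positions i (0 <= i <= n, in the padded indexing) such that (i, i+1) is a
        # breakpoint, i.e. the values at positions i and i+1 are not consecutive.
        n = len(pi)
        padded = (0,) + tuple(pi) + (n + 1,)
        return [i for i in range(n + 1) if abs(padded[i + 1] - padded[i]) != 1]

    def num_breakpoints(pi):
        # Phi(pi) in the text.
        return len(breakpoints(pi))

    def strips(pi):
        # Maximal runs between breakpoints, as (start, end) pairs in the padded indexing.
        n = len(pi)
        segments, start = [], 0
        for cut in breakpoints(pi):        # a breakpoint at i separates positions i and i+1
            segments.append((start, cut))
            start = cut + 1
        segments.append((start, n + 1))
        return segments

    def has_decreasing_strip(pi):
        # Singleton interior strips count as decreasing; strips touching the padding never do.
        padded = (0,) + tuple(pi) + (len(pi) + 1,)
        for (a, b) in strips(pi):
            if a == 0 or b == len(padded) - 1:
                continue
            if a == b or padded[a] > padded[b]:
                return True
        return False

    # Example: num_breakpoints((3, 1, 2)) == 3, and (3, 1, 2) has a decreasing (singleton) strip.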
algorithm GREEDY(π)
    i := 0
    while π contains a breakpoint do
        i := i + 1
        Let ρ_i be a reversal that removes the most breakpoints of π, resolving ties
        among those that remove one breakpoint in favor of reversals that leave a
        decreasing strip.
        π := π · ρ_i
    end
    return i, (ρ_1, ρ_2, ..., ρ_i)
end

Fig. 1. The greedy algorithm.
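A direct (unoptimized) Python rendering of GREEDY, reusing the helpers sketched above, is given below as our own illustration; it scans all O(n^2) reversals in each iteration rather than using the O(n)-time selection discussed at the end of this section, so it shows the rule of Fig. 1, not the claimed running time.

    def greedy_sort_by_reversals(pi):
        # Return a list of reversals (i, j), 1-indexed and inclusive, that sorts pi,
        # following Fig. 1: take a reversal removing the most breakpoints, and among
        # those removing one breakpoint prefer reversals that leave a decreasing strip.
        pi = tuple(pi)
        n = len(pi)
        chosen = []
        while num_breakpoints(pi) > 0:
            best_key, best_move, best_next = None, None, None
            for i in range(1, n + 1):
                for j in range(i + 1, n + 1):
                    nxt = pi[:i-1] + tuple(reversed(pi[i-1:j])) + pi[j:]
                    removed = num_breakpoints(pi) - num_breakpoints(nxt)
                    tie_break = 1 if (removed == 1 and has_decreasing_strip(nxt)) else 0
                    key = (removed, tie_break)
                    if best_key is None or key > best_key:
                        best_key, best_move, best_next = key, (i, j), nxt
            chosen.append(best_move)
            pi = best_next
        return chosen

    # Example: greedy_sort_by_reversals((3, 1, 2)) sorts the permutation in two reversals.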
Lemma 2. Every permutation with a decreasing strip has a reversal that removes a breakpoint.
Proof. Consider the decreasing strip of π whose last element, πi, is smallest. Element πi − 1 must be in an increasing strip (else πi is not smallest) that lies either to the left or to the right of the strip containing πi, as illustrated in Figure 2. In either case, the indicated reversal removes at least one breakpoint. □
Fig. 2. A permutation π with a decreasing strip has a reversal ρ that removes a breakpoint. Element πi is the smallest that lies in a decreasing strip. (In case (a) the increasing strip containing πi − 1 lies to the left of the strip containing πi; in case (b) it lies to the right.)
Lemma 3. Let π be a permutation with a decreasing strip. If every reversal that removes a breakpoint of π leaves a permutation with no decreasing strips, π has a reversal that removes two breakpoints.
Proof. Again consider the decreasing strip of π containing the smallest element πi. Case (b) of Figure 2 cannot occur, since ρ is a reversal that removes a breakpoint and leaves a decreasing strip. Thus the increasing strip containing πi − 1 must be to the left of the strip containing πi, as in case (a). Call the reversal of case (a), ρi.
Consider the decreasing strip of π whose first element, πj, is largest. Element πj + 1 must be in an increasing strip (else πj is not largest) that is to the right of the strip containing πj. Otherwise, a reversal analogous to case (b) removes a breakpoint and leaves a decreasing strip. Call the reversal for πj that is analogous to ρi, ρj. Notice that πj must lie in interval ρi and πj + 1 must lie outside, else ρi leaves a decreasing strip. Similarly πi must lie in ρj and πi − 1 outside, else ρj leaves a decreasing strip. The situation is as shown in Figure 3. Intervals ρi and ρj overlap.
Fig. 3. If every reversal that removes a breakpoint of π leaves a permutation with no decreasing strips, ρi and ρj must overlap. Elements πi and πj are the smallest and largest in decreasing strips.
We now argue that not only do ρi and ρj overlap, they must be the same interval. For suppose ρi − ρj is not empty. If it contains a decreasing strip, reversal ρj leaves a decreasing strip, and if it contains an increasing strip, reversal ρi leaves a decreasing strip. Similarly interval ρj − ρi must be empty, which implies interval ρi = ρj. Since reversal ρi removes the breakpoint on its left and reversal ρj removes the breakpoint on its right, and as these breakpoints are distinct, reversal ρ = ρi = ρj removes two breakpoints. □

Lemma 4. The greedy algorithm sorts a permutation π with a decreasing strip in at most Φ(π) − 1 reversals.

Proof. The proof is by Lemmas 2 and 3, and induction on Φ(π). Details are given in [10]. □

Theorem 5. The greedy algorithm sorts every permutation π in at most Φ(π) reversals.

Proof. If π has a decreasing strip, by Lemma 4, GREEDY sorts it within Φ(π) − 1 reversals. If π has no decreasing strip, any reversal chosen by GREEDY transforms π to a permutation π' with a decreasing strip such that Φ(π') ≤ Φ(π). By Lemma 4, GREEDY sorts π' in at most Φ(π') − 1 reversals, which sorts π in at most Φ(π) reversals. □
An immediate consequence of Theorem 5 is the following. An ε-approximation algorithm for an optimization problem is an algorithm that runs in polynomial time and delivers a solution whose value is within a factor ε of optimal.

Corollary 6. The greedy algorithm is a 2-approximation algorithm for sorting by reversals.

Proof. Write OPT(π) for the minimum number of reversals to sort a permutation π, and GREEDY(π) for the number taken by the greedy algorithm. Since a solution must remove all breakpoints and any reversal can remove at most two, OPT(π) ≥ (1/2) Φ(π) ≥ (1/2) GREEDY(π). □

Since Φ(π) ≤ n + 1, Theorem 5 implies that the greedy algorithm terminates in O(n) iterations. How much time does an iteration take? As there are in general (n choose 2) reversals to consider, a naive implementation could take O(n^2) time per iteration, or O(n^3) total time. By considering the form of reversals that remove one and two breakpoints, the greedy reversal for an iteration can be found in O(n) time, which yields an O(n^2) time algorithm. Details are given in [10]. We suspect that the algorithm also has an O(n log n) time implementation, though as we will see in Section 4, the approximation algorithm is far from being the dominant step in practice.
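The bounds of Theorem 5 and Corollary 6 are easy to exercise on random data; the following sketch (ours) compares the number of reversals GREEDY uses against Φ(π) and against the lower bound ⌈Φ(π)/2⌉ on the optimum, so the reported ratio never exceeds 2.

    import math
    import random

    def greedy_ratio_bound(n, trials=20, seed=0):
        # For random permutations of {1, ..., n}, return the largest observed value of
        # GREEDY(pi) / ceil(Phi(pi)/2), which upper-bounds GREEDY(pi) / OPT(pi).
        rng = random.Random(seed)
        worst = 0.0
        for _ in range(trials):
            values = list(range(1, n + 1))
            rng.shuffle(values)
            pi = tuple(values)
            phi = num_breakpoints(pi)
            used = len(greedy_sort_by_reversals(pi))
            assert used <= phi                      # Theorem 5
            if phi > 0:
                worst = max(worst, used / math.ceil(phi / 2))
        return worst

    # greedy_ratio_bound(20) stays at or below 2, as Corollary 6 guarantees.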
3 An exact algorithm
In the last section we obtained an algorithm that comes close to the optimum by applying a greedy strategy: of all reversals, select one that removes the most breakpoints. To obtain an algorithm that reaches the optimum we use a branch and bound strategy: consider all reversals, and eliminate those that cannot lead to an optimal solution.
Figure 4 shows the form of our branch and bound algorithm. We maintain three global variables: Bound, a dynamic upper bound on the solution value; Best, an array of reversals that sort the permutation in Bound steps; and Current, the series of reversals currently under consideration. At the start we initialize Bound and Best to values obtained from an upper bound algorithm. The algorithm we use is essentially GREEDY with a fixed-depth look-ahead, and is described in Section 3.3. 9
After obtaining an upper bound we explore a tree of subproblems depth-first. Each invocation of SEARCH corresponds to a node of the tree and is labelled with π, a permutation to be sorted, and Depth, the number of edges from the root to the node. Array Current is maintained as a stack by SEARCH, and holds the reversals on the path from the root to the current node. We chose a depth-first strategy for
9 We use x1, ..., xn := e1, ..., en as shorthand for the parallel assignments xi := ei. Function UPPERBOUND, like GREEDY, returns two values: an integer, followed by a list of reversals.
global Bound, Current[1..n], Best[1..n]

algorithm BRANCHANDBOUND(π)
    Bound, Best := UPPERBOUND(π)
    SEARCH(π, 0)
    return Bound, Best
end

algorithm SEARCH(π, Depth)
    if π is the identity permutation then
        if Depth < Bound then Bound, Best := Depth, Current
    else
        for each reversal ρ in order of decreasing ΔΦ(π) do
            if LOWERBOUND(π · ρ) + Depth + 1 < Bound then
                Current[Depth + 1] := ρ
                SEARCH(π · ρ, Depth + 1)
        end
end

Fig. 4. The branch and bound algorithm.
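A compact Python skeleton of the search of Fig. 4 is sketched below (our own rendering, not the authors' implementation); it takes the bounding routines as parameters, since UPPERBOUND (greedy with a fixed-depth look-ahead) and LOWERBOUND (the matching and linear programming bounds of Sections 3.1 and 3.2) are developed separately.

    def branch_and_bound(pi, upper_bound, lower_bound):
        # upper_bound(pi) returns (value, reversals); lower_bound(pi) returns an integer.
        # Reversals are (i, j), 1-indexed and inclusive, as in the greedy sketch above.
        n = len(pi)
        bound, best = upper_bound(pi)
        best = list(best)
        current = []

        def children(p):
            for i in range(1, n + 1):
                for j in range(i + 1, n + 1):
                    yield (i, j), p[:i-1] + tuple(reversed(p[i-1:j])) + p[j:]

        def search(p, depth):
            nonlocal bound, best
            if num_breakpoints(p) == 0:                      # p is the identity
                if depth < bound:
                    bound, best = depth, current[:depth]
                return
            # Order children by decreasing delta-Phi, i.e. increasing breakpoint count.
            for rho, child in sorted(children(p), key=lambda rc: num_breakpoints(rc[1])):
                if lower_bound(child) + depth + 1 < bound:   # otherwise prune this child
                    if depth < len(current):
                        current[depth] = rho
                    else:
                        current.append(rho)
                    search(child, depth + 1)

        search(tuple(pi), 0)
        return bound, best

    # For instance, the bounds already at hand give a correct (if slow) exact solver:
    #   d, revs = branch_and_bound(pi,
    #                 upper_bound=lambda p: (lambda r: (len(r), r))(greedy_sort_by_reversals(p)),
    #                 lower_bound=lambda p: (num_breakpoints(p) + 1) // 2)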
traversing the tree as this uses a polynomial amount of space even when the tree is of exponential size. Space, not time, is often the limiting resource.
Examining all reversals yields a very large tree: with (n choose 2) children per node and a height of n − 1, there are O(n^{2n}/2^n) nodes. The means we have to reduce the size of the tree is by ordering children and computing lower bounds. The algorithm of Figure 4 orders children by decreasing ΔΦ on the assumption that the optimal solution uses reversals of greatest ΔΦ. We now explain how the lower bound is computed.

3.1 A lower bound from matchings
As stated in the proof of Corollary 6, a simple lower bound on OPT(π) is ⌈Φ(π)/2⌉. While this is sufficient to prove an approximation factor of 2, it is extremely weak. It assumes every breakpoint of π can be eliminated by a reversal that removes two breakpoints, which can rarely be achieved.
To obtain a better bound we ask, for a given permutation, how many breakpoints can possibly be eliminated by reversals that remove two breakpoints? A pair of breakpoints p = (i, i + 1) and q = (j, j + 1) with values (π_i, π_{i+1}) = (x, y) and (π_j, π_{j+1}) = (x', y') can be eliminated in one reversal iff x ∼ x' and y ∼ y'. This holds whether p is to the left or to the right of q. The only requirement is that x and y occur in the same order as x' and y'. Notice that such a reversal also affects other pairs of breakpoints. A pair with values (x, y) and (y', x'), which cannot be eliminated immediately because adjacent values are not in the same order, can be eliminated in one step if preceded by a reversal that contains exactly one breakpoint of the pair. Such a reversal transforms the pair to the preceding case.
In general, determining when a collection of 2m breakpoints can be eliminated by a sequence of m reversals appears difficult. (In Section 5 we conjecture it is NP-complete.) To obtain a lower bound that can be efficiently computed, we ignore dynamic information about the order and interaction of reversals. The static information we retain is simply the adjacency of values between breakpoints, which can be represented by a graph.
Figure 5 shows the construction. Each breakpoint of π is mapped to a vertex of G(π). We place an edge between breakpoints p and q iff either of the above cases apply. Effectively, if two breakpoints may be eliminated by one reversal, possibly after a sequence of other reversals that eliminate two breakpoints, they share an edge. Note that the order of the two values at a breakpoint is not important for the construction.
Fig. 5. Breakpoints {x, y} and {x', y'} share an edge in G(π) iff x ∼ x' and y ∼ y'.
Since each edge models a reversal and performing the reversal removes both vertices, a series of reversals of π that each eliminate two breakpoints corresponds to a set of vertex-disjoint edges in G(π). A set of vertex-disjoint edges is a matching. The key property of G(π) is that the most reversals we can possibly perform on π that each remove a pair of breakpoints, without performing any intervening reversals that remove less than two breakpoints, is the size of a maximum cardinality matching of G(π).
Let m be the number of vertices in a maximum cardinality matching of G(π), or in other words twice the number of edges in the matching. How many reversals must be performed to remove the remaining Φ(π) − m breakpoints of π? The best we can do is to expend a reversal that removes one breakpoint to set up a reversal that removes two breakpoints. Notice that we cannot remove one breakpoint, then two, then two again, since a reversal that removes one breakpoint can affect only one additional breakpoint, so the third reversal must have been available from the start. This is impossible since the matching is of maximum cardinality. In short the best we can do is to remove three breakpoints in two reversals. This gives a lower
bound of

    m/2 + ⌈(2/3)(Φ(π) − m)⌉,                                    (1)

which has the extreme value ⌈(2/3)Φ(π)⌉.
We can construct G(π) from π in O(n) time. Certainly the O(n) breakpoints of π can be determined in O(n) time. Every breakpoint has a constant number of edges, since the only values adjacent to x are x + 1 and x − 1, and with the help of π⁻¹ we can determine all edges in O(n) time as well. Finally, a maximum cardinality matching of a graph with V vertices and E edges can be computed in O(E√V) time [11]. As we have said, V and E are O(n). Thus we can evaluate the lower bound in O(n^{3/2}) time.
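A sketch of this bound follows (ours; networkx is assumed to be available for the maximum cardinality matching, and the formula is equation (1) as reconstructed above, reusing the earlier helpers).

    import math
    import networkx as nx   # assumed available; any maximum cardinality matching routine works

    def breakpoint_graph(pi):
        # G(pi): one vertex per breakpoint; an edge joins two breakpoints whose value pairs
        # can be matched up into adjacent values, in either order (Fig. 5).
        n = len(pi)
        padded = (0,) + tuple(pi) + (n + 1,)
        values = [(padded[i], padded[i + 1]) for i in breakpoints(pi)]
        adjacent = lambda a, b: abs(a - b) == 1
        graph = nx.Graph()
        graph.add_nodes_from(range(len(values)))
        for p in range(len(values)):
            for q in range(p + 1, len(values)):
                (x, y), (u, v) = values[p], values[q]
                if (adjacent(x, u) and adjacent(y, v)) or (adjacent(x, v) and adjacent(y, u)):
                    graph.add_edge(p, q)
        return graph

    def matching_lower_bound(pi):
        # Equation (1): m/2 + ceil((2/3) * (Phi(pi) - m)), where m is the number of
        # matched vertices in a maximum cardinality matching of G(pi).
        phi = num_breakpoints(pi)
        matching = nx.max_weight_matching(breakpoint_graph(pi), maxcardinality=True)
        m = 2 * len(matching)
        return m // 2 + math.ceil(2 * (phi - m) / 3)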
3.2 A family of lower bounds
One can improve the bound further by considering 3-tuples of breakpoints, 4-tuples of breakpoints, and so on. Let us call a reversal that eliminates k breakpoints a k-move. Thus a 2-move is a reversal that eliminates 2 breakpoints, and a (−2)-move is a reversal that creates 2 breakpoints. In general for k ≥ 3 we define a k-move as follows. Over all permutations consider all series of reversals that eliminate k breakpoints. A k-move for k ≥ 3 is a shortest series that eliminates k breakpoints given that no 2-, 3-, up to (k − 1)-moves are available. The following two lemmas characterize the structure of a k-move. Proofs are given in [10].

Lemma 7. For k ≥ 3, a k-move is a series of k − 1 reversals that decomposes into either (i) a 1-move followed by a (k − 1)-move, or (ii) a 0-move followed by an i-move and a j-move where i + j = k. □
Lemma 8. For k ≥ 2, the sets of values at the breakpoints eliminated by a k-move have the form

    {x_1, x'_2}, {x_2, x'_3}, ..., {x_{k−1}, x'_k}, {x_k, x'_1},            (2)

where x_i ∼ x'_i for 1 ≤ i ≤ k. □
We now describe how to construct a graph H(π) from permutation π that allows us to efficiently identify sets of breakpoints of form (2) for small k. In the construction, breakpoints of π are mapped to vertices of H(π), and pairs of breakpoints that share an adjacency, such as {x, a} and {x', b} where x ∼ x', are mapped to edges as shown in Figure 6. Edges of H(π) are directed, but not in the standard sense. An edge touching v and w contributes to either the in- or out-degree of v and w. In a directed graph, an edge (v, w) that contributes to the out-degree of v necessarily contributes to the in-degree of w, and vice versa. However in what we call a bidirected graph there are two more possibilities: (v, w) may contribute to the in-degree of both v and w, or to the out-degree of both v and w.
Fig. 6. Construction of graph H(π). Values x and x' are adjacent, and induce an edge that contributes to the in- or out-degree of p and the in- or out-degree of q, depending on whether the value is on the left- or right-side of the breakpoint.
This gives rise to the four types of edges of Figure 6. We indicate the direction of an edge with double-ended arrows. When drawing an edge incident to v, we place an arrowhead pointing into v at the end touching v if the edge contributes to v's in-degree. Otherwise we direct the arrowhead out of v.
The utility of this construction is in the correspondence between cycles in the graph and k-moves on the permutation, as summarized in the following lemma. A k-cycle in a bidirected graph is a series of edges (v_1, v_2) (v_2, v_3) ··· (v_k, v_1) such that the v_i are distinct, and every v_i has in- and out-degree 1.
Lemma 9. The sets of breakpoints of π whose values have the form of k-moves are in one-to-one correspondence with the sets of vertices of H(π) that form k-cycles. □

We now have the tools to generalize the lower bound of Section 3.1. In outline, we construct a hypergraph G^(k)(π) whose vertices correspond to breakpoints, but whose edges are sets of up to k vertices that correspond to k'-moves for k' ≤ k. A series of moves on π maps to a matching of G^(k), where a matching of a hypergraph is a collection of vertex-disjoint edges. Choosing a k-move corresponds to performing a series of k − 1 reversals. We weight edges by the number of reversals they represent and seek as before a maximum weight matching. However computing a maximum weight matching is in general NP-complete for hypergraphs [6]. We express the matching problem as an integer programming problem and relax the integrality constraint to obtain a linear programming problem. This gives a somewhat weaker lower bound that is computable in polynomial time.
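The relaxation step can be sketched as follows (our illustration, assuming scipy is available; we take the weight of a hyperedge representing a k'-move to be the k' − 1 reversals it uses, per the weighting described above). The integral version of this program is the NP-complete hypergraph matching problem; dropping integrality gives a polynomial-time computation.

    import numpy as np
    from scipy.optimize import linprog   # assumed available

    def fractional_matching_value(num_vertices, hyperedges, weights):
        # Relaxed maximum weight matching of a hypergraph:
        #   maximize   sum_e w_e x_e
        #   subject to sum over edges e containing v of x_e <= 1 for every vertex v,
        #              0 <= x_e <= 1.
        # hyperedges[e] is an iterable of vertex indices, weights[e] its weight.
        num_edges = len(hyperedges)
        cost = [-w for w in weights]                  # linprog minimizes, so negate
        a_ub = np.zeros((num_vertices, num_edges))
        for e, edge in enumerate(hyperedges):
            for v in edge:
                a_ub[v, e] = 1.0
        result = linprog(cost, A_ub=a_ub, b_ub=np.ones(num_vertices),
                         bounds=[(0.0, 1.0)] * num_edges, method="highs")
        return -result.fun

    # Example: a triangle of 2-moves on three breakpoints, each edge of weight 1,
    # evaluates to 1.5, while any integral matching uses at most one of the three edges:
    #   fractional_matching_value(3, [(0, 1), (1, 2), (0, 2)], [1, 1, 1])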
This approach is summarized in the following theorem. The k-girth of a graph is the length of a shortest cycle of more than k edges. If the graph does not contain such a cycle, we define its k-girth to be n + 1, where n is the number of vertices.

Theorem 10. Let (V, E) be the graph G^(k)(π), let g be the k-girth of H(π), and let ℓ be the solution value of the linear program

    minimize    Φ(π) − Σ_{e ∈ E} … x_e
    subject to  0 ≤ …