
Supplementary Material: An Exact Algorithm for Sorting by Weighted Preserving Genome Rearrangements
Tom Hartmann, Matthias Bernt, and Martin Middendorf

TH and MM are associated with the Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, University of Leipzig, Augustusplatz 10, D-04109 Leipzig, Germany. E-mail: {thartmann,middendorf}@informatik.uni-leipzig.de

MB is associated with the Helmholtz Centre for Environmental Research - UFZ, Permoserstraße 15, D-04318 Leipzig, Germany. E-mail: [email protected].

This supplementary material is not part of the paper and is provided only for reviewing purposes. Section S1 contains supplementary proofs. A detailed description of the adjustment of GeRe-ILP ([3]) is given in Section S2. Figures that could not be included in the main text are given in Section S3.

S1 Supplementary Proofs

In the following, let Π ⊆ sP_n be a set of signed permutations and let λ, π ∈ sP_n be consistent with Π.

Corollary 1. Let S = (ρ_1, ..., ρ_l) be a sequence for λ that is preserving for Π, let λ_i := ρ_i ∘ ... ∘ ρ_1 ∘ λ for i ∈ [1 : l], and let λ_0 := λ. Then, for each linear node N in T^{λ_j}(π, Π), consecutive child nodes of N are also consecutive in T^{λ_{j+1}}(π, Π), where j ∈ [0 : l − 1].

Proof: Let N be a linear node in T^{λ_j}(π, Π), j ∈ [0 : l − 1], with child nodes N_1, ..., N_{deg(N)} in that order. By Theorem 1, N is a linear node in T^{λ_{j+1}}(π, Π). Hence, the order of the child nodes of N in T^{λ_{j+1}}(π, Π) is either N_1, ..., N_{deg(N)} or its reverse, i.e., N_{deg(N)}, ..., N_1. Consequently, for all i ∈ [1 : deg(N) − 1], the consecutive child nodes N_i, N_{i+1} (of N in T^{λ_j}(π, Π)) are also consecutive in T^{λ_{j+1}}(π, Π), either in the same order, i.e., N_i, N_{i+1}, or in the reversed order, i.e., N_{i+1}, N_i.
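The conclusion of Corollary 1 is a purely combinatorial condition on the child order of a linear node, so it can be checked mechanically. The Python sketch below is illustrative only: the helper names and the string labels such as "N1" are hypothetical, and child nodes are treated as opaque labels. It tests whether every pair of consecutive children before a step is still consecutive after it; reversing the children passes, whereas swapping the first two of four children fails.

```python
from typing import FrozenSet, Hashable, Sequence, Set


def adjacent_pairs(order: Sequence[Hashable]) -> Set[FrozenSet[Hashable]]:
    """Unordered pairs of child nodes that are adjacent in the given order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def keeps_consecutive(before: Sequence[Hashable],
                      after: Sequence[Hashable]) -> bool:
    """True if every pair adjacent in `before` is still adjacent in `after`."""
    return adjacent_pairs(before) <= adjacent_pairs(after)


if __name__ == "__main__":
    children = ["N1", "N2", "N3", "N4"]  # hypothetical labels for N_1..N_4
    print(keeps_consecutive(children, children[::-1]))            # True
    print(keeps_consecutive(children, ["N2", "N1", "N3", "N4"]))  # False
```

For a node with at least two children, the only orders that pass this check are the original order and its reverse, which matches the two possibilities used in the proof.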

Corollary 2. Let ρ be a rearrangement for λ of type I, T, iT, or minimal TDRL. Then ρ is preserving for Π if and only if one of the following cases holds: i) ρ = X is of type I, where X is a prime sibling with respect to π and Π, or ρ = (X, Y) is of type T, iT, or minimal TDRL, where X and Y are prime siblings with respect to π and Π; ii) ρ = X is of type I, where X is a linear node in T^λ(π, Π); iii) ρ = (X, Y) is of type T, where X and Y are the only child nodes of a linear node X ∪ Y in T^λ(π, Π); iv) ρ = (X, Y) is of type iT, where Y is the first or last child of a linear node X ∪ Y in T^λ(π, Π).
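Cases (ii)–(iv) can be phrased as a check on child indices. The following Python sketch is a hypothetical helper, not code from the paper. It assumes that the highest node affected by the rearrangement is a linear node N with children N_1, ..., N_d, and that X and Y are given as inclusive, 1-based index ranges (i, j) over these children. It covers only the linear-node cases (ii)–(iv), not the prime-sibling case (i).

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # hypothetical encoding: inclusive 1-based child range (i, j)


def preserving_on_linear_node(kind: str, d: int, x: Range,
                              y: Optional[Range] = None) -> bool:
    """Cases (ii)-(iv) of Corollary 2 for a linear node with d children."""
    if kind == "I":
        # (ii): the inversion must cover all children, i.e., X is N itself.
        return x == (1, d)
    if kind == "T":
        # (iii): X and Y are the only two children of N.
        return d == 2 and {x, y} == {(1, 1), (2, 2)}
    if kind == "iT":
        # (iv): Y is the first or the last child; X consists of all others.
        return (y == (1, 1) and x == (2, d)) or (y == (d, d) and x == (1, d - 1))
    raise ValueError(f"unknown rearrangement type: {kind}")


if __name__ == "__main__":
    print(preserving_on_linear_node("iT", 4, x=(1, 3), y=(4, 4)))  # True
    print(preserving_on_linear_node("T", 3, x=(1, 1), y=(2, 3)))   # False
```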

Proof: ⇒) Assume that ρ is a rearrangement for λ that is preserving for Π and of type I, T, iT, or minimal TDRL. Since ρ is preserving, T^λ(π, Π) and T^{ρ∘λ}(π, Π) have the same nodes. Therefore, ρ can only change the order of the child nodes of some nodes, change the sign of nodes, or add a sign to a node that is prime in T^λ(π, Π) and becomes linear in T^{ρ∘λ}(π, Π). If ρ is of type I and X contains a single element, then X is a leaf in T^λ(π, Π) and the corollary holds. Otherwise, ρ changes the order of at least two elements and therefore changes the order of at least two child nodes of some node. Let N be a highest node in T^λ(π, Π) whose child order is changed, i.e., the order of the child nodes of all ancestors of N is not changed. Let N_1, ..., N_{deg(N)} be the child nodes of N in T^λ(π, Π) in that order. From the possible types of rearrangements it can be seen that the order of the child nodes is not changed for any node N′ that is not within the subtree with root N of T^λ(π, Π). Moreover, if ρ is of type T, iT, or minimal TDRL, then for each child node N_i, i ∈ [1 : deg(N)], one of the following cases holds: i) N_i ⊂ X, ii) N_i ⊂ Y, iii) N_i ∩ X = ∅ and N_i ∩ Y = ∅. Similarly, if ρ is of type I, then for each child node N_i, i ∈ [1 : deg(N)], either N_i ⊂ X or N_i ∩ X = ∅. For I, T, and iT, X is an interval and is therefore of the form X = N_i ∪ ... ∪ N_j for 1 ≤ i ≤ j ≤ deg(N). Similarly, for T and iT, Y is an interval and is therefore of the form Y = N_k ∪ ... ∪ N_l for 1 ≤ k ≤ l ≤ deg(N), where either j < k or l < i holds. For a minimal TDRL, X ∪ Y is an interval and is therefore of the form X ∪ Y = N_i ∪ ... ∪ N_j for 1 ≤ i ≤ j ≤ deg(N).

Assume that N is a prime node. If ρ is of type I, it follows that X is a prime sibling. Similarly, if ρ is of type T or iT, it follows that X and Y are prime siblings, and if ρ is a minimal TDRL, then X ∪ Y is a prime sibling.

Now assume that N is a linear node. Then, by Corollary 1, consecutive child nodes of N in T^λ(π, Π) are also consecutive in T^{ρ∘λ}(π, Π). If ρ is an inversion, then for X = N_i ∪ ... ∪ N_j it follows that i = 1 and j = deg(N) must hold, because for i > 1 (respectively j < deg(N)) the consecutiveness of the child nodes N_{i−1} and N_i (respectively N_j and N_{j+1}) would be destroyed. If ρ = (X, Y) is a transposition, then N must have exactly two child nodes, and therefore either X = N_1 and Y = N_2 or X = N_2 and Y = N_1 holds. To see this, assume deg(N) ≥ 3. Consider X = N_i ∪ ... ∪ N_j and Y = N_k ∪ ... ∪ N_l for the case j < k (the case l < i can be proven analogously). Since ρ is a transposition, X and Y are adjacent, i.e., j = k − 1 must hold. Since deg(N) ≥ 3, at least one of the cases j ≥ 2 or k < deg(N) holds. In the first case, either N_{j−1} ⊄ X (which implies that ρ would destroy the consecutiveness of N_{j−1} and N_j) or N_{j−1} ⊂ X (which implies that ρ would destroy the consecutiveness of N_j and N_k). The second case can be shown analogously.


If ρ = (X, Y) is an inverse transposition, then for X = N_i ∪ ... ∪ N_j and Y = N_k ∪ ... ∪ N_l either k = l = deg(N) or 1 = k = l must hold, i.e., Y consists of either only N_1 or only N_{deg(N)}. To see this, first consider the case j < k. Since ρ is an inverse transposition, j = k − 1 must hold. Then k < l is not possible because ρ would destroy the consecutiveness of N_j and N_k. Next, k = l < deg(N) is not possible because ρ would destroy the consecutiveness of N_l and N_{l+1}. Finally, 1 < i is not possible because ρ would destroy the consecutiveness of N_{i−1} and N_i. Hence, i = 1, j = deg(N) − 1, and k = l = deg(N) must hold. Similarly, it can be shown for the case l < i that 1 = k = l, i = 2, and j = deg(N) must hold.

The case that ρ = (X, Y) is a minimal TDRL with X ∪ Y = N_i ∪ ... ∪ N_j for 1 ≤ i ≤ j ≤ deg(N) remains. Assume that there exist i′ < i″ < i‴ with N_{i′} ⊂ X, N_{i‴} ⊂ X, and N_{i″} ⊂ Y. Then ρ would destroy the consecutiveness of N_{i′} and N_{i″}. Similarly, there cannot exist i′ < i″ < i‴ with N_{i′} ⊂ Y, N_{i‴} ⊂ Y, and N_{i″} ⊂ X. Thus, there must exist an i′ with i ≤ i′ < j such that X = N_i ∪ ... ∪ N_{i′} and Y = N_{i′+1} ∪ ... ∪ N_j. Hence, ρ = (X, Y) is a transposition, and, as shown above for transpositions, X and Y are the only child nodes of a node X ∪ Y in T^λ(π, Π).

⇐) By Theorem 1, ρ is preserving if the following property (∗) holds: linear nodes in T^λ(π, Π) are linear in T^{ρ∘λ}(π, Π). Let ρ be a rearrangement for which one of the cases (i)–(iv) holds. It remains to show that property (∗) holds for ρ. If (i) holds, then ρ changes only the order of the child nodes of a prime node of T^λ(π, Π); hence, property (∗) holds. It is not hard to show that property (∗) also holds in each of the cases (ii)–(iv).

In the following, a sequence S of k rearrangements for λ and S ∘ λ is denoted by (ρ_1, ..., ρ_k), and the desired sign is denoted by s. Further, the number of rearrangements within a sequence S is denoted by |S|. A sequence S′ = (ρ′_1, ..., ρ′_k) for λ and S′ ∘ λ is called equivalent to a sequence S = (ρ_1, ..., ρ_k) for λ and S ∘ λ if S and S′ contain exactly the same number of rearrangements of each type and ρ_k ∘ ... ∘ ρ_1 ∘ λ = ρ′_k ∘ ... ∘ ρ′_1 ∘ λ.
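The notion of equivalent sequences can be made concrete on signed permutations. The following Python sketch rests on assumptions: it uses textbook definitions of inversion, transposition, and inverse transposition on 0-based slice boundaries, which may differ in detail from the conventions of the paper, and all function names are hypothetical. Two sequences are treated as equivalent if they contain the same number of rearrangements of each type and yield the same permutation. The demo also shows why a pair such as (I, I) on the same segment is never useful in a parsimonious scenario: it restores the original permutation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Perm = List[int]                    # signed permutation, e.g. [1, -3, 2]
Step = Tuple[str, Tuple]            # (rearrangement type, positional arguments)


def invert(seg: Sequence[int]) -> Perm:
    """Reverse a segment and flip the sign of each of its elements."""
    return [-v for v in reversed(seg)]


def inversion(p: Sequence[int], i: int, j: int) -> Perm:
    """I: invert the segment p[i:j]."""
    return list(p[:i]) + invert(p[i:j]) + list(p[j:])


def transposition(p: Sequence[int], i: int, j: int, k: int) -> Perm:
    """T: swap the adjacent segments X = p[i:j] and Y = p[j:k]."""
    return list(p[:i]) + list(p[j:k]) + list(p[i:j]) + list(p[k:])


def inverse_transposition(p: Sequence[int], i: int, j: int, k: int,
                          invert_x: bool = True) -> Perm:
    """iT: swap X = p[i:j] and Y = p[j:k] and invert one of the two."""
    x, y = list(p[i:j]), list(p[j:k])
    if invert_x:
        x = invert(x)
    else:
        y = invert(y)
    return list(p[:i]) + y + x + list(p[k:])


OPS = {"I": inversion, "T": transposition, "iT": inverse_transposition}


def apply_sequence(p: Sequence[int], seq: Sequence[Step]) -> Perm:
    """Apply rho_1, ..., rho_k in order, i.e., compute rho_k o ... o rho_1 o p."""
    result = list(p)
    for kind, args in seq:
        result = OPS[kind](result, *args)
    return result


def equivalent(lam: Sequence[int], s1: Sequence[Step],
               s2: Sequence[Step]) -> bool:
    """Same number of rearrangements of each type and the same result on lam."""
    same_types = Counter(k for k, _ in s1) == Counter(k for k, _ in s2)
    return same_types and apply_sequence(lam, s1) == apply_sequence(lam, s2)


if __name__ == "__main__":
    lam = [1, 2, 3, 4, 5]
    # (I, I) on the same segment restores lam.
    print(apply_sequence(lam, [("I", (1, 4)), ("I", (1, 4))]))  # [1, 2, 3, 4, 5]
    # Inversions on disjoint segments commute, so both orders are equivalent.
    print(equivalent(lam, [("I", (0, 2)), ("I", (3, 5))],
                     [("I", (3, 5)), ("I", (0, 2))]))  # True
```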


Proposition 3. Let Π ⊆ sP_n, let π, λ ∈ sP_n be consistent with Π, and let N be a linear node of T^λ(π, Π). Assume that the possible rearrangements are all rearrangements of types I, T, and iT that are preserving for Π, with given positive weights ω_I, ω_T, and ω_iT, respectively. Let s ∈ {+, −} be a given sign. Then the total weight of a parsimonious (preserving) scenario S that transforms λ into a permutation S ∘ λ such that all nodes within the subtree with root N in T^{S∘λ}(π, Π) are linear nodes with sign s is min(κ_0(N, s), κ_2(N, s)) if s = S(N), and min(κ_1(N, s), κ_3(N, s)) otherwise, where κ_i(N, s), i = 0, ..., 3, are defined within the proof.

Proof: Let N_1, ..., N_{deg(N)} be the child nodes of N. Since N is a linear node, it follows from Corollary 2, cases (ii)–(iv), that only a few specific rearrangements are possible; e.g., an iT can only be an siT or a piT. Proposition 2 shows that it can be assumed that all rearrangements that act on N form a subsequence in S. Two types of rearrangements occur in S: a) rearrangements that act on nodes within one of the subtrees whose root is a child node of N, and b) rearrangements that act on N. Since the relative order of rearrangements that act on different nodes does not matter, we assume that all rearrangements of (a) are done first. Hence, S = S_a S_b, where S_a is a sequence of rearrangements in (a) and S_b is a sequence of rearrangements in (b). Also, it can be assumed that after the application of S_a to λ each child node of N is linear and therefore has a sign. Before we consider which sequences S_b are possible, we assume without loss of generality that s = + holds. (Otherwise, + and − have to be exchanged in the formulas below.) We make the following observation: S_b cannot contain any of the subsequences (I, I), (T, T), (piT, siT), and (siT, piT), since in each case the second rearrangement undoes the effect of the first one; see Figures S1(a)–S1(c).

Case 1 - |S_b| = 0, i.e., S_b is empty: This case is possible only when N has sign s and each child node of N has sign s after the application of S_a = S, i.e., in T^{S∘λ}(π, Π). The total weight of a parsimonious sequence that orders all nodes within the subtree with root N to s, where no rearrangement acts on N, is

$$\kappa_0(N,+) = \sum_{i=1}^{\deg(N)} \kappa(N_i,+).$$

Case 2 - |S_b| = 1: As a direct consequence of Corollary 2, the only possible cases are those illustrated in Figure S3. It follows that the total weight of a parsimonious sequence S that orders all nodes within the subtree with root N to s, where exactly one rearrangement acts on N, is

$$
\kappa_1(N,+) = \min\left\{
\begin{array}{ll}
\omega_I + \sum_{i=1}^{\deg(N)} \kappa(N_i,-), & \\
\omega_T + \kappa(N_1,+) + \kappa(N_2,+), & \text{only if } \deg(N) = 2\\
\omega_{iT} + \kappa(N_1,+) + \sum_{i=2}^{\deg(N)} \kappa(N_i,-), & \\
\omega_{iT} + \sum_{i=1}^{\deg(N)-1} \kappa(N_i,-) + \kappa(N_{\deg(N)},+) &
\end{array}
\right\}.
$$
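The recurrences for κ_0 and κ_1 translate directly into code. The following Python sketch assumes that the child costs κ(N_i, +) and κ(N_i, −) have already been computed (e.g., recursively) and are passed as lists `kp` and `km` in child order; the function names and this list encoding are illustrative assumptions, not the paper's implementation. As in the proof, the target sign is s = +.

```python
from typing import Sequence


def kappa0(kp: Sequence[float]) -> float:
    """kappa_0(N, +): no rearrangement acts on N; every child is sorted to +."""
    return sum(kp)


def kappa1(kp: Sequence[float], km: Sequence[float],
           w_I: float, w_T: float, w_iT: float) -> float:
    """kappa_1(N, +): exactly one rearrangement (I, T, or iT) acts on N.

    kp[i] and km[i] hold kappa(N_{i+1}, +) and kappa(N_{i+1}, -) for the
    children N_1, ..., N_deg(N) of the linear node N (so deg(N) >= 2).
    """
    d = len(kp)
    options = [
        w_I + sum(km),                 # I on N: every child is first sorted to -
        w_iT + kp[0] + sum(km[1:]),    # iT: N_1 sorted to +, all others to -
        w_iT + sum(km[:-1]) + kp[-1],  # iT: N_deg(N) sorted to +, all others to -
    ]
    if d == 2:
        options.append(w_T + kp[0] + kp[1])  # T is only possible for two children
    return min(options)


if __name__ == "__main__":
    print(kappa0([1, 2, 0]))                                        # 3
    print(kappa1([1, 2, 0], [0, 1, 3], w_I=2, w_T=1, w_iT=1.5))     # 2.5
```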

Case 3 - |S_b| = 2: Sequences consisting of one I and one iT (i.e., S_b = (siT, I), S_b = (I, piT), S_b = (I, siT), and S_b = (piT, I)) cannot be part of a parsimonious S, by the following argument. Assume that such an S_b is part of a parsimonious S. Then the weight of S_b must not exceed the weight of a single I that acts on the one child node with sign − and changes its sign to + (see Figures S1(d) and S1(e)). This would imply ω_I + ω_iT ≤ ω_I, which is not possible since ω_iT > 0.

Considering this and the observation made above, the only remaining scenarios are S_b = (I, T), S_b = (T, I), S_b = (piT, piT), S_b = (siT, siT), S_b = (T, piT), S_b = (siT, T), S_b = (T, siT), and S_b = (piT, T). It follows that the total weight of a parsimonious sequence S that orders all nodes within the subtree with root N to s, where exactly two rearrangements act on N, is

$$
\kappa_2(N,+) = \min\left\{
\begin{array}{ll}
\omega_I + \omega_T + \kappa(N_1,-) + \kappa(N_2,-), & \text{only if } \deg(N) = 2\\
2\omega_{iT} + \kappa(N_1,-) + \sum_{i=2}^{\deg(N)-1} \kappa(N_i,+) + \kappa(N_{\deg(N)},-), & \\
\omega_T + \omega_{iT} + \kappa(N_1,-) + \kappa(N_2,+), & \text{only if } \deg(N) = 2\\
\omega_T + \omega_{iT} + \kappa(N_1,+) + \kappa(N_2,-) & \text{only if } \deg(N) = 2
\end{array}
\right\}.
$$
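The same list encoding as for κ_0 and κ_1 can be used for κ_2. In this Python sketch (illustrative names, not the paper's code), `kp[i]` and `km[i]` are assumed to hold the precomputed κ(N_{i+1}, +) and κ(N_{i+1}, −); the three terms that involve a T are only added when N has exactly two children, mirroring the "only if deg(N) = 2" conditions.

```python
from typing import Sequence


def kappa2(kp: Sequence[float], km: Sequence[float],
           w_I: float, w_T: float, w_iT: float) -> float:
    """kappa_2(N, +): exactly two rearrangements act on the linear node N.

    kp[i] and km[i] hold kappa(N_{i+1}, +) and kappa(N_{i+1}, -).
    """
    d = len(kp)
    # (piT, piT) or (siT, siT): first and last child -, inner children +.
    options = [2 * w_iT + km[0] + sum(kp[1:d - 1]) + km[-1]]
    if d == 2:  # every sequence that contains a T needs exactly two children
        options += [
            w_I + w_T + km[0] + km[1],   # (I, T) or (T, I)
            w_T + w_iT + km[0] + kp[1],  # T combined with an iT, first variant
            w_T + w_iT + kp[0] + km[1],  # T combined with an iT, second variant
        ]
    return min(options)


if __name__ == "__main__":
    print(kappa2([1, 0, 2], [0, 4, 1], w_I=5, w_T=5, w_iT=1))  # 3
    print(kappa2([3, 0], [0, 3], w_I=10, w_T=1, w_iT=1))       # 2
```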

Case 4 - |S_b| = 3: Consider first the case that S_b contains a T. Then Corollary 2 implies that N has only two child nodes. It is not hard to show that each combination of signs and order of the two child nodes of N can be sorted with one I and one T. Therefore, S_b cannot contain both an I and a T in addition to a third rearrangement. It can also be shown that each combination of signs and order of the two child nodes can be sorted with one siT and one T, or with one piT and one T, if deg(N_1) = 2 or deg(N_2) = 2. A sequence with one T and two iT can be replaced by a sequence that contains only one I if ω_I < ω_T + 2ω_iT. Therefore, a sequence with one T and two iT might be parsimonious when ω_I > ω_T + 2ω_iT, deg(N_1) > 2, and deg(N_2) > 2 hold (see Figure S5).

The remaining possible sequences that contain at least one T are (T, siT, T) and (T, piT, T). Note that (T, siT, T) (respectively (T, piT, T)) is equivalent to (T, T, piT) (respectively (T, T, siT)). Therefore, (T, siT, T) and (T, piT, T) cannot be parsimonious, since a parsimonious scenario cannot contain two consecutive T rearrangements. Clearly, S_b cannot end with a subsequence of two rearrangements that is not parsimonious. Therefore, S_b cannot end with any of the subsequences (siT, I), (I, piT), (I, siT), and (piT, I). The sequences (piT, piT, piT) and (siT, siT, siT) (illustrated in Figures S1(f) and S1(g), respectively) cannot be parsimonious, since they can be replaced by the single rearrangements (siT) and (piT), respectively, which have a smaller weight. Note that this holds since ω_piT = ω_siT = ω_iT.

The only remaining sequences are (I, piT, piT), (I, siT, siT), (piT, piT, I), and (siT, siT, I). It can easily be seen that the sequences (siT, siT) and (piT, piT) have the same effect. Consequently, the weight of a parsimonious sequence S that orders all nodes within the subtree with root N to +, where exactly three rearrangements act on N, is

$$\kappa_3(N,+) = \omega_T + 2\omega_{iT} + \kappa(N_1,-) + \kappa(N_2,-)$$

if deg(N) = 2, deg(N_1) > 2, deg(N_2) > 2, and ω_I > ω_T + 2ω_iT hold. Otherwise, κ_3(N, +) = ∞.
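κ_3 is the only one of the four quantities with an explicit feasibility condition, so a small function makes the conditions easy to check. The Python sketch below is illustrative: `km` is assumed to hold κ(N_1, −) and κ(N_2, −), and `child_degrees` is a hypothetical input holding deg(N_1) and deg(N_2) as known from the SIT.

```python
import math
from typing import Sequence


def kappa3(km: Sequence[float], child_degrees: Sequence[int],
           w_I: float, w_T: float, w_iT: float) -> float:
    """kappa_3(N, +): one T and two iT act on N; infinite when impossible."""
    feasible = (
        len(km) == 2                 # a T requires exactly two children
        and child_degrees[0] > 2     # otherwise one siT/piT plus one T suffices
        and child_degrees[1] > 2
        and w_I > w_T + 2 * w_iT     # otherwise a single I is at least as cheap
    )
    return w_T + 2 * w_iT + km[0] + km[1] if feasible else math.inf


if __name__ == "__main__":
    print(kappa3([1, 2], [3, 3], w_I=10, w_T=1, w_iT=1))  # 6
    print(kappa3([1, 2], [3, 3], w_I=3, w_T=1, w_iT=1))   # inf
```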

Case 5 - |S_b| ≥ 4: It is not hard to see that a parsimonious scenario with four or more preserving rearrangements must end with a parsimonious sequence of three rearrangements. Hence, it can only end with (piT, piT, T), (siT, siT, T), (T, piT, piT), (T, siT, siT), (siT, T, piT), or (piT, T, siT). Note that (piT, piT, T) (respectively (siT, siT, T) and (siT, T, piT)) is equivalent to (T, piT, piT) (respectively (T, siT, siT) and (piT, T, siT)). Now consider all scenarios with four rearrangements that end with (piT, piT, T), (siT, siT, T), or (siT, T, piT), i.e., (X, piT, piT, T), (X, siT, siT, T), and (X, siT, T, piT), where X ∈ {T, I, piT, siT}.

No sequence that contains three iT can be parsimonious, since siT and piT then occur consecutively, either directly or after replacing (siT, siT) (respectively (piT, piT)) with its equivalent sequence (piT, piT) (respectively (siT, siT)). It is easy to see that (T, piT, piT, T) is equivalent to (siT, siT, T, T), that (T, siT, siT, T) is equivalent to (piT, piT, T, T), and that (T, siT, T, piT) is equivalent to (piT, T, T, piT). Therefore, sequences S_b with four rearrangements that start with T cannot be parsimonious. Further, it is easy to verify that (I, piT, piT, T) is equivalent to (piT, piT, I, T) and that (I, siT, siT, T) and (I, siT, T, piT) are equivalent to (siT, siT, I, T). Since the final subsequences (piT, I, T) and (siT, I, T) are not parsimonious, the following holds: for all X ∈ {T, I, piT, siT}, a scenario that contains one of the subsequences (X, piT, piT, T), (X, siT, siT, T), and (X, siT, T, piT) cannot be parsimonious. Hence, no scenario with four rearrangements (and therefore also no scenario with more than four rearrangements) that act on N is parsimonious.


Fig. S1. Examples of sequences of rearrangements that are not parsimonious. In the shown scenarios, all rearrangements act on a node N. It is assumed that N is linear and its child nodes are linear or leaves (each node is represented by a box with the sign of the node) and that the sequence of rearrangements transforms them into an order where all nodes have sign +. (a) (I, I); (b) (T, T); (c) (piT, siT); (d) (siT, I); (e) (I, siT); (f) (piT, piT, piT); and (g) (siT, siT, siT).

Consequently,

$$\sum_{\rho \in S} \omega(\rho) = \min\{\kappa_0(N,s), \kappa_1(N,s), \kappa_2(N,s), \kappa_3(N,s)\}.$$

By Corollary 1 and its proof, the order of the child nodes of the linear node N can only be kept or reversed; since any rearrangement that acts on N changes this order, it reverses it and therefore switches the sign of N. Hence, after an even (respectively odd) number of rearrangements acting on N, the sign of N in T^{S∘λ}(π, Π) cannot be s if S(N) ≠ s (respectively S(N) = s). Thus, if i ∈ {1, 3} and S(N) = s, then κ_i(N, s) = ∞, and if i ∈ {0, 2} and S(N) ≠ s, then κ_i(N, s) = ∞. Therefore, Σ_{ρ∈S} ω(ρ) = min(κ_0(N, s), κ_2(N, s)) if s = S(N), and Σ_{ρ∈S} ω(ρ) = min(κ_1(N, s), κ_3(N, s)) otherwise. All subsequences of rearrangements that act on a linear node of a SIT and might occur within a parsimonious preserving scenario S are shown in Figures S3–S5.
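The final step of the proof, setting the κ_i of the wrong parity to ∞ and taking the minimum, can be written in a few lines. This Python sketch is illustrative: `kappas` is assumed to contain (κ_0(N, s), ..., κ_3(N, s)) computed as above, and `sign_matches` is a hypothetical flag that is true exactly when s = S(N).

```python
import math
from typing import Sequence


def proposition3_weight(kappas: Sequence[float], sign_matches: bool) -> float:
    """Combine (kappa_0(N, s), ..., kappa_3(N, s)) as in Proposition 3.

    Each rearrangement acting on N switches the sign of N, so kappa_i is only
    attainable if i is even and s == S(N), or i is odd and s != S(N).
    """
    feasible = [k if (i % 2 == 0) == sign_matches else math.inf
                for i, k in enumerate(kappas)]
    return min(feasible)


if __name__ == "__main__":
    ks = [4, 2.5, 3, math.inf]
    print(proposition3_weight(ks, sign_matches=True))   # 3
    print(proposition3_weight(ks, sign_matches=False))  # 2.5
```

Masking and then minimizing over all four values is equivalent to the closed form min(κ_0, κ_2) or min(κ_1, κ_3); the masked form simply follows the order of the argument in the proof.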


S2 Adjustment of GeRe-ILP [3]

In the following, the three adjustments made to GeRe-ILP with respect to its description in [3] are presented in detail.

Adjustment 1): To include TDRLs, we define for each l ∈ [1 : d] a binary variable TL_l that determines whether a TDRL takes place, i.e., TL_l = 1 if and only if ρ transforms π_l into π_{l+1} and ρ is a TDRL. Since only one rearrangement can transform π_l into π_{l+1}, Formula (3) of [3] is replaced by

$$\forall l:\; TL_l + I_l + T_l + U_l = 1.$$

Assume that a TDRL ρ = (X, Y) transforms π_l into π_{l+1}. Then ρ is encoded by binary variables FS_{el} such that FS_{el} = 0 if and only if element e of π_l satisfies e ∈ X. This is ensured by the following formulas:

$$\forall e,f,l:\; TL_l \wedge FS_{el} = FS_{fl} \Rightarrow O_{efl} = O_{ef(l+1)} \wedge O_{fel} = O_{fe(l+1)}$$

$$\forall e,f,l:\; TL_l \wedge FS_{el} \neq FS_{fl} \Rightarrow O_{efl} \neq O_{ef(l+1)} \wedge O_{fel} \neq O_{fe(l+1)}$$

The following formula guarantees that the sign of an element e cannot be changed by a TDRL:

$$\forall e,l:\; TL_l \Rightarrow S_{el} = S_{e(l+1)}$$

To guarantee that the constraints of the ILP formulation do not contradict each other, some constraints must be relaxed when TL_l = 1. This is achieved for all formulas (5)–(6) and (13)–(32) in [3] by replacing each formula of the form A ⇒ B with ¬TL_l ∧ A ⇒ B.

Adjustment 2): Let N be a prime node of a given SIT and let s be the desired sign. GeRe-ILP calculates κ(N, s) by assigning signs to the elements of π|N, which represent the signs of the child nodes of N, such that the signed π|N can be sorted to ±ι with a parsimonious scenario. To this end, GeRe-ILP uses the values κ(N_i, s) of the child nodes N_1, ..., N_{deg(N)} of N, which have been precomputed by recursive function calls. Therefore, Formula (2) in [3] was removed for all S′_e. The parsimonious weight, which includes the weight of the scenario acting on N plus the weight of realizing the corresponding signs of the child nodes, is calculated by the following objective function, which replaces Formula (33) in [3]:

$$\min \sum_{l=1}^{d} \bigl(TL_l\,\omega_{TL} + I_l\,\omega_I + T_l\,\omega_T + U_l\,\omega_U\bigr) + \sum_{i=1}^{|N|} \bigl((1 - S'_i)\,\kappa(N_i,+1) + S'_i\,\kappa(N_i,-1)\bigr)$$
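For a fixed assignment of the ILP variables, the adjusted objective can be evaluated directly, which is handy for sanity-checking a solver result. The Python sketch below is illustrative and not part of GeRe-ILP. `step_types` lists, for each step l, which indicator (TL, I, T, or U) equals 1 (Formula (3) of [3], as replaced in Adjustment 1, allows exactly one per step). `child_signs` holds the binary sign variables S′_i, and `kappa_plus`/`kappa_minus` hold the precomputed κ(N_i, +1) and κ(N_i, −1). The meaning of U_l and ω_U is as defined in [3]; the weight values in the demo are made up.

```python
from typing import Mapping, Sequence


def objective_value(step_types: Sequence[str], weights: Mapping[str, float],
                    child_signs: Sequence[int], kappa_plus: Sequence[float],
                    kappa_minus: Sequence[float]) -> float:
    """Evaluate the adjusted objective for one fixed variable assignment."""
    # First sum: weight of the indicator that is set in each step l = 1..d.
    rearrangement_cost = sum(weights[t] for t in step_types)
    # Second sum: S'_i = 0 selects kappa(N_i, +1), S'_i = 1 selects kappa(N_i, -1).
    sign_cost = sum((1 - s) * kp + s * km
                    for s, kp, km in zip(child_signs, kappa_plus, kappa_minus))
    return rearrangement_cost + sign_cost


if __name__ == "__main__":
    weights = {"TL": 3.0, "I": 1.0, "T": 2.0, "U": 0.0}  # illustrative values
    print(objective_value(["I", "T", "U"], weights, [0, 1],
                          [1.0, 4.0], [2.0, 0.5]))  # 4.5
```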

Adjustment 3): The Gurobi solver ([2]) provides the parameter “TimeLimit”, which makes it easy to impose a time limit L on the optimization model.


S3 Supplementary Figures


Fig. S2. Run time to compute a solution for each instance in sets A, B, and C (y-axis "Runtime", logarithmic scale from 1e−02 to 1e+04; x-axis "Set").

Fig. S3. Examples of rearrangements that act on a node N and transform N and its child nodes into an order where all nodes have sign +. It is assumed that N is linear and its child nodes are linear or leaves (each node is represented by a box with the sign of the node). (a) I; (b) T; (c) siT; (d) piT.


Fig. S4. Examples of sequences of two rearrangements of types I, T, or iT that act on a node N and transform N and its child nodes into an order where all nodes have sign +. It is assumed that N is linear and its child nodes are linear or leaves (each node is represented by a box with the sign of the node). (a) (I, T) and (T, I); (b) (T, piT), (piT, T), (siT, T), and (T, siT); (c) (piT, piT) and (siT, siT).

Fig. S5. Examples of sequences of three rearrangements (one of type T and two of type iT) that act on a node N and transform N and its child nodes into an order where all nodes have sign +. It is assumed that N is linear and its child nodes are linear or leaves (each node is represented by a box with the sign of the node).


Fig. S6. Comparison of CREx [1] and CREx2 for instances that have a linear SIT. Left: box plots of the scenario lengths (y-axis "Length of scenario") obtained by algorithms CREx and CREx2 (x-axis "Algorithm"). Right: relative frequency (y-axis) of the additional lengths (x-axis "Additional length", 0 to 3) of the scenarios obtained by CREx.


Fig. S7. Normalized alignment scores of all mitochondrial protein-coding genes for pairs of genomes and the corresponding numbers of rearrangements in the exact scenarios for set A. A panel title of the form a(x, y) denotes that Pearson's correlation test gives a correlation coefficient of x (see regression line) for gene a and that a t-test shows the correlation to be significantly different from 0 with a p-value smaller than y.


Fig. S8. Normalized alignment scores of all mitochondrial protein-coding genes for pairs of genomes and the corresponding numbers of rearrangements in the possibly inexact scenarios for set B. Notation as in Figure S7.


References

[1] Bernt, M., Merkle, D., Ramsch, K., Fritzsch, G., Perseke, M., Bernhard, D., Schlegel, M., Stadler, P.F., Middendorf, M.: CREx: Inferring genomic rearrangements based on common intervals. Bioinformatics 23, 2957–2958 (2007)
[2] Gurobi Optimization, Inc.: Gurobi optimizer reference manual (2015), http://www.gurobi.com
[3] Hartmann, T., Wieseke, N., Sharan, R., Middendorf, M., Bernt, M.: Genome Rearrangement with ILP. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2017), in press