Reconstruction of signed permutations from their distorted patterns

Elena Konstantinova
Sobolev Institute of Mathematics, 630090 Novosibirsk, Russia
Email: e [email protected]
Abstract: The problem of uniquely reconstructing an unknown signed permutation from its distorted patterns is considered. The set of signed permutations with the reversal metric is investigated. The reversal metric is defined as the minimal number of reversals (inversions of permutation intervals with a change of signs) needed to transform one permutation into another. It is proved that any unknown signed permutation on n ≥ 2 elements is uniquely reconstructible from three distinct signed permutations at reversal distance at most one from it. The proposed approach is based on the investigation of structural properties of a certain graph constructed for this problem. In particular, it is proved that the considered graph contains no triangles or pentagons but does contain quadrangles. We investigate the cases when two signed permutations are sufficient to determine an unknown signed permutation uniquely. It is also shown that in the case of at most two reversal errors the reconstruction of an unknown signed permutation needs, in general, many more erroneous patterns.
I. INTRODUCTION

The problem of efficient reconstruction of sequences was introduced and investigated in [1], [2], [3]. The sequences are considered as the vertex set $V$ of a graph $G = (V, E)$, where $(x, y) \in E$ if there exist single errors of the type under consideration which transform $x$ into $y$ and $y$ into $x$. The problem was solved for graphs corresponding to such errors as substitutions, transpositions, deletions and insertions of symbols, which are of interest in coding theory. One of the metric problems which arises here is reconstructing an unknown vertex $x$ from the minimal number of vertices of the metric ball $B_r(x)$ of radius $r$ centered at $x \in V$. It is reduced to finding the value

\[ N(G, r) = \max_{x, y \in V,\ x \ne y} |B_r(x) \cap B_r(y)|, \]

since $N(G, r) + 1$ is the minimal number of distinct vertices in the metric ball $B_r(x)$ of an unknown vertex $x$ which are sufficient to reconstruct $x$ under the condition that at most $r$ single errors have occurred.

The reconstruction of permutations $P \in S_n$ distorted by single reversal errors was considered in [4]. It was proved that for any $n \ge 3$ the minimal number of distinct permutations in a metric ball $B_1(P)$ which are sufficient to reconstruct $P$ equals 4. This problem is related to the problem of sorting permutations by reversals, which comes from computational and molecular biology [5], [6]. Permutations as well as signed permutations are used for gene representations.

In this paper the reconstruction of signed permutations distorted by reversal errors is considered [7]. For the set $S_n^{\pm}$ of all $2^n n!$ signed permutations on $n$ elements we define a one-to-one correspondence with the set $S_{2n}$ of permutations on $2n$ elements. We prove that any 3 distinct permutations from $S_{2n}$ allow one to determine whether there exists a permutation $P \in S_{2n}$ such that these 3 permutations are obtained from $P$ by at most one reversal error, and to determine such a permutation uniquely if it exists. We also investigate the cases when 2 distinct permutations obtained from $P \in S_{2n}$ by at most one reversal error allow one to reconstruct the unknown permutation $P$. We prove that a permutation $P \in S_{2n}$ is reconstructible from 2 permutations with probability $p_2 \sim \frac{1}{3}$ as $n \to \infty$ (under the condition that these permutations are uniformly distributed).

To get these results we investigate structural properties of the graph $G_{2n} = (S_{2n}, E)$ under the reversal metric. The vertices of the considered graph correspond to permutations of a set of $2n$ elements, and two vertices are connected by an edge if one of the corresponding permutations can be obtained from the other by applying a single even reversal. In particular, we prove that $G_{2n}$ contains no triangles, pentagons, or subgraphs isomorphic to $K_{2,3}$, and we describe all its subgraphs isomorphic to $K_{2,2}$. Moreover, we show that in the case of at most two reversal errors $n(n+1)$ distinct erroneous patterns are not sufficient in general to uniquely reconstruct an unknown signed permutation.

II. REVERSAL METRIC ON SIGNED PERMUTATIONS

Let $S_n$ be the set of all $n!$ permutations (one-to-one mappings) on the set $N_n = \{1, 2, \ldots, n\}$ for a fixed $n$, $n \ge 3$. We represent a permutation $P \in S_n$ as a vector $P = (\pi_1, \ldots, \pi_n)$, where $\pi_i = P(i)$, $i = 1, \ldots, n$. We consider the set $S_n^{\pm}$ of all $2^n n!$ signed permutations $P^{\sigma} = (\sigma_1 \pi_1, \ldots, \sigma_n \pi_n)$ on $n$ elements, where $P = (\pi_1, \ldots, \pi_n) \in S_n$ and $\sigma = (\sigma_1, \ldots, \sigma_n)$, $\sigma_i \in \{+, -\}$, $i = 1, \ldots, n$.
For example, $P^{\sigma} = (-2, +3, +4, -1, +6, -5) \in S_6^{\pm}$ is given by the pair of vectors $P = (2, 3, 4, 1, 6, 5)$ and $\sigma = (-, +, +, -, +, -)$. We consider the symmetric group $S_{2n}$ of permutations on $2n$ elements. For any fixed $i$, $i = 1, \ldots, n$, the elements of the set $\{2i-1, 2i\}$ are called vicinal numbers. Among the permutations of the symmetric group we single out the subset of permutations $P$ which transform any set of vicinal numbers into a set of vicinal numbers; from here on, $S_{2n}$ denotes this subset unless the symmetric group is named explicitly. This means that for any $P \in S_{2n}$ and $i$, $i = 1, \ldots, n$, there exists $h$, $h = 1, \ldots, n$, such that either $P(2i-1) = 2h-1$ and $P(2i) = 2h$, or $P(2i-1) = 2h$ and $P(2i) = 2h-1$, and these $h$ are different for different $i$.
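Now we define a one-to-one correspondence between the set $S_n^{\pm}$ and the subset $S_{2n}$. For any signed permutation $P^{\sigma} \in S_n^{\pm}$ we consider a permutation $P \in S_{2n}$ such that for $i = 1, \ldots, n$ we have

\[ P(2i-1) = 2P(i) - \frac{\sigma_i + 1}{2}, \tag{1} \]

\[ P(2i) = 2P(i) + \frac{\sigma_i - 1}{2}, \tag{2} \]

where $P(i) = \pi_i$ on the right-hand sides is the $i$-th entry of the unsigned permutation underlying $P^{\sigma}$, and the signs are read as $\sigma_i = \pm 1$. For any $\sigma_i = \pm 1$, $i = 1, \ldots, n$, the numbers $2P(i) - \frac{\sigma_i + 1}{2}$ and $2P(i) + \frac{\sigma_i - 1}{2}$ are vicinal numbers, and thus $P$ belongs to the subset $S_{2n}$. Moreover, this mapping is one-to-one since (1) and (2) define the pair of vectors $P$ and $\sigma$ in a unique way. We denote by $P \circ Q$ the product of the permutations $P, Q \in S_{2n}$, that is, the permutation $T = P \circ Q \in S_{2n}$ such that $T(i) = P(Q(i))$, $i = 1, \ldots, 2n$.

Lemma 1: $P \circ Q \in S_{2n}$ for any $P, Q \in S_{2n}$.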
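The correspondence (1), (2) can be sketched in a few lines of Python. This is only an illustration: the function names `signed_to_doubled` and `doubled_to_signed` are hypothetical, and a signed permutation is assumed to be given as a tuple of nonzero integers such as `(1, -3, -2)`.

```python
def signed_to_doubled(signed):
    """Map a signed permutation on n elements to its image on 2n elements.

    An entry +p becomes the vicinal pair (2p - 1, 2p) and an entry -p
    becomes (2p, 2p - 1), following formulas (1) and (2).
    """
    image = []
    for entry in signed:
        p = abs(entry)
        sigma = 1 if entry > 0 else -1
        image.append(2 * p - (sigma + 1) // 2)  # formula (1)
        image.append(2 * p + (sigma - 1) // 2)  # formula (2)
    return tuple(image)


def doubled_to_signed(image):
    """Inverse map: read every vicinal pair back as one signed entry."""
    signed = []
    for a, b in zip(image[0::2], image[1::2]):
        # A vicinal pair is {2h - 1, 2h}: consecutive values, larger one even.
        if abs(a - b) != 1 or max(a, b) % 2 != 0:
            raise ValueError(f"({a}, {b}) is not a pair of vicinal numbers")
        h = max(a, b) // 2
        signed.append(h if a < b else -h)
    return tuple(signed)


if __name__ == "__main__":
    print(signed_to_doubled((1, -3, -2, 4, -5)))
```

The round trip through both functions returns the original signed permutation, which reflects the statement that the mapping is one-to-one.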
It follows from Lemma 1 that the subset $S_{2n}$ defined above is a subgroup of the symmetric group on $2n$ elements, since it contains the identity permutation $I$ of length $2n$. We denote by $P^{-1}$ the inverse of $P \in S_{2n}$, so that $P \circ P^{-1} = P^{-1} \circ P = I$. For any $T, P, Q \in S_{2n}$ we also have

\[ (T \circ P) \circ Q = T \circ (P \circ Q), \tag{3} \]

since $S_{2n}$ forms a subgroup of the symmetric group.

An even reversal of an interval $[2i-1, 2j]$ is the permutation

\[ R_{2i-1,2j} = (1, 2, \ldots, 2i-2, 2j, 2j-1, \ldots, 2i, 2i-1, 2j+1, \ldots, 2n) \in S_{2n}, \]

where $1 \le i \le j \le n$. Note that $R_{2i-1,2j}$ belongs to the subgroup and $R_{2i-1,2j} \circ R_{2i-1,2j} = I$, hence $(R_{2i-1,2j})^{-1} = R_{2i-1,2j}$. The reversal distance $d(P, Q)$ between two permutations $P, Q \in S_{2n}$ is the minimal number $d$ of even reversals needed to transform $P$ into $Q$:

\[ P \circ R_{2i_1-1,2j_1} \circ \cdots \circ R_{2i_d-1,2j_d} = Q. \tag{4} \]

The reversal distance satisfies the axioms of a metric. In particular, the symmetry $d(P, Q) = d(Q, P)$ holds because $(R_{2i-1,2j})^{-1} = R_{2i-1,2j}$, so that (4) implies $P = Q \circ R_{2i_d-1,2j_d} \circ \cdots \circ R_{2i_1-1,2j_1}$, and the triangle inequality follows from (3) and (4). By the same formulas, for any $T, P, Q \in S_{2n}$ we have

\[ d(T \circ P, T \circ Q) = d(P, Q). \tag{5} \]

This means that $S_{2n}$ is the isometry group of the metric space $\{S_{2n}, d(P, Q)\}$. In particular, we have $d(P, Q) = d(I, P^{-1} \circ Q)$ (but $d(P, Q) = d(I, Q \circ P^{-1})$ is in general not true). Since there is a one-to-one correspondence between $S_n^{\pm}$ and $S_{2n}$, it follows from [8] that

\[ \max_{P, Q \in S_{2n}} d(P, Q) = n + 1. \tag{6} \]

Investigating the metric space $\{S_{2n}, d(P, Q)\}$, we identify the permutation $P = (\pi_1, \pi_2, \ldots, \pi_{2i-1}, \pi_{2i}, \ldots, \pi_{2n-1}, \pi_{2n}) \in S_{2n}$ with the following permutation $P$ on $\{0, 1, \ldots, 2n+1\}$:

\[ P = (\pi_0, \pi_1, \pi_2, \ldots, \pi_{2i-1}, \pi_{2i}, \ldots, \pi_{2n-1}, \pi_{2n}, \pi_{2n+1}), \tag{7} \]

where $\pi_0 = 0$, $\pi_{2n+1} = 2n+1$, and define

\[ \alpha_i = \alpha_i(P) = \pi_{2i+1} - \pi_{2i}, \quad i = 0, \ldots, n. \tag{8} \]

We say that the permutation (7) has a breakpoint between positions $2i$ and $2i+1$, $i \in [0, n]$, if $|\alpha_i(P)| \ge 2$. Denote by $b(P)$ the number of breakpoints of $P$. Note that $b(P) = 0$ if and only if $P = I$, and $b(P) \ge 2$ when $P \ne I$. If (7) has $b = b(P) \ge 2$ breakpoints between positions $2i_h$ and $2i_h + 1$, $0 \le i_h \le n$, $h = 1, \ldots, b$, then the interval $[0, 2n+1]$ is partitioned into $b - 1$ internal monotonicity intervals

\[ [2i_h + 1, 2i_{h+1}], \quad h = 1, \ldots, b-1, \]

each consisting of an even number of successive integers in decreasing or increasing order, and two external monotonicity intervals $[0, 2i_1]$ and $[2i_b + 1, 2n+1]$, each consisting of an odd number of successive integers in increasing order (an external interval consists of a single number when $i_1 = 0$ or $i_b = n$, respectively).

For any $1 \le i \le j \le n$, we will denote the internal even intervals $2i-1, 2i, \ldots, 2j-1, 2j$ and $2j, 2j-1, \ldots, 2i, 2i-1$ of a permutation $P \in S_{2n}$ by $\overrightarrow{i, j}$ and $\overleftarrow{i, j}$, respectively. The external increasing odd intervals will be denoted by $\overrightarrow{0, i_1}$ and $\overrightarrow{i_b + 1, n + 1}$, respectively. We will also use $\overrightarrow{i}$ and $\overleftarrow{i}$ instead of $\overrightarrow{i, j}$ and $\overleftarrow{i, j}$ when $i = j$, and use $\overrightarrow{0}$ and $\overrightarrow{n + 1}$ instead of $\overrightarrow{0, i_1}$ and $\overrightarrow{i_b + 1, n + 1}$ when $i_1 = 0$ and $i_b = n$, respectively.

For such a permutation (7) with $b = b(P) \ge 2$ breakpoints we define the vector (the up- and down-sequence of breakpoint monotonicity) $\beta(P) = (\beta_1, \ldots, \beta_b)$, where

\[ \beta_h = \begin{cases} + & \text{if } \pi_{2i_h} < \pi_{2i_h + 1}, \\ - & \text{if } \pi_{2i_h} > \pi_{2i_h + 1}, \end{cases} \quad h = 1, \ldots, b. \]

Note that $0 \le \pi_{2i_1} < \pi_{2i_1 + 1}$ and $\pi_{2i_b} < \pi_{2i_b + 1} \le 2n + 1$, and hence $\beta_1 = \beta_b = +$. We also define the vector (the up- and down-sequence of monotonicity of internal intervals) $\mu(P) = (\mu_1, \ldots, \mu_{b-1})$, where

\[ \mu_h = \begin{cases} + & \text{if } \pi_{2i_{h+1}} > \pi_{2i_h + 1}, \\ - & \text{if } \pi_{2i_{h+1}} < \pi_{2i_h + 1}, \end{cases} \quad h = 1, \ldots, b-1. \]
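As a minimal sketch of these definitions, the following Python code computes the breakpoint positions, $\beta(P)$ and $\mu(P)$ for a permutation given as a tuple of the values $\pi_1, \ldots, \pi_{2n}$. The helper names (`extended`, `breakpoint_slots`, `beta`, `mu`) are illustrative assumptions, not notation from the paper. The sample input is the image $(1, 2, 6, 5, 4, 3, 7, 8, 10, 9)$ of the signed permutation $(+1, -3, -2, +4, -5)$.

```python
def extended(perm):
    """Frame (pi_1, ..., pi_2n) with pi_0 = 0 and pi_{2n+1} = 2n + 1, as in (7)."""
    return (0, *perm, len(perm) + 1)


def breakpoint_slots(perm):
    """Indices i in 0..n with |alpha_i| = |pi_{2i+1} - pi_{2i}| >= 2, cf. (8)."""
    pi = extended(perm)
    n = len(perm) // 2
    return [i for i in range(n + 1) if abs(pi[2 * i + 1] - pi[2 * i]) >= 2]


def beta(perm):
    """Sign of the jump across each breakpoint."""
    pi = extended(perm)
    return tuple("+" if pi[2 * i] < pi[2 * i + 1] else "-"
                 for i in breakpoint_slots(perm))


def mu(perm):
    """Direction of each internal interval between consecutive breakpoints."""
    pi = extended(perm)
    slots = breakpoint_slots(perm)
    # The internal interval between breakpoints i_h and i_{h+1} occupies
    # positions 2*i_h + 1 .. 2*i_{h+1}; compare its last and first entries.
    return tuple("+" if pi[2 * nxt] > pi[2 * cur + 1] else "-"
                 for cur, nxt in zip(slots, slots[1:]))


if __name__ == "__main__":
    p = (1, 2, 6, 5, 4, 3, 7, 8, 10, 9)
    print(len(breakpoint_slots(p)), beta(p), mu(p))
```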
Vectors $\beta(P)$ and $\mu(P)$ are uniquely defined by the permutation $P$, and two permutations are distinct if at least one of these vectors is different. For example, for the signed permutation $P^{\sigma} = (+1, -3, -2, +4, -5)$ we have $P = (0, 1, 2, 6, 5, 4, 3, 7, 8, 10, 9, 11)$, for which the interval $[0, 11]$ is partitioned into 5 intervals $[0, 2]$, $[3, 6]$, $[7, 8]$, $[9, 10]$, $[11]$, with $b(P) = 4$, $\beta(P) = (+, +, +, +)$, $\mu(P) = (-, +, -)$. The permutation $P$ is also represented as $P = (\overrightarrow{0, 2}, \overleftarrow{3, 6}, \overrightarrow{7, 8}, \overleftarrow{9, 10}, \overrightarrow{11})$.

To estimate the change of the number of breakpoints as a result of applying an even reversal $R_{2i-1,2j}$ to $P \in S_{2n}$, we introduce one additional function of two integer variables $x$ and $y$:

\[ \delta(x, y) = \begin{cases} 1 & \text{if } |x - y| \ge 2, \\ 0 & \text{if } |x - y| \le 1. \end{cases} \]
Lemma 2: Let $P \in S_{2n}$ and $1 \le i \le j \le n$. Then

\[ b(P \circ R_{2i-1,2j}) - b(P) = \delta(\pi_{2i-2}, \pi_{2j}) + \delta(\pi_{2i-1}, \pi_{2j+1}) - \delta(\pi_{2i-2}, \pi_{2i-1}) - \delta(\pi_{2j}, \pi_{2j+1}). \]
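The identity of Lemma 2 can be checked numerically. The sketch below assumes that $P \circ R_{2i-1,2j}$ is obtained by writing positions $2i-1, \ldots, 2j$ of $P$ in reverse order, which is what $T(m) = P(R(m))$ gives. The function names are hypothetical.

```python
def even_reversal(perm, i, j):
    """Return P o R_{2i-1,2j}: positions 2i-1..2j (1-based) of P reversed."""
    lo, hi = 2 * i - 2, 2 * j  # 0-based slice bounds of the interval
    return perm[:lo] + perm[lo:hi][::-1] + perm[hi:]


def extended(perm):
    """Frame the permutation with pi_0 = 0 and pi_{2n+1} = 2n + 1."""
    return (0, *perm, len(perm) + 1)


def breakpoints(perm):
    """Number b(P) of slots (2i, 2i+1) whose entries differ by at least 2."""
    pi = extended(perm)
    return sum(abs(pi[m + 1] - pi[m]) >= 2 for m in range(0, len(pi), 2))


def delta(x, y):
    return 1 if abs(x - y) >= 2 else 0


def lemma2_change(perm, i, j):
    """Right-hand side of Lemma 2 for the reversal R_{2i-1,2j}."""
    pi = extended(perm)
    return (delta(pi[2 * i - 2], pi[2 * j]) + delta(pi[2 * i - 1], pi[2 * j + 1])
            - delta(pi[2 * i - 2], pi[2 * i - 1]) - delta(pi[2 * j], pi[2 * j + 1]))


def lemma2_holds(perm):
    """Compare both sides of Lemma 2 for every even reversal of perm."""
    n = len(perm) // 2
    return all(
        breakpoints(even_reversal(perm, i, j)) - breakpoints(perm)
        == lemma2_change(perm, i, j)
        for i in range(1, n + 1) for j in range(i, n + 1)
    )


if __name__ == "__main__":
    p = (1, 2, 6, 5, 4, 3, 7, 8, 10, 9)
    print(breakpoints(p), lemma2_change(p, 2, 3), lemma2_holds(p))
```

Only the two slots at the ends of the reversed interval can change, because every inner slot is mapped onto another slot with the same absolute difference.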
Lemma 3: Let $P = (\pi_0, \pi_1, \ldots, \pi_{2n}, \pi_{2n+1}) \in S_{2n}$, $1 \le i \le j \le n$. Then $b(P \circ R_{2i-1,2j}) = b(P) + 1$ if and only if $\alpha_{i-1} = -\alpha_j$ and either $\pi_{2j} = \pi_{2i-1} - 2\alpha_{i-1}$ or $\pi_{2j} = \pi_{2i-1} + 2\alpha_{i-1}$; otherwise $b(P \circ R_{2i-1,2j}) = b(P) + 2$.

Lemma 4: If $|\alpha_{i-1}| = |\alpha_j| = 1$, where $1 \le i \le j \le n$, then $b(R_{2k-1,2l} \circ R_{2i-1,2j}) = 4$.
III. GRAPH REPRESENTATION OF THE METRIC SPACE

For the metric space $\{S_{2n}, d(P, Q)\}$ of permutations with the reversal distance we define the graph $G_{2n} = (V, E)$, where $V = S_{2n}$ and $(P, Q) \in E$ if and only if there exists an even reversal $R_{2i-1,2j}$ such that $Q = P \circ R_{2i-1,2j}$ (and hence $P = Q \circ R_{2i-1,2j}$). The graph $G_{2n}$ does not contain loops or parallel edges, since $P \circ R_{2i-1,2j} \ne P$, and $Q = P \circ R_{2i-1,2j}$ implies $R_{2i-1,2j} = P^{-1} \circ Q$. The path distance between vertices $P$ and $Q$ of $G_{2n}$ coincides with the reversal distance $d(P, Q)$, and the diameter of $G_{2n}$ equals $n + 1$ by (6). Consider the partition of the vertex set $V = S_{2n}$ into $n + 2$ subsets

\[ V_i = \{P : P \in S_{2n},\ d(P, I) = i\}, \quad i = 0, \ldots, n+1. \tag{9} \]

The partition (9) allows us to consider the graph $G_{2n}$ as a ranked partially ordered set (poset) where $P \in V_i$ precedes $Q \in V_j$ if and only if $i \le j$ and there exists a path in $G_{2n}$ of length $j - i$ connecting $P$ with $Q$ (edges between vertices of the same $V_i$ are not essential for the definition of this poset). By property (5) we obtain an isomorphic poset if an arbitrary permutation $T \in S_{2n}$ is used in definition (9) instead of $I$. It also follows that the size of the metric ball

\[ B_r(P) = \{Q : Q \in S_{2n},\ d(P, Q) \le r\} \tag{10} \]

of radius $r$ with center $P$ does not depend on $P$ and equals $\sum_{i=0}^{r} |V_i|$. To investigate this structure, for any $P \in V_i$ we define

\[ V_{i+1}(P) = \{Q : Q \in V_{i+1},\ d(P, Q) = 1\}, \quad 0 \le i \le n, \]

\[ V_{i-1}(P) = \{Q : Q \in V_{i-1},\ d(P, Q) = 1\}, \quad 1 \le i \le n+1. \]

A ranked poset is called regular if for every $i$ the values $|V_{i+1}(P)|$ and $|V_{i-1}(P)|$ do not depend on $P \in V_i$. For the problems of reconstruction of sequences in the case of errors which are substitutions or transpositions of symbols, the corresponding posets are regular [2]. The difficulties of solving similar problems for reversal errors are explained by the fact that the poset considered here is not regular. Nevertheless, the graph $G_{2n}$ is regular and we have $V_1 = V_1(I) = \{R_{2i-1,2j},\ 1 \le i \le j \le n\}$, $|V_1| = \frac{(n-1)n}{2} + n = \frac{(n+1)n}{2}$.

Now we fix $k$ and $l$, $1 \le k \le l \le n$, and investigate $V_2(R_{2k-1,2l})$. Note that any $R_{2k-1,2l} \in V_1$ and, by Lemma 2, for any $Q \in V_2(R_{2k-1,2l})$, $b(Q)$ can equal two (we shall see that this case is not possible), three, or four. First we show that when the conditions of Lemma 3 are satisfied for $P = R_{2k-1,2l}$, $1 \le k \le l \le n$, then $b(R_{2k-1,2l} \circ R_{2i-1,2j}) = 3$ and hence $b(R_{2k-1,2l} \circ R_{2i-1,2j}) \ne 4$.

Lemma 5: The set $\{Q : Q \in V_2(R_{2k-1,2l}),\ b(Q) = 3\}$ consists of disjoint sets $U_h(k, l)$, $h = 1, \ldots, 6$, of permutations $Q = R_{2k-1,2l} \circ R_{2i-1,2j}$ defined as follows:

1. $U_1(k, l)$ with $i = k$, $k \le j \le l - 1$; $U_1(k, l)$ consists of
$Q = (\overrightarrow{0, k-1}, \overrightarrow{l+k-j, l}, \overleftarrow{k, l+k-j-1}, \overrightarrow{l+1, n+1})$,
for which $\mu(Q) = (+, -)$;

2. $U_2(k, l)$ with $i = k$, $l + 1 \le j \le n$; $U_2(k, l)$ consists of
$Q = (\overrightarrow{0, k-1}, \overleftarrow{l+1, j}, \overrightarrow{k, l}, \overrightarrow{j+1, n+1})$,
for which $\mu(Q) = (-, +)$;

3. $U_3(k, l)$ with $1 \le i \le k - 1$, $j = l$; $U_3(k, l)$ consists of
$Q = (\overrightarrow{0, i-1}, \overrightarrow{k, l}, \overleftarrow{i, k-1}, \overrightarrow{l+1, n+1})$,
for which $\mu(Q) = (+, -)$;

4. $U_4(k, l)$ with $k + 1 \le i \le l$, $j = l$; $U_4(k, l)$ consists of
$Q = (\overrightarrow{0, k-1}, \overleftarrow{l+k-i+1, l}, \overrightarrow{k, l+k-i}, \overrightarrow{l+1, n+1})$,
for which $\mu(Q) = (-, +)$;

5. $U_5(k, l)$ with $i = l + 1$, $l + 1 \le j \le n$; $U_5(k, l)$ consists of
$Q = (\overrightarrow{0, k-1}, \overleftarrow{k, l}, \overleftarrow{l+1, j}, \overrightarrow{j+1, n+1})$,
for which $\mu(Q) = (-, -)$;

6. $U_6(k, l)$ with $1 \le i \le k - 1$, $j = k - 1$; $U_6(k, l)$ consists of
$Q = (\overrightarrow{0, i-1}, \overleftarrow{i, k-1}, \overleftarrow{k, l}, \overrightarrow{l+1, n+1})$,
for which $\mu(Q) = (-, -)$.
Corollary 1: Any permutation $Q = R_{2k-1,2l} \circ R_{2i-1,2j}$, where $R_{2k-1,2l} \ne R_{2i-1,2j}$, has 3 or 4 breakpoints.
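Corollary 1 is easy to check exhaustively for small $n$. The following sketch enumerates all products of two distinct even reversals and collects the breakpoint counts that occur. It assumes the position-reversal reading of $P \circ R_{2i-1,2j}$, and the function names are illustrative only.

```python
from itertools import product


def even_reversal(perm, i, j):
    """Return P o R_{2i-1,2j}: positions 2i-1..2j (1-based) of P reversed."""
    lo, hi = 2 * i - 2, 2 * j
    return perm[:lo] + perm[lo:hi][::-1] + perm[hi:]


def breakpoints(perm):
    """Number of slots (2i, 2i+1) of the framed permutation with a jump >= 2."""
    pi = (0, *perm, len(perm) + 1)
    return sum(abs(pi[m + 1] - pi[m]) >= 2 for m in range(0, len(pi), 2))


def breakpoint_counts_of_products(n):
    """Set of values b(R_{2k-1,2l} o R_{2i-1,2j}) over all distinct pairs."""
    identity = tuple(range(1, 2 * n + 1))
    intervals = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    counts = set()
    for (k, l), (i, j) in product(intervals, repeat=2):
        if (k, l) != (i, j):
            q = even_reversal(even_reversal(identity, k, l), i, j)
            counts.add(breakpoints(q))
    return counts


if __name__ == "__main__":
    for n in (2, 3):
        print(n, sorted(breakpoint_counts_of_products(n)))
```

For $n = 2$ only the value 3 occurs, while for $n = 3$ both 3 and 4 occur, for instance 4 for the product of the reversals of the overlapping intervals $[1, 4]$ and $[3, 6]$.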
Lemma 6: The set $\{Q : Q \in V_2(R_{2k-1,2l}),\ b(Q) = 4\}$ consists of disjoint sets $W_h(k, l)$, $h = 1, \ldots, 4$, of permutations $Q = R_{2k-1,2l} \circ R_{2i-1,2j}$ defined as follows:

1. $W_1(k, l)$ with $k + 1 \le l + 1 < i \le j$ or $i + 1 \le j + 1 < k \le l$; if $Q \in W_1(k, l)$, then $\beta(Q) = (+, +, +, +)$, $\mu(Q) = (-, +, -)$;

2. $W_2(k, l)$ with $k < i \le j < l$ or $i < k \le l < j$; if $Q \in W_2(k, l)$, then $\beta(Q) = (+, -, -, +)$, $\mu(Q) = (-, +, -)$;

3. $W_3(k, l)$ with $i \le k - 1 < j < l$; if $Q \in W_3(k, l)$, then $\beta(Q) = (+, -, +, +)$, $\mu(Q) = (+, -, -)$;

4. $W_4(k, l)$ with $k < i < l + 1 \le j$; if $Q \in W_4(k, l)$, then $\beta(Q) = (+, +, -, +)$, $\mu(Q) = (-, -, +)$.

Lemmas 5 and 6 determine all sets $V_2(R_{2k-1,2l})$. Our next goal is to find the number of edges connecting $V_1$ to a fixed vertex $Q \in V_2$, that is, to find $|V_1(Q)|$. We consider the lexicographic ordering on the permutations $R_{2k-1,2l}$, $1 \le k \le l \le n$, assuming that $R_{2k-1,2l} < R_{2k'-1,2l'}$ if $k < k'$, or $k = k'$ and $l < l'$. Any pair of edges $(R_{2k-1,2l}, Q)$ and $(R_{2k'-1,2l'}, Q)$ of the graph $G_{2n}$ implies the following expression for $Q$:

\[ R_{2k-1,2l} \circ R_{2i-1,2j} = R_{2k'-1,2l'} \circ R_{2i'-1,2j'}, \tag{11} \]

where $R_{2i-1,2j} = R_{2k-1,2l} \circ Q$ and $R_{2i'-1,2j'} = R_{2k'-1,2l'} \circ Q$ (see (3)). We shall say that (11) is a representation of $Q$ if $R_{2k-1,2l} < R_{2k'-1,2l'}$, and that (11) is a minimal representation of $Q$ if $R_{2k-1,2l}$ is the minimal permutation of $V_1(Q)$ in the lexicographic order. Note that if $Q$ has only one representation, then this representation is minimal and $|V_1(Q)| = 2$. In the general case, if $Q$ has $h$, $h = 0, 1, \ldots$, minimal representations, then $|V_1(Q)| = h + 1$.
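For small $n$ the sets $V_i$ and the values $|V_1(Q)|$ can be computed directly by breadth-first search from the identity. The sketch below is a brute-force illustration under the position-reversal reading of an even reversal. The function names are hypothetical, and the search is only practical for small $n$ because the vertex set has $2^n n!$ elements.

```python
from collections import Counter, deque


def neighbours(perm):
    """All permutations obtained from perm by one even reversal."""
    n = len(perm) // 2
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            lo, hi = 2 * i - 2, 2 * j
            yield perm[:lo] + perm[lo:hi][::-1] + perm[hi:]


def distances_from_identity(n):
    """Breadth-first search over the graph, returning d(I, P) for every P."""
    start = tuple(range(1, 2 * n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in neighbours(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def level_sizes(n):
    """Sizes |V_0|, |V_1|, ... of the partition (9)."""
    sizes = Counter(distances_from_identity(n).values())
    return [sizes[i] for i in range(max(sizes) + 1)]


def lower_degree_histogram(n):
    """Map each value of |V_1(Q)| to the number of Q in V_2 attaining it."""
    dist = distances_from_identity(n)
    hist = Counter()
    for q, d in dist.items():
        if d == 2:
            hist[sum(1 for p in neighbours(q) if dist[p] == 1)] += 1
    return dict(hist)


if __name__ == "__main__":
    print(level_sizes(2), lower_degree_histogram(2))
    print(level_sizes(3)[:3], lower_degree_histogram(3))
```

For $n = 3$ the search finds 16 vertices in $V_2$, of which 14 have two neighbours in $V_1$ and 2 have exactly one, so $|V_1(Q)|$ takes only the values 1 and 2.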
Lemma 7: Given a permutation $Q \in V_2$, let $k$, $1 \le k \le n$, be the minimal integer such that $R_{2k-1,2s} \in V_1(Q)$ for some $s$, $k \le s \le n$. Then

1. $|V_1(Q)| = 2$ if $Q$ has one of the following representations:
1.1. $R_{2k-1,2l} \circ R_{2k-1,2j} = R_{2(k+l-j)-1,2l} \circ R_{2k-1,2l}$, where $k \le j \le l - 1$;
1.2. $R_{2k-1,2l} \circ R_{2k-1,2j} = R_{2k-1,2j} \circ R_{2(k+j-l)-1,2j}$, where $l + 1 \le j \le n$;
1.3. $R_{2k-1,2l} \circ R_{2i-1,2j} = R_{2i-1,2j} \circ R_{2k-1,2l}$, where $k \le l < i \le j$ or $k < i \le j < l$;

2. $|V_1(Q)| = 1$ in the remaining cases.

We denote by $K_{m,h}$ a complete bipartite graph whose parts consist of $m$ and $h$ vertices, $1 \le m \le h$.

Theorem 1: The graph $G_{2n}$, $n \ge 2$, does not contain triangles, pentagons, or subgraphs isomorphic to $K_{2,3}$; each vertex belongs to $\frac{1}{12}(n-1)n(n+1)(n+4)$ subgraphs isomorphic to $K_{2,2}$.

Corollary 2: $|V_2| = \frac{(n-1)n(n+1)^2}{6}$.

IV. THE MINIMAL NUMBER OF DISTORTED PATTERNS SUFFICIENT FOR RECONSTRUCTION

In this section we present the minimal number of distorted patterns of an unknown permutation $P \in S_{2n}$ which are sufficient for its reconstruction. First of all, we prove the following

Theorem 2: For any $n \ge 2$ we have

\[ \max_{P, Q \in S_{2n},\ P \ne Q} |B_1(Q) \cap B_1(P)| = 2. \]

We say that a permutation $P \in S_{2n}$ is reconstructible from distinct permutations $S_1, \ldots, S_h \in B_1(P)$ if there does not exist a permutation $Q \in S_{2n}$, $Q \ne P$, such that $\{S_1, \ldots, S_h\} \subseteq B_1(Q)$. From this definition and Theorem 2 follows

Corollary 3: Any $P \in S_{2n}$ is reconstructible from any 3 distinct permutations in $B_1(P)$.

We denote by $t_i$ the number of sets of $i$ distinct permutations in $B_1(P)$ from which $P$ is reconstructible. We also denote by $N$ the number of permutations at reversal distance at most one from the permutation $P$. It follows from (10) that $N = |B_1(P)| = \frac{n(n+1)}{2} + 1$. Then $p_i = t_i / \binom{N}{i}$ is the probability of the event that a permutation $P$ is reconstructible from $i$ distinct permutations in $B_1(P)$, under the condition that these permutations are uniformly distributed. It is evident that $p_1 = 0$ and $p_3 = 1$, i.e., we can never reconstruct a permutation $P$ from a single permutation in $B_1(P)$, and we can always reconstruct an arbitrary permutation $P$ from 3 distinct permutations in $B_1(P)$.

Theorem 3: $p_2 \sim \frac{1}{3}$ as $n \to \infty$.

By using Theorems 2 and 3 it is easy to realize a reconstruction algorithm for the cases when 3 or 2 distinct signed permutations are sufficient to reconstruct an unknown signed permutation. A similar algorithm for unsigned permutations was presented in [4]. The reconstruction of a signed permutation in the case of at most two reversal errors requires in general many more of its distinct erroneous patterns.

Corollary 4: For any $n \ge 2$ we have

\[ \max_{P, Q \in S_{2n},\ P \ne Q} |B_2(Q) \cap B_2(P)| \ge n(n+1). \]
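Corollary 3 and the definition of $p_i$ suggest a straightforward brute-force procedure, sketched below: a pattern $S$ lies in $B_1(P)$ exactly when $P$ lies in $B_1(S)$, so the candidates for the unknown permutation are the common elements of the unit balls around the observed patterns. This is an illustrative sketch only and not the algorithm referred to in the text. The names `unit_ball`, `candidates` and `p2` are assumptions, and permutations are tuples of the values $\pi_1, \ldots, \pi_{2n}$.

```python
from fractions import Fraction
from itertools import combinations


def unit_ball(perm):
    """B_1(perm): perm together with all its images under one even reversal."""
    n = len(perm) // 2
    ball = {perm}
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            lo, hi = 2 * i - 2, 2 * j
            ball.add(perm[:lo] + perm[lo:hi][::-1] + perm[hi:])
    return ball


def candidates(patterns):
    """All P such that every observed pattern lies in B_1(P)."""
    patterns = list(patterns)
    common = unit_ball(patterns[0])
    for s in patterns[1:]:
        common &= unit_ball(s)
    return common


def p2(perm):
    """Fraction of 2-subsets of B_1(perm) from which perm is reconstructible."""
    pairs = list(combinations(sorted(unit_ball(perm)), 2))
    good = sum(1 for pair in pairs if candidates(pair) == {perm})
    return Fraction(good, len(pairs))


if __name__ == "__main__":
    unknown = (1, 2, 6, 5, 4, 3)  # image of the signed permutation (+1, -3, -2)
    observed = sorted(unit_ball(unknown))[:3]
    print(candidates(observed))
    print(p2(unknown))
```

For $n = 3$ every choice of three distinct patterns leaves a single candidate, in agreement with Corollary 3, while only one of the 21 pairs does, so the exact value of $p_2$ for such a small $n$ is still far from the limit $\frac{1}{3}$ of Theorem 3.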
V. ACKNOWLEDGMENT

The author cordially thanks Prof. Alphonse Baartmans and Prof. Vladimir Tonchev for the invitation to visit Michigan Technological University in summer 2004, where the present paper was written, mainly under the NSF grant entitled "U.S.-Russian Collaborative Research", Award No. CCR-0310632, and partially under the RFBR grant 03-01-00796.

REFERENCES

[1] V. I. Levenshtein, "Reconstructing objects from a minimal number of distorted patterns," Doklady Mathematics, vol. 55, pp. 417-420, 1997.
[2] V. I. Levenshtein, "Efficient reconstruction of sequences," IEEE Trans. Inform. Theory, vol. 47, pp. 2-22, 2001.
[3] V. I. Levenshtein, "Efficient reconstruction of sequences from their subsequences or supersequences," Journal of Combin. Theory, Ser. A, vol. 93, pp. 310-332, 2001.
[4] E. V. Konstantinova, "Reconstruction of permutations distorted by single reversal errors," Abstracts of talks of the 2004 IEEE International Symposium on Information Theory ISIT-2004, Chicago, June 27 - July 2, 2004, 451.
[5] P. A. Pevzner, Computational molecular biology: an algorithmic approach, The MIT Press, Cambridge, MA, 2000.
[6] D. Sankoff and N. El-Mabrouk, "Genome rearrangement," in: Current topics in computational molecular biology, Eds.: T. Jiang, T. Smith, Y. Xu and M. Q. Zhang, MIT Press, 2002.
[7] E. V. Konstantinova, "On reconstruction of signed permutations distorted by reversal errors," in preparation.
[8] D. E. Knuth, Sorting and Searching, Vol. 3 of The Art of Computer Programming, Addison-Wesley, Reading, Massachusetts, second edition, 1998.