Yang B, Chen J, Lu EY et al. Design and performance evaluation of sequence partition algorithms. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 23(5): 711–718 Sept. 2008

Design and Performance Evaluation of Sequence Partition Algorithms

Bing Yang1 (杨兵), Jing Chen2 (陈菁), En-Yue Lu3 (吕恩月), and Si-Qing Zheng2,4 (郑斯清)

1 Cisco Systems, 2200 East President George Bush Highway, Richardson, TX 75082, U.S.A.
2 Telecom. Engineering Program, University of Texas at Dallas, Richardson, TX 75083, U.S.A.
3 Department of Mathematics and Computer Science, Salisbury University, Salisbury, MD 21801, U.S.A.
4 Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, U.S.A.

E-mail: [email protected]; [email protected]; [email protected]; [email protected]

Revised August 5, 2008.

Abstract Tradeoffs between time complexities and solution optimalities are important when selecting algorithms for an NP-hard problem in different applications. Also, the distinction between theoretical upper bound and actual solution optimality for realistic instances of an NP-hard problem is a factor in selecting algorithms in practice. We consider the problem of partitioning a sequence of n distinct numbers into the minimum number of monotone (increasing or decreasing) subsequences. This problem is NP-hard and the number of monotone subsequences can reach $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ in the worst case. We introduce a new algorithm, the modified version of the Yehuda-Fogel algorithm, that computes a solution of no more than $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ monotone subsequences in $O(n^{1.5})$ time. Then we perform a comparative experimental study on three algorithms: a known approximation algorithm of approximation ratio 1.71 and time complexity $O(n^3)$, a known greedy algorithm of time complexity $O(n^{1.5} \log n)$, and our new modified Yehuda-Fogel algorithm. Our results show that the solutions computed by the greedy algorithm and the modified Yehuda-Fogel algorithm are close to those computed by the approximation algorithm, even though the theoretical worst-case error bounds of these two algorithms are not proved to be within a constant factor of the optimal solution. Our study indicates that for practical use the greedy algorithm and the modified Yehuda-Fogel algorithm can be good choices if the running time is a major concern.

Keywords monotone subsequence, permutation algorithm, NP-complete, approximation, complexity

1 Introduction

A subsequence of a sequence of distinct numbers is monotone if it is increasing or decreasing. Partitioning a sequence into monotone subsequences is a problem that has many applications. This problem has attracted attention for many years (e.g., [1–17]). As early as 1935, Erdős and Szekeres[1] proved that every sequence of n numbers has a monotone subsequence of size $\lceil \sqrt{n} \rceil$. In 1950, Dilworth proved his famous Dilworth Theorem, which says that for any sequence with elements in a partially ordered set, the size of a longest increasing (decreasing) subsequence is equal to the minimum number of decreasing (increasing) subsequences that the sequence can be partitioned into[12]. There were several extensions of Dilworth's results on the partition problems on partially ordered sets, including [13–17].

Though the problem of partitioning a sequence into the minimum number of increasing (decreasing) subsequences can be easily solved, the problem of partitioning a sequence into the minimum number of monotone subsequences is difficult. In 1984, Wagner[5] proved that this problem is NP-hard. In 1998, Yehuda and Fogel[2] gave an $O(n^{1.5})$-time algorithm to partition a sequence into $2\lfloor \sqrt{n} \rfloor$ monotone subsequences. In 2002, Fomin, Kratsch and Novelli[3] gave an $O(n^3)$-time approximation algorithm (which is named the Fomin-Kratsch-Novelli algorithm in this paper) of approximation ratio 1.71 for this problem. In [3], an $O(n^{1.5} \log n)$-time greedy algorithm is also presented. It is shown that the solutions computed by this algorithm have solution values no more than ln n times the optimal solution values.

A related topic is finding a tight upper bound for the minimum number of monotone subsequences that a sequence can be divided into. The results of Erdős et al.[1] and Dilworth[12] led to a bound of $2\lfloor \sqrt{n} \rfloor$. In 1986, Brandstädt and Kratsch[4] gave a smaller bound of $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ and proved it is existentially tight. A generalized result was stated by Erdős et al.[6] in 1991.

In this paper, we first present a new $O(n^{1.5})$-time algorithm based on the speedup techniques of [2]. This algorithm, which is named the modified Yehuda-Fogel algorithm, uses the upper bound $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ as a heuristic and guarantees a solution of no more than $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ monotone subsequences. Noticing that the Fomin-Kratsch-Novelli algorithm, the greedy algorithm and the modified Yehuda-Fogel algorithm have different time complexities, we compare their performances in terms of solution optimality by running them on randomly generated data sets. Our experiments show that their actual performances are quite close. Since tradeoffs between time complexities and solution optimalities, and the distinction between theoretical upper bound and actual solution optimality for realistic instances, are important when selecting algorithms for an NP-hard problem in different applications, our study provides evidence that for random inputs the partition algorithms of lower time complexities are desirable in time-critical applications.

2

[...] q.x > p.x and q.y > p.y. An element p in P is called a maximal element in P if there is no element in P dominating p. The set of all maximal elements in P is denoted as M(P). Yehuda and Fogel defined a layer structure on P as $\mathcal{L}(P) = (L_1(P), L_2(P), \ldots, L_l(P))$, where $L_i(P)$, called the i-th layer of P, is defined recursively as follows:

$$L_1(P) = M(P), \qquad L_i(P) = M\Big(P - \bigcup_{1 \le j < i} L_j(P)\Big).$$

Fig.1. Layer structure of π = (9, 5, 11, 4, 1, 3, 2, 10, 6, 7, 8).

Clearly, each layer $L_i(P)$ is a decreasing subsequence. Yehuda and Fogel showed that $|\mathcal{L}(P)|$, the number of layers, is the size of a longest increasing subsequence in P, and the layer structure of P can be computed in $O(n \log n)$ time[2]. We name such an algorithm the Layer Construction algorithm. It is shown in [2] that, given $\mathcal{L}(P)$ and an integer $k \le |\mathcal{L}(P)|$, an increasing subsequence of k elements from k consecutive layers of $\mathcal{L}(P)$ can be found in $O(|P|)$ time. We name such an algorithm the Increasing Subsequence algorithm. Let S be an increasing subsequence of P computed by the Increasing Subsequence algorithm. Yehuda and Fogel gave an algorithm that computes the layer structure $\mathcal{L}(P - S)$ of $P - S$ from $\mathcal{L}(P)$ in $O(n + |S|^2)$ time, where $|P| \le n$. We name this algorithm the Layer Update algorithm.
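The layer structure can be illustrated with a short Python sketch. It is not the Layer Construction algorithm of [2]; it is a minimal stand-in that assumes the layer of an element equals the length of a longest increasing subsequence starting at that element, which follows from the recursive definition of the layers. The function name `layer_structure` is hypothetical.

```python
from bisect import bisect_left


def layer_structure(perm):
    """Return the layers L1, L2, ... of a sequence of distinct numbers.

    Sketch only: scans right to left and uses binary search, so it runs
    in O(n log n) time. Each layer lists its values in sequence order.
    """
    tails = []                      # patience-sorting tails on negated values
    depth = [0] * len(perm)         # depth[i] = layer number of perm[i], minus one
    for i in range(len(perm) - 1, -1, -1):
        key = -perm[i]
        pos = bisect_left(tails, key)
        if pos == len(tails):
            tails.append(key)
        else:
            tails[pos] = key
        depth[i] = pos
    layers = [[] for _ in tails]
    for i, value in enumerate(perm):
        layers[depth[i]].append(value)
    return layers


fig1 = [9, 5, 11, 4, 1, 3, 2, 10, 6, 7, 8]
print(layer_structure(fig1))
```

For the permutation of Fig.1 the sketch yields five layers, each of them decreasing, in agreement with the statement that the number of layers equals the size of a longest increasing subsequence.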


Now, we present a monotone subsequence partition algorithm based on the Yehuda-Fogel algorithm[2] with improved solution optimality. We call our algorithm the modified Yehuda-Fogel algorithm.

Input: a permutation $\pi = (\pi(1), \pi(2), \ldots, \pi(n))$.
Output: a list of monotone subsequences $S = (S_1, S_2, \ldots, S_k)$ with $k \le \lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$.

1. Let S be an empty set for monotone subsequences.
2. Let $K := \lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$, $P := \pi$ and $S := \emptyset$.
3. Use the Layer Construction algorithm to compute $\mathcal{L}(P) = (L_1, L_2, \ldots, L_l)$, the layer structure of P.
4. For k = K down to 1 by −1 do
   (a) Let $l := |\mathcal{L}(P)|$.
   (b) If $l \le k$ then include $L_j$, $1 \le j \le l$, as $S_{K-k+j}$, and go to Step 5.
   (c) If $l > k$ then construct $\mathcal{L}(P') = (L_{l-k+1}, L_{l-k+2}, \ldots, L_l)$ from $\mathcal{L}(P) = (L_1, L_2, \ldots, L_l)$, with $P'$ being the set of elements in $(L_{l-k+1}, L_{l-k+2}, \ldots, L_l)$.
   (d) Use the Increasing Subsequence algorithm to find an increasing subsequence D of $P'$ and include D into S as $S_{K-k+1}$.
   (e) Use the Layer Update algorithm to get $\mathcal{L}(P' - D) = (L'_1, \ldots, L'_{k'})$.
   (f) Let $P := P - D$. If $P = \emptyset$ then go to Step 5.
   (g) For j = 1 to $k'$ do: rename $L'_j$ as $L_{l-k+j}$.
   (h) Let $\mathcal{L}(P) := (L_1, L_2, \ldots, L_{l-k+k'})$.
5. Return S.
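The steps above can be sketched in Python under simplifying assumptions. This is not the authors' implementation: the layers are recomputed from scratch in every round instead of using the Layer Update algorithm, so the sketch does not attain the $O(n^{1.5})$ bound, and the increasing subsequence is found by a plain scan through consecutive layers. The names `layer_indices` and `partition` are hypothetical, and the final fallback after the loop is an addition that is not among the listed steps.

```python
from bisect import bisect_left
from math import isqrt


def layer_indices(seq):
    """Layers of seq as lists of positions, L1 first (O(n log n) sketch)."""
    tails, depth = [], [0] * len(seq)
    for i in range(len(seq) - 1, -1, -1):
        pos = bisect_left(tails, -seq[i])
        if pos == len(tails):
            tails.append(-seq[i])
        else:
            tails[pos] = -seq[i]
        depth[i] = pos
    layers = [[] for _ in tails]
    for i in range(len(seq)):
        layers[depth[i]].append(i)
    return layers


def partition(perm):
    """Partition perm into monotone subsequences, following Steps 1 to 5."""
    n = len(perm)
    K = (isqrt(8 * n + 1) - 1) // 2        # equals floor(sqrt(2n + 1/4) - 1/2)
    rest, parts = list(perm), []
    for k in range(K, 0, -1):
        layers = layer_indices(rest)       # assumption: rebuild, no Layer Update
        l = len(layers)
        if l <= k:                         # Step 4(b): layers are decreasing
            parts += [[rest[i] for i in layer] for layer in layers]
            return parts
        # Steps 4(c)-(d): one element from each of L_l, L_{l-1}, ..., L_{l-k+1}
        chain = [layers[l - 1][0]]
        for d in range(l - 2, l - k - 1, -1):
            cur = chain[-1]
            chain.append(next(j for j in layers[d]
                              if j > cur and rest[j] > rest[cur]))
        parts.append([rest[i] for i in chain])
        picked = set(chain)
        rest = [v for i, v in enumerate(rest) if i not in picked]  # Step 4(f)
    # Assumption (not in the listed steps): emit any leftover layers
    parts += [[rest[i] for i in layer] for layer in layer_indices(rest)]
    return parts


print(partition([9, 5, 11, 4, 1, 3, 2, 10, 6, 7, 8]))
```

On the permutation of Fig.1 the sketch returns three monotone subsequences. The increasing subsequence it picks may differ from one chosen by hand, since several elements of a layer can extend the same chain.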

Let us go through the algorithm using the permutation in Fig.1.

1) At Step 1, S = {}.
2) At Step 2, calculate K = 4.
3) At Step 4, first round of the loop: k = 4.
   (a) As $|\mathcal{L}(P)| = 5 > 4$, $\mathcal{L}(P') = (L_2, L_3, L_4, L_5)$. Select D = (7, 6, 2, 1) from $\mathcal{L}(P')$, one element from each layer. Add $S_1 = D$ into S. Use the Layer Update algorithm to get $\mathcal{L}(P' - D) = (L'_1)$. Let $P := P - D$ and rename layer $L'_1$ to $L_2$. Then $\mathcal{L}(P) = (L_1, L_2)$, as shown in Fig.2.
4) Second round of the loop: k = 3.
   (a) As $|\mathcal{L}(P)| = 2 < 3$, let $S_2 = L_1 = (11, 10, 8)$, $S_3 = L_2 = (9, 5, 4, 3)$. Add $S_2$, $S_3$ into S. Exit loop.
5) $S = (S_1, S_2, S_3)$ is the resulting partition. Return S. Done.

The difference between the original Yehuda-Fogel algorithm and the modified Yehuda-Fogel algorithm lies in the selection of k in Step 4. In the original Yehuda-Fogel algorithm, k is the fixed value of $\lceil \sqrt{n} \rceil$. Instead of using a fixed number in Step 4, our modified algorithm uses a flexible control mechanism to achieve a smaller number of resulting monotone subsequences, which is at least about $\sqrt{2}$ times smaller than the solution computed by the original Yehuda-Fogel algorithm.

Fig.2. Layer structure of π = (9, 5, 11, 4, 3, 10, 8).

The correctness proof of the modified Yehuda-Fogel algorithm is similar to the correctness proof of the Yehuda-Fogel algorithm[2]. The time complexity of the modified Yehuda-Fogel algorithm is determined by the performance of Step 4. Each substep takes $O(n)$ time, with only Step 4(e) requiring explanation. It takes $O(n + k^2) = O(n)$ time, since $k \le \lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$. The for-loop of Step 4 runs for at most $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor = O(n^{0.5})$ iterations, resulting in a total time complexity of $O(n^{1.5})$ for Step 4. In summary, we have the following claim.

Theorem 2. Any permutation π in Π(n) can be partitioned into at most $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ monotone subsequences by the modified Yehuda-Fogel algorithm in $O(n^{1.5})$ time.

3 Average Performance

The problem of partitioning a sequence into the minimum number of monotone subsequences was shown to be NP-hard. The Fomin-Kratsch-Novelli algorithm (F-K-N for short), the greedy algorithm (Greedy for short), the Yehuda-Fogel algorithm (Y-F for short), and the modified Yehuda-Fogel algorithm (M.Y-F for short) are approximation/heuristic algorithms for this problem. Their time complexities and solution optimalities (in terms of the worst-case theoretical ratios of the obtained solution values to the optimal solution value P(π)) are summarized in the following table.

Algorithm   Time                    Provable Worst Ratio
F-K-N       $O(n^3)$                $1 + 1/\sqrt{2} \approx 1.71$
Greedy      $O(n^{1.5} \log n)$     $\ln(n)$
Y-F         $O(n^{1.5})$            $2\lfloor \sqrt{n} \rfloor / P(\pi)$
M.Y-F       $O(n^{1.5})$            $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor / P(\pi)$
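The numerators of the Y-F and M.Y-F rows can be compared numerically. The following Python sketch evaluates both worst-case subsequence counts for a given n; the function names are hypothetical, and the integer identity used for the M.Y-F bound, $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor = \lfloor (\sqrt{8n + 1} - 1)/2 \rfloor$, is plain algebra rather than something stated in the text.

```python
from math import isqrt


def yf_bound(n):
    """Worst-case count for Y-F: 2 * floor(sqrt(n))."""
    return 2 * isqrt(n)


def myf_bound(n):
    """Worst-case count for M.Y-F: floor(sqrt(2n + 1/4) - 1/2)."""
    return (isqrt(8 * n + 1) - 1) // 2


for n in (50, 100, 500, 1000, 2000, 3000, 4000):
    print(n, yf_bound(n), myf_bound(n))
```

Using exact integer square roots avoids floating-point rounding near perfect squares. The ratio of the two bounds approaches $\sqrt{2}$ as n grows.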

There is an issue of tradeoffs between algorithm time complexities and solution optimalities. For example, a natural question arises: what are the average performances of these algorithms? The answer to this question is important in determining time/optimality tradeoffs for practical uses. One way of answering this question is by theoretical analysis, which can be very difficult. Another way is by experiments, which is much easier and more meaningful for practical purposes. Evaluating the performances of approximation/heuristic algorithms by experiments can provide strong evidence on the effectiveness of these algorithms, and this approach has been widely used in practice.

Since the modified Yehuda-Fogel algorithm always outperforms the Yehuda-Fogel algorithm, we conducted extensive experiments only on the Fomin-Kratsch-Novelli algorithm, the greedy algorithm and the modified Yehuda-Fogel algorithm. We implemented these three algorithms and applied them to randomly generated permutations (i.e., sequences) with size n equaling 50, 100, 500, 1000, 2000, 3000, and 4000, respectively. The numbers m of random permutations used are summarized in the following table. Since the algorithms take much longer to execute as the permutation size increases, a smaller m is chosen for a larger n.

n    50      100     500    1000   2000   3000   4000
m    10000   10000   2000   500    200    100    50

The effectiveness of our new modified Yehuda-Fogel algorithm is assessed by comparing the average numbers of monotone subsequences computed by the three algorithms. Let $P_m(n) = \{\Pi_1(n), \Pi_2(n), \ldots, \Pi_m(n)\}$ be a set of m random permutations of size n, and $N_A(\Pi_i(n))$ the number of monotone subsequences produced by applying algorithm A to permutation $\Pi_i(n)$. Define

$$N_{A,P_m(n)} = \frac{\sum_{\Pi_i(n) \in P_m(n)} N_A(\Pi_i(n))}{m}.$$

That is, $N_{A,P_m(n)}$ is the average number of monotone subsequences obtained by applying algorithm A to the random permutations of $P_m(n)$. Figs. 3 and 4 show the average numbers of monotone subsequences (i.e., $N_{\text{F-K-N},P_m(n)}$, $N_{\text{Greedy},P_m(n)}$, and $N_{\text{M.Y-F},P_m(n)}$) computed by the three algorithms with random permutations (sequences) of various sizes. Fig.5 shows the performance differences $N_{A,P_m(n)} / N_{\text{F-K-N},P_m(n)}$ of the three algorithms, using the average subsequence numbers computed by the Fomin-Kratsch-Novelli algorithm as basis (i.e., 100%).

Fig.3. $N_{A,P_m(n)}$, mean number of monotone subsequences, with n = 50, 100, 500, and 1000.

Fig.4. $N_{A,P_m(n)}$, mean number of monotone subsequences, with n = 2000, 3000, and 4000.

Fig.5. Mean number of monotone subsequences compared with the mean number of monotone subsequences computed by the Fomin-Kratsch-Novelli algorithm.

Figs. 3∼5 show that the average solution values computed by the modified Yehuda-Fogel algorithm are only about 10% larger than the average solution values computed by the other two algorithms (except n = 50). This indicates that, on average, the solutions computed by the modified Yehuda-Fogel algorithm are within about 1.71 × 1.10 = 1.88 times the optimal solutions. As expected, the modified Yehuda-Fogel algorithm is the fastest among the three, but it produces less accurate results. If the running time is the major concern in some applications, the modified Yehuda-Fogel algorithm can be used.

Somewhat surprisingly, the average solution values computed by the greedy algorithm are about the same as (and for n = 50, even better than) the average solution values computed by the Fomin-Kratsch-Novelli algorithm, even though its $O(n^{1.5} \log n)$ time complexity is significantly lower than the $O(n^3)$ complexity of the Fomin-Kratsch-Novelli algorithm.

4 Closer Look at Greedy Algorithm

Our experiment shows that not only the average performance but also the monotone subsequence numbers on individual permutations computed by the greedy algorithm are very close to those computed by the Fomin-Kratsch-Novelli algorithm (which is a constant ratio approximation algorithm). Let

$$O_{A,k}(\Pi_i(n)) = \begin{cases} 1, & \text{if } N_A(\Pi_i(n)) = k; \\ 0, & \text{otherwise.} \end{cases}$$

Define

$$OR_{A,P_m(n)}(k) = \frac{\sum_{\Pi_i(n) \in P_m(n)} O_{A,k}(\Pi_i(n))}{m}.$$

$OR_{A,P_m(n)}(k)$ is the occurrence ratio of exactly k monotone subsequences obtained by applying algorithm A to the set $P_m(n)$ of m random permutations of size n. Subfigure (a) of each of Figs.6 to 12 compares $OR_{\text{F-K-N},P_m(n)}(k)$ and $OR_{\text{Greedy},P_m(n)}(k)$ for the same $P_m(n)$.

Let

$$OD_k(\Pi_i(n)) = \begin{cases} 1, & \text{if } N_{\text{F-K-N}}(\Pi_i(n)) - N_{\text{Greedy}}(\Pi_i(n)) = k; \\ 0, & \text{otherwise.} \end{cases}$$

Define

$$ODR_{P_m(n)}(k) = \frac{\sum_{\Pi_i(n) \in P_m(n)} OD_k(\Pi_i(n))}{m}.$$

$ODR_{P_m(n)}(k)$ is the occurrence difference ratio of a difference of exactly k between the numbers of monotone subsequences obtained by applying the Fomin-Kratsch-Novelli algorithm and the greedy algorithm to the set $P_m(n)$ of m random permutations of size n. Subfigure (b) of each of Figs.6 to 12 displays the distribution of $ODR_{P_m(n)}(k)$. For example, Fig.6(b) shows that the number of subsequences computed by the greedy algorithm is at most 1 more than the number of subsequences computed by the Fomin-Kratsch-Novelli algorithm for n = 50 in our experiments; furthermore, the number of subsequences computed by the greedy algorithm is 1 less than the number of subsequences computed by the Fomin-Kratsch-Novelli algorithm in about 10% of tested instances.

Fig.6. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 50)

By comparing the subsequence numbers and subsequence number differences, we can easily see that the Fomin-Kratsch-Novelli algorithm and the greedy algorithm compute roughly identical numbers of subsequences. Since the Fomin-Kratsch-Novelli algorithm has a constant approximation ratio, our experiments provide strong evidence that the greedy algorithm is likely to be an approximation algorithm with a constant approximation ratio. Since the greedy algorithm is much faster than the Fomin-Kratsch-Novelli algorithm, the greedy algorithm should be given preference in real-world usage.
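The three statistics used in the experiments (the mean subsequence number, the occurrence ratio OR and the occurrence difference ratio ODR) can be computed from per-permutation subsequence counts as in the following Python sketch. The function names are hypothetical and the sample counts are made up purely for illustration; they are not data from the experiments.

```python
from collections import Counter


def mean_count(counts):
    """Average number of monotone subsequences over the m permutations."""
    return sum(counts) / len(counts)


def occurrence_ratio(counts):
    """Map k to the fraction of permutations with exactly k subsequences."""
    m = len(counts)
    return {k: c / m for k, c in Counter(counts).items()}


def occurrence_difference_ratio(counts_fkn, counts_greedy):
    """Map k to the fraction of permutations with N_F-K-N - N_Greedy = k."""
    m = len(counts_fkn)
    diffs = Counter(a - b for a, b in zip(counts_fkn, counts_greedy))
    return {k: c / m for k, c in diffs.items()}


# Hypothetical counts for m = 4 permutations (illustration only)
fkn = [5, 6, 6, 7]
greedy = [5, 6, 7, 6]
print(mean_count(fkn), mean_count(greedy))
print(occurrence_ratio(fkn))
print(occurrence_difference_ratio(fkn, greedy))
```

Values of k that never occur are simply absent from the returned dictionaries, which corresponds to a ratio of zero.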

5 Concluding Remarks

In this paper we proposed a modified Yehuda-Fogel algorithm for partitioning any sequence of size n into at most $\lfloor \sqrt{2n + 1/4} - 1/2 \rfloor$ monotone subsequences. This algorithm guarantees a solution better than the one computed by the original Yehuda-Fogel algorithm without taking more time. Then we compared this algorithm with two other known algorithms (the Fomin-Kratsch-Novelli algorithm and the greedy algorithm) of higher time complexities by experiments. Our experiments show that on average the modified Yehuda-Fogel algorithm computes comparable solutions. The input patterns, if any, for which the greedy algorithm and the modified Yehuda-Fogel algorithm compute results that are significantly worse than the Fomin-Kratsch-Novelli algorithm may be very rare.

By experiments, we also compared the performances of the Fomin-Kratsch-Novelli algorithm and the greedy algorithm. Our experiments show that the solutions computed by these two algorithms are almost identical. Our study shows that for practical use both the modified Yehuda-Fogel algorithm and the greedy algorithm are good choices. For time-critical applications that tolerate a certain degree of inaccuracy, the modified Yehuda-Fogel algorithm performs better because of its lower time complexity.

In [3], Fomin et al. posed the question of whether or not the greedy algorithm is an approximation algorithm with a constant approximation ratio. Our experiments show that it is very likely that the answer to this question is affirmative. However, proving or disproving this conjecture remains an outstanding open problem.

Fig.7. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 100)

Fig.8. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 500)

Fig.9. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 1000)

Fig.10. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 2000)

Fig.11. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 3000)

Fig.12. (a) $OR_{A,P_m(n)}(k)$. (b) $ODR_{P_m(n)}(k)$. (n = 4000)

References

[1] Erdős P, Szekeres G. A combinatorial problem in geometry. Compositio Mathematica, 1935, 2: 463–470.
[2] Yehuda R B, Fogel S. Partitioning a sequence into few monotone subsequences. Acta Informatica, 1998, 35(5): 421–440.
[3] Fomin F V, Kratsch D, Novelli J. Approximating minimum cocolorings. Information Processing Letters, 2002, 84(5): 285–290.
[4] Brandstädt A, Kratsch D. On partitions of permutations into increasing and decreasing subsequences. Elektron. Inf. Verarb. Kybern., 1986, 22: 263–273.
[5] Wagner K. Monotonic coverings of finite sets. Elektron. Inf. Verarb. Kybern., Dec. 1984, 20(12): 633–639.
[6] Erdős P, Gimbel J, Kratsch D. Some extremal results in cochromatic and dichromatic theory. Journal of Graph Theory, 1991, 15: 579–585.
[7] Myers J S. The minimum number of monotone subsequences. Electronic Journal of Combinatorics, 2002, 9(2): R4.
[8] Tracy C A, Widom H. On the distributions of the lengths of the longest monotone subsequences in random words. Probab. Theory Relat. Fields, 2001, 119: 350–380.
[9] Siders R. Monotone subsequences in any dimension. Journal of Combinatorial Theory, Series A, 1999, 85(2): 243–253.
[10] Matoušek J, Welzl E. Good splitters for counting points in triangles. In Proc. The 5th Ann. ACM Conf. Computational Geometry, 1989, pp.124–130.
[11] Fredman M L. On computing the length of longest increasing subsequences. Discrete Mathematics, 1975, 11: 29–35.
[12] Dilworth R P. A decomposition theorem for partially ordered sets. Annals of Mathematics, 1950, 51(1): 161–166.
[13] Frank A. On chain and antichain families of a partially ordered set. Journal of Combinatorial Theory, Series B, 1980, 29: 176–184.
[14] Hoffman A J, Schwartz D E. On partitions of a partially ordered set. Journal of Combinatorial Theory, Series B, 1977, 23: 3–13.
[15] Greene C, Kleitman D J. The structure of Sperner k-families. Journal of Combinatorial Theory, Series A, 1976, 20: 41–68.
[16] Greene C. Some partitions associated with a partially ordered set. Journal of Combinatorial Theory, Series A, 1976, 20: 69–79.
[17] Greene C, Kleitman D J. Strong versions of Sperner's theorem. Journal of Combinatorial Theory, Series A, 1976, 20: 80–88.

Bing Yang received the B.S. degree in computer science from Nankai University, China, in 1993, the M.S. degree in mathematics from Southern Illinois University in 1996, the M.S. degree in computer science from Texas A&M University in 1998, and the Ph.D. degree from the University of Texas at Dallas in 2005. From 1998 to 2006, he worked in the telecommunication industry at Ericsson Inc. and Cisco Systems. Currently, he works in the financial industry and is a software programmer at InteractiveBrokers.com. His primary research interest is in networking algorithms.

Jing Chen received the B.E. degree in computer engineering from the Southwest Jiaotong University, Chengdu, China, in 1996, and the M.S. degree in telecommunications engineering from the University of Texas at Dallas in 2006, where she is currently a Ph.D. candidate. Her research interests are in the areas of network architecture, scheduling policies, algorithms, performance analysis and quality-of-service issues in high-speed networks. She received the Best Paper Award at the 8th International Conference on Algorithms and Architectures for Parallel Processing in 2008.

En-Yue Lu received the Ph.D. degree in computer science from the University of Texas at Dallas in 2004. She is currently an assistant professor in the Department of Mathematics and Computer Science at Salisbury University, Maryland. Dr. Lu's main research interests include computer and communication networks, parallel processing and computing, algorithm design and analysis, computer architectures, and graph theory. She earned a Best Paper Award at the 14th IASTED International Conference on Parallel and Distributed Computing and Systems in 2002.

Si-Qing Zheng received the Ph.D. degree from the University of California, Santa Barbara, in 1987. After being on the faculty of Louisiana State University for eleven years, he joined the University of Texas at Dallas in 1998, where he is currently a professor of computer science, computer engineering, and telecommunications engineering. Dr. Zheng's research interests include algorithms, computer architectures, networks, parallel and distributed processing, telecommunications, and VLSI design. He has published about 250 papers in these areas. He served as the program committee chairman of numerous international conferences and editor of several professional journals.
