IEEE CALCON2014
A Review Report on Divide and Conquer Sorting Algorithm
Smita Paira, Sourabh Chandra, Sk Safikul Alam, and Partha Sarthi Dey
Smita Paira, student, is with Calcutta Institute of Technology, Uluberia, Howrah.
Sourabh Chandra, assistant professor, is with Calcutta Institute of Technology, Uluberia, Howrah.
Sk Safikul Alam, assistant professor, is with Calcutta Institute of Technology, Uluberia, Howrah.
Partha Sarthi Dey, professor, is with Indian Institute of Technology, Kharagpur.
ISBN: 978-93-833-0383-0 ©IEEE Kolkata Section
Abstract: Proper management of data in a database system requires operations such as insertion, deletion, sorting, searching and traversal. Sorting organizes the data in a particular order, which makes the other operations easier. Hence a better sorting algorithm increases the efficiency of each subsequent operation. Among the various sorting techniques, Divide and Conquer algorithms hold promise, since most of them place less burden on both memory (space) and processor time. Like all sorting methods, Divide and Conquer algorithms follow iterative as well as recursive approaches. In this paper, we survey some of the conventional and newly proposed Divide and Conquer sorting algorithms and analyze their efficiency with respect to time and space complexity.
Index Terms: DBMS, Divide and Conquer, Greedy, Heap Sort, Hierarchical Bubble Sorting, Linear Time Suffix Array, Merge Sort, Nested Grid Files, non-Manhattan Channel Routing, Quick Sort, Radix Sort, Sorting, space complexity, time complexity.
I. INTRODUCTION
Research in various architectural and database management systems has given rise to various sorting algorithms, among which Divide and Conquer techniques hold the most promise in terms of both space complexity and time complexity. Sorting can be done in an iterative as well as a recursive way. The iterative approach makes repeated passes over portions of the list being sorted, and in the worst case each pass may scan almost the entire list of n elements. Therefore, the worst case time complexity is high, usually O(n²). However, the iterative approach needs no additional memory overhead for stack management.
The recursive methods, on the other hand, are mainly based on inductive steps that can easily be formulated in terms of the Divide and Conquer approach. Compared to the iterative sorting algorithms, these methods are elegant and need fewer steps to sort a list of n elements, but they may impose the additional overhead required to maintain and manage the stack.
The Divide and Conquer approach has been implemented in many conventional sorting algorithms such as Merge Sort, Quick Sort, Heap Sort and Radix Sort. They usually have a time complexity of O(n log₂ n). Apart from these, many researchers have studied and formulated new algorithms based on this approach, which have applications in diverse fields. In the following sections, we compare the efficiency of the conventional and the newly proposed Divide and Conquer sorting algorithms.
Fig. 1. Classification of various sorting algorithms
II. COMPARISON STUDY FOR CONVENTIONAL DIVIDE AND CONQUER SORTING ALGORITHMS
In this section, we compare the conventional Divide and Conquer sorting algorithms: Quick Sort, Merge Sort, Heap Sort and Radix/Bucket Sort. Table I (Comparison Table for Conventional Divide and Conquer Sorting Algorithms) lists, for each algorithm, its properties, best case, average case and worst case time complexity, space complexity, weakness and applications.
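The contrast drawn in the introduction between iterative and recursive sorting can be sketched in a few lines. The example below is only an illustration, not code from the surveyed papers: bubble sort is picked here as a representative iterative method, merge sort as a representative Divide and Conquer method, and the function names `bubble_sort` and `merge_sort` are our own.

```python
from typing import List


def bubble_sort(items: List[int]) -> List[int]:
    """Iterative sort: repeated passes over the list, O(n^2) worst case.

    Uses no recursion, so there is no call-stack overhead.
    """
    data = list(items)  # work on a copy, leave the input untouched
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return data


def merge_sort(items: List[int]) -> List[int]:
    """Recursive Divide and Conquer sort: O(n log n) time, O(n) extra space."""
    if len(items) <= 1:  # base case of the induction
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # divide: sort each half recursively
    right = merge_sort(items[mid:])
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):  # conquer: merge the sorted halves
        if left[i] <= right[j]:  # "<=" keeps equal keys in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Both functions return a new sorted list, so `merge_sort([5, 2, 9, 1, 5, 6])` and `bubble_sort([5, 2, 9, 1, 5, 6])` give the same result. The difference lies in the cost: the recursive version trades stack depth and auxiliary space for fewer comparisons on large inputs.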
Quick Sort
Properties: It has good locality of reference and is easily parallelized. It is inherently recursive, with fewer cache misses and no dynamic space allocation.
Best Case Time Complexity: O(n log₂ n)
Average Case Time Complexity: O(n log₂ n)
Worst Case Time Complexity: O(n²)
Space Complexity: O(log₂ n)
Weakness: Becomes slow for large arrays [12].
Applications: Widely used in modern architectures and in graphics processors [14].
Merge Sort
Properties: It has reasonable locality and requires dynamic allocations. It can be parallelized. It is inherently recursive. Its worst case performance is the best [13].
Best Case Time Complexity: O(n log₂ n)
Average Case Time Complexity: O(n log₂ n)
Worst Case Time Complexity: O(n log₂ n)
Space Complexity: O(n)
Weakness: Requires more auxiliary space [11].
Applications: It can be adopted for external sorting, such as arranging records on a magnetic tape.
Heap Sort
Properties: It does not have good locality of reference, but it requires no dynamic allocations. It cannot be parallelized.
Best Case Time Complexity: O(n log₂ n)
Average Case Time Complexity: O(n log₂ n)
Worst Case Time Complexity: O(n log₂ n)
Space Complexity: O(1)
Weakness: It is unstable.
Applications: Used in interval scheduling. Because of its unstable nature, it finds fewer applications than other sorting algorithms [16].
Radix Sort
Properties: It is hard to generalize compared to other algorithms. A fixed key size along with a standard way to break the keys is necessary [17]. It works faster than Quick Sort for large arrays.
Best Case Time Complexity: O(n²)
Average Case Time Complexity: O(n log₂ n)
Worst Case Time Complexity: O(n log₂ n)
Space Complexity: O(m+n) or O(mn), where m is the space required to hold the radices and n is the number of elements.
Weakness: Its efficiency deteriorates when the number of digits becomes greater than O(log₂ n) [15].
Applications: It allows the synchronization of various nodes in a multiprocessor.
III. LITERATURE SURVEY ON THE NEWLY PROPOSED DIVIDE AND CONQUER SORTING ALGORITHMS
Joel L. Wolf et al. [4] proposed a merge join parallel sorting algorithm to remove the data skew problem from which conventional parallel sort algorithms suffer. In this method, the sort phase is followed by a scheduling phase that balances the load across the multiple processors during the join phase.
Ellis Horowitz and Alessandro Zorat [1] formulated a Divide and Conquer paradigm for parallel processing to increase the computational speed. The processes remain active at all times. If the number of processors increases with the number of elements, N, the time complexity is reduced to O(N).
Thomas A. Mueck and Manfred J. Schauer [6] presented two order query execution algorithms for Balanced and Nested Grid Files, which use Greedy and Divide and Conquer approaches respectively. They reduce the length of the read and write sequences and pass the disk access plans to the DBMS query processor for further execution. The processing complexity is O(nh).
Ge Nong et al. [10] proposed two algorithms for Linear Time Suffix Array construction. They have a linear time complexity of O(n) and require no extra space.
J.-T. Yan [8] studied the Hierarchical Bubble Sorting based non-Manhattan Channel Routing problem and proposed an alternative algorithm with a complexity of O(hn), where n is the number of terminals and h is the number of routing tracks. It represents sequences of left-swap and right-swap passes and follows a top-down approach in which the swap operations are mapped to a particular track.
Juan Lopez and Emilio L. Zapata [5] presented a unified architecture, based on Divide and Conquer, for Tridiagonal System [3][2] Solvers. They presented the architecture through three different algorithms: CGCR, CGSD and CGRD. The data is partitioned in a natural manner, based on the perfect unshuffle permutation. CGCR is considered the most stable and appropriate of the three, since its space and time requirements as well as its hardware cost are lower.
K. C. Tan et al. [9] presented a Cooperative Coevolutionary Algorithm for multiobjective optimization in coevolutionary architectures. In this method, the decision vectors are divided into small components, which provides multiple solutions in terms of cooperative subpopulations. The simulation time decreases as the number of peers increases.
Yuke Wang and Mostafa Abd-El-Barr [7] proposed a new decoding algorithm for the Residue Number System (RNS), based on Divide and Conquer. The algorithm decomposes a set of moduli into numerous groups of two moduli each; then a particular element of each group is decoded. The time complexity of the algorithm is O(log n), and it requires less RAM than other existing algorithms.
IV. COMPARISON STUDY OF NEWLY PROPOSED DIVIDE AND CONQUER SORTING ALGORITHM
TABLE II COMPARISON TABLE FOR NEWLY PROPOSED DIVIDE AND CONQUER SORTING ALGORITHMS
Method: Merge join parallel sorting algorithm [4]
Properties: It removes the data skew problem from which conventional parallel sort algorithms suffer. The sort phase is followed by a scheduling phase to balance the load.
Efficiency: Very good load balancing capacity, robust, low overheads.
Weaknesses: The number of processes is small for a small number of processors and vice versa. It works well up to 128 processors.
Applications: CPU, I/O bound environments, etc.
Method: Order Query Execution algorithms for both Balanced and Nested Grid Files [6]
Properties: The two algorithms use Greedy and Divide and Conquer approaches respectively. They reduce the read/write sequences and pass the disk access plans to the DBMS query processor for further execution.
Efficiency: The Divide and Conquer approach requires fewer I/O operations and buffer gates. Storage consumption is lower. The worst case run time is O(nh) for the construction phase and O(n+th) for the processing phase.
Weaknesses: If the buffer page pool is not controlled by the query processor, the algorithm fails.
Applications: Spatial and commercial DBMS that rely on B-Tree or single key hash structures for index maintenance.
Method: Algorithm against Hierarchical Bubble Sorting Based non-Manhattan Channel Routing problem [8]
Properties: It performs the operation by dividing the overall pass into a left swap pass and a right swap pass, each of which is further subdivided. The top to bottom swap operations are mapped into a single track.
Efficiency: It has a complexity of O(hn), where n is the number of terminals and h is the number of routing tracks.
Weaknesses: It requires more space.
Applications: VLSI design automation.
Method: Divide and Conquer for Parallel Processing [1]
Properties: It combines dynamically expressed parallelism with the principle of the Divide and Conquer approach. The amount of parallelism depends on the processed data and varies while the program is being executed.
Efficiency: The writer need not keep track of the different instantiations of recursive procedures. If the number of processors, P, increases with the number of elements, n, the time complexity is reduced to O(n).
Weaknesses: The process is highly efficient only if a large number of processors is available.
Applications: Parallel processors.
Method: Cooperative Coevolutionary Algorithm (CCEA) for multiobjective optimization [9]
Properties: It divides the decision vectors into small components and provides multiple solutions in terms of cooperative subpopulations.
Efficiency: CCEA can maintain archive diversity. Distributed CCEA is suitable for concurrent processing and reduces the simulation time as the number of peers increases.
Weaknesses: As the number of peers increases, the communication cost increases.
Applications: Coevolutionary architectures.
Method: Algorithms for Linear Time Suffix array construction [10]
Properties: It uses variable length leftmost S-type substrings and fixed length d-critical substrings.
Efficiency: It does not require any extra physical space and has a time complexity of O(n).
Weaknesses: If S-type and L-type characters are distributed randomly in the string, the reduction ratio does not increase beyond 1/3.
Applications: Data retrieval, processing, storage, etc.
Method: Unified architecture for Tridiagonal System Solvers [5]
Properties: The data is partitioned in a natural manner, based on the perfect unshuffle permutation. The data flows are presented through the Recursive Doubling, Parallel Cyclic Reduction and Successive Doubling methods.
Efficiency: The CGRD algorithm does not include division operations in evaluation. The communication cost and the space and time complexities of CGCR are lower than those of CGSD and CGRD. It is highly stable.
Weaknesses: Pipelining can only be done for cyclic distribution of data in the architecture.
Applications: VLSI technology.
Method: Algorithm for RNS decoding [7]
Properties: The algorithm decomposes a set of moduli into numerous groups of two moduli each. Then, a particular element of each group is decoded.
Efficiency: Its time complexity is O(log n) and it requires less RAM than other existing algorithms. It involves no assumptions and requires only small-size modulo adders.
Weaknesses: (none listed)
Applications: Dynamic range extension, sign detection, and multiplication and division through small arithmetic shifts, etc.
V. CONCLUSION
Sorting has various real-time implementations. It helps in statistical record management, so that a quick search can be performed to access any particular item in a record. The conventional iterative algorithms allow this quick access but fail for large lists. Their usual time complexity is O(n²); hence, for a large array, the number of steps needed for sorting becomes very large. The Divide and Conquer sorting approaches solve this problem, as they follow an iterative as well as a recursive path while handling the data. They usually sort the data with a complexity of O(n log₂ n), which is much lower than that of the traditional iterative methods. In this brief, we have discussed the conventional as well as the newly proposed sorting algorithms based on Divide and Conquer. From Table II, it is found that the RNS decoding algorithm is highly efficient compared with the other algorithms: it requires less space in RAM and has a time complexity of O(log n). The merge join parallel sorting algorithm is a new concept that works well only up to 128 processors, yet it requires fewer steps and has a good load balancing factor. The Divide and Conquer sorting algorithms have a wide range of applications in DBMS and in record keeping such as telephone directories, dictionaries, bank directories, etc.
References
[1] E. Horowitz and A. Zorat, “Divide-and-Conquer for Parallel Processing”, IEEE Transactions on Computers, vol. C-32, no. 6, pp. 582-585, June 1983.
[2] R. W. Hockney and C. R. Jesshope, “Parallel Computers”. Philadelphia, PA: Adam Hilger, 1988.
[3] R. F. Boisvert, “Algorithms for special tridiagonal systems”, SIAM J. Scientific Statist. Computat., vol. 12, no. 2, pp. 423-442, 1991.
[4] J. L. Wolf, D. M. Dias, and P. S. Yu, “A Parallel Sort Merge Join Algorithm for Managing Data Skew”, IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 1, pp. 70-86, Jan. 1993.
[5] J. Lopez and E. L. Zapata, “Unified Architecture for Divide and Conquer Based Tridiagonal System Solvers”, IEEE Transactions on Computers, vol. 43, no. 12, pp. 1413-1425, Dec. 1994.
[6] T. A. Mueck and M. J. Schauer, “Optimizing Sort Order Query Execution in Balanced and Nested Grid Files”, IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 2, pp. 246-260, April 1995.
[7] Y. Wang and M. Abd-El-Barr, “A New Algorithm for RNS Decoding”, IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 43, no. 12, pp. 998-1001, Dec. 1996.
[8] J.-T. Yan, “Hierarchical bubble-sorting-based non-Manhattan channel routing”, IEE Proc.-Comput. Digit. Tech., vol. 147, no. 4, pp. 215-220, July 2000.
[9] K. C. Tan, Y. J. Yang, and C. K. Goh, “A Distributed Cooperative Coevolutionary Algorithm for Multiobjective Optimization”, IEEE Transactions on Evolutionary Computation, vol. 10, no. 5, pp. 527-549, Oct. 2006.
[10] G. Nong, S. Zhang and W. H. Chan, “Two Efficient Algorithms for Linear Time Suffix Array Construction”, IEEE Transactions on Computers, vol. 60, no. 10, pp. 1471-1484, Oct. 2011.
[11] A. V. Aho, J. E. Hopcroft, J. D. Ullman, Data Structures & Algorithms, 2nd Edition, ISBN13: 9780201000238, Chapter 8, pp. 366-425.
[12] S. Lipschutz, Data structure & Algorithm, Schaum’s Outlines, Tata McGraw Hill, 2nd Edition, ISBN13: 9780070991309.
[13] Augenstein & T. Langsam, Data Structure Using C & C++, 2nd Ed., ISBN13: 9788120311770.
[14] camlunity.ru/swap/Library/Conflux/Algorithms%20and%20Data%20Structures/gpuqsort.pdf
[15] stackoverflow.com/questions/4146843/when-should-we-use-radix-sort
[16] programmers.stackexchange.com/questions/194258/applications-of-heapsort
[17] stackoverflow.com/questions/4146843/when-should-we-use-radix-sort