World Applied Programming, Vol (1), No (1), April 2011. 62-71 ISSN: 2222-2510 ©2011 WAP journal. www.waprogramming.com
Popular sorting algorithms

C. Canaan *, M. S. Garai, M. Daya
Information Institute, Chiredzi, Zimbabwe
[email protected], [email protected], [email protected]
Abstract: In this paper we introduce a number of sorting algorithms. We begin with Bubble sort and continue with Selection sort, Insertion sort, Shell sort, Merge sort, Heapsort, Quicksort and Bucket sort. These are among the most popular sorting algorithms, and each of them is described in detail in the paper.

Keywords: Bubble sort, Selection sort, Insertion sort, Shell sort, Merge sort, Heapsort, Quicksort, Bucket sort

I. INTRODUCTION
In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions:
1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
2. The output is a permutation, or reordering, of the input.

Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956 [1]. Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and lower bounds.
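To make these two conditions concrete, the following small Python sketch checks both properties for a candidate output; the helper name is_valid_sort and the use of collections.Counter are illustrative choices, not part of the paper:

from collections import Counter

def is_valid_sort(original, output):
    # Condition 1: the output is in nondecreasing order.
    in_order = all(output[i] <= output[i + 1] for i in range(len(output) - 1))
    # Condition 2: the output is a permutation of the input
    # (same elements with the same multiplicities).
    same_elements = Counter(original) == Counter(output)
    return in_order and same_elements

print(is_valid_sort([3, 1, 2], sorted([3, 1, 2])))  # True
print(is_valid_sort([3, 1, 2], [1, 2, 2]))          # False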
II. POPULAR SORTING ALGORITHMS
In this section we list and describe some of the most popular sorting algorithms: Bubble sort, Selection sort, Insertion sort, Shell sort, Merge sort, Heapsort, Quicksort and Bucket sort.
Bubble sort

Bubble sort, also known as sinking sort, is a simple sorting algorithm that works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm gets its name from the way smaller elements "bubble" to the top of the list. Because it only uses comparisons to operate on elements, it is a comparison sort.
The equally simple insertion sort has better performance than bubble sort, so some have suggested no longer teaching the bubble sort [2][3]. Bubble sort has worst-case and average complexity both O(n²), where n is the number of items being sorted. There exist many sorting algorithms with substantially better worst-case or average complexity of O(n log n). Even other O(n²) sorting algorithms, such as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort is not a practical sorting algorithm when n is large.

The only significant advantage that bubble sort has over most other implementations, even quicksort, but not insertion sort, is that the ability to detect that the list is already sorted is efficiently built into the algorithm. The performance of bubble sort over an already-sorted list (the best case) is O(n). By contrast, most other algorithms, even those with better average-case complexity, perform their entire sorting process on the set and thus are more complex. However, not only does insertion sort share this mechanism, it also performs better on a list that is substantially sorted (having a small number of inversions).

The positions of the elements in bubble sort play a large part in determining its performance. Large elements at the beginning of the list do not pose a problem, as they are quickly swapped. Small elements towards the end, however, move to the beginning extremely slowly. This has led to these types of elements being named rabbits and turtles, respectively. Various efforts have been made to eliminate turtles to improve the speed of bubble sort. Cocktail sort achieves this goal fairly well, but it retains O(n²) worst-case complexity. Comb sort compares elements separated by large gaps, and can move turtles extremely quickly before proceeding to smaller and smaller gaps to smooth out the list. Its average speed is comparable to faster algorithms like quicksort.

The algorithm can be expressed in pseudocode as:

procedure bubbleSort( A : list of sortable items )
    do
        swapped = false
        for each i in 1 to length(A) - 1 inclusive do:
            if A[i-1] > A[i] then
                swap( A[i-1], A[i] )
                swapped = true
            end if
        end for
    while swapped
end procedure
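For concreteness, here is a minimal runnable Python version of the same idea with the early-exit "swapped" flag; the function name bubble_sort and the in-place style are choices made for this sketch, not part of the original pseudocode:

def bubble_sort(a):
    """Sort the list a in place using bubble sort with early exit."""
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, n):
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                swapped = True
        n -= 1  # after each pass the largest remaining element has bubbled to the end
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]

If the input is already sorted, the first pass performs no swaps and the loop exits after O(n) work, which is exactly the best-case behaviour described above.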
Selection sort

Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n²) complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it also has performance advantages over more complicated algorithms in certain situations.

Selection sort is not difficult to analyze compared to other sorting algorithms, since none of the loops depend on the data in the array. Selecting the lowest element requires scanning all n elements (this takes n − 1 comparisons) and then swapping it into the first position. Finding the next lowest element requires scanning the remaining n − 1 elements, and so on, for (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 ∈ Θ(n²) comparisons. Each of these scans is followed by at most one swap, for a total of n − 1 swaps (the final element is already in place).

Selection sort's philosophy most closely matches human intuition: it finds the largest element and puts it in its place, then finds the next largest and places it, and so on until the array is sorted. To put an element in its place, it trades positions with the element in that location (this is called a swap).
As a result, the array will have a sorted section growing from the end of the array, while the rest of the array remains unsorted [4].

The algorithm is as follows [5]:

for i ← 1 to n-1 do
    min j ← i
    min x ← A[i]
    for j ← i + 1 to n do
        if A[j] < min x then
            min j ← j
            min x ← A[j]
    A[min j] ← A[i]
    A[i] ← min x
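A small runnable Python version of the same procedure (the name selection_sort and the in-place style are choices made for this sketch) might look like this:

def selection_sort(a):
    """Sort the list a in place by repeatedly selecting the minimum of the unsorted suffix."""
    n = len(a)
    for i in range(n - 1):
        min_j = i
        for j in range(i + 1, n):
            if a[j] < a[min_j]:
                min_j = j
        a[i], a[min_j] = a[min_j], a[i]  # swap the minimum into position i
    return a

print(selection_sort([29, 64, 73, 34, 20]))  # [20, 29, 34, 64, 73]

This sketch grows the sorted section from the front of the array; the "find the largest and place it at the end" phrasing above is the mirror image of the same idea.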
Insertion sort

Insertion sort is a simple sorting algorithm: a comparison sort in which the sorted array (or list) is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:
- Simple implementation
- Efficient for (quite) small data sets
- Adaptive, i.e. efficient for data sets that are already substantially sorted: the time complexity is O(n + d), where d is the number of inversions
- More efficient in practice than most other simple quadratic, i.e. O(n²), algorithms such as selection sort or bubble sort; the best case (nearly sorted input) is O(n)
- Stable, i.e. does not change the relative order of elements with equal keys
- In-place, i.e. only requires a constant amount O(1) of additional memory space
- Online, i.e. can sort a list as it receives it

Most humans, when sorting (ordering a deck of cards, for example), use a method that is similar to insertion sort [6].
Every repetition of insertion sort removes an element from the input data and inserts it into the correct position in the already-sorted list, until no input elements remain. The choice of which element to remove from the input is arbitrary, and can be made using almost any choice algorithm. Sorting is typically done in-place. The resulting array after k iterations has the property that the first k + 1 entries are sorted. In each iteration, the first remaining entry of the input is removed and inserted into the result at the correct position, thus extending the result; each element greater than the inserted value x is copied one position to the right as it is compared against x.

The most common variant of insertion sort, which operates on arrays, can be described as follows:
1. Suppose that there exists a function called Insert designed to insert a value into a sorted sequence at the beginning of an array. It operates by beginning at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element. The function has the side effect of overwriting the value stored immediately after the sorted sequence in the array.

2. To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position. The ordered sequence into which the element is inserted is stored at the beginning of the array, in the set of indices already examined. Each insertion overwrites a single value: the value being inserted.
Below is the pseudocode for insertion sort for a zero-based array:

for j ← 1 to length(A) - 1
    key ← A[j]            > A[j] is inserted into the sorted sequence A[0 .. j-1]
    i ← j - 1
    while i >= 0 and A[i] > key
        A[i + 1] ← A[i]
        i ← i - 1
    A[i + 1] ← key
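The same procedure, written as a minimal runnable Python sketch (the function name is chosen here for illustration):

def insertion_sort(a):
    """Sort the list a in place; a[0..j-1] is always sorted before step j."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift elements larger than key one place to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

print(insertion_sort([12, 11, 13, 5, 6]))  # [5, 6, 11, 12, 13]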
Shell sort

Shell sort is a sorting algorithm, devised by Donald Shell in 1959, that is a generalization of insertion sort; it exploits the fact that insertion sort works efficiently on input that is already almost sorted. It improves on insertion sort by allowing the comparison and exchange of elements that are far apart. The last step of Shell sort is a plain insertion sort, but by then the array of data is guaranteed to be almost sorted.

Shell sort is an example of an algorithm that is simple to code but difficult to analyze theoretically; its performance depends heavily on the choice of increment sequence. The algorithm was one of the first to break the quadratic time barrier, but this fact was not proven until some time after its discovery [4]. The initial increment sequence suggested by Donald Shell was 1, 2, 4, 8, 16, ..., 2^k, but this is a very poor choice in practice because it means that elements in odd positions are not compared with elements in even positions until the very last step. The original implementation performs O(n²) comparisons and exchanges in the worst case [7]. A simple change, replacing 2^k with 2^k − 1, improves the worst-case running time to O(n^(3/2)) [8], a bound that cannot be improved [9]. A minor change given in V. Pratt's book [9] improved the bound to O(n log² n). This is worse than the optimal comparison sorts, which are O(n log n), but it lends itself to sorting networks and has the same asymptotic gate complexity as Batcher's bitonic sorter.

Consider a small value that is initially stored at the wrong end of the array. Using an O(n²) sort such as bubble sort or insertion sort, it will take roughly n comparisons and exchanges to move this value all the way to the other end of the array. Shell sort first moves values using giant step sizes, so a small value will move a long way towards its final position with just a few comparisons and exchanges.

One can visualize Shell sort in the following way: arrange the list into a table and sort the columns (using an insertion sort). Repeat this process, each time with a smaller number of longer columns.
At the end, the table has only one column. While transforming the list into a table makes it easier to visualize, the algorithm itself does its sorting in-place (by incrementing the index by the step size, i.e. using i += step_size instead of i++).

The principle of Shell sort is to rearrange the file so that looking at every h-th element yields a sorted file. We call such a file h-sorted. If an h-sorted file is then k-sorted for some other integer k, the file remains h-sorted [7]. For instance, if a list was 5-sorted and then 3-sorted, the list is now not only 3-sorted, but both 5- and 3-sorted. If this were not true, the algorithm would undo work that it had done in previous iterations, and would not achieve such a low running time.

The algorithm draws upon a sequence of positive integers known as the increment sequence. Any sequence will do, as long as it ends with 1, but some sequences perform better than others [8]. The algorithm begins by performing a gap insertion sort, with the gap being the first number in the increment sequence. It continues to perform a gap insertion sort for each number in the sequence, until it finishes with a gap of 1. When the increment reaches 1, the gap insertion sort is simply an ordinary insertion sort, guaranteeing that the final list is sorted. Beginning with large increments allows elements in the file to move quickly towards their final positions, and makes it easier to subsequently sort for smaller increments [7]. Although more efficient sorting algorithms exist, Shell sort remains a good choice for moderately large files because it has good running time and is easy to code.

The following is an implementation of Shell sort written in pseudocode. The increment sequence is a geometric sequence in which every term is roughly 2.2 times smaller than the previous one:

input: an array a of length n with array elements numbered 0 to n − 1

inc ← round(n/2)
while inc > 0 do:
    for i = inc .. n − 1 do:
        temp ← a[i]
        j ← i
        while j ≥ inc and a[j − inc] > temp do:
            a[j] ← a[j − inc]
            j ← j − inc
        a[j] ← temp
    inc ← round(inc / 2.2)
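A runnable Python version of the pseudocode above, using the same gap sequence (divide by roughly 2.2 and round); the name shell_sort is an illustrative choice:

def shell_sort(a):
    """Sort the list a in place using Shell sort with gaps shrinking by a factor of about 2.2."""
    n = len(a)
    inc = round(n / 2)
    while inc > 0:
        # Gap insertion sort: every inc-th element forms an (almost) sorted subsequence.
        for i in range(inc, n):
            temp = a[i]
            j = i
            while j >= inc and a[j - inc] > temp:
                a[j] = a[j - inc]
                j -= inc
            a[j] = temp
        inc = round(inc / 2.2)  # the sequence always passes through 1, i.e. ends with a plain insertion sort
    return a

print(shell_sort([23, 12, 1, 8, 34, 54, 2, 3]))  # [1, 2, 3, 8, 12, 23, 34, 54]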
Merge sort

Merge sort is an O(n log n) comparison-based sorting algorithm. Most implementations produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. It is a divide and conquer algorithm. Merge sort was invented by John von Neumann in 1945 [10].

In sorting n objects, merge sort has an average and worst-case performance of O(n log n). If the running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n follows from the definition of the algorithm (apply the algorithm to two lists of half the size of the original list, and add the n steps taken to merge the resulting two lists). The closed form follows from the master theorem. In the worst case, merge sort does a number of comparisons equal to or slightly smaller than n⌈lg n⌉ − 2^⌈lg n⌉ + 1, which is between n lg n − n + 1 and n lg n + n + O(lg n) [11].
For large n and a randomly ordered input list, merge sort's expected (average) number of comparisons approaches α·n fewer than the worst case, where α = −1 + Σ_{k=0..∞} 1/(2^k + 1) ≈ 0.2645.
In the worst case, merge sort does about 39% fewer comparisons than quicksort does in its average case; merge sort makes fewer comparisons than quicksort in all but extremely rare cases, where they tie (merge sort's worst case coinciding with quicksort's best case). In terms of moves, merge sort's worst-case complexity is O(n log n), the same complexity as quicksort's best case, and merge sort's best case takes about half as many iterations as its worst case. Recursive implementations of merge sort make 2n − 1 method calls in the worst case, compared to quicksort's n, so merge sort has roughly twice as much recursive overhead as quicksort. However, iterative, non-recursive implementations of merge sort, which avoid method call overhead, are not difficult to code. Merge sort's most common implementation does not sort in place; therefore, memory the size of the input must be allocated for the sorted output to be stored in.

Merge sort as described here also has an often overlooked, but practically important, best-case property. If the input is already sorted, its complexity falls to O(n). Specifically, n − 1 comparisons and zero moves are performed, which is the same as simply running through the input and checking whether it is pre-sorted.

Sorting in place is possible (e.g., using lists rather than arrays) but is very complicated, and offers little performance gain in practice, even if the algorithm runs in O(n log n) time (Katajainen, Pasanen & Teuhola 1996). In these cases, algorithms like heapsort usually offer comparable speed and are far less complex. Additionally, unlike the standard merge sort, in-place merge sort is not a stable sort. In the case of linked lists, the algorithm does not use more space than that already used by the list representation, apart from the O(log n) space used for the recursion stack. Merge sort is more efficient than quicksort for some types of lists if the data to be sorted can only be accessed efficiently in sequential order, and it is thus popular in languages such as Lisp, where sequentially accessed data structures are very common. Unlike some (efficient) implementations of quicksort, merge sort is a stable sort as long as the merge operation is implemented properly.

As can be seen from the merge sort procedure, the algorithm has some demerits. One complaint we might raise is its use of 2n locations; the additional n locations are needed because one cannot reasonably merge two sorted sets in place. But despite the use of this space, the algorithm must still work hard: the contents of m are first copied into left and right, and later into the list result, on each invocation of merge_sort (variable names as in the pseudocode below). An alternative to this copying is to associate a new field of information with each key (the elements in m are called keys). This field is used to link the keys and any associated information together in a sorted list (a key and its related information is called a record). Then the merging of the sorted lists proceeds by changing the link values; no records need to be moved at all. A field that contains only a link will generally be smaller than an entire record, so less space is also used.

Another alternative for reducing the space overhead to n/2 is to maintain left and right as a combined structure, copy only the left part of m into temporary space, and direct the merge routine to place the merged output into m. With this version it is better to allocate the temporary space outside the merge routine, so that only one allocation is needed. The excessive copying mentioned in the previous paragraph is also mitigated, since the last pair of lines before the return result statement (in the merge function) become superfluous.
function merge_sort(m)
    if length(m) ≤ 1
        return m
    var list left, right, result
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result
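The pseudocode above leaves the merge step implicit. A complete, runnable Python sketch of top-down merge sort, including a standard two-way merge (all names here are illustrative), could be:

def merge(left, right):
    """Merge two sorted lists into one sorted list (stable: ties are taken from left first)."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # one of these two is empty; the other holds the remaining tail
    result.extend(right[j:])
    return result

def merge_sort(m):
    """Return a new sorted list; the input list m is left unchanged."""
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = merge_sort(m[:middle])
    right = merge_sort(m[middle:])
    return merge(left, right)

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]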
Heapsort

Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but it is not a stable sort.

Heapsort begins by building a heap out of the data set, then removing the largest item and placing it at the end of the partially sorted array. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the partially sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays, one to hold the heap and the other to hold the sorted elements.

Heapsort inserts the input list elements into a binary heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is repeatedly extracted until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction. During extraction, the only space required is that needed to store the heap. To achieve constant space overhead, the heap is stored in the part of the input array that is not yet sorted. (The heap is stored directly in the array, as the pseudocode below assumes.)

Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array; the remaining prefix of the array stores the unsorted elements.

The following is the "simple" way to implement the algorithm in pseudocode. Arrays are zero-based and swap is used to exchange two elements of the array. Movement 'down' means from the root towards the leaves, or from lower indices to higher. Note that during the sort the largest element is at the root of the heap at a[0], while at the end of the sort the largest element is in a[end].
function heapSort(a, count) is
    input: an unordered array a of length count

    (first place a in max-heap order)
    heapify(a, count)

    end := count - 1   //in languages with zero-based arrays the children are 2*i+1 and 2*i+2
    while end > 0 do
        (swap the root (maximum value) of the heap with the last element of the heap)
        swap(a[end], a[0])
        (put the heap back in max-heap order)
        siftDown(a, 0, end - 1)
        (decrease the size of the heap by one so that the previous max value will stay in its proper placement)
        end := end - 1

function heapify(a, count) is
    (start is assigned the index in a of the last parent node)
    start := count / 2 - 1

    while start ≥ 0 do
        (sift down the node at index start to the proper place such that all nodes below the start index are in heap order)
        siftDown(a, start, count - 1)
        start := start - 1
    (after sifting down the root all nodes/elements are in heap order)

function siftDown(a, start, end) is
    input: end represents the limit of how far down the heap to sift

    root := start
    while root * 2 + 1 ≤ end do        (while the root has at least one child)
        child := root * 2 + 1          (root*2 + 1 points to the left child)
        swap := root                   (keeps track of the child to swap with)
        (check if the root is smaller than the left child)
        if a[swap] < a[child]
            swap := child
        (check if the right child exists, and if it is bigger than what we are currently swapping with)
        if child < end and a[swap] < a[child + 1]
            swap := child + 1
        (check if we need to swap at all)
        if swap != root
            swap(a[root], a[swap])
            root := swap               (repeat to continue sifting down the child now)
        else
            return
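A compact runnable Python version of the same scheme (build a max-heap, then repeatedly swap the root to the end and sift down); the names sift_down and heap_sort are illustrative:

def sift_down(a, start, end):
    """Restore max-heap order for the subtree rooted at start, considering indices up to end."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1              # left child
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                    # right child exists and is larger
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child                  # continue sifting down the swapped child
        else:
            return

def heap_sort(a):
    """Sort the list a in place using heapsort."""
    n = len(a)
    # Build a max-heap: sift down every parent, starting from the last one.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(a, start, n - 1)
    # Repeatedly move the maximum to the end of the unsorted prefix.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1)
    return a

print(heap_sort([6, 5, 3, 1, 8, 7, 2, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]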
Quicksort

Quicksort is a sorting algorithm developed by C. A. R. Hoare that, on average, makes O(n log n) comparisons to sort n items. In the worst case it makes O(n²) comparisons, though if implemented correctly this behavior is rare. Typically, quicksort is significantly faster in practice than other O(n log n) algorithms, because its inner loop can be implemented efficiently on most architectures, and for most real-world data it is possible to make design choices that minimize the probability of requiring quadratic time. Additionally, quicksort tends to make excellent use of the memory hierarchy, taking good advantage of virtual memory and available caches. Although quicksort is usually not implemented as an in-place sort, it is possible to create such an implementation [12]. Quicksort (also known as "partition-exchange sort") is a comparison sort and, in efficient implementations, is not a stable sort.

Quicksort sorts by employing a divide and conquer strategy to divide a list into two sub-lists. The steps are:

1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base cases of the recursion are lists of size zero or one, which never need to be sorted. The algorithm is listed below:

function quicksort(array)
    var list less, greater
    if length(array) ≤ 1
        return array          // an array of zero or one elements is already sorted
    select and remove a pivot value pivot from array
    for each x in array
        if x ≤ pivot then
            append x to less
        else
            append x to greater
    return concatenate(quicksort(less), pivot, quicksort(greater))
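The same simple list-based (not in-place) scheme as a runnable Python sketch, using the first element as the pivot purely for illustration:

def quicksort(array):
    """Return a new sorted list using the simple list-based quicksort above."""
    if len(array) <= 1:
        return array                      # a list of zero or one elements is already sorted
    pivot, rest = array[0], array[1:]     # pivot choice is arbitrary; the first element is used here
    less = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([3, 6, 2, 7, 1, 9]))  # [1, 2, 3, 6, 7, 9]

Note that always picking the first element makes already-sorted input a worst case; practical implementations typically choose a random pivot or a median-of-three to make quadratic behaviour rare, as discussed above.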
Bucket sort

Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sort algorithm. It is a distribution sort, and is a cousin of radix sort in the most-to-least significant digit flavour. Bucket sort is a generalization of pigeonhole sort. Since bucket sort is not a comparison sort, the O(n log n) lower bound is inapplicable. The computational complexity estimates involve the number of buckets.

Bucket sort works as follows:

1. Set up an array of initially empty "buckets".
2. Scatter: go over the original array, putting each object in its bucket.
3. Sort each non-empty bucket.
4. Gather: visit the buckets in order and put all elements back into the original array.
The algorithm of bucket sort is listed below:

function bucket-sort(array, n) is
    buckets ← new array of n empty lists
    for i = 0 to (length(array) - 1) do
        insert array[i] into buckets[msbits(array[i], k)]
    for i = 0 to n - 1 do
        next-sort(buckets[i])
    return the concatenation of buckets[0], ..., buckets[n-1]
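As a concrete illustration, here is a runnable Python sketch for values assumed to lie in [0, 1); the bucket index plays the role of msbits above, and Python's built-in sorted() stands in for next-sort. The function name and the uniform-range assumption are choices made for this sketch:

def bucket_sort(array, n_buckets=10):
    """Sort values assumed to lie in [0, 1) by scattering them into n_buckets buckets."""
    buckets = [[] for _ in range(n_buckets)]
    # Scatter: the bucket index plays the role of msbits(array[i], k).
    for x in array:
        buckets[int(x * n_buckets)].append(x)
    # Sort each non-empty bucket, then gather by concatenating the buckets in order.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

print(bucket_sort([0.42, 0.32, 0.33, 0.52, 0.37, 0.47, 0.51]))
# [0.32, 0.33, 0.37, 0.42, 0.47, 0.51, 0.52]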
III. CONCLUSION

In this paper, we examined the sorting problem and investigated different solutions to it. We discussed the most popular algorithms used for sorting lists: Bubble sort, Selection sort, Insertion sort, Shell sort, Merge sort, Heapsort, Quicksort and Bucket sort. Each algorithm was described in detail, and we also tried to indicate its computational complexity in the worst, average and best cases. Pseudocode implementations were given for each algorithm.

REFERENCES
[1] Wikipedia. Available at: http://www.wikipedia.com
[2] Owen Astrachan. Bubble Sort: An Archaeological Algorithmic Analysis. SIGCSE 2003. Hannan Akhtar. Available at: http://www.cs.duke.edu/~ola/papers/bubble.pdf
[3] Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition. Addison-Wesley, 1998. ISBN 0-201-89685-0. Pages 106-110 of Section 5.2.2: Sorting by Exchanging.
[4] Available at: http://webspace.ship.edu/cawell/Sorting/selintro.htm
[5] Available at: http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Sorting/selectionSort.htm
[6] Robert Sedgewick. Algorithms. Addison-Wesley, 1983. Chapter 8, p. 95.
[7] Robert Sedgewick. Algorithms in C. Addison-Wesley, 1998. pp. 273-279.
[8] Mark Allen Weiss. Data Structures and Algorithm Analysis in C. Addison Wesley Longman, 1997. pp. 222-226.
[9] V. Pratt. Shellsort and Sorting Networks (Outstanding Dissertations in the Computer Sciences). Garland, 1979. ISBN 0-8240-4406-1. (Originally presented as the author's Ph.D. thesis, Stanford University, 1971.)
[10] Merge Sort - Wolfram MathWorld. Available at: http://mathworld.wolfram.com/MergeSort.html
[11] The worst-case number given here does not agree with that given in Knuth's The Art of Computer Programming, Vol. 3. The discrepancy is due to Knuth analyzing a variant implementation of merge sort that is slightly sub-optimal.
[12] R. Sedgewick. Implementing Quicksort Programs. Communications of the ACM, 21(10):847-857, 1978. Available at: http://delivery.acm.org/10.1145/360000/359631/p847sedgewick.pdf?key1=359631&key2=9191985921&coll=DL&dl=ACM&CFID=6618157&CFTOKEN=73435998