A new Relative Sort Algorithm based on Arithmetic mean value

A new Relative Sort Algorithm based on Arithmetic mean value Wasi Haider Butt

Muhammad Younus Javed

College of Electrical and Mechanical Engineering National University of Sciences and Technology Rawalpindi, Pakistan [email protected]

College of Electrical and Mechanical Engineering National University of Sciences and Technology Rawalpindi, Pakistan [email protected]

Abstract-In this article we propose a novel sorting algorithm based on comparing the arithmetic mean with each item in the list. A running cost analysis and the results obtained from various implementations are also provided, with the intention of comparing the efficiency of the proposed mechanism with existing sorting methodologies.

I. INTRODUCTION

A sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most used orders are numerical order and lexicographical order. There are two basic requirements that a sorting algorithm must fulfill: first, the output must be a permutation (reordering) of the input, and second, all the elements in the output must be in the desired order, either increasing or decreasing. Sorting is of significant importance as we live in a world obsessed with keeping information. In order to search efficiently for required information we must keep that information in a sensible (logically appealing) order [1]. So, for our convenience, computers spend a considerable amount of time on keeping data in order [2]. Basically, sorting is the rearranging of given items on the basis of some well defined ordering rules [3]. From the very start of computer science the sorting problem, due to its immense usefulness, has invited the interest of researchers. The aim is to reduce the cost and complexity of sorting algorithms and to make them achieve efficiency levels untouched in the past. In this paper, we propose a new sorting method, derive its algorithm and compare it with well known existing methods. We find that the algorithm proposed in this paper is relatively simple and efficient. A second algorithm centered on the same motif is under development; it is an enhancement of the presented algorithm and will be discussed in an upcoming sequel.

II. DESIGN ISSUES

While designing a sorting algorithm, generally, the following issues need to be kept in mind:

A. Computational Complexity

Computational complexity is the complexity in terms of comparisons of elements of a list of size n. The behavior of a sorting algorithm is checked in three cases, i.e. the best case, the average case and the worst case. For sorting algorithms, good and bad behaviors are represented by the functions O(n log n) and Ω(n²) respectively. Ideal behavior of a sorting algorithm is displayed by the function O(n).

B. Memory Usage

Some algorithms are "in place", i.e. they do their work using the input list only; however, some algorithms require auxiliary locations in memory to store temporary data. Both of the above issues are of core importance while designing a sorting algorithm. A good sorting algorithm will be efficient in both of the above mentioned dimensions.

III. A BRIEF INTRODUCTION OF EXISTING SORTING ALGORITHMS

Many sorting methods have been proposed so far. In this section we give a brief introduction of the most popular among them.

978-1-4244-2824-3/08/$25.00 ©2008 IEEE

A. Bubble Sort

Bubble sort is the oldest-known and the simplest of all sorting algorithms. It works by repeatedly stepping through the list to be sorted (either ascending or descending), comparing two items at a time and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm gets its name from the way smaller elements "bubble" to the top of the list [4]. The method is considered inefficient relative to modern algorithms and is not used anywhere except for theoretical purposes [5]. For a list of 100 elements, bubble sort will make 10,000 comparisons; this is extravagant and hence inefficient by present-day standards. At its best, when the list is initially sorted, bubble sort has O(n) behavior. In both the worst and the average case, it has a complexity of O(n²).
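As a minimal illustration of the procedure described above (an illustrative sketch, not code from the paper), bubble sort can be rendered in Python as follows:

```python
def bubble_sort(items):
    """Repeatedly sweep the list, swapping adjacent out-of-order pairs,
    until a full pass makes no swaps (the list is then sorted)."""
    a = list(items)  # work on a copy
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        for i in range(n - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        n -= 1  # the largest remaining element has bubbled to the end
    return a
```

Note that on an already sorted input the single verifying pass gives the O(n) best-case behavior mentioned above.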

Proceedings of the 12th IEEE International Multitopic Conference, December 23-24, 2008

B. Cocktail Sort

Cocktail sort, also known as bidirectional bubble sort, cocktail shaker sort, shaker sort (a name that also refers to a variant of selection sort), ripple sort, shuttle sort or happy hour sort, is a variation of bubble sort that is both a stable sorting algorithm and a comparison sort [6]. The algorithm differs from bubble sort by sorting in both directions on each pass it makes through the list. It is only marginally more difficult to implement than bubble sort, so it retains almost the same simplicity while solving the problem with so-called "turtles" in bubble sort. Like bubble sort, its average and worst case are both O(n²).
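The alternating passes can be sketched in Python as below (an illustrative sketch, not code from the paper); the backward pass is what carries small values ("turtles") quickly toward the front:

```python
def cocktail_sort(items):
    """Bubble sort variant that alternates forward and backward passes,
    shrinking the unsorted window from both ends."""
    a = list(items)
    lo, hi = 0, len(a) - 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(lo, hi):          # forward pass: push max right
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        hi -= 1
        for i in range(hi, lo, -1):      # backward pass: push min left
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                swapped = True
        lo += 1
    return a
```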

C. Selection Sort

Selection sort is an in-place comparison sort. It has O(n²) complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it also has notable performance advantages over more complicated algorithms in certain situations. Selection sort first scans the list for the smallest element and puts it at the first location, then the second smallest, which it puts at the second location, and so on. It performs these steps until it reaches the largest element in the list. The number of passes through the list is equal to the number of elements in the list [7]. A unique property of selection sort is that its running time is not affected by the ordering of the input elements: it performs Θ(n²) comparisons in every case [8].
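A short Python sketch of the scan-and-place procedure just described (illustrative only, not from the paper):

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining element and swap it into
    the next position; comparison count is the same for every input."""
    a = list(items)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a
```

The inner scan always runs to the end of the list regardless of its ordering, which is why the comparison count is Θ(n²) in every case.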

D. Insertion Sort

Insertion sort is also a very simple sorting algorithm. The sorted list is built one element at a time: in each pass it selects one element and inserts it at its proper location in the sorted portion. It has some definite advantages, including ease of implementation and high efficiency on small datasets. Its worst-case time is, like the sorts above, O(n²) [9].
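The element-at-a-time insertion can be sketched in Python as follows (an illustrative sketch, not code from the paper):

```python
def insertion_sort(items):
    """Grow a sorted prefix one element at a time, shifting larger
    elements right to open a slot for the current item."""
    a = list(items)
    for i in range(1, len(a)):
        item = a[i]
        j = i
        while j > 0 and a[j - 1] > item:
            a[j] = a[j - 1]  # shift right
            j -= 1
        a[j] = item
    return a
```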

E. Shell Sort

Shell sort is a generalization of insertion sort. It exploits two properties of insertion sort: insertion sort is efficient if the input is "almost sorted", but it is typically inefficient overall because it moves values just one position at a time [10].
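One way to see the generalization concretely is the sketch below (illustrative only, not from the paper), using Shell's original halving gap sequence; the early gapped passes move values many positions per swap, leaving an almost-sorted list for the final gap-1 pass, which is plain insertion sort:

```python
def shell_sort(items):
    """Gapped insertion sort: sort elements 'gap' apart, then shrink
    the gap; the final pass (gap = 1) is ordinary insertion sort."""
    a = list(items)
    gap = len(a) // 2  # Shell's original gap sequence: n/2, n/4, ..., 1
    while gap > 0:
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]  # shift by the whole gap, not by 1
                j -= gap
            a[j] = item
        gap //= 2
    return a
```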

F. Merge Sort

Merge sort is an efficient O(n log n) comparison-based algorithm [11]. It divides the list into sub-lists by recursion and then merges them, keeping all elements in the newly created lists in sorted order.
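The divide-and-merge structure can be sketched in Python as follows (an illustrative sketch, not code from the paper):

```python
def merge_sort(items):
    """Recursively split the list, sort each half, and merge the two
    sorted halves; O(n log n) comparisons in every case."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # append the leftover tail
```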

G. Heap Sort

Heap sort is a comparison-based sorting algorithm and is part of the selection sort family. Although somewhat slower in practice on most machines than an efficient implementation of quick sort, it has the advantage of a worst-case Θ(n log n) runtime [12]. Heap sort inserts the input list elements into a heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is then extracted repeatedly until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.

H. Quick Sort

Quick sort is a well-known sorting algorithm developed by C. A. R. Hoare. On average it makes O(n log n) comparisons to sort n items; in the worst case, however, it makes Θ(n²) comparisons. Typically, quick sort is significantly faster in practice than other O(n log n) algorithms, because its inner loop can be implemented efficiently on most architectures, and for most real-world data it is possible to make design choices which minimize the probability of requiring quadratic time [13].

I. Bucket Sort

Bucket sort, or bin sort, works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying bucket sort [14]. It is a cousin of radix sort in the most-to-least significant digit flavor. Since bucket sort is not a comparison sort, the Ω(n log n) lower bound is inapplicable. The computational complexity estimates involve the number of buckets.

J. Radix Sort

Radix sort is a sorting algorithm that sorts integers by processing individual digits. Because integers can represent strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers [15].

IV. RELATIVE SORT ALGORITHM

Relative sort is based on comparing each element with the arithmetic mean of the given unsorted list. This comparison gives an idea of the likely location of the corresponding element. The steps of the proposed Relative sort algorithm are as follows:

• Take a new list having size equal to the size of the given list.
• Take the arithmetic mean of the given list.
• Iterate through the given list and perform the following steps for each item:
  o Compare the picked item with the arithmetic mean.
  o If the item is less than the mean:




    ▪ If the new list is empty, place the item at the first location.
    ▪ Otherwise, start from the location where the last element was inserted in this direction and compare the item with that element. Iterating downwards and shifting elements one location ahead (if required), place the item at its exact location.
  o Else, if the item is greater than or equal to the mean:
    ▪ If the new list is empty, place the item at the last location.
    ▪ Otherwise, start from the location where the last item was inserted in this direction and compare the item with that element. Iterating upwards and shifting elements one location back (if required), place the item at its exact location.
• The new list will then be in sorted order.

V. PSEUDO CODE

RELATIVESORT(List, NewList)
  Average := AVERAGE[List]
  First := 0
  Last := LENGTH[List]
  for i := 1 to LENGTH[List]
    if (First < Last)
      if (List[i] < Average)
        for j := First downto 1
          if (List[i] > NewList[j - 1])
            break
          else
            NewList[j] := NewList[j - 1]
        NewList[j] := List[i]
        First := First + 1
      else
        for j := Last to LENGTH[List]
          if (List[i] < NewList[j + 1])
            break
          else
            NewList[j] := NewList[j + 1]
        NewList[j] := List[i]
        Last := Last - 1
    else
      NewList[First] := List[i]

VI. RUNNING COST ANALYSIS

A careful observer will note that there is one main loop within which lie two nested loops. The outer loop is executed in every case, while the execution of the inner loops depends on the item taken within an iteration of the list. As each item is compared with the average value, it traverses in one direction only, so just one of the two inner loops is executed per item. The best case depicts O(n) behavior, when the given list is arranged in such a way that from the start of the data structure to the location holding the average value the elements are in ascending order and after that in descending order. The worst case involves the inverse scenario, i.e. the given list is in descending order from the start to the average value and in ascending order from the average value to the end; in this case the cost will be O(n²). In the average case, the running cost will be O(nk), where n is the number of elements in the list and k is a constant whose value depends on the order of the given list; specifically, it is the number of times the inner loops are executed. The algorithm has been implemented and tested multiple times. After a thorough analysis we found that the sorting depicts average behavior in all cases except when the list is arranged to achieve the best or worst case. In the upcoming section, we include several comparisons of our algorithm with present-day, widely used sorting algorithms.

VII. COMPARISON OF RELATIVE SORT WITH VARIOUS EXISTING SORTING ALGORITHMS

To contrast the performance of Relative sort, it was implemented along with other well known sorting algorithms. Lists of various sizes were used, with items generated by a uniform random number generator with values ranging from 1 to 10,000. Running times were noted down. The plots are given below.

A. Comparison with Bubble Sort

[Figure: execution time in milliseconds (y-axis) against input list size from 1,000 to 100,000 elements (x-axis) for Bubble sort and Relative sort.]

In the above graph, the x-axis shows the number of items in the list to be sorted and the y-axis shows the time taken by the program for execution in milliseconds. It can be seen clearly that up to 10,000 elements the difference is small, but as the size increases the execution time of bubble sort grows rapidly, while Relative sort shows much better performance, as is obvious from the graph.

B. Comparison with Cocktail Sort

[Figure: execution time in milliseconds against input list size for Cocktail sort and Relative sort.]

Again, the x-axis shows the number of items and the y-axis the execution times in milliseconds. As with bubble sort, the execution times are almost the same for small lists, but the performance difference emerges as we move to larger lists.
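For concreteness when reading these comparisons, the Relative sort procedure of Section IV can be rendered in Python roughly as below. This is an illustrative sketch based on the paper's description, not the authors' implementation: items below the arithmetic mean are insertion-placed from the front of the output list, items at or above it from the back, and the two sorted regions meet in the middle.

```python
def relative_sort(items):
    """Sketch of Relative sort: compare each item with the arithmetic
    mean; place items below the mean at the front of the output list
    and items at or above it at the back, each by a short
    insertion-style scan of its own region."""
    n = len(items)
    if n == 0:
        return []
    mean = sum(items) / n
    out = [None] * n
    first, last = 0, n - 1  # next free slot at the front / at the back
    for item in items:
        if item < mean:
            # shift larger front-region elements one place right
            j = first
            while j > 0 and out[j - 1] > item:
                out[j] = out[j - 1]
                j -= 1
            out[j] = item
            first += 1
        else:
            # shift smaller back-region elements one place left
            j = last
            while j < n - 1 and out[j + 1] < item:
                out[j] = out[j + 1]
                j += 1
            out[j] = item
            last -= 1
    return out
```

Every front-region element is below the mean and every back-region element is at or above it, so when the two regions meet the whole output list is in ascending order; the inner shifting loops are what give the O(nk) average-case behavior discussed in Section VI.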

C. Comparison with Selection Sort

[Figure: execution time in milliseconds against input list size from 1,000 to 100,000 elements for Selection sort and Relative sort.]

The above graph depicts the performance difference between Selection sort and Relative sort. The x-axis shows the number of items in the input list and the y-axis the execution times. Relative sort gives a much lower execution time for lists larger than roughly 750 elements.

D. Comparison with Insertion Sort

[Figure: execution time in milliseconds against input list size for Insertion sort and Relative sort.]

Horizontally, the number of input list elements is plotted, and along the y-axis the execution time. Up to almost 37,000 elements the difference between the execution times is negligible, but the graph shows a valuable difference as the input list size keeps increasing.

E. Comparison with Merge Sort

[Figure: execution time in milliseconds against input list size from 1,000 to 100,000 elements for Merge sort and Relative sort.]

When we increase the number of elements in the input list, shown on the x-axis of the graph, there is not much effect on the performance of merge sort, while Relative sort shows performance degradation.

F. Comparison with Quick Sort

[Figure: execution time in milliseconds against input list size from 1,000 to 100,000 elements for Quick sort and Relative sort.]

Quick sort is likewise hardly affected by an increasing number of elements, while Relative sort starts losing performance as the number of elements increases.

VIII. CONCLUSIONS

From the graphs above, it can be clearly observed that Relative sort is much more efficient than most well known existing methods, although it is less efficient than the most efficient algorithms such as Merge sort and Quick sort. In the next sequel, an enhanced version of our algorithm will be presented which, we hope, will be much more efficient.

ACKNOWLEDGMENT

Thanks to our parents, whose prayers are the reason for this humble effort. Thanks to NUST for providing such research opportunities and grooming. Special thanks to all well wishers and friends, especially Kashif Kamran and Ahmed.

REFERENCES

[1] R. L. Kruse and A. J. Ryba, Data Structures and Program Design in C++, 2nd ed. Pearson Education, 1999.
[2] D. A. Bailey, Java Structures: Data Structures in Java for the Principled Programmer, 2nd ed. McGraw-Hill, 2003.
[3] R. Sedgewick, Algorithms in C++. Reading, Massachusetts: Addison-Wesley, 1992.
[4] D. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 3rd ed. Addison-Wesley, 1997, ISBN 0-201-89685-0, Section 5.2.2: Sorting by Exchanging, pp. 106-110.
[5] http://en.wikipedia.org/wiki/Sorting_algorithm#Bubble_sort
[6] http://en.wikipedia.org/wiki/Cocktail_sort
[7] D. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 3rd ed. Addison-Wesley, 1997, ISBN 0-201-89685-0, Section 5.2.3: Sorting by Selection, pp. 138-141.
[8] http://en.wikipedia.org/wiki/Sorting_algorithm#Selection_sort
[9] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MIT Press and McGraw-Hill, 2001, ISBN 0-262-03293-7, Section 2.1: Insertion sort, pp. 15-21.
[10] D. L. Shell, "A high-speed sorting procedure," Communications of the ACM 2(7): 30-32, 1959. doi:10.1145/368370.368387.
[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, "2.3: Designing algorithms," Introduction to Algorithms, 2nd ed. MIT Press and McGraw-Hill, 2001, ISBN 0-262-03293-7, pp. 27-37.
[12] J. W. J. Williams, "Algorithm 232 - Heapsort," Communications of the ACM 7(6): 347-348, 1964.
[13] C. A. R. Hoare, "Partition: Algorithm 63," "Quicksort: Algorithm 64," and "Find: Algorithm 65," Comm. ACM 4(7): 321-322, 1961.
[14] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MIT Press and McGraw-Hill, 2001, ISBN 0-262-03293-7, Section 8.4: Bucket sort, pp. 174-177.
[15] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MIT Press and McGraw-Hill, 2001, ISBN 0-262-03293-7, Section 8.3: Radix sort, pp. 170-173.