Parallel Sorting Algorithms
Sorting Lists of Data
Sorting a list of data is a very common operation. Experimental physicists often have to order large sets of measurements, and astronomers may have to deal with vast amounts of data as well. Much less appreciated is the need in some numerical schemes to "order" data along the way. For instance, molecular dynamics may require ordering the atoms in a simulation by distance in order to build a distance classification tree.
The algorithms to perform ordering have been very well studied. They range widely in performance, from the typical O(N^2) to an efficient O(N log N), and they also vary widely in complexity of implementation. The best algorithm for a particular need may depend heavily on the size of the list to be ordered, whether it holds a mere 1,000 data points or 1,000,000.
Sorting lists of numbers in parallel using MPI definitely adds to the difficulty of the task. This is the subject of this section.
Simplest algorithm: Bubble Sort
By far the simplest sorting algorithm is the bubble sort. It consists of scanning the list over and over, exchanging adjacent pairs of data points that are out of increasing (or decreasing) order. The sort is complete when a pass through the list no longer requires any exchange.
Watch out: bubble sort is an algorithm of order O(N^2) according to Numerical Recipes, which suggests you never use it in production code.
Nevertheless, here is a simple implementation of the bubble sort algorithm. The code bubble.c generates a list of random numbers and then orders it. The bubble sort algorithm is certainly the easiest to implement, and therefore the easiest to parallelize.
Parallel Bubble Sort
One way to implement bubble sort in parallel is to divide the list of data (more or less) equally among nodes 1 to (N-1) of an N-node parallel machine, keeping node 0 to administer the calculation. Each of the nodes 1 to (N-1) then sorts its partial list and sends it back to node 0 for a final global merge.
The implementation of this code is left to the reader.
Quicksort Algorithm
Numerical Recipes says that the quicksort algorithm is the fastest sorting algorithm, running in O(N log N) on average but degrading toward O(N^2) depending on the degree of partial ordering in the list, from totally random to totally ordered, and on the choice of pivot.
The quicksort algorithm:
- Choose a "pivot" P, for example an element near the mid-point of the list.
- Exchange P and the first element of the list.
- Scan the list upward from element 2 until an element > P is encountered.
- Simultaneously scan the list downward from the last element until an element < P is encountered.
- Exchange these two elements.
- Repeat until the scans meet.
- The meeting point is P's final position: exchange it with element 1.
- At this point the elements are partitioned, those smaller than P to its left and those larger to its right. Repeat the sorting steps recursively on the left and right sub-lists until the entire list is sorted.
quicksort.c implements this algorithm.
The merge component of a parallel implementation is the hardest part to code. The data set multi_lists, which contains three short lists, can be used for practice.
Parallel Quicksort Algorithm
In its simplest form, the parallel implementation of the quicksort algorithm can be similar to that of the bubble sort.
A more efficient implementation could take advantage of the fact that the left and right sub-lists produced by the algorithm cover disjoint, ordered ranges, which simplifies the merging step. The number of nodes would then have to be a master node plus a power-of-two number of worker nodes, so that sorted sub-lists can be merged pairwise.
References
The examples from this section are bundled in sorting.tar.gz.
- Numerical Recipes describes many of the most common algorithms.
- There are a number of GSL implementations for heap-sorting various objects, but heap sort does not parallelize easily.
- Merge sort is a parallelizable O(N log N) algorithm.
Besides comparison-based sorting algorithms, there are also network sorting algorithms.