Other Algorithms efficient algorithms for sorting and synchronization 1999

[Figure 2.6: Number of reads and writes per element for a range of k values. x-axis: k (1 to 100); y-axis: reads and writes per element (0.5 to 3).]

An experiment with 10 million 64-bit elements and a k of 7 gave a slowdown of about a factor of 4 for the worst case compared to random data.

2.3.3 First pass completion

An important component of the algorithm is the fact that most slices are completed in the first pass, allowing the remaining passes to “clean up” the unfinished elements quickly. This is analogous to the primary and cleanup phases of the internal parallel sorting algorithm discussed in the previous chapter. Figure 2.7 shows the percentage of slices that are completed after the first pass when sorting 50 million 64-bit random elements on the 128-processor AP1000 for varying values of k. Note the narrow y range on the graph. The results show that even for quite high values of k nearly 90% of slices are completed after the first pass.

2.4 Other Algorithms

The field of external parallel sorting is one for which direct comparison between algorithms can be difficult. The wide variation in the configuration of disk and memory hierarchies, combined with varying requirements for data types and key sizes, leaves little common ground between many published results. The best that can be done is to make broad comparisons.

[Figure 2.7: Percentage of slices completed after the first pass for varying k. x-axis: k (10 to 70); y-axis: completion (80 to 100).]

Akl provides external parallel sorting algorithms for disks and tapes by adapting an internal parallel sorting algorithm [Akl 1985]. The result is an algorithm that performs at least 2 log P IO operations per element, which is excessive except for very small machines.

Aggarwal and Plaxton introduce the hypercube-based sharesort external parallel sorting algorithm [Aggarwal and Plaxton 1993] and demonstrate that it is asymptotically optimal in terms of IO operations on general multi-level storage hierarchies. Their algorithm is aimed at larger values of k than are considered in this chapter and concentrates on providing a framework for expressing very generalised models of external parallel sorting.

Nodine and Vitter introduce an external parallel sorting algorithm called balancesort [Nodine and Vitter 1992] which, like sharesort, is asymptotically optimal in the number of IO operations but assumes a CRCW-PRAM model of parallel architecture, which limits its practical usefulness.

I am not aware of any other external parallel sorting algorithms which take advantage of the early completion that is the key to the algorithm presented in this chapter.
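To make the 2 log P figure quoted for Akl's algorithm concrete, the short sketch below evaluates it for a few machine sizes. It is an illustration only: the logarithm is assumed to be base 2 (the text does not state the base), and the function name `akl_min_io_per_element` is hypothetical, not taken from any of the cited papers.

```python
import math


def akl_min_io_per_element(num_processors: int) -> float:
    """Return 2 * log2(P), the lower bound on IO operations per element
    quoted in the text. Base 2 is an assumption made for this sketch."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return 2 * math.log2(num_processors)


if __name__ == "__main__":
    # Tabulate the bound for a few illustrative machine sizes.
    for p in (2, 8, 128, 1024):
        print(f"P = {p:5d}: at least {akl_min_io_per_element(p):.0f} IO operations per element")
```

Under the base-2 assumption, a 128-processor machine such as the AP1000 configuration used in this chapter would need at least 14 IO operations per element, while a 2-processor machine would need only 2, which is consistent with the remark that the cost is acceptable only for very small machines.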