1.1.4 Putting it all together
We are now in a position to describe the algorithm as a whole. The steps in the algorithm are:

• distribute the data over the P processors
• sort the data within each processor using the best available serial sorting algorithm for the data
• perform log P merge steps along the edges of a hypercube
• find which elements are unfinished (this can be done in log(N/P) time)
• sort these unfinished elements using a convenient algorithm

A sketch of these steps is given below.
Note that this algorithm arose naturally out of a simple consideration of a lower bound on the sort time. By developing the algorithm in this fashion we have guaranteed that the algorithm is optimal in the average case.
1.2 Algorithm Details
The remainder of this chapter covers the implementation details of the algorithm, showing how it can be implemented with minimal memory overhead. Each stage of the algorithm is analyzed more carefully, resulting in a more accurate estimate of the expected running time of the algorithm.
The algorithm was first presented in [Tridgell and Brent 1993]. It was developed by Andrew Tridgell and Richard Brent, and was implemented by Andrew Tridgell.
1.2.1 Nomenclature
P is the number of nodes (also called cells or processors) available on the parallel machine, and N is the total number of elements to be sorted. N_p is the number of elements in a particular node p (0 ≤ p < P). To avoid double subscripts, N_{p_j} may be written as N_j where no confusion should arise.

Elements within each node of the machine are referred to as E_{p,i}, for 0 ≤ i < N_p and 0 ≤ p < P. E_{j,i} may be used instead of E_{p_j,i} if no confusion will arise.
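As a concrete illustration of this nomenclature (a sketch; the type and field names are ours, not the thesis's), the elements held by a single node might be laid out as follows:

    typedef int element_t;   /* any fixed-size element type supporting "<" */

    struct node_elements {
        element_t *E;        /* E[i] holds E_{p,i}, for 0 <= i < n */
        int n;               /* N_p: the number of elements on this node */
    };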
When giving “big O” time bounds the reader should assume that P is fixed, so that O(N) and O(N/P) are the same.

The only operation assumed for elements is binary comparison, written with the usual comparison symbols. For example, A < B means that element A precedes element B. The elements are considered sorted when they are in non-decreasing order in each node, and in non-decreasing order between nodes. More precisely, this means that E_{p,i} ≤ E_{p,j} for all relevant i < j and all p, and that E_{p,i} ≤ E_{q,j} for 0 ≤ p < q < P and all relevant i, j.
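This condition can be checked directly. In the sketch below (with names of our own choosing, and integer elements standing in for the generic comparison), it suffices, once order within each node has been verified, to compare the last element of each node with the first element of the next non-empty node; transitivity covers all remaining pairs.

    #include <stdbool.h>

    /* E[p] points to the n[p] elements held by node p. */
    static bool is_globally_sorted(int **E, const int *n, int P)
    {
        for (int p = 0; p < P; p++) {
            /* E_{p,i} <= E_{p,j} for i < j: non-decreasing within a node */
            for (int i = 0; i + 1 < n[p]; i++)
                if (E[p][i] > E[p][i + 1])
                    return false;

            /* boundary check against the next non-empty node q > p */
            for (int q = p + 1; q < P; q++) {
                if (n[q] == 0)
                    continue;
                if (n[p] > 0 && E[p][n[p] - 1] > E[q][0])
                    return false;
                break;
            }
        }
        return true;
    }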
The speedup offered by a parallel algorithm for sorting N elements is defined as the ratio of the time to sort N elements with the fastest known serial algorithm on one node of the parallel machine to the time taken by the parallel algorithm on the parallel machine.
1.2.2 Aims of the Algorithm
The design of the algorithm had several aims:

• Speed.
• Good memory utilization. The number of elements that can be sorted should closely approach the physical limits of the machine.
• Flexibility, so that no restrictions are placed on N and P. In particular, N should not need to be a multiple of P or a power of two. These are common restrictions in parallel sorting algorithms [Ajtai et al. 1983; Akl 1985].

In order for the algorithm to be truly general purpose, the only operator that will be assumed is binary comparison. This rules out methods such as radix sort [Blelloch et al. 1991; Thearling and Smith 1992].
It is also assumed that elements are of a fixed size, because of the difficulties of pointer representations between nodes in a MIMD machine.
To obtain good memory utilization when sorting small elements, linked lists are avoided. Thus, the lists of elements referred to below are implemented using arrays, without any storage overhead for pointers.
The algorithm starts with a number of elements N assumed to be distributed over P
processing nodes. No particular distribution of elements is assumed and the only restrictions on the size of N and P are the physical constraints of the machine.
The algorithm presented here is similar in some respects to parallel shellsort [Fox et al. 1988], but contains a number of new features. For example, the memory overhead of the algorithm is considerably reduced.
1.2.3 Infinity Padding