Efficiency of Quicksort

We’ve said that quicksort operates in O(N*logN) time. As we saw in the discussion of mergesort in Chapter 6, this is generally true of the divide-and-conquer algorithms, in which a recursive method divides a range of items into two groups and then calls itself to handle each group. In this situation the logarithm actually has a base of 2:

The running time is proportional to N*log₂N.

You can get an idea of the validity of this N*log₂N running time for quicksort by running one of the quickSort Workshop applets with 100 random bars and examining the resulting dotted horizontal lines.

Each dotted line represents an array or subarray being partitioned: the pointers leftScan and rightScan moving toward each other, comparing each data item and swapping when appropriate. We saw in the “Partitioning” section that a single partition runs in O(N) time. This tells us that the total length of all the dotted lines is proportional to the running time of quicksort. But how long are all the lines? Measuring them with a ruler on the screen would be tedious, but we can visualize them a different way.

There is always 1 line that runs the entire width of the graph, spanning N bars. This results from the first partition. There will also be 2 lines (one below and one above the first line) that have an average length of N/2 bars; together they are again N bars long. Then there will be 4 lines with an average length of N/4 that again total N bars, then 8 lines, 16 lines, and so on. Figure 7.15 shows how this looks for 1, 2, 4, and 8 lines.

In this figure solid horizontal lines represent the dotted horizontal lines in the quicksort applets, and captions like N/4 cells long indicate average, not actual, line lengths. The circled numbers on the left show the order in which the lines are created.

Each series of lines (the eight N/8 lines, for example) corresponds to a level of recursion. The initial call to recQuickSort() is the first level and makes the first line; the two calls from within the first call—the second level of recursion—make the next two lines; and so on. If we assume we start with 100 cells, the results are shown in Table 7.4.


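[Figure: one line N cells long; two lines, each N/2 cells long; four lines, each N/4 cells long; eight lines, each N/8 cells long. Circled numbers on the left give the order in which the lines are created.]

FIGURE 7.15 Lines correspond to partitions.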

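TABLE 7.4 Line Lengths and Recursion

Recursion   Step Numbers                 Average Line     Number     Total Length
Level       in Figure 7.15               Length (Cells)   of Lines   (Cells)
1           1                            100              1          100
2           2, 9                         50               2          100
3           3, 6, 10, 13                 25               4          100
4           4, 5, 7, 8, 11, 12, 14, 15   12               8          96
5           Not shown                    6                16         96
6           Not shown                    3                32         96
7           Not shown                    1                64         64
                                                                     Total = 652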


Where does this division process stop? If we keep dividing 100 by 2, and count how many times we do this, we get the series 100, 50, 25, 12, 6, 3, 1, which is about seven levels of recursion. This looks about right on the workshop applets: If you pick some point on the graph and count all the dotted lines directly above and below it, there will be an average of approximately seven. (In Figure 7.15, because not all levels of recursion are shown, only four lines intersect any vertical slice of the graph.)
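If you'd rather count the lines in code than on the screen, the following is a minimal sketch, not the Workshop applet's code. It assumes a simple quicksort that uses the rightmost item as the pivot; the class name PartitionTrace, its fields, and the random seed are all hypothetical choices made for illustration. For each level of recursion it adds up the number of cells covered by the partitions at that level, which corresponds to the total length of the dotted lines at that level.

```java
import java.util.Random;

// Hypothetical helper (not from the text): records, per recursion level,
// how many cells are covered by the partitions performed at that level.
class PartitionTrace {
    long[] cellsPerLevel;   // index = recursion level, starting at 1
    int deepestLevel = 0;   // deepest level at which a partition took place

    void sort(int[] a) {
        cellsPerLevel = new long[a.length + 2]; // enough even for the worst case
        recQuickSort(a, 0, a.length - 1, 1);
    }

    private void recQuickSort(int[] a, int left, int right, int level) {
        if (right - left <= 0) return;            // 0 or 1 cell: nothing to partition
        cellsPerLevel[level] += right - left + 1; // one "dotted line" of this length
        deepestLevel = Math.max(deepestLevel, level);
        int p = partition(a, left, right);
        recQuickSort(a, left, p - 1, level + 1);  // left group
        recQuickSort(a, p + 1, right, level + 1); // right group
    }

    // Assumption: rightmost item is the pivot; returns the pivot's final index.
    private int partition(int[] a, int left, int right) {
        int pivot = a[right];
        int leftScan = left - 1;
        int rightScan = right;
        while (true) {
            while (a[++leftScan] < pivot) { }     // stops at the pivot at the latest
            while (rightScan > left && a[--rightScan] > pivot) { }
            if (leftScan >= rightScan) break;     // pointers have met or crossed
            swap(a, leftScan, rightScan);
        }
        swap(a, leftScan, right);                 // put the pivot in its final place
        return leftScan;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] bars = new Random(7).ints(100, 0, 1000).toArray(); // arbitrary seed
        PartitionTrace trace = new PartitionTrace();
        trace.sort(bars);
        for (int level = 1; level <= trace.deepestLevel; level++) {
            System.out.println("level " + level + ": "
                    + trace.cellsPerLevel[level] + " cells");
        }
    }
}
```

The first level always covers all N cells. Each deeper level covers somewhat fewer, because pivots already in place and one-cell subarrays drop out. With random data the partitions are rarely exact halves, so the deepest level reported is likely to be larger than seven.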

Table 7.4 shows a total of 652 cells. This is only an approximation because of round-off errors, but it's close to 100 times the logarithm to the base 2 of 100, which is 6.65, giving about 665. Thus, this informal analysis suggests the validity of the N*log₂N running time for quicksort.
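The arithmetic behind this comparison can be sketched in a few lines of Java. This is only an illustration of the informal analysis above: the class and method names are hypothetical, and the halving uses integer division, which is where the round-off comes from.

```java
// Hypothetical helper (not from the text) that redoes the informal analysis.
class LineLengths {
    // Sums (number of lines) * (average line length) over all levels,
    // halving the length with integer division until it reaches 1.
    static int totalLength(int n) {
        int total = 0;
        int lines = 1;
        for (int length = n; length >= 1; length /= 2) {
            total += lines * length;
            lines *= 2;
        }
        return total;
    }

    // N times the base-2 logarithm of N, without intermediate rounding.
    static double nLog2N(int n) {
        return n * Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.println(totalLength(100));        // prints 652
        System.out.println(Math.round(nLog2N(100))); // prints 664
    }
}
```

The unrounded value of 100*log₂100 is about 664.4. The figure of 665 comes from rounding the logarithm to 6.65 first. Either way, the total of 652 falls within about 2 percent of it.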

More specifically, in the section on partitioning, we found that a single partition of N items should take N+2 comparisons and fewer than N/2 swaps. Multiplying these quantities by log₂N for various values of N gives the results shown in Table 7.5.

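TABLE 7.5 Swaps and Comparisons in Quicksort

N                               8     12     16    64     100    128
log₂N                           3     3.59   4     6      6.65   7
N*log₂N                         24    43     64    384    665    896
Comparisons: (N+2)*log₂N        30    50     72    396    678    910
Swaps: fewer than N/2*log₂N     12    21     32    192    332    448

(Products use the rounded log₂N values shown and are truncated to whole numbers.)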

The log₂N factor used in Table 7.5 applies only in the best-case scenario, where each subarray is partitioned exactly in half. For random data the figure is slightly greater. Nevertheless, the QuickSort1 and QuickSort2 Workshop applets approximate these results for 12 and 100 bars, as you can see by running them and observing the Swaps and Comparisons fields.

Because they have different cutoff points and handle the resulting small partitions differently, QuickSort1 performs fewer swaps but more comparisons than QuickSort2. The number of swaps shown in Table 7.5 is the maximum (which assumes the data is inversely sorted). For random data the actual number of swaps turns out to be one-half to two-thirds of the figures shown.
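You can also gather such counts without the applets. The sketch below is not the QuickSort1 or QuickSort2 code. It assumes a plain quicksort with the rightmost item as the pivot and no cutoff for small partitions, and it counts the final pivot swap of each partition as a swap. The class name CountingQuickSort, its counters, and the seed are hypothetical. The totals it reports will therefore differ somewhat from what the applets display, but they can be compared with the (N+2)*log₂N and N/2*log₂N estimates.

```java
import java.util.Random;

// Hypothetical instrumented quicksort (not the Workshop applet code).
class CountingQuickSort {
    long comparisons = 0;   // data item compared with the pivot
    long swaps = 0;         // includes the final pivot swap of each partition

    void sort(int[] a) {
        recQuickSort(a, 0, a.length - 1);
    }

    private void recQuickSort(int[] a, int left, int right) {
        if (right - left <= 0) return;   // 0 or 1 item: already sorted
        int p = partition(a, left, right);
        recQuickSort(a, left, p - 1);
        recQuickSort(a, p + 1, right);
    }

    // Assumption: rightmost item is the pivot; returns the pivot's final index.
    private int partition(int[] a, int left, int right) {
        int pivot = a[right];
        int leftScan = left - 1;
        int rightScan = right;
        while (true) {
            do {                          // scan right for an item >= pivot
                leftScan++;
                comparisons++;
            } while (a[leftScan] < pivot);
            while (rightScan > left) {    // scan left for an item <= pivot
                rightScan--;
                comparisons++;
                if (a[rightScan] <= pivot) break;
            }
            if (leftScan >= rightScan) break;
            swap(a, leftScan, rightScan);
        }
        swap(a, leftScan, right);         // move the pivot into place
        return leftScan;
    }

    private void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
        swaps++;
    }

    public static void main(String[] args) {
        int n = 100;
        int[] bars = new Random(7).ints(n, 0, 1000).toArray(); // arbitrary seed
        CountingQuickSort sorter = new CountingQuickSort();
        sorter.sort(bars);
        double log2n = Math.log(n) / Math.log(2);
        System.out.println("comparisons: " + sorter.comparisons
                + "  estimate (N+2)*log2(N): " + Math.round((n + 2) * log2n));
        System.out.println("swaps: " + sorter.swaps
                + "  estimate N/2*log2(N): " + Math.round(n / 2.0 * log2n));
    }
}
```

Trying several seeds, or filling the array in inverse order, gives a feel for how far the counts drift from the best-case estimates.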
