Merge sort
Next, we will
examine two recursive algorithms that achieve better efficiency than the
methods that we’ve looked at so far. These algorithms use a technique called divide-and-conquer.
They split the data into two (or more) subsets, solve the problem on the
smaller sets, and then combine the sets again. Merge sort is the more
straightforward of the two. This method just breaks the array into two subsets
by cutting it in the middle, sorts both subsets (recursively), and then
combines them again. Let’s return to our simple example:
Array:              3 2 5 4 1
Split:              3 2 5 | 4 1
Sort (recursively): 2 3 5 | 1 4
Merge:              1 2 3 4 5
Since the recursive
calls keep splitting the set until the base case (0 or 1 items in the list), it
is actually the merging step where most of the work is done.
The basic algorithm
is very simple:
template <typename Comparable>
void mergeSort(vector<Comparable> &a, int first, int last) {
    if (first < last) {
        int mid = (first + last) / 2;
        mergeSort(a, first, mid);
        mergeSort(a, mid + 1, last);
        merge(a, first, mid, last);
    }
}
However, we now need
a function to merge the sets. Here is one implementation:
template <typename Comparable>
void merge(vector<Comparable> &a, int first, int mid, int last) {
    vector<Comparable> tmp(last - first + 1);
    int first1 = first;
    int last1 = mid;
    int first2 = mid + 1;
    int last2 = last;
    // As long as both lists still have elements, add the next.
    int index = 0;
    while ((first1 <= last1) && (first2 <= last2)) {
        if (a[first1] < a[first2]) {
            tmp[index] = a[first1];
            first1++;
        }
        else {
            tmp[index] = a[first2];
            first2++;
        }
        index++;
    }
    // One of the lists still has elements, so add them now.
    while (first1 <= last1) {
        tmp[index] = a[first1];
        index++;
        first1++;
    }
    while (first2 <= last2) {
        tmp[index] = a[first2];
        index++;
        first2++;
    }
    // Copy back into the array.
    for (index = 0; index < last - first + 1; index++)
        a[index + first] = tmp[index];
}
Now, let’s examine
how long this algorithm requires to run. Since the algorithm is recursive, we
can measure the efficiency using a recurrence relation. Note that the merge
must copy n elements into tmp and then move
them back, so this step requires O(n) time.
T(n) = 2T(n / 2) + n
T(1) = 0
Using iteration, we get:
T(n) = 4T(n / 4) + 2(n / 2) + n
T(n) = 8T(n / 8) + 4(n / 4) + 2(n / 2) + n
Each level contributes n total work, and the levels continue until n / 2^k reaches 1, which occurs when k = log n. Since each level requires O(n) work, the overall complexity is O(n log n). This is true in the best, worst, and average cases.
This efficiency is considerably better than the other algorithms that we have seen so far. For an array with 1024 elements, the previous algorithms require as many as 523776 comparisons, while merge sort performs no more than 10240. The primary drawback to merge sort is that it requires a temporary array to store values. The extra storage needed for the final merge step is as big as the original array, which can be a significant drawback if memory is limited.
Quick sort
The last of the conventional
sorting algorithms that we will examine is the quick sort, which is the most
commonly used sorting algorithm for large arrays because it is (usually)
both time efficient (like merge sort) and space efficient (unlike merge sort).
It is a little more complicated than the algorithms we’ve seen so far. In quick
sort, we choose one particular element that is called the “pivot.” The array is
then divided into all elements that are less than the pivot (these are placed
at the beginning of the array) and all elements that are greater than (or equal
to) the pivot (these are placed at the end of the array). The algorithm is then
recursively called on both of the smaller sets. When the algorithm finishes,
the data is completely sorted.
Example:          3 2 5 4 1
Choose pivot:     3
Partition:        2 1 3 5 4
Sort recursively: 1 2 3 4 5
Once again, the
overall algorithm looks simple, if we hide the helper function:
template <typename Comparable>
void quickSort(vector<Comparable> &a, int first, int last) {
    if (first < last) {
        int pivotIndex;
        // Partition the array into two sets.
        partition(a, first, last, pivotIndex);
        // Sort both sets recursively.
        quickSort(a, first, pivotIndex - 1);
        quickSort(a, pivotIndex + 1, last);
    }
}
How should we
partition the data according to the method described above? A good method is to
walk through the array starting from one end (let’s say the start). Any
item that is below the pivot is skipped, since it is on the correct side of
the array. Any item that is on the wrong side is swapped with the last
“unknown” item at the end of the array. The item at this position is no
longer unknown, so we keep an index to the last position that is not a swapped
item. Typically, the pivot is moved to the beginning of the array for the duration
of the partitioning.
Example: 3 2 5 4 1 (3 is the pivot)
The first number after the pivot is 2. Since it belongs on the "left" side of the array, we skip it. The next number is 5; since it is greater than 3, it belongs on the "right" side of the array, so we swap it with 1:
3 2 1 4 5
Now, the first
unknown is the 1 and the last unknown is the 4. The 1 is on the correct side,
so we skip it. The 4 belongs at the end. It is also the last unknown, so our algorithm
(which isn’t very smart) swaps it with itself and then we are done.
After partitioning, we should put the pivot between the sets that we just created and
it should not be included in the recursive calls. This guarantees that the
problem gets smaller at each step.
Here is one version
of the partitioning algorithm:
template <typename Comparable>
void partition(vector<Comparable> &a, int first, int last, int &pivotIndex) {
    Comparable pivot = a[first];   // select first element as pivot (bad choice?)
    int firstUnknown = first + 1;  // first is the pivot
    int lastUnknown = last;
    // As long as we haven't reached the last unknown,
    // swap the element if it belongs in the second part of the array.
    while (firstUnknown <= lastUnknown) {
        if (a[firstUnknown] >= pivot) {
            swap(a[firstUnknown], a[lastUnknown]);
            lastUnknown--;
        }
        else
            firstUnknown++;
    }
    // Move the pivot to the spot between the partitioned sets.
    swap(a[first], a[lastUnknown]);
    pivotIndex = lastUnknown;
}
This completes the algorithm, except for one issue: How should the pivot be chosen? In the above method, we always chose the first element in the array. However, if the list is already sorted, this will not give us good results. Each time we partition the array, the set below the pivot will be empty, and we will perform a recursive call with all but one of the elements (the pivot) in the other set. There are a few ways to avoid this problem, and more than one can be used together: pick the pivot at random, use the middle element, or take the median of the first, middle, and last elements ("median-of-three"). (Notice that if we use a pivot value that is not an element in the array, the above partition() algorithm must be modified slightly. How?)
Let’s analyze the
running time for quick sort. First, how efficient is the partitioning method?
This can perform up to n – 1 comparisons. Everything else is O(1), so
the overall time is O(n). We can now evaluate quick sort using a
recurrence relation. In the worst case, the pivot is the smallest or largest
element and we still have to sort n – 1 elements, so we get:
T(n) = n + T(n - 1)
T(1) = O(1)
Using iteration, this is:
T(n) = n + (n - 1) + (n - 2) + ... + O(1)
This is O(n^2), which doesn't look very good. So, why do people use quick sort? It is very
uncommon for the worst case to occur for every recursive call. On average (if
we assume that the chance of choosing any item as the pivot is equal), the
efficiency is O(n log n). In fact, the number of comparisons is usually less than
merge sort and it is usually faster, in practice. Let’s look at the best-case
complexity. The best case occurs when the pivot is the median value, thus the
two recursive calls are problems with approximately half the size of the
original problem. This recurrence looks the same as merge sort:
T(n) = 2T(n / 2) + O(n) = O(n
log n).
The average case is
also O(n log n). We won’t analyze it fully in this class, since it is more
complex. In fact, O(n log n) is the best that can be done if we
perform the sorting by comparing elements. The reasoning for this is
interesting, but not straightforward. Overall, there are n!
possible outcomes of the sorting routine (there are n!
possible permutations of the input that could be the correct output). Each
comparison between two elements allows you to narrow down the possible
outcomes. In the best case, the possible outcomes are divided into two sets of equal size and one can be discarded. (If the sets are not equal, then we might get the worse outcome, where only the smaller set can be discarded.) If we continue this process, we can only get down to one outcome after log(n!) steps. Since (n/2)^(n/2) < n! < n^n, log(n!) is Θ(n log n).
Radix sort
The final sorting
algorithm that we are going to discuss is radix sort, which is very different
from the other sorting algorithms that we have looked at so far, since no
comparisons are ever done between array elements. The basic idea is to group
the elements into sets, where the sets can be placed in a known order. An
example is in sorting 3 digit numbers between 000 and 999. All numbers that
start with a 0 can be placed in a group, all numbers starting with 1 in
another, and so on. When we are done with the first digit, we know what order
to place the sets in, but we must still sort within each set. This is performed
using the same process on the second digit of the number (recursively or
iteratively). In practice, this is usually done in the reverse order of the
digits, so that the most important digit is examined last. However, for each
step, this means we must keep the elements with the same digit at the current
position in the same order that they were in from the previous step to make
sure that the numbers stay sorted correctly. Here is an example:
532 294 534 256 123
Sorting by the least
significant digit yields:
532 123 294 534 256
Sorting by the
second digit yields:
123 532 534 256 294
Finally, sorting by
the first digit yields:
123 256 294 532 534
What does the
algorithm look like? I’m not going to give complete code, but the basic ideas
are as follows:
radixSort(vector<integer> &a, int d) {  // d is the number of digits
    // Sorts d-digit integers
    for (j = d; j > 0; j--) {
        Initialize 10 groups to empty
        for (i = 0; i < a.size(); i++) {
            k = jth digit of a[i]
            Place a[i] at the end of group k
        }
        Replace the items in a with all the items in group 0, group 1, etc.
    }
}
This algorithm has
significant disadvantages. First, it can only be used with elements
that have a restricted range of values (d digit numbers, strings
with no more than d characters, etc.) In addition, it requires several
groups to be maintained, any of which may be required to store n objects,
so space is a significant issue. This can be implemented efficiently, but it requires the use of a linked list (or other data structure). So, why would you
use this algorithm? The reason is that the efficiency is very good. The
first loop executes d times, but this is a constant that is independent of
n. The inner loop executes n times every time the outer loop executes.
Everything else in the loops is O(1). So, the total time is O(dn), which is O(n), if d is treated as a constant.
Summary
Sorting algorithms
can be summarized as follows:
Algorithm        Worst case     Average case   Comments
Selection sort   O(n^2)         O(n^2)         Use only for small arrays
Bubble sort      O(n^2)         O(n^2)         Use only for small arrays or nearly sorted arrays
Insertion sort   O(n^2)         O(n^2)         Use only for small arrays or incremental processing
Shell sort       O(n log^2 n)   O(n log^2 n)   Interesting algorithm, but worse than recursive sorts
Merge sort       O(n log n)     O(n log n)     Good algorithm, but requires extra space
Quick sort       O(n^2)         O(n log n)     Good general-purpose algorithm
Radix sort       O(dn)          O(dn)          Not suitable for many applications
Quick sort is
probably the best general-purpose algorithm, but the others can be used in
special circumstances.
Carrano 5th Ed, Chapter 9: #16, #18, #24, #25
Carrano 6th Ed, Chapter 11: #10, #12, #16, #17