Merge sort
Next, we will
examine two recursive algorithms that achieve better efficiency than the
methods that we’ve looked at so far. These algorithms use a technique called divide-and-conquer.
They split the data into two (or more) subsets, solve the problem on the
smaller sets, and then combine the sets again. Merge sort is the more
straightforward of the two. This method just breaks the array into two subsets
by cutting it in the middle, sorts both subsets (recursively), and then
combines them again. Let’s return to our simple example:
Array:              3 2 5 4 1
Split:              3 2 5 | 4 1
Sort (recursively): 2 3 5 | 1 4
Merge:              1 2 3 4 5
Since the recursive
calls keep splitting the set until the base case (0 or 1 items in the list), it
is actually the merging step where most of the work is done.
The basic algorithm
is very simple:
template <typename Comparable>
void mergeSort(vector<Comparable> &a, int first, int last) {
    if (first < last) {
        int mid = (first + last) / 2;
        mergeSort(a, first, mid);
        mergeSort(a, mid + 1, last);
        merge(a, first, mid, last);
    }
}
However, we now need
a function to merge the sets. Here is one implementation:
template <typename Comparable>
void merge(vector<Comparable> &a, int first, int mid, int last) {
    vector<Comparable> tmp(last - first + 1);
    int first1 = first;
    int last1 = mid;
    int first2 = mid + 1;
    int last2 = last;
    // As long as both lists still have elements, add the next.
    int index = 0;
    while ((first1 <= last1) && (first2 <= last2)) {
        if (a[first1] < a[first2]) {
            tmp[index] = a[first1];
            first1++;
        }
        else {
            tmp[index] = a[first2];
            first2++;
        }
        index++;
    }
    // One of the lists still has elements, so add them now.
    while (first1 <= last1) {
        tmp[index] = a[first1];
        index++;
        first1++;
    }
    while (first2 <= last2) {
        tmp[index] = a[first2];
        index++;
        first2++;
    }
    // Copy back into the array.
    for (index = 0; index < last - first + 1; index++)
        a[index + first] = tmp[index];
}
Now, let’s examine
how long this algorithm requires to run. Since the algorithm is recursive, we
can measure the efficiency using a recurrence relation. Note that the merge
must copy n elements into tmp and then move
them back, so this step requires O(n) time.
T(n) = 2T(n / 2) + n
T(1) = 0
Using iteration, we get:
T(n) = 4T(n / 4) + 2(n / 2) + n
T(n) = 8T(n / 8) + 4(n / 4) + 2(n / 2) + n
Each level contributes n total work, and the levels continue until n / 2^k reaches 1, which occurs when k = log n. Since each level requires O(n) work, the overall complexity is O(n log n). This is true in the best, worst, and average cases.
This efficiency is considerably better than the other algorithms that we have seen so far. For an array with 1024 elements, the previous algorithms require as many as 523776 comparisons, while merge sort performs no more than 10240. The primary drawback to merge sort is that it requires a temporary array to store values. The extra storage needed for the final merge step is as big as the original array, which can be a significant drawback if memory is limited.
Quick sort
The last of the conventional
sorting algorithms that we will examine is the quick sort, which is the most
commonly used sorting algorithm for large arrays because it is (usually)
both time efficient (like merge sort) and space efficient (unlike merge sort).
It is a little more complicated than the algorithms we’ve seen so far. In quick
sort, we choose one particular element that is called the “pivot.” The array is
then divided into all elements that are less than the pivot (these are placed
at the beginning of the array) and all elements that are greater than (or equal
to) the pivot (these are placed at the end of the array). The algorithm is then
recursively called on both of the smaller sets. When the algorithm finishes,
the data is completely sorted.
Example:          3 2 5 4 1
Choose pivot:     3
Partition:        2 1 3 5 4
Sort recursively: 1 2 3 4 5
Once again, the
overall algorithm looks simple, if we hide the helper function:
template <typename Comparable>
void quickSort(vector<Comparable> &a, int first, int last) {
    if (first < last) {
        int pivotIndex;
        // Partition the array into two sets.
        partition(a, first, last, pivotIndex);
        // Sort both sets recursively.
        quickSort(a, first, pivotIndex - 1);
        quickSort(a, pivotIndex + 1, last);
    }
}
How should we
partition the data according to the method described above? A good method is to
walk through the array starting from one end (let’s say the start). Any
item that is below the pivot is skipped, since it is on the correct side of
the array. Any item that is on the wrong side is swapped with the last
“unknown” item at the end of the array. The item at this position is no
longer unknown, so we keep an index to the last position that is not a swapped
item. Typically, the pivot is moved to the beginning of the array for the duration
of the partitioning.
Example: 3 2 5 4 1 (3 is the pivot)
The first number after the pivot is 2. Since it belongs on the "left" side of the array, we skip it. The next number is 5; since it is greater than 3, it belongs on the "right" side of the array, so we swap it with 1:
3 2 1 4 5
Now, the first
unknown is the 1 and the last unknown is the 4. The 1 is on the correct side,
so we skip it. The 4 belongs at the end. It is also the last unknown, so our algorithm
(which isn’t very smart) swaps it with itself and then we are done.
After partitioning, we should put the pivot between the sets that we just created and
it should not be included in the recursive calls. This guarantees that the
problem gets smaller at each step.
Here is one version
of the partitioning algorithm:
template <typename Comparable>
void partition(vector<Comparable> &a, int first, int last, int &pivotIndex) {
    Comparable pivot = a[first];   // select first element as pivot (bad choice?)
    int firstUnknown = first + 1;  // first is the pivot
    int lastUnknown = last;
    // As long as we haven't reached the last unknown,
    // swap the element if it belongs in the second part of the array.
    while (firstUnknown <= lastUnknown) {
        if (a[firstUnknown] >= pivot) {
            swap(a[firstUnknown], a[lastUnknown]);
            lastUnknown--;
        }
        else
            firstUnknown++;
    }
    // Move the pivot to the spot between the partitioned sets.
    swap(a[first], a[lastUnknown]);
    pivotIndex = lastUnknown;
}
This completes the algorithm, except for one issue: How should the pivot be chosen? In the above method, we always chose the first element in the array. However, if the list is already sorted, this will not give us good results. Each time we partition the array, the set below the pivot will be empty, and we will perform a recursive call with all but one of the elements (the pivot) in the other set. There are a few ways to avoid this problem, and more than one can be used together: pick the pivot at random, use the middle element, or take the median of the first, middle, and last elements ("median-of-three"). (Notice that if we use a pivot value that is not an element in the array, the above partition() algorithm must be modified slightly. How?)
Let’s analyze the
running time for quick sort. First, how efficient is the partitioning method?
This can perform up to n – 1 comparisons. Everything else is O(1), so
the overall time is O(n). We can now evaluate quick sort using a
recurrence relation. In the worst case, the pivot is the smallest or largest
element and we still have to sort n – 1 elements, so we get:
T(n) = n + T(n - 1)
T(1) = O(1)
Using iteration, this is:
T(n) = n + (n - 1) + (n - 2) + ... + O(1)
This is O(n^2), which doesn't look very good. So, why do people use quick sort? It is very
uncommon for the worst case to occur for every recursive call. On average (if
we assume that the chance of choosing any item as the pivot is equal), the
efficiency is O(n log n). In fact, the number of comparisons is usually less than
merge sort and it is usually faster, in practice. Let’s look at the best-case
complexity. The best case occurs when the pivot is the median value, thus the
two recursive calls are problems with approximately half the size of the
original problem. This recurrence looks the same as merge sort:
T(n) = 2T(n / 2) + O(n) = O(n
log n).
The average case is
also O(n log n). We won’t analyze it fully in this class, since it is more
complex. In fact, O(n log n) is the best that can be done if we
perform the sorting by comparing elements. The reasoning for this is
interesting, but not straightforward. Overall, there are n!
possible outcomes of the sorting routine (there are n!
possible permutations of the input that could be the correct output). Each
comparison between two elements allows you to narrow down the possible
outcomes. In the best case, the possible outcomes are divided into two sets of equal size and one can be discarded. (If the sets are not equal, then we might get the worse outcome, where only the smaller set can be discarded.) If we continue this process, we can only get down to one outcome after log(n!) steps. Since (n/2)^(n/2) < n! < n^n, log(n!) is Θ(n log n).
Radix sort
The final sorting
algorithm that we are going to discuss is radix sort, which is very different
from the other sorting algorithms that we have looked at so far, since no
comparisons are ever done between array elements. The basic idea is to group
the elements into sets, where the sets can be placed in a known order. An
example is in sorting 3 digit numbers between 000 and 999. All numbers that
start with a 0 can be placed in a group, all numbers starting with 1 in
another, and so on. When we are done with the first digit, we know what order
to place the sets in, but we must still sort within each set. This is performed
using the same process on the second digit of the number (recursively or
iteratively). In practice, this is usually done in the reverse order of the
digits, so that the most important digit is examined last. However, for each
step, this means we must keep the elements with the same digit at the current
position in the same order that they were in from the previous step to make
sure that the numbers stay sorted correctly. Here is an example:
532 294 534 256 123
Sorting by the least
significant digit yields:
532 123 294 534 256
Sorting by the
second digit yields:
123 532 534 256 294
Finally, sorting by
the first digit yields:
123 256 294 532 534
What does the
algorithm look like? I’m not going to give complete code, but the basic ideas
are as follows:
radixSort(vector<integer> &a, int d) {  // d is the number of digits
    // Sorts d-digit integers
    for (j = d; j > 0; j--) {
        Initialize 10 groups to empty
        for (i = 0; i < a.size(); i++) {
            k = jth digit of a[i]
            Place a[i] at the end of group k
        }
        Replace the items in a with all the items in group 0, group 1, etc.
    }
}
This algorithm has
significant disadvantages. First, it can only be used with elements
that have a restricted range of values (d digit numbers, strings
with no more than d characters, etc.) In addition, it requires several
groups to be maintained, any of which may be required to store n objects,
so space is a significant issue. This can be implemented efficiently, but it requires the use of a linked list (or other data structure). So, why would you
use this algorithm? The reason is that the efficiency is very good. The
first loop executes d times, but this is a constant that is independent of
n. The inner loop executes n times every time the outer loop executes.
Everything else in the loops is O(1). So, the total time is O(dn), which is O(n), if d is treated as a constant.
Summary
Sorting algorithms
can be summarized as follows:
Algorithm        Worst case     Average case   Comments
Selection sort   O(n^2)         O(n^2)         Use only for small arrays
Bubble sort      O(n^2)         O(n^2)         Use only for small arrays or nearly sorted arrays
Insertion sort   O(n^2)         O(n^2)         Use only for small arrays or incremental processing
Shell sort       O(n log^2 n)   O(n log^2 n)   Interesting algorithm, but worse than recursive sorts
Merge sort       O(n log n)     O(n log n)     Good algorithm, but requires extra space
Quick sort       O(n^2)         O(n log n)     Good general-purpose algorithm
Radix sort       O(dn)          O(dn)          Not suitable for many applications
Quick sort is
probably the best general-purpose algorithm, but the others can be used in
special circumstances.
Carrano 5th Ed, Chapter 9: #16, #18, #24, #25
Carrano 6th Ed, Chapter 11: #10, #12, #16, #17