With its time complexity of O(n log(n)) heapsort is optimal. It utilizes a special data structure called heap. This data structure is explained in the following.
Definition: Let T = (V, E) an almost complete binary tree with a vertex labelling a : V → M that assigns to each vertex u a label a(u) from an ordered set (M, ≤ ).
A vertex u ∈ V has the heap property if it has no direct descendant with a greater label, i.e.
∀ v ∈ V : (u, v) ∈ E ⇒ a(u) ≥ a(v)
T is a heap if all vertices have the heap property, i.e.
∀ (u, v) ∈ E : a(u) ≥ a(v)
We call T a semi-heap if all vertices except possibly the root r have the heap property, i.e.
∀ (u, v) ∈ E, u ≠ r : a(u) ≥ a(v)
Example:
Figure 1: Heap with n = 10 vertices
Observe that each leaf automatically has the heap property regardless of its label, since it has no descendants.
The data structure of the heapsort algorithm is a heap. The data sequence to be sorted is stored as the labels of the binary tree. As shown later, in the implementation no pointer structures are necessary to represent the tree, since an almost complete binary tree can be efficently stored in an array.
The following description of heapsort refers to Figure 2 (a) - (e).
Figure 2: Retrieving the maximum element and restoring the heap
If the sequence to be sorted is arranged as a heap, the greatest element of the sequence can be retrieved immediately from the root (a). In order to get the next-greatest element, the rest of the elements have to be rearranged as a heap.
The rearrangement is done in the following way:
Let b be a leaf of maximum depth. Write the label of b to the root and delete leaf b (b). Now the tree is a semi-heap, since the root possibly has lost its heap property.
Making a heap from a semi-heap is simple:
Do nothing if the root already has the heap property, otherwise exchange its label with the maximum label of its direct descendants (c). Let this descendant be v. Now possibly v has lost its heap property. Proceed further with v, i.e. make a heap from the semi-heap rooted at v (d). This process stops when a vertex is reached that has the heap property (e). Eventually this is the case at a leaf.
Making a heap from a semi-heap can conceptionally be implemented by the following procedure downheap:
procedure downheap (v)
Input:
semi-heap with root v
Output:
heap (by rearranging the vertex labels)
Method:
exchange a(v) and a(w)
set v := w
Procedure downheap can be used to build a heap from an arbitrarily labelled tree. By proceeding bottom-up, downheap is called for all subtrees rooted at inner vertices. Since leaves are already heaps they may be omitted.
procedure buildheap
Input:
almost complete binary tree T of depth d(T) with vertex labelling a
Output:
heap (by rearranging the vertex labels)
Method:
The following simulation illustrates how to transform an almost complete binary tree to a heap by procedure buildheap:
for all inner nodes v repeat (bottom-up)
downheap(v)
A call of buildheap is the first step of procedure heapsort, which can now be written down as follows:
procedure heapsort
Input:
almost complete binary tree with root r and vertex labelling a
Output:
vertex labels in descending order
Method:
while r is not a leaf do
choose leaf b of maximum depth
write label a(b) to r
delete leaf b
downheap (r)
output a(r)
In a simulation of algorithm heapsort the labels of the nodes are rearranged in such a way that in the end the tree contains the sorted sequence.
An almost complete binary tree with n vertices has a depth of at most log(n). Therefore, procedure downheap requires at most log(n) steps. Procedure buildheap calls downheap for each vertex, therefore it requires at most n·log(n) steps. Heapsort calls buildheap once; then it calls downheap for each vertex, together it requires at most 2·n·log(n) steps.
Thus, the time complexity of heapsort is T(n) ∈ O(n·log(n)). The algorithm is optimal, since the lower bound of the sorting problem is attained.
An almost complete binary tree with n vertices and vertex labelling a can be stored most efficiently in an array a:
All vertices at positions 0, ..., n/2-1 are inner nodes, all vertices n/2, ..., n-1 are leaves (integer division / ). The heap with n = 10 vertices of Figure 1 is shown in Figure 3 as an example.
Figure 3: Array representation of the heap of Figure 1
Using this array representation of the heap, heapsort can be implemented as an in-place sorting algorithm. In each step of heapsort, the root label a(r) is not output but stored at the position of the leaf b that is deleted in the following. Deleting leaf b means to consider just the array elements left of b as the remaining heap.
In other words: the four steps
are replaced by an exchange of the root label and the label of b:
The following Java class HeapSorter encapsulates the functions downheap, buildheap and heapsort. In order to sort an array b, Heapsort is called with the statement HeapSorter.sort(b).
With its time complexity of O(n log(n)) heapsort is optimal. Unlike mergesort, heapsort requires no extra space. On the other hand, heapsort is not stable. A sorting algorithm is stable, if it leaves the order of equal elements unchanged.
The heap data structure can also be used for an efficient implementation of a priority queue. A priority queue is an abstract list data structure with the operations insert and extractMax. With each element in the list a certain priority is associated. By insert an element together with its priority is inserted into the list. By extractMax the element with the highest priority is extracted from the list. Using a heap, both operations can be implemented in time O(log(n)).
Heapsort was originally published by J.W.J. Williams as Algorithm 232 in the journal Communications of the ACM [Wil 64].
[AHU 74] A.V. Aho, J.E. Hopcroft, J.D. Ullman: The Design and Analysis of Computer Algorithms. Addison-Wesley (1974)
[CLRS 01] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein: Introduction to Algorithms. 2nd edition, The MIT Press (2001)
[Sed 88] R. Sedgewick: Algorithms. 2nd edition, Addison-Wesley (1988)
[Wil 64] J.W.J. Williams: Algorithm 232: Heapsort. Communications of the ACM, 7, 6, 347-348 (1964)
Next: [Mergesort] or [up]