A binary heap is an interesting and useful data structure based on the concept of binary trees. In this article, we'll introduce this data structure, discuss the core concepts, create an implementation strategy, then show a full code implementation in multiple languages. If you're not familiar with binary trees, please read our introduction to binary trees to ensure you have the required context to move forward.
Heaps
A heap is a tree with a restriction on how nodes can be arranged. There are two types of heaps: min heaps and max heaps. A max heap is a heap which has the property that for any given node, all decendants of that node have a value less than the node. A min heap has the reverse: all decendants have a value greater than. In both types, the additional restriction gives us certain guarantees about the state of the tree at any given time: the root of the tree will always contain the min/max value. Since they are closely related, we will only discuss max heaps for the remainder of this article.
A binary heap is a binary tree which has the properties discussed above, and one which is complete. This is used most commonly when working with priority queues, a type of queue where items are dequeued based on highest priority instead of order of insertion.
Complete Binary Trees
A complete binary tree is a tree where all rows above the bottom row are entirely filled, and the bottom row is itself entirely filled or partially filled from the left. In other words, there can be no "gaps" in the bottom row except for one to the right of the last node. The diagram below shows a few example of complete and incomplete binary trees. The orange nodes represent missing nodes which cause the tree to be incomplete.
Insertion
When inserting a value, we must ensure that the tree remains complete and the min/max heap value is preserved. The steps to insert a value while preserving these properties are given below:
- Create a new node with the value to insert: since the tree must remain complete at all times, there is only one possible place to insert the node.
- If the node's value is greater than its parent's value:
- Swap the node value with its parent value.
- Repeat step 2 above with the parent node we've just updated.
With this algorithm, large values will "bubble up to the top" as they are added in. The heap structure is not guaranteed to be fully sorted, but larger values will tend to rise to the top, and it is guaranteed that the root will have the largest value at all times.
Removal
In a heap, the only value which will be removed at any given time is the value at the root node. The steps to remove the root node while ensuring the tree remains complete and maintains the heap property are given below:
- Set the root node to the value of the last node. This will delete the original value by overwriting it.
- Delete the last node. Remove the node reference to the last node, now that its value is in the root.
- If the node's value is smaller than its largest child's value:
- Swap the node's value with its largest child's value.
- Repeat step 3 above with the largest child.
With this algorithm, once the root node is swapped with the value of the last item, that last item will "bubble down" until it finds its correct position. This guarantees that the new root value will contain the largest of the remaining values.
Implementation Strategy
Unlike other tree structures, we can conveniently implement a min/max binary heap using an array, array-list, or similar structure. Since the tree is always complete, we will not need to accomodate holes/gaps; thus, a continuous sequence of values will be enough to implicitly represent the tree with little complexity. Each index within the array can be considered a node, and the indices of a node's parent or children can be calculated as follows:
- Left Child Index:
(nodeIndex + 1) * 2 - 1
- Right Child Index:
(nodeIndex + 1) * 2
- Parent Index:
floor((nodeIndex - 1) / 2)
Care must be taken when calculating these indices, as a potential "off by one" error can easily appear. An interesting note: if heaps provided the ability to iterate over all nodes: looping through each item in the array, in order, would be equivalent to performing a breadth-first search on the heap tree.
Time Complexity
Heaps provide fast performance for insertions and removals and always guarantee the value removed is the min/max value in the structure.
Operation | Worst Case Complexity | Notes |
---|---|---|
Find Max Value | O(1) | Max heap structure guarantees the max value will always be the root node. |
Insert | O(log n) | The upper bound on the number of operations is related to the tree's height. |
Remove | O(log n) | The upper bound on the number of operations is related to the tree's height. |
Full Implementation
The implementation below brings everything discussed above together into a "CustomMaxHeap" class: