A binary search tree is a very useful type of data structure which preserves ordering throughout the tree and allows for fast insertions and retrievals. In this article, we'll outline the foundational concepts, walk through some code examples, and in the last sections, discuss some benefits and drawbacks of using binary search trees. Lastly, we will provide a full code implementation in multiple languages.
Binary Trees
A binary tree is a type of tree structure where each node can have at most two children. The following sections in this article will assume you have an understanding of the concepts of basic binary trees. For a complete overview of the basic concepts of binary trees with diagrams and code examples, please refer to our introduction to binary trees.
Binary Search Trees
A binary search tree is a special type of binary tree which has the property that the value of a node is greater than all values in its left child node's subtree and less than those in its right child node's subtree. We will state that a binary tree with this property is sorted. When implementing any type of binary search tree, the implementation must ensure that any insertion or deletion from the tree does not leave it in an un-sorted state. We can accomplish this by starting with the root node, and at each node, decide whether to traverse down the left path or right path based on how the value of the node compares to the value we're inserting or deleting. We'll cover this process in more detail in the sections below.
Binary Search Tree Node
We can implement a binary tree node with a few getter and setter methods / properties as shown below. In our example, we'll also place a few utility functions to facilitate some tree operations shown in the following sections. In concept, an entire binary tree can be represented in a recursive manner via reference to a single node, that reference being the root node of the tree. However, in this article, we will make a seperate class to encapsulate a binary search tree and it's operations.
Inserting Elements
To insert an element, we must ensure that we find the correct position for the new element based on how the tree is ordered so that we preserve the tree's sorting. For example: if we tried to place a value larger than anything in the tree as a child of the leftmost leaf, and the tree has more than one element, the tree will no longer be sorted. To find the correct position to insert the item, our algorithm will recursively search through the nodes of the tree until it finds that position. In our implementation, we will use a while loop to iterate through the nodes with a search node reference, and at each traversal, iterate the search node reference to a child node.
The flowchart below outlines the process, starting with the purple oval on the top left:
Notice the handle special case state, which is reached when we are trying to insert a value which already exists in the tree. Depending upon the type of binary search tree, you may want to ignore duplicates or allow for them. The action taken at the special state depends upon whether or not your search tree will allow for duplicates. The implementation below will ignore duplicates; we'll cover that in a later section.
Searching for Elements
The process for checking if an element exists in the tree is similar to the process for inserting an element. If at any point in the node traversal, a node is found which is equal to the search value, we know the tree has the value. If we get to a point where the child node we would need to traverse to is null, then the value does not exist in the tree.
The flowchart below outlines the process, starting with the purple oval on the top left:
The implementation below uses two methods: hasValue and findNodeByValue. The findNodeByValue method is a helper method which will also be used by other methods we'll discuss in later sections.
Deleting Elements
The process for deleting an element is more complex than those for inserting and finding one. In our implementation, we will have our remove method return true if something was deleted and false otherwise. To delete, we will need to find the node with the value we want to remove, then take one of three actions depending on how many children the node has:
- The node has 0 children: remove the node and take no further action.
- The node has 1 child: replace the node with its own child.
- The node has 2 children: find the node's predecessor, replace the node to delete's value with that of the predecessor, and delete the predecessor node instead.
A node's predecessor is a descendant node which has the largest value that is still smaller than the original node's own value. The method for finding this value is very similar to that of finding any given value as described in previous sections. The flowchart below outlines the full process for deleting a node:
The implementation below covers the functionality for deletion and for finding a node's predecessor:
Iterating Over All Elements
One thing our implementation cannot yet do is allow the consumer to iterate over all nodes in order. To provide this functionality, we will need to implement a method in our class which will perform a depth first traversal over the tree. This means we will access items based on how far left they are. More formally, we will recursively iterate over all nodes, starting with the left child of the root node, and recursively iterate left until no more nodes are available. Then, we will invoke some callback function, backtrack, and repeat the process.
In our implementation, we will use loops instead of recursive function calls to accomplish this. The diagram below outlines the process:
The code below will allow for iteration via a forEach
method which will accept a callback and invoke it for each element.
Storing Duplicate Values
The code we've shown in the sections above assumes that our tree will not take duplicate values. If we want the tree to be able to accept duplicate values, we can modify our node implementation have, in addition to its nodeValue propert, a count property to keep track of how many instances of each value exists in the tree. With this approach, we would need to make a few minor modifications to the code:
- Insertion: if a node with the value to insert exists, increment its count property. Otherwise, create a new node with a count of 1.
- Deletion: if the node has a count greater than one, decrement the count. Otherwise, follow the normal process to remove the node.
- ForEach: invoke the callback count number of times for each node instead of just once.
Tree Balance and Time Complexity
Binary search trees, in theory, can provide lookups of log(n)
time, but in practice, the worst case can degenerate to O(n)
, which is no better than that of linked lists. This is because a binary search tree can actually become a linked list if additional action is not taken. The image shown here is both a binary search tree and a linked list.
Operation | Average Case | Worst Case |
---|---|---|
Insertion | O(log n) | O(n) |
Deletion | O(log n) | O(n) |
Retrieval | O(log n) | O(n) |
If we want to improve the worst case scenario to O(log n)
, we will need to balance the search tree after each insertion and removal.
AVL Trees
An AVL Tree is a type of binary search tree which uses a process called balancing to keep the height of the search tree as small as possible. This process reduces the worst case scenarios discussed above to O(log n)
. To learn more about this type of tree, please visit our dev page which introduces and outlines the concepts: