A trie is an interesting type of data structure which represents words using a tree of characters. It supports fast insertions and deletions and is well suited for autocomplete functionality. In this article, we'll introduce the data structure, outline the core concepts, discuss how to implement insertions, deletions, etc, then show a full implementation in multiple languages. If you'd like to skip to the code, feel free to jump to the bottom of the article.
Tries
A trie, also known as a digital tree or prefix tree, is a type of search tree where each node stores individual characters in such a way that words are represented via parent-child node relationships. This structure is generally used for strings but can also be used for other types of data. Unlike many other commonly used tree structures, tries are not binary trees by design. We can add words by recursively inserting a node for each character in the word, check if a word exists via depth-first search, and delete a word by traversing the structure, then removing certain nodes and markers. A trie is a natural fit for autocomplete: given a prefix, find how many words in the trie begin with that prefix.
The diagram below shows a trie tree which has the words ant, ants, ate, bat, bats, born, bot, and bots. The asterisk "*" marks that a particular node represents the end of a word.
Implementation Strategy
To implement a trie, we will need to create a class which can support:
- Word Insertion: the class should support inserting new words while keeping existing words intact.
- Word Removal: it should support removing existing words while keeping other words intact.
- Word Lookup: it should be able to determine if a word exists in the trie.
- Prefix Lookup: it should be able to retrieve a list of all words within the structure which begin with some prefix.
All of these operations should be efficient; we should not need to iterate over every word in the trie to perform these operations. To satisfy these requirements, we will internally use a tree, where each node represents a character in one or more words. Each node will have one child for every word which has all nodes upto the current node as a prefix. In order to quickly access child nodes by character, we will use a map structure to provide O(1) access. If you are not familiar with maps, please visit our introduction to hash tables.
Insert a Word
To insert a word into a trie, we will need to traverse the path of the word we're trying to create, insert nodes along the way, then once all nodes have been inserted, mark that the last character corresponds to the end of a word. The full steps are:
- Create a variable visitNode and set it to the trie's root node.
- For each character in the word to add:
- Check if the visitNode has a child node with the current word character.
- If it does not, create a new node with that characfter and add it as a child of visitNode.
- Set the visitNode to that child.
- The visitNode reference now points to the node corresponding to the last character of the string. Mark it as the end of a word.
We will mark a node as an end of a word by setting a child reference's key to an asterisk "*" while setting its value to null, true, or some other non-node value. If we wanted to accept "*" as a character itself, we would need to find some other way to mark the end of a word, possibly by creating a seperate boolean field on the node object itself.
Check if a Word Exists
To check if a word exists in a trie, we will need to perform a depth-first traversal along the path of our string, then check at the end if the last node has an "end of word" marker. The marker will ensure we don't mark a prefix of a word as a valid word itself. The full steps are:
- Create a variable visitNode and set it to the trie's root node.
- For each character in the word to check:
- Check if the visitNode has a child node with the current word character.
- If it does not, return false and stop searching, as the word does not exist.
- Set the visitNode to that child.
- The visitNode reference now points to the node corresponding to the last character of the string.
- If it is marked as the end of a word, return true.
- Otherwise, return false.
This algorithm is similar to the algorithm for inserting a word.
Delete a Word
To delete a word, we will need to perform a depth-first search along the word's path, unmark the last node as being the end of a word, then move back and delete nodes if they are no longer required. We must be careful to leave other words intact; thus, we need to leave some nodes in place if they are still used by other words. The full steps are shown below:
- Create a variable isSearching and set to true.
- Create a variable visitNode and set to the trie's root node.
- Create a variable nodeStack as an empty stack object.
- For each character in the word to remove:
- Push visitNode to nodeStack.
- Check if visitNode has a child with the current character.
- If it does, set visitNode to that node.
- Otherwise, set isSearching to false.
- If isSearching is true and visitNode is the end of a word.
- Unmark visitNode as the end of a word.
- Create a variable prevNode.
- While nodeStack is not empty:
- Set prevNode to visitNode.
- Pop from nodeStack and set visitNode to that popped value.
- If prevNode is empty, remove visitNode's reference to it.
Note that unlike the other operation, we are using a stack. If we were using a recursive function implementation, the stack would be implicitly represented by the call stack, but since our approach uses looping, we need to explicitly use our own stack object. We need to move through the entire path before we can determine which nodes can and cannot be entirely removed.
Get a List of Words Which Start with a Prefix
To get a list of words which start with a prefix, we will need to preform a depth-first search down the prefix, then perform a traversal of the subtree from that last node, and at each node which is marked as the end of a word, add it to some words collection, then return that collection. In our algorithm, we will perform a breadth-first search at that point and use queues to process them. The full steps are given below:
- Create a variable isSearching and set to true.
- Create a variable visitNode and set to the trie's root node.
- For each character in the prefix:
- Check if the visitNode has a child node with the current word character.
- If it does not, set isSearching to false.
- Set the visitNode to that child.
- If isSearching is false, return an empty array immediately.
- Otherwise, create a variable words, an array of strings where full words will be stored.
- Create a variable nodeQueue, a queue of nodes.
- Create a variable queuePrefixes, a queue of strings.
- While we have nodes to visit:
- Get all of the next nodes visitNode points to.
- For each next node, add it to the nodeQueue and add prefix + the next node's character to queuePrefixes.
- If visitNode is marked as the end of a word, add prefix to words.
- Shift nodeQueue and set visitNode to the shifted value.
- Shift queuePrefixes and set prefix to the shifted value.
- return words
We can use the same approach to get a list of all words which exist in the trie. In our implementation shown at the bottom of this article, we use an empty prefix to query all words.
Statistics and Metadata for Words
In the beginning of this article, we stated that this data structure can be used to implement an efficient autocomplete functionality. As we've described the operations so far, we are only storing the words themselves, not any data about which ones are more common in natural language, which ones are searched for most often, etc. We can store this information directly within the trie structure by allowing nodes which are marked as ends of words to carry this additional information. Then, when we retrieve a list of words which start with some prefix, or all words in the structure, we can return an object which encapulates a word and its metadata instead of just a list of words.
Space Time Complexity
In our complexity analysis, we will use n to refer to the number of nodes in the trie structure and k to refer to the number of characters in an input string. The table below outlines the complexity for the operations described above.
Complexity | Notes | |
---|---|---|
Space | O(n) | One node per character per word, though some words will share nodes. |
Insert a Word | O(k) | Complexity is based on the length of the string to insert. |
Check if a Word Exists | O(k) | Complexity is based on the length of the string to insert. |
Remove a Word | O(k) | Complexity is based on the length of the string to insert. |
Get Words with Prefix | O(n) | Worst case scenario, we fetch every word in the trie and touch each node once. |
Full Implementation
In this implementation, we will use iterative algorithms instead of recursive algorithms to perform the operations discussed in the previous sections. Our TrieNode class will encapsulate the node's requirement of having a map of children, being marked as the end of a word, and holding a character value. The CustomTrie class will hold a root node with an empty/throwaway character. Its chilren will correspond to the first letter of each word it holds.