Graph traversal is the process of visiting nodes within a graph. During a grpah traversal, we might be searching for one or more values, updating nodes, deleting nodes, or doing some type of other process for each node, such as printing the node's value to the screen. In this article, we'll introduce two fundamental algorithms: depth-first search and breadth-first search. We'll introduce them in the context of searching for values, outline when one vs the other algorithm should be used, and outline additional ways these algorithms can be used other than searching. We'll also showcase full code implementations for these algorithms in multiple programming languages.
Connected and Disconnected Graphs
Before proceeding to our overview of graph algorithms, we will need to introduce the concepts of connected and disconnected graphs. A graph is called connected if, for every pair of vertices v and u, a path exists which connects the two vertices. A graph is called disconnected if there exists any pair of vertices which do not have a path connecting them. For any graph which is disconnected, there are at least two subgraphs which are themselves connected.
Searching In Graphs
Generally speaking, the purpose of a graph traversal is to either find some value in a graph, or check if some value exists in a graph. During the traversal process, the algorithm will visit nodes in a certain order and check each node. If the node has the expected value, the algorithm will stop and indicate that it has found what it is looking for. If the node does not, the algorithm will proceed with the next node based on some explicit or implicit node ordering. In the algorithms we'll be discussing here, the search will start with some abitrary start node, which we will refer to as the root node, then move on to nodes which are connected to that node, then their connections, etc. Because cycles can exist in graphs, our algorithm will need to ensure that we don't get caught up in an infinite loop. It will also need to ensure that we don't visit nodes more than once, as this would be wasteful. The algorithm will also need to determine if and when all nodes have already been visited, which corresponds to the case where the item which is being searched for does not exist in the graph. In the sections below, we'll introduced depth first search and breadth first search, both of which are based on discovering nodes and exploring them. We'll also discuss the difference between searching from one node vs searching the entire graph, including disconnected subgraphs.
Depth First Search
Depth first search, or dfs, is an algorithm which explores a graph by expanding upon branches / paths as far as possible before backtracking. For example, if we start a dfs with a node which has two children, the dfs algorithm will visit the first child, then its first child, etc., and the entire subgraph of the left child will be fully visited before the root node's right child is visited. In other words, it will fully explore the first path, then backtrack, then finish exploring the second path, then backtrack, etc., until either some value has been found or until all paths have been explored. In the code below, we'll use a recursive approach to accomplish this. The algorithm will be split-up in two methods: depthFirstSearchRecursive, and doDepthFirstSearchRecursive. The second method serves as a helper method to the first. To clearly explain the algorithm, we'll start by listing the steps for doDepthFirstSearchRecursive below:
- Create a set of nodes called visitedNodes. We'll use this to prevent ourselves from entering an infinite loop.
- Check if the currentNode has the value we're looking for. If it does, return true immediately. Otherwise, proceed.
- For each childNode (outgoing connection) of the curentNode:
- If childNode node exists in the visitedNodes, don't proceed to the next steps.
- Add childNode to visitedNodes.
- Call doDepthFirstSearchRecursive with childNode as the parameter and store the result in valueFound.
- If valueFound is true, return true immediately. Otherwise, continue.
- If the value was not found in the steps above, return false.
The steps above outline a pure depth-first search algorithm, which will explore some or all paths starting from some entry node. However, if we need to perform a depth-first search on a graph which might have disconnected subgraphs, we'll need to take an additional step which will allow us to run the depth-first search using one starting node per subgraph. This is what the method depthFirstSearchRecursive encapsulates. The steps can be summarized as follows:
- Create a set of nodes called visitedNodes. We'll use this to prevent ourselves from entering an infinite loop.
- Iterate over each node of the graph.
- If visitedNodes has node, do not proceed with the steps below for this node.
- Call doDepthFirstSearchRecursive and pass in this node and visitedNodes.
- If the value was not found in the steps above, return false.
In the steps provided above, we iterate over each node in the graph outside of the depth-first search algorithm and reuse the same visitedNodes set to filter out nodes which have already appeared in a depth-first search so we don't perform any redundant searches. The code below outlines the implementation for these methods. Note that in some programming languages, Stacks, Queues, and Sets are not provided by the language. In those cases, I've provided implementations below. If you'd like to learn more about our Php set implementation, please visit our page on php sets with object references.
Iterative Depth First Search
We can also implement this algorithm using an iterative approach. Instead of using the call stack to handle backtracking for us, we will explicitly handle this ourselves using a stack object. If you're not familiar with stacks or would like a quick refresher, please visit our introduction to stacks. This approach uses two methods depthFirstSearchIterative and doDepthFirstSearchIterativeFromNode. The first method fulfills the same purpose as the depthFirstSearchRecursive method above, which handles the case of disconnected subgraphs, so we will not outline the steps here. The steps for the iterative algorithm in doDepthFirstSearchIterativeFromNode can be summarized as follows:
- Create a set of nodes called visitedNodes. We'll use this to prevent ourselves from entering an infinite loop.
- Create a stack called nodesToVisit. We'll add to this as we find nodes during our traversal.
- Add the start node passed into the method to nodesToVisit.
- While nodesToVisit is not empty:
- Call pop on nodesToVisit and assign the value returned to visitingNode.
- If the value of visitingNode is the value we're looking for, return true immediately.
- For each childNode of visitedNodes: if childNode is not in visitedNodes, push it to nodesToVisit.
- If the value was not found in the steps above, return false.
The code below implements this algorithm and provides implementations for Stacks for languages which do not provide it by default:
Breadth First Search
Breadth first search, or bfs, is an algorithm which explores a graph by visiting all of the nodes at each level of depth before proceeding to the next. For example, if we start a bfs with a node which has three children, the bfs algorithm will visit the root node and its three children before it visits any of its "grandchildren". In other words, it will "branch out" rather than "going deep" as depth-first search does. The implementation below uses two methods: breadthFirstSearchIterative and doBreadthFirstSearchIterativeFromNode. The first method fulfills the same purpose as the depthFirstSearchRecursive method above, which handles the case of disconnected subgraphs, so we will not outline the steps here. The steps for the iterative algorithm in doBreadthFirstSearchIterativeFromNode can be summarized as follows:
- Create a set of nodes called visitedNodes. We'll use this to prevent ourselves from entering an infinite loop.
- Create a queue called nodesToVisit. We'll add to this as we find nodes during our traversal.
- Add the start node passed into the method to nodesToVisit.
- While nodesToVisit is not empty:
- Call dequeue on nodesToVisit and assign the value returned to visitingNode.
- If the value of visitingNode is the value we're looking for, return true immediately.
- For each childNode of visitedNodes: if childNode is not in visitedNodes, enqueue it to nodesToVisit.
- If the value was not found in the steps above, return false.
You may have noticed that the steps above seem similar to the steps for the iterative depth-first search algorithm. These algorithms are actually entirely the same, except the breadth-first search algorithm uses a queue instead of a stack. The nature of queues vs stacks changes the order in which nodes are traversed, thus resulting in bfs instead of dfs. If you're not familiar with queues or would like a quick refresher, please visit our introduction to queues. The code below implements this algorithm, and in cases where the languages does not provide a queue implementation, the code below does:
Note that we have not included a recursive implementation for the depth first search. The recursive implementation we've provided for depth-first search utilizes the function call stack which is handled internally by the operating system. There is no "call queue" which we can directly utilize, so any recursive implementation of breadth-first search would be more complex, and therefore is omitted from this article.
Iterating Over and Visiting Nodes
In the algorithms above, we've implemented depth-first search and breadth-first search as methods which search for one particular value and return if the value exists. There are other applications which these algorithms can be applied to, such as:
- Searching for multiple values in a list.
- Filtering values based on some callback function.
- Collecting a flat list of all values in the graph.
- Executing a callback function for each value in the graph.
In these other use cases, the algorithms will be largely the same; the only differences will be in where the return statements are. Instead of baking the search logic into the algorithm, we could introduce a layer of abstract to seperate the logic of traversing the graph vs the logic of what to do when we visit a node. This is a common use case for the visitor pattern. With that layer of abstraction, we could still perform searches, but we could also perform some of the other items listed above. In other words, we can make the behavior a parameter of the search algorithm via a callback function. The use of dfs vs bfs will change the ordering of when each callback function is invoked, but will otherwise perform the same function.
When to Use DFS vs BFS
Depth-first search and breadth-first search both accomplish the same goals, but selecting the right algorithm can make a huge difference in time and memory consumed during the search / traversal. In the table below, we'll outline the different kinds of use cases each algorithm is useful for. We'll also include an entry for list iteration, which is applicable in cases where we have an external list of all nodes in addition to the nodes themselves.
Algorithm | When To Use | Example Use Cases |
---|---|---|
List Iteration | When we have access to a consolidated list or set of all vertices and need to find some specific value and there is no special relationship between the nodes in the graph. Note: if we only have a graph represented by a node or list of nodes, and there is no single consolidated list, we cannot use this method. |
|
Depth-First Search | If we are performing a search and expect the value we are looking for to be deep in the graph with respect to the entry node, this will find the solution faster than breadth-first search and possibly list iteration. This also applies if each depth of the graph implies some change, such as a turn taken in a game. We can also use dfs if we want to avoid the extra memory overhead of using the queue in breadth-first search. |
|
Breadth-First Search | If we are searching and expect the value to be close to the entry node, this will find the solution faster than depth-first search and possibly list iteration. We can also use bfs if the paths of the graph run very deep and using a queue will use less memory than using the stack for the full paths. |
|
In addition to the points above, if a particular graph has a certain ordering based on the relationships between nodes and how they are connected, dfs or bfs may come into play based on the nature of that ordering.
Space and Time Complexity
The space and time complexity for depth-first search and breadth-first search are the same, as their implementations themselves are very similar. In the table below, we'll describe complexity in terms of the number of nodes v and the number of edges e.
Complexity | Notes | |
---|---|---|
Space | O(v) | The algorithm uses a set to record nodes which have been visited and a stack/queue to record which nodes will be visited next. The worst-case sizes of both are proportional to the number of vertices. |
Time | O(v + e) | The algorithm will visit each node once per number of edges which connect to it, or just once in the case it has no connections. |
Searching Over Large or Distributed Graphs
The algorithm implementations above work in cases where the entire graph is held in memory of a single computer / device. In cases where the graph is distributed over a network, and we use HTTP calls to traverse the graph, or in cases where the device has low memory and we do not want to keep a set of visited nodes which requires additional memory, we can use algorithms which implicitly represent depth-first and breadth-first searches but operate using a more "distributed" version of the algorithms. If you would like to read more on this, please visit the page below for more info: