QuickSelect is an efficient algorithm for finding the *kth smallest element* in an unsorted list. In this article, we'll introduce QuickSelect, discuss how it works and its space and time complexity, and give full code examples. Before proceeding, if you are not familiar with recursion, please visit our non-technical introduction to recursion or our outline on writing recursive functions. If you are not familiar with the *QuickSort* algorithm, which this algorithm is based on, please visit our introduction to QuickSort.

### QuickSelect

The **quickselect** algorithm takes an unordered array of inputs and finds the *kth* smallest element. For example, if we have an input of *1, 3, 5, 7, 9* and want the 2nd smallest item, the result will be 3. QuickSelect is a modified version of **QuickSort**. It uses the *divide and conquer* approach to recursively partition the array, continuously making smaller and smaller partitions and partially sorting each partition based on a *pivot* value, until we've sorted the array just enough that the *kth smallest element* is in the *kth* position. The order of the elements in the original array itself will be altered; the array will be partially sorted based on how many partition operations it took to find the *kth element*. We'll break down the process step by step and give a full overview of each step in the following sections.

### Pivoting and Partitioning

QuickSelect applies the divide and conquer strategy by dividing the input array into **partitions**. QuickSelect can use the exact same partition algorithm as *QuickSort*; we'll be describing that same algorithm here. At each recursive step, a partition will be created by selecting a **pivot** point within the array, then swapping items in the array so all items smaller than the pivot are to the left, and all items larger are to the right. This isn't a full sort, as items to the left and right might still be unsorted themselves. We can consider the array to be *sorted with respect to "p"*, where *p* is the pivot value. Once the pivot process is complete, we will have two partitions, one with values smaller than or equal to *p* and one with values larger than or equal to *p*. The value *p* itself is now in the position it will be in once the entire array is sorted. **As a consequence,** if we call *p* the *pth smallest element*, then we now know what the *pth* smallest element is. If the index of *p* is equal to *k - 1*, we will know that *p* is the *kth smallest element*.

To summarize, the steps of this part of the algorithm are:

- Set the initial *pivot* point as the median of the first, middle, and last elements. We'll discuss this in more detail below.
- Iterate through the array and compare values against *p*.
- If the item is before *p* but greater than *p*, move it to the right of *p*.
- If the item is after *p* but less than *p*, move it to the left of *p*.
- Consider the *left partition* to be the subarray from the beginning to *p* and the *right partition* to be the subarray from *p+1* to the end.

There are multiple ways we can approach creating the partition function. In our implementation, we will use two pointers, one for the left and one for the right. The left will start at the leftmost part of the array and the right will start at the rightmost. Both pointers will move inwards until a mismatched pair is found, which is then swapped. The full steps are:

- Initialize *left* to the beginning and *right* to the end indices.
- Move the left pointer forwards until an item equal to or larger than *p* is found.
- Move the right pointer leftwards until an item equal to or smaller than *p* is found.
- If *right* is less than or equal to *left*, the partition process is complete, and *right* is the partition index.
- Otherwise, swap the array values at *left* and *right*, then return to step 2.
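The two-pointer steps above can be sketched as a standalone partition function. The following is a minimal sketch in Python (the function name and signature are our own choices); it assumes the pivot value is an element actually present in `arr[lo..hi]`, which the pivot-selection strategy discussed in the next section guarantees.

```python
def partition(arr, lo, hi, pivot):
    """Partition arr[lo..hi] around `pivot` using two pointers.

    Assumes `pivot` is a value present in arr[lo..hi]. Returns an index j
    such that arr[lo..j] <= pivot and arr[j+1..hi] >= pivot.
    """
    left, right = lo, hi
    while True:
        # Move the left pointer forwards until an item >= pivot is found.
        while arr[left] < pivot:
            left += 1
        # Move the right pointer leftwards until an item <= pivot is found.
        while arr[right] > pivot:
            right -= 1
        # Pointers have met or crossed: `right` is the partition index.
        if right <= left:
            return right
        # Swap the mismatched pair, then continue scanning inwards.
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1
```

After the call, the array is partially sorted with respect to the pivot: every element at or before the returned index is less than or equal to the pivot, and every element after it is greater than or equal to the pivot.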

### Pivot Selection

There are multiple ways a pivot can be selected. The selected pivot can have consequences on the overall performance. Common types of pivot choices include:

- **First index**: we can make the pivot point the first possible index. However, this is generally avoided, as it can cause the algorithm to degrade to *O(n^2)* for lists which are sorted or mostly sorted. This point will be covered further in sections below.
- **Mid-point index**: we can make the pivot point the middle index in the array.
- **Last index**: we can make the pivot point the last possible index.
- **Random index**: select any random index as the pivot.
- **Median of three**: take the first, center, and last elements, determine the median value of the three, and use that value. This is the approach we will be taking in our algorithm.

The ideal *pivot* value is the median of all of the values in the array. The *median of three* approach increases the chances of picking a pivot near that median by selecting the best option out of three sampled items. In the worst case, all three sampled values will be the same, and the sampling gives no benefit. In the best case, all three values will be different, and the selected median will be closer to the median of the entire array than either of the other two values.
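As a sketch, the median-of-three selection can be written as a small Python helper (the name `median_of_three` is our own choice):

```python
def median_of_three(arr, lo, hi):
    """Return the median of the first, middle, and last elements of arr[lo..hi].

    The returned value is always an element of the subarray, so it is safe
    to use as a pivot value for partitioning.
    """
    mid = (lo + hi) // 2
    # Sorting just the three sampled values is a simple way to pick the middle one.
    return sorted((arr[lo], arr[mid], arr[hi]))[1]
```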

### Recursively Searching

Once the partitioning process is complete, the array will be sorted *with respect to the pivot*. Additionally, we will have two partitions, a *left partition* and a *right partition*. Note that the *pivot* index itself is excluded from both partitions. To continue searching for the *kth smallest element*, we can continue the search process on one partition and entirely ignore the other partition:

- If the pivot position is greater than *k - 1*, continue with the left partition and ignore the right partition.
- If the pivot position is less than *k - 1*, continue with the right partition and ignore the left partition.
- If the pivot position is equal to *k - 1*, we've found the *kth smallest element*; return the value at that position.

The reason we can ignore entire partitions is that the pivot value ends up in its correct position in the sorted array. If its position is greater than *k - 1*, everything to its right is even larger and cannot be the *kth smallest value*; if its position is less than *k - 1*, everything to its left is even smaller. Either way, we can ignore that partition entirely.
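To make the discard logic concrete, here is a deliberately simple, non-in-place Python sketch (our own illustrative helper, not the full in-place algorithm): it partitions around a middle-element pivot and recurses into only the partition that can contain the answer.

```python
def quickselect_simple(items, k):
    """Illustrative quickselect: return the kth smallest (1-based) element.

    Builds new lists rather than partitioning in place, to make the
    'ignore one partition entirely' step explicit.
    """
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]  # left partition
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]   # right partition
    if k <= len(smaller):
        # The answer lies in the left partition; the right is ignored.
        return quickselect_simple(smaller, k)
    if k <= len(smaller) + len(equal):
        # The pivot itself is the kth smallest element.
        return pivot
    # The answer lies in the right partition; adjust k and ignore the left.
    return quickselect_simple(larger, k - len(smaller) - len(equal))
```

At every level of recursion, one of the two partitions is dropped without ever being examined again.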

### Complexity Analysis

In the table below, we'll outline the space complexity, time complexity, and other aspects of the quickselect algorithm:

| | Value | Notes |
|---|---|---|
| Space Complexity | O(n) | This requires space in the function call stack for recursive calls. The average case is `O(log n)`. |
| Time Complexity | O(n^2) | The average case is `O(n)`, but in edge cases, such as an array which is already sorted, performance will degrade to `O(n^2)`. |
| Modifies the Source Input | Yes | The source array itself will be directly modified, so the original ordering will be lost. |

QuickSort has an average performance of *O(n log n)*, whereas QuickSelect has an average performance of *O(n)*. This is because at each partition, we only need to recursively partition the left *or* right side, instead of the left *and* right side. Therefore, the work can be represented as *n + n/2 + n/4 + ... = 2n* for the average case.

### Full Implementation

In this implementation, we will be searching through arrays of numbers. However, the approach can be adapted to other types, such as strings and class instances, or generalized to search through any data type which can implement a compare method.
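Below is one possible full implementation in Python, combining the median-of-three pivot selection, the two-pointer partition, and the recursive search described above. The function names and the 1-based *k* convention are our own choices; treat this as a sketch rather than the only correct formulation.

```python
def median_of_three(arr, lo, hi):
    """Pick the median of the first, middle, and last elements as the pivot value."""
    mid = (lo + hi) // 2
    return sorted((arr[lo], arr[mid], arr[hi]))[1]


def partition(arr, lo, hi, pivot):
    """Two-pointer partition of arr[lo..hi] around `pivot` (a value in the range).

    Returns an index j with arr[lo..j] <= pivot and arr[j+1..hi] >= pivot.
    """
    left, right = lo, hi
    while True:
        while arr[left] < pivot:   # advance until an item >= pivot
            left += 1
        while arr[right] > pivot:  # retreat until an item <= pivot
            right -= 1
        if right <= left:          # pointers met or crossed: done
            return right
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1


def quickselect(arr, k, lo=0, hi=None):
    """Return the kth smallest element of arr (1-based k), partially sorting arr."""
    if hi is None:
        hi = len(arr) - 1
    if lo == hi:
        return arr[lo]             # one element left: it must be the answer
    pivot = median_of_three(arr, lo, hi)
    j = partition(arr, lo, hi, pivot)
    if k - 1 <= j:
        return quickselect(arr, k, lo, j)      # search the left partition
    return quickselect(arr, k, j + 1, hi)      # search the right partition
```

For example, `quickselect([1, 3, 5, 7, 9], 2)` returns `3`, matching the example from the introduction. Note that, as discussed above, the call reorders the input array in place.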