QuickSelect is an efficient algorithm for finding the kth smallest element in an unsorted list. In this article, we'll introduce QuickSelect, discuss how it works, analyze its space and time complexity, and give full code examples in multiple languages. Before proceeding, if you are not familiar with recursion, please visit our non-technical introduction to recursion or our outline on writing recursive functions. If you are not familiar with the QuickSort algorithm, which this algorithm is based on, please visit our introduction to QuickSort.
The QuickSelect algorithm takes an unordered array of inputs and finds the kth smallest element. For example, given the input 1, 3, 5, 7, 9, the 2nd smallest item is 3. QuickSelect is a modified version of QuickSort. It uses the divide and conquer approach to recursively partition the array into smaller and smaller partitions, partially sorting each partition around a pivot value, until the array is sorted just enough that the kth smallest element sits in the kth position. The order of the elements in the original array will be altered; the array ends up partially sorted, depending on how many partition operations it took to find the kth element. We'll break down the process step by step and give a full overview of each step in the following sections.
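To make the goal concrete, here is a short Python sketch (the helper name `kth_smallest_reference` is our own) that finds the kth smallest element by fully sorting the list. QuickSelect computes the same answer while doing less work on average:

```python
def kth_smallest_reference(arr, k):
    """Return the kth smallest element (1-based k) by fully sorting.

    This is only a reference oracle for checking results; QuickSelect
    reaches the same answer without sorting the whole array.
    """
    return sorted(arr)[k - 1]

print(kth_smallest_reference([9, 1, 7, 3, 5], 2))  # 3
```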
Pivoting and Partitioning
QuickSelect applies the divide and conquer strategy by dividing the input array into partitions. QuickSelect can use the exact same partition algorithm as QuickSort; we'll describe that algorithm here. At each recursive step, a partition is created by selecting a pivot value within the array, then swapping items so that all items smaller than the pivot are to its left and all items larger are to its right. This isn't a full sort, since the items to the left and right may still be unsorted themselves; the array is only sorted with respect to the pivot value p. Once the pivot process is complete, we have two partitions: one with values less than or equal to p and one with values greater than or equal to p. The value p itself is now in the position it will occupy once the entire array is fully sorted. As a consequence, if p lands at index i, we know p is the (i + 1)th smallest element; if i equals k - 1, then p is the kth smallest element.
During partitioning, each item is moved according to two rules:
- If an item is before p but greater than p, move it to the right of p.
- If an item is after p but less than p, move it to the left of p.
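The partition step described above can be sketched in Python as follows. This version uses the Lomuto scheme with the last element as the pivot, a simplifying assumption for readability; any other pivot choice can be swapped in by moving the chosen pivot to the end first.

```python
def partition(arr, lo, hi):
    """Partition arr[lo..hi] around a pivot and return the pivot's final index.

    The pivot here is simply arr[hi] (an assumption for brevity). After this
    call, everything left of the returned index is <= the pivot and everything
    right of it is >= the pivot.
    """
    pivot = arr[hi]
    i = lo  # next slot for an element <= pivot
    for j in range(lo, hi):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    # Place the pivot into its final sorted position.
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

data = [9, 1, 7, 3, 5]
p = partition(data, 0, len(data) - 1)
print(p, data)
```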
The ideal pivot value is the median of all of the values in the array. The median-of-three approach increases the chances of using a near-median pivot by sampling three items (commonly the first, middle, and last) and selecting the middle value of the three. In the worst case, all three sampled values are the same and the sampling doesn't help; in the best case, all three values differ, and the median of the three will be closer to the median of the entire array than either of the other two candidates.
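A minimal sketch of median-of-three pivot selection, assuming the three samples are the first, middle, and last elements of the range (a common convention, though the article doesn't mandate it):

```python
def median_of_three(arr, lo, hi):
    """Return the index of the median of arr[lo], arr[mid], and arr[hi].

    Sampling three candidates and taking their median makes a pathologically
    bad pivot (the minimum or maximum of the range) less likely.
    """
    mid = (lo + hi) // 2
    candidates = [(arr[lo], lo), (arr[mid], mid), (arr[hi], hi)]
    candidates.sort(key=lambda pair: pair[0])
    return candidates[1][1]  # index of the middle value

print(median_of_three([3, 9, 1], 0, 2))  # 0, since 3 is the median of 3, 9, 1
```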
The reason we can ignore entire partitions is that the pivot value for each partition ends up in its correct position in the sorted array. Therefore, if the pivot's position is greater than k - 1, everything to its right also cannot be the kth smallest value; if its position is less than k - 1, everything to its left cannot be. Either way, we can ignore that side entirely and recurse into the other.
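Putting the pieces together, here is a sketch of the full recursive algorithm in Python. It partitions, checks whether the pivot landed at index k - 1, and otherwise recurses into only one side. For brevity it pivots on the last element of the range, an assumption; median-of-three can be substituted.

```python
def quickselect(arr, k, lo=None, hi=None):
    """Return the kth smallest element (1-based k) of arr.

    Modifies arr in place, as described in the article. The pivot is the
    last element of the current range, a simplifying assumption.
    """
    if lo is None:
        lo, hi = 0, len(arr) - 1
    # Partition arr[lo..hi] around the pivot (Lomuto scheme).
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]  # pivot is now in its sorted position

    if i == k - 1:
        return arr[i]                        # found the kth smallest
    elif i < k - 1:
        return quickselect(arr, k, i + 1, hi)  # ignore the left partition
    else:
        return quickselect(arr, k, lo, i - 1)  # ignore the right partition

print(quickselect([9, 1, 7, 3, 5], 2))  # 3
```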
In the table below, we'll outline the space complexity, time complexity, and other aspects of the quickselect algorithm:
| Attribute | Value | Notes |
| --- | --- | --- |
| Space Complexity | O(n) | This requires space in the function call stack for recursive calls. The average case is O(log n). |
| Time Complexity | O(n^2) | The average case is O(n). |
| Modifies the Source Input | Yes | The source array itself will be directly modified, so the original ordering will be lost. |
QuickSort has an average performance of O(n log n), whereas QuickSelect has an average performance of O(n). This is because at each partition, we only need to recursively partition the left or the right side, rather than both. If each partition roughly halves the remaining range, the total work can be represented as n + n/2 + n/4 + ... ≈ 2n for the average case, which is O(n).
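The arithmetic behind that 2n bound can be checked with a quick sketch (the variable names here are our own, and the halving at each step is the average-case assumption):

```python
# Sum the geometric series n + n/2 + n/4 + ... for a sample n.
# Each term models one partition pass over a range half the previous size.
n = 1024
total = 0
size = n
while size >= 1:
    total += size
    size //= 2  # average case: each partition halves the remaining range

print(total)  # 2047: just under 2n = 2048, so the total work is O(n)
```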
In this implementation, we will be searching through arrays of numbers. However, the approach can be adapted to other types, such as strings and class instances, or generalized to search through any data type which can implement a compare method.
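One way to sketch that generalization in Python is to accept a key function, so the same partition-and-recurse logic works for strings, class instances, or anything comparable. The `key` parameter and the helper name `quickselect_key` are our own illustration, not part of the core implementation above:

```python
def quickselect_key(arr, k, key=lambda x: x):
    """Return the item whose key is kth smallest (1-based k).

    Generalizes quickselect by comparing key(item) values instead of the
    items themselves; the key parameter is an illustrative addition.
    Modifies arr in place.
    """
    def select(lo, hi):
        pivot_key = key(arr[hi])  # last-element pivot, for brevity
        i = lo
        for j in range(lo, hi):
            if key(arr[j]) <= pivot_key:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[hi] = arr[hi], arr[i]
        if i == k - 1:
            return arr[i]
        return select(i + 1, hi) if i < k - 1 else select(lo, i - 1)

    return select(0, len(arr) - 1)

# Example: find the 2nd shortest string by comparing lengths.
print(quickselect_key(["apple", "fig", "banana", "kiwi"], 2, key=len))  # kiwi
```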