Merge sort is one of the most efficient and most fundamental sorting algorithms. The algorithm is based on a recursive divide and conquer approach. It is also one of the classic algorithms taught and tested in universities. In this article, we'll introduce merge sort, discuss how it works, its space and time complexity, and give full code examples in multiple languages. Before proceeding, if you are not familiar with recursion, please visit our non-techincal introduction to recursion or our outline on writing recursive functions.
Merge Sort
The merge sort algorithm is a comparison based sorting algorithm which is used to sort a list of items. Generally, it is applied to sort arrays. This algorithm uses a divide and conquer approach to recursively sort left-right halves of sub-arrays, then combine the results until the entire array becomes sorted. The original input array itself will be modified, as opposed to a new array being created, once the operation is complete. In this article, we will be sorting arrays of numbers. However, the approach can be adapted to other types, such as strings and class instances, or generalized to sort anything which can implement a compare method. In the sections below, we'll describe the divide step, then the conquer step. Each recursive call will itself perform all steps specified in those sections.
Divide Step
In this first step, we will determine the indices for the left and right sides of the array, then recursively call the merge sort method twice: once for the left side and once for the right side, or handle the base case. The full steps involved in this part of the algorithm:
- Determine the index of the halfway point of this subarray.
- Consider the left side of this array to be the items before and including the halfway point. For this side:
- If this subarray only has two items, swap them if they are out of order.
- Otherwise, call merge sort recursively for this subarray.
- Consider the right side of this array to be the items from the halfway point + 1 to the end of this subarray. For this side:
- If this subarray only has two items, swap them if they are out of order.
- Otherwise, call merge sort recursively for this subarray.
Once the steps above are done, the left and right sides of this array themselves will be sorted; we will need to combine these sides so the full array itself is sorted.
Conquer Step
From the previous step, we know that the left and right sides of the array are sorted. To combine these into a single sorted array, we will need to move through both arrays simultaneously using two pointers, make comparisons between the current item on the left and current item on the right, and insert based on how they compare to each other. Since the left and right sides are themselves sorted, we will only need to touch each element once. In other words, we will not need to do any type of nested or quadratic iteration. The full steps involved in this part of the algorithm:
- Copy the left and right portions into their own arrays.
- Create two pointers, one for the left array and one for the right array, and set to 0.
- Create one more pointer inputIndex for the main input array and set to the start position.
- While the left and right iterators are less than their array lengths:
- Compare the current left item to the current right item.
- If left item is less, set inputIndex to the left item and increment the left pointer.
- If right item is less, set inputIndex to the right item and increment the right pointer.
- Increment the inputIndex pointer.
- Add any remaining items from the left or right side not covered above.
Once the steps above are completed, the array will now be sorted. If this was called on a subarray instead of the full input, this subarray will be sorted, and the caller will then perform its own conquer step to complete its own sorting.
Complexity Analysis
In the table below, we'll outline the space complexity, time complexity, and other aspects of the merge sort algorithm.
Value | Notes | |
---|---|---|
Space Complexity | O(n) | This requires additional space to clone the left and right subarrays, as well as the function call stack. |
Time Complexity | O(n logn) | The algorithm has a recurrence relation of T(n) = 2T(n/2) + n . |
In Place | No | Since this requires additional space for cloning subarrays, it is not in place. |
Stable | Yes | The sorted array will preserve the order of items which have an equal sort value, so it is stable. |
Modifies the Source Input | Yes | The source array itself will be directly modified, so the original ordering will be lost. |
Full Implementation
In this approach, we will represent subarrays implicitly by using start and end indices within the recursive call. Once the recursive calls have been made, we will then apply the splice operation to clone.