The subset sum problem is similar to the knapsack problem, coin counting, and other classic computer-science problems. In this article, we'll introduce the subset sum problem, outline multiple algorithms which solve the problem with visuals, discuss their space and time complexity, and show full implementations in multiple programming languages.
Subset Sum Problem
The subset sum problem asks: given a list of positive integers and a positive integer target sum, determine if the list contains elements which add up to that target. We cannot assume the input list will be sorted. The problem statement may sound similar at first, but finding an efficient way to implement the solution may be more difficult than it seems. In the sections below, we'll cover two methods of solving this problem, an inefficient way based on naive recursion, and an efficient way based on dynamic programming. We'll run the code with the input below and show how long each implementation takes to find the answer:
- Target Sum: 15
- Input Numbers:
[ 56, 72, 97, 85, 72, 26, 42, 15, 24, 62, 50, 88, 47, 32, 40, 62, 30, 96, 13, 64, 17, 93, 11, 86, 36, 37, 10, 18, 86, 63, 17 ]
Recursive Solution
We can find the solution to a given input using a recursive implementation. At each level of recursion, we'll make two recursive calls: one for the case where we try to include a number from the input list, and one for the case we do not. We can include the number by subtracting it from the target sum, then making the recursive call with that modified value. If that value ever becomes 0, we've determined that we can reach the target sum with elements from the list. If it never becomes 0, we've determined that we cannot. This implicitly represents a tree structure where each level represents an item in the input list, each left node represents the choice to use the item, and each right node represents the choice not to use it. The algorithm will perform a depth-first traversal of that tree.
The space and time complexity for this approach is shown below, where n represents the size of the input list.
Complexity | Notes | |
---|---|---|
Space | O(n) | The stack will store each recursive function call, and we can have up to n function calls on the stack with this implementation. |
Time | O(2^n) | This approach esentially tries every combination of the input list. |
When running this algorithm in Java with the input given above, the execution time is 5,255ms, which is very slow given the small size of the input.
Dynamic Programming Solution
In this approach, we will be solving the problem by first solving smaller sub-problem. A sub-problem is a problem which a portion of the input list instead of all of it, and/or a target sum which is smaller than the original target sum. We will be solving all sub-problems where we have 1 of the inputs available, then 2, 3, etc. until we have all inputs available. For each of those, we will all sub-problems where the subsum is 1, then 2, 3, etc. until it becomes the original target subsum. We will solve input.length * targetSum problems in total. The final sub-problem solved will be the original problem itself. Our implementation will use a matrix to keep track of these solutions, and we will calculate sub-problem solutions based on results from previous sub-problems. The matrix will have the following properties:
- The matrix will be of size input.length + 1 by targetSum + 1.
- The first row will correspond to a sub-problem where we have no inputs available. All but the first column will be set to false, as we cannot get a sum of anything other than 0 using no elements.
- The first column will correspond to a sub-problem where the target sum is zero. Every first column will be set to true, as we can get a sum of 0 by selecting 0 elements.
- For any column, if the column in the row above is true, that column in the current row will also be true. This represents a scenario where we do not use the input corresponding to the current row.
- For any column at colIndex for a row corresponding to an item from the input list, if the column at index colIndex - item is true, colIndex in the current row will also be true. This represents a scenario where we do use the input corresponding to the current row.
- The last cell on the bottom right contains the answer to the overall problem, and the last row contains the answers to all target sums from 0 to the original input target sum.
The space and time complexity for this approach is shown below, where n represents the size of the input list and s represents the target sum value.
Complexity | Notes | |
---|---|---|
Space | O(n * s) | We will be keeping a matrix of (n+1)*(s+1) dimensions to store sub-problem results. |
Time | O(n * s) | We will be iterating over the matrix a constant number of times. |
When running this algorithm in Java with the input given above, the execution time is 2ms, much faster than the recursive implementation.
Determine the List of Numbers
We can modify our implementation to return a list of numbers from the original list which add up to the target sum. The matrix itself will still contain boolean values; at the end of the method, we can traverse the matrix one more time to determine which input items were used and add them to a list. This implementation has the same time complexity as the dynamic programming implementation above, and the code is mostly identical until the end of the method where the list is assembled. To get the list from the matrix, we make the observation that for each row, if the sub-sum boolean value is different than the previous row, that means we've used the item and should add it to the list. If it's not, then we have not used the item.