Most programers become familiar with, and begin working extensively with, arrays early in their careers. It is one of the simpliest and most ubiquitous data structures, and it is the appropriate data structure to use in many situations. However, there are situations that arrays are not well suited for. In this article, we'll introduce the concept of a Set, something which is similar to an array but which will provide a significant boost in performance in some of those certain situations.
What is a Set?
A Set is a data type which holds one or more distinct elements. If you try to add something in the set which already exists in that set, nothing will change. If the inserted value is a primitive data type, such as a number or string, the set will "compare" by value. If it is an object, the set will "compare" via object reference, which means that two objects with the same "meaning" but different object references may both be present in the set.
Sets vs Arrays
Sets and arrays are similar in concept and in use. However, there are a few important differences between the two. The table below compares and contrasts them. There will be reference to "Big O Notation"; if you are not familiar with this notation, the introduction falls outside of the scope of this article, but in the meantime, understand that O(1) is significantly faster than O(n).
|Duplicate Values||Arrays can have the same element inserted more than once.||Sets cannot contain any duplicates of the same element.|
|Add an Element||Arrays can add elements quickly: O(1).||Sets can add elements quickly: O(1).|
|Remove the Last Element||Removing the last element of an array is fast: O(1).||Removing the last element of a set is fast: O(1).|
|Removing a Value||To remove a value from an array, we will need to find the index. Therefore, this is a bit slower, as we must iterate over some or all of the array: O(n).||We can remove any given value from a set very quickly: O(1).|
|Check if Value is Present||Similar to deletion, we must iterate over the array to find a value: O(n).||We can check if any value is in a set very quickly: O(1).|
Based on the information above, we can see that sets outperform arrays in some scenarios and arrays outperform sets in others. We can utilize this information to use the appropriate data structures in the appropriate situations to make our code much more performant.
Use Cases for Sets
- Keep track of words we've found in an input box: if we want to do some analysis on what the user is typing, perhaps for a search, we can use a set and iterate over the input to see which words appear. We don't have to worry about adding the same word twice.
- Quickly check if the current webpage has certain tags: we may want to display something, or take some action, if the webpage which was just navigated to has one or more specific tags. Upon navigation, we can check if the set of tags for the current webpage contains certain items, and if so, perform that action.
The two main points these examples showcased: fast verification of membership and fast duplicate filtering. When execution speed in one or both of these types of scenarios are important, a set will provide that speed, especially in cases where the collection is very large and/or the operation is repeated numerous times. In otherwords: in general, sets will provide a performance improvement in any situation where we need to check for membership or handle filtering duplicates.
Writing Code with Sets
The code below shows some common ways of working with sets:
Note that when working with objects, we must be careful with our object references:
Sets in TypeScript
TypeScript provides type declarations for sets in the same way it does for arrays:
Another Example: Project Euler
While working on Project Euler problems, especially when reaching problems in the upper two digits, properly utilizing data structures became crucial to calculating solutions in a reasonable amount of time. Knowing when to use sets vs arrays, maps vs sets, etc. meant the difference between milliseconds and full seconds of execution. In problem #64: Odd Period Square Root, once I realized that we should be using a set in the code instead of an array, I made the switch and the execution became roughly 12x faster. The code for that solution can be found here: Project Euler problem #64