About a year ago, I created a Random Name Generator that generates names across a variety of domains, including pet names, human names, Star Trek ship names, and World of Warcraft Names. I've recently enhanced it to include Key & Peele football names, ensure that it doesn't deliver real names (in some cases), and make it more performant. I've decided to take this opportunity to also write an article on how I implemented this tool. First, I'll introduce foundational concepts, then I'll outline the algorithms used.
Markov Chains
A markov chain is a model that describes how events transition into other events based on probabilities. It can be represented as a graph where each node is some event and the arrows describe the probability that from this event, that next event will occur. The example below is a markov chain that describes whether salad, pizza, or burgers will be the meal for the day. The event is the meal that occurs that day, and the transition to the next day's meal is shown through the arrows. If you have pizza one day, it's 70% likely that you will have salad the next day.
We can also represent markov chains as a matrix. This is incredibly useful for the name generator. The table below describes the same markov chain as the diagram above. Each cell describes the probability that tomorrows meal will be the meal at the top of the column given that today's meal is the meal at the left of the row.
Salad | Pizza | Burgers | |
---|---|---|---|
Salad | 0% | 40% | 60% |
Pizza | 70% | 0% | 30% |
Burgers | 50% | 50% | 0% |
Markov Matrices Syllables
At its core, my random name generator uses a markov matrix to string syllables together. The state, or node, of the corresponding markov chain is "the current syllable," and the transition describes the probability that any given syllable will occur immediately after this one. Once the system has a syllable markov matrix in place, it can randomly generate a name by selecting a random start syllable, based on the probability that those syllables start immediately at the start of the string, then proceed to the next syllable.
Each syllable is determined iteratively by creating a random number between 0 and 1, then moving through the current row of the syllable markov matrix (corresponding to the current syllable) and subtract its probability from our random number. Once it has a non-positive value, we will select that syllable as the next syllable and proceed. To end the string, we also account for the existence of an "end syllable" which marks that the string is terminated. This factors into the probabilities.
The code below shows how this process works:
Calculating Probabilities and Creating the Matrix
We've described how we can generate a name with a markov matrix, but how do we initially create the markov matrix and assign values? We do this by reading a list of input names, then for each name:
- Split the name into its individual syllables.
- Add each syllable not yet found to a map and assign it an index.
- For each syllable in the name, update the probability that the next syllable occurs after this one (or the "end syllable" if it's the last one). For now, we can use a simple counter and increment it.
- After all syllables are added and all names process, normalize the matrix values so all values are between 0 and 1.
With this, the probability matrix is in place and its probabilities are based solely on the input names. This means that we can tailor the name generator to a particular domain, such as Key & Peele Football Names, simply by feeding the algorithm inputs from those domains. The more inputs we provide, the better the outputs will become.
Enhancing with Multiple Matrices
The random name generator on this website uses multiple matrices. Each matrix corresponds to the nth syllable position of the word being constructed. If we only used one matrix, the transition would be described as the probability that syllable b will follow syllable a. With multiple matrices, the transition is described as the probability that syllable b will follow syllable a given that syllable a is the nth syllable in the word. This gives the name generator more awareness to positional dependencies (such as syllables that appear more commonly at the ends of words), it ensures that the length of the name has a clearer upper bound, and it enables the name generator to recognize more complex patterns as it reads names in.
The code below shows how this process works given a set of input words:
Preventing Outputs of Real Names
For some of the generators, such as the World of Warcraft name generator, we don't want the tool to output a real name, as this would defeat the purpose of the tool. The algorithm described above to generate names does have a probability of generating a name provided in the input because the syllable selection can coincidentally correspond to one of the input words since the probabilities would be non-zero in each case. To prevent this, I've used a trie to store input words for each name generator. To save space, I've used a flyweight pattern so we can re-use a single trie for all generators. Each time a name is generated, the system checks against that trie structure to see if the name is present as an input (specifically for that generator), and if it is, it generates a new one. This doesn't strictly guarantee that it will eventually find a new name, but practically speaking, it's safe to assume (in the worst case scenario we can throw an exception after a few unsuccessful attempts).
Structuring the Random Name Generator Interface
The tool's interface guides you to a generator by offering choices, then for the choice you click, it either offers its own choices to drill down further, or it offers you the "Generate" button to generate a name. For example, I might just click "Key & Peele Football" and then "Generate", or I might follow the path "World of Warcraft→ World of Warcraft Player Name→ Alliance→ Mage," then click "Generate."
To support this, I've implemented a tree structure where each node is a display in that UI. Each node will either have its own children, or it will be a node that can generate a name. If the former, it specifies which children belong to it, and if the latter, it defines the generation function (which is to use one of the matrix generators). In each case, it defines its own name, description, and icon to use.