Word Ladder Solver

The actual solver: Word Ladder Solver

Word ladders are a kind of puzzle.  You change one word into another by altering one letter at a time; all the intermediate words between the start word and target word must be valid dictionary words.  For example, you might change CAT into DOG using the steps: cat→cot→dot→dog.

The solver linked above and described here is a little different to some others:

  • You don’t have to download or install anything on your computer or device.
  • It finds not only the shortest solution but also a solution using common words.
  • It displays dictionary definitions for the words in the solutions it finds.

The solver is written in PHP – not the fastest language ever – and several people could be using it via the web at the same time, so it needs to have a pretty efficient algorithm.  The trick is to use a dictionary where all the links from each word have been worked out in advance.  For example, consider the word, SOLVER: it’s stored in the dictionary along with links to the eight other words it can be changed into in one step:

SOLVER: salver, silver, soever, solder, soller, solved, solves, wolver.

The dictionary

We can see that the dictionary contains some pretty obscure words, and we’d prefer to avoid using those words in our solutions unless they’re absolutely necessary; so each word has a ‘rareness’ score stored along with it.  ‘Silver’, ‘solved’ and ‘solves’ have a rareness of 1, ‘solder’ has a rareness of 4, ‘salver’ 200, ‘soever’ 20,000, ‘wolver’ 200,000 and ‘soller’ 2,000,001.  When searching for a common-word solution, the solver looks for a ladder of words with the lowest sum of rareness, so it would rather use four common words like ‘silver’ than the slightly less common ‘solder’ or ten words as rare as ‘soever’ instead of one as rare as ‘wolver’ and so on.

The other useful information to store in the dictionary for each word is a group number.  When the linked words in the dictionary were being found, the computer worked out how the words fell into different groups.  It’s obvious that a six-letter word can never be transformed into a seven-letter one using this simple kind of word ladder; less obvious is the fact that there are several groups of words for each word length.  All the words within one group can be changed (by a ladder of words) into any other word in that same group – but not to a word in a different group.  For example, ELBOWED can be transformed into UMPIRES or LOYALTY into MIRACLE, but none of those words can be changed into the main group of seven-letter words that includes ACCUSED and ZIPPING.

In case you’re interested, the CSW dictionary, which my program uses, has:

  • ‘Elbowed’ and ‘umpires’ in group 4786 along with sixty other words.
  • ‘Loyalty’ and ‘miracle’ in group 4787 along with sixty three others.
  • The big group of connected seven-letter words is group 4788 and contains 12,512 words.

There’s a summary table of how the words are distributed in groups in this text file.

The word in the dictionary with the most links is SAY, which can be transformed into forty two different words in a single step.  For longer words, the champions are:

length Word Definition Number of linked words
4 TATS Someone who tats is making lace by hand 40
5 CARE 38
6 BARKED 26
7 BARKING 25
8 SLATTERS Someone who slatters is being careless or negligent 15
9 BATTERING 13
10 SLATTERING See Slatters above 9
11 MUSTINESSES Varieties of stale odors 10

The dictionary contains words up to fifteen letters long, but the longer words become increasingly uninteresting from a word ladder point of view.

The default dictionary used is the Collins Scrabble Words (CSW or SOWPODS) which is the dictionary used in tournament Scrabble in most countries except the USA, Thailand and Canada.  The USA tournament Scrabble dictionary is the Official Tournament and Club Word List (OWL) but this turns out to be merely a subset of the CSW, so the program is able to cater for that dictionary too by knowing which words are present in OWL and only using those words when the OWL dictionary is selected.

The program

The program checks that both the start word and the target word exist in its dictionary, and that both the words are in the same group; if not it displays appropriate error messages.  If both words are in the same group the program then loads the necessary data for all the words in that group from its MySQL database:  it doesn’t have to load the actual words or their definitions at this stage – it only needs:

  • an index number (id) for each of the words in the group.
  • for each word id,  the ‘rareness’ score.
  • for each word id, the list of other word’s id’s that it can be transformed into.

The first pass finds one of the shortest solutions. There may be many equally short solutions, but the search tree is built in a random order so that different solutions may be found for the same start and target words.  All the nodes at a particular depth in the search tree are examined before moving on to  the next depth:  if the shortest solution possible is, say, six steps then the program is bound to find and display a six step solution,  If there are several different ways of solving the ladder in six steps, the program will use a random choice to find and display one of those solutions each time it is run.

The program builds the first depth of the search tree by visiting each node linked from the start node; it then visits all nodes linked from the first depth except those that have already been visited: these new nodes are ‘depth two’.  It then proceeds to visit all nodes (not already visited) that are linked from the depth two nodes… and so on until one of the nodes visited is the target word and a solution can be displayed.  As each node is visited a pointer is set in the node that points to its parent node.  The pointer is used both to allow the solution to be displayed and to keep track of which nodes have been visited before:  initially all the node’s pointers are initialized to -1, so if the program looks at a node and sees a pointer >= 0 it knows that node has been visited already.

The program uses a three element array to store each node (parent, rareness, links): the links are stored as a string with comma separated ids of linked nodes.  For example ABDUCT (id 19633) links to two other words ABDUCE (id 19632) and ADDUCT (id 19772) so the links element in the ABDUCT array would be “19632,19772”.  The nodes are stored in a node array, n[]  where the keys are the ids of the nodes and the values are instances of the three-element array described above (PHP uses associative arrays which can be confusing if you’re not used to them, but work well for this kind of task), so for ABDUCT (which has a rareness of 4), the information is stored in memory as:

n[19633] => (parent, 4, “19632,19772”)

Note that the n array is sparse: only the information belonging to the words in the same group as the start and target word are loaded from the database to memory.  With PHP’s associative arrays there might only be, say, a hundred elements in the n array, even though the keys are integers ranging up to 270132 (the highest id in the dictionary).

Another array, toVisit[], holds the keys of the n array to be visited.  This is initialised with the key (id) of the start word.  We’re not interested in the keys of the toVisit array, only the values (which store keys to the n array).  As the search proceeds, the ids of nodes to visit are added to the toVisit array, and as each node is visited, it is removed from the array.  Here is the code for the first pass solver:

$toVisit = array($startWordId); 
$soln = false;
while (!$soln && count($toVisit)) {
    foreach ($toVisit as $k => $i) {
        $links = explode(',', $n[$i][2]);
        shuffle($links);
        foreach ($links as $link) {
            $j = intval($link);
            if (isset($n[$j]) && $n[$j][0] < 0) { // node exists and hasn't been seen
                $n[$j][0] = $i; // assign current node as parent of linked node
                $toVisit[] = $j; // add the linked node to the list of nodes to visit
            }
            if ($j == $targetWordId) { // we have a solution
                $soln = true;
                break;
            }				
        }
        unset($toVisit[$k]);		
    }	
}

Having displayed the solution the program then does a second pass, this time taking into account the rareness of the words so that it finds a solution with the lowest possible sum of the rareness values of the words used.  Again there may be several alternative solutions with same total rareness, so the program builds the search tree in a random order.  This time, each depth of the tree holds all the words that can be reached from the start word with the same total rareness cost.

Before the second pass, the parent element of all the n nodes is reset to -1, effectively cleaning the previous search result from the n array.  For the second pass, toVisit is an array of arrays: the keys in toVisit are the cost of reaching a particular set of words; the value associated with each key is an array containing the ids of that set of words.  Initially the only node to visit is the start node, with a cost of zero so that is stored as:

toVisit[0] => array(19633)

There will never be any other nodes with a cost of 0, but we might eventually find, say, five nodes with a cost of, say, 87 which would be stored as, for example:

toVisit[87] => (21030, 20043, 29754, 19444, 19443)

Here’s the code for the second pass of the solver. As you can see, it’s just a slight variation on the code used for the first pass:

$toVisit = array(0 => array($startWordId));
$soln = false;
while (!$soln && count($toVisit)) {
    ksort($toVisit, SORT_NUMERIC); // sort array into key (costs) order
    $cost = key($toVisit); // cost of the lowest nodes to visit
    foreach ($toVisit[$cost] as $i) {
        $links = explode(',', $n[$i][2]);
        shuffle($links);
        foreach ($links as $link) {
            $j = intval($link);
            if (isset($n[$j]) && $n[$j][0] < 0) { // linked node hasn't been seen
                $newCost = $cost + $n[$j][1];
                $n[$j][0] = $i; // assign current node as parent of linked node
                if (isset($toVisit[$newCost]))  // already nodes with this cost
                    $toVisit[$newCost][] = $j; // add the new node to the array
                else
                    $toVisit[$newCost] = array($j); // new array for the new cost
            }
            if ($j == $w2targetWordId) { // we have a solution
                $soln = true;
                break;
            }				
        }
    }
    unset ($toVisit[$cost]); // all nodes of this cost have now been visited	
}

 

One thought on “Word Ladder Solver”

Leave a Reply

Your email address will not be published. Required fields are marked *