How do I resolve the “Crypt Kicker” exercise proposed in “Programming Challenges (The Programming Contest Training Manual)”?

"Programming Challenges (The Programming Contest Training Manual)" is probably one of the nicest exercises book on algorithms. I've resolved the first 11 exercises, but now I am stuck with "Crypt Kicker" problem:

Crypt Kicker
A common but insecure method of encrypting text is to permute the letters of the alphabet. In other words, each letter of the alphabet is consistently replaced in the text by some other letter. To ensure that the encryption is reversible, no two letters are replaced by the same letter.

Your task is to decrypt several encoded lines of text, assuming that each line uses a different set of replacements, and that all words in the decrypted text are from a dictionary of known words.

Input
The input consists of a line containing an integer n, followed by n lowercase words, one per line, in alphabetical order. These n words compose the dictionary of words which may appear in the decrypted text.
Following the dictionary are several lines of input. Each line is encrypted as described above.

There are no more than 1,000 words in the dictionary. No word exceeds 16 letters. The encrypted lines contain only lower case letters and spaces and do not exceed 80 characters in length.

Output
Decrypt each line and print it to standard output. If there are multiple solutions, any one will do.
If there is no solution, replace every letter of the alphabet by an asterisk.

Sample Input 6
and
dick
jane
puff
spot
yertle

bjvg xsb hxsn xsb qymm xsb rqat xsb pnetfn
xxxx yyy zzzz www yyyy aaa bbbb ccc dddddd

Sample Output
dick and jane and puff and spot and yertle ...

What strategy should I to take in order to resolve this exercise? I was thinking to a classic and brutish backtracking solution, but I am trying avoid that until I find something more intelligent.

PS: This is not homework related, I am just trying to improve my overall skills.

Carlos Gutiérrez

KeyArray will hold the replacement table.

Start with an empty KeyArray, this is version 0
Match longest encrypted word to longest dictionary word and add to KeyArray (if there are two longest, pick any), this is version 1.
Decrypt some letters of the next longest crypted word.
Check if the decrypted letters match the letter in the same position in any dictionary word of the same length.
If none matches, go back to version 0 and try another word.
If some letters match, add the rest of the letters to KeyArray, this is version 2.
Decrypt some letters of the next longest crypted word.
Check if the decrypted letters match the letter in the same position in any dictionary word.
If none matches, go back to version 1 and try another word
If some letters match, add the rest of the letters to KeyArray, this is version 3.

Repeat until all words are decrypted.

If at version 0 none of the longest words creates a partial decrypt in shorter words, very probably there is no solution.

A minor optimization could be done by enumerating possibilities before the backtracking run. In Python:

dictionary = ['and', 'dick', 'jane', 'puff', 'spot', 'yertle']
line = ['bjvg', 'xsb', 'hxsn', 'xsb', 'qymm', 'xsb', 'rqat', 'xsb', 'pnetfn']

# ------------------------------------

import collections

words_of_length = collections.defaultdict(list)

for word in dictionary:
  words_of_length[len(word)].append(word)

possibilities = collections.defaultdict(set)
certainities = {}

for word in line:
    length = len(word)
    for i, letter in enumerate(word):
        if len(words_of_length[length]) == 1:
            match = words_of_length[length][0]
            certainities[letter] = match[i]
        else:
            for match in words_of_length[length]:
              possibilities[letter].add(match[i])

for letter in certainities.itervalues():
    for k in possibilities:
        possibilities[k].discard(letter)

for i, j in certainities.iteritems():
    possibilities[i] = set([j])

# ------------------------------------

import pprint
pprint.pprint(dict(possibilities))

Output:

{'a': set(['c', 'f', 'o']),
 'b': set(['d']),
 'e': set(['r']),
 'f': set(['l']),
 'g': set(['f', 'k']),
 'h': set(['j', 'p', 's']),
 'j': set(['i', 'p', 'u']),
 'm': set(['c', 'f', 'k', 'o']),
 'n': set(['e']),
 'p': set(['y']),
 'q': set(['i', 'j', 'p', 's', 'u']),
 'r': set(['j', 'p', 's']),
 's': set(['n']),
 't': set(['t']),
 'v': set(['c', 'f', 'o']),
 'x': set(['a']),
 'y': set(['i', 'p', 'u'])}

If you have some single-element possibilities, you can eliminate them from the input and rerun the algorithm.

EDIT: Switched to set instead of list and added printing code.

I actually tried a rather different approach. I built a trie from the dictionary words. Then I walk through the trie and the sentence together recursively (traversing the trie in a DFS).

At each space I make sure I hit the end of a word in the trie and if so I loop back to the root. Along the way I keep track of the letter assignments I've made so far. If ever I have an assignment that contradicts a prior assignment I fail and unravel the recursion to the point I can make the next possible assigment.

It sounds tricky but it seems to work quite well. And it is really not that hard to code up!

Another possible optimization, if you have "enough" text to deal with and you know the text's language, you can use letter frequencies (see : http://en.wikipedia.org/wiki/Letter_frequency). This is of course a very approximative approach when dealing with 6 / 7 words but will be the fastest way if you have a few pages to decode.

EDIT : about Max's solution, you could try to extract some characteristics of the word, too, such as repeating letters. Obviously, remarking that puff in the dictionary and qymm in the encrypted text are the only four letter words ending with a double letter gives a straight answer for 3 of the letters. In more complex scenarios, you should be able to narrow the possibilities for each letter couple.

Here is a Java implementation with more refinements to the algorithm proposed by @Carlos Gutiérrez.

Crypt Kicker Algorithm and Solution, what went wrong?

The refinement is to add a word pattern to reduce the search space for words. For example, words "abc" and "her" have the same pattern while "aac" and "her" don't as a three distinct letters word would not match a tow letters distinct word.
Moreover, the algorithm can be implemented recursively which is more intuitive and sensible.

来源：https://stackoverflow.com/questions/2175356/how-do-i-resolve-the-crypt-kicker-exercise-proposed-in-programming-challenges

标签

algorithm

backtracking