Given a set of words, we need to find the words that are anagrams of each other and display each group separately, using the most efficient algorithm.
input:
man car kile arc none like
I have implemented this before with a simple array of letter counts, e.g.:
unsigned char letter_frequency[26];
Then store that in a database table together with each word. Words that have the same letter frequency 'signature' are anagrams, and a simple SQL query then returns all anagrams of a word directly.
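As a minimal in-memory sketch of the same idea (no database), the words from the question can be grouped by their letter-count signature. The names `LETTERS`, `letter_signature` and `anagram_groups` are made up for this illustration, and it assumes the words only use the letters a-z:

```ruby
# Letters that make up a signature; assumes plain a-z words.
LETTERS = ("a".."z").to_a

# Returns an array of 26 counts, one per letter.
def letter_signature(word)
  lowered = word.downcase
  LETTERS.map { |letter| lowered.count(letter) }
end

# Groups words whose signatures are equal. Hash#group_by keeps the
# order in which each signature was first seen.
def anagram_groups(words)
  words.uniq.group_by { |word| letter_signature(word) }.values
end

words = "man car kile arc none like".split
anagram_groups(words).each { |group| puts group.join(" ") }
# prints:
# man
# car arc
# kile like
# none
```

Each word is scanned once per letter, so the cost grows linearly with the number of words; the grouping itself is a single hash lookup per word.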
In experiments with a very large dictionary, I found no word with more than 9 occurrences of any letter, so the 'signature' can be represented as a string of digits 0..9. (The size could easily be halved by packing two digits per byte as hex, and reduced further by binary-encoding the counts, but I haven't bothered with either so far.)
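To make the two encodings concrete, here is a rough sketch of the 26-digit string form and the hex-packed 13-byte form. The helper names are hypothetical, and the packing relies on every count being at most 9 (so each count is also a valid hex digit):

```ruby
ALPHABET = ("a".."z").to_a

# 26-character string of digits; only valid while every count is 0..9.
def signature_string(word)
  lowered = word.downcase
  counts = ALPHABET.map { |letter| lowered.count(letter) }
  raise ArgumentError, "count above 9 in #{word}" if counts.any? { |c| c > 9 }
  counts.join
end

# Packs two counts per byte (one hex digit each), giving 13 bytes.
def packed_signature(word)
  [signature_string(word)].pack("H*")
end

puts signature_string("car")                              # 26 digits
puts packed_signature("car").bytesize                     # 13
puts signature_string("arc") == signature_string("car")   # true
```

The string form is easier to inspect and index in a database; the packed form trades that readability for half the storage.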
Here is a Ruby function that computes the signature of a given word and stores it in a Hash, discarding duplicates. From the Hash I later build an SQL table:
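# Relies on globals initialised elsewhere (not shown here):
# $dict (the Hash), $letters (the letters to count) and $wordid (next id).
def processword(word, downcase)
  # Normalise whitespace: drop the line ending, collapse runs of spaces,
  # and remove one trailing space.
  word.chomp!
  word.squeeze!(" ")
  word.chomp!(" ")
  word.downcase! if downcase
  if $dict[word].nil?   # skip words that were already stored
    stdword = word.downcase
    # One count per letter, in the order given by $letters.
    signature = $letters.collect { |letter| stdword.count(letter) }
    signature.each do |cnt|
      # Warn when a count no longer fits in a single decimal digit.
      puts "Signature overflow:#{word}|#{signature}|#{cnt}" if cnt > 9
    end
    $dict[word] = [$wordid, signature]
    $wordid += 1
  end
end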
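
The step from the Hash to an SQL table is not shown above, so the following is only a sketch of how it might look. The table name `words`, its columns, and the helpers `sql_quote`, `insert_statements` and `anagram_query` are all assumptions made for illustration; `SAMPLE_DICT` is a hand-written stand-in with the same shape as `$dict` (word => [id, signature]).

```ruby
# Hypothetical sample in the same shape as $dict: word => [id, signature].
SAMPLE_DICT = {
  "car" => [1, [1, 0, 1] + [0] * 14 + [1] + [0] * 8],
  "arc" => [2, [1, 0, 1] + [0] * 14 + [1] + [0] * 8]
}

# Doubles single quotes so words like "o'clock" stay valid SQL literals.
def sql_quote(text)
  "'" + text.gsub("'", "''") + "'"
end

# One INSERT per word; the signature array is joined into a digit string.
def insert_statements(dict)
  dict.map do |word, (id, signature)|
    "INSERT INTO words (id, word, signature) " \
      "VALUES (#{id}, #{sql_quote(word)}, #{sql_quote(signature.join)});"
  end
end

# Query returning every other word that shares the given word's signature.
def anagram_query(word)
  "SELECT b.word FROM words a JOIN words b ON a.signature = b.signature " \
    "WHERE a.word = #{sql_quote(word)} AND b.word <> a.word;"
end

puts "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE, signature CHAR(26));"
puts insert_statements(SAMPLE_DICT)
puts anagram_query("car")
```

Building SQL by string interpolation keeps the sketch free of dependencies; with a real database driver, parameterized statements would be the safer choice. An index on the signature column would let the lookup avoid a full table scan.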