Algorithm for grouping anagram words

悲&欢浪女 2020-12-07 23:30

Given a set of words, we need to find the words that are anagrams of one another and display each group separately, using the most efficient algorithm.

input:

man car kile arc none like


        
14 answers
  •  太阳男子
    2020-12-07 23:50

    I have implemented this before with a simple array of letter counts, e.g.:

    unsigned char letter_frequency[26];
    

    Then store that in a database table together with each word. Words that have the same letter frequency 'signature' are anagrams, and a simple SQL query then returns all anagrams of a word directly.
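    As a rough illustration of the signature idea, here is a minimal in-memory sketch that groups the question's sample input without a database. The names `LETTERS`, `letter_signature` and `group_anagrams` are hypothetical, and the sketch assumes lowercase ASCII letters only.

```ruby
# Sketch: group words by their 26-entry letter-count signature.
LETTERS = ('a'..'z').to_a  # assumption: lowercase ASCII alphabet only

# Hypothetical helper: how often each of a..z occurs in the word.
def letter_signature(word)
  w = word.downcase
  LETTERS.map { |letter| w.count(letter) }
end

# Hypothetical helper: arrays of words that share a signature.
# Arrays are valid Hash keys in Ruby, so group_by can key on them.
def group_anagrams(words)
  words.group_by { |w| letter_signature(w) }.values
end

groups = group_anagrams(%w[man car kile arc none like])
groups.each { |g| puts g.join(" ") }
# prints:
#   man
#   car arc
#   kile like
#   none
```

    Each word is scanned once per letter of the alphabet, so the work grows linearly with the total input size, and no sorting of the letters is needed.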

    Experimenting with a very large dictionary, I found no word in which any letter occurred more than 9 times, so the 'signature' can be stored as a string of digits 0..9, one digit per letter. (The size could easily be halved by packing two digits into each byte as hex, and reduced further by binary-encoding the counts, but I haven't bothered with any of this so far.)
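    The digit-string encoding and the hex packing mentioned in the parenthesis might look roughly like this. It is only a sketch: `ALPHABET`, `signature_string` and `pack_signature` are hypothetical names, and the code assumes lowercase ASCII letters and counts that never exceed 9.

```ruby
ALPHABET = ('a'..'z').to_a  # assumption: lowercase ASCII letters only

# Hypothetical helper: 26-character string, one digit per letter.
# Raises if a count would not fit in a single decimal digit.
def signature_string(word)
  w = word.downcase
  counts = ALPHABET.map { |letter| w.count(letter) }
  raise ArgumentError, "count above 9 in #{word}" if counts.any? { |c| c > 9 }
  counts.join
end

# Hypothetical helper: treat the digits as hex nibbles and pack two
# per byte, so 26 digits shrink to 13 bytes.
def pack_signature(sig)
  [sig].pack("H*")
end

sig = signature_string("arc")
puts sig                           # 10100000000000000100000000
puts pack_signature(sig).bytesize  # 13
```

    Unpacking with `unpack("H*")` restores the original digit string, so the packed form can still be compared or indexed like the text form.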

    Here is a Ruby function that computes the signature of a given word and stores it in a Hash, skipping words that are already present. I later build a SQL table from that Hash:

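    # Globals the function relies on. They are not shown in the post,
    # so the initial values here are assumptions.
    $letters = ('a'..'z').to_a   # letters counted in the signature
    $dict = {}                   # word => [word id, signature]
    $wordid = 0                  # next id to assign

    def processword(word, downcase)
      word.chomp!                # strip the trailing newline
      word.squeeze!(" ")         # collapse runs of spaces
      word.chomp!(" ")           # strip one trailing space
      word.downcase! if downcase
      if $dict[word].nil?        # skip words already stored
        stdword = word.downcase
        # One count per letter, in the order given by $letters.
        signature = $letters.collect { |letter| stdword.count(letter) }
        signature.each do |cnt|
          if cnt > 9
            puts "Signature overflow:#{word}|#{signature}|#{cnt}"
          end
        end
        $dict[word] = [$wordid, signature]
        $wordid += 1
      end
    end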
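    For readers without a database at hand, the lookup that the SQL query performs can be approximated in memory. The following is only a sketch under stated assumptions: the Hash layout (word => [id, signature]) mirrors the one used by the function, while `ALPHA`, `sig_of`, `sample_dict`, `by_signature` and `anagrams_of` are hypothetical names and the entries are the question's sample words.

```ruby
ALPHA = ('a'..'z').to_a  # assumption: lowercase ASCII letters only

# Hypothetical helper: 26-entry array of letter counts.
def sig_of(word)
  w = word.downcase
  ALPHA.map { |letter| w.count(letter) }
end

# Sample data in the same shape as the Hash: word => [id, signature].
sample_dict = {}
%w[man car kile arc none like].each_with_index do |w, id|
  sample_dict[w] = [id, sig_of(w)]
end

# Invert it into an index: signature => words carrying that signature.
by_signature = Hash.new { |h, k| h[k] = [] }
sample_dict.each { |w, (_id, sig)| by_signature[sig] << w }

# Hypothetical helper: all indexed anagrams of a word, minus the word.
def anagrams_of(word, index)
  index.fetch(sig_of(word), []) - [word]
end

p anagrams_of("car", by_signature)   # ["arc"]
p anagrams_of("none", by_signature)  # []
```

    The index plays the role of a database index on the signature column: one Hash lookup returns every word with the same letter counts.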
    
