Algorithm for grouping anagram words

前端 未结 14 1550
悲&欢浪女
悲&欢浪女 2020-12-07 23:30

Given a set of words, we need to find the anagram words and display each category alone using the best algorithm.

input:

man car kile arc none like
<         


        
14条回答
  •  长情又很酷
    2020-12-07 23:50

    I used a Godel-inspired scheme:

    Assign the primes P_1 to P_26 to the letters (in any order, but to obtain smallish hash values best to give common letters small primes).

    Built a histogram of the letters in the word.

    Then the hash value is the product of each letter's associated prime raised to the power of its frequency. This gives a unique value to every anagram.

    Python code:

    primes = [2, 41, 37, 47, 3, 67, 71, 23, 5, 101, 61, 17, 19, 13, 31, 43, 97, 29, 11, 7, 73, 83, 79, 89, 59, 53]
    
    
    def get_frequency_map(word):
        map = {}
    
        for letter in word:
            map[letter] = map.get(letter, 0) + 1
    
        return map
    
    
    def hash(word):
        map = get_frequency_map(word)
        product = 1
        for letter in map.iterkeys():
            product = product * primes[ord(letter)-97] ** map.get(letter, 0)
        return product
    

    This cleverly transforms the tricky problem of finding subanagrams into the (also known to be tricky) problem of factoring large numbers...

提交回复
热议问题