Two words are anagrams if one has exactly the same characters as the other.
Example: "Anagram" and "Nagaram" are anagrams.
Example algorithm:
Open dictionary
Create empty hashmap H
For each word in dictionary:
    Create a key that is the word's letters sorted alphabetically (and forced to one case)
    Add the word to the list of words accessed by the hash key in H

To check for all anagrams of a given word:
    Create a key that is the letters of the word, sorted (and forced to one case)
    Look up that key in H
    You now have a list of all anagrams
Relatively fast to build, blazingly fast on look-up.
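A minimal sketch of this in Python (function names like build_anagram_index are mine, just for illustration):

from collections import defaultdict

def build_anagram_index(words):
    # Key: the word's letters, lower-cased and sorted; value: every word sharing that key.
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(word.lower()))].append(word)
    return index

def lookup_anagrams(index, word):
    # Same key construction, then a single hash lookup.
    return index.get("".join(sorted(word.lower())), [])

index = build_anagram_index(["Salt", "last", "one", "eon", "plod"])
print(lookup_anagrams(index, "Tals"))   # ['Salt', 'last']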
Well, a Trie would make it easy to check whether a word exists. So if you put the whole dictionary in a trie:
http://en.wikipedia.org/wiki/Trie
then you can afterwards take your word and do simple backtracking: take a char and recursively check whether we can "walk" down the Trie with any combination of the remaining chars (adding one char at a time). When all chars are used up in a recursion branch and the path ends on a complete word in the Trie, the word exists.
The Trie helps because it gives a nice stopping condition: we can check whether a partial string, e.g. "Anag", is a valid path in the Trie; if not, we can abandon that particular recursion branch. This means we don't have to check every single permutation of the characters.
In pseudo-code:

checkAllChars(currentPositionInTrie, currentlyUsedChars, restOfWord)
    if (restOfWord.isEmpty() && Trie.IsWord(currentPositionInTrie))
    {
        AddWord(currentlyUsedChars)   // a complete word was spelled along a valid path
    }
    else
    {
        foreach (char in restOfWord)
        {
            nextPositionInTrie = Trie.Walk(currentPositionInTrie, char)
            if (nextPositionInTrie != Positions.NOT_POSSIBLE)
            {
                checkAllChars(nextPositionInTrie, currentlyUsedChars.With(char), restOfWord.Without(char))
            }
        }
    }
Obviously you need a nice Trie data structure that allows you to progressively "walk" down the tree and check at each node whether there is a path with the given char to some next node...
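Here is one way that could look in Python; the node layout (a dict of children plus an end-of-word flag) and the function names are my own choices, a sketch rather than the canonical implementation:

class TrieNode:
    def __init__(self):
        self.children = {}     # char -> TrieNode
        self.is_word = False   # marks a complete dictionary word, not just a prefix

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def find_anagrams(node, used, rest, results):
    # All chars consumed on a path ending at a word node: an anagram was found.
    if not rest and node.is_word:
        results.add(used)
    for ch in set(rest):                      # try each distinct remaining char once
        child = node.children.get(ch)
        if child is not None:                 # dead Trie path: prune this branch
            find_anagrams(child, used + ch, rest.replace(ch, "", 1), results)
    return results

root = TrieNode()
for w in ["salt", "last", "one", "eon"]:
    insert(root, w)
print(find_anagrams(root, "", "alst", set()))   # {'salt', 'last'}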
Generating all permutations is easy; I guess you are worried that checking their existence in the dictionary is the "highly inefficient" part. But that actually depends on what data structure you use for the dictionary: of course, a list of words would be inefficient for your use case. Speaking of Tries, they would probably be an ideal representation, and quite efficient, too.
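For comparison, a small Python sketch of that brute-force route, using a hash set so each membership test is cheap (the factorial number of permutations remains the bottleneck):

from itertools import permutations

def anagrams_by_permutation(word, dictionary):
    words = set(dictionary)      # O(1) membership tests
    found = set()
    for perm in permutations(word.lower()):
        candidate = "".join(perm)
        if candidate in words:
            found.add(candidate)
    return found                 # still visits up to n! permutations

print(anagrams_by_permutation("eon", {"one", "eon", "plod"}))   # {'eon', 'one'}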
Another possibility would be to do some pre-processing on your dictionary, e.g. build a hashtable where the keys are the word's letters sorted, and the values are lists of words. You can even serialize this hashtable so you can write it to a file and reload quickly later. Then to look up anagrams, you simply sort your given word and look up the corresponding entry in the hashtable.
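In Python the serialization step could be as simple as pickling that table (pickle and the file name are my choices here, not the only option):

import pickle

def save_table(table, path="anagram_table.pkl"):
    # Write the sorted-letters -> words table to disk so later runs skip the build.
    with open(path, "wb") as f:
        pickle.dump(dict(table), f)

def load_table(path="anagram_table.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)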
Group (group-by) the dictionary words, lower-cased (clojure.string/lower-case), by letter frequency-map (frequencies). (These are the corresponding functions in the Lisp dialect Clojure.)
The whole function can be expressed like so:
(defn anagrams [dict]
(->> dict
(map clojure.string/lower-case)
(group-by frequencies)
vals))
For example,
(anagrams ["Salt" "last" "one" "eon" "plod"])
;(["salt" "last"] ["one" "eon"] ["plod"])
An indexing function that maps each thing to its collection is
(defn index [xss]
(into {} (for [xs xss, x xs] [x xs])))
So that, for example,
((comp index anagrams) ["Salt" "last" "one" "eon" "plod"])
;{"salt" ["salt" "last"], "last" ["salt" "last"], "one" ["one" "eon"], "eon" ["one" "eon"], "plod" ["plod"]}
... where comp is the functional composition operator.
That depends on how you store your dictionary. If it is a simple array of words, no algorithm will be faster than linear.
If it is sorted, then here's an approach that may work. I've invented it just now, but I guess it's faster than a linear approach. This is how I would do it, anyway; there should be a more conventional approach, but this is faster than linear.
Represent each word as a vector of letter counts (its count vector); anagrams have exactly the same count vector. Pick a random direction, i.e. a random vector of the same dimension. Project each dictionary word's count vector in this random direction and store the value (insert such that the array of values is sorted).
Given a new test word, project it in the same random direction used for the dictionary words. Then binary-search the sorted array for that value: every dictionary word whose projection matches is an anagram candidate (compare the count vectors directly to confirm).
PS: The above procedure is a generalization of the prime-number procedure (assign each letter a prime and multiply the primes of a word's letters; anagrams get the same product), which may potentially lead to large numbers and hence computational-precision issues.
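A Python sketch of the whole procedure, with the unstated details filled in by guesswork (a 26-dimensional count vector, a random float direction, binary search plus an exact count-vector check to guard against precision issues):

import bisect
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
random.seed(42)
DIRECTION = [random.random() for _ in ALPHABET]   # one fixed random direction

def count_vector(word):
    # 26-dimensional letter-count vector; anagrams share it exactly.
    counts = Counter(word.lower())
    return [counts[ch] for ch in ALPHABET]

def project(word):
    return sum(c * d for c, d in zip(count_vector(word), DIRECTION))

def build(words):
    # Sorted array of (projection, word), so look-up can binary-search it.
    return sorted((project(w), w) for w in words)

def anagrams_of(word, entries, eps=1e-9):
    p = project(word)
    i = bisect.bisect_left(entries, (p - eps,))
    hits = []
    while i < len(entries) and entries[i][0] <= p + eps:
        if count_vector(entries[i][1]) == count_vector(word):   # confirm exactly
            hits.append(entries[i][1])
        i += 1
    return hits

entries = build(["Salt", "last", "one", "eon", "plod"])
print(anagrams_of("tals", entries))   # ['Salt', 'last']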