Where can I learn more about the Google search “did you mean” algorithm? [duplicate]

僤鯓⒐⒋嵵緔 submitted on 2019-11-27 09:03:09

Question


Possible Duplicate:
How do you implement a “Did you mean”?

I am writing an application where I require functionality similar to the "did you mean?" feature of Google's search engine.

Is there source code available for such a thing, or where can I find articles that would help me build my own?


Answer 1:


You should check out Peter Norvig's article about implementing a spelling corrector in a few lines of Python: "How to Write a Spelling Corrector". It also links to implementations in other languages (e.g. C#).
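As a rough illustration of the approach in that article, here is a condensed Python 3 sketch: generate every string one (or, failing that, two) edits away from the input, keep those found in a word-frequency table, and return the most frequent. The tiny CORPUS string is a hypothetical stand-in; Norvig's version counts words in a large text file, and the quality of the suggestions depends entirely on that data.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real corrector would count words in a large text.
CORPUS = """the quick brown fox jumps over the lazy dog
spelling correction suggests the word the user most likely meant
the spelling of a word"""
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away, generated lazily."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(words):
    """The subset of words that appear in the frequency table."""
    return {w for w in words if w in WORDS}


def correction(word):
    # Prefer the word itself, then 1-edit, then 2-edit candidates; else give up.
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    # Among the candidates, pick the one seen most often in the corpus.
    return max(candidates, key=lambda w: WORDS[w])


print(correction("speling"))  # spelling
```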




Answer 2:


I attended a seminar by a Google engineer a year and a half ago in which they talked about their approach to this. The presenter said that (at least part of) their algorithm involves little intelligence at all; rather, it exploits the huge amount of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, then searches for "Britney Spears" and clicks on something, they can make a fair guess about what the user was searching for and suggest that in future.

Disclaimer: this may have been only part of their algorithm.
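The reformulation idea described above could be sketched like this: scan a query log in time order, count each case where a search with no click is immediately followed by a different search from the same user that did get a click, and suggest the most frequent follow-up. The log format (user_id, query, clicked), the function names and the min_count threshold are all hypothetical; this illustrates the idea and is not Google's actual system.

```python
from collections import Counter, defaultdict


def mine_reformulations(log):
    """log: iterable of (user_id, query, clicked) tuples in time order (hypothetical format)."""
    pairs = defaultdict(Counter)  # failed query -> Counter of successful follow-ups
    last = {}  # user_id -> (query, clicked) of that user's previous search
    for user, query, clicked in log:
        prev = last.get(user)
        # A search with no click, followed by a different search that got a click
        if prev and not prev[1] and clicked and prev[0] != query:
            pairs[prev[0]][query] += 1
        last[user] = (query, clicked)
    return pairs


def did_you_mean(pairs, query, min_count=2):
    """Most frequent successful reformulation of query, if seen at least min_count times."""
    if query not in pairs:
        return None
    best, count = pairs[query].most_common(1)[0]
    return best if count >= min_count else None


# Toy log: two users fail with the misspelling, then succeed with the correction.
log = [
    ("u1", "brittany speares", False),
    ("u1", "britney spears", True),
    ("u2", "brittany speares", False),
    ("u2", "britney spears", True),
    ("u3", "brittany speares", True),
]
pairs = mine_reformulations(log)
print(did_you_mean(pairs, "brittany speares"))  # britney spears
```

Note that nothing here knows anything about spelling: the suggestion comes purely from what users did next, which matches the "little intelligence, lots of data" description.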




Answer 3:


Python's standard library has a module called difflib, which provides a function called get_close_matches. From the Python documentation:

get_close_matches(word, possibilities[, n][, cutoff])

Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

  >>> from difflib import get_close_matches
  >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
  ['apple', 'ape']
  >>> import keyword
  >>> get_close_matches('wheel', keyword.kwlist)
  ['while']
  >>> get_close_matches('apple', keyword.kwlist)
  []
  >>> get_close_matches('accept', keyword.kwlist)
  ['except']

Could this library help you?
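As an example, a minimal query-level "did you mean" could run get_close_matches on each word of a query against a known vocabulary and rebuild the query from the best matches. VOCABULARY, suggest_query and the 0.8 cutoff are hypothetical choices for illustration; a real application would draw its vocabulary from its own index or dictionary.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; in practice, the terms your application actually knows.
VOCABULARY = ["configure", "ftp", "server", "ubuntu", "windows", "install"]


def suggest_query(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary match, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)  # known word: keep as is
            continue
        # n=1: only the single best match scoring at least `cutoff`
        matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    suggestion = " ".join(corrected)
    # Only suggest something if it differs from what the user typed
    return suggestion if suggestion != query.lower() else None


print(suggest_query("ubunto configre ftp server"))  # ubuntu configure ftp server
```

A query whose words are all known returns None. A stricter cutoff than the default 0.6 makes it less likely that a valid but unlisted word gets replaced.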




Answer 4:


You could use Yahoo's spelling suggestion web service, http://developer.yahoo.com/search/web/V1/spellingSuggestion.html, which provides similar functionality.




Answer 5:


You can look at the source code of Xapian (http://xapian.org/), which provides this functionality, as do many other search libraries.




Answer 6:


I'm not sure it serves your purpose, but a string edit-distance algorithm combined with a dictionary might suffice for a small application.
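A minimal sketch of that idea, assuming plain Levenshtein distance (insertions, deletions and substitutions, each costing 1) computed with the standard dynamic-programming recurrence, and a brute-force scan over a small hypothetical word list. The suggest helper and its max_distance=2 limit are illustrative choices, not a fixed recipe.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=2):
    """Closest dictionary word if it is within max_distance edits, else None."""
    best = min(dictionary, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_distance else None


words = ["apple", "banana", "cherry"]  # hypothetical dictionary
print(suggest("banan", words))  # banana
```

The scan computes the distance to every dictionary word, which is fine for a small application; for large vocabularies you would first generate a short list of candidates and only then compute distances.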




Answer 7:


I'd take a look at this article on Google bombing. It shows that the feature simply suggests answers based on previously entered searches.




Answer 8:


AFAIK the "did you mean?" feature doesn't actually check spelling. It only suggests another query based on the content Google has parsed.




Answer 9:


A great chapter on this topic (Chapter 3, "Dictionaries and tolerant retrieval") can be found in the freely available book Introduction to Information Retrieval by Manning, Raghavan and Schütze.
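One technique that book covers for spelling correction is a k-gram index: map each character k-gram to the dictionary terms containing it, gather the terms that share k-grams with the misspelled query, and keep those whose k-gram sets overlap enough by the Jaccard coefficient. The sketch below assumes bigrams with "$" marking word boundaries, a hypothetical threshold of 0.4 and a toy term list; in practice the surviving candidates would usually be re-ranked, for example by edit distance.

```python
from collections import defaultdict


def kgrams(term, k=2):
    """Set of character k-grams, with '$' marking the start and end of the term."""
    padded = f"${term}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_index(terms, k=2):
    """Inverted index from each k-gram to the set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return index


def candidates(query, index, k=2, threshold=0.4):
    """Terms whose k-gram Jaccard coefficient with the query is at least threshold."""
    q = kgrams(query, k)
    # Only terms sharing at least one k-gram with the query are considered
    seen = set().union(*(index.get(g, set()) for g in q))
    scored = []
    for term in seen:
        t = kgrams(term, k)
        jaccard = len(q & t) / len(q | t)
        if jaccard >= threshold:
            scored.append((jaccard, term))
    # Highest overlap first; ties broken alphabetically
    return [term for _, term in sorted(scored, key=lambda s: (-s[0], s[1]))]


terms = ["board", "border", "lord", "morbid", "sordid", "bored"]
print(candidates("bord", build_index(terms)))  # ['board', 'bored', 'border', 'lord']
```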




Answer 10:


You could use n-grams for the comparison: http://en.wikipedia.org/wiki/N-gram

Using the Python ngram module: http://packages.python.org/ngram/index.html

import ngram

G2 = ngram.NGram(["iis7 configure ftp 7.5",
                  "ubunto configre 8.5",
                  "mac configure ftp"])

print("String", "\t", "Similarity")
# search() returns (item, similarity) pairs scoring above the threshold
for item, similarity in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print(item, "\t", similarity)

You get:

String  Similarity
"iis7 configure ftp 7.5"    0.76
"mac configure ftp"  0.24
"ubunto configre 8.5"   0.19



Answer 11:


Take a look at Levenshtein automata.
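A Levenshtein automaton for a word w and a limit n accepts exactly the strings within n edits of w. The simplest way to experiment is to simulate the nondeterministic automaton directly, tracking a set of states (i, e) meaning "the first i characters of w matched using e edits". This is a sketch of that NFA simulation only; efficient implementations (Lucene's fuzzy queries, for instance) build a deterministic automaton and intersect it with the term dictionary, so that most of the dictionary is never visited. The function names are hypothetical.

```python
def _close(states, word, max_edits):
    """Add states reachable by skipping characters of word (deletions, epsilon moves)."""
    stack = list(states)
    closed = set(states)
    while stack:
        i, e = stack.pop()
        if i < len(word) and e < max_edits and (i + 1, e + 1) not in closed:
            closed.add((i + 1, e + 1))
            stack.append((i + 1, e + 1))
    return closed


def levenshtein_accepts(word, candidate, max_edits):
    """Simulate the Levenshtein NFA: True if candidate is within max_edits of word."""
    # State (i, e): first i characters of word matched using e edits.
    states = _close({(0, 0)}, word, max_edits)
    for ch in candidate:
        next_states = set()
        for i, e in states:
            if i < len(word) and word[i] == ch:
                next_states.add((i + 1, e))          # match
            if e < max_edits:
                next_states.add((i, e + 1))          # ch is an inserted character
                if i < len(word):
                    next_states.add((i + 1, e + 1))  # ch substitutes word[i]
        states = _close(next_states, word, max_edits)
        if not states:
            return False  # no live states: candidate can never be accepted
    # Accept if all of word has been consumed within the edit budget
    return any(i == len(word) for i, _ in states)


print(levenshtein_accepts("food", "flood", 1))  # True
```

For example, [w for w in words if levenshtein_accepts("food", w, 1)] filters a word list down to candidates at most one edit away.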



Source: https://stackoverflow.com/questions/3763640/where-can-i-learn-more-about-the-google-search-did-you-mean-algorithm
