How to correct the user input (kind of like Google's “Did you mean?”)

粉色の甜心 2021-01-30 23:50

I have the following requirement:

I have many (say 1 million) values (names). The user will type a search string.

I don't expect the user to spell the names correctly.

8 answers
  •  不要未来只要你来
    2021-01-31 00:21

    You have two possible issues that you need to address (or not address, if you so choose):

    1. Users mistyping a word (an edit distance algorithm)
    2. Users not knowing a word and guessing (a phonetic match algorithm)
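    For the first issue, the usual measure is Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal Python sketch of the standard dynamic-programming version. The answer does not prescribe an implementation; this one is just a common variant that keeps only two rows of the table in memory.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row short
    # previous[j] = distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

    For example, levenshtein("sith", "smith") is 1 and levenshtein("kitten", "sitting") is 3.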

    Are you interested in both of these, or just one or the other? They are really two separate things; e.g. Sean and Shawn sound the same but have a (Levenshtein) edit distance of 2, which is a lot for a four-letter name and too high to be treated as a simple typo.
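    For the second issue, one classic phonetic algorithm is American Soundex, which keeps the first letter and encodes the following consonants as digits. The answer does not name an algorithm; Soundex is only an illustrative choice here (Metaphone and its successors are common alternatives). A minimal sketch:

```python
# Letter-to-digit table for American Soundex; vowels (and y) get no code.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of `name`, e.g. "R163" for Robert."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()  # the first letter is kept as-is
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            result += code  # adjacent equal codes are written once
        previous = code  # a vowel resets this, so a repeat after it is written again
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros
```

    Both soundex("Sean") and soundex("Shawn") return "S500", so a lookup keyed on the Soundex code would offer Shawn for Sean. Because the first letter is kept verbatim, Soundex will not match names that differ in their first letter, such as Cathy and Kathy.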

    You should pre-index word counts (i.e. how often each word occurs) to ensure you are only suggesting relevant answers (similar to ealdent's suggestion). For example, if I entered "sith" I might expect to be asked whether I meant "smith"; however, if I typed "smith" it would not make sense to suggest "sith". Choose a measure of the relative likelihood of a word, and only suggest words that are more likely than the one typed.
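    A minimal Python sketch of that rule, assuming you already have a table of word counts. WORD_COUNTS below is made-up sample data, and suggest and max_distance are hypothetical names. It only offers known words that are within a small edit distance of the input and strictly more frequent than it:

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming Levenshtein distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def suggest(query: str, counts: Dict[str, int], max_distance: int = 2) -> List[str]:
    """Known words within `max_distance` edits of `query` that are more frequent than it."""
    query_count = counts.get(query, 0)  # a word we have never seen counts as 0
    scored = []
    for word, count in counts.items():
        if count <= query_count:
            continue  # never suggest something no more likely than the input
        distance = levenshtein(query, word)
        if distance <= max_distance:
            scored.append((distance, -count, word))
    # Closest first; among equally close words, the most frequent first.
    return [word for _, _, word in sorted(scored)]


# Made-up counts (hypothetical sample data).
WORD_COUNTS = {"smith": 1200, "smyth": 40, "sith": 3}
```

    Here suggest("sith", WORD_COUNTS) gives ["smith", "smyth"], while suggest("smith", WORD_COUNTS) gives an empty list, because nothing in the table is more common than "smith". Scanning every entry is fine for a demo; with a million names you would combine it with sieve layers like the ones described below.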

    My experience with loose matching reinforced a simple but important lesson: perform as many indexing/sieve layers as you need, and don't be afraid of including more than 2 or 3. For instance, cull anything that doesn't start with the correct letter, then cull everything that doesn't end with the correct letter, and so on. You really only want to run the edit distance calculation on the smallest possible dataset, because it is expensive: the standard dynamic-programming algorithm takes O(m·n) time for two words of lengths m and n.

    So if you have an O(n), an O(n log n), and an O(n^2) algorithm, run all three in that order, so that only your 'good prospects' reach your heavy algorithm.
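    A minimal Python sketch of such a cascade, ordered cheapest first, under these assumptions: names are bucketed once, ahead of time, by (first letter, last letter), which turns the first two sieve layers into a single dictionary lookup; a length check follows, since the length difference of two strings is a lower bound on their edit distance; and Levenshtein distance runs only on whatever survives. build_index, sieve and the sample NAMES are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Index = Dict[Tuple[str, str], List[str]]


def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming Levenshtein distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def build_index(names: Iterable[str]) -> Index:
    """Built once, ahead of time: bucket names by (first letter, last letter)."""
    index: Index = defaultdict(list)
    for name in names:
        key = name.lower()
        if key:
            index[(key[0], key[-1])].append(name)
    return index


def sieve(query: str, index: Index, max_distance: int = 2) -> List[str]:
    q = query.lower()
    if not q:
        return []
    # Layers 1 and 2: same first and last letter, as one dictionary lookup.
    candidates = index.get((q[0], q[-1]), [])
    # Layer 3: cheap length check; |len(a) - len(b)| never exceeds the edit distance.
    candidates = [n for n in candidates if abs(len(n) - len(q)) <= max_distance]
    # Layer 4: the expensive O(len(a) * len(b)) comparison, run only on survivors.
    return [n for n in candidates if levenshtein(q, n.lower()) <= max_distance]


NAMES = ["Smith", "Smyth", "Sith", "Schmidt", "Jones", "Smithson"]  # sample data
INDEX = build_index(NAMES)
```

    sieve("smth", INDEX) returns ["Smith", "Smyth", "Sith"]. Note the trade-off in the letter-based layers: sieve("jonse", INDEX) returns nothing, because a typo in the last letter moves the query into a different bucket. Which layers you add is a balance between speed and recall.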
