Getting the closest string match

难免孤独 2020-11-22 10:57

I need a way to compare multiple strings to a test string and return the one that most closely resembles it:

TEST STRING: THE BROWN FOX JUMPED OVER THE RED COW         


        
13 Answers
  •  我在风中等你
    2020-11-22 11:20

    Here is one more similarity measure, which I once implemented in our system and which gave satisfactory results.

    Use Case

    There is a user query which needs to be matched against a set of documents.

    Algorithm

    1. Extract keywords from the user query, keeping only the relevant part-of-speech tags (noun, proper noun).
    2. Calculate a score with the formula below to measure the similarity between the user query and a given document.

    For every keyword extracted from the user query:

    • Search the document for that word. The first occurrence earns a full point, and every subsequent occurrence of the same word earns fewer points than the one before.

    In essence, if the first keyword appears 4 times in the document, its score is calculated as follows:

    • The first occurrence earns 1 point.
    • The second occurrence adds 1/2.
    • The third occurrence adds 1/3.
    • The fourth occurrence adds 1/4.

    Score for this keyword = 1 + 1/2 + 1/3 + 1/4 ≈ 2.083

    The same calculation is done for each of the other keywords in the user query.

    Finally, the total score represents the extent of similarity between the user query and the given document.
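
    The scoring steps above can be sketched roughly as follows. This is only a minimal illustration, not the original implementation: the function names `keyword_score` and `similarity` are hypothetical, the keywords are assumed to be already extracted (for example by a POS tagger), the document is split into words with a simple regular expression, and the per-keyword scores are assumed to be summed into the total.

```python
import re
from collections import Counter


def keyword_score(occurrences: int) -> float:
    """Score for one keyword: the k-th occurrence adds 1/k (1 + 1/2 + 1/3 + ...)."""
    return sum(1.0 / k for k in range(1, occurrences + 1))


def similarity(keywords: list[str], document: str) -> float:
    """Total score of a document for a list of already-extracted keywords.

    Assumption: matching is case-insensitive on whole words, and the
    per-keyword scores are added together.
    """
    word_counts = Counter(re.findall(r"\w+", document.lower()))
    return sum(keyword_score(word_counts[kw.lower()]) for kw in keywords)


if __name__ == "__main__":
    doc = "THE BROWN FOX JUMPED OVER THE RED COW"
    # "fox" and "cow" each occur once; "dog" does not occur at all.
    print(similarity(["fox", "cow", "dog"], doc))
```

    To pick the closest match among several candidate strings, one could compute this score for each candidate and keep the one with the highest value. Because each repeated occurrence adds less than the previous one, a document that repeats a single keyword many times gains less than a document that contains several different keywords.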
