String similarity score/hash

长发绾君心 2020-12-07 09:52

Is there a method to calculate something like a general "similarity score" of a string? That is, I am not comparing two strings with each other, but rather I get some number (h

12 answers
  •  时光取名叫无心
    2020-12-07 10:46

    Your idea sounds like an ontology, but applied to whole phrases. The more similar two phrases are, the closer together they sit in the graph (assuming you're using weighted edges). And vice versa: dissimilar phrases end up far from each other.

    Another approach is to use a Fourier transform to get a sort of 'index' for a given string (though it won't be a single number). You may find a bit more in this paper.

    Another idea, based on the Levenshtein distance: you can compare n-grams, which gives you a similarity index for two given phrases. The more similar they are, the closer the value is to 1. This can be used to calculate distances in the graph. I wrote a paper on this a few years ago; if you'd like, I can share it.
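
    To make the n-gram idea concrete, here is a minimal sketch in Python. It is not the method from the paper mentioned above; it assumes character bigrams and the Dice coefficient as the similarity index, and the names char_ngrams and ngram_similarity are made up for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return a multiset (Counter) of the character n-grams in text.

    Lowercasing is an assumption made here so that case does not matter.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a, b, n=2):
    """Dice coefficient over character n-grams.

    Returns 1.0 when both strings have the same n-grams and 0.0 when
    they share none.
    """
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both strings are shorter than n: fall back to plain equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return 2.0 * shared / total


if __name__ == "__main__":
    print(ngram_similarity("night", "nacht"))  # one shared bigram out of eight
    print(ngram_similarity("night", "night"))
```

    If the value is to be used as an edge weight in the graph, 1 - ngram_similarity(a, b) is one simple way to turn it into a distance. Note that this still compares two strings; it does not produce a standalone score for a single string.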

    Anyway, although I don't know the exact solution, I'm also interested in what you'll come up with.
