Similar String algorithm

后端 未结 9 1384
梦如初夏
梦如初夏 2020-11-29 20:10

I\'m looking for an algorithm, or at least theory of operation on how you would find similar text in two or more different strings...

Much like the question posed he

9条回答
  •  执笔经年
    2020-11-29 20:33

    I can't mark two answers here, so I'm going to answer and mark my own. The Levenshtein distance appears to be the correct method in most cases for this. But, it is worth mentioning j_random_hackers answer as well. I have used an implementation of LZMA to test his theory, and it proves to be a sound solution. In my original question I was looking for a method for short strings (2 to 200 chars), where the Levenshtein Distance algorithm will work. But, not mentioned in the question was the need to compare two (larger) strings (in this case, text files of moderate size) and to perform a quick check to see how similar the two are. I believe that this compression technique will work well but I have yet to study it to find at which point one becomes better than the other, in terms of the size of the sample data and the speed/cost of the operation in question. I think a lot of the answers given to this question are valuable, and worth mentioning, for anyone looking to solve a similar string ordeal like I'm doing here. Thank you all for your great answers, and I hope they can be used to serve others well too.

提交回复
热议问题