Text comparison algorithm

悲哀的现实 2020-11-27 03:29

We have a requirement in the project that we have to compare two texts (update1, update2) and come up with an algorithm to define how many words and how many sentences have

6 answers
  •  小蘑菇 (original poster)
    2020-11-27 04:14

    Here are two papers that describe other text comparison algorithms that should generally output 'better' (e.g. smaller, more meaningful) differences:

    • Tichy, Walter F., "The String-to-String Correction Problem with Block Moves" (1983). Computer Science Technical Reports. Paper 378.
    • Paul Heckel, "A Technique for Isolating Differences Between Files", Communications of the ACM, April 1978, Volume 21, Number 4

    The first paper cites the second and mentions this about its algorithm:

    Heckel[3] pointed out similar problems with LCS techniques and proposed a linear-time algorithm to detect block moves. The algorithm performs adequately if there are few duplicate symbols in the strings. However, the algorithm gives poor results otherwise. For example, given the two strings aabb and bbaa, Heckel's algorithm fails to discover any common substring.
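
    To make the quoted limitation concrete, here is a minimal sketch of the core idea behind Heckel's technique: symbols that occur exactly once in both inputs serve as anchors, and matches are then extended to neighbouring symbols. The function name `heckel_matches` is made up for this illustration, and the sketch leaves out details of the published algorithm (such as the virtual begin/end markers), so treat it as an approximation rather than a faithful implementation.

```python
from collections import Counter


def heckel_matches(old, new):
    """Return (old_index, new_index) pairs for symbols matched between old and new.

    Simplified sketch of Heckel-style matching (hypothetical helper, not the
    paper's exact procedure).
    """
    old_count = Counter(old)
    new_count = Counter(new)
    old_pos = {sym: i for i, sym in enumerate(old)}  # only consulted for unique symbols

    match_new = [None] * len(new)  # new index -> matched old index
    match_old = [None] * len(old)  # old index -> matched new index

    # Anchors: symbols that occur exactly once in each input.
    for j, sym in enumerate(new):
        if old_count[sym] == 1 and new_count[sym] == 1:
            i = old_pos[sym]
            match_new[j] = i
            match_old[i] = j

    # Extend matches forward from already matched positions.
    for j in range(len(new) - 1):
        i = match_new[j]
        if (i is not None and i + 1 < len(old)
                and match_new[j + 1] is None and match_old[i + 1] is None
                and new[j + 1] == old[i + 1]):
            match_new[j + 1] = i + 1
            match_old[i + 1] = j + 1

    # Extend matches backward from already matched positions.
    for j in range(len(new) - 1, 0, -1):
        i = match_new[j]
        if (i is not None and i - 1 >= 0
                and match_new[j - 1] is None and match_old[i - 1] is None
                and new[j - 1] == old[i - 1]):
            match_new[j - 1] = i - 1
            match_old[i - 1] = j - 1

    return [(i, j) for j, i in enumerate(match_new) if i is not None]


# Every symbol in "aabb" and "bbaa" occurs twice, so there are no anchors.
print(heckel_matches("aabb", "bbaa"))
# With unique symbols, the moved blocks are found.
print(heckel_matches("abcd", "cdab"))
```

    In this sketch, the first call finds nothing because no symbol is unique, which is the failure mode the quoted passage describes. The second call matches every symbol because all of them are unique.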

    Both papers were mentioned in answers (the first in one answer, the second in another) to a similar Stack Overflow question:

    • Is there a diff-like algorithm that handles moving block of lines? - Stack Overflow
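
    As a rough illustration of the block-move idea from the first paper, the sketch below walks through the target string and, at each position, copies the longest run that also appears somewhere in the source. The function name `greedy_block_moves` and the tuple layout are assumptions made for this example, and the brute-force search is chosen for brevity rather than efficiency; it is not the paper's implementation.

```python
def greedy_block_moves(source, target):
    """Cover target with blocks copied from source, greedily and left to right.

    Returns a list of (source_start, target_start, length) tuples. A
    source_start of None marks a target symbol that does not occur in source
    and therefore has to be added. Hypothetical helper for illustration only.
    """
    moves = []
    q = 0  # current position in target
    while q < len(target):
        best_p, best_len = None, 0
        # Brute force: try every start position in source and keep the longest run.
        for p in range(len(source)):
            length = 0
            while (p + length < len(source) and q + length < len(target)
                   and source[p + length] == target[q + length]):
                length += 1
            if length > best_len:
                best_p, best_len = p, length
        if best_len > 0:
            moves.append((best_p, q, best_len))
            q += best_len
        else:
            moves.append((None, q, 1))
            q += 1
    return moves


# The strings from the quoted passage: two blocks of length 2 are found.
print(greedy_block_moves("aabb", "bbaa"))
```

    For aabb and bbaa this sketch reports two blocks, bb and aa, each copied from a different position in the source.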
