Algorithm to find common substring across N strings

前端 未结 2 824
小鲜肉
小鲜肉 2020-12-06 07:24

I\'m familiar with LCS algorithms for 2 strings. Looking for suggestions for finding common substrings in 2..N strings. There may be multiple common substrings in each pair.

2条回答
  •  情歌与酒
    2020-12-06 07:49

    This sort of thing is done all the time in DNA sequence analysis. You can find a variety of algorithms for it. One reasonable collection is listed here.

    There's also the brute-force approach of making tables of every substring (if you're interested only in short ones): form an N-ary tree (N=26 for letters, 256 for ASCII) at each level, and store histograms of the count at every node. If you prune off little-used nodes (to keep the memory requirements reasonable), you end up with an algorithm that finds all subsequences of length up to M in something like N*M^2*log(M) time for input of length N. If you instead split this up into K separate strings, you can build the tree structure and just read off the answer(s) in a single pass through the tree.

提交回复
热议问题