longest common substring in R finding non-contiguous matches between the two strings

挽巷 2020-11-29 07:21

I have a question about finding the longest common substring in R. While searching through a few posts on Stack Overflow, I came across the qualV package. However,

4 answers
  •  青春惊慌失措
    2020-11-29 08:00

    I'm not sure what you did to get your output of "hello". Based on the trial-and-error experiments below, it appears that the LCS function (a) does not treat a string as an LCS if a further character follows what would otherwise be the match; (b) finds multiple, equally long LCSs (unlike sub(), which finds only the first match); (c) does not care about the order of the elements in the vectors, which is not illustrated below; and (d) does not care about the order of the arguments in the LCS call, which is also not shown.

    So the "hello" in a had no LCS in b because the "hello" at the start of b's first element is followed by further characters. Well, that's my current hypothesis.
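    One possible reading of these results, offered as an assumption rather than something confirmed in this thread: LCS computes a longest common subsequence over the elements of the two vectors, so each element is compared as a whole token and "hello" never equals "hello123l5678o". The base-R sketch below is not qualV's code; lcs_tokens is a hypothetical helper that implements the textbook dynamic-programming subsequence, and it gives the same results as Points A and B.

    ```r
    # Hypothetical helper (NOT part of qualV): longest common subsequence
    # of two vectors, comparing whole elements with ==.
    lcs_tokens <- function(a, b) {
      n <- length(a)
      m <- length(b)
      # len[i + 1, j + 1] holds the LCS length of a[1:i] and b[1:j]
      len <- matrix(0L, n + 1, m + 1)
      for (i in seq_len(n)) {
        for (j in seq_len(m)) {
          len[i + 1, j + 1] <- if (a[i] == b[j]) {
            len[i, j] + 1L
          } else {
            max(len[i, j + 1], len[i + 1, j])
          }
        }
      }
      # Walk back from the bottom-right corner to recover the elements
      out <- a[0]
      i <- n
      j <- m
      while (i > 0 && j > 0) {
        if (a[i] == b[j]) {
          out <- c(a[i], out)
          i <- i - 1
          j <- j - 1
        } else if (len[i, j + 1] >= len[i + 1, j]) {
          i <- i - 1
        } else {
          j <- j - 1
        }
      }
      out
    }

    print(lcs_tokens(c("hello", "hel", "abcd"), c("hello123l5678o", "abcd")))  # "abcd"
    print(lcs_tokens(c("hello", "hel", "abcd1"), c("hello123l5678o", "abcd"))) # character(0)
    print(lcs_tokens(c("hello", "hel", "abcd"), c("hello", "abcd")))           # "hello" "abcd"
    ```

    If that reading is right, splitting each string into characters first, for example with strsplit(x, "")[[1]], would compare characters instead of whole strings. Element order would also matter, because a subsequence is ordered.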

    Point A above:

    library(qualV)  # provides LCS()

    a <- c("hello", "hel", "abcd")
    b <- c("hello123l5678o", "abcd")
    print(LCS(a, b)[4]) # "abcd" - perhaps because nothing follows it, unlike hello123...

    a <- c("hello", "hel", "abcd1") # added 1 to abcd
    b <- c("hello123l5678o", "abcd")
    print(LCS(a, b)[4]) # no LCS! As if anything beyond an otherwise common string invalidates it

    a <- c("hello", "hel", "abcd")
    b <- c("hello1", "abcd") # added 1 to hello
    print(LCS(a, b)[4]) # "abcd" only, since the hello1 in b has an extra character
    

    Point B above:

    a <- c("hello", "hel", "abcd")
    b <- c("hello", "abcd")
    print(LCS(a, b)[4]) # found both, so it returns all matches (like gsub) rather than only the first (like sub)
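
    If the goal is a contiguous match rather than a subsequence, a character-level longest common substring can be sketched in base R. This is only an illustration under that assumption; longest_common_substring is a hypothetical helper, not a qualV function, and it returns only the first longest run it finds.

    ```r
    # Hypothetical helper (NOT part of qualV): longest run of characters
    # that appears contiguously in both x and y.
    longest_common_substring <- function(x, y) {
      a <- strsplit(x, "")[[1]]
      b <- strsplit(y, "")[[1]]
      best_len <- 0L
      best_end <- 0L
      # prev[j + 1] is the length of the common run ending at a[i - 1], b[j]
      prev <- integer(length(b) + 1)
      for (i in seq_along(a)) {
        cur <- integer(length(b) + 1)
        for (j in seq_along(b)) {
          if (a[i] == b[j]) {
            cur[j + 1] <- prev[j] + 1L
            if (cur[j + 1] > best_len) {
              best_len <- cur[j + 1]
              best_end <- i
            }
          }
        }
        prev <- cur
      }
      if (best_len == 0L) "" else substr(x, best_end - best_len + 1L, best_end)
    }

    print(longest_common_substring("hel1lo", "hello"))         # "hel"
    print(longest_common_substring("hello", "hello123l5678o")) # "hello"
    ```

    In the first call the "1" breaks the run, so only "hel" is returned, whereas a subsequence-based comparison would skip over the "1" and report all five letters.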
    
