I am working in C# to find all the common substrings between two strings. For instance, if the input is:
S1 = "need asssitance with email"
S2 = "email assistance needed"
Use set intersections.
Start with a routine that finds all possible substrings of a string. Here it is in Python; translating it to C# is an 'exercise for the reader':
```python
def allSubstr(instring):
    # Collect every substring of instring, including the empty string ''
    retset = set()
    totlen = len(instring)
    for thislen in range(0, totlen + 1):
        # Only use start positions where a substring of this length fits
        for startpos in range(0, totlen - thislen + 1):
            retset.add(instring[startpos:startpos + thislen])
    print("retset total: %s" % (retset,))
    return retset

set1 = allSubstr('abcdefg')
set2 = allSubstr('cdef')
print(set1.intersection(set2))
```
Here's the output (from Python 2; Python 3 displays sets as {...} instead of set([...])):
```
>>> set1 = allSubstr('abcdefg')
retset total: set(['', 'cde', 'ab', 'ef', 'cd', 'abcdef', 'abc', 'efg', 'bcde', 'cdefg', 'bc', 'de', 'bcdef', 'abcd', 'defg', 'fg', 'cdef', 'a', 'c', 'b', 'e', 'd', 'g', 'f', 'bcd', 'abcde', 'abcdefg', 'bcdefg', 'def'])
>>> set2 = allSubstr('cdef')
retset total: set(['', 'cde', 'c', 'ef', 'e', 'd', 'f', 'de', 'cd', 'cdef', 'def'])
>>>
>>> set1.intersection(set2)
set(['', 'cde', 'c', 'de', 'e', 'd', 'f', 'ef', 'cd', 'cdef', 'def'])
```
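The intersection above is all of set2 because 'cdef' occurs whole inside 'abcdefg'. If memory is a concern, you don't strictly need the substring set of both strings. Here is a sketch of an alternative (my own variant, not the set-intersection code above; `common_substrings_scan` is a hypothetical name): it enumerates the non-empty substrings of the shorter string and keeps those that Python's `in` operator finds inside the longer one. Apart from the empty string, the result should match the intersection.

```python
def common_substrings_scan(s1, s2):
    """Return every non-empty substring of the shorter string that also occurs in the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n = len(shorter)
    found = set()
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            sub = shorter[start:start + length]
            # 'in' on two strings is a substring search, so no set is built for the longer string
            if sub in longer:
                found.add(sub)
    return found

print(sorted(common_substrings_scan('abcdefg', 'cdef'), key=lambda s: (len(s), s)))
# ['c', 'd', 'e', 'f', 'cd', 'de', 'ef', 'cde', 'def', 'cdef']
```

This trades memory for time: each `in` test searches the longer string again.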
Now, you're probably not interested in the empty string or in substrings of length 1. You can always add a minimum-length check before the set.add() call.
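Here is a sketch of that length limit, applied to the strings from the question. `all_substrings` and its `min_len` parameter are hypothetical names, and the threshold of 4 is an arbitrary choice. Starting the inner loop at `start + min_len` means short substrings never reach `set.add()`; with any `min_len` of 2 or more, the empty string and single characters are excluded.

```python
def all_substrings(s, min_len=2):
    """Return the set of substrings of s that have at least min_len characters (hypothetical helper)."""
    result = set()
    n = len(s)
    for start in range(n):
        # The length limit is applied here, before anything is added to the set
        for end in range(start + min_len, n + 1):
            result.add(s[start:end])
    return result

s1 = "need asssitance with email"
s2 = "email assistance needed"
common = all_substrings(s1, min_len=4) & all_substrings(s2, min_len=4)

# Show the longest matches first
for sub in sorted(common, key=lambda x: (-len(x), x))[:5]:
    print(repr(sub))
```

Because spaces are ordinary characters here, the longest match is 'tance ' with its trailing space; strip or split on whitespace first if you only want whole words.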