I have come across a problem statement to find the all the common sub-strings between the given two sub-strings such a way that in every case you have to print the l
Typically this type of substring matching is done with the assistance of a separate data structure called a Trie (pronounced try). The specific variant that best suits this problem is a suffix tree. Your first step should be to take your inputs and build a suffix tree. Then you'll need to use the suffix tree to determine the longest common substring, which is a good exercise.
You would be better off with a proper algorithm for the task rather than a brute-force approach. Wikipedia describes two common solutions to the longest common substring problem: suffix-tree and dynamic-programming.
The dynamic programming solution takes O(n m) time and O(n m) space. This is pretty much a straightforward Java translation of the Wikipedia pseudocode for the longest common substring:
public static Set<String> longestCommonSubstrings(String s, String t) {
int[][] table = new int[s.length()][t.length()];
int longest = 0;
Set<String> result = new HashSet<>();
for (int i = 0; i < s.length(); i++) {
for (int j = 0; j < t.length(); j++) {
if (s.charAt(i) != t.charAt(j)) {
continue;
}
table[i][j] = (i == 0 || j == 0) ? 1
: 1 + table[i - 1][j - 1];
if (table[i][j] > longest) {
longest = table[i][j];
result.clear();
}
if (table[i][j] == longest) {
result.add(s.substring(i - longest + 1, i + 1));
}
}
}
return result;
}
Now, you want all of the common substrings, not just the longest. You can enhance this algorithm to include shorter results. Let's examine the table for the example inputs eatsleepnightxyz
and eatsleepabcxyz
:
e a t s l e e p a b c x y z
e 1 0 0 0 0 1 1 0 0 0 0 0 0 0
a 0 2 0 0 0 0 0 0 1 0 0 0 0 0
t 0 0 3 0 0 0 0 0 0 0 0 0 0 0
s 0 0 0 4 0 0 0 0 0 0 0 0 0 0
l 0 0 0 0 5 0 0 0 0 0 0 0 0 0
e 1 0 0 0 0 6 1 0 0 0 0 0 0 0
e 1 0 0 0 0 1 7 0 0 0 0 0 0 0
p 0 0 0 0 0 0 0 8 0 0 0 0 0 0
n 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t 0 0 1 0 0 0 0 0 0 0 0 0 0 0
x 0 0 0 0 0 0 0 0 0 0 0 1 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 2 0
z 0 0 0 0 0 0 0 0 0 0 0 0 0 3
eatsleep
result is obvious: that's the 12345678
diagonal streak at the top-left.xyz
result is the 123
diagonal at the bottom-right.a
result is indicated by the 1
near the top (second row, ninth column).t
result is indicated by the 1
near the bottom left.What about the other 1
s at the left, the top, and next to the 6
and 7
? Those don't count because they appear within the rectangle formed by the 12345678
diagonal — in other words, they are already covered by eatsleep
.
I recommend doing one pass doing nothing but building the table. Then, make a second pass, iterating backwards from the bottom-right, to gather the result set.