Finding all distinct substrings of a string

Submitted by 偶尔善良 on 2020-07-30 03:15:10

Question


Hello guys, I was given a homework problem that asks me to find all distinct substrings of a string. I have implemented a method that gives me all the substrings of a string, but I need help figuring out how to avoid counting a substring that has already been counted once, because the assignment is to find the distinct ones.

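// `text` (the input string) and `counter` are fields of the enclosing class (not shown)
public int printSubstrings1(int length)
{
    // slide a window of the given length across the text
    for (int i = 0; i < text.length() - length + 1; i++)
    {
        String sub = text.substring(i, length + i);
        System.out.println(sub);
        counter++; // counts every window, including repeated substrings
    }
    return counter;
}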

Here I pass in the length of the substrings I want from the given string; I call this method from another method, once for each length.

For example, for the string "fred" the number of distinct substrings is 10 (a string of length n has n(n+1)/2 substrings counted by position, and 4*5/2 = 10). My method gives the right answer here because the string contains no repeated letters; I am stuck on the case where repeated substrings do occur.

If I input "fred", this is what my method outputs:

length 1
f
r
e
d
length 2
fr
re
ed
length 3
fre
red
length 4
fred
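To make the problem concrete, here is a minimal self-contained sketch of the approach described in the question. The class name `SubstringCounter` and the driver method `countAll` are hypothetical (the question does not show its driver), and `text` and `counter` are assumed to be fields, as the question's code implies. It shows why counting windows works for "fred" but overcounts as soon as substrings repeat.

```java
public class SubstringCounter {
    private final String text; // the input string (assumed field, as in the question)
    private int counter = 0;   // running count shared across calls (assumed field)

    SubstringCounter(String text) { this.text = text; }

    // The question's method: counts every window of the given length, repeats included
    public int printSubstrings1(int length) {
        for (int i = 0; i < text.length() - length + 1; i++) {
            String sub = text.substring(i, length + i);
            System.out.println(sub);
            counter++;
        }
        return counter;
    }

    // Hypothetical driver ("another method"): one call per length 1..n
    public int countAll() {
        int total = 0;
        for (int length = 1; length <= text.length(); length++) {
            System.out.println("length " + length);
            total = printSubstrings1(length); // counter is cumulative, so keep the last value
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(new SubstringCounter("fred").countAll()); // last line: 10
        System.out.println(new SubstringCounter("aaa").countAll());  // last line: 6
    }
}
```

For "aaa" the count is 6 (3 + 2 + 1 windows), although only 3 substrings ("a", "aa", "aaa") are distinct.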


Answer 1:


Here is an example using a Set:

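// `text` is the input string, a field of the enclosing class (as in the question)
public int printSubstrings1(int length) {
    Set<String> set = new HashSet<String>(); // java.util.Set / java.util.HashSet; duplicates are ignored
    for (int i = 0; i < text.length() - length + 1; i++) {
        String sub = text.substring(i, length + i);
        set.add(sub);
    }
    for (String str : set) {
        System.out.println(str);
    }
    return set.size(); // number of distinct substrings of this length
}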
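The method above returns the distinct count for a single length. A minimal sketch of how to get the total, assuming a hypothetical class `DistinctByLength` that holds `text` as a field and a hypothetical driver `countDistinct`: substrings of different lengths can never be equal, so the per-length counts can simply be added, and the question's shared `counter` field is no longer needed.

```java
import java.util.HashSet;
import java.util.Set;

public class DistinctByLength {
    private final String text; // the input string (assumed field)

    DistinctByLength(String text) { this.text = text; }

    // Answer 1's method: distinct substrings of one fixed length
    // (its printing loop is omitted in this sketch)
    public int printSubstrings1(int length) {
        Set<String> set = new HashSet<String>();
        for (int i = 0; i < text.length() - length + 1; i++) {
            set.add(text.substring(i, length + i));
        }
        return set.size();
    }

    // Hypothetical driver: strings of different lengths are never equal,
    // so summing the per-length distinct counts gives the overall distinct count
    public int countDistinct() {
        int total = 0;
        for (int length = 1; length <= text.length(); length++) {
            total += printSubstrings1(length);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(new DistinctByLength("fred").countDistinct());   // 10
        System.out.println(new DistinctByLength("aaa").countDistinct());    // 3
        System.out.println(new DistinctByLength("BANANA").countDistinct()); // 15
    }
}
```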



Answer 2:


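// needs java.util.ArrayList
public ArrayList<String> getAllUniqueSubset(String str) {
    ArrayList<String> set = new ArrayList<String>();
    for (int i = 0; i < str.length(); i++) {           // i + 1 = current substring length
        for (int j = 0; j < str.length() - i; j++) {   // j = start index
            String elem = str.substring(j, j + (i + 1));
            // contains() scans the whole list, so each check is linear in the list size
            if (!set.contains(elem)) {
                set.add(elem);
            }
        }
    }
    return set;
}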



Answer 3:


Insert each new substring into an array, but first check whether it is already there: if it is, don't add it; otherwise, add it. When done, loop through the array and print out the distinct substrings.

To check whether an element exists in an array, create a function that takes an array and a value as parameters. It loops through the array looking for the value and returns true if it finds it; after the loop, it returns false.

e.g.

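// The original omitted the method name; `contains` is a name added here for illustration.
public static boolean contains(String target, String[] arr)
{
  for (int i = 0; i < arr.length; i++) {
      // target.equals(...) avoids a NullPointerException on unused (null) array slots
      if (target.equals(arr[i]))
         return true;
  }
  return false;
}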
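Java arrays have a fixed size, so this approach needs a capacity up front and a count of the filled slots. A minimal sketch under those assumptions: the class `DistinctWithArray`, the method `distinctSubstrings`, and the extra `size` parameter on the helper are hypothetical additions. The capacity n(n+1)/2 is the number of substrings counted by position, which is an upper bound on the distinct ones.

```java
import java.util.Arrays;

public class DistinctWithArray {

    // Answer 3's helper, restricted to the first `size` filled slots so that
    // unused (null) slots are never compared; `size` is an addition for this sketch
    static boolean contains(String target, String[] arr, int size) {
        for (int i = 0; i < size; i++) {
            if (arr[i].equals(target)) return true;
        }
        return false;
    }

    // Collects the distinct substrings of `text` into a plain array
    static String[] distinctSubstrings(String text) {
        int n = text.length();
        String[] arr = new String[n * (n + 1) / 2]; // upper bound on the number of substrings
        int size = 0;
        for (int start = 0; start < n; start++) {
            for (int end = start + 1; end <= n; end++) {
                String sub = text.substring(start, end);
                if (!contains(sub, arr, size)) {
                    arr[size++] = sub; // only add substrings not seen before
                }
            }
        }
        return Arrays.copyOf(arr, size); // trim the unused slots
    }

    public static void main(String[] args) {
        for (String s : distinctSubstrings("aab")) {
            System.out.println(s); // a, aa, aab, ab, b
        }
    }
}
```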



Answer 4:


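This algorithm uses only the Z-function (Z algorithm).

For each prefix s[0..i] of the word, reverse it and compute the Z-function over it. The prefixes of the reversed string correspond one-to-one to the substrings of s that end at position i (read backwards), and such a substring already occurred earlier exactly when it also starts at some later position j of the reversed string, i.e. when its length is at most z[j]. So the number of new distinct substrings that end at i is (the length of the prefix) minus (the maximum value in the Z-array). The code looks like this: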

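#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// Z-array of x: z[j] = length of the longest common prefix of x and x.substr(j); z[0] is left at 0
vector<int> z_function(const string& x) {
    int n = x.size();
    vector<int> z(n, 0);
    for (int j = 1, l = 0, r = 0; j < n; j++) {
        if (j < r) z[j] = min(r - j, z[j - l]);
        while (j + z[j] < n && x[z[j]] == x[j + z[j]]) z[j]++;
        if (j + z[j] > r) { l = j; r = j + z[j]; }
    }
    return z;
}
int main() {
    string s; cin >> s;
    long long sol = 0;
    for (int i = 0; i < (int)s.size(); i++) {
        string x = s.substr(0, i + 1);               // prefix of s ending at position i
        reverse(x.begin(), x.end());
        vector<int> z = z_function(x);               // the prefix function of x works here too
        int mx = *max_element(z.begin(), z.end());   // longest prefix of x that also starts later in x
        sol += (i + 1) - mx;                         // new distinct substrings ending at i
    }
    cout << sol << '\n';
}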

The time complexity of this algorithm is O(n^2): each of the n prefixes costs O(n) for the reversal and the Z-function. The maximum could also be returned directly from z_function.

Source.

This is not my original answer. I am merely linking to it and pasting it in case the link goes down.




Answer 5:


I followed this link and acknowledge that the content comes from a similar answer on Quora.

The solution consists of constructing the suffix array and then counting the distinct substrings using the longest common prefixes (LCPs) of adjacent suffixes in sorted order.

One key observation here is that:

If you look through the prefixes of each suffix of a string, you have covered all substrings of that string.

Let us take an example: BANANA

Suffixes are: 0) BANANA 1) ANANA 2) NANA 3) ANA 4) NA 5) A

It would be a lot easier to go through the prefixes if we sort the above set of suffixes, as we can skip the repeated prefixes easily.

Sorted set of suffixes: 5) A 3) ANA 1) ANANA 0) BANANA 4) NA 2) NANA

From now on,

LCP = Longest Common Prefix of 2 strings; in the arithmetic below, LCP stands for the length of that prefix.

Initialize

ans = length(first suffix) = length("A") = 1.

Now consider the consecutive pairs of suffixes, i.e., [A, ANA], [ANA, ANANA], [ANANA, BANANA], etc., from the above set of sorted suffixes.

We can see that, LCP("A", "ANA") = "A".

Every prefix of "ANA" that is longer than the common prefix is a new distinct substring. In the above case these are "AN" and "ANA", one for each of the characters 'N' and 'A' beyond the common prefix, so they should be added to ans.

So we have:
ans += length("ANA") - LCP("A", "ANA")
ans = ans + 3 - 1 = ans + 2 = 3

Do the same for the next pair of consecutive suffixes: ["ANA", "ANANA"]

LCP("ANA", "ANANA") = "ANA", so:
ans += length("ANANA") - length(LCP) => ans = ans + 5 - 3 => ans = 3 + 2 = 5

Similarly, we have:

LCP("ANANA", "BANANA") has length 0: ans = ans + length("BANANA") - 0 = 5 + 6 = 11

LCP("BANANA", "NA") has length 0: ans = ans + length("NA") - 0 = 11 + 2 = 13

LCP("NA", "NANA") has length 2: ans = ans + length("NANA") - 2 = 13 + 4 - 2 = 15

Hence the number of distinct substrings for the string "BANANA" = 15.
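The walkthrough above can be sketched in Java as follows. This is a minimal sketch, not an efficient implementation: the class and method names are hypothetical, and the suffix array is built by naively sorting suffix start indices with `String.compareTo` (roughly O(n^2 log n) character comparisons in the worst case) instead of with a dedicated suffix-array construction algorithm. The LCP of each adjacent pair is computed by direct character comparison.

```java
import java.util.Arrays;

public class DistinctSubstringsSuffixArray {

    // Sum over suffixes in sorted order of (suffix length - LCP with the previous suffix).
    // ASSUMPTION: naive suffix sorting, chosen for clarity rather than speed.
    static long countDistinct(String s) {
        int n = s.length();
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> s.substring(a).compareTo(s.substring(b)));

        long ans = 0;
        for (int k = 0; k < n; k++) {
            int lcp = (k == 0) ? 0 : lcpLength(s, sa[k - 1], sa[k]);
            ans += (n - sa[k]) - lcp; // prefixes of this suffix not shared with the previous one
        }
        return ans;
    }

    // Length of the longest common prefix of s.substring(a) and s.substring(b)
    static int lcpLength(String s, int a, int b) {
        int len = 0;
        while (a + len < s.length() && b + len < s.length()
                && s.charAt(a + len) == s.charAt(b + len)) {
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("BANANA")); // 15
        System.out.println(countDistinct("fred"));   // 10
    }
}
```

For "BANANA" the sorted order is A, ANA, ANANA, BANANA, NA, NANA, and the loop adds 1 + 2 + 2 + 6 + 2 + 2 = 15, the same steps as the walkthrough.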




Answer 6:


There are two ways you could do this. I'm not sure whether your teacher permits it, but I am going to use a HashSet for uniqueness.

Without using 'substring()':

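// needs java.util.LinkedHashSet and java.util.Set
void uniqueSubStrings(String test) {
    // LinkedHashSet keeps insertion order and drops duplicates
    Set<String> substrings = new LinkedHashSet<>();
    char[] a = test.toCharArray();
    for (int i = 0; i < test.length(); i++) {
        substrings.add(String.valueOf(a[i])); // single-character substring
        for (int j = i + 1; j < test.length(); j++) {
            // build the substring a[i..j] character by character
            StringBuilder sb = new StringBuilder();
            for (int k = i; k <= j; k++) {
                sb.append(a[k]);
            }
            substrings.add(sb.toString());
        }
    }
    System.out.println(substrings);
}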

Using 'substring':

// needs java.util.LinkedHashSet and java.util.Set
void uniqueSubStringsWithBuiltIn(String test) {
    Set<String> substrings = new LinkedHashSet<>();
    // every pair i < j gives the substring test[i, j)
    for (int i = 0; i < test.length(); i++) {
        for (int j = i + 1; j <= test.length(); j++) {
            substrings.add(test.substring(i, j));
        }
    }
    System.out.println(substrings);
}


Source: https://stackoverflow.com/questions/13213783/finding-all-distinct-substring-of-a-string
