Finding shortest repeating cycle in word?

Submitted by 社会主义新天地 on 2019-11-30 06:32:41

Question


I'm about to write a function that returns the shortest period, that is, the shortest group of letters which, when repeated, creates the given word.

For example, the word abkebabkebabkeb is created by repeating abkeb. I would like to know how to analyze the input word efficiently to find the shortest period of characters that creates it.


Answer 1:


An O(n) solution that assumes the pattern must cover the entire string. The key observation: we grow a candidate pattern and test it as we go, and when a character fails to match, the candidate is extended to include everything tested so far, so those characters never have to be re-examined.

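def pattern(inputv):
    # Caveat: jumping the candidate to inputv[0:j+1] on a mismatch can skip a
    # shorter valid period, e.g. 'ababaababa' yields 'ababaab', not 'ababa'.
    pattern_end = 0  # index of the last character of the candidate pattern
    for j in range(pattern_end + 1, len(inputv)):
        pattern_dex = j % (pattern_end + 1)
        if inputv[pattern_dex] != inputv[j]:
            # Mismatch: extend the candidate to cover everything seen so far.
            pattern_end = j
            continue

        if j == len(inputv) - 1:
            print(pattern_end)
            return inputv[0:pattern_end + 1]
    return inputv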



Answer 2:


Here is a correct O(n) algorithm. The first for loop is the table building portion of KMP. There are various proofs that it always runs in linear time.

Since this question has 4 previous answers, none of which are O(n) and correct, I heavily tested this solution for both correctness and runtime.

def pattern(inputv):
    if not inputv:
        return inputv

    nxt = [0]*len(inputv)
    for i in range(1, len(nxt)):
        k = nxt[i - 1]
        while True:
            if inputv[i] == inputv[k]:
                nxt[i] = k + 1
                break
            elif k == 0:
                nxt[i] = 0
                break
            else:
                k = nxt[k - 1]

    smallPieceLen = len(inputv) - nxt[-1]
    if len(inputv) % smallPieceLen != 0:
        return inputv

    return inputv[0:smallPieceLen]
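
The testing mentioned above is not shown in the answer. One way it could be done, as a rough sketch: compare the KMP-based function against a slow but obviously correct reference on many short random strings. The names brute_force and cross_check are hypothetical helpers introduced here for illustration, and the failure-table loop is a compact restatement of the loop above, not a different algorithm.

```python
import random


def pattern(inputv):
    """Shortest unit whose whole repetitions form inputv (KMP failure table)."""
    if not inputv:
        return inputv
    nxt = [0] * len(inputv)
    for i in range(1, len(nxt)):
        k = nxt[i - 1]
        # Fall back through shorter borders until one can be extended.
        while k > 0 and inputv[i] != inputv[k]:
            k = nxt[k - 1]
        if inputv[i] == inputv[k]:
            k += 1
        nxt[i] = k
    small_piece_len = len(inputv) - nxt[-1]
    if len(inputv) % small_piece_len != 0:
        return inputv
    return inputv[:small_piece_len]


def brute_force(inputv):
    """Hypothetical reference: try every divisor length, shortest first."""
    n = len(inputv)
    for size in range(1, n + 1):
        if n % size == 0 and inputv[:size] * (n // size) == inputv:
            return inputv[:size]
    return inputv


def cross_check(trials=2000, seed=0):
    """Compare both functions on short random strings over a tiny alphabet."""
    rng = random.Random(seed)
    for _ in range(trials):
        unit = "".join(rng.choice("ab") for _ in range(rng.randint(0, 6)))
        word = unit * rng.randint(1, 4)
        assert pattern(word) == brute_force(word), word
    return True
```

A two-letter alphabet is used on purpose: short strings over few symbols produce many overlapping borders, which is where incorrect period-finding code tends to fail.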



Answer 3:


Here is an example in PHP:

<?php
function getrepeatedstring($string) {
    if (strlen($string)<2) return $string;
    for($i = 1; $i<strlen($string); $i++) {
        // intdiv() (PHP 7+) gives an integer repeat count instead of a float
        if (substr(str_repeat(substr($string, 0, $i), intdiv(strlen($string), $i) + 1), 0, strlen($string)) == $string)
            return substr($string, 0, $i);
    }
    return $string;
}
?>



Answer 4:


I believe there is a very elegant recursive solution. Many of the proposed solutions handle the extra complication where the string ends with only part of the pattern, as in abcabca, but I do not think the question asks for that.

My solution for the simple version of the problem, in Clojure:

;; assumes (require '[clojure.string :as str])
(defn find-shortest-repeating [pattern string]
  (if (empty? (str/replace string pattern ""))
    pattern
    (find-shortest-repeating (str pattern (nth string (count pattern))) string)))

(find-shortest-repeating "" "abcabcabc") ;; "abc"

But be aware that this will not find patterns that are incomplete at the end.
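
For readers who do not use Clojure, the same idea can be sketched in Python: grow a prefix one character at a time and stop as soon as deleting every occurrence of it leaves nothing. This sketch uses a loop instead of recursion, and the function name is simply carried over from the Clojure version.

```python
def find_shortest_repeating(string):
    """Grow a prefix until removing all its occurrences empties the string."""
    for size in range(1, len(string) + 1):
        candidate = string[:size]
        # str.replace removes non-overlapping occurrences from left to right,
        # so an empty result means the string is candidate repeated whole.
        if string.replace(candidate, "") == "":
            return candidate
    return string  # only reached for the empty string


print(find_shortest_repeating("abcabcabc"))
```

Like the Clojure version, it returns the whole string for an input such as abcabca, and each candidate costs a full pass over the string, so the running time is quadratic in the worst case.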




Answer 5:


Based on your post, I found a solution that can also handle an incomplete pattern at the end:

(defn find-shortest-repeating [pattern string]
   (if (or (empty? (clojure.string/split string (re-pattern pattern)))
          (empty? (second (clojure.string/split string (re-pattern pattern)))))
    pattern
    (find-shortest-repeating (str pattern (nth string (count pattern))) string)))



Answer 6:


My solution: the idea is to find a prefix, starting at position zero, whose length divides the input length and which equals every following block of the same length. When such a prefix is found, it is printed. Please note that if no repeating substring is found, the entire input string is printed.

public static void repeatingSubstring(String input){
    for(int i=0;i<input.length();i++){
        if(i==input.length()-1){
            System.out.println("There is no repetition "+input);
        }
        else if(input.length()%(i+1)==0){
            int size = i+1;
            String candidate = input.substring(0, size);
            // Compare the candidate against every block, not only the adjacent one,
            // so that inputs such as "ababcdcd" are not reported as repeating "ab".
            boolean repeats = true;
            for(int start=size; start<input.length() && repeats; start+=size){
                repeats = input.startsWith(candidate, start);
            }
            if(repeats){
                System.out.println("The subString which repeats itself is "+candidate);
                break;
            }
        }
    }
}



Answer 7:


Regex solution:

Step 1: Follow each character with a delimiter character that isn't part of the input string, including a trailing one (e.g. ~):

(.)
$1~

Example input: "abkebabkebabkeb"
Example output: "a~b~k~e~b~a~b~k~e~b~a~b~k~e~b~"

Each step is shown as a pattern line followed by its replacement line and can be tried online in Retina. (NOTE: Retina is a regex-based programming language designed for quick testing of regexes and for competing in code-golf challenges.)

Step 2: Use the following regex to find the shortest repeating substring (where ~ is our chosen delimiter character):

^(([^~]+~)*?)\1*$
$1

Explanation:

^(([^~]+~)*?)\1*$
^               $    # Start and end, to match the entire input string
  ([^~]+~)           # Capture group 2: one or more non-'~' characters followed by a '~'
 (        *?)        # Capture group 1: group 2 repeated zero or more times, as few as possible (lazy)
             \1*     # Followed by capture group 1 repeated zero or more times

$1                   # Replace the entire input string with the capture group 1 match

Example input: "a~b~k~e~b~a~b~k~e~b~a~b~k~e~b~"
Example output: "a~b~k~e~b~"

Step 3: Remove our delimiter character again, to get our intended result.

~
<empty>

Example input: "a~b~k~e~b~"
Example output: "abkeb"

The original answer also links to an example implementation in Java.
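
As an illustration only, the three steps could be chained in Python with the standard re module. The function name shortest_cycle_regex and the DELIM constant are hypothetical, and the sketch assumes, as the answer does, that the delimiter never occurs in the input.

```python
import re

DELIM = "~"  # assumed to be absent from the input, as in the steps above


def shortest_cycle_regex(word):
    """Apply the three substitutions from the steps above in sequence."""
    if DELIM in word:
        raise ValueError("delimiter must not appear in the input")
    # Step 1: follow every character with the delimiter.
    delimited = re.sub(r"(.)", r"\1~", word, flags=re.DOTALL)
    # Step 2: keep the shortest leading block whose repetitions fill the string.
    unit = re.sub(r"^(([^~]+~)*?)\1*$", r"\1", delimited)
    # Step 3: remove the delimiter again.
    return unit.replace(DELIM, "")


print(shortest_cycle_regex("abkebabkebabkeb"))
```

Because group 1 is lazy, the regex engine tries the shortest candidate block first and only lengthens it after the backreference fails to consume the rest of the string. For a word with no repetition this means one attempt per prefix length.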




Answer 8:


Super delayed answer, but I got this question in an interview. Here was my answer (probably not optimal, but it works for strange test cases as well).

private void run(String[] args) throws IOException {
    File file = new File(args[0]);
    BufferedReader buffer = new BufferedReader(new FileReader(file));
    String line;
    while ((line = buffer.readLine()) != null) {
        ArrayList<String> subs = new ArrayList<>();
        String t = line.trim();
        String out = null;
        for (int i = 0; i < t.length(); i++) {
            if (t.substring(0, t.length() - (i + 1)).equals(t.substring(i + 1, t.length()))) {
                subs.add(t.substring(0, t.length() - (i + 1)));
            }
        }
        subs.add(0, t);
        for (int j = subs.size() - 2; j >= 0; j--) {
            String match = subs.get(j);
            int mLength = match.length();
            if (j != 0 && mLength <= t.length() / 2) {
                if (t.substring(mLength, mLength * 2).equals(match)) {
                    out = match;
                    break;
                }
            } else {
                out = match;
            }
        }
        System.out.println(out);
    }
}

Test cases:

abcabcabcabc
bcbcbcbcbcbcbcbcbcbcbcbcbcbc
dddddddddddddddddddd
adcdefg
bcbdbcbcbdbc
hellohell

Code returns:

abc
bc
d
adcdefg
bcbdbc
hellohell
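
The inputs and outputs listed above also make a handy regression table for any other implementation. As a sketch, the check below runs them against a different technique from the Java code: the string-doubling trick, where the first position past index 0 at which the word reappears inside word + word gives the length of its shortest repeating unit. The name shortest_unit and the CASES table are hypothetical and introduced only for this example.

```python
def shortest_unit(word):
    """Shortest block whose whole repetitions spell word (doubling trick)."""
    if not word:
        return word
    # word always reappears at index len(word) in word + word, so find()
    # cannot fail here; an earlier hit means a shorter repeating unit.
    shift = (word + word).find(word, 1)
    return word[:shift]


# Inputs and expected outputs copied from the test cases listed above.
CASES = {
    "abcabcabcabc": "abc",
    "bcbcbcbcbcbcbcbcbcbcbcbcbcbc": "bc",
    "dddddddddddddddddddd": "d",
    "adcdefg": "adcdefg",
    "bcbdbcbcbdbc": "bcbdbc",
    "hellohell": "hellohell",
}

for word, expected in CASES.items():
    assert shortest_unit(word) == expected, word
```

The hellohell case is the interesting one: the trailing hell is only part of a repetition, so both approaches return the whole string.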




Answer 9:


Works in cases such as bcbdbcbcbdbc.

function smallestRepeatingString(sequence){
  var currentRepeat = '';
  var currentRepeatPos = 0;

  for(var i=0, ii=sequence.length; i<ii; i++){
    if(currentRepeat[currentRepeatPos] !== sequence[i]){
      currentRepeatPos = 0;
      // Add the next available character to the repeat and reset i so we don't miss any matches in between
      currentRepeat = currentRepeat + sequence.slice(currentRepeat.length, currentRepeat.length+1);
      i = currentRepeat.length-1;
    }else{
      currentRepeatPos++;
    }
    if(currentRepeatPos === currentRepeat.length){
      currentRepeatPos = 0;
    }
  }

  // If repeat wasn't reset then we didn't find a full repeat at the end.
  if(currentRepeatPos !== 0){ return sequence; }

  return currentRepeat;
}



Answer 10:


I came up with a simple solution that works flawlessly even with very large strings.
PHP Implementation:

function get_srs($s){
    $hash = md5( $s );
    $i = 0; $p = '';

    do {
        $p .= $s[$i++];
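        // preg_quote() keeps regex metacharacters in the input from being interpreted
        preg_match_all( '/' . preg_quote( $p, '/' ) . '/', $s, $m );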
    } while ( ! hash_equals( $hash, md5( implode( '', $m[0] ) ) ) );

    return $p;
}


Source: https://stackoverflow.com/questions/6021274/finding-shortest-repeating-cycle-in-word
