How to find smallest substring which contains all characters from a given string?

灰色年华 2020-12-02 07:36

I recently came across an interesting question about strings. Suppose you are given the following:

Input string1: "this is a test string"
Input strin         


        
  •  感情败类
    2020-12-02 08:07

    Edit: apparently there's an O(n) algorithm (cf. algorithmist's answer). Obviously this will beat the [naive] baseline described below!

    Too bad I gotta go... I'm a bit suspicious that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!
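The O(n) answer mentioned above is not reproduced on this page, so the following is an assumption about what it describes: the standard sliding-window technique. This is a minimal Python sketch with a hypothetical function name, `min_window`. It treats "contains all characters" as respecting multiplicity, so two `t`s in str2 require two `t`s in the window.

```python
from collections import Counter


def min_window(s: str, t: str) -> str:
    """Return the shortest substring of s containing every character of t
    (with multiplicity), or "" if no such substring exists."""
    if not t:
        return ""
    need = Counter(t)      # how many more of each char the window needs
    missing = len(t)       # total chars of t not yet covered by the window
    best_start, best_len = 0, len(s) + 1
    left = 0
    for right, ch in enumerate(s):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1      # negative value = surplus copies inside the window
        if missing == 0:
            # Shrink from the left while the window still covers t.
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            if right - left + 1 < best_len:
                best_start, best_len = left, right - left + 1
            # Drop one required char so the window becomes invalid again.
            need[s[left]] += 1
            missing += 1
            left += 1
    return "" if best_len > len(s) else s[best_start:best_start + best_len]
```

Each character enters and leaves the window at most once, so the scan is linear in len(s), plus len(t) to build the counts. For example, `min_window("this is a test string", "tist")` returns `"t stri"`.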

    Tentative algorithm:
    The general idea is to take, in turn, each character of str2 found in str1 as the starting point of a search (in either or both directions) for all the other letters of str2. By keeping a "length of best match so far" value, we can abort a search as soon as it exceeds this length. Other heuristics could probably be used to abort more of the (so far) suboptimal searches early. The order in which starting letters of str1 are tried matters a lot: it is suggested to start with the letter(s) of str1 that have the lowest count, and to try the other letters, in order of increasing count, in subsequent attempts.
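A minimal sketch of just the ordering heuristic, assuming Python and a hypothetical helper name, `start_positions`. It lists the indices in str1 of every character that also appears in str2, rarest letters first (by their count in str1), breaking ties by position.

```python
from collections import Counter


def start_positions(str1: str, str2: str) -> list[int]:
    """Hypothetical helper: candidate start indices in str1, rarest letters first."""
    counts = Counter(str1)    # how often each letter occurs in str1
    wanted = set(str2)        # only letters of str2 can start a search
    positions = [i for i, ch in enumerate(str1) if ch in wanted]
    # Sort by (count of that letter in str1, position in str1).
    return sorted(positions, key=lambda i: (counts[str1[i]], i))
```

For the example input, `start_positions("this is a test string", "tist")` puts the three `i` positions (2, 5, 18) first, because `i` occurs three times in str1 while `t` and `s` occur four times each.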

      [loose pseudo-code]
      - get count for each letter/character in str1  (number of As, Bs etc.)
      - get count for each letter in str2
      - minLen = length(str1) + 1  (the +1 indicates you're not sure all chars of 
                                    str2 are in str1)
      - Starting with the letter from str2 which is found the least in str1,
        look for other letters of str2, in either direction of str1, until you've
        found them all (or not, in which case response = impossible => done!).
        set x = length(corresponding substring of str1).
      - if (x < minLen),
              set minLen = x,
              also memorize the start/len of the str1 substring.
      - continue trying with other letters of str1 (going up the frequency
        list in str1), but abort search as soon as length(substring of str1)
        reaches or exceeds minLen.
        We can find a few other heuristics that would allow aborting a
        particular search, based on [pre-calculated ?] distance between a given
        letter in str1 and some (all?) of the letters in str2.
      - the overall search terminates when minLen = length(str2) or when
        we've used all letters of str1 (which match one letter of str2)
        as a starting point for the search
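A runnable interpretation of the pseudo-code above, offered only as a sketch. The names `scan_length` and `smallest_window_heuristic` are hypothetical, "either direction" is read as one forward and one backward scan from each start position, and returning `None` stands in for "impossible". Only the minLen pruning rule is implemented; the distance-based heuristics are not.

```python
from collections import Counter
from typing import Optional


def scan_length(s: str, need: Counter, start: int, step: int, limit: int) -> Optional[int]:
    """Walk s from `start` in direction `step` (+1 forward, -1 backward),
    collecting the letters in `need`. Return the walked length once all are
    collected, or None if the walk falls off s or would reach `limit` (minLen)."""
    remaining = Counter(need)          # copy of the per-letter counts still needed
    missing = sum(remaining.values())
    i, length = start, 0
    while 0 <= i < len(s) and length < limit - 1:
        ch = s[i]
        length += 1
        if remaining[ch] > 0:
            remaining[ch] -= 1
            missing -= 1
            if missing == 0:
                return length
        i += step
    return None


def smallest_window_heuristic(str1: str, str2: str) -> Optional[str]:
    need = Counter(str2)
    counts = Counter(str1)
    # If str1 lacks enough copies of some letter, the answer is "impossible".
    if any(counts[c] < k for c, k in need.items()):
        return None
    if not str2:
        return ""
    min_len = len(str1) + 1            # sentinel: no match found yet
    best_start = 0
    # Start positions: letters of str2 found in str1, rarest (in str1) first.
    starts = sorted((i for i, ch in enumerate(str1) if ch in need),
                    key=lambda i: (counts[str1[i]], i))
    for p in starts:
        for step in (1, -1):
            x = scan_length(str1, need, p, step, min_len)
            if x is not None:          # scan_length only returns x < min_len
                min_len = x
                best_start = p if step == 1 else p - x + 1
                if min_len == len(str2):   # cannot do better than this
                    return str1[best_start:best_start + min_len]
    return str1[best_start:best_start + min_len]
```

Because the shortest window must begin at some character of str2, the forward scans alone already guarantee the optimum; the pruning only skips work. The worst case can still be quadratic, which is why a linear-time approach would beat it.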
    
