Find shortest subarray containing all elements

后端 未结 6 658
执念已碎
执念已碎 2020-12-13 04:42

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all numbers with minimal complexity.

The array ca

相关标签:
6条回答
  • 2020-12-13 05:21

    Say the array has n elements, and set has m elements

    Sort the array, noting the reverse index (position in the original array)
    // O (n log n) time
    
    for each element in given set
       find it in the array
    // O (m log n) time  - log n for binary serch, m times
    
    keep track of the minimum and maximum index for each found element
    

    min - max defines your range

    Total time complexity: O ((m+n) log n)

    0 讨论(0)
  • 2020-12-13 05:23

    This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.

    0 讨论(0)
  • 2020-12-13 05:23

    This solution definitely does not run in O(n) time as suggested by some of the pseudocode above, however it is real (Python) code that solves the problem and by my estimates runs in O(n^2):

    def small_sub(A, B):
        len_A = len(A)
        len_B = len(B)
    
        sub_A = []
        sub_size = -1
        dict_b = {}
    
        for elem in B:
            if elem in dict_b:
                dict_b[elem] += 1
            else:
                dict_b.update({elem: 1})
    
        for i in range(0, len_A - len_B + 1):
            if A[i] in dict_b:
                temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
    
                if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                    sub_A = temp_sub
                    sub_size = temp_size
    
        return sub_size, sub_A
    
    def find_sub(A, dict_b):
        index = 0
        for i in A:
            if len(dict_b) == 0:
                break
    
            if i in dict_b:
                dict_b[i] -= 1
    
                if dict_b[i] <= 0:
                    del(dict_b[i])
    
            index += 1
    
        if len(dict_b) > 0:
            return -1, {}
        else:
            return index, A[0:index]
    
    0 讨论(0)
  • 2020-12-13 05:29

    Here's how I solved this problem in linear time using collections.Counter objects

    from collections import Counter
    
    def smallest_subsequence(stream, search):
        if not search:
            return []  # the shortest subsequence containing nothing is nothing
    
        stream_counts = Counter(stream)
        search_counts = Counter(search)
    
        minimal_subsequence = None
    
        start = 0
        end = 0
        subsequence_counts = Counter()
    
        while True:
            # while subsequence_counts doesn't have enough elements to cancel out every
            # element in search_counts, take the next element from search
            while search_counts - subsequence_counts:
                if end == len(stream):  # if we've reached the end of the list, we're done
                    return minimal_subsequence
                subsequence_counts[stream[end]] += 1
                end += 1
    
            # while subsequence_counts has enough elements to cover search_counts, keep
            # removing from the start of the sequence
            while not search_counts - subsequence_counts:
                if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                    minimal_subsequence = stream[start:end]
                subsequence_counts[stream[start]] -= 1
                start += 1
    
    print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
    # [5, 8, 7]
    
    0 讨论(0)
  • 2020-12-13 05:34

    Java solution

        List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
        Set<String> keywords = Arrays.asList("a","b");
    
        Subarray result = new Subarray(-1,-1);
    
        Map<String, Integer> keyWordFreq = new HashMap<>();
    
        int numKeywords = keywords.size();
    
        // slide the window to contain the all the keywords**
        // starting with [0,0]
        for (int left = 0, right = 0 ; right < paragraph.size() ; right++){
    
          // expand right to contain all the keywords
          String currRight = paragraph.get(right);
    
          if (keywords.contains(currRight)){
            keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
          }
    
          // loop enters when all the keywords are present in the current window
          // contract left until the all the keywords are still present
          while (keyWordFreq.size() == numKeywords){
            String currLeft = paragraph.get(left);
    
            if (keywords.contains(currLeft)){
    
              // remove from the map if its the last available so that loop exists
              if (keyWordFreq.get(currLeft).equals(1)){
    
                // now check if current sub array is the smallest
                if((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)){
                  result = new Subarray(left, right);
                }
                keyWordFreq.remove(currLeft);
              }else {
    
                // else reduce the frequcency
                keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
              }
            }
            left++;
          }
    
        }
    
        return result;
      }
    
    0 讨论(0)
  • 2020-12-13 05:39

    Proof of a linear-time solution

    I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)

    The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.

    EDIT: Fixed a glitch in the base case.

    Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no ranges [0, k < j'] that contain all interesting numbers so we don't need to worry about them further. Now find the smallestlargest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.

    To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.

    Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.

    In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:

    # x[0..m-1] is the array of interesting numbers.
    # Load them into a hash/dictionary:
    For i from 0 to m-1:
        isInteresting[x[i]] = 1
    
    i = 0
    nDistinctInteresting = 0
    minRange = infinity
    For j from 0 to n-1:
        If count[a[j]] == 0 and isInteresting[a[j]]:
            nDistinctInteresting++
        count[a[j]]++
    
        If nDistinctInteresting == m:
            # We are in phase 2: contract the left side as far as possible
            While count[a[i]] > 1 or not isInteresting[a[i]]:
                count[a[i]]--
                i++
    
            If j - i < minRange:
                (minI, minJ) = (i, j)
    

    count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).

    0 讨论(0)
提交回复
热议问题