Determine if all elements in a list are present and in the same order in another list

难免孤独 2021-01-02 15:35

How do I create a function sublist() that takes two lists, list1 and list2, and returns True if list1 is a s

10 Answers
  • 2021-01-02 15:41

    Congrats on a deceptively hard question. I think this will work, but I will not be shocked if I missed a corner case, especially with repeated elements. Revised version inspired by Huu Nguyen's recursive solution:

    def sublist(a, b):
        index_a = 0
        index_b = 0
        len_a = len(a)
        len_b = len(b)
        while index_a < len_a and index_b < len_b:
            if a[index_a] == b[index_b]:
                index_a += 1
                index_b += 1
            else:
                index_b += 1
        return index_a == len_a
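
    The worry about corner cases with repeated elements can be checked mechanically. The following is a minimal sketch: the function is restated from the answer above in compact form, while the `CASES` table and the `failures` helper are hypothetical additions for illustration, not part of the original post.

```python
def sublist(a, b):
    """Two-pointer check, restated compactly from the answer above."""
    index_a = index_b = 0
    while index_a < len(a) and index_b < len(b):
        if a[index_a] == b[index_b]:
            index_a += 1
        index_b += 1  # b advances on every step, match or not
    return index_a == len(a)


# Hypothetical corner cases (assumed for illustration): (a, b, expected).
CASES = [
    ([], [], True),                   # an empty a is trivially a sublist
    ([], [1, 2], True),
    ([1], [], False),
    ([1, 1], [1], False),             # a repeated element needs two matches in b
    ([1, 1], [1, 2, 1], True),
    ([1, 2, 1], [1, 1, 2], False),
    ([2, 2, 3], [2, 3, 2, 3], True),
    ([3, 2], [2, 3], False),          # right elements, wrong order
]


def failures(fn):
    """Return the cases on which fn disagrees with the expected result."""
    return [(a, b, want) for a, b, want in CASES if fn(a, b) != want]


print(failures(sublist))  # prints [] when every case passes
```

    Passing any other implementation from this page to `failures` gives a quick comparison on the same cases.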
    

    Some rough profiling:

    Given lists that require traversing most or all of b, my algorithm suffers:

    a = [1, 3, 999999]
    b = list(range(1000000))
    

    On my PC, Huu Nguyen's or Hetman's algorithm takes about 10 seconds to run 100 iterations of the check. My algorithm takes 20 seconds.

    Given an earlier success, Huu's algorithm falls vastly behind:

    a = [1, 3, 5]
    

    Hetman's algorithm or mine can complete 100k checks in under a second - Hetman's in 0.13 seconds on my PC, mine in 0.19 seconds. Huu's takes 16 seconds to complete 1k checks. I am frankly astounded at that degree of difference - recursion can be slow if not compiler optimized, I know, but 4 orders of magnitude is worse than I would have expected.

    When a cannot match (a failure case), performance heads back towards what I saw when the whole second list had to be traversed. That is understandable: there is no way to know that a matching sequence won't turn up at the end of b.

    a = [3, 1, 5]
    

    Again, about 10 seconds for Huu Nguyen's or Hetman's algorithm for 100 tests, 20 for mine.

    Longer ordered lists maintain the pattern I saw for early success. E.g.:

    a = range(0, 1000, 20)
    

    With Hetman's algorithm that took 10.99 seconds to complete 100k tests, while mine took 24.08. Huu's took 28.88 to complete 100 tests.

    These are admittedly not the full range of tests you could run, but in all cases Hetman's algorithm performed the best.
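
    For anyone who wants to repeat this kind of comparison, here is a rough sketch of a timing harness built on the standard `timeit` module. It is not the harness used for the numbers above. The names `sublist_pointers`, `sublist_index`, `time_checks` and `compare` are hypothetical, and the default sizes are scaled down so that it finishes quickly. Pass `b_size=1000000` and `number=100` to approximate the setup described here.

```python
import timeit


def sublist_pointers(a, b):
    """Two-pointer version (pure Python loop)."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
        j += 1
    return i == len(a)


def sublist_index(a, b):
    """list.index version (inner search runs inside list.index)."""
    pos = -1
    try:
        for e in a:
            pos = b.index(e, pos + 1)
    except ValueError:
        return False
    return True


def time_checks(fn, a, b, number):
    """Seconds taken to run fn(a, b) `number` times."""
    return timeit.timeit(lambda: fn(a, b), number=number)


def compare(b_size=100_000, number=5):
    """Time each candidate on a late match, an early match and a failure."""
    b = list(range(b_size))
    scenarios = {
        "late match": [1, 3, b_size - 1],
        "early match": [1, 3, 5],
        "no match": [3, 1, 5],
    }
    results = {}
    for label, a in scenarios.items():
        for fn in (sublist_pointers, sublist_index):
            results[(label, fn.__name__)] = time_checks(fn, a, b, number)
    return results


for (label, name), seconds in compare().items():
    print(f"{label:12} {name:18} {seconds:.4f}s")
```

    Absolute timings depend on the machine and the Python version, so only the relative ordering is worth comparing.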

  • 2021-01-02 15:44

    Here's another solution that may be easier for novices to understand than Hetman's. (Notice that it's very close to the OP's implementation in this duplicate question, but avoids the problem of restarting the search from the start of b each time.)

    def sublist(a, b):
        i = -1
        try:
            for e in a:
                i = b.index(e, i+1)
        except ValueError:
            return False
        else:
            return True
    

    Of course this requires b to be a list, while Hetman's answer allows any iterable. And I think that (for people who understand Python well enough) it's less simple than Hetman's answer, too.

    Algorithmically, it's doing the same thing as Hetman's answer, so it's O(N) time and O(1) space. But practically, it may be faster, at least in CPython, since we're moving the inner loop from a Python while around an iterator to a C fast-getindex loop (inside list.index). Then again, it may be slower, because we're copying around that i value instead of having all state embedded inside a (C-implemented) iterator. If it matters, test them both with your real data. :)
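
    To illustrate the "any iterable" point, here is a minimal sketch of an iterator-based check. It is not necessarily the code from Hetman's answer, which is not reproduced on this page, and `sublist_iter` is a hypothetical name.

```python
def sublist_iter(a, b):
    """True if the items of a appear in b in the same order (gaps allowed)."""
    it = iter(b)
    # `e in it` consumes the iterator up to and including the first match,
    # so each later search resumes where the previous one stopped.
    return all(e in it for e in a)


print(sublist_iter([1, 12, 3], [25, 1, 30, 12, 3, 40]))   # True
print(sublist_iter([12, 1, 3], [25, 1, 30, 12, 3, 40]))   # False
print(sublist_iter([1, 3], (n for n in range(5))))        # True, b is a generator
```

    Because b is only ever iterated once, a generator or a file object works here, whereas the `list.index` version above needs a real list.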

  • 2021-01-02 15:54

    For a quick-and-dirty solution that runs slowly, but will be totally adequate for arrays of the size you showed:

    def sublist(a, b):
        last = 0
        for el_a in a:
            if el_a in b[last:]:
                # .index() is relative to the slice: add it to the offset and step past the match
                last += b[last:].index(el_a) + 1
            else:
                return False
        return True
    

    Edit: changed to work for non-contiguous elements.

  • 2021-01-02 15:56

    A very rough solution:

    def sublist(a, b):
        if not a:
            return True
        for k in range(len(b)):
            if a[0] == b[k]:
                return sublist(a[1:], b[k+1:])
        return False
    
    print(sublist([1, 12, 3], [25, 1, 30, 12, 3, 40]))  # True
    print(sublist([12, 1, 3], [25, 1, 30, 12, 3, 40]))  # False
    

    Edit: Speed upgrade
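
    One likely contributor to the cost of recursive versions like this one is slicing: `a[1:]` and `b[k+1:]` each copy the rest of the list on every call. As a sketch of an alternative, and assuming b is a list, the recursion can pass positions instead of copies. `sublist_rec` and its `i` and `j` parameters are hypothetical names, not part of the original answer.

```python
def sublist_rec(a, b, i=0, j=0):
    """Recursive check that passes positions instead of sliced copies."""
    if i == len(a):
        return True               # every element of a has been matched
    try:
        k = b.index(a[i], j)      # first occurrence of a[i] at or after position j
    except ValueError:
        return False
    return sublist_rec(a, b, i + 1, k + 1)


print(sublist_rec([1, 12, 3], [25, 1, 30, 12, 3, 40]))   # True
print(sublist_rec([12, 1, 3], [25, 1, 30, 12, 3, 40]))   # False
```

    The recursion depth still grows with `len(a)`, so a very long a can hit CPython's default recursion limit; the iterative versions on this page avoid that.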

  • 2021-01-02 15:56

    Here is a simplified version:

    def sublist(a,b):
        try:
            return a[0] in b and sublist(a[1:],b[1+b.index(a[0]):])
        except IndexError:
            return True
    
    >>> print(sublist([1, 12, 3],[25, 1, 30, 12, 3, 40]))
    True
    
    >>> print(sublist([5, 90, 2],[90, 20, 5, 2, 17]))
    False
    
  • 2021-01-02 16:02

    Here's an iterative solution which should have optimal asymptotics:

    def sublist(x, y):
        i, lim = 0, len(y)
        for e in x:
            # skip ahead in y until e is found or y is exhausted
            while i < lim and e != y[i]:
                i += 1
            if i == lim:
                return False
            i += 1  # step past the match
        return True
    

    @sshashank124's solution has the same complexity, but the dynamics will be somewhat different: his version traverses the second argument multiple times, but because it pushes more work into the C layer it'll probably be much faster on smaller input.

    Edit: @hetman's solution has essentially the same logic, but is much more Pythonic, although, contrary to my expectation, it seems to be slightly slower. (I was also incorrect about the performance of @sshashank124's solution; the overhead of the recursive calls appears to outweigh the benefit of doing more work in C.)
