Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)

耶瑟儿~ 2020-11-29 03:42

It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathemati

6 Answers
  •  情话喂你
    2020-11-29 04:21

    I'm accepting the answer of Gareth Rees as the most appealing explanation (short of an answer from the Python library designers), namely, that Python's itertools.permutations doesn't compare the values of the elements. Come to think of it, this is what the question asks about, but I see now how it could be seen as an advantage, depending on what one typically uses itertools.permutations for.
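
To make the "doesn't compare the values" point concrete, here is a small sketch. It relies only on the documented behaviour that itertools.permutations treats elements as unique by position; the variable names are illustrative.

```python
from itertools import permutations

items = [1, 1, 2]

# permutations() permutes positions, so the two 1s are treated as different
# elements and every arrangement of values shows up more than once.
all_perms = list(permutations(items))
distinct_perms = sorted(set(all_perms))

print(len(all_perms))   # 6, i.e. 3!
print(distinct_perms)   # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]

# Because elements are never compared or hashed, it also works on values
# that are neither orderable nor hashable, such as a list next to a dict.
mixed = list(permutations([[0], {"a": 1}]))
print(len(mixed))       # 2
```

This is the advantage mentioned above: the function places no requirements on the element type, at the cost of repeating arrangements when values coincide.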

    Just for completeness, I compared three methods of generating all distinct permutations. Method 1, which is very inefficient memory-wise and time-wise but requires the least new code, is to wrap Python's itertools.permutations, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation algorithm; it modifies the list in-place (I haven't made it too general).

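    def next_permutationS(l):
        """Rearrange l in place into its next lexicographic permutation.

        Returns False (leaving l sorted ascending) if l was already the last one.
        """
        n = len(l)
        # Step 1: find the longest non-increasing tail, l[last:]
        last = n - 1
        while last > 0:
            if l[last - 1] < l[last]:
                break
            last -= 1
        # Step 2: swap the element just before the tail with the rightmost
        # tail element that is strictly greater than it
        if last > 0:
            small = l[last - 1]
            big = n - 1
            while l[big] <= small:
                big -= 1
            l[last - 1], l[big] = l[big], small
        # Step 3: reverse the tail so that it becomes non-decreasing
        i = last
        j = n - 1
        while i < j:
            l[i], l[j] = l[j], l[i]
            i += 1
            j -= 1
        return last > 0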
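
For anyone wanting to try this, the following is a sketch of how an in-place step like the one above can drive a generator of distinct permutations, next to a Method 1 style wrapper for comparison. The step is restated compactly so the snippet runs on its own; the names next_permutation_in_place, distinct_permutations and distinct_permutations_via_set are made up for this example, and the wrapper is only a guess at the general shape of the approach, not zeekay's actual code.

```python
from itertools import permutations


def next_permutation_in_place(seq):
    """Compact restatement of the three steps; returns False on wrap-around."""
    last = len(seq) - 1
    while last > 0 and seq[last - 1] >= seq[last]:
        last -= 1                       # Step 1: find the non-increasing tail
    if last > 0:
        big = len(seq) - 1
        while seq[big] <= seq[last - 1]:
            big -= 1                    # Step 2: rightmost element above the pivot
        seq[last - 1], seq[big] = seq[big], seq[last - 1]
    seq[last:] = reversed(seq[last:])   # Step 3: reverse the tail
    return last > 0


def distinct_permutations(iterable):
    """Yield each distinct permutation once, in lexicographic order."""
    current = sorted(iterable)          # start from the smallest arrangement
    while True:
        yield tuple(current)
        if not next_permutation_in_place(current):
            return


def distinct_permutations_via_set(iterable):
    """Method 1 style: generate every positional permutation, drop repeats."""
    return sorted(set(permutations(iterable)))


print(list(distinct_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Unlike the set-based wrapper, the generator needs the elements to be orderable, but it never materialises the repeated arrangements.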
    

    I have even more respect for Python's built-in function now: when the elements are all distinct it is the fastest of the three, by up to about four times in the timings below. Of course, when there are many repeated elements, using it is a terrible idea.

    Timings ("us" means microseconds). The columns m_itertoolsp, m_nextperm_b and m_nextperm_s correspond to Methods 1, 2 and 3 respectively:
    
    l                                       m_itertoolsp  m_nextperm_b  m_nextperm_s
    [1, 1, 2]                               5.98 us       12.3 us       7.54 us
    [1, 2, 3, 4, 5, 6]                      0.63 ms       2.69 ms       1.77 ms
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]         6.93 s        13.68 s       8.75 s
    
    [1, 2, 3, 4, 6, 6, 6]                   3.12 ms       3.34 ms       2.19 ms
    [1, 2, 2, 2, 2, 3, 3, 3, 3, 3]          2400 ms       5.87 ms       3.63 ms
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]          2320000 us    89.9 us       51.5 us
    [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]    429000 ms     361 ms        228 ms
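
The gap on the rows with repeats follows from a counting argument: a list of n items whose distinct values occur k1, k2, ... times has n! / (k1! * k2! * ...) distinct permutations, whereas itertools.permutations always produces n! tuples. A small sketch (the helper name count_distinct_permutations is made up here) computes both numbers for three rows of the table:

```python
from collections import Counter
from math import factorial


def count_distinct_permutations(items):
    """Multinomial coefficient n! / (k1! * k2! * ...) for the value counts."""
    total = factorial(len(items))
    for multiplicity in Counter(items).values():
        total //= factorial(multiplicity)
    return total


for row in ([1, 2, 3, 4, 5, 6],
            [1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
            [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]):
    print(factorial(len(row)), count_distinct_permutations(row), row)
# 720 720 [1, 2, 3, 4, 5, 6]
# 3628800 10 [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
# 479001600 83160 [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]
```

So on the nine-ones row the wrapper walks through about 3.6 million tuples to find 10 distinct ones, which matches the direction of the timings.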
    

    The code is here if anyone wants to explore.
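
For readers who want to take comparable measurements themselves, a minimal timing harness could look like the sketch below. It is not the benchmark code referred to above: the function names, the choice of timeit, and the repeat count are all assumptions, and absolute numbers will differ by machine and Python version.

```python
import timeit
from itertools import permutations


def count_distinct_with_itertools(items):
    """Method 1 style: build every positional permutation, keep distinct ones."""
    return len(set(permutations(items)))


def average_seconds(func, items, repeats=5):
    """Average wall-clock time of func(items) over `repeats` calls."""
    return timeit.timeit(lambda: func(items), number=repeats) / repeats


for items in ([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1, 2]):
    seconds = average_seconds(count_distinct_with_itertools, items)
    print(f"{items}: {seconds * 1e6:.1f} us")
```

Any other permutation generator can be timed the same way by passing a function that exhausts it.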
