What is the difference between `sorted(list)` vs `list.sort()`?

温柔的废话 2020-11-22 09:19

list.sort() sorts the list in place, replacing the original order, whereas sorted(list) returns a sorted copy of the list without changing the original list.

6 answers
  •  刺人心 (original poster)
    2020-11-22 09:42

    What is the difference between sorted(list) vs list.sort()?

    • list.sort mutates the list in-place & returns None
    • sorted takes any iterable & returns a new list, sorted.
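
    As a minimal sketch of the two bullets above (the variable names are only illustrative), the following shows the None return value of list.sort and sorted accepting iterables that are not lists:

```python
data = [3, 1, 2]

# list.sort() reorders the existing list and returns None
result = data.sort()
print(result)  # None
print(data)    # [1, 2, 3]

# sorted() accepts any iterable and always returns a new list
letters = sorted("cab")          # a string
from_tuple = sorted((3, 1, 2))   # a tuple
from_set = sorted({"b", "a"})    # a set
print(letters, from_tuple, from_set)
```

    A common slip is writing data = data.sort(), which rebinds data to None.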

    sorted is roughly equivalent to this Python implementation, but the CPython builtin function should run measurably faster because it is written in C:

    def sorted(iterable, *, key=None, reverse=False):
        new_list = list(iterable)                # make a new list
        new_list.sort(key=key, reverse=reverse)  # sort it
        return new_list                          # return it
    

    When to use which?

    • Use list.sort when you do not need to retain the original order (the list is then reused in place in memory) and when you are the sole owner of the list. If the list is shared with other code and you mutate it, you could introduce bugs wherever that list is used.
    • Use sorted when you want to retain the original order, or when you want to create a new list that only your local code owns.
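
    The ownership point can be illustrated with a small sketch. The function names top_three_in_place and top_three_copy and the sample data are hypothetical, chosen only to show how an in-place sort leaks out to the caller:

```python
def top_three_in_place(scores):
    """Return the three highest scores; reorders the caller's list as a side effect."""
    scores.sort(reverse=True)
    return scores[:3]


def top_three_copy(scores):
    """Return the three highest scores; the caller's list is left untouched."""
    return sorted(scores, reverse=True)[:3]


arrival_order = [70, 95, 80, 60]

top_copy = top_three_copy(arrival_order)
after_copy = list(arrival_order)       # snapshot: still in arrival order

top_in_place = top_three_in_place(arrival_order)
# arrival_order has now been reordered by the callee
print(top_copy, after_copy, top_in_place, arrival_order)
```

    Both functions return the same result; only the second is safe to call on a list that other code still relies on.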

    Can a list's original positions be retrieved after list.sort()?

    No - unless you made a copy yourself, that information is lost because the sort is done in-place.
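
    If the original positions matter, they have to be captured before sorting. Here is a hedged sketch of two ways to do that: keeping a plain copy, or recording the sort order as a list of indices. The names original, order and restored are illustrative only:

```python
values = ["pear", "apple", "fig"]

# Option 1: keep a copy of the unsorted list
original = values[:]

# Option 2: record, before sorting, which original index lands at each sorted position
order = sorted(range(len(values)), key=values.__getitem__)

values.sort()

# Undo the sort using the recorded indices:
# the item at sorted position i came from original index order[i]
restored = [None] * len(values)
for sorted_pos, original_index in enumerate(order):
    restored[original_index] = values[sorted_pos]

print(order, values, restored)
```

    The copy is simpler; the index list is useful when several parallel lists must be reordered the same way.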

    "And which is faster? And how much faster?"

    To illustrate the penalty of creating a new list, we can use the timeit module. Here's our setup:

    import timeit
    setup = """
    import random
    lists = [list(range(10000)) for _ in range(1000)]  # list of lists
    for l in lists:
        random.shuffle(l) # shuffle each list
    shuffled_iter = iter(lists) # wrap as iterator so next() yields one at a time
    """
    

    Here are our results for lists of 10,000 randomly ordered integers. As they show, the older myth about the expense of creating a new list does not hold up:

    Python 2.7

    >>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
    [3.75168503401801, 3.7473005310166627, 3.753129180986434]
    >>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
    [3.702025591977872, 3.709248117986135, 3.71071034099441]
    

    Python 3

    >>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
    [2.797430992126465, 2.796825885772705, 2.7744789123535156]
    >>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
    [2.675589084625244, 2.8019039630889893, 2.849375009536743]
    

    After some feedback, I decided another test with different characteristics would be desirable. Here I provide the same randomly ordered list of length 100,000 on every iteration, 10,000 times.

    import timeit
    setup = """
    import random
    random.seed(0)
    lst = list(range(100000))
    random.shuffle(lst)
    """
    

    I attribute the difference in this larger sort to the copying mentioned by Martijn, but it does not dominate to the extent stated in the older, more popular answer here. The increase in time is only about 10%:

    >>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
    [572.919036605, 573.1384446719999, 568.5923951]
    >>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
    [647.0584738299999, 653.4040515829997, 657.9457361929999]
    

    I also ran the above on a much smaller sort and saw that the sorted version, which makes a new copy, still takes about 2% longer on a list of length 1,000.

    Poke ran his own code as well. Here it is:

    from timeit import timeit

    setup = '''
    import random
    random.seed(12122353453462456)
    lst = list(range({length}))
    random.shuffle(lst)
    lists = [lst[:] for _ in range({repeats})]
    it = iter(lists)
    '''
    t1 = 'l = next(it); l.sort()'
    t2 = 'l = next(it); sorted(l)'
    length = 10 ** 7
    repeats = 10 ** 2
    print(length, repeats)
    for t in t1, t2:
        print(t)
        print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))
    

    For a sort of length 10,000,000 (run 100 times), he found a similar result, but only about a 5% increase in time. Here's the output:

    10000000 100
    l = next(it); l.sort()
    610.5015971539542
    l = next(it); sorted(l)
    646.7786222379655
    

    Conclusion:

    For a large list, the copy that sorted makes likely accounts for most of the difference between the two, but the sorting itself dominates the total running time, so organizing your code around this difference would be premature optimization. I would use sorted when I need a new sorted list of the data and list.sort when I need to sort a list in place, and let that need determine which one I use.
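
    To check the size of the difference on your own machine, a small helper along these lines may be enough. It is only a sketch: compare_sorts and its default parameters are my own assumptions, and it mirrors the lst[:].sort() versus sorted(lst[:]) comparison used above rather than reproducing anyone's exact benchmark:

```python
import random
import timeit


def compare_sorts(length=1_000, number=200, seed=0):
    """Return (in_place_seconds, new_list_seconds), the best of 3 timing runs each."""
    rng = random.Random(seed)
    base = list(range(length))
    rng.shuffle(base)

    # Both statements slice-copy `base` first, so every run sorts unsorted data;
    # sorted() then makes one additional copy internally.
    in_place = min(timeit.repeat(lambda: base[:].sort(), number=number, repeat=3))
    new_list = min(timeit.repeat(lambda: sorted(base[:]), number=number, repeat=3))
    return in_place, new_list


if __name__ == "__main__":
    t_sort, t_sorted = compare_sorts()
    print(f"list.sort: {t_sort:.4f}s  sorted: {t_sorted:.4f}s  ratio: {t_sorted / t_sort:.2f}")
```

    The absolute numbers will vary by machine and Python version, so treat the ratio as a rough indication only.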
