Using list.count to sort a list in-place using .sort() does not work. Why?

北荒 2020-12-10 05:58

I am trying to sort a list by frequency of its elements.

>>> a = [5, 5, 4, 4, 4, 1, 2, 2]
>>> a.sort(key = a.count)
>>> a
[5, 5, 4, 4, 4, 1, 2, 2]

2 answers
  •  谎友^ (OP)
    2020-12-10 06:34

    It doesn't work with the list.sort method because CPython temporarily "empties" the list while sorting it (the other answer already shows this). The documentation mentions this as an implementation detail:

    CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.

    The source code contains a similar comment with a bit more explanation:

        /* The list is temporarily made empty, so that mutations performed
         * by comparison functions can't affect the slice of memory we're
         * sorting (allowing mutations during sorting is a core-dump
         * factory, since ob_item may change).
         */
    

    The explanation isn't straightforward, but the problem is that the key function and the comparisons could change the list instance during sorting, which is very likely to result in undefined behavior in the C code (and may crash the interpreter). To prevent that, the list is emptied for the duration of the sort, so that even if someone changes the instance, it won't crash the interpreter.
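
    Both documented effects can be observed directly. The following is a minimal sketch that assumes CPython; since this is an implementation detail, other interpreters may behave differently. The names spy_key, seen_lengths, b and error_message are made up for this illustration.

```python
a = [5, 5, 4, 4, 4, 1, 2, 2]
seen_lengths = []  # length of `a` as observed from inside the key function


def spy_key(item):
    # Hypothetical helper: records what `a` looks like while it is being sorted.
    seen_lengths.append(len(a))
    return a.count(item)  # on CPython `a` appears empty here, so this is 0


a.sort(key=spy_key)
print(seen_lengths)  # on CPython: only zeros
print(a)             # unchanged: all keys were equal and the sort is stable

b = [3, 1, 2]
error_message = None
try:
    # Appending to `b` from inside the key function mutates the list mid-sort.
    b.sort(key=lambda item: b.append(item) or item)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # CPython reports that the list was modified during the sort
```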

    This doesn't happen with sorted because sorted copies the list and simply sorts the copy. The copy is still emptied during the sorting, but there's no way to access it, so the effect isn't visible.
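
    As a short sketch of the difference, using the list from the question (the name by_frequency is only for illustration):

```python
a = [5, 5, 4, 4, 4, 1, 2, 2]

# sorted() works on its own copy, so `a` stays intact and a.count sees real data.
by_frequency = sorted(a, key=a.count)
print(by_frequency)  # [1, 5, 5, 2, 2, 4, 4, 4]
print(a)             # [5, 5, 4, 4, 4, 1, 2, 2]

# Assigning the result back through a slice updates the original list object.
a[:] = by_frequency
print(a)             # [1, 5, 5, 2, 2, 4, 4, 4]
```

    This still calls a.count once per element, so it only fixes the correctness problem, not the cost.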


    However, you really shouldn't sort like this to get a frequency sort, because the key function is called once for each item, and list.count iterates over the whole list on every call. You therefore effectively iterate over the whole list for each element (O(n**2) complexity). A better way is to calculate the frequency of each element once (which can be done in O(n)) and then just look it up in the key function.
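
    One possible way to do that, sketched here with collections.Counter (the variable name counts is an assumption for this example):

```python
from collections import Counter

a = [5, 5, 4, 4, 4, 1, 2, 2]

counts = Counter(a)  # a single O(n) pass that counts every element

# The key only reads `counts`, a separate object, so it never touches `a`
# while `a` is being sorted in place.
a.sort(key=lambda item: counts[item])
print(a)  # [1, 5, 5, 2, 2, 4, 4, 4]
```

    Because the key function no longer inspects the list, the in-place sort behaves the same as sorted here, and the overall cost is dominated by the O(n log n) sort.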

    However, since Python's standard library has a Counter class that also supports most_common, you could really just use that:

    >>> from collections import Counter
    >>> [item for item, count in reversed(Counter(a).most_common()) for _ in range(count)]
    [1, 2, 2, 5, 5, 4, 4, 4]
    

    This may change the order of elements with equal counts, but since you're sorting by frequency that shouldn't matter too much.
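
    If the order among equally frequent elements does matter, one option is to make the tie-break explicit in the key. This is only a sketch: it assumes the elements can be compared with each other, and the names counts and ordered are made up for the example.

```python
from collections import Counter

a = [5, 5, 4, 4, 4, 1, 2, 2]
counts = Counter(a)

# Sort by frequency first and by the value itself second, so elements with
# equal counts always come out in ascending order.
ordered = sorted(a, key=lambda item: (counts[item], item))
print(ordered)  # [1, 2, 2, 5, 5, 4, 4, 4]
```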
