“lambda” vs. “operator.attrgetter('xxx')” as a sort key function

说谎 2020-12-15 16:44

I am looking at some code that has a lot of sort calls using comparison functions, and it seems like it should be using key functions.

If you were to change se

3 Answers
  •  半阙折子戏
    2020-12-15 17:33

    When choosing purely between attrgetter('attributename') and lambda o: o.attributename as a sort key, attrgetter() is the faster of the two.

    Remember that the key function is only applied once to each element in the list, before sorting, so to compare the two we can use them directly in a time trial:

    >>> from timeit import Timer
    >>> from random import randint
    >>> from dataclasses import dataclass, field
    >>> @dataclass
    ... class Foo:
    ...     bar: int = field(default_factory=lambda: randint(1, 10**6))
    ...
    >>> testdata = [Foo() for _ in range(1000)]
    >>> def test_function(objects, key):
    ...     [key(o) for o in objects]
    ...
    >>> stmt = 't(testdata, key)'
    >>> setup = 'from __main__ import test_function as t, testdata; '
    >>> tests = {
    ...     'lambda': setup + 'key=lambda o: o.bar',
    ...     'attrgetter': setup + 'from operator import attrgetter; key=attrgetter("bar")'
    ... }
    >>> for name, tsetup in tests.items():
    ...     count, total = Timer(stmt, tsetup).autorange()
    ...     print(f"{name:>10}: {total / count * 10 ** 6:7.3f} microseconds ({count} repetitions)")
    ...
        lambda: 130.495 microseconds (2000 repetitions)
    attrgetter:  92.850 microseconds (5000 repetitions)
    

    So applying attrgetter('bar') 1000 times is roughly 40 μs faster than applying the lambda 1000 times. That's because calling a Python function carries more overhead than calling a native callable such as the one produced by attrgetter().
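    As a rough way to see the difference between the two callables, the sketch below checks that the lambda is an ordinary Python function object while the object returned by attrgetter() is not, even though both return the same value. The names get_bar_lambda and get_bar_native are made up for this illustration, and the Foo class is a simplified stand-in for the one in the time trial.

```python
import types
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Foo:
    bar: int


# Two ways to build the same key function (names are illustrative only).
get_bar_lambda = lambda o: o.bar
get_bar_native = attrgetter("bar")

foo = Foo(bar=42)

# Both callables extract the same attribute value.
same_result = get_bar_lambda(foo) == get_bar_native(foo)

# A lambda is a plain Python function; the attrgetter object is not.
lambda_is_python_function = isinstance(get_bar_lambda, types.FunctionType)
native_is_python_function = isinstance(get_bar_native, types.FunctionType)

print(same_result, lambda_is_python_function, native_is_python_function)
```

    This only shows that the two objects are of different kinds; the time trial is what actually measures the difference in call cost.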

    This speed advantage translates into faster sorting too:

    >>> def test_function(objects, key):
    ...     sorted(objects, key=key)
    ...
    >>> for name, tsetup in tests.items():
    ...     count, total = Timer(stmt, tsetup).autorange()
    ...     print(f"{name:>10}: {total / count * 10 ** 6:7.3f} microseconds ({count} repetitions)")
    ...
        lambda: 218.715 microseconds (1000 repetitions)
    attrgetter: 169.064 microseconds (2000 repetitions)
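
    The statement that the key function is applied only once per element can be checked with a small sketch like the one below. The counting_key helper and the sample values are assumptions made up for this illustration; it records every object it is called with and then compares the resulting order with what attrgetter and a lambda produce.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Foo:
    bar: int


items = [Foo(n) for n in (5, 3, 8, 1, 9, 2)]

# Record every object the key function is called with.
seen = []


def counting_key(o):
    seen.append(o)
    return o.bar


by_counting = sorted(items, key=counting_key)
by_attrgetter = sorted(items, key=attrgetter("bar"))
by_lambda = sorted(items, key=lambda o: o.bar)

# One key call per element, no matter how many comparisons the sort needs.
print(len(seen) == len(items))
# All three key functions give the same ordering.
print(by_counting == by_attrgetter == by_lambda)
```

    Because the key is computed once per element, the per-call saving of attrgetter grows with the length of the list rather than with the number of comparisons.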
    
