There is a more general question here: in what situations should the built-in operator module be used in Python?
The top answer claims that operator.itemgetter(
There are benefits in some situations. Here is a good example:
>>> data = [('a',3),('b',2),('c',1)]
>>> from operator import itemgetter
>>> sorted(data, key=itemgetter(1))
[('c', 1), ('b', 2), ('a', 3)]
This use of itemgetter is great because it makes the intent clear while also being faster, as all operations are kept on the C side.
>>> sorted(data, key=lambda x:x[1])
[('c', 1), ('b', 2), ('a', 3)]
Using a lambda is not as clear and is also slower. It is generally preferred not to use lambda unless you have to; e.g. list comprehensions are preferred over using map with a lambda.
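To illustrate the map-versus-comprehension remark, here is a minimal sketch that reuses the data list from the example above. The names via_map and via_comprehension are only illustrative.

```python
data = [('a', 3), ('b', 2), ('c', 1)]

# map with a lambda: works, but the intent is wrapped in an anonymous function
via_map = list(map(lambda pair: pair[1], data))

# list comprehension: same result, usually considered easier to read
via_comprehension = [pair[1] for pair in data]

# Both approaches extract the second element of every tuple
print(via_map == via_comprehension)
```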
Leaving aside performance and code style, itemgetter is picklable, while lambda is not. This is important if the function needs to be saved or passed between processes (typically as part of a larger object). In the following example, replacing itemgetter with lambda will result in a PicklingError.
from operator import itemgetter

def sort_by_key(sequence, key):
    return sorted(sequence, key=key)

if __name__ == "__main__":
    from multiprocessing import Pool

    items = [([(1, 2), (4, 1)], itemgetter(1)),
             ([(5, 3), (2, 7)], itemgetter(0))]
    with Pool(5) as p:
        result = p.starmap(sort_by_key, items)
    print(result)
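The same difference can be seen without multiprocessing by calling pickle directly. This is a small sketch, assuming Python 3.5 or later (where itemgetter objects support pickling). The exact exception type raised for the lambda can vary with how and where it is defined, so both likely types are caught here.

```python
import pickle
from operator import itemgetter

# An itemgetter survives a pickle round trip and still works afterwards
getter = itemgetter(1)
restored = pickle.loads(pickle.dumps(getter))
print(restored(('a', 3)))

# A lambda cannot be pickled by the standard pickle module
lambda_error = None
try:
    pickle.dumps(lambda x: x[1])
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = type(exc).__name__
    print("lambda could not be pickled:", lambda_error)
```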
As performance was mentioned, I've compared both methods, operator.itemgetter and lambda, and for a small list it turns out that operator.itemgetter outperforms lambda by 10%. I personally like the itemgetter method, as I mostly use it when sorting and it has become like a keyword for me.
import operator
import timeit

x = [[12, 'tall', 'blue', 1],
     [2, 'short', 'red', 9],
     [4, 'tall', 'blue', 13]]

def sortOperator():
    x.sort(key=operator.itemgetter(1, 2))

def sortLambda():
    x.sort(key=lambda x: (x[1], x[2]))

if __name__ == "__main__":
    print(timeit.timeit(stmt="sortOperator()", setup="from __main__ import sortOperator", number=10**7))
    print(timeit.timeit(stmt="sortLambda()", setup="from __main__ import sortLambda", number=10**7))
>>Tuple: 9.79s, Single: 8.835s
>>Tuple: 11.12s, Single: 9.26s
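For readers unfamiliar with the multi-index form used in the benchmark: itemgetter(1, 2) returns a tuple of the requested items, so it yields the same sort key as the tuple-building lambda. A short sketch using the same rows (the names rows, key_getter and key_lambda are illustrative only):

```python
from operator import itemgetter

rows = [[12, 'tall', 'blue', 1],
        [2, 'short', 'red', 9],
        [4, 'tall', 'blue', 13]]

key_getter = itemgetter(1, 2)               # builds a tuple of items 1 and 2
key_lambda = lambda row: (row[1], row[2])   # the hand-written equivalent

print(key_getter(rows[0]))  # ('tall', 'blue')

# Both key functions produce the same ordering
sorted_getter = sorted(rows, key=key_getter)
sorted_lambda = sorted(rows, key=key_lambda)
print(sorted_getter == sorted_lambda)
```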
When using this in the key parameter of sorted() or min(), given the choice between, say, operator.itemgetter(1) and lambda x: x[1], the former is typically significantly faster in both cases:
Using sorted()
The compared functions are defined as follows:
import operator

def sort_key_itemgetter(items, key=1):
    return sorted(items, key=operator.itemgetter(key))

def sort_key_lambda(items, key=1):
    return sorted(items, key=lambda x: x[key])
Result: sort_key_itemgetter() is faster by ~10% to ~15%.
(Full analysis here)
Using min()
The compared functions are defined as follows:
import operator

def min_key_itemgetter(items, key=1):
    return min(items, key=operator.itemgetter(key))

def min_key_lambda(items, key=1):
    return min(items, key=lambda x: x[key])
Result: min_key_itemgetter() is faster by ~20% to ~60%.
(Full analysis here)
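To check these figures on your own machine, a rough timing harness could look like the sketch below. It is not the analysis referred to above. The input size, the repeat counts and the helper name time_key are assumptions chosen for illustration, and the numbers it prints will vary with Python version and hardware.

```python
import operator
import random
import timeit

random.seed(0)
# Hypothetical test data: 1000 (index, random float) pairs
items = [(i, random.random()) for i in range(1000)]

def time_key(func, key, repeat=5, number=200):
    """Return the best time of `repeat` runs, each calling func `number` times."""
    return min(timeit.repeat(lambda: func(items, key=key),
                             repeat=repeat, number=number))

getter = operator.itemgetter(1)
lam = lambda x: x[1]

for func in (sorted, min):
    t_getter = time_key(func, getter)
    t_lambda = time_key(func, lam)
    print(f"{func.__name__}: itemgetter {t_getter:.4f}s, lambda {t_lambda:.4f}s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes. sorted() returns a new list, so the input is not already sorted on later runs.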
Some programmers understand and use lambdas, but there is a population of programmers who perhaps didn't take computer science and aren't clear on the concept. For those programmers, itemgetter() can make your intention clearer. (I don't write lambdas, and any time I see one in code it takes me a little extra time to process what's going on and understand the code.)
If you're coding for other computer science professionals, go ahead and use lambdas if they are more comfortable. However, if you're coding for a wider audience, I suggest using itemgetter().
Performance. It can make a big difference. In the right circumstances, you can get a bunch of stuff done at the C level by using itemgetter.
I think the claim about which is clearer really depends on which you use most often, and is very subjective.