Question
I have a list of counters:
from collections import Counter
counters = [
    Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
    Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
    Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
]
I can combine them using a loop as shown below, but I'd like to avoid the loop.
all_ct = Counter()
for ct in counters:
    all_ct.update(ct)
Using reduce gives an error:
from functools import reduce

all_ct = Counter()
reduce(all_ct.update, counters)
# TypeError: update() takes from 1 to 2 positional arguments but 3 were given
Is there a way to combine the counters into a single counter without using a loop?
Answer 1:
You need to replace update() with a form that reduce can use. reduce calls its function with two arguments (the running result and the next element), but all_ct.update is a bound method, so the call ends up with three positional arguments in total (self plus the two from reduce), which is exactly what the TypeError is complaining about. Wrap the update in a two-argument function that returns the accumulator, and pass a fresh Counter as the initial value so the counters in the list are not mutated:

import functools

def static_update(x, y):
    x.update(y)   # add y's counts into x in place
    return x      # reduce needs the accumulator back

all_ct = functools.reduce(static_update, counters, Counter())
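As an aside, the reason this accumulates correctly is that Counter.update adds counts rather than replacing values the way dict.update does. A minimal sketch of the difference:

from collections import Counter

c = Counter({"basketball": 1})
c.update({"basketball": 2})   # Counter.update adds the counts together
print(c["basketball"])        # 3

d = {"basketball": 1}
d.update({"basketball": 2})   # dict.update overwrites the value instead
print(d["basketball"])        # 2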
Answer 2:
You can use the sum function:
all_ct = sum(counters, Counter())
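For reference, a quick sketch using the counters list from the question (assumed to be in scope). The Counter() second argument is sum's start value; it is needed because the default start of 0 cannot be added to a Counter:

from collections import Counter

all_ct = sum(counters, Counter())   # start from an empty Counter
print(all_ct["basketball"])         # 3 for the counters in the question
# sum(counters) without a start value raises TypeError, because 0 + Counter(...) is not defined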
Answer 3:
Note that Counter implements __add__ to merge counters, so you could use:
In [3]: from collections import Counter
...: counters = [
...: Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
...: Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
...: Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
...: ]
In [4]: from operator import add
In [5]: from functools import reduce
In [6]: reduce(add, counters)
Out[6]:
Counter({'coach': 1,
'says': 1,
'play': 2,
'basketball': 3,
'i': 2,
'said': 2,
'hate': 1,
'he': 1})
Or more simply:
In [7]: final = Counter()
In [8]: for c in counters:
...:     final += c
...:
In [9]: final
Out[9]:
Counter({'coach': 1,
'says': 1,
'play': 2,
'basketball': 3,
'i': 2,
'said': 2,
'hate': 1,
'he': 1})
Note, the above is more efficient, since it only ever uses one dict. If you use reduce(add, counters), a new intermediate Counter object is created on each iteration.
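A quick way to see that difference (a small sketch, separate from the benchmark below): += on a Counter goes through __iadd__ and updates the same object, while + allocates a brand-new Counter every time:

from collections import Counter

a = Counter({"x": 1})
b = Counter({"y": 1})

before = id(a)
a += b                    # __iadd__ updates a in place and returns it
assert id(a) == before    # still the same object

c = a + b                 # __add__ builds a new Counter
assert c is not a and c is not b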
Just to illustrate what I mean: even in the best case, where the keys are always repeated, you have to do about double the work using the reduce/sum approach:
In [1]: from collections import Counter
...: counters = [
...: Counter({"coach": 1, "says": 1, "play": 1, "basketball": 1}),
...: Counter({"i": 2, "said": 1, "hate": 1, "basketball": 1}),
...: Counter({"he": 1, "said": 1, "play": 1, "basketball": 1}),
...: ]
In [2]: counters *= 5_000
In [3]: from functools import reduce
In [4]: from operator import add
In [5]: %%timeit
...: data = counters.copy()
...: result = Counter()
...: for c in data:
...:     result += c
...:
21.2 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %%timeit
...: data = counters.copy()
...: reduce(add, counters)
...:
...:
50.9 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
And I believe in the worst case (where each counter has keys disjoint from each of the rest) this will degrade to quadratic performance.
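To probe that claim, here is a rough, self-contained sketch (not part of the original answer) that builds counters with pairwise disjoint keys, so every intermediate Counter produced by add has to copy all keys seen so far; exact timings will vary by machine:

from collections import Counter
from functools import reduce
from operator import add
import timeit

disjoint_1x = [Counter({f"key{i}": 1}) for i in range(2_000)]
disjoint_2x = [Counter({f"key{i}": 1}) for i in range(4_000)]

# if the behaviour really is quadratic, doubling the number of disjoint
# counters should roughly quadruple the reduce(add, ...) time
print(timeit.timeit(lambda: reduce(add, disjoint_1x), number=1))
print(timeit.timeit(lambda: reduce(add, disjoint_2x), number=1))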
Finally, note that you can do an in-place add using reduce (not sum), which eliminates the performance issue:
In [6]: import operator
In [7]: operator.iadd?
Signature: operator.iadd(a, b, /)
Docstring: Same as a += b.
Type: builtin_function_or_method
In [8]: reduce(operator.iadd, counters, Counter())
Out[8]:
Counter({'coach': 5000,
'says': 5000,
'play': 10000,
'basketball': 15000,
'i': 10000,
'said': 10000,
'hate': 5000,
'he': 5000})
And note, now the performance is on par with the explicit loop:
In [9]: %%timeit
...: data = counters.copy()
...: reduce(operator.iadd, counters, Counter())
...:
...:
22 ms ± 224 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
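One caveat that may be worth spelling out (it is not stated above): operator.iadd mutates its left operand, so the Counter() passed as reduce's initial value is the object that ends up being updated in place. If you pass an existing counter there instead, it gets modified. A small self-contained sketch:

from collections import Counter
from functools import reduce
import operator

data = [Counter({"play": 1}), Counter({"play": 1})]
seed = Counter({"play": 10})

total = reduce(operator.iadd, data, seed)
assert total is seed    # the seed Counter itself was mutated in place
print(seed["play"])     # 12, not 10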
However, mixing functional constructs like reduce with functions that have side effects is just... ugly. It's better to stick to imperative code when the functions involved are impure.
Source: https://stackoverflow.com/questions/64250703/update-a-counter-from-a-list-of-counters-without-a-loop