Question
I have an xlsx file that I'm parsing with openpyxl. Column A is Product Name, column B is Revenue, and I want to extract each product-revenue pair into a dict. Were there no duplicate products, it would simply be a matter of creating a dict by mapping ws.columns appropriately.
The problem is that there are multiple entries for some (but not all) products. For these I need to sum the revenues and return a single key per product, just as for the others. So if my revenue spreadsheet contains the following:

I want to sum the values of Revenue for Banana before returning the dict. The desired outcome then is:
{'Banana': 7.2, 'Apple': 1.7, 'Pear': 6.2, 'Kiwi': 1.2}
The following would work OK were there no duplicates:
revenue = {}
i = 0
for product in ws.columns[0]:
    revenue[product.value] = ws.columns[1][i].value
    i += 1
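A minimal sketch of the same loop, run on hypothetical in-memory (product, revenue) tuples instead of worksheet cells, makes it easy to see what happens to a repeated product:

```python
# Hypothetical (product, revenue) rows standing in for columns A and B
rows = [("Banana", 3.2), ("Apple", 1.7), ("Banana", 2.3)]

revenue = {}
for product, amount in rows:
    # Plain assignment: a repeated key replaces the value stored earlier
    revenue[product] = amount

print(revenue)  # {'Banana': 2.3, 'Apple': 1.7}
```

The first Banana revenue (3.2) is lost rather than added.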
But it obviously breaks down when it encounters duplicates: each later entry for a product overwrites the earlier one instead of being added to it. I could try using a MultiDict(), which gives a structure from which I can perform the addition and create my final dict:
d = MultiDict()
for i in range(len(ws.columns[1])):
    d.add(ws.columns[0][i].value, ws.columns[1][i].value)
This leaves me with a MultiDict, which is itself essentially a list of tuples, and it all gets a tad convoluted. Is there a neater or standard-library way of building a same-key-multiple-times data structure? What about employing zip()? It doesn't necessarily have to be dict-like; I just need to be able to create a dict from it (and then perform the addition).
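On the zip() idea: a minimal standard-library sketch, assuming the two columns have already been read into plain Python lists. The lists below are hypothetical, with values chosen so the totals match the desired output above. zip() pairs the columns up, and a collections.defaultdict(list) serves as the same-key-multiple-times structure:

```python
from collections import defaultdict

# Hypothetical column values, as they might be read from columns A and B
products = ["Kiwi", "Banana", "Pear", "Banana", "Apple", "Banana"]
revenues = [1.2, 3.2, 6.2, 2.3, 1.7, 1.7]

# Same key, multiple values: collect every revenue under its product name
grouped = defaultdict(list)
for product, amount in zip(products, revenues):
    grouped[product].append(amount)

# Collapse each list of revenues into a single total per product
totals = {product: sum(amounts) for product, amounts in grouped.items()}
print(totals)
```

Keeping the intermediate lists is only worthwhile if you need the individual entries later; otherwise summing directly while iterating avoids storing them.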
Answer 1:
This should be close to what you want, assuming you can transform your data to a list of key-value tuples:
list_key_value_tuples = [("A", 1), ("B", 2), ("A", 3)]
d = {}
for key, value in list_key_value_tuples:
    # get() returns 0 for a key not seen yet, so the first value starts the sum
    d[key] = d.get(key, 0) + value
print(d)  # {'A': 4, 'B': 2}
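The same dict.get() pattern can be wrapped in a small helper that accepts any iterable of (product, revenue) rows. This is a sketch under assumptions: the helper name sum_revenue is hypothetical, and the suggestion to feed it from openpyxl via ws.iter_rows(min_row=2, max_col=2, values_only=True) assumes a recent openpyxl, a header in row 1, products in column A and revenues in column B.

```python
def sum_revenue(rows):
    """Sum revenue per product from an iterable of (product, revenue) rows.

    Rows with an empty product cell (None) are skipped; a missing
    revenue (None) is treated as 0.
    """
    totals = {}
    for product, revenue in rows:
        if product is None:
            continue
        totals[product] = totals.get(product, 0) + (revenue or 0)
    return totals


# Hypothetical rows; with openpyxl they might come from
# ws.iter_rows(min_row=2, max_col=2, values_only=True)  (assumed layout)
rows = [("Banana", 3.2), ("Apple", 1.7), ("Banana", 2.3), (None, None)]
print(sum_revenue(rows))  # {'Banana': 5.5, 'Apple': 1.7}
```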
Answer 2:
collections.defaultdict was made for this type of use case.
>>> import collections
>>> d = collections.defaultdict(float)
>>> p = [('Kiwi', 1.2), ('Banana', 3.2), ('Pear', 6.2), ('Banana', 2.3), ('Apple', 1.7), ('Banana', 1.7)]
>>> for k, v in p:
...     d[k] += v
...
>>> d
defaultdict(<class 'float'>, {'Kiwi': 1.2, 'Banana': 7.2, 'Pear': 6.2, 'Apple': 1.7})
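Since the question asks for a plain dict at the end, a defaultdict can be converted with dict(). A minimal sketch with hypothetical pairs; the conversion also matters because looking up a missing key on a defaultdict silently inserts a default value:

```python
from collections import defaultdict

# Hypothetical (product, revenue) pairs
pairs = [("Kiwi", 1.2), ("Banana", 3.2), ("Banana", 2.3)]

totals = defaultdict(float)
for product, revenue in pairs:
    totals[product] += revenue  # a missing key starts at float() == 0.0

# Convert to a plain dict so later lookups of unknown products raise
# KeyError instead of silently adding a 0.0 entry
result = dict(totals)
print(result)  # {'Kiwi': 1.2, 'Banana': 5.5}
```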
Answer 3:
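Assuming the second column is no longer than the first, one can group the rows by the value in the first column and sum the revenues. Because itertools.groupby only groups consecutive rows with equal keys, the rows have to be sorted by product first:
from itertools import zip_longest, groupby
from operator import itemgetter
# Pair product names with revenues; a missing revenue cell is filled with 0
rows = zip_longest((c.value for c in ws.columns[0]),
                   (c.value for c in ws.columns[1]), fillvalue=0)
# groupby only merges adjacent equal keys, so sort by product name first
rows = sorted(rows, key=itemgetter(0))
result = {k: sum(v for _, v in g) for k, g in groupby(rows, itemgetter(0))}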
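The reason sorting matters with itertools.groupby can be shown with a few hypothetical pairs: without sorting, non-adjacent duplicates form separate groups, and building a dict from those groups keeps only the last one.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (product, revenue) pairs; the two Banana rows are not adjacent
pairs = [("Banana", 3.2), ("Apple", 1.7), ("Banana", 2.3)]

# Unsorted: groupby yields Banana twice, as two separate groups
unsorted_keys = [k for k, _ in groupby(pairs, key=itemgetter(0))]
print(unsorted_keys)  # ['Banana', 'Apple', 'Banana']

# Sorted by product name: all Banana rows end up in a single group
sorted_pairs = sorted(pairs, key=itemgetter(0))
totals = {k: sum(v for _, v in g) for k, g in groupby(sorted_pairs, key=itemgetter(0))}
print(totals)  # {'Apple': 1.7, 'Banana': 5.5}
```

Sorting costs O(n log n); the dict-based answers above need only a single pass and keep the products in first-seen order.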
Source: https://stackoverflow.com/questions/31197478/how-to-sum-coupled-values-in-a-dict-like-structure-in-python