Python group by

Anonymous (unverified), submitted 2019-12-03 01:48:02

Question:

Assume that I have a set of data pairs, where index 0 is the value and index 1 is the type:

input = [
    ('11013331', 'KAT'),
    ('9085267',  'NOT'),
    ('5238761',  'ETH'),
    ('5349618',  'ETH'),
    ('11788544', 'NOT'),
    ('962142',   'ETH'),
    ('7795297',  'ETH'),
    ('7341464',  'ETH'),
    ('9843236',  'KAT'),
    ('5594916',  'ETH'),
    ('1550003',  'ETH')
]

I want to group them by their type (the string at index 1), like this:

result = [
    {
        type: 'KAT',
        items: ['11013331', '9843236']
    },
    {
        type: 'NOT',
        items: ['9085267', '11788544']
    },
    {
        type: 'ETH',
        items: ['5238761', '962142', '7795297', '7341464', '5594916', '1550003']
    }
]

How can I achieve this in an efficient way?

Thanks

Answer 1:

Do it in 2 steps. First, create a dictionary.

>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
...          ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
...          ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
...          ('5594916', 'ETH'), ('1550003', 'ETH')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for v, k in input: res[k].append(v)
...

Then, convert that dictionary into the expected format.

>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['9085267', '11788544'], 'type': 'NOT'},
 {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'},
 {'items': ['11013331', '9843236'], 'type': 'KAT'}]

It is also possible with itertools.groupby but it requires the input to be sorted first.

>>> from itertools import groupby
>>> from operator import itemgetter
>>> sorted_input = sorted(input, key=itemgetter(1))
>>> groups = groupby(sorted_input, key=itemgetter(1))
>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups]
[{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'},
 {'items': ['11013331', '9843236'], 'type': 'KAT'},
 {'items': ['9085267', '11788544'], 'type': 'NOT'}]

Note that neither of the outputs above keeps the keys in their original order. You need an OrderedDict if you need to keep the order.

>>> from collections import OrderedDict
>>> res = OrderedDict()
>>> for v, k in input:
...   if k in res: res[k].append(v)
...   else: res[k] = [v]
...
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['11013331', '9843236'], 'type': 'KAT'},
 {'items': ['9085267', '11788544'], 'type': 'NOT'},
 {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]
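On Python 3.7 and later, the built-in dict keeps insertion order as a language guarantee, so a plain dict can probably stand in for OrderedDict here. A minimal sketch, assuming Python 3.7+; the names pairs and grouped are illustrative and not from the original answer:

```python
# Sketch assuming Python 3.7+, where plain dicts keep insertion order.
# `pairs` and `grouped` are illustrative names, not from the original answer.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
         ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
         ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
         ('5594916', 'ETH'), ('1550003', 'ETH')]

grouped = {}
for value, kind in pairs:
    # setdefault creates the list the first time a key is seen, then returns it
    grouped.setdefault(kind, []).append(value)

# Convert to the list-of-dicts shape asked for in the question.
result = [{'type': kind, 'items': values} for kind, values in grouped.items()]
print([entry['type'] for entry in result])  # ['KAT', 'NOT', 'ETH']
```

Using a name other than input also avoids shadowing the built-in input function.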


Answer 2:

Python's standard-library itertools module has a groupby function that you could use, but the input must first be sorted so that the elements to be grouped are contiguous in the list:

sortkeyfn = lambda s: s[1]
input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
  ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'),
  ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
input.sort(key=sortkeyfn)

Now input looks like:

[('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'),
 ('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH'), ('11013331', 'KAT'),
 ('9843236', 'KAT'), ('9085267', 'NOT'), ('11788544', 'NOT')]
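To see why the sort matters: groupby only merges adjacent items with equal keys, so unsorted input yields the same key more than once. A small sketch on an illustrative subset of the data (the names pairs and second are assumptions, not from the original answer):

```python
from itertools import groupby

# Illustrative subset of the question's data, deliberately left unsorted.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
         ('11788544', 'NOT'), ('9843236', 'KAT')]

def second(pair):
    """Key function: the type stored at index 1."""
    return pair[1]

# Without sorting, every run of equal adjacent keys becomes its own group.
unsorted_keys = [key for key, _ in groupby(pairs, key=second)]

# After sorting by the same key, each type appears exactly once.
sorted_keys = [key for key, _ in groupby(sorted(pairs, key=second), key=second)]

print(unsorted_keys)  # ['KAT', 'NOT', 'ETH', 'NOT', 'KAT']
print(sorted_keys)    # ['ETH', 'KAT', 'NOT']
```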

groupby returns a sequence of 2-tuples, of the form (key, values_iterator). What we want is to turn this into a list of dicts where the 'type' is the key, and 'items' is a list of the 0'th elements of the tuples returned by the values_iterator. Like this:

from itertools import groupby
result = []
for key,valuesiter in groupby(input, key=sortkeyfn):
    result.append(dict(type=key, items=list(v[0] for v in valuesiter)))

Now result contains your desired list of dicts, as stated in your question.

You might consider, though, just making a single dict out of this, keyed by type, with each value containing the list of values. In your current form, to find the values for a particular type, you'll have to iterate over the list to find the dict containing the matching 'type' key, and then get the 'items' element from it. If you use a single dict instead of a list of small dicts, you can find the items for a particular type with a single keyed lookup into the master dict. Using groupby, this would look like:

result = {}
for key,valuesiter in groupby(input, key=sortkeyfn):
    result[key] = list(v[0] for v in valuesiter)

result now contains this dict (this is similar to the intermediate res defaultdict in @KennyTM's answer):

{'NOT': ['9085267', '11788544'],
 'ETH': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'],
 'KAT': ['11013331', '9843236']}
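The difference in lookup cost between the two shapes can be sketched as follows. The data is an illustrative subset of the values above, and the names as_list and as_dict are assumptions made for this example:

```python
# Illustrative data in both shapes (a subset of the question's values).
as_list = [
    {'type': 'KAT', 'items': ['11013331', '9843236']},
    {'type': 'NOT', 'items': ['9085267', '11788544']},
]
as_dict = {entry['type']: entry['items'] for entry in as_list}

# List of dicts: scan entries until the matching 'type' is found.
not_items_from_list = next(
    entry['items'] for entry in as_list if entry['type'] == 'NOT'
)

# Single dict: one keyed lookup, no scan.
not_items_from_dict = as_dict['NOT']

print(not_items_from_list == not_items_from_dict)  # True
```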

If you want to reduce this to a one-liner, you can:

result = dict((key, list(v[0] for v in valuesiter))
              for key,valuesiter in groupby(input, key=sortkeyfn))

or using the newfangled dict-comprehension form:

result = {key:list(v[0] for v in valuesiter)
              for key,valuesiter in groupby(input, key=sortkeyfn)}


Answer 3:

The following function will quickly (no sorting required) group tuples of any length by a key having any index:

# given a sequence of tuples like [(3,'c',6),(7,'a',2),(88,'c',4),(45,'a',0)],
# returns a dict grouping tuples by idx-th element - with idx=1 we have:
# if merge is True {'c':(3,6,88,4),     'a':(7,2,45,0)}
# if merge is False {'c':((3,6),(88,4)), 'a':((7,2),(45,0))}
def group_by(seqs,idx=0,merge=True):
    d = dict()
    for seq in seqs:
        k = seq[idx]
        v = d.get(k,tuple()) + (seq[:idx]+seq[idx+1:] if merge else (seq[:idx]+seq[idx+1:],))
        d.update({k:v})
    return d

In the case of your question, the index of the key you want to group by is 1, therefore:

group_by(input,1) 

gives

{'ETH': ('5238761','5349618','962142','7795297','7341464','5594916','1550003'),
 'KAT': ('11013331', '9843236'),
 'NOT': ('9085267', '11788544')}

which is not exactly the output you asked for, but might suit your needs just as well.
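To make the merge flag concrete, the sketch below repeats the function from this answer and runs it on the sample tuples from its own comments. The variable names sample, merged and unmerged are illustrative:

```python
def group_by(seqs, idx=0, merge=True):
    """Group tuples by their idx-th element (same logic as the answer above)."""
    d = dict()
    for seq in seqs:
        k = seq[idx]
        rest = seq[:idx] + seq[idx + 1:]  # the tuple with the key removed
        # merge=True flattens the remainders; merge=False keeps them as tuples
        v = d.get(k, tuple()) + (rest if merge else (rest,))
        d.update({k: v})
    return d

# Sample taken from the comments in the answer.
sample = [(3, 'c', 6), (7, 'a', 2), (88, 'c', 4), (45, 'a', 0)]

merged = group_by(sample, 1)
unmerged = group_by(sample, 1, merge=False)

print(merged)    # {'c': (3, 6, 88, 4), 'a': (7, 2, 45, 0)}
print(unmerged)  # {'c': ((3, 6), (88, 4)), 'a': ((7, 2), (45, 0))}
```

Because each step builds a new tuple by concatenation, very large groups may be slower to build this way than by appending to lists.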



Answer 4:

I also like pandas' simple grouping. It is powerful, simple, and well suited to large data sets:

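import pandas
result = pandas.DataFrame(input).groupby(1).groups

Note that .groups maps each type to the row labels of its members, not to the values themselves.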


