Why doesn't itertools.groupby() work? [duplicate]

青春壹個敷衍的年華 submitted on 2019-11-30 21:06:57

Question


I've read several threads about groupby(), but I don't see what's wrong with my example:

import itertools

students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

This prints each student separately. Why don't I get only three groups: @gmail.com, @yahoo.com, and @something.com?


Answer 1:


For starters, some of the addresses are gmail.com and some are @gmail.com, which is why they are treated as separate groups.

Also, groupby() only merges consecutive items that share a key, so it expects the data to be pre-sorted by the same key function. That explains why you get @something.com twice.

From the docs:

... Generally, the iterable needs to already be sorted on the same key function. ...
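To see the adjacency rule in isolation, here is a small sketch that uses only the mail strings from the question (the list name `mails` is illustrative). Unsorted, no two neighbouring keys are equal, so every element becomes its own group; after sorting, equal keys sit next to each other and collapse into one group each.

```python
import itertools

# Mail values from the question, in their original order.
mails = ['@gmail.com', '@yahoo.com', 'gmail.com',
         '@something.com', '@gmail.com', '@something.com']

# groupby() starts a new group whenever the key differs from the previous item's key.
unsorted_keys = [key for key, _ in itertools.groupby(mails)]
sorted_keys = [key for key, _ in itertools.groupby(sorted(mails))]

print(len(unsorted_keys))  # 6: no two neighbours share a key
print(sorted_keys)         # ['@gmail.com', '@something.com', '@yahoo.com', 'gmail.com']
```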

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

# Sort by the same key function we later use with groupby.
students.sort(key=key_func)

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

#  @gmail.com
#  [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
#  @something.com
#  [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
#  @yahoo.com
#  [{'name': 'Tom', 'mail': '@yahoo.com'}]
#  gmail.com
#  [{'name': 'Jim', 'mail': 'gmail.com'}]

After fixing both the sort order and the gmail.com/@gmail.com typo, we get the expected output:

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

students.sort(key=key_func)

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

#  @gmail.com
#  [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
#  @something.com
#  [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
#  @yahoo.com
#  [{'name': 'Tom', 'mail': '@yahoo.com'}]
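If editing the data is not an option, a different approach (not the one this answer uses) is to normalize the address inside the key function, so that gmail.com and @gmail.com map to the same key. This is a minimal sketch assuming a missing leading "@" is the only inconsistency in the data; `mail_domain` is a hypothetical helper name.

```python
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

def mail_domain(student):
    # Hypothetical normalizer: assumes a leading '@' is the only variation.
    return student['mail'].lstrip('@')

# Sort and group with the same normalized key; each group is consumed
# by the inner list before groupby() advances to the next key.
grouped = {key: [s['name'] for s in group]
           for key, group in itertools.groupby(sorted(students, key=mail_domain),
                                               key=mail_domain)}

print(grouped)
# {'gmail.com': ['Paul', 'Jim', 'Gregory'], 'something.com': ['Jules', 'Kathrin'], 'yahoo.com': ['Tom']}
```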



Answer 2:


itertools.groupby() relies on the order of the data: it only groups consecutive items with equal keys. Your list is not sorted.

So if you have ["gmail.com", "something.com", "gmail.com"], groupby() will create three groups. This differs from groupby in some functional languages (or in pandas, for that matter).
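The three-group case can be reproduced directly. For comparison, a hash-based grouping (closer in spirit to pandas, which groups equal keys wherever they occur) collects every occurrence of a key into one bucket and needs no sorting. The `defaultdict` version below is a common standard-library idiom, not something `itertools` provides.

```python
import itertools
from collections import defaultdict

domains = ["gmail.com", "something.com", "gmail.com"]

# itertools.groupby: one group per run of equal adjacent keys.
runs = [(key, len(list(group))) for key, group in itertools.groupby(domains)]
print(runs)  # [('gmail.com', 1), ('something.com', 1), ('gmail.com', 1)]

# Hash-based grouping: every occurrence of a key lands in the same bucket.
buckets = defaultdict(list)
for index, domain in enumerate(domains):
    buckets[domain].append(index)
print(dict(buckets))  # {'gmail.com': [0, 2], 'something.com': [1]}
```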

You need to sort the list first.

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

# Sort by mail first so equal keys are adjacent, then group by the same key.
for key, group in itertools.groupby(sorted(students, key=lambda x: x['mail']),
                                    key=lambda student: student['mail']):
    print(key)
    print(list(group))

# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
# gmail.com
# [{'name': 'Jim', 'mail': 'gmail.com'}]
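The code above passes two separate lambdas that happen to compute the same key. A slightly more robust variant, sketched here as an alternative rather than a correction, builds the key once with `operator.itemgetter('mail')` and reuses that one callable for both `sorted()` and `groupby()`, so the sort key and the group key cannot drift apart.

```python
import itertools
from operator import itemgetter

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

by_mail = itemgetter('mail')  # one key callable shared by sorted() and groupby()

groups = {key: [s['name'] for s in group]
          for key, group in itertools.groupby(sorted(students, key=by_mail), key=by_mail)}

print(groups)
# {'@gmail.com': ['Paul', 'Gregory'], '@something.com': ['Jules', 'Kathrin'], '@yahoo.com': ['Tom'], 'gmail.com': ['Jim']}
```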


Source: https://stackoverflow.com/questions/50198597/why-itertools-groupby-doesnt-work
