Question
I've checked some topics about groupby() but I don't get what's wrong with my example:
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'},
            {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'},
            {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
This prints each student separately. Why don't I get only 3 groups: @gmail.com, @yahoo.com and @something.com?
Answer 1:
For starters, one of the mails is gmail.com while the others are @gmail.com, which is why they are treated as separate groups.
groupby also expects the data to be pre-sorted by the same key function, which explains why you get @something.com twice.
From the docs:
... Generally, the iterable needs to already be sorted on the same key function. ...
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
# sort by the same key function we later use with groupby
students.sort(key=key_func)
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
# gmail.com
# [{'name': 'Jim', 'mail': 'gmail.com'}]
After fixing both the sorting and the gmail.com/@gmail.com mismatch, we get the expected output:
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
students.sort(key=key_func)
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'},
#  {'name': 'Jim', 'mail': '@gmail.com'},
#  {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'},
#  {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
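If the data itself cannot be corrected, one option is to normalize inside the key function instead. The sketch below is an assumption of mine rather than part of the original answer: `normalized_mail` is a hypothetical helper that strips a leading `@` so that gmail.com and @gmail.com end up under the same key. It uses the question's original data, including the gmail.com entry.

```python
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]


def normalized_mail(student):
    # Hypothetical helper: drop any leading '@' so both spellings share one key.
    return student['mail'].lstrip('@')


# Sort and group with the same key function, collecting only the names per group.
groups = {
    key: [student['name'] for student in group]
    for key, group in itertools.groupby(sorted(students, key=normalized_mail),
                                        key=normalized_mail)
}
print(groups)
```

Whether stripping the `@` is the right normalization depends on what the real data looks like; the point is only that sorting and grouping must use the same key function.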
Answer 2:
itertools.groupby does not sort anything itself: it only groups consecutive items that have the same key. Your list is not sorted.
So if you have ["gmail.com", "something.com", "gmail.com"], itertools will create three groups. This is different from the groupby in some functional languages (or Python pandas, for that matter).
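A minimal sketch of that behaviour, using the three-item list above (the variable names are only for illustration): groupby starts a new group whenever the key changes, so the unsorted list gives three groups and the sorted one gives two.

```python
import itertools

mails = ["gmail.com", "something.com", "gmail.com"]

# groupby only merges *consecutive* equal items, so the unsorted list
# produces one group per run.
unsorted_groups = [(key, list(group)) for key, group in itertools.groupby(mails)]
print(unsorted_groups)

# Sorting first places equal items next to each other.
sorted_groups = [(key, list(group)) for key, group in itertools.groupby(sorted(mails))]
print(sorted_groups)
```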
You need to sort the list first.
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
# sorted() returns a new sorted list, so equal keys are adjacent when groupby sees them
for key, group in itertools.groupby(sorted(students, key=key_func), key=key_func):
    print(key)
    print(list(group))
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
# gmail.com
# [{'name': 'Jim', 'mail': 'gmail.com'}]
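For comparison, grouping that does not depend on the input order can be sketched with a plain dictionary. This is not what itertools.groupby does; it is an alternative closer to the groupby of pandas or of some functional languages, shown here with `collections.defaultdict`. The name `by_mail` is illustrative.

```python
from collections import defaultdict

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

# Every student is appended to the list for their mail, regardless of position,
# so no sorting is needed beforehand.
by_mail = defaultdict(list)
for student in students:
    by_mail[student['mail']].append(student['name'])

for mail, names in by_mail.items():
    print(mail, names)
```

Unlike groupby, this keeps all groups in memory at once, which is usually fine for small lists like this one.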
Source: https://stackoverflow.com/questions/50198597/why-itertools-groupby-doesnt-work