I had a similar problem when I had to create a dictionary that contained the elements of an array and their count. The answer is relevant because, I flatten a list of lists, get the elements I need and then do a group and count. I used Python's map function to produce a tuple of element and it's count and groupby over the array. Note that the groupby takes the array element itself as the keyfunc. As a relatively new Python coder, I find it to me more easier to comprehend, while being Pythonic as well.
Before I discuss the code, here is a sample of data I had to flatten first:
{ "_id" : ObjectId("4fe3a90783157d765d000011"), "status" : [ "opencalais" ],
"content_length" : 688, "open_calais_extract" : { "entities" : [
{"type" :"Person","name" : "Iman Samdura","rel_score" : 0.223 },
{"type" : "Company", "name" : "Associated Press", "rel_score" : 0.321 },
{"type" : "Country", "name" : "Indonesia", "rel_score" : 0.321 }, ... ]},
"title" : "Indonesia Police Arrest Bali Bomb Planner", "time" : "06:42 ET",
"filename" : "021121bn.01", "month" : "November", "utctime" : 1037836800,
"date" : "November 21, 2002", "news_type" : "bn", "day" : "21" }
It is a query result from Mongo. The code below flattens a collection of such lists.
def flatten_list(items):
return sorted([entity['name'] for entity in [entities for sublist in
[item['open_calais_extract']['entities'] for item in items]
for entities in sublist])
First, I would extract all the "entities" collection, and then for each entities collection, iterate over the dictionary and extract the name attribute.