Extracting tuples from a list in Pandas Dataframe

大兔子大兔子 submitted on 2020-01-06 04:59:44

Question


I have a dataframe with 12 columns. I would like to extract the rows of one column depending on the values of another column.

Sample of my dataframe

order_id    order_type   order_items
45           Lunch       [('Burger', 5), ('Fries', 6)]
12           Dinner      [('Shrimp', 10), ('Fish&Chips', 7)]
44           Lunch       [('Salad', 9), ('Steak', 9)]
23           Breakfast   [('Coffee', 2), ('Eggs', 3)]

I would like to extract the breakfast, lunch and dinner menus by taking the first element of each tuple, and the number of orders by taking the second element.

Each item is of type string, according to this line of code:

print(type(df['order_items'][0]))
>> <class 'str'>

I tried to apply a filter to extract the breakfast menu:

BreakfastLst=df.loc[df['order_type'] == 'Breakfast']['order_items']

but the output looks like this, and I can't use a for loop to iterate through sublists and access the tuples.

2                           [('Coffee', 4), ('Eggs', 7)]
7                           [('Coffee', 2), ('Eggs', 3)]
8      [('Cereal', 7), ('Pancake', 8), ('Coffee', 4),...
9      [('Cereal', 3), ('Eggs', 1), ('Coffee', 1), ('...

I also tried to convert to lists:

orderTypeLst = df.groupby('order_type')['order_items'].apply(list)

and then extract the lists by doing this:

breakFast=orderTypeLst['Breakfast']
lunch=orderTypeLst['Lunch']
dinner=orderTypeLst['Dinner']

but the output is a list of strings, and I can't iterate through the tuples in that either.

["[('Coffee', 4), ('Eggs', 7)]",
 "[('Coffee', 2), ('Eggs', 3)]",
 "[('Cereal', 7), ('Pancake', 8), ('Coffee', 4), ('Eggs', 8)]"]

As for dictionaries, I tried the following, but the output contains duplicates:

pd.Series(outlierFile.order_type.values,index=outlierFile.order_items).to_dict()

Sample output:

 "[('Fries', 1), ('Steak', 6), ('Salad', 8), ('Chicken', 10)]": 'Lunch',
 "[('Cereal', 6), ('Pancake', 8), ('Eggs', 3)]": 'Breakfast',
 "[('Shrimp', 9), ('Salmon', 9)]": 'Dinner',
 "[('Pancake', 3), ('Coffee', 5)]": 'Breakfast',
 "[('Eggs', 1), ('Pancake', 1), ('Coffee', 5), ('Cereal', 5)]": 'Breakfast'

My desired output is a clean version of each order_type (as a list or dictionary) so that I can iterate through the tuples and extract the needed items.

Any input would be helpful. Thanks,


Answer 1:


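IIUC, the order_items strings first need to be evaluated into real lists of tuples (for example with ast.literal_eval). After that evaluation, try using pandas.DataFrame.groupby:

import ast
df['order_items'] = df['order_items'].apply(ast.literal_eval)  # str -> list of tuples
my_dict = df.groupby('order_type')['order_items'].apply(lambda x: sum(x, [])).to_dict()
print(my_dict)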

Output:

{'Breakfast': [('Coffee', 2), ('Eggs', 3)],
 'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)],
 'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)]}
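
For completeness, here is a minimal end-to-end sketch of the same idea that can be run on its own. It rebuilds the four sample rows from the question, parses the strings, groups by order_type, and then pulls out the menu names and the quantities. The names items_by_type, menu_by_type and totals_by_type are made up for this illustration, and summing the quantities per item is only one possible reading of "number of orders".

```python
import ast
from collections import Counter

import pandas as pd

# Sample rows copied from the question; order_items holds strings, not lists.
df = pd.DataFrame({
    "order_id": [45, 12, 44, 23],
    "order_type": ["Lunch", "Dinner", "Lunch", "Breakfast"],
    "order_items": [
        "[('Burger', 5), ('Fries', 6)]",
        "[('Shrimp', 10), ('Fish&Chips', 7)]",
        "[('Salad', 9), ('Steak', 9)]",
        "[('Coffee', 2), ('Eggs', 3)]",
    ],
})

# Parse each string into a real list of (name, quantity) tuples.
df["order_items"] = df["order_items"].apply(ast.literal_eval)

# Concatenate the per-order lists within each order type.
items_by_type = (
    df.groupby("order_type")["order_items"]
    .apply(lambda lists: [pair for lst in lists for pair in lst])
    .to_dict()
)

# Menu: distinct item names per order type, in first-seen order.
menu_by_type = {
    order_type: list(dict.fromkeys(name for name, _ in pairs))
    for order_type, pairs in items_by_type.items()
}

# Assumed interpretation: total quantity per item within each order type.
totals_by_type = {}
for order_type, pairs in items_by_type.items():
    counts = Counter()
    for name, qty in pairs:
        counts[name] += qty
    totals_by_type[order_type] = dict(counts)

print(menu_by_type["Lunch"])
print(totals_by_type["Lunch"])
```

The nested list comprehension does the same job as sum(x, []) but avoids building a new list for every addition, which may matter when a group contains many orders.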


Source: https://stackoverflow.com/questions/58151118/extracting-tuples-from-a-list-in-pandas-dataframe
