Question
I have a dataframe with 12 columns. I would like to extract the rows of one column depending on the values of another column.
Sample of my dataframe
order_id order_type order_items
45 Lunch [('Burger', 5), ('Fries', 6)]
12 Dinner [('Shrimp', 10), ('Fish&Chips', 7)]
44 Lunch [('Salad', 9), ('Steak', 9)]
23 Breakfast [('Coffee', 2), ('Eggs', 3)]
I would like to extract the breakfast, lunch and dinner menus by taking the first item of each tuple, and the number of orders by taking the second item of each tuple.
Each value in order_items is a string, according to this line of code:
print(type(df['order_items'][0]))
>> <class 'str'>
I tried to apply a filter to extract the breakfast menu:
BreakfastLst=df.loc[df['order_type'] == 'Breakfast']['order_items']
but the output looks like this, and I can't use a for loop to iterate through sublists and access the tuples.
2 [('Coffee', 4), ('Eggs', 7)]
7 [('Coffee', 2), ('Eggs', 3)]
8 [('Cereal', 7), ('Pancake', 8), ('Coffee', 4),...
9 [('Cereal', 3), ('Eggs', 1), ('Coffee', 1), ('...
I also tried to convert to lists:
orderTypeLst = df.groupby('order_type')['order_items'].apply(list)
and then extract the lists by doing this:
breakFast=orderTypeLst['Breakfast']
lunch=orderTypeLst['Lunch']
dinner=orderTypeLst['Dinner']
but the output is a list of strings, and I can't iterate through the tuples in that either.
["[('Coffee', 4), ('Eggs', 7)]",
"[('Coffee', 2), ('Eggs', 3)]",
"[('Cereal', 7), ('Pancake', 8), ('Coffee', 4), ('Eggs', 8)]"]
As for dictionaries I tried the below, but the output is duplicated:
pd.Series(outlierFile.order_type.values,index=outlierFile.order_items).to_dict()
output sample
"[('Fries', 1), ('Steak', 6), ('Salad', 8), ('Chicken', 10)]": 'Lunch',
"[('Cereal', 6), ('Pancake', 8), ('Eggs', 3)]": 'Breakfast',
"[('Shrimp', 9), ('Salmon', 9)]": 'Dinner',
"[('Pancake', 3), ('Coffee', 5)]": 'Breakfast',
"[('Eggs', 1), ('Pancake', 1), ('Coffee', 5), ('Cereal', 5)]": 'Breakfast'
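My desired output is a clean version of each order_type (list or dictionary) so I can iterate through the tuples and extract the needed items.
Any input would be helpful. Thanks.
Answer 1:
IIUC, the order_items values are strings, so first evaluate them into real lists with ast.literal_eval, then use pandas.DataFrame.groupby to concatenate the lists per order_type:
import ast
df['order_items'] = df['order_items'].apply(ast.literal_eval)  # str -> list of tuples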
my_dict = df.groupby('order_type')['order_items'].apply(lambda x: sum(x, [])).to_dict()
print(my_dict)
Output:
{'Breakfast': [('Coffee', 2), ('Eggs', 3)],
'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)],
'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)]}
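To get from there to the menus and quantities the question asks for, something like the following might work. It is only a sketch in plain Python (no pandas) so it can run on its own. The rows list is copied from the sample in the question, and the names rows, summarize, summary and menus are made up for illustration. It assumes every order_items string is a valid Python literal that ast.literal_eval can parse.

```python
import ast
from collections import defaultdict

# Sample rows copied from the question: (order_id, order_type, order_items).
# order_items is a string, exactly as it is stored in the dataframe.
rows = [
    (45, "Lunch", "[('Burger', 5), ('Fries', 6)]"),
    (12, "Dinner", "[('Shrimp', 10), ('Fish&Chips', 7)]"),
    (44, "Lunch", "[('Salad', 9), ('Steak', 9)]"),
    (23, "Breakfast", "[('Coffee', 2), ('Eggs', 3)]"),
]


def summarize(rows):
    """Return {order_type: {item: total quantity}} for the given rows."""
    totals = defaultdict(dict)
    for _order_id, order_type, items_str in rows:
        # literal_eval turns the string into a real list of (item, qty) tuples.
        for item, qty in ast.literal_eval(items_str):
            totals[order_type][item] = totals[order_type].get(item, 0) + qty
    return dict(totals)


summary = summarize(rows)

# The menu for each order type is the set of item names seen for it.
menus = {order_type: sorted(items) for order_type, items in summary.items()}

print(menus["Lunch"])        # ['Burger', 'Fries', 'Salad', 'Steak']
print(summary["Breakfast"])  # {'Coffee': 2, 'Eggs': 3}
```

With the real dataframe, the same function could presumably be fed with df[['order_id', 'order_type', 'order_items']].itertuples(index=False) instead of the hard-coded rows list.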
Source: https://stackoverflow.com/questions/58151118/extracting-tuples-from-a-list-in-pandas-dataframe