Extracting tuples from a list in Pandas Dataframe

Question

I have a dataframe with 12 columns. I would like to extract the rows of one column depending on the values of another column.

Sample of my dataframe

order_id    order_type   order_items
45           Lunch       [('Burger', 5), ('Fries', 6)]
12           Dinner      [('Shrimp', 10), ('Fish&Chips', 7)]
44           Lunch       [('Salad', 9), ('Steak', 9)]
23           Breakfast   [('Coffee', 2), ('Eggs', 3)]

I would like to build the breakfast, lunch and dinner menus by extracting the first item of each tuple, and get the number of orders from the second item of each tuple.

Each item is of type string, according to this line of code:

print(type(df['order_items'][0]))
>> <class 'str'>

I tried to apply a filter to extract the breakfast menu:

BreakfastLst=df.loc[df['order_type'] == 'Breakfast']['order_items']

but the output looks like this, and I can't use a for loop to iterate through sublists and access the tuples.

2                           [('Coffee', 4), ('Eggs', 7)]
7                           [('Coffee', 2), ('Eggs', 3)]
8      [('Cereal', 7), ('Pancake', 8), ('Coffee', 4),...
9      [('Cereal', 3), ('Eggs', 1), ('Coffee', 1), ('...

I also tried to convert to lists:

orderTypeLst = df.groupby(['order_type'])['order_items'].apply(list)

and then extract the lists by doing this:

breakFast=orderTypeLst['Breakfast']
lunch=orderTypeLst['Lunch']
dinner=orderTypeLst['Dinner']

but each element of the output is a string, and I can't iterate through that either.

["[('Coffee', 4), ('Eggs', 7)]",
 "[('Coffee', 2), ('Eggs', 3)]",
 "[('Cereal', 7), ('Pancake', 8), ('Coffee', 4), ('Eggs', 8)]"]

As for dictionaries, I tried the following, but the output is duplicated:

pd.Series(outlierFile.order_type.values,index=outlierFile.order_items).to_dict()

Sample output:

 "[('Fries', 1), ('Steak', 6), ('Salad', 8), ('Chicken', 10)]": 'Lunch',
 "[('Cereal', 6), ('Pancake', 8), ('Eggs', 3)]": 'Breakfast',
 "[('Shrimp', 9), ('Salmon', 9)]": 'Dinner',
 "[('Pancake', 3), ('Coffee', 5)]": 'Breakfast',
 "[('Eggs', 1), ('Pancake', 1), ('Coffee', 5), ('Cereal', 5)]": 'Breakfast'

My desired output is a clean version of each order_type (a list or a dictionary) so that I can iterate through the tuples and extract the needed items.

Any input would be helpful. Thanks,

Answer 1:

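IIUC, try using pandas.DataFrame.groupby after evaluating the strings into real lists with ast.literal_eval:

import ast
df['order_items'] = df['order_items'].apply(ast.literal_eval)  # str -> list of tuples
my_dict = df.groupby('order_type')['order_items'].apply(lambda x: sum(x, [])).to_dict()  # join lists per type
print(my_dict)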

Output:

{'Breakfast': [('Coffee', 2), ('Eggs', 3)],
 'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)],
 'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)]}
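
To make the steps above concrete, here is a minimal end-to-end sketch built on the four sample rows from the question. It assumes every order_items value is a string holding a valid Python literal, so ast.literal_eval can parse it. The names items_by_type, menus and quantities are only illustrative and do not appear in the original post.

```python
import ast
from collections import Counter

import pandas as pd

# Sample rows copied from the question; order_items holds strings, not lists.
df = pd.DataFrame({
    "order_id": [45, 12, 44, 23],
    "order_type": ["Lunch", "Dinner", "Lunch", "Breakfast"],
    "order_items": [
        "[('Burger', 5), ('Fries', 6)]",
        "[('Shrimp', 10), ('Fish&Chips', 7)]",
        "[('Salad', 9), ('Steak', 9)]",
        "[('Coffee', 2), ('Eggs', 3)]",
    ],
})

# Parse each string into a real list of (item, quantity) tuples.
df["order_items"] = df["order_items"].apply(ast.literal_eval)

# Concatenate the per-order lists within each order type.
items_by_type = (
    df.groupby("order_type")["order_items"]
      .apply(lambda lists: sum(lists, []))
      .to_dict()
)

# Hypothetical follow-up: menu names and total quantity per item.
menus = {}
quantities = {}
for order_type, pairs in items_by_type.items():
    # First element of each tuple is the menu item.
    menus[order_type] = sorted({name for name, _ in pairs})
    # Second element is the number ordered; sum it per item.
    totals = Counter()
    for name, qty in pairs:
        totals[name] += qty
    quantities[order_type] = dict(totals)

print(menus)
print(quantities)
```

Once the strings are parsed, each value in items_by_type is an ordinary list of tuples, so a plain for loop with tuple unpacking is enough to pull out the item names and the counts.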

Source: https://stackoverflow.com/questions/58151118/extracting-tuples-from-a-list-in-pandas-dataframe

Tags

python

list

tuples

iteration