问题
I am trying to figure out a way to determine all possible combinations of rows within a DataFrame that are below a budget, so let's say I have a dataframe like this:
data = [['Bread', 9, 'Food'], ['Shoes', 20, 'Clothes'], ['Shirt', 15, 'Clothes'], ['Milk', 5, 'Drink'], ['Cereal', 8, 'Food'], ['Chips', 10, 'Food'], ['Beer', 15, 'Drink'], ['Popcorn', 3, 'Food'], ['Ice Cream', 6, 'Food'], ['Soda', 4, 'Drink']]
df = pd.DataFrame(data, columns = ['Item', 'Price', 'Type'])
df
Data
Item Price Type
Bread 9 Food
Shoes 20 Clothes
Shirt 15 Clothes
Milk 5 Drink
Cereal 8 Food
Chips 10 Food
Beer 15 Drink
Popcorn 3 Food
Ice Cream 6 Food
Soda 4 Drink
I want to find every combination that I could purchase for under a specific budget, let's say $35 for this example, while only getting one of each type. I'd like to get a new dataframe made up of rows for each combination that works with each item in its own column.
I was trying to do it using itertools.product, but this can combine and add columns, but what I really need to do is combine and add a specific column based on values in another column. I'm a bit stumped now.
Thanks for your help!
回答1:
Here a way using powerset recipe from itertools with pd.concat
from itertools import chain, combinations
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in
enumerate(powerset(df.index))
if (df.loc[l, 'Price'].sum() <= 35)])
Outputs a single dataframe with groups of product that meet $35 condition:
Item Price Type grp
0 Bread 9 Food 1
1 Shoes 20 Clothes 2
2 Shirt 15 Clothes 3
3 Milk 5 Drink 4
4 Cereal 8 Food 5
.. ... ... ... ...
3 Milk 5 Drink 752
4 Cereal 8 Food 752
7 Popcorn 3 Food 752
8 Ice Cream 6 Food 752
9 Soda 4 Drink 752
How many ways this came combined to meet $35 budget?
df_groups['grp'].nunique()
Output:
258
Details:
There are a couple of tricks/methods that are used here. First, we are using the index of the dataframe to create groups of rows or items using powerset. Next, we are using enumerate to identify each group and with assign creating a new column in a dataframe with that group number from enumerate.
Modify to capture no more than one of each type:
df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in
enumerate(powerset(df.index))
if ((df.loc[l, 'Price'].sum() <= 35) &
(df.loc[l, 'Type'].value_counts()==1).all())])
How many groups?
df_groups['grp'].nunique()
62
Get exactly one for each Type:
df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in
enumerate(powerset(df.index))
if ((df.loc[l, 'Price'].sum() <= 35) &
(df.loc[l, 'Type'].value_counts()==1).all()&
(len(df.loc[l, 'Type']) == 3))])
How many groups?
df_groups['grp'].nunique()
21
来源:https://stackoverflow.com/questions/58119575/pandas-find-all-combinations-of-rows-under-a-budget