Pandas find all combinations of rows under a budget

假装没事ソ 提交于 2019-12-08 11:35:23

问题


I am trying to figure out a way to determine all possible combinations of rows within a DataFrame that are below a budget, so let's say I have a dataframe like this:

data = [['Bread', 9, 'Food'], ['Shoes', 20, 'Clothes'], ['Shirt', 15, 'Clothes'], ['Milk', 5, 'Drink'], ['Cereal', 8, 'Food'], ['Chips', 10, 'Food'], ['Beer', 15, 'Drink'], ['Popcorn', 3, 'Food'], ['Ice Cream', 6, 'Food'], ['Soda', 4, 'Drink']]
df = pd.DataFrame(data, columns = ['Item', 'Price', 'Type'])
df

Data

Item       Price  Type
Bread      9      Food
Shoes      20     Clothes
Shirt      15     Clothes
Milk       5      Drink
Cereal     8      Food
Chips      10     Food
Beer       15     Drink
Popcorn    3      Food
Ice Cream  6      Food
Soda       4      Drink

I want to find every combination that I could purchase for under a specific budget, let's say $35 for this example, while only getting one of each type. I'd like to get a new dataframe made up of rows for each combination that works with each item in its own column.

I was trying to do it using itertools.product, but this can combine and add columns, but what I really need to do is combine and add a specific column based on values in another column. I'm a bit stumped now.

Thanks for your help!


回答1:


Here a way using powerset recipe from itertools with pd.concat

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in 
                       enumerate(powerset(df.index)) 
                       if (df.loc[l, 'Price'].sum() <= 35)])

Outputs a single dataframe with groups of product that meet $35 condition:

          Item  Price     Type  grp
0       Bread      9     Food    1
1       Shoes     20  Clothes    2
2       Shirt     15  Clothes    3
3        Milk      5    Drink    4
4      Cereal      8     Food    5
..        ...    ...      ...  ...
3        Milk      5    Drink  752
4      Cereal      8     Food  752
7     Popcorn      3     Food  752
8   Ice Cream      6     Food  752
9        Soda      4    Drink  752

How many ways this came combined to meet $35 budget?

df_groups['grp'].nunique()

Output:

258

Details:

There are a couple of tricks/methods that are used here. First, we are using the index of the dataframe to create groups of rows or items using powerset. Next, we are using enumerate to identify each group and with assign creating a new column in a dataframe with that group number from enumerate.

Modify to capture no more than one of each type:

df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in 
                       enumerate(powerset(df.index)) 
                       if ((df.loc[l, 'Price'].sum() <= 35) & 
                           (df.loc[l, 'Type'].value_counts()==1).all())])

How many groups?

df_groups['grp'].nunique()
62

Get exactly one for each Type:

df_groups = pd.concat([df.reindex(l).assign(grp=n) for n, l in 
                       enumerate(powerset(df.index)) 
                       if ((df.loc[l, 'Price'].sum() <= 35) & 
                           (df.loc[l, 'Type'].value_counts()==1).all()&
                           (len(df.loc[l, 'Type']) == 3))])

How many groups?

df_groups['grp'].nunique()
21


来源:https://stackoverflow.com/questions/58119575/pandas-find-all-combinations-of-rows-under-a-budget

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!