Pandas Finding cross sell in two columns in a data frame

巧了我就是萌 submitted on 2021-02-05 06:49:26

Question


I'm trying to do a kind of cross-sell analysis.

I have a Pandas dataframe with two columns, one with receipt numbers, and the other with product ids:

receipt  product
1        a
1        b
2        c
3        b
3        a

Most receipts contain many products. I need to count how often each combination of products appears on the same receipt. For example, if products 'a' and 'b' are the most common combination (they appear together on the most receipts), how do I find that out?

I tried df.groupby(['receipt','product']).count(), but this only gives me a count for each receipt + product combination, not the number of receipts on which products appear together.

Any help is appreciated, and thanks!


Answer 1:


I think you can merge the dataframe with itself on receipt, keep each unordered product pair once, and count the receipts per pair:

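# Self-merge: every pair of products on the same receipt (columns product_x, product_y)
new_df = df.merge(df, on='receipt')
# product_x < product_y drops self-pairs and keeps (a, b) but not (b, a)
(new_df[new_df['product_x'] < new_df['product_y']]
     .groupby(['product_x','product_y'])['receipt'].count()
)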

Output:

product_x  product_y
a          b            2
Name: receipt, dtype: int64
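
The self-merge approach above can be sketched end to end on the sample data from the question. This is only an illustration under a few assumptions: the names unique_items, pairs and pair_counts are made up here, the drop_duplicates step assumes a product listed twice on one receipt should count once, and the descending sort is added so the most common pair comes first.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "receipt": [1, 1, 2, 3, 3],
    "product": ["a", "b", "c", "b", "a"],
})

# Assumption: a product repeated on the same receipt should count only once
unique_items = df.drop_duplicates(["receipt", "product"])

# Self-merge on receipt; pandas adds the default suffixes _x and _y
pairs = unique_items.merge(unique_items, on="receipt")

# Keep each unordered pair once and drop pairs of a product with itself
pairs = pairs[pairs["product_x"] < pairs["product_y"]]

# Number of receipts per pair, most common pair first
pair_counts = (
    pairs.groupby(["product_x", "product_y"])
    .size()
    .sort_values(ascending=False)
    .rename("receipts")
)
print(pair_counts)
```

Note that the self-merge produces one row per pair of items on each receipt, so the intermediate frame can get large when receipts hold many products.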



Answer 2:


I think this is what you're looking for (it lists the products on each receipt and counts them):

df.groupby(['receipt']).agg({'product': list}).assign(count=lambda x: x['product'].str.len())

        product  count
receipt
1        [a, b]      2
2           [c]      1
3        [b, a]      2
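
The per-receipt lists above do not yet count product pairs. One possible way to get from those lists to pair counts, not part of the original answer, is to enumerate the pairs in each list with itertools.combinations and tally them with collections.Counter. The names baskets and pair_counter are hypothetical, and the sketch assumes duplicates within a receipt should be ignored.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "receipt": [1, 1, 2, 3, 3],
    "product": ["a", "b", "c", "b", "a"],
})

# One list of products per receipt, as in the answer above
baskets = df.groupby("receipt")["product"].agg(list)

pair_counter = Counter()
for products in baskets:
    # sorted(set(...)) removes duplicates and makes (a, b) and (b, a) the same pair
    pair_counter.update(combinations(sorted(set(products)), 2))

# Most frequent pairs first
print(pair_counter.most_common(3))
```

Passing 3 instead of 2 to combinations would count product triples in the same way.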


Source: https://stackoverflow.com/questions/60106078/pandas-finding-cross-sell-in-two-columns-in-a-data-frame
