Aggregate based on each item in a special-character-separated column in Pandas

Submitted by 本秂侑毒 on 2019-12-21 06:19:25

Question


I have input data as given below

Date        Investment Type                                    Medium
1/1/2000    Mutual Fund, Stocks, Fixed Deposit, Real Estate    Own, Online,Through Agent
1/2/2000    Mutual Fund, Stocks, Real Estate                   Own
1/3/2000    Fixed Deposit                                      Online
1/3/2000    Mutual Fund, Fixed Deposit, Real Estate            Through Agent
1/2/2000    Stocks                                             Own, Online, Through Agent

The input to my function is Medium, which can be a single value or a list. I want to filter the data by the given Medium values, find the investment types that appear in the matching rows, and then aggregate the data for each investment type, as shown below:

Medium                                Investment Type           Date
Own,Online                            Mutual Fund               1/1/2000,1/2/2000 
Own,Online                            Stocks                    1/1/2000,1/2/2000
Own,Online                            Fixed Deposit             1/1/2000,1/3/2000
Own,Online                            Real Estate               1/1/2000

Answer 1:


You can use:

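import re
from itertools import product

import pandas as pd

# df is the input DataFrame shown in the question
L = ['Online', 'Own']
# whole-word alternation pattern; re.escape guards against regex metacharacters
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in L)
# keep only the requested Medium values found in each row
df['New_Medium'] = df.pop('Medium').str.findall('(' + pat + ')').str.join(', ')
# remove rows with empty values
df = df[df['New_Medium'].astype(bool)]

# split every column on commas and build one row per combination
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(r',\s*')).values
                      for j in product(*i)], columns=df.columns)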
print (df1)
        Date Investment Type New_Medium
0   1/1/2000     Mutual Fund        Own
1   1/1/2000     Mutual Fund     Online
2   1/1/2000          Stocks        Own
3   1/1/2000          Stocks     Online
4   1/1/2000   Fixed Deposit        Own
5   1/1/2000   Fixed Deposit     Online
6   1/1/2000     Real Estate        Own
7   1/1/2000     Real Estate     Online
8   1/2/2000     Mutual Fund        Own
9   1/2/2000          Stocks        Own
10  1/2/2000     Real Estate        Own
11  1/3/2000   Fixed Deposit     Online
12  1/2/2000          Stocks        Own
13  1/2/2000          Stocks     Online

# group by investment type and join the unique values of the other columns
df = df1.groupby('Investment Type').agg(lambda x: ', '.join(x.unique())).reset_index()
print (df)
  Investment Type                Date   New_Medium
0   Fixed Deposit  1/1/2000, 1/3/2000  Own, Online
1     Mutual Fund  1/1/2000, 1/2/2000  Own, Online
2     Real Estate  1/1/2000, 1/2/2000  Own, Online
3          Stocks  1/1/2000, 1/2/2000  Own, Online
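
The same result can also be sketched with `DataFrame.explode` (available from pandas 0.25), wrapped in a function that accepts either a single Medium or a list, as the question asks. This is only an illustrative alternative under some assumptions: the helper name `aggregate_by_medium` is hypothetical, the sample frame is retyped from the question, and a row is treated as matching when it contains any of the requested values, which is the behaviour the output above shows.

```python
import pandas as pd


def aggregate_by_medium(df, mediums):
    """Hypothetical helper: filter by Medium value(s), aggregate per Investment Type."""
    # Accept a single string or a list of strings
    if isinstance(mediums, str):
        mediums = [mediums]
    out = df.copy()
    # Turn each comma-separated cell into a list, trimming stray whitespace
    for col in ['Investment Type', 'Medium']:
        out[col] = [[p.strip() for p in v.split(',')] for v in out[col]]
    # One row per (Date, Investment Type, Medium) combination
    out = out.explode('Investment Type').reset_index(drop=True)
    out = out.explode('Medium').reset_index(drop=True)
    # Keep only the requested mediums (exact match after trimming)
    out = out[out['Medium'].isin(mediums)]
    # Join unique values per investment type, in first-seen order
    return (out.groupby('Investment Type', sort=False)
               .agg(lambda s: ', '.join(s.unique()))
               .reset_index())


# Sample data retyped from the question
df = pd.DataFrame({
    'Date': ['1/1/2000', '1/2/2000', '1/3/2000', '1/3/2000', '1/2/2000'],
    'Investment Type': [
        'Mutual Fund, Stocks, Fixed Deposit, Real Estate',
        'Mutual Fund, Stocks, Real Estate',
        'Fixed Deposit',
        'Mutual Fund, Fixed Deposit, Real Estate',
        'Stocks',
    ],
    'Medium': [
        'Own, Online,Through Agent',
        'Own',
        'Online',
        'Through Agent',
        'Own, Online, Through Agent',
    ],
})

result = aggregate_by_medium(df, ['Own', 'Online'])
print(result)
```

Calling `aggregate_by_medium(df, 'Through Agent')` handles the single-value case the same way. Unlike the regex approach, the filter here is an exact comparison after trimming whitespace, so no escaping of special characters is needed.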


Source: https://stackoverflow.com/questions/52199952/aggregate-based-on-each-item-in-a-special-character-seperated-column-in-pandas
