Question
I have input data as given below:
Date       Investment Type                                   Medium
1/1/2000   Mutual Fund, Stocks, Fixed Deposit, Real Estate   Own, Online,Through Agent
1/2/2000   Mutual Fund, Stocks, Real Estate                  Own
1/3/2000   Fixed Deposit                                     Online
1/3/2000   Mutual Fund, Fixed Deposit, Real Estate           Through Agent
1/2/2000   Stocks                                            Own, Online, Through Agent
The input to my function is Medium. It can be a single value or a list. I want to filter the data by the Medium input and then aggregate it as shown below: for the given Medium values, find the matching Investment Types, then aggregate the data for each Investment Type.
Medium       Investment Type   Date
Own,Online   Mutual Fund       1/1/2000,1/2/2000
Own,Online   Stocks            1/1/2000,1/2/2000
Own,Online   Fixed Deposit     1/1/2000,1/3/2000
Own,Online   Real Estate       1/1/2000
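For experimenting, the sample input can be reproduced as a pandas DataFrame. This is only a minimal sketch: it assumes every cell is a plain comma-separated string, and the variable name `df` is an assumption chosen for illustration.

```python
import pandas as pd

# Sample input from the question; each cell is a plain comma-separated string.
# The variable name `df` is an assumption made for this sketch.
df = pd.DataFrame({
    "Date": ["1/1/2000", "1/2/2000", "1/3/2000", "1/3/2000", "1/2/2000"],
    "Investment Type": [
        "Mutual Fund, Stocks, Fixed Deposit, Real Estate",
        "Mutual Fund, Stocks, Real Estate",
        "Fixed Deposit",
        "Mutual Fund, Fixed Deposit, Real Estate",
        "Stocks",
    ],
    "Medium": [
        "Own, Online,Through Agent",
        "Own",
        "Online",
        "Through Agent",
        "Own, Online, Through Agent",
    ],
})
print(df)
```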
Answer 1:
You can use:
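import pandas as pd
from itertools import product

# Medium values to search for
L = ['Online', 'Own']
# regex alternation that matches each value as a whole word
pat = '|'.join(r"\b{}\b".format(x) for x in L)
# keep only the matching Medium values, joined back into one string
df['New_Medium'] = df.pop('Medium').str.findall('(' + pat + ')').str.join(', ')
# remove rows with no matching Medium
df = df[df['New_Medium'].astype(bool)]

# split every cell on commas and take the Cartesian product of the pieces per row
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(r',\s*')).values
                    for j in product(*i)], columns=df.columns)
print (df1)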
        Date  Investment Type  New_Medium
0   1/1/2000      Mutual Fund         Own
1   1/1/2000      Mutual Fund      Online
2   1/1/2000           Stocks         Own
3   1/1/2000           Stocks      Online
4   1/1/2000    Fixed Deposit         Own
5   1/1/2000    Fixed Deposit      Online
6   1/1/2000      Real Estate         Own
7   1/1/2000      Real Estate      Online
8   1/2/2000      Mutual Fund         Own
9   1/2/2000           Stocks         Own
10  1/2/2000      Real Estate         Own
11  1/3/2000    Fixed Deposit      Online
12  1/2/2000           Stocks         Own
13  1/2/2000           Stocks      Online
# group by Investment Type and join the unique values of each remaining column
df = df1.groupby('Investment Type').agg(lambda x: ', '.join(x.unique())).reset_index()
print (df)
   Investment Type                Date   New_Medium
0    Fixed Deposit  1/1/2000, 1/3/2000  Own, Online
1      Mutual Fund  1/1/2000, 1/2/2000  Own, Online
2      Real Estate  1/1/2000, 1/2/2000  Own, Online
3           Stocks  1/1/2000, 1/2/2000  Own, Online
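Since the question asks for a function that accepts either a single Medium or a list, the same idea can be sketched as a function built on `Series.str.split` and `DataFrame.explode` (available from pandas 0.25) instead of `itertools.product`. This is an alternative sketch, not the method used above: the function name `aggregate_by_medium`, its signature, and the exact-match filtering with `isin` are all assumptions made for illustration.

```python
import pandas as pd


def aggregate_by_medium(df, medium):
    """Hypothetical helper: aggregate per Investment Type for the given Medium(s).

    `medium` may be a single string or a list of strings.
    """
    wanted = [medium] if isinstance(medium, str) else list(medium)

    out = df.copy()
    # Turn the comma-separated cells into Python lists.
    for col in ["Investment Type", "Medium"]:
        out[col] = out[col].str.strip().str.split(r",\s*")
    # One row per (Date, Investment Type, Medium) combination.
    out = out.explode("Investment Type").reset_index(drop=True)
    out = out.explode("Medium").reset_index(drop=True)
    # Keep only the requested Medium values (exact match).
    out = out[out["Medium"].isin(wanted)]
    # Join the unique values per Investment Type, in order of first appearance.
    return (
        out.groupby("Investment Type", sort=False)
        .agg(lambda s: ", ".join(s.unique()))
        .reset_index()
    )


# Sample data from the question.
df = pd.DataFrame({
    "Date": ["1/1/2000", "1/2/2000", "1/3/2000", "1/3/2000", "1/2/2000"],
    "Investment Type": [
        "Mutual Fund, Stocks, Fixed Deposit, Real Estate",
        "Mutual Fund, Stocks, Real Estate",
        "Fixed Deposit",
        "Mutual Fund, Fixed Deposit, Real Estate",
        "Stocks",
    ],
    "Medium": [
        "Own, Online,Through Agent",
        "Own",
        "Online",
        "Through Agent",
        "Own, Online, Through Agent",
    ],
})

result = aggregate_by_medium(df, ["Own", "Online"])
print(result)
```

Calling `aggregate_by_medium(df, "Online")` with a plain string should work the same way, because the string is wrapped in a one-element list before filtering. Unlike the version above, this sketch leaves the original `df` untouched and keeps the column name `Medium`.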
Source: https://stackoverflow.com/questions/52199952/aggregate-based-on-each-item-in-a-special-character-seperated-column-in-pandas