Question
Consider the DataFrame below:
import pandas as pd

df_dict = {'name': {0: ' john',
                    1: ' john',
                    4: ' daphne '},
           'address': {0: 'johns address',
                       1: 'johns address',
                       4: 'daphne address'},
           'phonenum1': {0: 7870395,
                         1: 7870450,
                         4: 7373209},
           'phonenum2': {0: None, 1: 123450, 4: None},
           'phonenum3': {0: None, 1: 123456, 4: None}
           }
df = pd.DataFrame(df_dict)
     name         address  phonenum1  phonenum2  phonenum3
0    john   johns address    7870395        NaN        NaN
1    john   johns address    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN
How can I unstack the phonenum data so that rows sharing the same name and address are merged into one, with the output presented as below?
     name         address  phonenum1  phonenum2  phonenum3  phonenum4
0    john   johns address    7870395    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN        NaN
Answer 1:
You can do it using set_index and stack, then groupby.cumcount per name and address to number the values and get the new column names, then unstack, followed by reset_index and rename_axis for cosmetics.
df_ = (df.set_index(['name', 'address'])
         .stack()
         .reset_index(level=-1)
         .assign(cc=lambda x: x.groupby(level=['name', 'address']).cumcount()+1)
         .set_index('cc', append=True)
         [0].unstack()
         .add_prefix('phonenum')
         .reset_index()
         .rename_axis(columns=None)
      )
print(df_)
     name         address  phonenum1  phonenum2  phonenum3  phonenum4
0    john   johns address  7870395.0  7870450.0   123450.0   123456.0
1  daphne  daphne address  7373209.0        NaN        NaN        NaN
Because the code is written as a single method chain, you can comment out every line from the second one to the last one before the closing parenthesis, then uncomment them one at a time to see what each step does.
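The chain above can also be split into named intermediate steps, which makes the role of cumcount easier to see. This is only a sketch of the same idea, not the answer's exact code: the variable names (stacked, position, long, wide) are made up for illustration, the names are written without the stray spaces, and dropna() is called explicitly on the assumption that whether stack() drops missing values on its own depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['john', 'john', 'daphne'],
        'address': ['johns address', 'johns address', 'daphne address'],
        'phonenum1': [7870395, 7870450, 7373209],
        'phonenum2': [None, 123450, None],
        'phonenum3': [None, 123456, None],
    },
    index=[0, 1, 4],
)

# Step 1: one entry per phone number, indexed by (name, address, source column).
# dropna() is explicit so the result does not rely on stack() removing NaN.
stacked = df.set_index(['name', 'address']).stack().dropna()

# Step 2: number the phone numbers 1, 2, 3, ... within each (name, address).
position = stacked.groupby(level=['name', 'address']).cumcount() + 1

# Step 3: discard the old column labels and attach the new position instead.
long = stacked.reset_index(level=-1, drop=True).to_frame('phone')
long['cc'] = position.to_numpy()

# Step 4: pivot the position into columns and tidy up the labels.
wide = (long.set_index('cc', append=True)['phone']
            .unstack()
            .add_prefix('phonenum')
            .reset_index()
            .rename_axis(columns=None))

print(wide)
```

Printing stacked, position and long in turn shows the same progression as uncommenting the chain line by line.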
Answer 2:
I believe the code below does what you are trying to accomplish. The number of output columns is computed from the data, so it should also handle cases where a person ends up with more than 4 phone numbers.
# Work on strings so the phone numbers can be joined with commas
df = df.astype(str)
df['joined'] = df[['phonenum1','phonenum2','phonenum3']].agg(','.join, axis=1)
# Remove the 'nan' placeholders left by missing values
df['joined'] = df['joined'].str.replace(',nan', '')
# Concatenate the numbers of all rows sharing the same name and address
df['joined'] = df.groupby(['name','address'])['joined'].transform(lambda x: ','.join(x))
df = df.drop_duplicates(subset=['joined'])
# One output column per comma-separated number
columns = ['phonenum'+str(num+1) for num in range(df['joined'].str.count(',').max()+1)]
split = df['joined'].str.split(',', expand=True)
split.columns = columns
df = df[['name','address']]
pd.concat([df, split], axis=1)
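Note that the input columns in the code above are hard-coded to phonenum1 through phonenum3, and the values end up as strings. As a rough alternative sketch (not the method used in this answer), the phone columns can be detected by prefix and the values kept numeric. The helper name widen_phones and its parameters keys and prefix are hypothetical, introduced only for this illustration, and the names are written without the stray spaces.

```python
import pandas as pd

def widen_phones(df, keys=('name', 'address'), prefix='phonenum'):
    """Hypothetical helper: one row per key combination, phones spread over columns."""
    keys = list(keys)
    # Pick up every phone column by prefix instead of listing them by hand.
    phone_cols = [c for c in df.columns if c.startswith(prefix)]
    rows = []
    for key_values, group in df.groupby(keys, sort=False):
        # Flatten the group's phone columns row by row and drop missing values.
        phones = [p for p in group[phone_cols].to_numpy().ravel() if pd.notna(p)]
        row = dict(zip(keys, key_values))
        row.update({f'{prefix}{i}': p for i, p in enumerate(phones, start=1)})
        rows.append(row)
    # Groups with fewer numbers get NaN in the trailing columns.
    return pd.DataFrame(rows)

df = pd.DataFrame(
    {
        'name': ['john', 'john', 'daphne'],
        'address': ['johns address', 'johns address', 'daphne address'],
        'phonenum1': [7870395, 7870450, 7373209],
        'phonenum2': [None, 123450, None],
        'phonenum3': [None, 123456, None],
    },
    index=[0, 1, 4],
)

out = widen_phones(df)
print(out)
```

The explicit loop trades some speed for readability; for large frames the stack/unstack approach from Answer 1 is likely the better fit.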
Source: https://stackoverflow.com/questions/62664710/how-to-stack-this-specific-row-on-pandas