How to stack this specific row in pandas?

Question

Consider the DataFrame below:

import pandas as pd

df_dict = {'name': {0: '  john', 1: '  john', 4: ' daphne '},
           'address': {0: 'johns address', 1: 'johns address', 4: 'daphne address'},
           'phonenum1': {0: 7870395, 1: 7870450, 4: 7373209},
           'phonenum2': {0: None, 1: 123450, 4: None},
           'phonenum3': {0: None, 1: 123456, 4: None}}

df = pd.DataFrame(df_dict)

    name    address       phonenum1     phonenum2   phonenum3
0   john    johns address   7870395     NaN         NaN
1   john    johns address   7870450     123450.0    123456.0
4   daphne  daphne address  7373209     NaN         NaN

How can I unstack the phone number data so that rows with the same name and address are merged into a single row, as shown below?


    name     address       phonenum1     phonenum2   phonenum3    phonenum4
0   john    johns address   7870395      7870450     123450.0     123456.0
4   daphne  daphne address  7373209        NaN        NaN           NaN

Answer 1:

You can do this with set_index and stack, then use groupby(...).cumcount() per name and address to number the phone entries (these numbers become the new column names), then unstack, followed by reset_index and rename_axis for cosmetic cleanup.

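df_ = (df.set_index(['name', 'address'])
         .stack()            # one row per phone number
         .dropna()           # newer stack() implementations keep NaN, so drop them explicitly
         .reset_index(level=-1)
         # number the phones within each (name, address) group: 1, 2, 3, ...
         .assign(cc=lambda x: x.groupby(level=['name', 'address']).cumcount() + 1)
         .set_index('cc', append=True)
         [0].unstack()       # the counter becomes the columns
         .add_prefix('phonenum')
         .reset_index()
         .rename_axis(columns=None)
      )
print(df_)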
       name         address  phonenum1  phonenum2  phonenum3  phonenum4
0      john   johns address  7870395.0  7870450.0   123450.0   123456.0
1   daphne   daphne address  7373209.0        NaN        NaN        NaN

Because each method call sits on its own line, you can comment out everything from the second line up to the line before the closing parenthesis, then uncomment the lines one at a time to see what each step does.
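As an illustration of that step-by-step inspection, the sketch below stops the chain right after the cc counter is assigned and prints the intermediate frame. It rebuilds the df from the question so it runs on its own; the variable name step is only illustrative, and the explicit dropna() is an assumption added so the result is the same whether or not your pandas version drops NaN inside stack().

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    'name': {0: '  john', 1: '  john', 4: ' daphne '},
    'address': {0: 'johns address', 1: 'johns address', 4: 'daphne address'},
    'phonenum1': {0: 7870395, 1: 7870450, 4: 7373209},
    'phonenum2': {0: None, 1: 123450, 4: None},
    'phonenum3': {0: None, 1: 123456, 4: None},
})

# First half of the chain only: one row per phone number,
# plus a running counter within each (name, address) group
step = (df.set_index(['name', 'address'])
          .stack()
          .dropna()
          .reset_index(level=-1)
          .assign(cc=lambda x: x.groupby(level=['name', 'address']).cumcount() + 1))
print(step)
```

The cc column runs 1 to 4 for john and restarts at 1 for daphne; unstacking on it is what spreads john's four numbers across four columns.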

Answer 2:

I believe the code below does what you are trying to accomplish. It should also handle more than four phone numbers per person, since the number of phonenum columns is derived from the data.

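import pandas as pd

phone_cols = ['phonenum1', 'phonenum2', 'phonenum3']
# Join each row's non-missing phone numbers into one comma-separated string
df['joined'] = df[phone_cols].apply(
    lambda row: ','.join(str(int(v)) for v in row.dropna()), axis=1)
# Concatenate the strings of all rows that share a name and address
df['joined'] = df.groupby(['name', 'address'])['joined'].transform(lambda x: ','.join(x))
# Keep one row per (name, address) pair
df = df.drop_duplicates(subset=['name', 'address'])
# Split into as many phonenumN columns as the longest list needs
split = df['joined'].str.split(',', expand=True)
split.columns = ['phonenum' + str(num + 1) for num in range(split.shape[1])]
df = df[['name', 'address']]
result = pd.concat([df, split], axis=1)
print(result)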
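To check the claim about more than four numbers, here is a sketch that wraps a variant of the same join, group, and split steps in a hypothetical helper merge_phones and feeds it made-up sample data in which john appears three times with six numbers in total. Missing values are skipped with dropna() rather than by replacing the string 'nan', which is an assumption of this sketch.

```python
import pandas as pd


def merge_phones(df, phone_cols):
    """Hypothetical helper: one row per (name, address), with the phones
    spread over as many phonenumN columns as the largest group needs."""
    out = df.copy()
    # Non-missing numbers of each row as a comma-separated string
    out['joined'] = out[phone_cols].apply(
        lambda row: ','.join(str(int(v)) for v in row.dropna()), axis=1)
    # Concatenate the strings of rows that share a name and address
    out['joined'] = out.groupby(['name', 'address'])['joined'].transform(lambda x: ','.join(x))
    out = out.drop_duplicates(subset=['name', 'address'])
    # Shorter lists are padded with missing values by expand=True
    split = out['joined'].str.split(',', expand=True)
    split.columns = ['phonenum' + str(i + 1) for i in range(split.shape[1])]
    return pd.concat([out[['name', 'address']], split], axis=1)


# Made-up sample data: john appears three times, with six numbers in total
df = pd.DataFrame({
    'name': ['john', 'john', 'john', 'daphne'],
    'address': ['johns address'] * 3 + ['daphne address'],
    'phonenum1': [7870395, 7870450, 1111111, 7373209],
    'phonenum2': [None, 123450, 2222222, None],
    'phonenum3': [None, 123456, None, None],
})
result = merge_phones(df, ['phonenum1', 'phonenum2', 'phonenum3'])
print(result)
```

The output has phonenum1 through phonenum6, with daphne's extra columns left empty. Note that the phone values come out as strings, because they pass through str.split.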

Source: https://stackoverflow.com/questions/62664710/how-to-stack-this-specific-row-on-pandas

Tags

python

pandas

pandas-groupby