How to Create New Columns to Store the Data of the Duplicate ID Column?

Submitted by 被刻印的时光 ゝ on 2019-12-12 01:59:02

Question


I have this dataframe:

   ID  key
0   1    A
1   1    B
2   2    C
3   3    D
4   3    E
5   3    E
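
To reproduce the example, the frame above can be rebuilt as follows. This is only a sketch of one way to construct it; the variable name df matches the answers below.

```python
import pandas as pd

# Rebuild the example frame: IDs 1 and 3 are duplicated
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
})
print(df)
```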

I want to create additional key columns, as needed, to store the values from the key column when an ID is duplicated.

This is a snippet of the output:

   ID  key  key2  
0   1    A     B # Note: ID#1 appeared twice in the dataframe, so the key value "B"
                 # associated with the duplicate ID will be stored in the new column "key2"

The complete output should look like the following:

    ID  key  key2   key3
0   1    A      B    NaN
1   2    C    NaN    NaN
2   3    D      E      E # ID#3 appears three times. The key of the
                         # second occurrence, "E", is stored under the "key2" column
                         # and the third occurrence, "E", is stored in the new column "key3"

Any suggestions on how I should approach this problem?

Thanks,


Answer 1:


Check out groupby and apply. Applying a function that returns a Series per group creates a MultiIndex, and you can then unstack its extra level into columns.

df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
).unstack(-1)

outputs

   key_0 key_1 key_2
ID                  
1      A     B  None
2      C  None  None
3      D     E     E

If you want ID as a column, you can call reset_index on this DataFrame.
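
As a minimal sketch of that last step, assuming the example frame from the question, the snippet below chains reset_index onto the result and then renames the columns to the key, key2, key3 pattern the question asked for. The renaming step and the variable name wide are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
})

# One Series per ID, labelled key_0, key_1, ...; unstack turns the
# labels into columns and reset_index moves ID back into a column.
wide = (
    df.groupby("ID")["key"]
      .apply(lambda s: pd.Series(s.values,
                                 index=["key_%s" % i for i in range(s.shape[0])]))
      .unstack(-1)
      .reset_index()
)

# Optional: rename key_0, key_1, key_2 to key, key2, key3
# (illustrative naming taken from the question's desired output).
n_keys = wide.shape[1] - 1
renamed = wide.copy()
renamed.columns = ["ID"] + ["key" if i == 0 else "key%d" % (i + 1)
                            for i in range(n_keys)]
print(renamed)
```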




Answer 2:


You can use cumcount with pivot_table:

df['cols'] = 'key' + df.groupby('ID').cumcount().astype(str)
print(df.pivot_table(index='ID', columns='cols', values='key', aggfunc=''.join))

outputs

cols key0  key1  key2
ID                   
1       A     B  None
2       C  None  None
3       D     E     E
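
Because cumcount gives each row a distinct position within its ID, every (ID, cols) pair is unique, so a plain pivot without an aggregation function should also work. The following sketch assumes the question's example frame; the label scheme (key, key2, key3) and the variable names are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
})

# Position of each row within its ID group: 0, 1, 2, ...
position = df.groupby("ID").cumcount()

# Column labels matching the question's desired output
labels = position.map(lambda i: "key" if i == 0 else "key%d" % (i + 1))

wide = (
    df.assign(cols=labels)          # leaves the original df untouched
      .pivot(index="ID", columns="cols", values="key")
      .reset_index()                # ID back as a regular column
      .rename_axis(columns=None)    # drop the leftover "cols" axis name
)
print(wide)
```

Using assign instead of writing df['cols'] directly keeps the helper column out of the original frame.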


Source: https://stackoverflow.com/questions/38733732/how-to-create-new-columns-to-store-the-data-of-the-duplicate-id-column
