Pandas: break categorical column to multiple columns

北城余情 提交于 2021-02-07 02:52:33

问题


Imagine a Pandas dataframe of the following format:

id  type  v1  v2
1   A     6   9
1   B     4   2
2   A     3   7
2   B     3   6

I would like to convert this dataframe into the following format:

id  A_v1  A_v2  B_v1  B_v2
1   6     9     4     2
2   3     7     3     6

Is there an elegant way of doing this?


回答1:


You could use set_index to move the type and id columns into the index, and then unstack to move the type index level into the column index. You don't have to worry about the v values -- where the indexes go dictate the arrangement of the values.

The result is a DataFrame with a MultiIndex for the column index:

In [181]: df.set_index(['type', 'id']).unstack(['type'])
Out[181]: 
     v1    v2   
type  A  B  A  B
id              
1     6  4  9  2
2     3  3  7  6

Generally, a MultiIndex is preferable to a flattened column index. It gives you better ways to select or manipulate your data based on type or v value.

If you wish to reorder the columns to exactly match the order shown in the desired output, you could use df.reindex:

df = df.reindex(columns=sorted(df.columns, key=lambda x: x[::-1]))

yields

     v1 v2 v1 v2
type  A  A  B  B
id              
1     6  9  4  2
2     3  7  3  6

And if you wish to flatten the column index to a single level, then

df.columns = ['{}_{}'.format(t, v) for v,t in df.columns]

yields

    A_v1  A_v2  B_v1  B_v2
id                        
1      6     9     4     2
2      3     7     3     6


来源:https://stackoverflow.com/questions/30990389/pandas-break-categorical-column-to-multiple-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!