How to keep original index of a DataFrame after groupby 2 columns?

Submitted by 只谈情不闲聊 on 2021-02-18 04:54:44

Question


Is there any way to retain the original index of my large DataFrame after I perform a groupby? I need this because, after the groupby, I have to do an inner merge back to my original df to regain the columns that were lost. The index value is the only unique column I can use for that merge. Does anyone know how I can achieve this?

My DataFrame is quite large. My groupby looks like this:

df.groupby(['col1', 'col2']).agg({'col3': 'count'}).reset_index()

This drops my original indexes from my original dataframe, which I want to keep.
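The merge back to the original rows can also be done on the grouping columns themselves rather than on the index, since every row in a (col1, col2) group shares the same aggregated value. A minimal sketch, assuming a small hypothetical frame standing in for the large one (the column `other` and the name `col3_count` are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical stand-in for the large DataFrame; labels 10..14 play the
# role of the "original index" and `other` is a column the groupby drops.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b"],
        "col2": ["x", "x", "y", "y", "z"],
        "col3": [1, 2, 3, None, 5],
        "other": ["p", "q", "r", "s", "t"],
    },
    index=[10, 11, 12, 13, 14],
)

# The aggregation from the question: one row per (col1, col2) pair.
counts = (
    df.groupby(["col1", "col2"])
    .agg({"col3": "count"})
    .rename(columns={"col3": "col3_count"})
    .reset_index()
)

# Merge on the group keys. merge() on columns returns a fresh RangeIndex,
# so carry the original labels through as a column and restore them after.
merged = (
    df.reset_index()
    .merge(counts, on=["col1", "col2"], how="inner")
    .set_index("index")
    .rename_axis(None)
)

print(merged.sort_index())
```

This works even if the original index is not unique, because the join key is (col1, col2) rather than the row label.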


Answer 1:


I think you are looking for transform in this situation:

df['count'] = df.groupby(['col1', 'col2'])['col3'].transform('count')
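Unlike agg, transform returns one value per original row, aligned to the DataFrame's own index, so the result can be assigned back directly with no merge. A minimal sketch with hypothetical data (index labels 10..14 stand in for the original index):

```python
import pandas as pd

# Hypothetical data with a non-default index.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b"],
        "col2": ["x", "x", "y", "y", "z"],
        "col3": [1, 2, 3, None, 5],
    },
    index=[10, 11, 12, 13, 14],
)

# transform('count') counts non-null col3 values per (col1, col2) group and
# broadcasts that count back onto every row of the group, keeping df's index.
df["count"] = df.groupby(["col1", "col2"])["col3"].transform("count")

print(df)
```

Note that 'count' skips missing values, so the (b, y) group reports 1 even though it has two rows.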



Answer 2:


You can move your index into a column via reset_index, then aggregate that column into a tuple via agg, alongside your count aggregation.

Below is a minimal example.

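import numpy as np
import pandas as pd

# Random data with a non-unique integer index (values 0-3)
df = pd.DataFrame(np.random.randint(0, 4, (50, 5)),
                  index=np.random.randint(0, 4, 50))

# Move the original index into a column named 'index'
df = df.reset_index()

# Count column 2 per (0, 1) group and collect the original index labels as tuples
res = df.groupby([0, 1]).agg({2: 'count', 'index': lambda x: tuple(x)}).reset_index()

# Example output (no random seed is set, so your values will differ):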
#     0  1  2            index
# 0   0  0  4     (2, 0, 0, 2)
# 1   0  1  4     (0, 3, 1, 1)
# 2   0  2  1             (1,)
# 3   0  3  1             (3,)
# 4   1  0  4     (1, 2, 1, 3)
# 5   1  1  2           (1, 3)
# 6   1  2  4     (2, 1, 2, 2)
# 7   1  3  1             (2,)
# 8   2  0  5  (0, 3, 0, 2, 2)
# 9   2  1  2           (0, 2)
# 10  2  2  5  (1, 1, 3, 3, 2)
# 11  2  3  2           (0, 1)
# 12  3  0  4     (0, 3, 3, 3)
# 13  3  1  4     (1, 3, 0, 1)
# 14  3  2  3        (3, 2, 1)
# 15  3  3  4     (3, 3, 2, 1)
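To get from the tuple column back to individual rows, one option is DataFrame.explode (pandas 0.25+), which emits one row per tuple element. A minimal sketch, assuming the original index is unique (in the random example above it has duplicates, so a label could not identify a single row); the column names and `c_count` are hypothetical:

```python
import pandas as pd

# Hypothetical frame with a unique original index.
df = pd.DataFrame(
    {"a": [0, 0, 1, 1, 1], "b": [0, 0, 1, 1, 2], "c": [5, 6, 7, 8, 9]},
    index=[100, 101, 102, 103, 104],
)

# Same approach as above: lift the index into a column, then aggregate it into tuples.
res = (
    df.reset_index()
    .groupby(["a", "b"])
    .agg({"c": "count", "index": lambda x: tuple(x)})
    .reset_index()
)

# explode() gives one row per original index label; after explode the column
# has object dtype, so cast it back to int before using it as an index.
per_row = res.explode("index").astype({"index": "int64"}).set_index("index")

# Join the per-group count back onto the original rows by index label.
restored = df.join(per_row["c"].rename("c_count"))
print(restored)
```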



Answer 3:


You should not call 'reset_index()' if you want to keep your original index.
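For reference, a minimal sketch of which index the aggregated frame carries with and without reset_index(), using hypothetical data:

```python
import pandas as pd

# Hypothetical data; labels 10..12 stand in for the original index.
df = pd.DataFrame(
    {"col1": ["a", "a", "b"], "col2": ["x", "x", "y"], "col3": [1, 2, 3]},
    index=[10, 11, 12],
)

# Without reset_index(): the result is indexed by the (col1, col2) group keys.
agg = df.groupby(["col1", "col2"]).agg({"col3": "count"})

# With reset_index(): the group keys become columns and the index is 0..n-1.
flat = agg.reset_index()

print(agg)
print(flat)
```

In both cases the aggregation yields one row per group, so the labels 10..12 do not appear in either result; reattaching values to the original rows still needs transform or a merge on the group keys.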



Source: https://stackoverflow.com/questions/49216357/how-to-keep-original-index-of-a-dataframe-after-groupby-2-columns
