How to summarize on different groupby combinations?

后悔当初 2020-12-04 02:59

I am compiling a table of top-3 crops by county. Some counties have the same crop varieties in the same order. Other counties have the same crop varieties in a different ord

5 answers

温柔的废话 (OP)

2020-12-04 03:47

Here is one way to do it.

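First, normalize the crop columns: collect the unique values in each row, sort them alphabetically, and assign them back to the DataFrame. Counties that grow the same crops in a different order then end up with identical crop columns. We do this on a copy of the original data, since you might need to preserve the original.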

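import pandas as pd

# df1 is the DataFrame from the question; work on a copy to keep it intact
df = df1.copy()

to_sum = ['Crop1', 'Crop2', 'Crop3']

# Per row: collect the crops into a set, sort them alphabetically, then
# rebuild the three crop columns (index=df.index keeps the rows aligned)
df[to_sum] = pd.DataFrame(
    df.loc[:, to_sum].apply(set, axis=1).apply(sorted).values.tolist(),
    columns=to_sum,
    index=df.index,
)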

print(df)

       County    Crop1  Crop2    Crop3  Total_pop
0      Harney   apples  grain   melons       2000
1       Baker   apples  grain   melons       1500
2     Wheeler   apples  grain   melons       3000
3  Hood River   apples  grain   melons       1500
4       Wasco  carrots  pears  raddish       2000
5      Morrow  carrots  pears  raddish       2500
6       Union  carrots  pears  raddish       2700
7        Lake  carrots  pears  raddish       2000
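
One caveat with the `set` step: if a row ever listed the same crop twice, the set would collapse it, and that row would come back with fewer than three values, which no longer lines up cleanly with the three crop columns. A minimal sketch of a variant that sorts each row with NumPy instead, which keeps duplicates. The `sample` rows are made up for illustration and are not the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (not the original df1): rows 0 and 1 hold the
# same crops in a different order, row 2 repeats a crop.
sample = pd.DataFrame({
    "Crop1": ["grain", "melons", "pears"],
    "Crop2": ["apples", "grain", "pears"],
    "Crop3": ["melons", "apples", "carrots"],
})
to_sum = ["Crop1", "Crop2", "Crop3"]

# Sort the values within each row; duplicates are kept, so every row
# still has exactly three values after sorting.
normalized = sample.copy()
normalized[to_sum] = np.sort(sample[to_sum].to_numpy(), axis=1)
print(normalized)
```

Whether a repeated crop should count as its own combination is a modelling decision; this variant simply avoids changing the number of values per row.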

Now we can perform our groupby to get the desired results.

df.groupby(to_sum).Total_pop.sum()

Crop1    Crop2  Crop3  
apples   grain  melons     8000
carrots  pears  raddish    9200
Name: Total_pop, dtype: int64
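
To try the whole approach end to end, here is a self-contained sketch. The counties and populations follow the table above, but the crop order in each row is an assumption, since the question's original data is not shown here:

```python
import pandas as pd

# Counties and Total_pop follow the table above; the per-county crop order
# is assumed for illustration (the question's original data is not shown).
df1 = pd.DataFrame({
    "County": ["Harney", "Baker", "Wheeler", "Hood River",
               "Wasco", "Morrow", "Union", "Lake"],
    "Crop1": ["grain", "melons", "apples", "grain",
              "pears", "carrots", "raddish", "pears"],
    "Crop2": ["apples", "grain", "melons", "melons",
              "carrots", "raddish", "pears", "raddish"],
    "Crop3": ["melons", "apples", "grain", "apples",
              "raddish", "pears", "carrots", "carrots"],
    "Total_pop": [2000, 1500, 3000, 1500, 2000, 2500, 2700, 2000],
})

to_sum = ["Crop1", "Crop2", "Crop3"]

df = df1.copy()
# Same normalization as in the answer: unique crops per row, sorted.
df[to_sum] = pd.DataFrame(
    df[to_sum].apply(set, axis=1).apply(sorted).tolist(),
    columns=to_sum,
    index=df.index,
)

# reset_index turns the grouped Series back into a flat table.
summary = df.groupby(to_sum)["Total_pop"].sum().reset_index()
print(summary)
```

With this sample data the summary has two rows, apples/grain/melons with 8000 and carrots/pears/raddish with 9200, matching the totals shown above.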
