Solution for "SpecificationError: nested renamer is not supported" when using agg() with groupby()

有刺的猬 2020-12-15 21:23
def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
    ind = np.arange(data.shape[0])

    plt.figure(figsize=(20,5))
    p1 = plt.bar(ind, d         


        
10 answers
  •  小蘑菇 (original poster)
    2020-12-15 22:08

    Sometimes it's convenient to keep a single aggdict that records how each column should be aggregated, and to reuse it across different column sets and different group-by columns. With the newer named-aggregation syntax this is fairly easy: build the keyword arguments from the dict and unpack them with **. Here's a minimal working example on simple data.

    import numpy as np
    import pandas as pd

    dfx = pd.DataFrame(columns=["A", "B", "C"], data=np.random.randint(0, 5, size=(10, 3)))
    #dfx
    #
    #   A  B  C
    #0  4  4  1
    #1  2  4  4
    #2  1  3  3
    #3  2  4  3
    #4  1  2  1
    #5  0  4  2
    #6  2  3  4
    #7  1  0  2
    #8  2  1  4
    #9  3  0  3
    

    Suppose that when you aggregate you want the first "A", the last "B" and the mean of "C", and that your pipeline sometimes also has a "D" column (not this time) whose mean you want as well.

    # "D" is not in dfx this time; it is filtered out before aggregating
    aggdict = {"A": lambda x: x.iloc[0], "B": lambda x: x.iloc[-1], "C": "mean", "D": "mean"}
    

    You can build a simple dict like in the old days and then unpack it with **, keeping only the keys that are actual columns and are not the group-by column:

    gb_col="C"
    gbc = dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
    # Result for the dfx printed above (np.random is unseeded, so a rerun will differ):
    #   A  B
    #C
    #1  4  2
    #2  0  0
    #3  1  0
    #4  2  1
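
    For context, here is a small sketch of what usually triggers the error in the title and how named aggregation avoids it. The frame and the names g, x and total are made up for illustration, and the behaviour described is what I'd expect from recent pandas versions (1.0 and later), so treat it as an assumption if you are on something older.

```python
import pandas as pd

# Hypothetical toy data, not from the question.
df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})

# Old style: a dict used to rename the aggregated output of one column.
# Recent pandas versions reject this with
# "SpecificationError: nested renamer is not supported".
try:
    df.groupby("g")["x"].agg({"total": "sum"})
    error_name = None
except Exception as err:  # caught broadly so the sketch does not depend on the import path
    error_name = type(err).__name__
    print(error_name, err)

# Named aggregation: output_name=(input_column, function).
totals = df.groupby("g").agg(total=("x", "sum"))
print(totals)
```

    The dict comprehension in the answer builds exactly these output_name=(column, function) pairs, one per column.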
    

    And then you can slice and dice how you want with the same syntax:

    mygb = lambda gb_col: dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
    allgb = [mygb(c) for c in dfx.columns]
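
    If you'd rather have a named function than a lambda, something like the sketch below should work. agg_existing is a hypothetical helper name, and the data is just the ten rows printed earlier typed in by hand so the result is reproducible.

```python
import pandas as pd

# The ten rows shown in the answer, entered explicitly instead of drawn at random.
dfx = pd.DataFrame({
    "A": [4, 2, 1, 2, 1, 0, 2, 1, 2, 3],
    "B": [4, 4, 3, 4, 2, 4, 3, 0, 1, 0],
    "C": [1, 4, 3, 3, 1, 2, 4, 2, 4, 3],
})

# "D" is listed on purpose although dfx has no such column.
aggdict = {
    "A": lambda s: s.iloc[0],   # first value in each group
    "B": lambda s: s.iloc[-1],  # last value in each group
    "C": "mean",
    "D": "mean",
}


def agg_existing(df, by, spec):
    """Hypothetical helper: apply only the entries of `spec` whose column
    exists in `df` and is not the group-by column `by`."""
    named = {
        col: (col, func)
        for col, func in spec.items()
        if col in df.columns and col != by
    }
    return df.groupby(by).agg(**named)


by_c = agg_existing(dfx, "C", aggdict)  # columns A and B
by_a = agg_existing(dfx, "A", aggdict)  # columns B and C
print(by_c)
print(by_a)
```

    Filtering inside the helper means the same aggdict can be passed unchanged whether or not "D" shows up in a given run.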
    
