Pandas dataframe: how to apply describe() to each group and add to new columns?

执念已碎 2020-12-15 06:20

df:

name score
A      1
A      2
A      3
A      4
A      5
B      2
B      4
B      6 
B      8

Want to get the following new dataframe in

6 answers
  •  無奈伤痛
    2020-12-15 06:41

    Define some data

    In[1]:
    import pandas as pd
    import io
    
    data = """
    name score
    A      1
    A      2
    A      3
    A      4
    A      5
    B      2
    B      4
    B      6
    B      8
        """
    
    # Raw string so that \s is passed to the regex engine unchanged
    df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')
    print(df)
    


    Out[1]:
      name  score
    0    A      1
    1    A      2
    2    A      3
    3    A      4
    4    A      5
    5    B      2
    6    B      4
    7    B      6
    8    B      8
    

    Solution

    A nice approach to this problem uses a generator expression (see footnote) to allow pd.DataFrame() to iterate over the results of groupby, and construct the summary stats dataframe on the fly:

    In[2]:
    df2 = pd.DataFrame(group.describe().rename(columns={'score':name}).squeeze()
                             for name, group in df.groupby('name'))
    
    print(df2)
    


    Out[2]:
       count  mean       std  min  25%  50%  75%  max
    A      5     3  1.581139    1  2.0    3  4.0    5
    B      4     5  2.581989    2  3.5    5  6.5    8
    

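    Here squeeze() drops the length-one column axis, converting each group's one-column summary DataFrame into a Series. The rename beforehand sets that Series' name to the group label, which pd.DataFrame() then uses as the row index.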
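To see what squeeze() does on its own, here is a minimal sketch for a single group. It rebuilds the question's data inline, and the variable names (group_a, stats, as_series) are only illustrative. It assumes describe() keeps its default behaviour of summarising numeric columns only, so the text column name is left out.

```python
import pandas as pd

# Same data as in the question, built inline so the snippet is self-contained
df = pd.DataFrame({"name": list("AAAAABBBB"),
                   "score": [1, 2, 3, 4, 5, 2, 4, 6, 8]})

group_a = df[df["name"] == "A"]

# describe() summarises numeric columns only by default,
# so this is a DataFrame with the single column 'score'
stats = group_a.describe()

# Rename the column to the group label, then drop the length-one axis
as_series = stats.rename(columns={"score": "A"}).squeeze()

print(type(stats).__name__, stats.shape)
print(type(as_series).__name__, as_series.name)
```

The Series name is what ends up as the row label when several such Series are passed to pd.DataFrame().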

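    Footnote: A generator expression has the form (my_function(a) for a in iterator). If the iterator yields two-element tuples, as groupby does, it becomes (my_function(a, b) for a, b in iterator). The surrounding parentheses can be omitted when the expression is the only argument of a function call, as in In[2].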
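The title also asks for the statistics to be added as new columns. The following is a sketch of a different technique from the generator approach above, not part of the original answer. It selects the score column before calling describe() on the groupby object, then attaches the result to every row with DataFrame.join. The names stats and with_stats are illustrative, and the target layout is an assumption because the question's expected output is cut off.

```python
import pandas as pd

# Same data as in the question, built inline so the snippet is self-contained
df = pd.DataFrame({"name": list("AAAAABBBB"),
                   "score": [1, 2, 3, 4, 5, 2, 4, 6, 8]})

# One row of summary statistics per group, indexed by 'name'
stats = df.groupby("name")["score"].describe()

# Match each row's 'name' value against the index of stats,
# adding count, mean, std, ... as new columns
with_stats = df.join(stats, on="name")

print(stats)
print(with_stats.head())
```

Here stats holds the same figures as Out[2], and with_stats repeats each group's statistics on every row of that group.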
