unique combinations of values in selected columns in pandas data frame and count

无人及你 2020-11-27 10:19

I have my data in a pandas DataFrame as follows:

df1 = pd.DataFrame({'A':['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
            


        
4 Answers
  •  一向 (OP)
    2020-11-27 11:07

    You can group by columns 'A' and 'B', call size, then call reset_index and rename the generated column:

    In [26]:
    
    df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
    Out[26]:
         A    B  count
    0   no   no      1
    1   no  yes      2
    2  yes   no      4
    3  yes  yes      3
    

    Update

    A little explanation: grouping on the two columns collects the rows where the A and B values are the same. We then call size, which returns the number of rows in each group:

    In[202]:
    df1.groupby(['A','B']).size()
    
    Out[202]: 
    A    B  
    no   no     1
         yes    2
    yes  no     4
         yes    3
    dtype: int64
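
To make the distinction concrete, here is a minimal sketch contrasting the per-group row counts from size with the number of distinct groups from ngroups. Column A is copied from the question, but column B is cut off in the source, so the B values below are an assumption chosen only so that the counts match the output shown above.

```python
import pandas as pd

# Column A comes from the question. Column B is truncated in the source,
# so these B values are an assumption picked to reproduce the counts above.
df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

grouped = df1.groupby(['A', 'B'])

# size() gives one entry per (A, B) combination: the number of rows in it.
sizes = grouped.size()

# ngroups is the number of distinct (A, B) combinations.
n_groups = grouped.ngroups

print(sizes)
print(n_groups)
```

The sizes always add up to the number of rows in the frame, which is a quick sanity check on the grouping.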
    

    So now to restore the grouped columns, we call reset_index:

    In[203]:
    df1.groupby(['A','B']).size().reset_index()
    
    Out[203]: 
         A    B  0
    0   no   no  1
    1   no  yes  2
    2  yes   no  4
    3  yes  yes  3
    

    This turns the index levels back into ordinary columns, but the size result ends up in a generated column named 0, so we have to rename it:

    In[204]:
    df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
    
    Out[204]: 
         A    B  count
    0   no   no      1
    1   no  yes      2
    2  yes   no      4
    3  yes  yes      3
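
As a possible shortcut, Series.reset_index accepts a name argument, so the separate rename step can be folded into the reset. This is a sketch under the same assumption as the rest of the thread's data: column B is not visible in the source, and the values used here are made up to reproduce the counts shown.

```python
import pandas as pd

# Column A comes from the question. Column B is an assumption (the source
# is truncated), chosen so the counts match the output in the answer.
df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

# name= labels the values column directly, so no rename(columns={0: ...}) is needed.
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(counts)
```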
    

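    groupby does accept the argument as_index, which we could have set to False so that the grouped columns are not made the index. In the pandas version used here, however, size still returns a Series, so you would have to restore the index anyway (from pandas 1.1 onward, size with as_index=False returns a DataFrame with a size column instead):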

    In[205]:
    df1.groupby(['A','B'], as_index=False).size()
    
    Out[205]: 
    A    B  
    no   no     1
         yes    2
    yes  no     4
         yes    3
    dtype: int64
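
The Series output above reflects older pandas releases. From pandas 1.1 onward, size with as_index=False returns a DataFrame with a column named size. The sketch below tries to work on either behavior by checking the return type. As before, the B values are an assumption, since the source truncates the question's data.

```python
import pandas as pd

# Column A comes from the question. Column B is an assumption (the source
# is truncated), chosen so the counts match the output in the answer.
df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

result = df1.groupby(['A', 'B'], as_index=False).size()

# Older pandas returns a Series here; convert it to the same shape that
# newer versions produce (columns A, B, size).
if isinstance(result, pd.Series):
    result = result.reset_index(name='size')

counts = result.rename(columns={'size': 'count'})
print(counts)
```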
    
