Pandas: Counting unique values in a dataframe

情歌与酒 2020-12-16 03:37

We have a DataFrame that looks like this:

> df.ix[:2,:10]
    0   1   2   3   4   5   6   7   8   9   10
0   NaN NaN NaN NaN  6   5  NaN NaN  4  NaN  5
1          


        
2 Answers
  • 2020-12-16 04:31

    Not enough rep to comment, but Andy's answer:

    pd.value_counts(d.values.ravel()) 
    

    is what I have used personally, and it seems to me by far the most versatile and readable solution. Another advantage is that it is easy to apply to a subset of the columns:

    pd.value_counts(d[[1,3,4,6,7]].values.ravel()) 
    

    or

    pd.value_counts(d[["col_title1","col_title2"]].values.ravel()) 
    

    Is there any disadvantage to this approach, or any particular reason you want to use stack and groupby?
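
For reference, here is a minimal sketch of the flatten-and-count idea on a column subset. The frame `d`, its contents, and the helper `count_values` are made up for illustration (only the column names come from the snippet above). It uses `pd.Series(...).value_counts()` rather than the top-level `pd.value_counts`, which, as far as I know, recent pandas releases deprecate.

```python
import pandas as pd

# Hypothetical data; the real frame from the question is not shown in full.
d = pd.DataFrame({
    "col_title1": [1, 2, 2, 3],
    "col_title2": [1, 1, 3, 2],
    "col_title3": [2, 2, 2, 2],
})


def count_values(frame, columns=None):
    """Count every cell value in the frame, or only in the given columns."""
    subset = frame if columns is None else frame[columns]
    # Flatten the 2-D values to 1-D, then count; NaN is dropped by default.
    return pd.Series(subset.to_numpy().ravel()).value_counts()


counts = count_values(d, ["col_title1", "col_title2"])
print(counts)
```

Note that this gives one set of counts for the whole selection, not counts per row.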

  • 2020-12-16 04:36

    I think you are doing a row/column-wise operation, so you can use apply:

    In [11]: d.apply(pd.Series.value_counts, axis=1).fillna(0)
    Out[11]: 
       1  2  3
    0  1  1  1
    1  4  0  1
    2  1  1  1
    3  0  4  1
    

    Note: At the time of writing, a value_counts DataFrame method was in the works for 0.14, which would make this more efficient and more concise.
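
A self-contained sketch of the row-wise apply, assuming a small made-up frame with a few NaNs (not the data from the question). Because of the NaNs the values are floats, so the resulting column labels are 1.0, 2.0 and 3.0.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values.
d = pd.DataFrame([
    [1, 2, 3, np.nan],
    [1, 1, 1, 3],
    [2, 2, np.nan, 3],
])

# value_counts runs once per row; values absent from a row come back as NaN,
# so fill those with 0 and convert the counts back to integers.
per_row = d.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)
print(per_row)
```

Each row of `per_row` corresponds to a row of `d`, and each column to a distinct value.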

    It's worth noting that the pandas value_counts function also works on a numpy array, so you can pass it the values of the DataFrame (as a 1-d array view using np.ravel):

    In [21]: pd.value_counts(d.values.ravel())
    Out[21]: 
    2    6
    1    6
    3    4
    dtype: int64
    

    Also, you were pretty close to getting this correct, but you'd need to stack and unstack:

    In [22]: d.stack().groupby(level=0).apply(pd.Series.value_counts).unstack().fillna(0)
    Out[22]: 
       1  2  3
    0  1  1  1
    1  4  0  1
    2  1  1  1
    3  0  4  1
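
A runnable sketch of the stack-then-group route on made-up data. Note that it swaps in the groupby `value_counts` method instead of `apply(pd.Series.value_counts)`; that is my substitution, not what the line above does, but it should give the same kind of table and lets `unstack` fill the gaps directly.

```python
import pandas as pd

# Hypothetical data without missing values.
d = pd.DataFrame([
    [1, 2, 3, 1],
    [1, 1, 1, 1],
    [2, 2, 3, 3],
])

# stack() turns the frame into one long Series indexed by (row, column).
stacked = d.stack()

# Group by the row label (index level 0), count values within each row,
# then pivot the value level into columns, using 0 for absent values.
per_row = stacked.groupby(level=0).value_counts().unstack(fill_value=0)
print(per_row)
```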
    

    This error seems somewhat self-explanatory (4 != 16):

    len(d.stack())  # 16
    d.stack().groupby(np.arange(4))
    AssertionError: Grouper and axis must be same length
    

    Perhaps you wanted to pass:

    In [23]: np.repeat(np.arange(4), 4)
    Out[23]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
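
To make the length point concrete, here is a sketch that builds a grouper with one label per stacked element. The 4x4 frame is made up, and the approach assumes there are no NaNs: `stack()` may drop them, in which case the label array would no longer line up with the stacked Series.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 frame with no missing values.
d = pd.DataFrame([
    [1, 2, 3, 1],
    [1, 1, 1, 1],
    [2, 2, 3, 3],
    [3, 2, 2, 2],
])

stacked = d.stack()  # 16 elements, row by row
n_rows, n_cols = d.shape

# One row label per stacked element: 0,0,0,0,1,1,1,1,...
row_labels = np.repeat(np.arange(n_rows), n_cols)
assert len(row_labels) == len(stacked)

# The grouper now has the same length as the Series being grouped.
per_row = stacked.groupby(row_labels).value_counts().unstack(fill_value=0)
print(per_row)
```

Grouping with `level=0` avoids building the labels by hand and does not depend on every row having the same number of values.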
    