We have a DataFrame that looks like this:
> df.loc[:2, :10]
     0    1    2    3  4  5    6    7  8    9  10
0  NaN  NaN  NaN  NaN  6  5  NaN  NaN  4  NaN   5
1
Not enough rep to comment, but Andy's answer:
pd.value_counts(d.values.ravel())
is what I have used personally, and it seems to me by far the most versatile and readable solution. Another advantage is that it is easy to apply to a subset of the columns:
pd.value_counts(d[[1,3,4,6,7]].values.ravel())
or
pd.value_counts(d[["col_title1","col_title2"]].values.ravel())
Is there any disadvantage to this approach, or any particular reason you want to use stack and groupby?
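For anyone who wants to try the column-subset version, here is a minimal sketch. The frame and its values are made up for illustration (only the column titles echo the snippet above), and it wraps the flattened array in pd.Series before counting, since the top-level pd.value_counts is, as far as I know, deprecated in recent pandas releases.

```python
import pandas as pd

# Hypothetical frame; the data is invented, and the column names are
# placeholders borrowed from the snippet above.
d = pd.DataFrame({
    "col_title1": [1, 1, 2, 2],
    "col_title2": [2, 1, 3, 2],
    "col_title3": [3, 1, 1, 2],
})

# Keep two columns, flatten them to a 1-D array, then count each value.
subset = d[["col_title1", "col_title2"]]
flat = subset.to_numpy().ravel()
subset_counts = pd.Series(flat).value_counts()

print(subset_counts.sort_index())
```

The third column is ignored entirely, so only the eight values from the two selected columns are counted.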
I think you are doing a row/column-wise operation, so you can use apply:
In [11]: d.apply(pd.Series.value_counts, axis=1).fillna(0)
Out[11]:
   1  2  3
0  1  1  1
1  4  0  1
2  1  1  1
3  0  4  1
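Note: When this was written, a value_counts DataFrame method was in the works for 0.14, which was expected to make this more efficient and more concise. The DataFrame.value_counts that eventually shipped (in pandas 1.1) counts unique rows rather than individual values, so the apply approach above is still the one to use here.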
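Here is a self-contained sketch of the row-wise apply approach. The frame below is hypothetical (the original d is not shown in the thread), and the astype(int) at the end is my own addition to get whole-number counts back after fillna.

```python
import pandas as pd

# Hypothetical 4x4 frame of small integers; the original d is not shown.
d = pd.DataFrame([[1, 2, 3, 3],
                  [1, 1, 1, 1],
                  [2, 3, 1, 3],
                  [2, 2, 2, 2]])

# Count values within each row. A value that never occurs in a row comes
# back as NaN, so fill those with 0 and cast the counts back to integers.
row_counts = d.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)

print(row_counts)
```

Each row of row_counts sums to the number of non-missing entries in the corresponding row of d, which is a quick sanity check.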
It's worth noting that the pandas value_counts function also works on a NumPy array, so you can pass it the values of the DataFrame (as a 1-D array view using np.ravel):
In [21]: pd.value_counts(d.values.ravel())
Out[21]:
2 6
1 6
3 4
dtype: int64
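A runnable sketch of the flatten-then-count idea, assuming a hypothetical 4x4 frame (the original d is not shown). It uses Series.value_counts on the flattened array, which should behave the same as the top-level function for this purpose.

```python
import pandas as pd

# Hypothetical 4x4 frame; the values are made up for illustration.
d = pd.DataFrame([[1, 2, 3, 3],
                  [1, 1, 1, 1],
                  [2, 3, 1, 3],
                  [2, 2, 2, 2]])

# Flatten the 2-D values to 1-D, then count across the whole frame.
flat = d.to_numpy().ravel()
total_counts = pd.Series(flat).value_counts()

print(total_counts.sort_index())
```

This loses the per-row breakdown, so it only helps when you want totals over the whole frame.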
Also, you were pretty close to getting this correct, but you'd need to stack and unstack:
In [22]: d.stack().groupby(level=0).apply(pd.Series.value_counts).unstack().fillna(0)
Out[22]:
   1  2  3
0  1  1  1
1  4  0  1
2  1  1  1
3  0  4  1
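To make the stack / groupby / unstack chain easier to follow, here is a step-by-step sketch on a hypothetical frame (again, not the original d). The intermediate names stacked and per_row are mine.

```python
import pandas as pd

# Hypothetical 4x4 frame; the original d is not shown.
d = pd.DataFrame([[1, 2, 3, 3],
                  [1, 1, 1, 1],
                  [2, 3, 1, 3],
                  [2, 2, 2, 2]])

# stack() turns the frame into a Series indexed by (row label, column label).
stacked = d.stack()

# Group on the first index level (the original row label), count values
# per group, then pivot the value level back out into columns.
per_row = (
    stacked.groupby(level=0)
    .apply(pd.Series.value_counts)
    .unstack()
    .fillna(0)
    .astype(int)
)

print(per_row)
```

Grouping by level=0 avoids having to build a grouping key by hand, because the row labels already live in the stacked index.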
This error is fairly self-explanatory: the grouper has 4 elements but the stacked Series has 16 (4 != 16):
len(d.stack())  # 16
d.stack().groupby(np.arange(4))
AssertionError: Grouper and axis must be same length
Perhaps you wanted to pass:
In [23]: np.repeat(np.arange(4), 4)
Out[23]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
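A sketch of how that repeated key could be used, on a hypothetical 4x4 frame with no missing values. The assumption matters: stack() may drop NaN entries depending on the pandas version and options, in which case a key built from the frame's shape would no longer line up and grouping by level=0 is the safer route.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 frame with no NaN, so stack() keeps all 16 entries.
d = pd.DataFrame([[1, 2, 3, 3],
                  [1, 1, 1, 1],
                  [2, 3, 1, 3],
                  [2, 2, 2, 2]])

stacked = d.stack()
n_rows, n_cols = d.shape

# One group label per stacked entry: row 0 repeated n_cols times, then row 1, ...
key = np.repeat(np.arange(n_rows), n_cols)

# The grouper must be exactly as long as the Series being grouped.
assert len(key) == len(stacked)

group_sizes = stacked.groupby(key).size()
per_row = (
    stacked.groupby(key)
    .apply(pd.Series.value_counts)
    .unstack()
    .fillna(0)
    .astype(int)
)

print(group_sizes)
print(per_row)
```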