Find value counts within a pandas dataframe of strings

不要未来只要你来 2020-12-21 13:20

I want to get the frequency count of strings within a column. On one hand, this is similar to collapsing a dataframe to a set of rows that only reflects the strings in the

4 answers

粉色の甜心 (original poster)

2020-12-21 14:03

You can apply pd.Series.value_counts to each column and fill the missing counts with 0 (thanks to Jon for the improvement), i.e.:

ndf = df.apply(pd.Series.value_counts).fillna(0)

           2017-08-09  2017-08-10
active_1             2         3.0
active_1-3           1         0.0
active_3-7           1         1.0
pre                  1         1.0
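For a version that runs on its own, here is a minimal sketch. The question's original frame is not shown, so the `df` below is a hypothetical input whose rows were chosen only to be consistent with the counts printed above. Labels missing from a column come back as NaN before `fillna(0)`, which is why that column prints as floats; the `astype(int)` step is an optional addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical input: made-up rows that reproduce the counts shown above.
df = pd.DataFrame({
    "2017-08-09": ["pre", "active_1", "active_1", "active_1-3", "active_3-7"],
    "2017-08-10": ["pre", "active_1", "active_1", "active_1", "active_3-7"],
})

# value_counts runs once per column; the per-column results are aligned
# on the union of labels, and labels absent from a column become NaN.
ndf = df.apply(pd.Series.value_counts).fillna(0)

# Optional: every count is a whole number, so cast back to int after filling.
ndf_int = ndf.astype(int)
print(ndf_int)
```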

Timings:

import collections  # needed for @Wen's method

k = pd.concat([df]*1000)

%%timeit
# @cᴏʟᴅsᴘᴇᴇᴅ's first method
pd.get_dummies(k.T).groupby(by=lambda x: x.split('_', 1)[1], axis=1).sum().T
1 loop, best of 3: 5.68 s per loop

%%timeit
# @cᴏʟᴅsᴘᴇᴇᴅ's second method
k.stack().str.get_dummies().sum(level=1).T
10 loops, best of 3: 84.1 ms per loop

%%timeit
# My method
k.apply(pd.Series.value_counts).fillna(0)
100 loops, best of 3: 7.57 ms per loop

%%timeit
# FabienP's method
k.unstack().groupby(level=0).value_counts().unstack().T.fillna(0)
100 loops, best of 3: 7.35 ms per loop

%%timeit
# @Wen's method (fastest for now)
pd.concat([pd.Series(collections.Counter(k[x])) for x in k.columns], axis=1)
100 loops, best of 3: 4 ms per loop
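The Counter-based approach can also be sketched as a standalone script. The `df` here is again a hypothetical stand-in for the question's data. Passing `name=col` and chaining `fillna(0).astype(int)` are additions for readability, so that the result keeps the original column labels and integer counts; the timed one-liner above does neither.

```python
import collections

import pandas as pd

# Hypothetical input, not the question's actual data.
df = pd.DataFrame({
    "2017-08-09": ["pre", "active_1", "active_1", "active_1-3", "active_3-7"],
    "2017-08-10": ["pre", "active_1", "active_1", "active_1", "active_3-7"],
})

# Count each column with collections.Counter, wrap each result in a Series
# named after its column, then align the Series side by side.
counts = pd.concat(
    [pd.Series(collections.Counter(df[col]), name=col) for col in df.columns],
    axis=1,
)

# Labels that never occur in a column are NaN after alignment.
counts = counts.fillna(0).astype(int)
print(counts)
```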
