Count occurrences of items in Series in each row of a DataFrame

本秂侑毒 提交于 2020-01-09 07:44:23

问题


I have a pandas.DataFrame that looks like this.

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3

For each row in this dataframe I would like to count the occurrences of each of C1, C2, C3 and append this information as columns to this dataframe. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final data frame should look like this

COL1    COL2    COL3    C1  C2  C3
C1      None    None    1   0   0
C1      C2      None    1   1   0
C1      C1      None    2   0   0
C1      C2      C3      1   1   1

So, I have created a Series with C1, C2 and C3 as the values - one way top count this is to loop over the rows and columns of the DataFrame and then over this Series and increment the counter if it matches. But is there an apply approach that can achieve this in a compact fashion?


回答1:


You could apply value_counts:

In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]: 
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN

So you can fill the NaN and applend just the base values you want:

In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1

Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).




回答2:


Andy's answer is spot on.

I'm adding this answer, if C1,C2...Cn list is huge and we want to view only subset of them.

dff = df.copy()
dff['C1']=(df == 'C1').T.sum()
dff['C2']=(df == 'C2').T.sum()
dff['C3']=(df == 'C3').T.sum()
dff
  COL1  COL2  COL3  C1  C2  C3
0   C1  None  None   1   0   0
1   C1    C2  None   1   1   0
2   C1    C1  None   2   0   0
3   C1    C2    C3   1   1   1



回答3:


Usually apply + serise function to whole dataframe will slowing down the whole process , Additional Reading : Link

df.mask(df.eq('None')).stack().str.get_dummies().sum(level=0)
Out[165]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1

Or you can do with Counter

from  collections import Counter

pd.DataFrame([ Counter(x) for x in df.values]).drop('None',1)
Out[170]: 
   C1   C2   C3
0   1  NaN  NaN
1   1  1.0  NaN
2   2  NaN  NaN
3   1  1.0  1.0


来源:https://stackoverflow.com/questions/24516361/count-occurrences-of-items-in-series-in-each-row-of-a-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!