pandas count values in each column of a dataframe

牧云@^-^@ 提交于 2021-02-04 07:32:30

问题


i'm lookng to find a way to count the number of values in a column and its proving trickier than i originally thought.

       Percentile   Percentile1 Percentile2 Percentile3
0       mediocre    contender   contender   mediocre
69      mediocre    bad         mediocre    mediocre
117     mediocre    mediocre    mediocre    mediocre
144     mediocre    none        mediocre    contender
171     mediocre    mediocre    contender   mediocre

i'm trying to create something looking like the following output. It takes the four options and counts them per column. It is essentially a pd.value.counts for each column. Any help would definitely be appreciated.

         Percentile     Percentile1     Percentile2     Percentile3
mediocre:    5               2               3               4
contender:   0               1               2               1
bad:         0               1               0               0
none:        0               1               0               0

回答1:


It helps to make your data "tidy" (PDF) first. That means the columns should represent variables and the rows should represent observations.

In [98]: df
Out[98]: 
    Percentile Percentile1 Percentile2 Percentile3
0     mediocre   contender   contender    mediocre
69    mediocre         bad    mediocre    mediocre
117   mediocre    mediocre    mediocre    mediocre
144   mediocre        none    mediocre   contender
171   mediocre    mediocre   contender    mediocre

[5 rows x 4 columns]

In this case, melting the DataFrame makes it tidy:

In [125]: melted = pd.melt(df); melted
Out[125]: 
       variable      value
0    Percentile   mediocre
1    Percentile   mediocre
2    Percentile   mediocre
3    Percentile   mediocre
4    Percentile   mediocre
5   Percentile1  contender
6   Percentile1        bad
7   Percentile1   mediocre
8   Percentile1       none
9   Percentile1   mediocre
10  Percentile2  contender
11  Percentile2   mediocre
12  Percentile2   mediocre
13  Percentile2   mediocre
14  Percentile2  contender
15  Percentile3   mediocre
16  Percentile3   mediocre
17  Percentile3   mediocre
18  Percentile3  contender
19  Percentile3   mediocre

[20 rows x 2 columns]

and then make a frequency table using crosstab:

In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
Out[127]: 
variable   Percentile  Percentile1  Percentile2  Percentile3
value                                                       
bad                 0            1            0            0
contender           0            1            2            1
mediocre            5            2            3            4
none                0            1            0            0

[4 rows x 4 columns]


来源:https://stackoverflow.com/questions/22888434/pandas-count-values-in-each-column-of-a-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!