Question
I'm looking for a way to count the occurrences of each value in every column of a DataFrame, and it's proving trickier than I originally thought.
     Percentile  Percentile1  Percentile2  Percentile3
0      mediocre    contender    contender     mediocre
69     mediocre          bad     mediocre     mediocre
117    mediocre     mediocre     mediocre     mediocre
144    mediocre         none     mediocre    contender
171    mediocre     mediocre    contender     mediocre
I'm trying to produce something like the following output: it takes the four possible values and counts how often each one appears in each column. It is essentially pd.value_counts applied to each column. Any help would be appreciated.
            Percentile  Percentile1  Percentile2  Percentile3
mediocre:            5            2            3            4
contender:           0            1            2            1
bad:                 0            1            0            0
none:                0            1            0            0
Answer 1:
It helps to make your data "tidy" (PDF) first. That means the columns should represent variables and the rows should represent observations.
In [98]: df
Out[98]:
     Percentile  Percentile1  Percentile2  Percentile3
0      mediocre    contender    contender     mediocre
69     mediocre          bad     mediocre     mediocre
117    mediocre     mediocre     mediocre     mediocre
144    mediocre         none     mediocre    contender
171    mediocre     mediocre    contender     mediocre
[5 rows x 4 columns]
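The answer does not show how df was built. To reproduce the session, one way to construct the same frame is sketched below; the dictionary-of-lists constructor is an assumption, and the original data may well have come from a file.

```python
import pandas as pd

# Rebuild the example frame from the question, keeping its
# non-contiguous index labels (0, 69, 117, 144, 171).
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

print(df)
```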
In this case, melting the DataFrame makes it tidy:
In [125]: melted = pd.melt(df); melted
Out[125]:
       variable      value
0    Percentile   mediocre
1    Percentile   mediocre
2    Percentile   mediocre
3    Percentile   mediocre
4    Percentile   mediocre
5   Percentile1  contender
6   Percentile1        bad
7   Percentile1   mediocre
8   Percentile1       none
9   Percentile1   mediocre
10  Percentile2  contender
11  Percentile2   mediocre
12  Percentile2   mediocre
13  Percentile2   mediocre
14  Percentile2  contender
15  Percentile3   mediocre
16  Percentile3   mediocre
17  Percentile3   mediocre
18  Percentile3  contender
19  Percentile3   mediocre
[20 rows x 2 columns]
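To see what melt does on something small enough to trace by hand, here is a minimal sketch. The frame named small and its columns A and B are hypothetical and not part of the question's data.

```python
import pandas as pd

# Hypothetical two-column frame, small enough to trace by hand.
small = pd.DataFrame(
    {"A": ["x", "y"], "B": ["y", "y"]},
    index=[10, 20],
)

# With no id_vars, every column is stacked into (variable, value) pairs:
# "variable" holds the old column name, "value" holds the cell content.
# The original index (10, 20) is discarded and replaced by 0..n-1.
melted = pd.melt(small)

print(melted)
print(melted.shape)  # 2 rows x 2 columns become 4 rows x 2 columns
```

Because the old column name ends up in an ordinary column, it can then be used as a grouping key like any other variable.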
and then make a frequency table using crosstab:
In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
Out[127]:
variable   Percentile  Percentile1  Percentile2  Percentile3
value
bad                 0            1            0            0
contender           0            1            2            1
mediocre            5            2            3            4
none                0            1            0            0
[4 rows x 4 columns]
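Since the question describes the goal as value_counts per column, a more direct alternative is to apply Series.value_counts to each column without melting first. This is a different technique from the crosstab approach above, sketched here on the same data (rebuilt so the block runs on its own). The explicit row order passed to reindex is an assumption taken from the question's desired output.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# value_counts runs once per column. A value that never occurs in a
# column comes back as NaN, so fill with 0 and restore the integer dtype.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# Set the row order explicitly to match the layout asked for in the question.
counts = counts.reindex(["mediocre", "contender", "bad", "none"])

print(counts)
```

Both routes should give the same counts. crosstab fills in the zeros itself, while the apply route needs the fillna step.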
Source: https://stackoverflow.com/questions/22888434/pandas-count-values-in-each-column-of-a-dataframe