Pandas crosstab on CategoricalDType columns throws TypeError

孤者浪人 submitted on 2019-12-13 19:24:19

Question


Consider this simple data set whose columns are cut by quantiles.

import numpy as np
import pandas as pd

kyle = pd.DataFrame({'foo': np.random.randint(0, 100, 100), 'boo': np.random.randint(0, 100, 100)})
kyle.loc[:, 'fooCut'] = pd.qcut(kyle.loc[:, 'foo'], np.arange(0, 1.1, .1))
kyle.loc[:, 'booCut'] = pd.qcut(kyle.loc[:, 'boo'], np.arange(0, 1.1, .1))

Previous versions of Pandas handled the below as expected...

pd.crosstab(kyle.fooCut,kyle.booCut)

After updating to version 0.24.2, the call above raises TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

Does anyone know why this happens and how to solve it? Note that kyle.booCut.dtype returns CategoricalDtype, the same type used in the pd.crosstab documentation's example for categorical variables.

[Update]

This is a known bug in pandas and is being fixed.


Answer 1:


As uncovered by OP, this issue relates to pivoting Interval columns (crosstab is built on pivot_table under the hood) and is currently being fixed for v0.25.
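
To see where the Interval columns come from, it can help to inspect what pd.qcut returns. The following is a minimal sketch, assuming a reasonably recent pandas; the names values, binned and codes are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Deterministic sample data (an assumption, chosen so the deciles are well defined).
values = pd.Series(np.arange(100))
binned = pd.qcut(values, np.linspace(0, 1, 11))

# qcut returns a categorical Series whose categories are intervals.
is_categorical = isinstance(binned.dtype, pd.CategoricalDtype)
has_interval_categories = isinstance(binned.cat.categories, pd.IntervalIndex)

# The integer codes are plain integers, one per decile bin.
codes = binned.cat.codes

print(is_categorical, has_interval_categories)
print(codes.unique())
```

The interval-valued categories are what the pivot machinery has to handle, whereas the codes are ordinary integers.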

Here's a workaround involving crosstabulating the integer codes:

cstab = pd.crosstab(kyle.fooCut.cat.codes, kyle.booCut.cat.codes)
cstab


col_0  0  1  2  3  4  5  6  7  8  9
row_0                              
0      0  2  0  1  3  1  2  1  1  1
1      1  1  0  1  1  2  1  0  1  2
2      2  1  1  0  1  1  2  0  0  0
3      2  1  3  1  2  0  0  0  0  1
4      1  2  1  0  0  2  0  1  1  2
5      0  2  0  1  0  1  0  3  3  0
6      2  0  1  2  0  2  1  1  1  1
7      1  0  0  2  2  0  1  1  2  0
8      0  1  1  0  1  1  3  1  1  1
9      1  1  2  2  0  0  2  1  0  1

If you want to, you can always assign the index and columns of the result to the actual categories:

cstab.index = kyle.fooCut.cat.categories
cstab.columns = kyle.booCut.cat.categories
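
The two steps above can be wrapped into a small helper. This is only a sketch: the function name crosstab_by_codes and the sample data are hypothetical, and the reindex step is an addition that is not part of the original answer. It is there because a crosstab over codes only contains the codes that actually occur, so assigning the full category list directly would fail with a length mismatch if some category were never observed.

```python
import numpy as np
import pandas as pd


def crosstab_by_codes(row_cat, col_cat):
    """Cross-tabulate two categorical Series via their integer codes.

    Hypothetical helper: not part of pandas.
    """
    row_cats = row_cat.cat.categories
    col_cats = col_cat.cat.categories

    # Cross-tabulate the integer codes instead of the interval categories.
    counts = pd.crosstab(row_cat.cat.codes, col_cat.cat.codes)

    # Make sure every category has a row/column, filling unobserved ones with 0.
    # This also drops code -1, which pandas uses for missing values.
    counts = counts.reindex(
        index=range(len(row_cats)),
        columns=range(len(col_cats)),
        fill_value=0,
    )

    # Relabel with the actual categories.
    counts.index = row_cats
    counts.columns = col_cats
    return counts


# Deterministic sample data (an assumption, in place of the random data above).
rng = np.random.default_rng(0)
df = pd.DataFrame({"foo": np.arange(100), "boo": rng.permutation(100)})
df["fooCut"] = pd.qcut(df["foo"], np.linspace(0, 1, 11))
df["booCut"] = pd.qcut(df["boo"], np.linspace(0, 1, 11))

result = crosstab_by_codes(df["fooCut"], df["booCut"])
print(result)
```

Note that rows whose value is missing (code -1) are silently excluded from the table by the reindex, which may or may not be what you want.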


Source: https://stackoverflow.com/questions/56571306/pandas-crosstab-on-categoricaldtype-columns-throws-typeerror
