How to generate pandas DataFrame column of Categorical from string column?

笑着哭i, submitted on 2019-11-30 19:22:09

The only workaround I found for pandas pre-0.15 is as follows:

  • The column must be converted to a Categorical for the classifier, but numpy immediately coerces the levels back to int, losing the factor information.
  • So store the factor in a global variable outside the DataFrame.


train_LocationNFactor = pd.Categorical.from_array(train['LocationNormalized'])  # default level order: alphabetical

train['LocationNFactor'] = train_LocationNFactor.labels  # insert the integer labels into the DataFrame

[UPDATE: pandas 0.15+ added decent support for Categorical]
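
On pandas 0.15 or later the global-variable workaround should not be needed, since the categorical information can live in the DataFrame column itself. A minimal sketch, assuming a small made-up `train` DataFrame (the sample data is hypothetical and not from the original question):

```python
import pandas as pd

# Hypothetical stand-in for the `train` DataFrame used above.
train = pd.DataFrame({"LocationNormalized": ["London", "Leeds", "London", "Bath"]})

# Convert the string column to the categorical dtype.
# By default the categories are the sorted unique values.
train["LocationNFactor"] = train["LocationNormalized"].astype("category")

# Integer codes, one per row (what `.labels` held before pandas 0.15).
codes = train["LocationNFactor"].cat.codes

# The code -> label mapping (what `.levels` held before pandas 0.15).
categories = train["LocationNFactor"].cat.categories

print(codes.tolist())    # [2, 1, 2, 0]
print(list(categories))  # ['Bath', 'Leeds', 'London']
```

Here the column keeps both the integer codes and the category labels, so nothing has to be stored outside the DataFrame.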

The mapping between labels (integer codes) and levels (category values) is stored in the index object.

  • To convert an integer array to string array: index[integer_array]
  • To convert a string array to integer array: index.get_indexer(string_array)

Here is an example:

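In [56]:
c = pd.Categorical.from_array(['a', 'b', 'c', 'd', 'e'])
idx = c.levels  # index of category values (renamed to `categories` in pandas 0.15)

In [57]:
idx[[1,2,1,2,3]]  # integer array -> string array

Out[57]:
Index([b, c, b, c, d], dtype=object)

In [58]:
idx.get_indexer(["a","c","d","e","a"])  # string array -> integer array

Out[58]:
array([0, 2, 3, 4, 0])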
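
The session above uses the pre-0.15 names. The same round trip on a current pandas version might look like the sketch below, which assumes the plain `pd.Categorical(...)` constructor in place of `from_array` and `.categories` in place of `.levels`:

```python
import pandas as pd

c = pd.Categorical(["a", "b", "c", "d", "e"])
idx = c.categories  # Index of category values, in sorted order

# Integer array -> string array: index into the categories.
labels = idx[[1, 2, 1, 2, 3]]

# String array -> integer array: look up each label's position.
codes = idx.get_indexer(["a", "c", "d", "e", "a"])

# Labels that are not among the categories map to -1.
unknown = idx.get_indexer(["z"])

print(list(labels))      # ['b', 'c', 'b', 'c', 'd']
print(codes.tolist())    # [0, 2, 3, 4, 0]
print(unknown.tolist())  # [-1]
```

Because `get_indexer` returns -1 for labels it does not know, it is worth checking for -1 before using the result as codes.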