Python: how to convert a string array to a factor list

纵饮孤独 submitted on 2021-02-19 04:54:10

Question


Python 2.7, numpy, create levels in the form of a list of factors.

I have a data file that lists independent variables; the last column indicates the class. For example:

2.34,4.23,0.001, ... ,56.44,2.0,"cloudy with a chance of rain"

Using numpy, I read all the numeric columns into a matrix, and the last column into an array which I call "classes". In fact, I don't know the class names in advance, so I do not want to use a dictionary. I also do not want to use Pandas. Here is an example of the problem:

classes = ['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd']
type(classes)
<type 'list'>
classes = numpy.array(classes)
type(classes)
<type 'numpy.ndarray'>
classes
array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'],
      dtype='|S1')
# requirements call for a list like this:
# [0, 1, 2, 2, 1, 0, 0, 3]

Note that the target class may be very sparse, for example, a 'z', in perhaps 1 out of 100,000 cases. Also note that the classes may be arbitrary strings of text, for example, scientific names.

I'm using Python 2.7 with numpy, and I'm stuck with my environment. Also, the data has been preprocessed, so it's scaled and all values are valid. I do not want to preprocess the data a second time to extract the unique classes and build a dictionary before I process the data. What I'm really looking for is the Python equivalent of the stringsAsFactors parameter in R, which automatically converts a string vector to a factor vector when the script reads the data.

Don't ask me why I'm using Python instead of R - I do what I'm told.

Thanks, CC.


Answer 1:


You could use np.unique with return_inverse=True to return both the unique class names and a set of corresponding integer indices:

import numpy as np

classes = np.array(['a', 'b', 'c', 'c', 'b', 'a', 'a', 'd'])

classnames, indices = np.unique(classes, return_inverse=True)

print(classnames)
# ['a' 'b' 'c' 'd']

print(indices)
# [0 1 2 2 1 0 0 3]

print(classnames[indices])
# ['a' 'b' 'c' 'c' 'b' 'a' 'a' 'd']

The class names will be sorted in lexical order.
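
If you would rather have the codes numbered by order of first appearance instead of lexical order, one possible approach is to also ask np.unique for return_index=True and remap the codes. This is only a sketch and not part of the original answer; the variable names (first_idx, order, rank, levels, codes) are made up for illustration, and the input is reordered so that the two orderings actually differ. The final .tolist() call gives a plain Python list, as the question asks for.

```python
import numpy as np

classes = np.array(['b', 'a', 'c', 'c', 'a', 'b', 'b', 'd'])

# names: sorted unique class names
# first_idx: position of each name's first occurrence in `classes`
# inverse: code of every element relative to the sorted names
names, first_idx, inverse = np.unique(
    classes, return_index=True, return_inverse=True)

# order[k] is the sorted-name position of the k-th class to appear
order = np.argsort(first_idx)

# rank maps a sorted-order code to a first-appearance code
rank = np.empty_like(order)
rank[order] = np.arange(len(order))

levels = names[order]            # class names in first-appearance order
codes = rank[inverse].tolist()   # plain Python list of integer codes

print(levels)
print(codes)
```

As before, levels[codes] reproduces the original array, so the mapping can be inverted without building a dictionary by hand.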



Source: https://stackoverflow.com/questions/34682420/python-how-to-convert-a-string-array-to-a-factor-list
