numpy genfromtxt issues in Python3

倖福魔咒の posted on 2019-11-29 07:58:32

The solution to my problem was to use a unicode string dtype (U2, for example).

Thanks to the answer of E.Kehler, I found the solution. If I use str in place of S8 in the dtype definition, then the output for the 2nd column is empty:

numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,str')

the output is:

array([(1.0, ''), (2.0, ''), (3.0, '')], dtype=[('f0', '<f8'), ('f1', '<U0')])

This suggested to me that the correct dtype for my problem is a unicode string with an explicit length:

numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,U2')

that gives the expected output:

array([(1.0, 'a'), (2.0, 'b'), (3.0, 'c')], dtype=[('f0', '<f8'), ('f1', '<U2')])
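
For anyone who wants to reproduce this without a file on disk, here is a minimal sketch. It assumes test.csv holds the three rows 1,a / 2,b / 3,c (inferred from the output above, since the file itself isn't shown) and uses io.StringIO as a stand-in for the file. The load helper is just a name made up for this example.

```python
import io
import numpy as np

# Assumed contents of test.csv, inferred from the output shown above.
CSV_TEXT = "1,a\n2,b\n3,c\n"

def load(dtype):
    # genfromtxt accepts a file-like object, so StringIO stands in for the file.
    return np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", dtype=dtype)

as_bytes = load("f8,S8")    # second column holds byte strings
as_unicode = load("f8,U2")  # second column holds ordinary str values

print(as_bytes)
print(as_unicode)
```

Printing both arrays side by side makes the difference visible: the S8 version shows the b prefix on every value in the second column, the U2 version does not.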

Useful information can also be found on the NumPy data types documentation page.

In Python 3, writing

dtype="S8"

(or any variation of "S#") in NumPy's genfromtxt yields a byte string. To avoid this and get an ordinary (unicode) string, write

dtype=str

instead.
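
To see what a byte string looks like in practice, here is a small sketch. The sample values are made up, and np.char.decode is only one way to convert a column that was already loaded as bytes; it is not something the answer above relies on.

```python
import numpy as np

# Hypothetical values, as they would come back from a column read with dtype="S8".
raw = np.array([b"a", b"b", b"c"], dtype="S8")

# Each element is a bytes object, which is why it prints with a b prefix.
print(raw)

# Decode the whole array into ordinary str values after the fact.
decoded = np.char.decode(raw, "utf-8")
print(decoded)
```

Decoding afterwards is mostly useful when the loading code can't be changed; if it can, picking a unicode dtype up front is simpler.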

training = np.genfromtxt('twitter_train.csv', delimiter=',', usecols=(0,1), dtype='U')

In my case, the first column contains a sentiment value of either 0 or 1, and the second column is a long string containing a tweet. Using dtype='U' stopped the b' prefix from being included in the values.

So in your case it would be: data=numpy.genfromtxt("test.csv", delimiter=",", dtype='U')
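
A rough sketch of that approach with made-up data, in case it helps. The two sample rows are invented stand-ins for twitter_train.csv, io.StringIO replaces the real file, and the labels / tweets names are just for illustration.

```python
import io
import numpy as np

# Made-up rows standing in for twitter_train.csv: sentiment (0 or 1), then the tweet.
# The sample text avoids ',' and '#', which genfromtxt would treat as a delimiter
# and as the start of a comment with these settings.
SAMPLE = "1,what a great day\n0,missed my train again\n"

training = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", usecols=(0, 1), dtype="U")

# With a single 'U' dtype every column comes back as str, including the labels,
# so the sentiment column has to be converted separately.
labels = training[:, 0].astype(int)
tweets = training[:, 1]

print(labels)
print(tweets)
```

Note that real tweets often contain commas or hashtags, so a plain delimiter="," read like this may need extra care on actual data.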
