I am trying to understand how the numpy.genfromtxt method and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/num
In [200]: np.__version__
Out[200]: '1.14.0'
The example works for me:
In [201]: s = io.StringIO("1,1.3,abcde")
In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[202]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
It also works for a byte string:
In [204]: s = io.BytesIO(b"1,1.3,abcde")
In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[205]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
genfromtxt works with anything that feeds it lines, so when testing answers I usually pass a list of bytestrings directly:
In [206]: s = [b"1,1.3,abcde"]
In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[207]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
Or with several lines:
In [208]: s = b"""1,1.3,abcde
...: 4,1.3,two""".splitlines()
In [209]: s
Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[210]:
array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
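The same idea as a standalone script, as a minimal sketch: it feeds identical data to genfromtxt through a BytesIO, a list of bytestrings, and a generator, then checks that all three parse the same way. The `byte_lines` helper and the `sources`/`results` names are mine, made up for illustration.

```python
import io
import numpy as np

dtype = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
raw = b"1,1.3,abcde\n4,1.3,two"

def byte_lines(data):
    # Hypothetical helper: a generator yielding one bytestring line at a time.
    for line in data.splitlines():
        yield line

# Three different objects that all "feed lines" to genfromtxt.
sources = {
    "BytesIO": io.BytesIO(raw),
    "list": raw.splitlines(),
    "generator": byte_lines(raw),
}

results = {
    name: np.genfromtxt(src, dtype=dtype, delimiter=",")
    for name, src in sources.items()
}

for name, arr in results.items():
    print(name, arr.tolist())
```

I stick to bytes here because, as far as I know, that input works on both old and new numpy versions.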
It used to be that with dtype=None, genfromtxt always created S (bytestring) fields. See the related question:
NumPy dtype issues in genfromtxt(), reads string in as bytestring
With 1.14, the new encoding argument lets us control the default string dtype:
In [219]: s = io.StringIO("1,1.3,abcde")
In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
#!/usr/bin/python3
Out[220]:
array((1, 1.3, b'abcde'),
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', 'S5')])
In [221]: s = io.StringIO("1,1.3,abcde")
In [222]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[222]:
array((1, 1.3, 'abcde'),
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions
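The encoding argument also applies to byte input. A small sketch, assuming numpy 1.14 or later: the bytes are decoded with the encoding you name, so with dtype=None the string column should come back as a U field rather than an S field. The data and variable names are only for illustration.

```python
import io
import numpy as np

# Byte input, as you would get from a file opened in binary mode.
raw = b"1,1.3,abcde\n4,1.3,two"

# Naming the encoding tells genfromtxt how to decode the bytes;
# dtype=None asks it to infer a type for each column.
arr = np.genfromtxt(io.BytesIO(raw), dtype=None, delimiter=",",
                    encoding="utf-8")

print(arr.dtype)
print(arr["f2"].tolist())
```

The integer width in the printed dtype depends on the platform, so I would not rely on it being i4 or i8.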
Now I can generate examples with Py3 strings without producing all those ugly b'string' results (though I have to remember that not everyone has upgraded to 1.14):
In [223]: s = """1,1.3,abcde
...: 4,1.3,two""".splitlines()
In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[224]:
array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
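For anyone who has not upgraded, one possible workaround is to read the strings as S fields and decode them afterwards. This is only a sketch: `decode_bytes_fields` is a helper name I made up, it assumes a flat structured array (no nested fields), and it uses np.char.decode for the conversion.

```python
import numpy as np

def decode_bytes_fields(arr, encoding="ascii"):
    """Return a copy of a structured array with S fields converted to U."""
    new_descr = []
    for name in arr.dtype.names:
        field = arr.dtype[name]
        if field.kind == "S":
            # An S<n> field holds n bytes, so U<n> is always wide enough.
            new_descr.append((name, "U%d" % field.itemsize))
        else:
            new_descr.append((name, field))
    out = np.empty(arr.shape, dtype=new_descr)
    for name in arr.dtype.names:
        if arr.dtype[name].kind == "S":
            out[name] = np.char.decode(arr[name], encoding)
        else:
            out[name] = arr[name]
    return out

lines = [b"1,1.3,abcde", b"4,1.3,two"]
raw = np.genfromtxt(lines, delimiter=",",
                    dtype=[('myint', 'i8'), ('myfloat', 'f8'),
                           ('mystring', 'S5')])
decoded = decode_bytes_fields(raw)
print(decoded)
```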
Consider upgrading NumPy: with the current version, your code works as written. See the mention in the 1.14.0 release note highlights and the section "Encoding argument for text IO functions" for the relevant changes in np.genfromtxt.
With older NumPy, you are passing a str object as the input, but the docs you linked say:
Note that generators must return byte strings in Python 3k.
So do what the docs say and give it a byte string:
import io
s = io.BytesIO(b"1,1.3,abcde")
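If the data starts out as a str rather than a bytes literal, you can encode it first. A minimal sketch, assuming the text is plain ASCII; the `text` and `arr` names are just for illustration, and the dtype is the one from the documentation example:

```python
import io
import numpy as np

text = "1,1.3,abcde"  # a str, as in the question

# Encode to bytes before wrapping it in a file-like object.
s = io.BytesIO(text.encode("ascii"))

arr = np.genfromtxt(s, dtype=[('myint', 'i8'), ('myfloat', 'f8'),
                              ('mystring', 'S5')], delimiter=",")
print(arr)
```

With a single input line, genfromtxt returns a 0-d structured array, so you index it by field name rather than by row.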