numpy.genfromtxt produces array of what looks like tuples, not a 2D array—why?

Submitted by 家住魔仙堡 on 2019-11-26 14:19:05

Question


I'm running genfromtxt like below:

date_conv = lambda x: str(x).replace(":", "/")
time_conv = lambda x: str(x)

a = np.genfromtxt("input.txt", delimiter=',', skip_header=4,
                  usecols=[0, 1] + radii_indices, converters={0: date_conv, 1: time_conv})

Where input.txt is from this gist.

When I look at the results, it is a 1D array not a 2D array:

>>> np.shape(a)
(918,)

It seems to be an array of tuples instead:

>>> a[0]
('06/03/2006', '08:27:23', 6.4e-05, 0.000336, 0.001168, 0.002716, 0.004274, 0.004658, 0.003756, 0.002697, 0.002257, 0.002566, 0.003522, 0.004471, 0.00492, 0.005602, 0.006956, 0.008442, 0.008784, 0.006976, 0.003917, 0.001494, 0.000379, 6.4e-05)

If I remove the converters specification from the genfromtxt call it works fine and produces a 2D array:

>>> np.shape(a)
(918, 24)

Answer 1:


What is returned is called a structured ndarray; see e.g. http://docs.scipy.org/doc/numpy/user/basics.rec.html. You get one because your data are not homogeneous, i.e. not all elements have the same type: the data contain both strings (the first two columns) and floats. NumPy arrays have to be homogeneous (see here for an explanation).

Structured arrays 'solve' this homogeneity constraint by storing each record (row) as a tuple-like element with one field per column. That is why the returned array is 1D: it is a single series of records, but each record holds several values, so you can still regard it as rows and columns. The individual columns are accessible by field name as a['nameofcolumn'], in your case e.g. a['Julian_Day'].
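To make the "1D array of records" idea concrete, here is a minimal sketch that builds a small structured array by hand. The two rows and the field names (`date`, `time`, `r1`, `r2`) are made up for illustration and are not taken from the gist.

```python
import numpy as np

# Hypothetical two-row dataset mixing strings and floats (not the gist's data).
records = [("06/03/2006", "08:27:23", 6.4e-05, 0.000336),
           ("06/03/2006", "08:42:10", 7.1e-05, 0.000402)]

# One type per field; the field names are assumptions chosen for this example.
fields = [("date", "U10"), ("time", "U8"), ("r1", "f8"), ("r2", "f8")]
a = np.array(records, dtype=fields)

print(a.shape)        # 1D: one record per row
print(a.dtype.names)  # the field (column) names
print(a["r1"])        # a whole column, returned as an ordinary float array
print(a[0]["time"])   # one field of the first record
```

Indexing with an integer selects a record (row), while indexing with a field name selects a column across all records.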

The reason that it returns a 2D array when you remove the converters for the first two columns is that, in that case, genfromtxt treats all data as the same type and returns a normal ndarray (the default type is float, but you can change this with the dtype argument).
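The homogeneous case can be checked on a tiny in-memory file. This is only a sketch: the three-column numeric text below is invented, and `io.StringIO` stands in for a file on disk.

```python
import io
import numpy as np

# Hypothetical all-numeric input (stand-in for a real file).
text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"

# No converters and the default dtype (float): every column has the same type,
# so the result is an ordinary 2D array.
plain = np.genfromtxt(io.StringIO(text), delimiter=",")
print(plain.shape, plain.dtype)

# A single explicit dtype is still homogeneous, so the result stays 2D.
small = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=np.float32)
print(small.shape, small.dtype)
```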

EDIT: If you want to make use of the column names, you can use the names argument (and set skip_header to only 3, so that the header row is read as the column names):

a2 = np.genfromtxt("input.txt", delimiter=',', skip_header=3, names=True, dtype=None,
                   usecols=[0, 1] + radii_indices, converters={0: date_conv, 1: time_conv})

Then you can do, e.g.:

>>> a2['Dateddmmyyyy']
array(['06/03/2006', '06/03/2006', '18/03/2006', '19/03/2006',
       '19/03/2006', '19/03/2006', '19/03/2006', '19/03/2006',
       '19/03/2006', '19/03/2006'], 
      dtype='|S10')
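If a plain 2D float array of just the numeric columns is what you are after, one possible approach is to read by name and then stack the numeric fields. The sketch below uses a made-up inline file with hypothetical column names (`date`, `time`, `r1`, `r2`) instead of the gist, and passes `encoding="utf-8"` so that string columns come back as str rather than bytes.

```python
import io
import numpy as np

# Hypothetical input with a header row (not the gist's data).
text = ("date,time,r1,r2\n"
        "06/03/2006,08:27:23,6.4e-05,0.000336\n"
        "06/03/2006,08:42:10,7.1e-05,0.000402\n")

# names=True takes field names from the header; dtype=None infers a type per column.
a2 = np.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                   dtype=None, encoding="utf-8")
print(a2.dtype.names)
print(a2["date"])

# Stack the numeric fields side by side to get an ordinary 2D float array.
numeric = np.column_stack([a2[name] for name in ("r1", "r2")])
print(numeric.shape)
```

The string columns stay available through `a2["date"]` and `a2["time"]`, while `numeric` behaves like any other 2D ndarray.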


Source: https://stackoverflow.com/questions/9534408/numpy-genfromtxt-produces-array-of-what-looks-like-tuples-not-a-2d-array-why
