Slicing columns in a tuple stored in a NumPy array

Submitted by 左心房为你撑大大i on 2019-12-07 12:35:41

Question


I have imported a text file into a numpy array as shown below.

data=np.genfromtxt(f,dtype=None,delimiter=',',names=None)

where f contains the path to my CSV file.

Now data contains the following:

array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
   (650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
   (675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
   (715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
   (2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
   (3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
   (4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
   (4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
    dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

If I now try to slice by column, i.e. extract the first or the second column, using

data[:,0]

it says "too many indices". I figured out that this is due to the way the data is stored: each row is stored as a tuple, not as a list or array. I thought of the "ugliest" possible way to make slicing work without iterating every time: convert the tuple in each row to a list and put everything back into a NumPy array, something like this:

data=np.asarray([list(i) for i in data])

But with this approach I lose the datatype of each column: every element is stored as a string rather than as the integer or float that was detected automatically in the original array.

Is there any way to slice the columns without iterating?


Answer 1:


What np.genfromtxt has created for you is not an array of tuples, which would have had object dtype, but a structured array (often loosely called a record array). You can tell from the weird dtype:

dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')]

Each of the tuples in that list holds the name of the corresponding field and its dtype: <i4 is a little-endian 4-byte integer, <f8 a little-endian 8-byte float, and S19 a 19-character string. You can access the fields by name (x here is the array that the question calls data):

In [2]: x['f0']
Out[2]: array([ 534,  650,  675,  715, 2884, 3799, 4549, 4819])

In [3]: x['f1']
Out[3]: 
array([ 116.48482,  116.4978 ,  116.31873,  116.70027,  116.67727,
        116.29838,  116.48405,  116.42967])
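
As a minimal sketch of the field access described above, the snippet below rebuilds the first three records from the question by hand (so it does not depend on the CSV file) and pulls out columns by name. The variable names ids, lons, coords and first_row are only illustrative, and the timestamps are written as bytes literals to match the S19 field.

```python
import numpy as np

# First three records from the question, rebuilt by hand so the
# example is self-contained (no CSV file needed).
data = np.array(
    [(534, 116.48482, 39.89821, b'2008-02-03 00:00:49'),
     (650, 116.4978, 39.98097, b'2008-02-03 00:00:02'),
     (675, 116.31873, 39.9374, b'2008-02-03 00:00:04')],
    dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

# The array is one-dimensional, which is why data[:, 0] reports
# "too many indices".
print(data.shape)
print(data.dtype.names)

ids = data['f0']             # first "column", keeps its integer dtype
lons = data['f1']            # second "column", keeps its float dtype
coords = data[['f1', 'f2']]  # several fields at once, still structured
first_row = data[0]          # a single record
```

Each field keeps the dtype that genfromtxt detected, so no conversion to lists is needed.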



Answer 2:


Perhaps for your case you could just use zip.

import numpy as np

x = np.array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
              (650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
              (675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
              (715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
              (2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
              (3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
              (4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
              (4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
              dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

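b = list(zip(*x))  # list() is needed on Python 3, where zip returns an iterator

Result (output from the original Python 2 session; the exact formatting of the values depends on the Python and NumPy versions):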

>>> b[0]
(534, 650, 675, 715, 2884, 3799, 4549, 4819)
>>> b[1]
(116.48482, 116.4978, 116.31873, 116.70027, 116.67726999999999, 116.29837999999999, 116.48405, 116.42967)
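
If the goal is data[:, 0]-style slicing rather than tuples, one possible alternative (not part of the original answers) is to stack the fields that share a dtype into an ordinary 2-D array. The sketch below does this for the two float fields only; the names coords, lon and lat are illustrative.

```python
import numpy as np

# First three records from the question, rebuilt by hand.
data = np.array(
    [(534, 116.48482, 39.89821, b'2008-02-03 00:00:49'),
     (650, 116.4978, 39.98097, b'2008-02-03 00:00:02'),
     (675, 116.31873, 39.9374, b'2008-02-03 00:00:04')],
    dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

# Stack only the two float fields. Adding 'f0' or 'f3' would force
# NumPy to pick one common dtype for the whole 2-D array.
coords = np.column_stack([data['f1'], data['f2']])

print(coords.shape)  # two-dimensional, so column slicing works
print(coords.dtype)

lon = coords[:, 0]
lat = coords[:, 1]
```

This only makes sense for columns of the same kind; for mixed columns, accessing fields by name keeps each dtype intact.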


Source: https://stackoverflow.com/questions/16134724/to-slice-columns-in-a-tuple-present-in-a-numpy-array
