to slice columns in a tuple present in a numpy array

问题

I have imported a text file into a numpy array as shown below.

data=np.genfromtxt(f,dtype=None,delimiter=',',names=None)

where f contains the path of my csv file

now data contains the following.

array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
   (650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
   (675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
   (715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
   (2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
   (3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
   (4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
   (4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
    dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

If i now try to column slice, ie extract the first or the second column using

data[:,0]

It says "too many indices". I figured out that it is due the the way it is being stored. all the rows are being stored as tuples and not as list/array. I thought of using the "ugliest" way possible to perform slicing without having to use iteration. That would be to convert the tuples in each row to list and put it back to the numpy array. something like this

data=np.asarray([list(i) for i in data])

But for the above problem, i am losing the datatypes of each column. Each element will be stored as a string rather than integer or float which was automatically detected in the former case.

Now if i want to slice the columns without having to use iteration is there any way?

回答1:

What np.genfromtext has created for you is not an array of tuples, which would have had object dtype, but a record array. You can tell from the weird dtype:

dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')]

Each of the tuples in that list holds the name of the corresponding field, and its dtype, <i4 is a little endian 4 byte integer, <f8 a little endian 8 byte float and S19 a 19 character long string. You can access the fields by name as:

In [2]: x['f0']
Out[2]: array([ 534,  650,  675,  715, 2884, 3799, 4549, 4819])

In [3]: x['f1']
Out[3]: 
array([ 116.48482,  116.4978 ,  116.31873,  116.70027,  116.67727,
        116.29838,  116.48405,  116.42967])

回答2:

Perhaps for your case you could just use zip.

import numpy as np

x = np.array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
              (650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
              (675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
              (715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
              (2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
              (3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
              (4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
              (4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
              dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

b = zip(*x)

Result:

>>> b[0]
(534, 650, 675, 715, 2884, 3799, 4549, 4819)
>>> b[1]
(116.48482, 116.4978, 116.31873, 116.70027, 116.67726999999999, 116.29837999999999, 116.48405, 116.42967)

来源：https://stackoverflow.com/questions/16134724/to-slice-columns-in-a-tuple-present-in-a-numpy-array

标签

python

list

numpy

tuples