I have imported a text file into a numpy array as shown below.
data=np.genfromtxt(f,dtype=None,delimiter=',',names=None)
where f contains the path to my CSV file.
data now contains the following:
array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
(650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
(675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
(715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
(2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
(3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
(4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
(4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])
If I now try to slice a column, i.e. extract the first or the second column, using
data[:,0]
it says "too many indices". I figured out that this is due to the way the data is stored: each row is stored as a tuple rather than as a list or array. I thought of the "ugliest" possible workaround to slice without explicit iteration: convert the tuple in each row to a list and put the result back into a numpy array, something like this:
data=np.asarray([list(i) for i in data])
But with this approach I lose the datatype of each column: every element is stored as a string rather than the integer or float that was detected automatically in the former case.
Is there any way to slice the columns without having to use iteration?
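What np.genfromtxt has created for you is not an array of tuples, which would have had object dtype, but a structured array (often called a record array). It is one-dimensional, with one record per row, which is why data[:,0] fails with "too many indices". You can tell from the unusual dtype: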
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')]
Each of the tuples in that list holds the name of the corresponding field and its dtype: <i4 is a little-endian 4-byte integer, <f8 a little-endian 8-byte float, and S19 a 19-character string. You can access the fields by name (x here is the array returned by genfromtxt):
In [2]: x['f0']
Out[2]: array([ 534, 650, 675, 715, 2884, 3799, 4549, 4819])
In [3]: x['f1']
Out[3]:
array([ 116.48482, 116.4978 , 116.31873, 116.70027, 116.67727,
116.29838, 116.48405, 116.42967])
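To make the point above concrete, here is a minimal sketch, assuming a two-row stand-in for the array returned by genfromtxt. It shows why a second index fails and how the numeric fields could be combined into an ordinary 2-D array; the names first_col and coords are illustrative only.

```python
import numpy as np

# Two-row stand-in for the structured array returned by genfromtxt.
x = np.array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
              (650, 116.4978, 39.98097, '2008-02-03 00:00:02')],
             dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

print(x.shape)        # (2,): one dimension only, so x[:, 0] has one index too many
print(x.dtype.names)  # ('f0', 'f1', 'f2', 'f3')

# Selecting a field keeps that field's dtype.
first_col = x['f0']

# Fields of the same type can be stacked into a plain 2-D array,
# which then supports the usual column slicing.
coords = np.column_stack([x['f1'], x['f2']])
print(coords.shape)   # (2, 2)
print(coords[:, 0])
```

Only the two float fields are stacked here; mixing the integer or string fields into the same plain array would force a common dtype, which is the loss of type information described in the question.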
Perhaps for your case you could just use zip.
import numpy as np
x = np.array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
(650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
(675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
(715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
(2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
(3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
(4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
(4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])
b = list(zip(*x))  # list() is needed on Python 3, where zip returns an iterator
Result:
>>> b[0]
(534, 650, 675, 715, 2884, 3799, 4549, 4819)
>>> b[1]
(116.48482, 116.4978, 116.31873, 116.70027, 116.67726999999999, 116.29837999999999, 116.48405, 116.42967)
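One practical difference between the two approaches: zip copies the values out into plain tuples, whereas selecting a field by name returns a view onto the original array. A minimal sketch, assuming Python 3 and a two-row, two-field stand-in array (col is an illustrative name):

```python
import numpy as np

# Reduced stand-in with just two fields.
x = np.array([(534, 116.48482), (650, 116.4978)],
             dtype=[('f0', '<i4'), ('f1', '<f8')])

b = list(zip(*x))   # tuples of values, detached from x
col = x['f0']       # view onto x, no data copied

col[0] = 1          # writes through to the original array
print(x[0]['f0'])   # 1
print(b[0][0])      # 534: the zipped tuple is unaffected
```

So zip is fine for reading columns out once, while field access is the better fit when the columns should stay numpy arrays or be modified in place.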
Source: https://stackoverflow.com/questions/16134724/to-slice-columns-in-a-tuple-present-in-a-numpy-array