Fastest way to read every n-th row with numpy's genfromtxt

泪湿孤枕 submitted on 2019-12-07 17:11:01

Question


I read my data with numpy's genfromtxt:

import numpy as np
measurement = np.genfromtxt('measurementProfile2.txt', delimiter=None, dtype=None, skip_header=4, skip_footer=2, usecols=(3,0,2))
rows, columns = np.shape(measurement)
x=np.zeros((rows, 1), dtype=measurement.dtype)
x[:]=394
measurement = np.hstack((measurement, x))
np.savetxt('measurementProfileFormatted.txt',measurement)

This works fine, but I only want every n-th row (for example every 5th or 6th) in the final output file. According to the numpy.genfromtxt documentation there is no parameter that does this. I don't want to iterate over the array. Is there a recommended way to deal with this problem?


Answer 1:


To avoid loading the whole array, you can combine np.genfromtxt with itertools.islice so that only every n-th line is passed to the parser. This is marginally faster than reading the whole array and then slicing (at least for the smaller arrays I tried).

For instance, here's the contents of file.txt:

12
34
22
17
41
28
62
71

Then for example:

>>> import itertools
>>> import numpy as np
>>> with open('file.txt') as f_in:
...     x = np.genfromtxt(itertools.islice(f_in, 0, None, 3), dtype=int)

This returns an array x containing the elements at indices 0, 3 and 6 of the above file:

array([12, 17, 62])
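
The same idea can be extended to a file with header lines, as in the question. The following is only a minimal sketch: read_every_nth is a hypothetical helper name, and the demo file and its contents are made up for illustration. It uses the start argument of islice to drop the header before striding. It does not handle skip_footer, since footer lines would have to be removed separately.

```python
import itertools
import os
import tempfile

import numpy as np


def read_every_nth(path, n, skip_header=0, **genfromtxt_kwargs):
    """Parse only every n-th data line of a text file (hypothetical helper)."""
    with open(path) as f_in:
        # islice(iterable, start, stop, step): drop the header lines first,
        # then hand only every n-th remaining line to genfromtxt.
        lines = itertools.islice(f_in, skip_header, None, n)
        return np.genfromtxt(lines, **genfromtxt_kwargs)


# Made-up demo file: 4 header lines followed by 10 rows of 3 columns.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_profile.txt")
with open(demo_path, "w") as f_out:
    for i in range(4):
        f_out.write(f"header line {i}\n")
    for i in range(10):
        f_out.write(f"{i} {i * 10} {i * 100}\n")

# Keeps data rows 0 and 5 (every 5th row after the header).
result = read_every_nth(demo_path, 5, skip_header=4)
print(result)
```

Note that skip_header is consumed by the helper and is not forwarded to genfromtxt, so the header is not skipped twice.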



Answer 2:


You have to read the whole file anyway. To select every n-th element, do something like:

>>> a = np.arange(50)
>>> a[::5]
array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])
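
The same slicing works on the rows of a 2-D array like the one genfromtxt returns for a multi-column file. A small sketch with made-up data (the array below merely stands in for the measurement array):

```python
import numpy as np

# Stand-in for a 10-row, 3-column measurement array.
data = np.arange(30).reshape(10, 3)

# Every 5th row, starting at row 0 (rows 0 and 5).
every_fifth = data[::5]

# Every 5th row, starting at row 4 (rows 4 and 9), i.e. the 5th, 10th, ... row.
from_fifth = data[4::5]

print(every_fifth)
print(from_fifth)
```

Basic slicing like this returns a view rather than a copy, so no per-row Python loop is involved.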



Answer 3:


If you just want specific rows in the final output file then why not save only those rows instead of saving the whole 'measurement' matrix:

output_rows = [5,7,11]
np.savetxt('measurementProfileFormatted.txt',measurement[output_rows,:])
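
For a regular stride instead of a hand-written index list, a slice can be passed to np.savetxt in the same way. The sketch below puts this together with the constant column from the question. The input data, the temporary output path and the step of 5 are assumptions made for illustration, not values from the original files.

```python
import io
import os
import tempfile

import numpy as np

# Made-up input: 12 rows of 4 whitespace-separated columns.
raw = io.StringIO("\n".join(f"{i} {i + 0.5} {i * 2} {i * 3}" for i in range(12)))

# Pick and reorder columns as in the question.
measurement = np.genfromtxt(raw, usecols=(3, 0, 2))

# Append a constant column (394, as in the question).
constant = np.full((measurement.shape[0], 1), 394.0)
measurement = np.hstack((measurement, constant))

# Save only every 5th row (rows 0, 5 and 10) to a temporary file.
out_path = os.path.join(tempfile.mkdtemp(), "formatted_demo.txt")
np.savetxt(out_path, measurement[::5])

# Read the file back to check what was written.
saved = np.loadtxt(out_path)
print(saved)
```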


Source: https://stackoverflow.com/questions/27961782/fastest-way-to-read-every-n-th-row-with-numpys-genfromtxt
