NumPy reading file with filtering lines on the fly

南方客 2021-01-02 08:19

I have a large array of numbers written in a CSV file and need to load only a slice of that array. Conceptually I want to call np.genfromtxt() and then row-slice the resulting array, but that would read the whole file into memory first, which I want to avoid.
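
One way to get a row slice without first materializing the whole file is to hand np.genfromtxt an iterator over only the lines you want, since it accepts any iterable of strings. A minimal sketch, assuming a placeholder file name 'data.csv' and placeholder slice bounds (very old NumPy versions may expect byte strings here):

    import itertools
    import numpy as np

    # Read only rows 1000-1999 of a large CSV without loading the rest.
    # The file name and bounds are placeholders for this sketch.
    with open('data.csv') as f:
        rows = itertools.islice(f, 1000, 2000)
        arr = np.genfromtxt(rows, delimiter=',')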

3 answers
  •  猫巷女王i
    2021-01-02 09:16

    If you pass in a list of types (the format specification), wrap the conversion in a try block, and yield each parsed row, we should be able to replicate textscan() with a generator.

    def genfromtext(fname, formatTypes):
        with open(fname, 'r') as f:
            for line in f:
                try:
                    cells = line.strip().split(',')
                    # Convert each cell with its corresponding type
                    r = [convert(cell) for convert, cell in zip(formatTypes, cells)]
                except ValueError:
                    continue  # skip lines that don't match the expected format
                yield r
    

    Edit: I forgot the except block. It runs okay now and you can use genfromtext as a generator like so (using a random CSV log I have sitting around):

    >>> a = genfromtext('log.txt', [str, str, str, int])
    >>> next(a)
    ['10.10.9.45', ' 2013/01/17 16:29:26', '00:00:36', 0]
    >>> next(a)
    ['10.10.9.45', ' 2013/01/17 16:22:20', '00:08:14', 0]
    >>> next(a)
    ['10.10.9.45', ' 2013/01/17 16:31:05', '00:00:11', 3]
    

    I should note that I'm using zip to pair the comma-split cells with formatTypes. zip stops when the shorter of the two lists runs out, so we can iterate over them together without writing a loop indexed by len(line) or similar.
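
    To make that truncation behavior concrete, here is a small interactive illustration (the lists are made up for the example):

    >>> list(zip([str, str, int], ['a', 'b', '3', 'unused']))
    [(<class 'str'>, 'a'), (<class 'str'>, 'b'), (<class 'int'>, '3')]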
