Is it possible to speed up Python I/O?

不要未来只要你来 2020-12-16 16:12

Consider this Python program:

import sys

lc = 0
for line in open(sys.argv[1]):
    lc = lc + 1

print lc, sys.argv[1]

Running it on my 6GB

8 Answers

悲&欢浪女 (original poster)

2020-12-16 16:27
The trick is not to make electrons move faster (that's hard to do) but to get more work done per unit of time.

First, be sure your 6GB file read is I/O bound, not CPU bound.
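
One rough way to check this is to time a raw chunked read of the file against the line-counting loop. The sketch below is only an illustration: the function names `time_raw_read` and `time_line_count` and the 1 MiB chunk size are assumptions, not part of the original question. If the two timings are close, the disk is probably the limit; if the line loop is much slower, the Python-side work is.

```python
import sys
import time


def time_raw_read(path, chunk_size=1 << 20):
    """Read the file in large binary chunks, doing no per-line work.

    Returns (bytes_read, seconds). chunk_size is an assumed value.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def time_line_count(path):
    """Count lines the same way the question's loop does.

    Returns (line_count, seconds).
    """
    start = time.perf_counter()
    count = 0
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) == 2:
    nbytes, raw_s = time_raw_read(sys.argv[1])
    nlines, line_s = time_line_count(sys.argv[1])
    print(f"raw read:   {nbytes} bytes in {raw_s:.2f}s")
    print(f"line count: {nlines} lines in {line_s:.2f}s")
```

The operating system caches recently read files, so whichever function runs second may look faster than it really is. Repeating the measurement, or swapping the order, gives a fairer comparison.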

If it's I/O bound, consider the "Fan-Out" design pattern.
- A parent process spawns a bunch of children.
- The parent reads the 6GB file and deals rows out to the children by writing to their STDIN pipes. The 6GB read time will remain constant. The row dealing should involve as little parent processing as possible. Only very simple filters or counts should be used.
  
  A pipe is an in-memory channel for communication. It's a shared buffer with a reader and a writer.
- Each child reads rows from STDIN and does the appropriate work. Each child should probably write a simple disk file with its final (summarized, reduced) results. Later, the results in those files can be consolidated.
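
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the worker source `CHILD_SOURCE`, the function `fan_out`, the round-robin dealing, the `part-N.txt` file names, and the default of four children are all hypothetical choices. The per-row "work" here is just a count, standing in for whatever the children would really do.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical worker: reads rows from STDIN, does its work (here only a
# count), and writes a one-line summary to the file named in sys.argv[1].
CHILD_SOURCE = r"""
import sys
count = 0
for line in sys.stdin.buffer:
    count += 1
with open(sys.argv[1], "w") as out:
    out.write(str(count))
"""


def fan_out(path, n_children=4, out_dir=None):
    """Deal the rows of `path` to child processes and sum their results."""
    out_dir = out_dir or tempfile.mkdtemp()
    result_files = [
        os.path.join(out_dir, f"part-{i}.txt") for i in range(n_children)
    ]
    # The parent spawns the children, each with a pipe on its STDIN.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE, result_file],
            stdin=subprocess.PIPE,
        )
        for result_file in result_files
    ]
    # The parent reads the file once and deals rows out round-robin,
    # doing no other processing on them.
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            children[i % n_children].stdin.write(line)
    # Closing STDIN signals end of input; then wait for each child.
    for child in children:
        child.stdin.close()
        if child.wait() != 0:
            raise RuntimeError("a child process failed")
    # Consolidate the per-child result files.
    total = 0
    for result_file in result_files:
        with open(result_file) as f:
            total += int(f.read())
    return total


if __name__ == "__main__" and len(sys.argv) == 2:
    print(fan_out(sys.argv[1]), sys.argv[1])
```

For a bare line count, dealing a row costs the parent about as much as counting it, so this sketch is unlikely to beat the single-process loop. The pattern should pay off only when each child does noticeably more work per row than the parent spends handing the row over.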