I have pretty simple problem. I have a large file that goes through three steps, a decoding step using an external program, some processing in python, and then recoding usi
However, all the data are buffered to memory ...
Are you using subprocess.Popen.communicate()? By design, this function will wait for the process to finish, all the while accumulating the data in a buffer, and then return it to you. As you've pointed out, this is problematic if dealing with very large files.
If you want to process the data while it is generated, you will need to write a loop using the poll()
and .stdout.read()
methods, then write that output to another socket/file/etc.
Do be sure to notice the warnings in the documentation against doing this as it is easy to result in a deadlock (the parent process waits for the child process to generate data, who is in turn waiting for the parent process to empty the pipe buffer).