I'm writing a multiprocessing program on Windows to process a large .CSV file in parallel.
I found this excellent example for a similar problem. When running it
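The problem you're running into is caused by using methods of the CSVWorker class as the process targets. On Windows, multiprocessing has to pickle the target to hand it to the new process, and pickling a bound method means pickling the instance it belongs to. That class has members that cannot be pickled: those open files are just never going to work.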
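To see the failure in isolation, here is a minimal sketch assuming Python 3. HoldsOpenFile and can_pickle are hypothetical names made up for this illustration, not anything from your code; the class just mimics the pattern of keeping an open file on self.

```python
import os
import pickle
import tempfile


class HoldsOpenFile:
    """Hypothetical stand-in for a class that keeps an open file on self."""

    def __init__(self, path):
        self.infile = open(path, "w")

    def work(self):
        pass


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# Create a throwaway file so the example does not touch real data.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

holder = HoldsOpenFile(path)
print(can_pickle(path))         # a plain filename string
print(can_pickle(holder.work))  # a bound method, which drags the open file along

holder.infile.close()
os.remove(path)
```

The filename pickles and the bound method does not, which is why passing paths around instead of file objects sidesteps the problem.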
What you want to do is break that class into two classes: one that coordinates all of the worker subprocesses, and another that actually does the computational work. The worker processes take filenames as arguments and open the individual files as needed, or at least wait until their worker methods are invoked and open the files only then. They can also take multiprocessing.Queues as arguments or as instance members; those are safe to pass around.
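A rough sketch of that split, assuming Python 3. The Coordinator class, the SENTINEL marker and the temporary file names are all invented for this example. The two worker names are borrowed from your methods, but the bodies are my guess at their general shape, not your actual logic. The worker side is written as plain module-level functions here because those are the simplest picklable targets; a worker class that stores only paths and queues should work as well.

```python
import csv
import multiprocessing
import os
import tempfile

SENTINEL = None  # assumed end-of-stream marker; rows are lists, so never equal to it


def parse_input_csv(in_path, row_queue):
    """Worker: open the input file here, inside the subprocess."""
    with open(in_path, newline="") as infile:
        for row in csv.reader(infile):
            row_queue.put(row)
    row_queue.put(SENTINEL)


def write_output_csv(out_path, row_queue):
    """Worker: open the output file here, inside the subprocess."""
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        # Keep pulling rows until the parser sends the sentinel.
        for row in iter(row_queue.get, SENTINEL):
            writer.writerow(row)


class Coordinator:
    """Holds only picklable state (paths), never an open file."""

    def __init__(self, in_path, out_path):
        self.in_path = in_path
        self.out_path = out_path

    def run(self):
        row_queue = multiprocessing.Queue()
        procs = [
            multiprocessing.Process(
                target=parse_input_csv, args=(self.in_path, row_queue)),
            multiprocessing.Process(
                target=write_output_csv, args=(self.out_path, row_queue)),
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()


if __name__ == "__main__":
    # Build a small throwaway input file for the demonstration.
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "input.csv")
    out_path = os.path.join(workdir, "output.csv")
    with open(in_path, "w", newline="") as f:
        csv.writer(f).writerows([["id", "value"], ["1", "a"], ["2", "b"]])

    Coordinator(in_path, out_path).run()

    with open(out_path, newline="") as f:
        print(list(csv.reader(f)))
```

The `if __name__ == "__main__":` guard matters on Windows, since each child process re-imports the main module and would otherwise try to start the workers again.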
To a certain extent, you already do this: your write_output_csv method opens its file in the subprocess, but your parse_input_csv method expects to find an already open and prepared file as an attribute of self. Do it the other way consistently and you should be in good shape.