I need to read some very large text files (100+ MB), process every line with a regex, and store the data in a structure. My structure inherits from defaultdict, it has a read
You're creating a pool with as many workers as files. That may be too many. Usually, I aim to have the number of workers around the same as the number of cores.
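A minimal sketch of that setup, capping the pool at the core count rather than the file count. Since your actual code isn't shown, the worker `process_file`, the pattern `LINE_RE`, and the file paths are all placeholders:

```python
import multiprocessing
import re
from collections import defaultdict

# Hypothetical pattern; substitute your own regex.
LINE_RE = re.compile(r"(\w+)=(\d+)")

def process_file(path):
    """Parse one file and return a plain dict (picklable result)."""
    counts = defaultdict(int)
    with open(path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if m:
                counts[m.group(1)] += int(m.group(2))
    return dict(counts)

if __name__ == "__main__":
    filenames = ["a.log", "b.log", "c.log"]  # placeholder paths
    # Cap the pool at the core count, not the file count.
    workers = min(len(filenames), multiprocessing.cpu_count())
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(process_file, filenames)
```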
The simple fact is that your final step is going to be a single process merging all the results together. There is no avoiding this, given your problem description. This is known as barrier synchronization: all tasks have to reach the same point before any can proceed.
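Concretely, `pool.map` in the sketch above is the barrier: it blocks until every worker has returned its partial result, and only then does the parent process run the merge. A sketch of that serial merge step, assuming each worker returns a dict of counts:

```python
from collections import defaultdict

def merge(partials):
    """Serial merge: runs in a single process after all workers finish."""
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return merged
```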
You should probably run this program multiple times (or in a loop), passing a different worker count to multiprocessing.Pool() each time, from 1 up to the number of cores. Time each run and see which worker count does best.
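A sketch of that sweep, reusing the hypothetical `process_file` and `filenames` from above:

```python
import multiprocessing
import time

def benchmark(worker_counts, filenames, process_file):
    """Time one full run of the pool for each candidate worker count."""
    for n in worker_counts:
        start = time.perf_counter()
        with multiprocessing.Pool(n) as pool:
            pool.map(process_file, filenames)
        print(f"{n} workers: {time.perf_counter() - start:.2f}s")

# Example: benchmark(range(1, multiprocessing.cpu_count() + 1), filenames, process_file)
```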
The result will depend on how CPU-intensive (as opposed to disk-intensive) your task is. I would not be surprised if 2 turned out to be best when the task is about half CPU and half disk, even on an 8-core machine.