Share Python objects between processes in Python 3


Question


Here I created a producer-consumer program: the parent process (producer) creates many child processes (consumers), then the parent process reads a file and passes the data to the child processes.

But there is a performance problem: passing messages between processes costs too much time (I think).

For example, with 200 MB of original data, the parent process takes less than 8 seconds to read and preprocess it, but just passing the data to the child processes through multiprocessing.Pipe costs another 8 seconds, while the child processes finish the remaining work in only another 3-4 seconds.

So a complete run costs less than 18 seconds, and more than 40% of that time is spent on communication between processes, which is much more than I expected. I also tried multiprocessing.Queue and Manager, and they were worse.
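A minimal sketch of the kind of setup described above (names like CHUNK_SIZE and data.bin are illustrative assumptions, not from the original code):

import multiprocessing

CHUNK_SIZE = 1024 * 1024  # illustrative chunk size

def consumer(conn):
    """Child process: receive chunks from the pipe until the sentinel arrives."""
    while True:
        chunk = conn.recv()          # each recv() unpickles the data sent by the parent
        if chunk is None:            # sentinel: producer is done
            break
        # ... do the remaining work on `chunk` ...
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=consumer, args=(child_conn,))
    p.start()

    with open('data.bin', 'rb') as f:       # hypothetical input file
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            parent_conn.send(chunk)          # this send is where the extra ~8 seconds go
    parent_conn.send(None)                   # tell the consumer to stop
    p.join()

Each send() serializes the payload with pickle and copies it through the pipe, which is why sending large amounts of data can rival the cost of reading the file in the first place.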

I work on Windows 7 / Python 3.4. I have googled for several days, and POSH looks like a good solution, but it cannot be built with Python 3.4.

So I have three questions:

1. Is there any way to share Python objects directly between processes in Python 3.4, as POSH does?

or

2. Is it possible to pass the "pointer" of an object to a child process so the child process can recover the "pointer" back into a Python object?

or

3. multiprocessing.Array may be a valid solution, but if I want to share a complex data structure such as a list, how does it work? Should I make a new class based on it that provides list-like interfaces?

Edit 1: I tried the third way, but it works worse.
I defined these values:

p_pos = multiprocessing.Value('i')              # producer write position
c_pos = multiprocessing.Value('i')              # consumer read position
databuff = multiprocessing.Array('c', buff_len) # shared buffer

and two functions:

send_data(msg)
get_data()

In the send_data function (parent process), it copies msg into databuff and sends the start and end positions (two integers) to the child process via the pipe.
Then in the get_data function (child process), it receives the two positions and copies msg back out of databuff.
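A minimal sketch of what those two functions might look like (the buffer size and names such as worker are assumptions based on the description above, not the original code):

import multiprocessing

def send_data(conn, databuff, msg, start=0):
    """Parent: copy msg into the shared buffer, then send only two integers over the pipe."""
    end = start + len(msg)
    databuff[start:end] = msg          # copy into shared memory
    conn.send((start, end))            # only the positions travel through the pipe
    return end                         # next write position

def get_data(conn, databuff):
    """Child: receive the positions, then copy the bytes out of the shared buffer."""
    start, end = conn.recv()
    return bytes(databuff[start:end])  # copy out of shared memory

def worker(conn, databuff):
    print(get_data(conn, databuff))

if __name__ == '__main__':
    buff_len = 1024 * 1024                            # assumed buffer size
    databuff = multiprocessing.Array('c', buff_len)   # shared byte buffer
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn, databuff))
    p.start()
    send_data(parent_conn, databuff, b'some message')
    p.join()

Note that each end still copies the full payload into and out of the shared buffer, so there are now two copies plus a pipe round trip per message.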

In the end, it costs twice as much as just using the pipe @_@

Edit 2:
Yes, I tried Cython, and the result looks good.
I just changed my Python script's suffix to .pyx and compiled it, and the program sped up by 15%.
No doubt, I ran into the "Unable to find vcvarsall.bat" and "The system cannot find the file specified" errors; I spent a whole day solving the first one and was then blocked by the second.
Finally, I found Cyther, and all the troubles were gone ^_^
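For reference, one minimal way to compile a renamed .pyx module with Cython is a small setup.py like the following (the module name myscript.pyx is just a placeholder):

# setup.py -- build with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("myscript.pyx"))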


Answer 1:


I was in your place five months ago. I looked around a few times, but my conclusion is that multiprocessing with Python has exactly the problem you describe:

  • Pipes and Queues are good, but in my experience not for big objects
  • Manager() proxy objects are slow, except for arrays, and those are limited. If you want to share a complex data structure, use a Namespace as it is done here: multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
  • Manager() has the shared list you are looking for (see the sketch after this list): https://docs.python.org/3.6/library/multiprocessing.html
  • There are no pointers or real memory management in Python, so you can't share selected memory cells
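A minimal sketch of a Manager-backed shared list (and a Namespace), just to illustrate the API mentioned above:

import multiprocessing

def worker(shared_list, ns):
    shared_list.append('from child')   # every access goes through the manager proxy
    ns.status = 'done'                 # Namespace attributes are shared the same way

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared_list = manager.list(['from parent'])
        ns = manager.Namespace()
        p = multiprocessing.Process(target=worker, args=(shared_list, ns))
        p.start()
        p.join()
        print(list(shared_list), ns.status)   # ['from parent', 'from child'] done

Every operation on the proxy is a round trip to the manager process, which is why this is convenient but slow for large data.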

I solved this kind of problem by learning C++, but it's probably not what you want to read...




Answer 2:


To pass data (especially big numpy arrays) to a child process, I think mpi4py can be very efficient, since it can work directly on buffer-like objects.

An example of using mpi4py to spawn processes and communicate (also using trio, but that is another story) can be found here.
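A minimal sketch of buffer-based communication with mpi4py (run with something like mpiexec -n 2 python script.py; the array size is arbitrary):

# The uppercase Send/Recv methods use the buffer protocol directly,
# so large numpy arrays are transferred without pickling.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.arange(1000000, dtype='d')          # arbitrary payload
    comm.Send([data, MPI.DOUBLE], dest=1, tag=0)
else:
    data = np.empty(1000000, dtype='d')
    comm.Recv([data, MPI.DOUBLE], source=0, tag=0)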



Source: https://stackoverflow.com/questions/39687235/share-python-object-between-multiprocess-in-python3
