multiprocessing global variable updates not returned to parent

天命终不由人 2020-11-22 10:44

I am trying to return values from subprocesses but these values are unfortunately unpicklable. So I used global variables in threads module with success but have not been ab

5 answers
  • 2020-11-22 11:06

    When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.
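
    To see this for yourself, here is a minimal sketch (counter, bump and run_child_and_read_global are made-up names for illustration). The child increments a module-level global and exits cleanly, yet the parent still sees the old value:

```python
import multiprocessing

counter = 0  # module-level global; every process has its own copy


def bump():
    """Runs in the child: changes only the child's copy of counter."""
    global counter
    counter += 1


def run_child_and_read_global():
    """Start a child that bumps the global, then read it in the parent."""
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    return counter, p.exitcode


if __name__ == "__main__":
    value, exitcode = run_child_and_read_global()
    print("child exit code:", exitcode)
    print("counter in parent:", value)  # still 0
```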

    Additionally, most of the abstractions that multiprocessing provides use pickle to transfer data. All data transferred using proxies must be picklable; that includes all the objects that a Manager provides. Relevant quotations from the docs (my emphasis):

    Ensure that the arguments to the methods of proxies are picklable.

    And (in the Manager section):

    Other processes can access the shared objects by using proxies.

    Queues also require picklable data, because an object put on a queue is pickled before it is sent to the other process; a quick test confirms it:

    import multiprocessing
    
    class Thing(object):
        def __getstate__(self):
            # called when the object is pickled
            print('got pickled')
            return self.__dict__
        def __setstate__(self, state):
            # called when the object is unpickled
            print('got unpickled')
            self.__dict__.update(state)
    
    if __name__ == '__main__':
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=q.put, args=(Thing(),))
        p.start()
        print(q.get())
        p.join()
    

    Output:

    $ python mp.py 
    got pickled
    got unpickled
    <__main__.Thing object at 0x10056b350>
    

    The one approach that might work for you, if you really can't pickle the data, is to find a way to store it as a ctypes object; a reference to the memory can then be passed to a child process. This seems pretty dodgy to me; I've never done it. But it might be a possible solution for you.
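
    A minimal sketch of what the ctypes route could look like, assuming the data can be squeezed into a fixed C layout. Reading, fill and read_from_child are hypothetical names; multiprocessing.Value is given a ctypes type and allocates it in shared memory, so the child writes into memory the parent can read:

```python
import ctypes
import multiprocessing


class Reading(ctypes.Structure):
    # Hypothetical fixed-layout record standing in for the real data.
    _fields_ = [("channel", ctypes.c_int), ("level", ctypes.c_double)]


def fill(shared):
    """Runs in the child: writes straight into the shared memory."""
    with shared.get_lock():  # group both writes under the value's lock
        shared.channel = 3
        shared.level = 0.5


def read_from_child():
    """Create the shared record, let a child fill it, read it in the parent."""
    shared = multiprocessing.Value(Reading, 0, 0.0)
    p = multiprocessing.Process(target=fill, args=(shared,))
    p.start()
    p.join()
    return shared.channel, shared.level


if __name__ == "__main__":
    print(read_from_child())
```

    This only helps if the interesting state is plain numbers or fixed-size buffers; it does not make an arbitrary Python object shareable.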

    Given your update, it seems like you need to know a lot more about the internals of a LORR. Is LORR a class? Can you subclass from it? Is it a subclass of something else? What's its MRO? (Try LORR.__mro__ and post the output if it works.) If it's a pure Python object, it might be possible to subclass it, adding a __getstate__ and a __setstate__ to enable pickling.
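
    As a rough sketch of that subclassing idea: Device below is a hypothetical stand-in for a pure Python class like LORR, made unpicklable by a lock it holds. The subclass leaves the lock out of the pickled state and recreates it on the other side. Whether this is possible for the real LORR depends on what its unpicklable parts actually are.

```python
import pickle
import threading


class Device(object):
    """Hypothetical stand-in for a class like LORR; the lock makes it unpicklable."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # locks cannot be pickled


class PicklableDevice(Device):
    def __getstate__(self):
        # Copy the instance dict and leave out the unpicklable member.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then recreate what was left out.
        self.__dict__.update(state)
        self._lock = threading.Lock()


def round_trip(obj):
    """Pickle and unpickle obj, as a Queue or Pipe would do behind the scenes."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    clone = round_trip(PicklableDevice("DV03"))
    print(clone.name)
```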

    Another approach might be to figure out how to get the relevant data out of a LORR instance and pass it via a simple string. Since you say that you really just want to call the methods of the object, why not just do so using Queues to send messages back and forth? In other words, something like this (schematically):

    Main Process              Child 1                       Child 2
                              LORR 1                        LORR 2 
    child1_in_queue     ->    get message 'foo'
                              call 'foo' method
    child1_out_queue    <-    return foo data string
    child2_in_queue                   ->                    get message 'bar'
                                                            call 'bar' method
    child2_out_queue                  <-                    return bar data string
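
    The diagram above could be sketched roughly like this for a single child. Instrument, worker and ask are hypothetical names, and Instrument stands in for the LORR object; the point is that it is created inside the child and only method names and strings travel through the queues:

```python
import multiprocessing


class Instrument(object):
    """Hypothetical stand-in for the unpicklable object; it never leaves the child."""

    def foo(self):
        return "foo data"

    def bar(self):
        return "bar data"


def worker(in_queue, out_queue):
    instrument = Instrument()  # built inside the child, so it is never pickled
    while True:
        method_name = in_queue.get()
        if method_name is None:  # sentinel: shut down
            break
        out_queue.put(getattr(instrument, method_name)())


def ask(method_names):
    """Send method names to one child and collect the string replies."""
    in_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=(in_queue, out_queue))
    child.start()
    replies = []
    for name in method_names:
        in_queue.put(name)
        replies.append(out_queue.get())
    in_queue.put(None)  # tell the child to exit
    child.join()
    return replies


if __name__ == "__main__":
    print(ask(["foo", "bar"]))
```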
    
  • 2020-11-22 11:06

    @DBlas gives you a quick URL and a reference to the Manager class in an answer, but I think it's still a bit vague, so I thought it might be helpful for you to just see it applied...

    import multiprocessing
    from multiprocessing import Manager
    
    ants = ['DV03', 'DV04']
    
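    def getDV03CclDrivers(lib, data_list):
        # data_list is the Manager list proxy created in the parent
        data_list[1] = 1
        data_list[0] = 0
    
    def getDV04CclDrivers(lib, data_dict):
        # data_dict is the Manager dict proxy created in the parent
        data_dict['driver'] = 0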
    
    
    if __name__ == "__main__":
    
        manager = Manager()
        dataDV03 = manager.list(['', ''])
        dataDV04 = manager.dict({'driver': '', 'status': ''})
    
        jobs = []
        if 'DV03' in ants:
            j = multiprocessing.Process(
                    target=getDV03CclDrivers, 
                    args=('LORR', dataDV03))
            jobs.append(j)
    
        if 'DV04' in ants:
            j = multiprocessing.Process(
                    target=getDV04CclDrivers, 
                    args=('LORR', dataDV04))
            jobs.append(j)
    
        for j in jobs:
            j.start()
    
        for j in jobs:
            j.join()
    
        print('Results:\n')
        print('DV03', dataDV03)
        print('DV04', dataDV04)
    

    Because multiprocessing actually uses separate processes, you cannot simply share global variables: each process has its own copy in its own memory space, and what you do to a global in one process is not reflected in another. I admit it seems confusing, since from where you sit it's all living right there in the same piece of code, so "why shouldn't those functions have access to the global?" It's harder to wrap your head around the idea that they will be running in different processes.

    The Manager class exists to hand out proxies for data structures that shuttle data back and forth between processes for you. You create a special dict and list from a manager, pass them into your functions, and operate on them there as if they were local.

    Unpicklable data

    For your specialized LORR object, you might need to create something like a proxy that can represent the picklable state of the instance.

    Not super robust or tested much, but gives you the idea.

    class LORRProxy(object):
    
        def __init__(self, lorrObject=None):
            self.instance = lorrObject
    
        def __getstate__(self):
            # how to get the state data out of a lorr instance
            inst = self.instance
            state = dict(
                foo = inst.a,
                bar = inst.b,
            )
            return state
    
        def __setstate__(self, state):
            # rebuild a lorr instance from state
            lorr = LORR.LORR()
            lorr.a = state['foo']
            lorr.b = state['bar']
            self.instance = lorr
    
  • 2020-11-22 11:10

    You could also use a multiprocessing Array. This allows you to have a shared state between processes and is probably the closest thing to a global variable.

    At the top of main, declare an Array. The first argument 'i' says it will be integers. The second argument gives the initial values:

    shared_dataDV03 = multiprocessing.Array('i', (0, 0))  # a shared array
    

    Then pass this array to the process as an argument:

    j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',shared_dataDV03))
    

    You have to receive the array argument in the function being called, and then you can modify it within the function:

    def getDV03CclDrivers(lib, arr):  # arr is the shared Array passed in by the parent
        arr[1]=1
        arr[0]=0
    

    The array is shared with the parent, so you can print out the values at the end in the parent:

    print('DV03', shared_dataDV03[:])
    

    And it will show the changes:

    DV03 [0, 1]
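
    One thing to watch once several processes write to the same slot: arr[i] += 1 is a read followed by a write, so two children can step on each other. A minimal sketch, assuming the default lock that multiprocessing.Array creates (add_one and count_in_parallel are made-up names):

```python
import multiprocessing


def add_one(arr, index, times):
    """Runs in a child: increment one slot of the shared array."""
    for _ in range(times):
        # += is a read followed by a write, so hold the array's lock around it.
        with arr.get_lock():
            arr[index] += 1


def count_in_parallel(workers=4, times=1000):
    """Let several children increment slot 0, then read the array in the parent."""
    shared = multiprocessing.Array('i', (0, 0))  # two shared C ints
    jobs = [multiprocessing.Process(target=add_one, args=(shared, 0, times))
            for _ in range(workers)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return shared[:]


if __name__ == "__main__":
    print(count_in_parallel())
```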
    
  • 2020-11-22 11:24

    With multiprocessing, globals are not shared; objects are normally passed between processes through a Queue or a Pipe. Anything passed that way must be picklable, so multiprocessing won't help you here with an unpicklable object.
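
    For data that is picklable, a Pipe version might look like this minimal sketch (child_main and fetch_from_child are hypothetical names, and the dict is just example data):

```python
import multiprocessing


def child_main(conn):
    """Runs in the child: send a picklable result back through the pipe."""
    conn.send({"driver": 0, "status": "ok"})
    conn.close()


def fetch_from_child():
    """Start a child and receive its result in the parent."""
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child_main, args=(child_conn,))
    p.start()
    result = parent_conn.recv()  # blocks until the child sends
    p.join()
    return result


if __name__ == "__main__":
    print(fetch_from_child())
```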

  • 2020-11-22 11:30

    I use p.map() to spin off a number of processes to remote servers and print the results when they come back at unpredictable times:

    Servers=[...]
    from multiprocessing import Pool
    p=Pool(len(Servers))
    p.map(DoIndividualSummary, Servers)
    

    This worked fine if DoIndividualSummary used print for the results, but the overall result was in unpredictable order, which made interpretation difficult. I tried a number of approaches to use global variables but ran into problems. Finally, I succeeded with sqlite3.

    Before p.map(), open a sqlite connection and create a table:

    import sqlite3
    conn = sqlite3.connect('servers.db')  # need conn for commit and close
    db = conn.cursor()
    db.execute('''DROP TABLE IF EXISTS servers''')  # no error if the table is missing
    db.execute('''CREATE TABLE servers (server text, serverdetail text, readings text)''')
    conn.commit()
    

    Then, when returning from DoIndividualSummary(), save the results into the table:

    db.execute('''INSERT INTO servers VALUES (?,?,?)''', (server, serverdetail, readings))
    conn.commit()
    return
    

    After the map() statement, print the results:

    db.execute('''select * from servers order by server''')
    rows=db.fetchall()
    for server, serverdetail, readings in rows:
        print(serverdetail, readings)
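
    Put together, the three steps might look like the sketch below. It is a variation rather than exactly what I ran: it assumes each worker opens its own connection to the database file instead of reusing one opened in the parent, since a SQLite connection should not be carried across processes. save_summary and run are made-up names, and the inserted values are placeholders.

```python
import sqlite3
from multiprocessing import Pool


def save_summary(job):
    """Runs in a worker: open a private connection, insert one row, close."""
    db_path, server = job
    conn = sqlite3.connect(db_path, timeout=30)  # wait if another worker holds the lock
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO servers VALUES (?,?,?)",
                         (server, "detail for " + server, "readings"))
    finally:
        conn.close()


def run(db_path, servers):
    """Create the table, let the pool fill it, then read it back sorted."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS servers")
    conn.execute("CREATE TABLE servers (server text, serverdetail text, readings text)")
    conn.commit()
    conn.close()  # do not keep a connection open while workers are started

    with Pool(len(servers)) as pool:
        pool.map(save_summary, [(db_path, s) for s in servers])

    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT server FROM servers ORDER BY server").fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(run("servers.db", ["server2", "server1"]))
```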
    

    May seem like overkill but it was simpler for me than the recommended solutions.
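
    If the results themselves are picklable, a lighter option is to return them instead of printing: Pool.map hands back its results in the same order as the inputs, no matter which worker finishes first. A minimal sketch, where summarize is a hypothetical stand-in for DoIndividualSummary:

```python
from multiprocessing import Pool


def summarize(server):
    """Hypothetical stand-in for DoIndividualSummary: return, don't print."""
    return server, "detail for " + server, len(server)


def collect(servers):
    """Run summarize on every server and return the results in input order."""
    with Pool(len(servers)) as pool:
        # map() returns the results in the same order as the inputs,
        # whichever worker happens to finish first.
        return pool.map(summarize, servers)


if __name__ == "__main__":
    for server, detail, size in collect(["server2", "server1"]):
        print(server, detail, size)
```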
