multiprocessing queue issue with pickle dumps

Submitted by 落花浮王杯 on 2021-02-18 18:02:22

Question


I have read and re-read the Python documentation on the multiprocessing module and Queue handling, but I cannot find anything related to this issue, which is driving me crazy and blocking my project:

I wrote a 'JsonLike' class which allows me to create an object such as:

a = JsonLike()
a.john.doe.was.here = True

...without having to initialize the intermediate levels (very useful).

The following code just creates such an object, sets a value on it, puts it in an array and tries to send that array to a process (this is what I need, but sending the object itself leads to the same error).

Consider this piece of code:

from multiprocessing import Process, Queue, Event

class JsonLike(dict):
    """
    This class allows json-crossing-through creation and setting such as :
    a = JsonLike()
    a.john.doe.was.here = True
    it automatically creates all the hierarchy
    """

    def __init__(self, *args, **kwargs):
        # super(JsonLike, self).__init__(*args, **kwargs)
        dict.__init__(self, *args, **kwargs)
        for arg in args:
            if isinstance(arg, dict):
                for k, v in arg.items():
                    self[k] = v
        if kwargs:
            for k, v in kwargs.items():
                self[k] = v

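    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails
        if self.get(attr) is not None:
            # Return the stored value, not the attribute name
            return self[attr]
        else:
            newj = JsonLike()
            self.__setattr__(attr, newj)
            return newj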

    def __setattr__(self, key, value):
        self.__setitem__(key, value)

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self.__dict__.update({key: value})

    def __delattr__(self, item):
        self.__delitem__(item)

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        del self.__dict__[key]


def readq(q, e):
    while True:
        obj = q.get()
        print('got')
        if e.is_set():
            break


if __name__ == '__main__':
    q = Queue()
    e = Event()

    obj = JsonLike()
    obj.toto = 1

    arr = [obj]

    proc = Process(target=readq, args=(q, e))
    proc.start()
    print(f"Before sending value :{arr}")
    q.put(arr)
    print('sending done')
    e.set()
    proc.join()
    proc.close()

I get the following output (on the q.put):

Before sending value :[{'toto': 1}]
Traceback (most recent call last):
sending done
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: 'JsonLike' object is not callable

Any suggestions?


Answer 1:


The problem is that you are messing with __getattr__. If you add a print statement inside this method, you will see that running the following code leads to a crash too:

from multiprocessing import Queue

# JsonLike is the class defined in the question
obj = JsonLike()
obj.toto.test = 1

q = Queue()
q.put(obj)
q.get()

Sending the object through the queue pickles it, which results in repeated calls to obj.__getattr__, searching for an attribute named __getstate__ (it will later try to find its counterpart, __setstate__). Here is what the pickle documentation says about this dunder method:

If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.

In your case this method doesn't exist, but your code makes it look like it does (by creating an attribute with the right name on the fly). Therefore the default behavior is not triggered; instead, the freshly created attribute named __getstate__ is called. That attribute is not callable, since it is just an empty JsonLike object. This is why you see errors like "'JsonLike' object is not callable" pop up here.

One quick fix is to avoid touching attributes that look like __xx__ and even _xx. To that end, you can add or modify these lines:
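The effect can be reproduced without any queue. The following is a minimal sketch, using a hypothetical stand-in class (AutoDict, not the asker's full JsonLike). It probes __setstate__ rather than __getstate__, because Python 3.11 and later define a default object.__getstate__, so on those versions that particular name is found by normal lookup and never reaches __getattr__:

```python
class AutoDict(dict):
    """Hypothetical stand-in for JsonLike: fabricates any missing attribute."""

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        child = AutoDict()
        self[attr] = child
        return child


probe = AutoDict()

# pickle probes for optional hooks in much the same way.
hook = getattr(probe, "__setstate__", None)
print(type(hook).__name__, callable(hook))

# The probe was answered with a fresh, non-callable object,
# and it also left a stray key behind in the dict.
print(list(probe))

try:
    hook()
except TypeError as exc:
    error = str(exc)
    print(error)
```

The lookup succeeds even though no such method was ever defined, so the caller believes the hook exists and tries to call it.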

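import re

dunder_pattern = re.compile("__.*__")
protected_pattern = re.compile("_.*")

class JsonLike(dict):
    # ... the other methods are unchanged from the question ...

    def __getattr__(self, attr):
        # Never fabricate private or dunder attributes. dict defines no
        # __getattr__ to delegate to, so report the attribute as missing;
        # pickle then falls back to its default behavior.
        if dunder_pattern.match(attr) or protected_pattern.match(attr):
            raise AttributeError(attr)
        if self.get(attr) is not None:
            # Return the stored value, not the attribute name
            return self[attr]
        else:
            newj = JsonLike()
            self.__setattr__(attr, newj)
            return newj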

This makes the previous code work (the same goes for your code). On the other hand, names starting with an underscore are no longer created on the fly, so chained writes like obj.__toto__.test = 1 will fail, but that's probably a good thing anyway.

I feel like you may end up with similar bugs in other contexts and, sadly, in some cases you will find libraries that don't use such predictable attribute names. That's one of the reasons why I wouldn't recommend using such a mechanism in real life (even though I really like the idea and would love to see how far it can go).
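To check the fix without spawning a process, one can round-trip an object through pickle directly, since the traceback in the question shows the queue calling _ForkingPickler.dumps. The sketch below is a simplified variant, not the asker's exact class: it keeps values only in the dict (no mirroring into __dict__) and uses str.startswith instead of regular expressions. Both choices are assumptions made for brevity.

```python
import pickle


class JsonLike(dict):
    """Simplified sketch: nested attribute access backed only by the dict."""

    def __getattr__(self, attr):
        # Only reached when normal lookup fails. Refuse to fabricate
        # private/dunder names so that pickle's probes (for example
        # __getstate__ or __setstate__) see a missing attribute.
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            return self[attr]
        except KeyError:
            child = self[attr] = JsonLike()
            return child

    def __setattr__(self, key, value):
        # Attribute writes become plain dict writes.
        self[key] = value


obj = JsonLike()
obj.toto.test = 1

# Serialize and deserialize, as happens when an object crosses a queue.
restored = pickle.loads(pickle.dumps([obj]))[0]
print(type(restored).__name__, restored)
```

If the guard on underscore-prefixed names is removed, pickling this class can fail again with the same kind of "not callable" error, depending on the Python version.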



Source: https://stackoverflow.com/questions/55249537/multiprocessing-queue-issue-with-pickle-dumps
