I'm trying to write a class for a read-only object which will not really be copied by the copy module, and which, when pickled and transferred between processes, will exist as at most one copy per process, no matter how many times it is passed around as a "new" object. Is there already something like that?
Answer 1:
I made an attempt to implement this. @Alex Martelli and anyone else, please give me comments/improvements. I think this will eventually end up on GitHub.
""" todo: need to lock library to avoid thread trouble? todo: need to raise an exception if we're getting pickled with an old protocol? todo: make it polite to other classes that use __new__. Therefore, should probably work not only when there is only one item in the *args passed to new. """ import uuid import weakref library = weakref.WeakValueDictionary() class UuidToken(object): def __init__(self, uuid): self.uuid = uuid class PersistentReadOnlyObject(object): def __new__(cls, *args, **kwargs): if len(args)==1 and len(kwargs)==0 and isinstance(args[0], UuidToken): received_uuid = args[0].uuid else: received_uuid = None if received_uuid: # This section is for when we are called at unpickling time thing = library.pop(received_uuid, None) if thing: thing._PersistentReadOnlyObject__skip_setstate = True return thing else: # This object does not exist in our library yet; Let's add it new_args = args[1:] thing = super(PersistentReadOnlyObject, cls).__new__(cls, *new_args, **kwargs) thing._PersistentReadOnlyObject__uuid = received_uuid library[received_uuid] = thing return thing else: # This section is for when we are called at normal creation time thing = super(PersistentReadOnlyObject, cls).__new__(cls, *args, **kwargs) new_uuid = uuid.uuid4() thing._PersistentReadOnlyObject__uuid = new_uuid library[new_uuid] = thing return thing def __getstate__(self): my_dict = dict(self.__dict__) del my_dict["_PersistentReadOnlyObject__uuid"] return my_dict def __getnewargs__(self): return (UuidToken(self._PersistentReadOnlyObject__uuid),) def __setstate__(self, state): if self.__dict__.pop("_PersistentReadOnlyObject__skip_setstate", None): return else: self.__dict__.update(state) def __deepcopy__(self, memo): return self def __copy__(self): return self # -------------------------------------------------------------- """ From here on it's just testing stuff; will be moved to another file. """ def play_around(queue, thing): import copy queue.put((thing, copy.deepcopy(thing),)) class Booboo(PersistentReadOnlyObject): def __init__(self): self.number = random.random() if __name__ == "__main__": import multiprocessing import random import copy def same(a, b): return (a is b) and (a == b) and (id(a) == id(b)) and \ (a.number == b.number) a = Booboo() b = copy.copy(a) c = copy.deepcopy(a) assert same(a, b) and same(b, c) my_queue = multiprocessing.Queue() process = multiprocessing.Process(target = play_around, args=(my_queue, a,)) process.start() process.join() things = my_queue.get() for thing in things: assert same(thing, a) and same(thing, b) and same(thing, c) print("all cool!")
Answer 2:
I don't know of any such functionality already implemented. The interesting problem is as follows, and it needs precise specs as to what should happen in this case...:
- process A makes the obj and sends it to B which unpickles it, so far so good
- A makes change X to the obj, meanwhile B makes change Y to ITS copy of the obj
- now either process sends its obj to the other, which unpickles it: which changes to the object need to be visible in each process at that point? does it matter whether it's A sending to B or vice versa, i.e. does A "own" the object? or what?
If you don't care, say because only A OWNS the obj -- only A is ever allowed to make changes and send the obj to others, others can't and won't change -- then the problems boil down to identifying obj uniquely -- a GUID will do. The class can maintain a class attribute dict mapping GUIDs to existing instances (probably as a weak-value dict to avoid keeping instances needlessly alive, but that's a side issue) and ensure the existing instance is returned when appropriate.
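To make that concrete, here is a minimal sketch of the memo idea on its own, before any pickling is involved (the class name, the guid attribute and the existing classmethod are illustrative names of mine, not an existing API):

import uuid
import weakref

class Shared(object):
    # Class attribute mapping GUIDs to live instances; weak values, so the
    # memo by itself never keeps an instance alive.
    _instances = weakref.WeakValueDictionary()

    def __init__(self):
        self.guid = uuid.uuid4()
        type(self)._instances[self.guid] = self

    @classmethod
    def existing(cls, guid):
        # Return the instance this process already holds for guid, or None.
        return cls._instances.get(guid)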
But if changes need to be synchronized to any finer granularity, then suddenly it's a REALLY difficult problem of distributed computing and the specs of what happens in what cases really need to be nailed down with the utmost care (and more paranoia than is present in most of us -- distributed programming is VERY tricky unless a few simple and provably correct patterns and idioms are followed fanatically!-).
If you can nail down the specs for us, I can offer a sketch of how I would go about trying to meet them. But I won't presume to guess the specs on your behalf;-).
Edit: the OP has clarified, and it seems all he needs is a better understanding of how to control __new__. That's easy: see __getnewargs__ -- you'll need a new-style class and pickling with protocol 2 or better (but those are advisable anyway for other reasons!-). Then __getnewargs__ in an existing object can simply return the object's GUID (which __new__ must receive as an optional parameter). So __new__ can check if the GUID is present in the class's memo [[weakvalue;-)]] dict (and if so return the corresponding object) -- if not (or if the GUID is not passed, implying it's not an unpickling, so a fresh GUID must be generated), then make a truly-new object (setting its GUID;-) and also record it in the class-level memo.
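Here is a minimal sketch of that mechanism, assuming pickling with protocol 2 or better; the class and attribute names are illustrative, not from any existing library:

import uuid
import weakref

class SharedByGuid(object):
    _memo = weakref.WeakValueDictionary()   # class-level memo: GUID -> instance

    def __new__(cls, guid=None):
        if guid is not None:
            # Unpickling (or explicit construction with a known GUID):
            # reuse the instance this process already holds, if any.
            existing = cls._memo.get(guid)
            if existing is not None:
                return existing
        obj = super(SharedByGuid, cls).__new__(cls)
        # No GUID passed means a genuinely new object: generate a fresh one.
        obj.guid = guid if guid is not None else uuid.uuid4()
        cls._memo[obj.guid] = obj
        return obj

    def __getnewargs__(self):
        # With protocol 2+, unpickling calls cls.__new__(cls, self.guid).
        return (self.guid,)

With this in place, pickle.loads(pickle.dumps(obj, 2)) within one process hands back the very same instance, while another process unpickling the same bytes builds its own single copy and keeps reusing it. Note that unpickling still updates the returned instance's __dict__ with the pickled state, which is harmless for a read-only object.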
BTW, to make GUIDs, consider using the uuid module in the standard library.
Answer 3:
You could simply use a dictionary in the receiver with the key and the value being the same object. And to avoid a memory leak, use a WeakKeyDictionary.
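A rough sketch of that idea (the helper name is mine): it assumes the objects define __hash__ and __eq__ in terms of some stable identifier such as a GUID, so a freshly unpickled copy compares equal to the one already held. The value is stored as a weak reference, since storing the object itself as the value would keep a strong reference to it and defeat the weak key:

import weakref

_received = weakref.WeakKeyDictionary()

def intern_received(obj):
    # If an equal object has been seen before, hand back that earlier
    # instance; otherwise remember this one as the canonical copy.
    ref = _received.get(obj)
    if ref is not None:
        existing = ref()
        if existing is not None:
            return existing
    _received[obj] = weakref.ref(obj)
    return obj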