I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.
I'm the dill (and pathos) author. dill was pickling a numpy.array before numpy could do it itself. @dano's explanation is pretty accurate. Personally, I'd just use dill and let it do the job for you. With dill, you don't need __reduce__, because dill has several ways of grabbing subclassed attributes, one of which is storing the __dict__ for any class object. pickle doesn't do this, because it usually handles classes by name reference rather than storing the class object itself, so you have to implement __reduce__ to make pickle work for you. With dill there's no need, in most cases.
>>> import numpy as np
>>>
>>> class RealisticInfoArray(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         # Input array is an already formed ndarray instance
...         # We first cast to be our class type
...         obj = np.asarray(input_array).view(cls)
...         # add the new attribute to the created instance
...         obj.info = info
...         # Finally, we must return the newly created object:
...         return obj
...     def __array_finalize__(self, obj):
...         # see InfoArray.__array_finalize__ for comments
...         if obj is None: return
...         self.info = getattr(obj, 'info', None)
...
>>> import dill as pickle
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> print(obj.info)  # 'foo'
foo
>>>
>>> pickle_str = pickle.dumps(obj)
>>> new_obj = pickle.loads(pickle_str)
>>> print(new_obj.info)
foo
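For comparison, here is a minimal sketch of the __reduce__ route that plain pickle needs, written in Python 3 syntax. The class name PicklableInfoArray and the choice to append info to the end of ndarray's state tuple are assumptions made for illustration; this is one common way to do it, not necessarily what @dano's answer does.

```python
import pickle

import numpy as np


class PicklableInfoArray(np.ndarray):
    """Hypothetical ndarray subclass whose extra attribute survives plain pickle."""

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # ndarray.__reduce__ returns (reconstruct_func, args, state).
        # Append our attribute to the state tuple so it gets pickled too.
        reconstruct, args, state = super().__reduce__()
        return reconstruct, args, state + (self.info,)

    def __setstate__(self, state):
        # The last item is the one we appended; the rest belongs to ndarray.
        self.info = state[-1]
        super().__setstate__(state[:-1])


arr = PicklableInfoArray([1, 2, 3], info='foo')
restored = pickle.loads(pickle.dumps(arr))
print(restored.info)
```

Without the two extra methods, the array data would still round-trip, but info would come back as whatever __array_finalize__ falls back to (None here), because ndarray's own __reduce__ knows nothing about the subclass's attributes.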
dill can extend itself into pickle (essentially by registering everything it knows with copy_reg), so you can then use all dill types in anything that uses pickle. Now, if you are going to use multiprocessing, you are a bit screwed, since it uses cPickle. There is, however, the pathos fork of multiprocessing (called pathos.multiprocessing), where basically the only change is that it uses dill instead of cPickle, and thus it can serialize a heck of a lot more in a Pool.map. I think (currently) if you want to work with your subclass of a numpy.array in multiprocessing (or pathos.multiprocessing), you might have to do something like @dano suggests, but I'm not sure, as I couldn't think of a good case off the top of my head to test your subclass.
If you are interested, get pathos here: https://github.com/uqfoundation
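As a footnote on the copy_reg remark above (the module is named copyreg in Python 3): registering a reducer for a type is how plain pickle is taught to handle something it otherwise refuses. The sketch below is not dill's actual code. It registers a marshal-based reducer for code objects, which pickle rejects by default, purely to show the mechanism; the function name reduce_code is made up for the example.

```python
import copyreg
import marshal
import pickle
import types


def reduce_code(code):
    # Tell pickle to rebuild a code object at load time by calling
    # marshal.loads(<marshalled bytes>).
    # marshal's format is specific to the Python version, so this is
    # only suitable for round trips within the same interpreter version.
    return marshal.loads, (marshal.dumps(code),)


# Register the reducer in the global dispatch table; from here on,
# pickle.dumps in this process knows how to handle code objects.
copyreg.pickle(types.CodeType, reduce_code)

code = compile("1 + 2", "<example>", "eval")
restored = pickle.loads(pickle.dumps(code))
print(eval(restored))
```

Because the registration is global to the process, any library that calls pickle afterwards picks up the new behaviour without changes, which is the sense in which a package can "extend itself into pickle".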