Is there a way to pickle a scipy.interpolate.Rbf() object?

Submitted by 一个人想着一个人 on 2020-01-24 06:02:02

Question


I'm creating a radial basis function interpolation model for a rather large dataset. The main call to scipy.interpolate.Rbf() takes about one minute and 14 GB of RAM. Not every machine this is supposed to run on can handle that, and the program will run on the same dataset very often, so I'd like to pickle the results to a file. This is a simplified example:

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)
RBFpickler.dump(rbfi)
RBFfile.close()

The RBFpickler.dump() call fails with the error "can't pickle <type 'instancemethod'>". As I understand it, that means the object holds a method somewhere (rbfi() is callable, after all), and methods can't be pickled for some reason I do not quite understand.
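To see which attributes are responsible for an error like this, one option is to try pickling each instance attribute on its own. The following is only a minimal sketch: `Model` and `unpicklable_attributes` are hypothetical names, and `Model` is a stand-in that does not reproduce Rbf itself. It stores a lambda as its callable attribute because pickle rejects a lambda regardless of Python version.

```python
import pickle


class Model(object):
    """Hypothetical stand-in for an object that mixes data with a callable."""

    def __init__(self, epsilon):
        self.epsilon = epsilon                    # plain data: picklable
        self.weights = [0.5, 1.5, 2.5]            # plain data: picklable
        self._function = lambda r: r * epsilon    # callable: pickle rejects it


def unpicklable_attributes(obj):
    """Return the names of instance attributes that pickle refuses to serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value, protocol=2)
        except Exception:  # the exact exception type differs between Python versions
            bad.append(name)
    return sorted(bad)


model = Model(epsilon=2.0)
blocked = unpicklable_attributes(model)
print(blocked)  # prints ['_function']
```

Applied to a real object, a check of this kind would show which entries of its __dict__ need to be left out or recreated.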

Does anyone know a way of either pickling this in some other way or saving the results of the inter.Rbf() call in some other way?

There are some arrays of shape (nd,n) and (n,n) in there (rbfi.A, rbfi.xi, rbfi.di...), which I assume store all the interesting information. I guess I could pickle just those arrays, but then I'm not sure how I could put the object together again...

Edit: Additional constraint: I'm not allowed to install additional libraries on the system. The only way I can include them is if they are pure Python and I can just include them with the script without having to compile anything.


Answer 1:


I'd use dill to serialize the results. Alternatively, if you want a cached function, you could use klepto to cache the function call, which minimizes re-evaluation of the function.

Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy.interpolate as inter
>>> import numpy as np
>>> import dill
>>> import klepto
>>> 
>>> x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
>>> y = np.array([1,2,3,4])
>>> 
>>> # build an on-disk archive for numpy arrays,
>>> # with a dictionary-style interface  
>>> p = klepto.archives.dir_archive(serialized=True, fast=True)
>>> # add a caching algorithm, so when threshold is hit,
>>> # memory is dumped to disk
>>> c = klepto.safe.lru_cache(cache=p)
>>> # decorate the target function with the cache
>>> c(inter.Rbf)
<function Rbf at 0x104248668>
>>> rbf = _
>>> 
>>> # 'rbf' is now cached, so all repeat calls are looked up
>>> # from disk or memory
>>> d = rbf(x[:,0], x[:,1], x[:,2], y)
>>> d
<scipy.interpolate.rbf.Rbf object at 0x1042454d0>
>>> d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>> 

continuing…

>>> # the cache is serializing the result object behind the scenes
>>> # it also works if we directly pickle and unpickle it
>>> _d = dill.loads(dill.dumps(d))
>>> _d
<scipy.interpolate.rbf.Rbf object at 0x104245510>
>>> _d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>>

Get klepto and dill here: https://github.com/uqfoundation
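Where installing klepto is not an option, the caching idea can be approximated with the standard library alone. This is a rough sketch rather than a replacement for klepto: `disk_cache` and `expensive_fit` are hypothetical names, and it assumes that both the arguments and the return value can be pickled, so the cached function should return plain data rather than an object that pickle rejects.

```python
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir):
    """Decorator factory: store each result in cache_dir, keyed by the arguments."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Assumption: pickling the arguments yields a stable key for equal inputs.
            key_bytes = pickle.dumps((args, sorted(kwargs.items())), protocol=2)
            name = hashlib.sha256(key_bytes).hexdigest() + ".pkl"
            path = os.path.join(cache_dir, name)
            if os.path.exists(path):
                # Cache hit: read the stored result instead of recomputing it.
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh, protocol=2)
            return result
        return wrapper
    return decorator


calls = []                       # records how often the real function runs
cache_dir = tempfile.mkdtemp()   # throwaway directory for this demonstration


@disk_cache(cache_dir)
def expensive_fit(points):
    calls.append(points)
    return {"nodes": [2.0 * p for p in points]}


first = expensive_fit((1.0, 2.0, 3.0))
second = expensive_fit((1.0, 2.0, 3.0))   # served from disk, no recomputation
print(len(calls))  # prints 1
```

Unlike klepto's lru_cache, this sketch has no in-memory layer and never evicts entries; it only avoids repeating the expensive call.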




Answer 2:


Alright, Mike's solution seems to be a good one, but I found another in the meantime:

There are only two parts of an Rbf object that can't be pickled directly, and they are easy to recreate from scratch. Therefore my code now saves only the data parts:

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)

# RBF can't be pickled directly, so save everything required for reconstruction
RBFdict = {}
for key in rbfi.__dict__.keys():
    if key not in ('_function', 'norm'):
        RBFdict[key] = getattr(rbfi, key)

RBFpickler.dump(RBFdict)
RBFfile.close()

This gives me a file containing all the information stored in the object. rbfi._function() and rbfi.norm are not saved. Luckily, they can be recreated from scratch by just initializing any (arbitrarily simple) Rbf object:

## load the saved data first: the constructor call below needs RBFdict ##
RBFfile = open('picklefile','rb')
RBFunpickler = cPickle.Unpickler(RBFfile)
RBFdict = RBFunpickler.load()
RBFfile.close()

## create a bare-bones RBF object ##
rbfi = inter.Rbf(np.array([1,2,3]), np.array([10,20,30]),
                 np.array([1,2,3]), function=RBFdict['function'])

This object's data parts are then replaced with the saved data:

## replace rbfi's contents with what was saved ##
for key, value in RBFdict.iteritems():
    setattr(rbfi, key, value)

>>> rbfi(2,3,4)
array(1.4600661386382146)

It's apparently not even necessary to give the new Rbf object the same number of dimensions as the original one, as all of that will be overwritten.
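The save-and-rebuild pattern described above can be shown end to end without SciPy. This is a minimal sketch under stated assumptions: `Interpolant` is a hypothetical stand-in class, not Rbf, and an in-memory buffer replaces the file on disk. Only a plain dict is pickled, so the result does not depend on pickle being able to serialize the class or its callable.

```python
import io
import pickle


class Interpolant(object):
    """Hypothetical stand-in: numeric state plus a callable rebuilt by __init__."""

    def __init__(self, nodes, epsilon):
        self.nodes = list(nodes)
        self.epsilon = epsilon
        # Reads self.epsilon at call time, so it follows later attribute changes.
        self._function = lambda r: r / self.epsilon

    def __call__(self, r):
        return sum(w * self._function(r) for w in self.nodes)


SKIP = ("_function",)  # attributes that __init__ recreates and pickle rejects

original = Interpolant(nodes=[1.0, 2.0, 3.0], epsilon=4.0)

# Save: copy everything except the skipped attributes into a plain dict.
state = {k: v for k, v in vars(original).items() if k not in SKIP}
buffer = io.BytesIO()   # stands in for the file on disk
pickle.dump(state, buffer, protocol=2)

# Load: build a throwaway object, then overwrite its data attributes.
buffer.seek(0)
restored = Interpolant(nodes=[0.0], epsilon=1.0)
for key, value in pickle.load(buffer).items():
    setattr(restored, key, value)

print(original(2.0) == restored(2.0))  # prints True
```

The trick depends on the recreated callable reading its parameters from the object when it is called. A callable that captured values at construction time would keep the throwaway object's values, so the skipped attributes are worth checking for that before relying on this approach.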

That said, Mike's solution is probably the more universally applicable one, while this one is more platform-independent. I've had issues with moving pickled Kriging models between platforms, but this method for RBF models seems to be more robust -- I haven't tested it much yet, though, so no guarantees given.



Source: https://stackoverflow.com/questions/23997431/is-there-a-way-to-pickle-a-scipy-interpolate-rbf-object
