How to pickle a python function with its dependencies?

我寻月下人不归 2020-12-02 20:55

As a follow up to this question: Is there an easy way to pickle a python function (or otherwise serialize its code)?

I would like to see an example of this bullet fr

5 answers
  •  醉梦人生
    2020-12-02 21:57

    Updated Sep 2020: See the comment by @ogrisel below. The developers of PiCloud moved to Dropbox shortly after I wrote the original version of this answer in 2013, though a lot of folks are still using the cloudpickle module seven years later. The module made its way to Apache Spark, where it has continued to be maintained and improved. I'm updating the example and background text below accordingly.

    Cloudpickle

    The cloudpickle package is able to pickle a function, method, class, or even a lambda, as well as any dependencies. To try it out, just pip install cloudpickle and then:

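    import pickle
    
    import cloudpickle
    
    def foo(x):
        return x*3
    
    def bar(z):
        return foo(z)+1
    
    # Serialize bar; when run as a script or in a REPL, foo is bundled too because bar calls it.
    x = cloudpickle.dumps(bar)
    
    # Delete both functions to show the pickle does not rely on the originals.
    del foo
    del bar
    
    # The standard library's pickle module can load the result.
    f = pickle.loads(x)
    print(f(3))  # displays "10"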

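    In other words, just call cloudpickle.dump() or cloudpickle.dumps() the same way you'd use pickle.dump() or pickle.dumps(), then later use the native pickle.load() or pickle.loads() to thaw. Note that cloudpickle still has to be installed in the environment that does the loading, since the pickled data refers to helper functions inside the package.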
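
    As a rough illustration of the file-style cloudpickle.dump() call mentioned above, here is a minimal sketch that writes a lambda with a captured variable to an in-memory buffer and reads it back with the standard pickle.load(). The names make_scaler and triple are made up for this example, and io.BytesIO simply stands in for a real file opened in binary mode.

```python
import io
import pickle

import cloudpickle  # third-party: pip install cloudpickle


def make_scaler(factor):
    # Hypothetical helper: the returned lambda closes over `factor`,
    # which the standard pickle module cannot serialize.
    return lambda value: value * factor


triple = make_scaler(3)

# dump() writes to any binary file-like object, mirroring pickle.dump().
buffer = io.BytesIO()
cloudpickle.dump(triple, buffer)

# Rewind the buffer and read the function back with the standard loader.
buffer.seek(0)
restored = pickle.load(buffer)

print(restored(7))  # 21
```

    Swapping the buffer for open("func.pkl", "wb") and open("func.pkl", "rb") should behave the same way.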

    Background

    PiCloud.com released the cloud python package under the LGPL, and other open-source projects quickly started using it (search for cloudpickle.py to see a few). The folks at PiCloud had an incentive to put the effort into making general-purpose code pickling work -- their whole business was built around it. The idea was that if you had cpu_intensive_function() and wanted to run it on Amazon's EC2 grid, you just replaced:

    cpu_intensive_function(some, args) 
    

    with:

    cloud.call(cpu_intensive_function, some, args)
    

    The latter used cloudpickle to pickle up any dependent code and data, shipped it to EC2, ran it, and returned the results to you when you called cloud.result().
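
    The pickle, ship, run, return cycle described above can be imitated locally. The sketch below is not PiCloud's client and makes no claim about how cloud.call() was implemented: a fresh Python subprocess stands in for the remote EC2 machine, and run_in_worker, make_task and WORKER_SOURCE are hypothetical names invented for this example. It assumes cloudpickle is installed for the interpreter that runs it.

```python
import pickle
import subprocess
import sys

import cloudpickle  # third-party: pip install cloudpickle

# Code run by the stand-in "remote" interpreter: read a pickled
# (function, args) pair from stdin, call it, write the pickled result to stdout.
WORKER_SOURCE = """
import pickle, sys
func, args = pickle.loads(sys.stdin.buffer.read())
sys.stdout.buffer.write(pickle.dumps(func(*args)))
"""


def run_in_worker(func, *args):
    """Hypothetical helper: run func(*args) in a fresh Python process."""
    payload = cloudpickle.dumps((func, args))
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(completed.stdout)


def make_task():
    # Defined locally so cloudpickle serializes the code itself
    # rather than a reference to an importable module.
    def cpu_intensive_function(n):
        return sum(i * i for i in range(n))
    return cpu_intensive_function


print(run_in_worker(make_task(), 10))  # 285
```

    The worker process never sees the source file that defined cpu_intensive_function; everything it needs travels inside the pickled payload, which is the property a remote-execution service would depend on.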

    PiCloud billed in millisecond increments and was cheap as heck. I used it all the time for Monte Carlo simulations and financial time series analysis, when I needed hundreds of CPU cores for just a few seconds each. Years later, I still can't say enough good things about it, and I didn't even work there.
