How to cache in IPython Notebook?

Submitted by 允我心安 on 2019-12-04 16:23:57

Question


Environment:

  • Python 3
  • IPython 3.2

Every time I shut down an IPython notebook and re-open it, I have to re-run all the cells, and some of those cells involve intensive computation.

By contrast, knitr in R saves the results in a cache directory by default, so only new code and new settings trigger computation.

I looked at ipycache, but it seems to cache a single cell rather than the whole notebook. Is there an IPython counterpart to knitr's cache?


Answer 1:


Can you give an example of what you are trying to do? When I run something expensive in an IPython Notebook, I almost always write the result to disk afterward. For example, if my data is a list of JSON objects, I write it to disk as line-separated JSON strings:

import json

# 'w' overwrites the file, so re-running the cell does not append duplicates
with open('path_to_file.json', 'w') as file:
    for item in data:
        line = json.dumps(item)
        file.write(line + '\n')

You can then read the data back in the same way:

data = []
with open('path_to_file.json', 'r') as file:
    for line in file: 
        data_item = json.loads(line)
        data.append(data_item)

I think this is good practice in general because it gives you a backup. You can also use pickle for the same thing. If your data is really big, you can use gzip.open to write directly to a gzip-compressed file.
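
The gzip variant mentioned above could look roughly like this. It is a minimal sketch and not part of the original answer: the helper names save_jsonl_gz and load_jsonl_gz are made up for illustration, and it assumes every item is JSON-serializable.

```python
import gzip
import json


def save_jsonl_gz(items, path):
    """Write each item as one JSON line into a gzip-compressed text file."""
    # "wt" opens the gzip stream in text mode and overwrites any existing file
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def load_jsonl_gz(path):
    """Read the items back, one JSON document per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling save_jsonl_gz(data, 'path_to_file.json.gz') after the expensive cell and load_jsonl_gz('path_to_file.json.gz') in a fresh session would then replace the recomputation.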

EDIT

To save a scikit-learn model to disk, use joblib.dump.

from sklearn.cluster import KMeans

km = KMeans(n_clusters=num_clusters)
km.fit(some_data)


import joblib  # sklearn.externals.joblib is gone from newer scikit-learn releases
# dump to pickle
joblib.dump(km, 'model.pkl')

# and reload from pickle
km = joblib.load('model.pkl')



Answer 2:


Unfortunately, there doesn't seem to be anything as convenient as an automatic cache. The %store magic comes close, but you have to do the caching and reloading manually and explicitly.

In your Jupyter notebook:

a = 1
%store a

Now, let's say you close the notebook and the kernel gets restarted. You no longer have access to the local variables. However, you can reload the variables you've stored using the -r option.

%store -r a
print(a)  # Should print 1



Answer 3:


In fact, the functionality you are asking for already exists, so there is no need to re-implement it manually with your own dumps.

You can use the %store magic or, perhaps better, the %%cache magic (an extension) to store the results of expensive cells so they don't have to be recomputed (see https://github.com/rossant/ipycache).

It is as simple as:

%load_ext ipycache

Then, in a cell e.g.:

%%cache mycache.pkl var1 var2
var1 = 1
var2 = 2

When you execute this cell the first time, the code is executed, and the variables var1 and var2 are saved in mycache.pkl in the current directory along with the outputs. Rich display outputs are only saved if you use the development version of IPython. When you execute this cell again, the code is skipped, the variables are loaded from the file and injected into the namespace, and the outputs are restored in the notebook.

It saves all graphics, output produced, and all the variables specified automatically for you :)
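
For intuition, the run-once-then-load behaviour described above can be imitated in plain Python with pickle. This is only a sketch of the pattern, not ipycache's actual implementation: run_cached is a hypothetical helper, it stores variables only (no notebook outputs), and it assumes the computed values are picklable.

```python
import os
import pickle


def run_cached(path, compute):
    """Return compute()'s result, reusing the pickle at `path` if it exists."""
    if os.path.exists(path):
        # Cache hit: skip the computation and load the saved result
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: compute once, then save the result for later sessions
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

For example, values = run_cached('mycache.pkl', lambda: {'var1': 1, 'var2': 2}) computes the dictionary on the first run and loads it from disk afterward. Deleting the file forces a recomputation.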




Answer 4:


Use the cache magic.

%cache myVar = someSlowCalculation(some, "parameters")

This calculates someSlowCalculation(some, "parameters") once. On subsequent calls it restores myVar from storage.

https://pypi.org/project/ipython-cache/

Under the hood it does pretty much the same as the accepted answer.
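
A rough plain-Python analogue of this compute-once behaviour is a decorator that pickles each result to disk, keyed by the function name and its arguments. This is an illustrative sketch and not how ipython-cache is implemented: disk_cache is a hypothetical name, and it assumes the arguments and the result are picklable and that equal arguments pickle to the same bytes.

```python
import functools
import hashlib
import os
import pickle


def disk_cache(cache_dir):
    """Decorator factory: cache a function's results as pickles in cache_dir."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cache_dir, exist_ok=True)
            # Build a file name from the function name and the call arguments
            raw_key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(raw_key).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Decorating someSlowCalculation with @disk_cache('cache') would then make repeated calls with the same arguments load from disk, even after a kernel restart. The sketch does not notice changes to the function body, so the cache directory has to be cleared by hand when the code changes.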



Source: https://stackoverflow.com/questions/31255894/how-to-cache-in-ipython-notebook
