No module named numpy_pickle when executing script under a different user

Question

I have a Python script that uses sklearn's joblib to load a persisted model and make predictions. The script runs fine under my username, but when another user runs the same script they get the error "ImportError: No module named numpy_pickle".

I also copied the script to the other user's home directory and ran it from there, with the same error. Running it from the Python shell changed nothing either. Here is what I run in the Python shell:

from sklearn.externals import joblib
joblib.load("model_filename.pkl")

The second line above works under my username and gives the following error under all other users:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/joblib/numpy_pickle.py", line 424, in load
    obj = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named numpy_pickle

This is all running on a server with Ubuntu 14.04.1 LTS.

Any ideas why this is happening?

Thank you

Answer 1:

As suggested by Croad Langshan, make sure you don't have a joblib version conflict/mismatch; I had exactly the same problem. The binary file was created with sklearn.externals.joblib, but I was using a stand-alone joblib installed from the official Debian repository, and that, combined with the stock Debian sklearn, resulted in an unpicklable binary store.

So check whether python-joblib is installed as a stand-alone package. If it is, remove it along with sklearn:

$ sudo apt-get remove python-joblib
$ sudo apt-get remove python-sklearn

Then reinstall sklearn from source:

$ git clone https://github.com/scikit-learn/scikit-learn.git
$ cd scikit-learn
$ sudo python setup.py install

*Note: the reverse conflict is also possible (the original binary created with the stand-alone joblib).

*A more granular way to resolve module version conflicts/mismatches is to use virtualenv, but in my case I had no reason to keep the stand-alone joblib.
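Before removing packages, one hedged way to confirm a conflict like this is to run a small script once as each user and compare what it prints. The sketch below is written for Python 3 (the question's server runs Python 2.7); `locate` is a hypothetical helper name. Note that `importlib.util.find_spec` imports the parent packages of a dotted name, and recent scikit-learn releases no longer ship `sklearn.externals.joblib`, so `None` is expected there on a modern install.

```python
import importlib.util
import site
import sys


def locate(module_name):
    """Return the file a module would be loaded from, or None if it cannot be found.

    For a dotted name, find_spec imports the parent packages first.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:  # a parent package is missing (ModuleNotFoundError subclasses ImportError)
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Run once per user and diff the output: a different interpreter,
    # user site-packages directory, or joblib location points to the conflict.
    print("interpreter:", sys.executable)
    print("user site-packages:", site.getusersitepackages())
    for name in ("joblib", "sklearn.externals.joblib"):
        print(name, "->", locate(name))
```

The user site-packages directory lives under each user's home, so packages installed there with `pip install --user` are visible to one account and not another.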

Answer 2:

The load function uses the Python standard library module pickle "under the hood". That module provides a way to dump arbitrary Python objects to a file. "Unpickling" that file to load the objects back into memory requires that the modules in which the objects' classes (and any functions) are defined be importable, because the file stores only references to them by module and name, not their code. The directories containing those modules need to be on sys.path (for example, by being listed in the PYTHONPATH environment variable).
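The mechanism can be reproduced without scikit-learn. This Python 3 sketch registers a throwaway in-memory module (the name `throwaway_models` is made up for the demo), pickles an instance of a class defined in it, then removes the module from `sys.modules` to mimic a user for whom that module is not importable. Unpickling then fails with an ImportError; once the module is importable again, the same bytes load fine.

```python
import pickle
import sys
import types

MODULE_NAME = "throwaway_models"  # hypothetical module name, only for this demo


def make_module():
    """Create and register an in-memory module that defines a small class."""
    mod = types.ModuleType(MODULE_NAME)
    exec(
        "class Model:\n"
        "    def __init__(self, weight):\n"
        "        self.weight = weight\n",
        mod.__dict__,
    )
    sys.modules[MODULE_NAME] = mod
    return mod


def demo():
    mod = make_module()
    # The pickle records "throwaway_models" and "Model", not the class's code.
    payload = pickle.dumps(mod.Model(0.5))
    del sys.modules[MODULE_NAME]  # simulate a user who cannot import the module
    try:
        pickle.loads(payload)
        missing = None
    except ImportError as exc:  # ModuleNotFoundError on Python 3.6+
        missing = str(exc)
    sys.modules[MODULE_NAME] = mod  # make the module importable again
    restored = pickle.loads(payload)
    return missing, restored.weight


if __name__ == "__main__":
    print(demo())
```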

Perhaps the pickle in question references code in a module named numpy_pickle (as opposed to joblib.numpy_pickle), and perhaps that module is not on sys.path (even if joblib itself is). Try running import cgitb; cgitb.enable() before the import to see the value of module in the last stack frame.
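To list every module a pickle file refers to without importing any of them, one option is to walk its opcodes with `pickletools.genops`. A minimal Python 3 sketch: `referenced_globals` is a hypothetical helper, and it only recognises the GLOBAL and STACK_GLOBAL patterns the standard pickler writes. It assumes an uncompressed plain pickle; files from newer joblib versions can embed raw array bytes in the stream, which this sketch does not handle.

```python
import json
import pickle
import pickletools

_STRING_OPS = {"STRING", "BINSTRING", "SHORT_BINSTRING",
               "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def referenced_globals(data):
    """Return the (module, name) pairs a pickle would import, without importing them."""
    found = []
    memo = {}     # memo slot -> string stored there (None for anything else)
    recent = []   # strings pushed since the last unrelated opcode
    last = None   # value of the most recent push, recorded by MEMOIZE/PUT
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op == "GLOBAL":            # protocols 0-3: arg is "module name"
            module, name = arg.split(" ", 1)
            found.append((module, name))
            recent, last = [], None
        elif op == "STACK_GLOBAL":    # protocol 4+: module and name were pushed as strings
            if len(recent) >= 2:
                found.append((recent[-2], recent[-1]))
            recent, last = [], None
        elif op in _STRING_OPS or op in _GET_OPS:
            last = arg if op in _STRING_OPS else memo.get(arg)
            recent.append(last)
        elif op == "MEMOIZE":
            memo[len(memo)] = last
        elif op in _PUT_OPS:
            memo[arg] = last
        elif op != "FRAME":           # any other opcode breaks the string sequence
            recent, last = [], None
    return found


if __name__ == "__main__":
    print(referenced_globals(pickle.dumps([json.dumps, json.loads])))
```

Run over `open("model_filename.pkl", "rb").read()`, it should show whether the file refers to `numpy_pickle`, `joblib.numpy_pickle` or `sklearn.externals.joblib.numpy_pickle`.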

Answer 3:

I had this same problem: I pickled a model as one user and couldn't unpickle it as a second user. The answers above didn't really help me. I believe it has something to do with the module paths saved in the pickled file and the second user's path.

The pickle was trying to load the module as:

__import__('joblib.numpy_pickle')

which results in

ImportError: No module named joblib.numpy_pickle

but if you run

__import__('sklearn.externals.joblib.numpy_pickle')

it can find it and returns

<module 'sklearn' from '/python2.6/site-packages/sklearn/__init__.pyc'>
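The `<module 'sklearn' ...>` result does not mean only the top-level package was imported: `__import__` with a dotted name imports every package along the path but returns the top-level one, while `importlib.import_module` returns the module named by the full path. In this stdlib-only illustration, `xml.etree.ElementTree` stands in for `sklearn.externals.joblib.numpy_pickle` so it runs without scikit-learn.

```python
import importlib

# __import__ imports xml, xml.etree and xml.etree.ElementTree,
# but hands back the top-level package.
top = __import__("xml.etree.ElementTree")
# import_module returns the module at the end of the dotted path.
leaf = importlib.import_module("xml.etree.ElementTree")

print(top.__name__)                    # xml
print(leaf.__name__)                   # xml.etree.ElementTree
print(top.etree.ElementTree is leaf)   # True
```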

So I'm assuming that when the second user loads the file, something recorded in it tells pickle to look in joblib.numpy_pickle, ignoring the previously imported sklearn.externals. I didn't know how to fix this, so instead I just retrained the model as the second user and saved it. Now the second user can read the file it created.
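For a plain pickle file, one hedged alternative to retraining is to redirect the recorded module path at load time by overriding `pickle.Unpickler.find_class`, the hook the pickle documentation describes for controlling how globals are resolved. `RenamingUnpickler` and `loads_renamed` are hypothetical names, and the demo uses a hand-written protocol-0 pickle that refers to `json.dumps` under the made-up module name `json_old`.

```python
import io
import json
import pickle


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that swaps recorded module paths before importing them."""

    def __init__(self, file, module_map):
        super().__init__(file)
        # Maps the module name stored in the pickle to the one to import instead.
        self.module_map = module_map

    def find_class(self, module, name):
        module = self.module_map.get(module, module)
        return super().find_class(module, name)


def loads_renamed(data, module_map):
    """Unpickle bytes, redirecting any module listed in module_map."""
    return RenamingUnpickler(io.BytesIO(data), module_map).load()


if __name__ == "__main__":
    # GLOBAL opcode "c" naming json_old.dumps, then STOP ".".
    legacy = b"cjson_old\ndumps\n."
    print(loads_renamed(legacy, {"json_old": "json"}) is json.dumps)  # True
```

For the case in this answer, the map would be something like `{"joblib.numpy_pickle": "sklearn.externals.joblib.numpy_pickle"}` (hypothetical). Files written by `joblib.dump` are read through joblib's own unpickler (the question's traceback shows `unpickler.load()` inside `joblib/numpy_pickle.py`), so this sketch does not drop into `joblib.load` directly.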

Source: https://stackoverflow.com/questions/28797769/no-module-named-numpy-pickle-when-executing-script-under-a-different-user

Tags

python

scikit-learn

joblib