ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

时光毁灭记忆、已成空白, submitted on 2020-06-22 22:54:09

Question


I installed TensorFlow 1.6.0 (GPU version) with Anaconda in a Python 3.6.4 environment.

When I do import tensorflow as tf, I get the following error:

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

The different versions:

  • cudnn : 7.1.1
  • cuda : 9.0.176
  • tensorflow : 1.6.0
  • Ubuntu : 16.04

I am aware of this but it did not solve my problem.


Answer 1:


I installed the nvidia-cuda-toolkit package:

$ sudo apt install nvidia-cuda-toolkit

and it worked.

I found the solution neither on the TensorFlow website nor on the NVIDIA installation page. I found it by luck while looking for a way to get the CUDA version from the command line: How to get the cuda version?




Answer 2:


The accepted answer (installing nvidia-cuda-toolkit) is wrong. By installing the toolkit you are basically installing a second CUDA on top of the CUDA you already installed by following the NVIDIA guide.

The problem turned out to be an issue with symbolic links. The inspiration came from this post, http://queirozf.com/entries/installing-cuda-tk-and-tensorflow-on-a-clean-ubuntu-16-04-install, but the actual resolution is different.

At one point during the cuDNN installation, the NVIDIA tutorial asks you to do this:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

The problem with this approach is that copying the files with the libcudnn* glob breaks the symbolic links: every name ends up as a full copy of the library instead of a link. I tried the following command instead, but it still breaks the links:

sudo cp --preserve=links cuda/lib64/libcudnn* /usr/local/cuda/lib64
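As far as GNU cp is concerned, --preserve=links only preserves hard links between the copied files; symlinks named on the command line are still followed unless -P (--no-dereference) or -a is given. The sketch below reproduces that behaviour in a scratch directory, so it does not touch /usr/local/cuda. The file names are stand-ins, not real cuDNN libraries.

```shell
# Demo in a scratch directory: stand-in files, not real cuDNN libraries.
set -eu
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/plain" "$demo/nodereference"

# Recreate the layout from the cuDNN archive: one real file, two links.
echo "fake library" > "$demo/src/libcudnn.so.7.6.5"
ln -s libcudnn.so.7.6.5 "$demo/src/libcudnn.so.7"
ln -s libcudnn.so.7 "$demo/src/libcudnn.so"

# Plain cp follows the links and writes three regular files.
cp "$demo"/src/libcudnn* "$demo/plain/"

# cp -P (--no-dereference) copies the links as links.
cp -P "$demo"/src/libcudnn* "$demo/nodereference/"

ls -l "$demo/plain" "$demo/nodereference"
```

If this matches what happens on your machine, copying the archive with sudo cp -P in the first place should avoid the repair step entirely. Remove the scratch directory stored in $demo when you are done.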

You can verify the links by running ls -lha libcudnn* in the /usr/local/cuda/lib64 folder. If you do not see output like this:

lrwxrwxrwx 1 root root 13 May 2 20:02 libcudnn.so -> libcudnn.so.7
lrwxrwxrwx 1 root root 17 May 2 20:02 libcudnn.so.7 -> libcudnn.so.7.6.5
-rwxr-xr-x 1 root root 409M May 2 20:02 libcudnn.so.7.6.5
-rw-r--r-- 1 root root 386M May 2 20:02 libcudnn_static.a

then you have found the problem. The actual fix is the following:

sudo rm /usr/local/cuda/lib64/libcudnn.so
sudo rm /usr/local/cuda/lib64/libcudnn.so.7
cd /usr/local/cuda/lib64/
sudo ln -s libcudnn.so.7.6.5 libcudnn.so.7
sudo ln -s libcudnn.so.7 libcudnn.so
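To check a directory without reading the ls output by eye, something like the following could be used. It is only a sketch: check_cudnn_links is a made-up helper name, and the path in the usage comment assumes the default /usr/local/cuda layout from this answer.

```shell
# Report whether each libcudnn.so* entry in a directory is a symlink or a regular file.
# check_cudnn_links is a hypothetical helper name, not part of CUDA or cuDNN.
check_cudnn_links() {
    dir=$1
    for f in "$dir"/libcudnn.so*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$f" ] || [ -L "$f" ] || continue
        if [ -L "$f" ]; then
            echo "link $(basename "$f") -> $(readlink "$f")"
        else
            echo "file $(basename "$f")"
        fi
    done
}

# Typical use (the path is an assumption about a default install):
# check_cudnn_links /usr/local/cuda/lib64
```

In a healthy install only the fully versioned name (libcudnn.so.7.6.5 in this answer) should be reported as a file; libcudnn.so and libcudnn.so.7 should both be reported as links.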

The rm and ln commands remove the old "links" (which were really full copies) and create proper ones. Verify the links again with ls -lha libcudnn*. After that, run the following command in verbose mode:

sudo ldconfig -v

Check the log. ldconfig rebuilds the dynamic linker's cache of shared libraries, so this step turned out to be very important. If the log says that a symbolic link is broken, or something along those lines, TensorFlow will keep showing the error from the title.

Bonus: make sure the following lines are appended at the end of ~/.bashrc (edit it with nano ~/.bashrc):

export PATH=/usr/local/cuda/bin:/opt/nvidia/nsight-compute/2019.4.0${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDADIR=/usr/local/cuda${CUDADIR:+:${CUDADIR}}
export CUDA_HOME=/usr/local/cuda

and then run the command source ~/.bashrc
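The ${VAR:+:${VAR}} suffix used in those exports may look odd. It expands to a colon plus the old value only when the variable is already set and non-empty, so no stray colon is added. A small illustration with a throwaway variable (DEMO_PATH is a made-up name, used so the real LD_LIBRARY_PATH is left alone):

```shell
# Start from an unset variable: the suffix expands to nothing.
unset DEMO_PATH
DEMO_PATH=/usr/local/cuda/lib64${DEMO_PATH:+:${DEMO_PATH}}
echo "$DEMO_PATH"    # /usr/local/cuda/lib64

# Now the variable is set: the old value is appended after a colon.
DEMO_PATH=/usr/local/cuda/extras/CUPTI/lib64${DEMO_PATH:+:${DEMO_PATH}}
echo "$DEMO_PATH"    # /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64
```

Because each export prepends to the existing value, sourcing ~/.bashrc repeatedly in the same shell will add the same directories again. That is harmless but makes the variables longer.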

All of the above steps assume that you did NOT install nvidia-cuda-toolkit, but used the NVIDIA CUDA repository instead.

Also, when installing CUDA, make sure you are not targeting 10.2. At the time of writing, TF supports CUDA versions up to 10.1, so the following is the right way to install the required version:

sudo apt-cache policy cuda
sudo apt-get install cuda=10.1.243-1

Verify with:

nvcc --version
nvidia-smi

EDIT: This is the error you should NOT see after running the ldconfig command (note the "is not a symbolic link" line):

/usr/local/cuda-10.1/targets/x86_64-linux/lib:
...
libnppist.so.10 -> libnppist.so.10.2.0.243
libcuinj64.so.10.1 -> libcuinj64.so.10.1.243
/sbin/ldconfig.real: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
libcudnn.so.7 -> libcudnn.so.7.6.5
libnppc.so.10 -> libnppc.so.10.2.0.243
libnppicom.so.10 -> libnppicom.so.10.2.0.243
libnvgraph.so.10 -> libnvgraph.so.10.1.243
/usr/lib/x86_64-linux-gnu/libfakeroot:
...

If you see it, then something is still misconfigured.
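The verbose ldconfig output is long, so it can help to filter it down to just these warnings. A minimal sketch, assuming the warning text matches the line shown in this answer; find_broken_sonames is a made-up helper name that reads ldconfig output on standard input:

```shell
# Print only the ldconfig warnings about sonames that are regular files.
# find_broken_sonames is a hypothetical helper name; it filters stdin.
find_broken_sonames() {
    # "|| true" keeps the exit status at 0 when there is nothing to report.
    grep 'is not a symbolic link' || true
}

# Typical use (stderr is merged in case the warnings are written there):
# sudo ldconfig -v 2>&1 | find_broken_sonames
```

No output means ldconfig did not complain about any library, which is the state you want before retrying import tensorflow.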




Answer 3:


This didn't work for me. In my case the cause was that I had multiple versions of CUDA installed, and the cuDNN version I had was built for an older CUDA version than the one I was trying to use. I installed the cuDNN build for the new version following NVIDIA's instructions, and that fixed it.
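One way to spot this kind of mismatch is to read the version macros from the cuDNN header that sits next to each CUDA install. The helper below is a sketch, not an official tool: cudnn_header_version is a made-up name, and the paths in the usage comments assume installs under /usr/local.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from a cuDNN header file.
# cudnn_header_version is a hypothetical helper name. cuDNN 7 keeps these
# macros in cudnn.h; cuDNN 8 and later moved them to cudnn_version.h.
cudnn_header_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Typical use (paths are assumptions about a default install):
# ls -d /usr/local/cuda*      # every toolkit directory, plus the "cuda" symlink
# cudnn_header_version /usr/local/cuda/include/cudnn.h
```

If several /usr/local/cuda-X.Y directories exist, checking the header in each one shows which toolkit actually has a cuDNN copy and which version it is.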



Source: https://stackoverflow.com/questions/49656725/importerror-libcudnn-so-7-cannot-open-shared-object-file-no-such-file-or-dire
