ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

灰色年华 2020-12-08 10:27

I have installed CUDA 10.1 and cuDNN on Ubuntu 18.04, and they seem to be installed properly: when I type nvcc and nvidia-smi, I get a proper response:

user:~$ nvcc -V         


        
12 answers
  • 2020-12-08 10:35

    This error occurs when the installed versions of CUDA and TensorFlow are not compatible. I encountered a similar ImportError while running TensorFlow 1.13.0 with CUDA 9. Since I had installed TensorFlow in a virtual environment with pip, I simply uninstalled TensorFlow 1.13.0 and installed TensorFlow 1.12.0 as follows:

        pip uninstall tensorflow-gpu tensorflow-estimator tensorboard
        pip install tensorflow-gpu==1.12.0
    

    Everything now works.
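
    Before swapping versions, it can help to confirm which TensorFlow-related distributions pip actually installed in the active environment. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper names `installed_version` and `tensorflow_dists` are hypothetical and not part of TensorFlow or pip.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def tensorflow_dists():
    """Report the versions of the packages named in the pip commands above."""
    names = ("tensorflow", "tensorflow-gpu", "tensorflow-estimator", "tensorboard")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in tensorflow_dists().items():
        print(f"{name}: {version or 'not installed'}")
```

    Run it inside the same virtual environment that raises the ImportError, since each environment has its own set of installed packages.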

  • 2020-12-08 10:36

    I had the correct version of CUDA and tensorflow-gpu==1.14.0 installed in my conda environment, but I was still getting this error message. This post helped me finally solve it.

    I had previously installed tensorflow-gpu via pip. Creating a new environment and installing tensorflow-gpu via conda solved my problem.

    conda install -c anaconda tensorflow-gpu=1.14.0
    
  • 2020-12-08 10:37

    I downloaded CUDA 10.0 from the following link: CUDA 10.0

    Then I installed it using the following commands:

    sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda-10-0
    

    I then installed cuDNN v7.5.0 for CUDA 10.0 from the cuDNN download link (CUDNN download); you need to log in with an account.

    After choosing the correct version, I downloaded it via the link CUDNN power link. I then added the cuDNN include and lib files as follows:

    sudo cp -P cuda/targets/ppc64le-linux/include/cudnn.h /usr/local/cuda-10.0/include/
    sudo cp -P cuda/targets/ppc64le-linux/lib/libcudnn* /usr/local/cuda-10.0/lib64/
    sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
    

    After that, I modified .bashrc to set the binary and library paths for CUDA 10.0. If these lines are not already in your .bashrc, add them:

    export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    

    And after all these steps, I managed to import tensorflow in python3 successfully.

  • 2020-12-08 10:37

    CUDA 10.1 (installed as per the TensorFlow docs) fails with "can't find libcublas.so.10.0" errors. The libraries exist in /usr/local/cuda-10.1/targets/x86_64-linux/lib/ but under different names than the ones TensorFlow looks for.

    There was another (now lost) Stack Overflow post saying this was a pinned-dependency issue with the package that could be fixed with an extra CLI flag to apt. That did not fix the issue for me.

    A tested workaround is to modify the installation instructions to downgrade to CUDA 10.0:

    # Uninstall packages from tensorflow installation instructions 
    sudo apt-get remove cuda-10-1 \
        libcudnn7 \
        libcudnn7-dev \
        libnvinfer6 \
        libnvinfer-dev \
        libnvinfer-plugin6
    
    # WORKS: Downgrade to CUDA-10.0
    sudo apt-get install -y --no-install-recommends \
        cuda-10-0 \
        libcudnn7=7.6.4.38-1+cuda10.0  \
        libcudnn7-dev=7.6.4.38-1+cuda10.0;
    sudo apt-get install -y --no-install-recommends \
        libnvinfer6=6.0.1-1+cuda10.0 \
        libnvinfer-dev=6.0.1-1+cuda10.0 \
        libnvinfer-plugin6=6.0.1-1+cuda10.0;
    

    Upgrading to CUDA 10.2 also seems to suffer from the same problem:

    # BROKEN: Upgrade to CUDA-10.2 
    # use `apt show -a libcudnn7 libnvinfer7` to find 10.2 compatible version numbers
    sudo apt-get install -y --no-install-recommends \
        cuda-10-2 \
        libcudnn7=7.6.5.32-1+cuda10.2  \
        libcudnn7-dev=7.6.5.32-1+cuda10.2;
    sudo apt-get install -y --no-install-recommends \
        libnvinfer7=7.0.0-1+cuda10.2 \
        libnvinfer-dev=7.0.0-1+cuda10.2 \
        libnvinfer-plugin7=7.0.0-1+cuda10.2;
    

    Test GPU Visibility in Python

    python3
    >>> import tensorflow as tf
    >>> tf.test.is_gpu_available()
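
    If the import itself fails, one way to narrow things down is to ask the dynamic loader directly whether it can open each library named in the error. This is a rough sketch using only the standard `ctypes` module; the library names are the ones quoted in the dlopen errors in this thread, and the helper names `can_load` and `report` are hypothetical.

```python
import ctypes

# Library names taken from the dlopen errors quoted in this thread.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def report(names=CUDA_10_0_LIBS):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

    The loader search honours LD_LIBRARY_PATH as set when the Python process started, so run this from the same shell you use to start TensorFlow.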
    

    FutureWarnings on tensorflow import

    https://github.com/tensorflow/tensorflow/issues/30427

    Two possible solutions:

    • pip3 install tf-nightly-gpu
    • pip3 install "numpy<1.17"

    Update:

    You also need a TensorFlow version that matches your CUDA version.

    Tensorflow / CUDA version combinations:

    • Tensorflow v2.x does not support CUDA 9 (Ubuntu 18.04 default)
    • Tensorflow v2.1.0 works with CUDA 10.1
    • Tensorflow v2.0.0 works with CUDA 10.0

    For the full list, see: https://www.tensorflow.org/install/source#tested_build_configurations
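
    The pairing rule can be sketched as a small lookup. The table below contains only the two combinations stated in this answer and is deliberately incomplete; the names `TESTED_CUDA`, `expected_cuda`, and `is_match` are hypothetical, and anything else should be checked against the tested build configurations page.

```python
# TensorFlow version -> CUDA version, limited to the pairs listed in this answer.
TESTED_CUDA = {
    "2.1.0": "10.1",
    "2.0.0": "10.0",
}


def expected_cuda(tf_version):
    """Return the CUDA version listed for `tf_version`, or None if unknown here."""
    return TESTED_CUDA.get(tf_version)


def is_match(tf_version, cuda_version):
    """True/False if the pair is known to match or not; None if not in the table."""
    expected = expected_cuda(tf_version)
    if expected is None:
        return None
    return expected == cuda_version


if __name__ == "__main__":
    # Example: TensorFlow 2.0.0 on a CUDA 10.1 install is a mismatch.
    print(is_match("2.0.0", "10.1"))
```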

    You may need to reinstall TensorFlow, pinning a version that matches your CUDA:

    pip uninstall tensorflow tensorflow-gpu
    pip install tensorflow==2.1.0 tensorflow-gpu==2.1.0
    

    Then add CUDA to $PATH and $LD_LIBRARY_PATH in ~/.bashrc

    ~/.bashrc

    # CUDA Environment Setup: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
    for CUDA_BIN_DIR in `find /usr/local/cuda-*/bin   -maxdepth 0`; do export PATH="$PATH:$CUDA_BIN_DIR"; done;
    for CUDA_LIB_DIR in `find /usr/local/cuda-*/lib64 -maxdepth 0`; do export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$CUDA_LIB_DIR"; done;
    
    export            PATH=`echo $PATH            | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $PATH
    export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $LD_LIBRARY_PATH
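
    For readers who find the tr/awk/sed pipeline hard to follow, the deduplication step can be sketched in Python. This is an illustrative equivalent, not a drop-in replacement for the shell lines; the function name `dedupe_path` is hypothetical. One deliberate difference: it also drops empty entries, which the shell pipeline keeps.

```python
import os


def dedupe_path(value, sep=":"):
    """Remove duplicate and empty entries from a PATH-style string, keeping order."""
    seen = set()
    kept = []
    for entry in value.split(sep):
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    # Show what the current variables would look like after deduplication.
    for var in ("PATH", "LD_LIBRARY_PATH"):
        print(f"{var}={dedupe_path(os.environ.get(var, ''))}")
```

    Dropping empty entries is a design choice: in LD_LIBRARY_PATH an empty entry is treated as the current directory, which is rarely intended.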
    
  • 2020-12-08 10:38

    As CalderBot mentioned, you can also do this:

    sudo cp -r /usr/local/cuda-10.2/lib64/libcu* /usr/local/cuda-10.1/lib64/

  • 2020-12-08 10:44

    Amin,

    I'm getting the same error when I try to run the ImageNet tutorial from the TensorFlow models package -- https://github.com/tensorflow/models/tree/master/tutorials/image/imagenet

     python3 classify_image.py
     ...
     2019-07-21 22:29:58.367858: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
     2019-07-21 22:29:58.367982: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
     2019-07-21 22:29:58.368112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
     2019-07-21 22:29:58.368234: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
     2019-07-21 22:29:58.368369: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
     2019-07-21 22:29:58.368498: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
     2019-07-21 22:29:58.374333: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    

    I think there's a version incompatibility somewhere, and TensorFlow likely still relies on the old versions of the binaries provided by the CUDA libraries. Going to the directory where the binaries are stored and creating links that are named 10.0 but target either 10.1 or the default version of each library seems to solve the problem for me.

     # cd /usr/lib/x86_64-linux-gnu
     # ln -s libcudart.so.10.1 libcudart.so.10.0
     # ln -s libcublas.so libcublas.so.10.0
     # ln -s libcufft.so libcufft.so.10.0
     # ln -s libcurand.so libcurand.so.10.0
     # ln -s libcusolver.so libcusolver.so.10.0
     # ln -s libcusparse.so libcusparse.so.10.0
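
    The same links can be planned and created from Python, which makes it easy to preview what would change before touching a system directory. This is a sketch under the assumption that the link names and targets are exactly those in the commands above; `plan_links` and `create_links` are hypothetical helpers, and creating links under /usr/lib requires root.

```python
import os

# link name -> target, mirroring the ln -s commands above.
LINKS = {
    "libcudart.so.10.0": "libcudart.so.10.1",
    "libcublas.so.10.0": "libcublas.so",
    "libcufft.so.10.0": "libcufft.so",
    "libcurand.so.10.0": "libcurand.so",
    "libcusolver.so.10.0": "libcusolver.so",
    "libcusparse.so.10.0": "libcusparse.so",
}


def plan_links(lib_dir, links=LINKS):
    """List (link_name, target) pairs whose target exists and whose name is free."""
    plan = []
    for link_name, target in links.items():
        link_path = os.path.join(lib_dir, link_name)
        target_path = os.path.join(lib_dir, target)
        if os.path.exists(target_path) and not os.path.lexists(link_path):
            plan.append((link_name, target))
    return plan


def create_links(lib_dir, links=LINKS):
    """Create the planned relative symlinks in lib_dir and return what was created."""
    created = plan_links(lib_dir, links)
    for link_name, target in created:
        os.symlink(target, os.path.join(lib_dir, link_name))
    return created


if __name__ == "__main__":
    # Dry run only: print the plan without creating anything.
    for link_name, target in plan_links("/usr/lib/x86_64-linux-gnu"):
        print(f"{link_name} -> {target}")
```

    Skipping names that already exist means a real CUDA 10.0 library is never overwritten, and calling `create_links` twice does nothing the second time.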
    

    Now I'm able to run the tutorial successfully:

     2019-07-24 21:43:21.172908: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
     2019-07-24 21:43:21.174653: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
     2019-07-24 21:43:21.175826: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
     2019-07-24 21:43:21.182305: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
     2019-07-24 21:43:21.183970: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
     2019-07-24 21:43:21.206796: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
     2019-07-24 21:43:21.210685: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
     2019-07-24 21:43:21.212694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
     2019-07-24 21:43:21.213060: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
     2019-07-24 21:43:21.238541: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3214745000 Hz
     2019-07-24 21:43:21.240096: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557e2b682ce0 executing computations on platform Host. Devices:
     2019-07-24 21:43:21.240162: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
     2019-07-24 21:43:21.355158: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557e2b652000 executing computations on platform CUDA. Devices:
     2019-07-24 21:43:21.355234: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
     2019-07-24 21:43:21.357074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
     name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7715
     pciBusID: 0000:01:00.0
     2019-07-24 21:43:21.357151: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
     2019-07-24 21:43:21.357207: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
     2019-07-24 21:43:21.357245: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
     2019-07-24 21:43:21.357283: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
     2019-07-24 21:43:21.357321: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
     2019-07-24 21:43:21.357358: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
     2019-07-24 21:43:21.357395: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
     2019-07-24 21:43:21.360449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
     2019-07-24 21:43:21.380616: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
     2019-07-24 21:43:21.385223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
     2019-07-24 21:43:21.385272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
     2019-07-24 21:43:21.385299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
     2019-07-24 21:43:21.388647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5250 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
     2019-07-24 21:43:32.001598: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
     2019-07-24 21:43:32.532105: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
     W0724 21:43:34.981204 140284114071872 deprecation_wrapper.py:119] From classify_image.py:85: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
    