After the server was rebooted, running nvidia-smi failed with the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Running nvcc -V prints the following:
k8s@master:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
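Note that nvcc -V only reports the installed CUDA toolkit and does not talk to the kernel driver, so it can succeed while nvidia-smi fails. As a rough sketch of how one might confirm that the nvidia kernel module is simply not loaded for the running kernel, the helpers below wrap uname and lsmod. The function names module_listed and driver_report are hypothetical and not part of the original post.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Succeeds if the given module name appears in the first column of
# lsmod-style text read from stdin.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Prints the running kernel release and whether the nvidia module is loaded.
driver_report() {
    echo "running kernel: $(uname -r)"
    if lsmod | module_listed nvidia; then
        echo "nvidia module: loaded"
    else
        echo "nvidia module: not loaded"
    fi
}
```

Calling driver_report after the failed nvidia-smi should show "not loaded" if the situation matches the one described here. Once dkms is installed, running dkms status additionally lists which module versions are built for which kernels.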
Solution:
- Install dkms: sudo apt-get install dkms
- Run ll /usr/src/ to find the nvidia driver version (the last line, nvidia-410.48):
k8s@master:~$ ll /usr/src/
total 36
drwxr-xr-x  9 root root 4096 Dec 14 06:40 ./
drwxr-xr-x 12 root root 4096 Dec 27 15:46 ../
drwxr-xr-x 27 root root 4096 Feb 26 2019 linux-headers-4.15.0-45/
drwxr-xr-x  8 root root 4096 Feb 26 2019 linux-headers-4.15.0-45-generic/
drwxr-xr-x 27 root root 4096 Apr 3 2019 linux-headers-4.15.0-47/
drwxr-xr-x  8 root root 4096 Apr 3 2019 linux-headers-4.15.0-47-generic/
drwxr-xr-x 25 root root 4096 Dec 13 06:15 linux-headers-4.15.0-72/
drwxr-xr-x  8 root root 4096 Dec 13 06:15 linux-headers-4.15.0-72-generic/
drwxr-xr-x  7 root root 4096 Feb 26 2019 nvidia-410.48/
- Rebuild and install the module: sudo dkms install -m nvidia -v 410.48 (the value after -v must match your own nvidia version from the previous step)
- At this point the problem is resolved. Running nvidia-smi now produces the following output:
k8s@master:~$ nvidia-smi
Sun Jan  5 21:10:18 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   33C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:0C:00.0 Off |                    0 |
| N/A   25C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   30C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   25C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
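The version lookup and dkms step above can be sketched as a small script, so that the version does not have to be read off the listing by hand. This is a minimal sketch under a few assumptions: the driver sources live in a directory named nvidia-<version> under /usr/src as in the listing above, GNU sort (for -V) is available, and the function names nvidia_src_version and nvidia_dkms_command are hypothetical. It only prints the dkms command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Print the version suffix of the highest-versioned nvidia-<version>
# directory under a source root (default: /usr/src).
# Prints nothing if no such directory exists.
nvidia_src_version() {
    src_root="${1:-/usr/src}"
    for dir in "$src_root"/nvidia-*/; do
        [ -d "$dir" ] || continue            # skip the unexpanded glob
        name="$(basename "$dir")"            # e.g. nvidia-410.48
        printf '%s\n' "${name#nvidia-}"      # e.g. 410.48
    done | sort -V | tail -n 1               # assumes GNU sort
}

# Print (do not run) the dkms command for the detected version.
nvidia_dkms_command() {
    version="$(nvidia_src_version "$1")"
    if [ -z "$version" ]; then
        echo "no nvidia-* directory found under ${1:-/usr/src}" >&2
        return 1
    fi
    printf 'sudo dkms install -m nvidia -v %s\n' "$version"
}
```

On the machine from this post, nvidia_dkms_command would print the same command used in the steps above. Printing instead of executing leaves a chance to check the detected version before anything is rebuilt with sudo.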
Source: CSDN
Author: Urmsone
Link: https://blog.csdn.net/Urms_handsomeyu/article/details/103847493