Question
We are using GitLab continuous integration to build and test our projects. Recently, one of the projects added a requirement for CUDA to enable GPU acceleration. I do not want to change our pipeline (docker and gitlab-ci are working well for us), so I'd like to somehow give docker the ability to talk to an nvidia GPU.
Additional details:
- Installing an nvidia GPU on our build servers is fine - we have some spare GPUs lying around to use for that purpose
- We are not using Ubuntu or CentOS, so we cannot use nvidia's cuda containers directly
- You can't supply the --runtime parameter to GitLab CI, so you can't use nvidia's suggested docker invocation. [edit: actually, you can now. See https://gitlab.com/gitlab-org/gitlab-runner/merge_requests/764 ]
Answer 1:
There are multiple steps:
- Install the nvidia driver on the host PC
- Install nvidia-docker2
- Build a docker image with CUDA
- Get it working in gitlab CI
Note that if you only want to compile CUDA code and don't need to run it, you don't need nvidia-docker2 or the nvidia driver on the host PC, and there are no special steps for getting it working in gitlab CI (i.e. you only have to do step 3).
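For example, a compile-only build can run under a plain docker invocation with no nvidia runtime at all; the image name and source file below are placeholders for your own:

# No --runtime=nvidia and no host driver needed just to compile.
# "build_machine" and saxpy.cu are placeholders.
docker run --rm -v "$PWD":/src -w /src build_machine nvcc -o saxpy saxpy.cu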
I'm afraid I'm not too familiar with docker, so if I've mixed container and image I apologize. If someone with more knowledge wants to fix any typos about docker, it would be greatly appreciated.
Step 1: Install the nvidia driver on the host PC
You have two options here. Either you can use your host OS's recommended procedure. This is easy, but means that the environment may differ across build servers. The other option is to download the installer directly from nVidia (i.e. https://www.nvidia.com/object/unix.html ) so that you can distribute it with your docker container.
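For instance, the direct-download route might look like the following; the version number is illustrative, so pick the current driver for your GPU from the page above:

# Illustrative version number -- choose the driver recommended for your GPU.
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run
chmod +x NVIDIA-Linux-x86_64-415.27.run
sudo ./NVIDIA-Linux-x86_64-415.27.run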
Step 2: Install nvidia-docker2
My current test PC is archlinux, so this was a case of installing it from the AUR. nVidia provides repositories for several OSes, so see the quickstart guide on the nvidia-docker github page.
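For a Debian/Ubuntu-style host, the procedure boils down to something like the sketch below; treat it as a sketch and follow the quickstart guide itself for the authoritative, current instructions:

# Add nvidia-docker's apt repository and install nvidia-docker2.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd   # reload the docker daemon so it picks up the new runtime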
You should test your nvidia-docker installation as per the quickstart guide. Running from your host PC the command:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
should run and output something like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.18       Driver Version: 415.18       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:02:00.0  On |                  N/A |
| 28%   39C    P0    24W / 120W |    350MiB /  6071MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Notice that although I've specified the 9.0-base image, nvidia-smi reports CUDA 10. I think this is because CUDA 10 is installed on the host PC. The nvidia-docker documentation says that it will use the CUDA from the docker image, so this shouldn't be a problem.
Step 3: Build a docker image with CUDA
You should use the Nvidia dockerhub docker images directly unless you have a good reason not to. In my case, I wanted to use a docker image based on Debian, but Nvidia only provides images for Ubuntu and CentOS. Fortunately, Nvidia posts the dockerfiles for their images, so you can copy the relevant parts from them. I based mine on https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
The magic part of the dockerfile included:
# Install cuda manually
# (wget and expect must already be installed earlier in the Dockerfile)
RUN wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
COPY install_cuda.exp install_cuda.exp
RUN mv cuda_* cuda_install_bin && \
    chmod +x cuda_install_bin && \
    expect install_cuda.exp && \
    rm cuda_*
# Magic copied from nvidia's cuda9.2 dockerfile at
# https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
ENV CUDA_VERSION 9.2.148
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.2"
The "expect" command is will allow you to write a script to automatically accept the license agreement etc. automatically. It's probably not a good idea for me to post the install_cuda.exp
file (because I can't accept the agreement for you), but in my case I accepted the eula, agreed to install it on an unsupported OS, did not install the graphics driver, did install cuda, used the default path, installed a symlink to usr/local/cuda and did not install the samples.
For more information on expect, see the man page [online man page here].
The inspect file is mostly made up of lines like expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }
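Putting those together, a hypothetical install_cuda.exp might look like the sketch below. The prompt strings are assumptions based on the CUDA 9.2 runfile installer, not the answer's actual file; run the installer interactively once to confirm them for your version.

#!/usr/bin/expect -f
# Hypothetical sketch only -- verify each prompt against your installer.
set timeout -1
spawn sh cuda_install_bin
# Page through and accept the EULA.
expect {
    -- "--More--"             { send " "; exp_continue }
    -- "accept/decline/quit:" { send "accept\r" }
}
expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }   ;# install on unsupported OS
expect -- "(y)es/(n)o/(q)uit:" { send "n\r" }   ;# skip the graphics driver
expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }   ;# install the CUDA toolkit
expect -- "Toolkit Location"   { send "\r" }    ;# accept the default path
expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }   ;# symlink /usr/local/cuda
expect -- "(y)es/(n)o/(q)uit:" { send "n\r" }   ;# skip the samples
expect eof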
You should check that you can run the nvidia-smi test using your own container (i.e. docker run --runtime=nvidia -it your_image_here /bin/sh, then run nvidia-smi inside).
Step 4: Get it running inside gitlab-ci.
When researching around the web, most sources tell you that you can't supply the --runtime flag from the gitlab runner configuration. Actually, according to the merge request linked above, you can. To do so, you have to edit /etc/gitlab-runner/config.toml and add runtime = "nvidia" in the right place.
For example, my runner configuration looks like:
[[runners]]
  name = "docker-runner-test"
  url = "<<REDACTED>>"
  token = "<<REDACTED>>"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "build_machine"
    privileged = false
    disable_cache = false
    runtime = "nvidia"
    volumes = ["/cache"]
    pull_policy = "never"
    shm_size = 0
  [runners.cache]
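With runtime = "nvidia" set, a minimal job to confirm that CI jobs can see the GPU might look like this in .gitlab-ci.yml (the job name is arbitrary and build_machine is my image from step 3):

test_gpu:
  image: build_machine
  script:
    - nvidia-smi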
Answer 2:
For the record, if anyone stumbles on this question: since Docker 19.03 the docker client has native support for GPUs, so this method is now deprecated.
However, at the time of writing, gitlab-runner does not yet support the new API.
I've checked, and the old method still works for now, even though it's deprecated.
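For reference, the native replacement (once your tooling supports it) is the --gpus flag, which requires Docker 19.03+ with the nvidia-container-toolkit package installed on the host:

# Native GPU support, no custom runtime needed (Docker 19.03+).
docker run --gpus all --rm nvidia/cuda:9.0-base nvidia-smi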
Source: https://stackoverflow.com/questions/53647770/how-can-i-get-use-cuda-inside-a-gitlab-ci-docker-executor