Question
We are using GitLab continuous integration to build and test our projects. Recently, one of the projects added a requirement for CUDA to enable GPU acceleration. I do not want to change our pipeline (docker and gitlab-ci are working well for us), so I'd like to somehow give docker the ability to talk to an nvidia GPU.
Additional details:
- Installing an nvidia GPU on our build servers is fine - we have some spare GPUs lying around to use for that purpose
- We are not using Ubuntu or CentOS, so we cannot use nvidia's cuda containers directly
- You can't supply the --runtime parameter to GitLab CI, so you can't use nvidia's suggested docker invocation. [edit: actually, you can now. See https://gitlab.com/gitlab-org/gitlab-runner/merge_requests/764 ]
Answer 1:
There are multiple steps:
- Install the nvidia driver on the host PC
- Install nvidia-docker2
- Build a docker image with CUDA
- Get it working in gitlab CI
Note that if you only want to compile CUDA code and don't need to run it, you don't need nvidia-docker2 or the nvidia driver on the host PC, and there are no special steps for getting it working in gitlab CI (i.e. you only have to do step 3).
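For example, a compile-only build can run under a plain docker invocation with no nvidia runtime at all; the image name and source file below are placeholders for your own:

# No --runtime=nvidia and no host driver needed just to compile.
# "build_machine" and saxpy.cu are placeholders.
docker run --rm -v "$PWD":/src -w /src build_machine nvcc -o saxpy saxpy.cu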
I'm afraid I'm not too familiar with docker, so if I've mixed container and image I apologize. If someone with more knowledge wants to fix any typos about docker, it would be greatly appreciated.
Step 1: Install the nvidia driver on the host PC
You have two options here. Either you can use your host OS's recommended procedure. This is easy, but means that the environment may differ across build servers. The other option is to download the installer directly from nVidia (i.e. https://www.nvidia.com/object/unix.html ) so that you can distribute it with your docker container.
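For instance, the direct-download route might look like the following; the version number is illustrative, so pick the current driver for your GPU from the page above:

# Illustrative version number -- choose the driver recommended for your GPU.
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run
chmod +x NVIDIA-Linux-x86_64-415.27.run
sudo ./NVIDIA-Linux-x86_64-415.27.run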
Step 2: Install nvidia-docker2
My current test PC is archlinux, so this was a case of installing it from the AUR. nVidia provides repositories for several OSes, so see the quickstart guide on the nvidia-docker github page.
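For a Debian/Ubuntu-style host, the procedure boils down to something like the sketch below; treat it as a sketch and follow the quickstart guide itself for the authoritative, current instructions:

# Add nvidia-docker's apt repository and install nvidia-docker2.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd   # reload the docker daemon so it picks up the new runtime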
You should test your nvidia-docker installation as per the quickstart guide. Running from your host PC the command:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
should run and output something like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.18       Driver Version: 415.18       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:02:00.0  On |                  N/A |
| 28%   39C    P0    24W / 120W |    350MiB /  6071MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Notice that although I've specified the 9.0-base image, nvidia-smi reports CUDA 10. I think this is because CUDA 10 is installed on the host PC. The nvidia-docker documentation says that it will use the CUDA from the docker image, so this shouldn't be a problem.
Step 3: Build a docker image with CUDA
You should use the Nvidia dockerhub docker images directly unless you have a good reason not to. In my case, I wanted to use a docker image based on Debian, but Nvidia only provides images for Ubuntu and CentOS. Fortunately, Nvidia posts the dockerfiles for their images, so you can copy the relevant parts from them. I based mine on https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
The magic part of the dockerfile included:
# Install cuda manually
# (wget and expect must already be installed earlier in the Dockerfile)
RUN wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
COPY install_cuda.exp install_cuda.exp
RUN mv cuda_* cuda_install_bin && \
    chmod +x cuda_install_bin && \
    expect install_cuda.exp && \
    rm cuda_*
# Magic copied from nvidia's cuda9.2 dockerfile at
# https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
ENV CUDA_VERSION 9.2.148
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.2"
The "expect" command is will allow you to write a script to automatically accept the license agreement etc. automatically. It's probably not a good idea for me to post the install_cuda.exp
file (because I can't accept the agreement for you), but in my case I accepted the eula, agreed to install it on an unsupported OS, did not install the graphics driver, did install cuda, used the default path, installed a symlink to usr/local/cuda and did not install the samples.
For more information on expect, see the man page [online man page here].
The inspect file is mostly made up of lines like expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }
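Putting those together, a hypothetical install_cuda.exp might look like the sketch below. The prompt strings are assumptions based on the CUDA 9.2 runfile installer, not the answer's actual file; run the installer interactively once to confirm them for your version.

#!/usr/bin/expect -f
# Hypothetical sketch only -- verify each prompt against your installer.
set timeout -1
spawn sh cuda_install_bin
# Page through and accept the EULA.
expect {
    -- "--More--"             { send " "; exp_continue }
    -- "accept/decline/quit:" { send "accept\r" }
}
expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }   ;# install on unsupported OS
expect -- "(y)es/(n)o/(q)uit:" { send "n\r" }   ;# skip the graphics driver
expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }   ;# install the CUDA toolkit
expect -- "Toolkit Location"   { send "\r" }    ;# accept the default path
expect -- "(y)es/(n)o/(q)uit:" { send "y\r" }   ;# symlink /usr/local/cuda
expect -- "(y)es/(n)o/(q)uit:" { send "n\r" }   ;# skip the samples
expect eof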
You should check that you can run the nvidia-smi test using your own container (i.e. docker run --runtime=nvidia -it your_image_here /bin/sh, then run nvidia-smi inside).
Step 4: Get it running inside gitlab-ci.
When researching around the web, most sources tell you that you can't supply the --runtime flag from the gitlab runner configuration. Actually, according to the merge request linked above, you can. To do so, you have to edit /etc/gitlab-runner/config.toml and add runtime = "nvidia" in the right place.
For example, my runner configuration looks like:
[[runners]]
  name = "docker-runner-test"
  url = "<<REDACTED>>"
  token = "<<REDACTED>>"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "build_machine"
    privileged = false
    disable_cache = false
    runtime = "nvidia"
    volumes = ["/cache"]
    pull_policy = "never"
    shm_size = 0
  [runners.cache]
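With runtime = "nvidia" set, a minimal job to confirm that CI jobs can see the GPU might look like this in .gitlab-ci.yml (the job name is arbitrary and build_machine is my image from step 3):

test_gpu:
  image: build_machine
  script:
    - nvidia-smi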
Answer 2:
For the record, if anyone stumbles on this question: since Docker 19.03 the docker client has native support for GPUs, so this method is now deprecated.
However, at the time of writing, gitlab-runner does not yet support the new API.
I've checked, and the old method still works for now, even though it's deprecated.
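For reference, the native replacement (once your tooling supports it) is the --gpus flag, which requires Docker 19.03+ with the nvidia-container-toolkit package installed on the host:

# Native GPU support, no custom runtime needed (Docker 19.03+).
docker run --gpus all --rm nvidia/cuda:9.0-base nvidia-smi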
Source: https://stackoverflow.com/questions/53647770/how-can-i-get-use-cuda-inside-a-gitlab-ci-docker-executor