Docker inside an LXD container: NVIDIA passthrough

In my quest to move all my services from Docker on the host to Docker inside LXD containers, I think I have reached the last battle :wink:

I’m trying to get Jellyfin, running in Docker inside an LXD container, to decode video using my NVIDIA card.

First, I launched my container and installed Docker. Then:

lxc config device add u3 gpu gpu gputype=physical
lxc stop u3
lxc config set u3 nvidia.runtime=true
lxc start u3

Now I can run nvidia-smi and see the card inside the LXD container.
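In case it helps, this is how I check it from the host (plain lxc exec, nothing special):

lxc exec u3 -- nvidia-smi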

Then I installed nvidia-docker inside the LXD container, following NVIDIA’s instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian
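For reference, once the apt repository from that guide is configured, the install itself boils down to roughly this (package names as in the guide at the time; see the linked page for the repository setup step):

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker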

But when I try
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
I get

docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown.

I’ve found this comment: Docker with GPU-support in unprivileged LXD container - #4 by mayliszt

The solution is:

After Docker starts, map the GPU buses /proc/driver/nvidia/gpus/xxxxxx from the host to the LXD container. The xxxxxx PCI buses should be those of the GPUs passed through from LXD to Docker. The needed PCI buses can be identified with nvidia-smi.

But if I ls /proc/driver/nvidia inside the LXD container, there is no gpus directory (on the host there is one, containing the PCI buses of the GPUs).

Maybe this is the solution, but how do I do it?
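My guess is that it would be something like an LXD disk device bind-mounting the host path into the container, along these lines (just a guess; the device name nvidia-gpus is arbitrary, the PCI address is the one from my card, and I don’t know whether LXD can actually mount over a path under /proc):

lxc config device add u3 nvidia-gpus disk source=/proc/driver/nvidia/gpus/0000:04:00.0 path=/proc/driver/nvidia/gpus/0000:04:00.0

But I’m not sure that’s the right approach.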

EDIT: I found a comment and tried changing no-cgroups to true in /etc/nvidia-container-runtime/config.toml inside the container.
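This is how I made the change (assuming the stock config.toml, where the key sits under the [nvidia-container-cli] section and ships commented out as #no-cgroups = false):

sudo sed -i 's/^#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml

After that change I get: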

docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: ,
stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: stat failed: /proc/driver/nvidia/gpus/0000:04:00.0: no such file or directory: unknown.

So, is there a way to have the gpus directory inside the LXD container?

There’s no need to mount devices under the /proc/driver/nvidia folder; I suspect those entries are generated by the host NVIDIA driver.
I found a solution: cuda - Using GPU from a docker container? - Stack Overflow
and the following commands work for me (the last four are run inside the started Docker container):

docker pull pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel

docker run --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm -it pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel

apt update
apt install kmod -y
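# the matching NVIDIA .run installer must be downloaded into the container first;
# --no-kernel-module installs only the user-space libraries (the kernel module comes from the host)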
sh NVIDIA-Linux-x86_64-*.run --no-kernel-module
nvidia-smi