In my quest to move all my services from Docker on the host to Docker inside LXD containers, I think I have reached the last battle.
I’m trying to get docker Jellyfin inside a LXD container to decode using my NVIDIA card.
First, launched my container and installed Docker. Then:
lxc config device add u3 gpu gpu gputype=physical
lxc stop u3
lxc config set u3 nvidia.runtime=true
lxc start u3
Now I can run nvidia-smi and see the card inside the LXD container.
Then installed nvidia-docker inside the LXD container, following NVIDIA instructions in Installation Guide — NVIDIA Cloud Native Technologies documentation
But when I try
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown.
I’ve found this comment: Docker with GPU-support in unprivileged LXD container - #4 by mayliszt
The solution is:
After the docker starts, map GPU buses /proc/driver/nvidia/gpus/xxxxxx from the host to the LXD container. The xxxxx pci buses should be those of the GPUs passed through from LXD to docker. The needed pci buses can be identified with nvidia-smi.
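For reference, here is a minimal sketch of how the PCI bus IDs mentioned in that comment might be listed on the host and turned into the matching /proc paths. The function names are hypothetical, and the sketch assumes that nvidia-smi prints the bus ID with an eight-digit domain in upper-case hex (for example 00000000:04:00.0), while /proc/driver/nvidia/gpus uses a four-digit domain in lower case.

```shell
# Hypothetical helper: convert a bus ID as printed by nvidia-smi
# (assumed form: 00000000:04:00.0) into the /proc path used by the driver.
bus_id_to_proc_path() {
    # Drop the first four domain digits, then lowercase the hex digits.
    short=$(printf '%s' "$1" | sed -E 's/^[0-9A-Fa-f]{4}//' | tr 'A-F' 'a-f')
    printf '/proc/driver/nvidia/gpus/%s\n' "$short"
}

# Hypothetical helper: print the /proc path for every GPU nvidia-smi reports.
list_gpu_proc_paths() {
    nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader |
        while read -r bus_id; do
            bus_id_to_proc_path "$bus_id"
        done
}
```

Running list_gpu_proc_paths on the host should print one path per GPU, which can then be compared with what exists inside the LXD container.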
And if you run ls /proc/driver/nvidia inside the LXD container, there is no gpus directory (on the host there is one, containing the PCI buses for the GPUs).
Maybe this is the solution, but how do I do it?
EDIT: I found a comment and tried changing no-cgroups to true in /etc/nvidia-container-runtime/config.toml inside the container, and then I get:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: mount error: stat failed: /proc/driver/nvidia/gpus/0000:04:00.0: no such file or directory: unknown.
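For anyone who wants to reproduce the no-cgroups change, it can be sketched roughly as below. The helper name set_no_cgroups is hypothetical, and the sketch assumes GNU sed and that the file contains a no-cgroups line (possibly commented out) under the [nvidia-container-cli] section, as in the default config.

```shell
# Hypothetical helper: set no-cgroups = true in the NVIDIA container
# runtime config. Usage: set_no_cgroups [path-to-config.toml]
set_no_cgroups() {
    config="${1:-/etc/nvidia-container-runtime/config.toml}"
    # Rewrite the (possibly commented-out) no-cgroups line in place.
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$config"
    # Print the resulting line so the change can be checked by eye.
    grep -n 'no-cgroups' "$config"
}
```

Inside the LXD container this would be run as root (for example via sudo sh), with no argument so that the default path is used.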
So, is there a way to have the gpus directory inside the LXD container?