Incus, Docker and NVIDIA GPU

Hello. I’m in the process of moving from LXD to Incus. I’ve already posted this in the LXD forum, but since I’m switching, I thought I could ask here too.

This is an old question. I’ve been searching there and in the Linux Containers forum, and the problem still existed as of last July.

The thing is, I cannot run a Docker container that uses the GPU inside an LXC container. The LXC container has the GPU accessible: you can run nvidia-smi and get the correct response, and if you install the application directly it works and uses the GPU. But if you try to launch its Docker container with the GPU resource, it just won’t start. See
https://discuss.linuxcontainers.org/t/how-to-build-nvidia-docker-inside-lxd-lxc-container/17582/10
https://discuss.linuxcontainers.org/t/gpu-in-a-docker-instance/15085
https://discuss.linuxcontainers.org/t/docker-with-gpu-support-in-unprivileged-lxd-container/5783
and https://discuss.linuxcontainers.org/search?q=docker%20gpu

I understand that it would be fixed if I could use a privileged LXC container with the NVIDIA capabilities enabled, but that does not seem possible.

Right now I’m using the LXD 5.15 snap, an Ubuntu 22.04 container, and the latest Docker package installed in the container.

Is there any chance of getting a Docker container to use the GPU inside an LXC container?

I do not know whether the LXD snap package adds some extra level of indirection.

I suggest trying with Incus and documenting the steps you take (I mean, post them here).
Show the steps like this person did here: GPU in a docker instance

What you are facing is not that the GPU is inaccessible from Docker in a container, but rather that the specific NVIDIA Docker application container has certain, perhaps unrelated, requirements that are not met. Hence, this specific NVIDIA Docker application container does not start.

I had a similar issue when I was trying to run Telegram in a GUI container. The application wanted access to the console, otherwise it would crash.


So the weird thing here is that NVIDIA contributed both the LXC and Docker integrations.

But they made it so that the LXC integration only works with unprivileged containers whereas the Docker integration only works with privileged containers, so that’s how we end up with this weird mess.

The only workaround I’m aware of is to not use nvidia.runtime on the Incus side but instead go through the annoying process of installing all the NVIDIA packages directly in the Incus container, at which point, that container can be privileged and the Docker support should work as expected.


Thank you @simos and @stgraber. I’m waiting for the Fedora package compatible with EL9.

So, to make it work, I have to

incus config device add u31 gpu gpu id=0

And make sure these are not set (I had to unset them in my case)

  nvidia.driver.capabilities: all
  nvidia.runtime: "true"

And then install the NVIDIA driver and container runtime in the container.

Is this correct?

Yep, that’s right
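
For anyone landing here later, the steps confirmed above could be scripted roughly as follows. This is only a sketch under a few assumptions that are not from the thread: the instance name u31 is reused from the post, security.nesting is turned on because Docker inside a container usually wants it, the NVIDIA user-space driver and the NVIDIA Container Toolkit (which provides nvidia-ctk) are already installed inside the container by whatever method suits your distribution, and the ubuntu image is just a convenient test image. The run helper and the DRY_RUN variable are made up for illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the workaround discussed in this thread; not an official procedure.
INSTANCE="${INSTANCE:-u31}"   # instance name taken from the post above
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the commands, 0 = really run them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

host_steps() {
    # Do not use the NVIDIA LXC integration on the Incus side.
    run incus config unset "$INSTANCE" nvidia.runtime
    run incus config unset "$INSTANCE" nvidia.driver.capabilities
    # Privileged container, as suggested above for the Docker integration.
    run incus config set "$INSTANCE" security.privileged=true
    # Assumption: nesting enabled so Docker can run inside the container.
    run incus config set "$INSTANCE" security.nesting=true
    # Pass the GPU as a plain device (skip this if the device already exists).
    run incus config device add "$INSTANCE" gpu gpu id=0
    run incus restart "$INSTANCE"
}

container_steps() {
    # Assumes the NVIDIA driver libraries and the NVIDIA Container Toolkit
    # are already installed inside the container.
    run incus exec "$INSTANCE" -- nvidia-ctk runtime configure --runtime=docker
    run incus exec "$INSTANCE" -- systemctl restart docker
    # Quick check that a Docker container can see the GPU.
    run incus exec "$INSTANCE" -- docker run --rm --gpus all ubuntu nvidia-smi
}

host_steps
container_steps
```

Run it once as is to review the printed commands, then with DRY_RUN=0 to apply them. The driver version inside the container most likely needs to match the one on the host, since the kernel module stays on the host.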