Nvidia GPU delegation seems incomplete

I noticed that delegating a GPU to a container via a gpu device with nvidia.runtime: "true" is incomplete, because the /proc/driver/nvidia/gpus directory from the host is not mounted into the container.

Moreover, when I added this mount:

  nvdock1:
    path: /proc/driver/nvidia/gpus
    source: /proc/driver/nvidia/gpus
    type: disk

CUDA started to work correctly. However, after the guest restarts, this disk device seems to be mounted before the essential part of the GPU/NVIDIA delegation runs, so it ends up not mounted as it should be.
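To confirm from inside the container whether the directory is actually there after a restart, a small check like the one below may help. It is only a sketch: the function name list_nvidia_gpu_entries is hypothetical, and it assumes the NVIDIA driver exposes one subdirectory per GPU under /proc/driver/nvidia/gpus.

```python
from pathlib import Path

# Directory the NVIDIA kernel driver populates with per-GPU entries (assumed layout).
NVIDIA_GPUS_DIR = "/proc/driver/nvidia/gpus"


def list_nvidia_gpu_entries(gpus_dir: str = NVIDIA_GPUS_DIR) -> list[str]:
    """Return the sorted names of the per-GPU subdirectories in gpus_dir.

    Returns an empty list if gpus_dir does not exist (e.g. it was not
    mounted into the container) or contains no subdirectories.
    """
    path = Path(gpus_dir)
    if not path.is_dir():
        return []
    # Only directories count as GPU entries; plain files are ignored.
    return sorted(entry.name for entry in path.iterdir() if entry.is_dir())


if __name__ == "__main__":
    entries = list_nvidia_gpu_entries()
    if entries:
        print("Found GPU entries:", ", ".join(entries))
    else:
        print(f"{NVIDIA_GPUS_DIR} is missing or empty in this environment")
```

Running it once right after the container starts and again after a manual remount should show whether the disk device was shadowed at boot.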

I have two questions:

  1. Why is /proc/driver/nvidia/gpus not included by nvidia.runtime?
    or
  2. How can I defer mounting the /proc/driver/nvidia/gpus disk until after nvidia.runtime has run?

I don’t have an NVIDIA card to test with, but files under /usr/share might also be missing. That’s particularly important for Vulkan. Check this topic:

Hi @RandomUser,
What do you want to achieve? Have you added the NVIDIA GPU to the container? Have you checked the /dev directory in the container after adding the GPU device?
Have a look at this video, please: Nvidia video
Regards.

I think my post was clear: the /proc/driver/nvidia/gpus directory is missing from the container. I want to add it permanently so that CUDA programs can run.

I’m not completely sure, but maybe @qkiel’s post suits your needs.
Regards.

Most likely it’s an upstream issue. There is an existing issue, "GPU information not mounted on /proc/driver/nvidia/gpus" (Issue #105 in NVIDIA/libnvidia-container on GitHub). You may also need to check whether NVIDIA/nvidia-container-toolkit on GitHub ("Build and run containers leveraging NVIDIA GPUs") is the proper place to report this.

The mount that you perform in Incus is a workaround.

Yeah, the /proc stuff is managed by nvidia-container (when nvidia.runtime=true).
Incus does tell it which GPUs we care about, though, so I’m not sure why it doesn’t provide the correct entries.

You may want to try specifying the GPU you want by PCI address, as that may help the NVIDIA scripts figure it out better.

I have added

  gpu:
    pci: "01:00.0"
    type: gpu

and it has not changed anything; /proc/driver/nvidia/gpus is still not being mounted.
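To check whether the specific GPU selected with pci: "01:00.0" shows up inside the container, a sketch like the following could be used. It assumes, without confirmation from this thread, that the driver names each entry by the full PCI bus ID including the domain (e.g. 0000:01:00.0), and that a short address without a domain means domain 0000. Both helper names are hypothetical.

```python
from pathlib import Path


def normalize_pci_address(address: str) -> str:
    """Expand a short PCI address like "01:00.0" to "0000:01:00.0".

    Assumption: a missing PCI domain means domain 0000.
    """
    address = address.strip().lower()
    if address.count(":") == 1:
        address = "0000:" + address
    return address


def gpu_entry_present(pci: str, gpus_dir: str = "/proc/driver/nvidia/gpus") -> bool:
    """Return True if gpus_dir contains a subdirectory for this PCI address."""
    return (Path(gpus_dir) / normalize_pci_address(pci)).is_dir()


if __name__ == "__main__":
    pci = "01:00.0"  # same address as in the gpu device config
    state = "present" if gpu_entry_present(pci) else "missing"
    print(f"{normalize_pci_address(pci)}: {state}")
```

If the entry is missing even though the device is passed by PCI address, that would point at the upstream behaviour rather than at how the GPU is selected in Incus.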

Any other ideas?

Or a suggestion on how to prioritise mounts in Incus, so that this device is mounted only after all other devices?

  nvfix:
    path: /proc/driver/nvidia/gpus
    source: /proc/driver/nvidia/gpus
    type: disk

Hi @RandomUser,
Can you post the output of incus config show <container_name>, and of cat /etc/os-release run inside the container?
Regards.