Nvidia & gpu seems not complete

RandomUser · February 12, 2024, 9:34am

I noticed that delegating a GPU to a container via a gpu device and nvidia.runtime: "true" is incomplete: the directory /proc/driver/nvidia/gpus is not mounted from the host.

Moreover, when I added this mount via:

  nvdock1:
    path: /proc/driver/nvidia/gpus
    source: /proc/driver/nvidia/gpus
    type: disk

CUDA started to work correctly. After a restart of the guest, however, it seems that this disk is mounted before the essential part of the gpu/nvidia delegation has run, so it does not end up mounted as it should.
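For reference, the same disk device could presumably be added from the command line instead of editing the YAML. This is a minimal sketch, assuming a POSIX shell on the host. The device name and paths come from the config above; the container name argument and the INCUS override variable are placeholders introduced here for illustration. It does not address the mount ordering after a restart.

```shell
# Command to run; override with INCUS=echo for a dry run
# (an illustrative knob of this sketch, not an Incus feature).
INCUS="${INCUS:-incus}"

# Add the /proc/driver/nvidia/gpus bind mount as a disk device.
# $1: container name (placeholder).
add_nvidia_proc_mount() {
    container="$1"
    "$INCUS" config device add "$container" nvdock1 disk \
        source=/proc/driver/nvidia/gpus \
        path=/proc/driver/nvidia/gpus
}
```

Calling add_nvidia_proc_mount <container_name> would then attach the device; with INCUS=echo set, the function only prints the arguments it would pass.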

I have two questions:

Why is /proc/driver/nvidia/gpus not included in nvidia.runtime?
or
How can I defer mounting the /proc/driver/nvidia/gpus disk until after nvidia.runtime has started?

qkiel · February 12, 2024, 5:27pm

I don’t have an NVIDIA card to help you, but files under /usr/share might also be missing. That’s particularly important for Vulkan. Check this topic:

cemzafer · February 13, 2024, 7:28am

Hi @RandomUser,
What do you want to achieve? Have you ever added an NVIDIA GPU to a container? Have you checked the /dev directory in the container after adding the gpu device?
Have a look at that video, please. Nvidia video
Regards.

RandomUser · February 13, 2024, 8:10am

I think my post was clear: the directory /proc/driver/nvidia/gpus is missing from the container. I want to add it permanently so that CUDA programs can run.
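One way to confirm the symptom after each restart is a small check run inside the container. The following is only a sketch, assuming a POSIX shell is available there. The function name and the optional root prefix argument are hypothetical; the prefix exists only so the check can be tried against a scratch directory.

```shell
# Report whether the per-GPU proc directory exists and has entries.
# $1: optional root prefix (hypothetical, for testing); empty means "/".
check_nvidia_proc() {
    dir="${1:-}/proc/driver/nvidia/gpus"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "present: $dir"
    else
        echo "missing or empty: $dir"
        return 1
    fi
}
```

Running check_nvidia_proc with no argument inside the container right after a restart should show whether the workaround mount took effect.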

cemzafer · February 13, 2024, 10:41am

I’m not entirely sure, but maybe @qkiel’s post suits your needs.
Regards.

simos · February 13, 2024, 9:36pm

Most likely it’s an upstream issue. There is an issue filed as "GPU information not mounted on /proc/driver/nvidia/gpus" (Issue #105, NVIDIA/libnvidia-container on GitHub). You may also need to check whether NVIDIA/nvidia-container-toolkit ("Build and run containers leveraging NVIDIA GPUs") is the proper place to report this.

The mount that you perform in Incus is a workaround.

stgraber · February 13, 2024, 10:07pm

Yeah, the /proc stuff is managed by nvidia-container (when nvidia.runtime=true).
Incus does tell it which GPUs we care about, though, so I’m not sure why it doesn’t provide the correct entries.

You may want to play around with maybe specifying what GPU you want by PCI address as that may help the nvidia scripts figure it out better?
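The suggestion above could be tried from the CLI roughly as follows. This is a sketch under assumptions: it presumes a gpu device already exists on the container, and the function name, its arguments, and the INCUS override variable are placeholders introduced here, not part of Incus.

```shell
# Command to run; override with INCUS=echo for a dry run
# (an illustrative knob of this sketch, not an Incus feature).
INCUS="${INCUS:-incus}"

# Pin an existing gpu device to a single PCI address.
# $1: container name, $2: device name, $3: PCI address (all placeholders).
pin_gpu_by_pci() {
    "$INCUS" config device set "$1" "$2" "pci=$3"
}
```

On the host, lspci | grep -i nvidia is one way to look up the address to pass as the third argument.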

RandomUser · February 17, 2024, 3:52am

I have added

  gpu:
    pci: "01:00.0"
    type: gpu

and it has not changed anything; /proc/driver/nvidia/gpus is still not being mounted.

Any other ideas?

Or a suggestion on how to prioritise mounts in Incus, so that this device is mounted only after all other devices are mounted?

  nvfix:
    path: /proc/driver/nvidia/gpus
    source: /proc/driver/nvidia/gpus
    type: disk

cemzafer · February 17, 2024, 10:30am

Hi @RandomUser,
Can you post the output of incus config show <container_name> and, from inside the container, cat /etc/os-release?
Regards.
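The two outputs requested above can be gathered from the host in one go. This is a minimal sketch, assuming a POSIX shell; the function name and the INCUS override variable are placeholders introduced here for illustration.

```shell
# Command to run; override with INCUS=echo for a dry run
# (an illustrative knob of this sketch, not an Incus feature).
INCUS="${INCUS:-incus}"

# Print the container configuration and its OS release information.
# $1: container name (placeholder).
collect_debug_info() {
    container="$1"
    "$INCUS" config show "$container"
    "$INCUS" exec "$container" -- cat /etc/os-release
}
```

Redirecting the output of collect_debug_info <container_name> to a file makes it easy to paste into a reply.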