Incus 6.1: added device /proc/driver/nvidia/gpus is lost after reboot

chenyg0911 · May 15, 2024, 8:54am

I run NVIDIA Docker inside a container. The container config:

devices:
  gpu:
    type: gpu
  n-gpu:
    path: /proc/driver/nvidia/gpus/0000:01:00.0
    source: /proc/driver/nvidia/gpus/0000:01:00.0
    type: disk
  mydata:
    path: /data
    source: /data
    type: disk

It works with Incus v0.6 as expected.
After upgrading to 6.1, adding the device to the container works as before. But when the container restarts, the path is lost inside the container. Other disk devices for regular data (“mydata”) still work. Why?

qkiel · May 15, 2024, 1:20pm

Maybe this will help. I don’t have an Nvidia card to test that, though.

When you set nvidia.driver.capabilities: all and nvidia.runtime: "true" in your config, /proc/driver/nvidia/gpus/0000:01:00.0 from the host will be mounted at /dev/nvidia0 in the container. A simple symlink in the container should work (there’s a systemd service in the link above that automates this):

sudo mkdir -p /proc/driver/nvidia/gpus && sudo ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:01:00.0

The other solution is to skip nvidia.driver.capabilities: all and nvidia.runtime: "true" and install all the Nvidia packages directly in the Incus container.

chenyg0911 · May 17, 2024, 3:04am

It works, @qkiel. But the same problem occurs when the container reboots: the /proc/driver/nvidia/gpus directory is lost and I have to recreate it, or maybe use a systemd service to create it automatically.
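
A minimal shell sketch of the recreate-on-boot idea, assuming the symlink target /dev/nvidia0 and the PCI address 0000:01:00.0 used in this thread. The function name link_gpu_proc and its three parameters are hypothetical and not part of Incus or the NVIDIA tooling. It is written to be idempotent so that a boot-time service could call it on every start.

```shell
#!/bin/sh
# Hypothetical helper (not part of Incus or the NVIDIA tooling):
# recreate the GPU symlink that disappears when the container restarts.
#   $1: symlink target, e.g. /dev/nvidia0
#   $2: directory that should hold the link, e.g. /proc/driver/nvidia/gpus
#   $3: link name, e.g. the PCI address 0000:01:00.0
link_gpu_proc() {
    target=$1
    link_dir=$2
    link_name=$3

    # Create the parent directory if it is missing; stop on failure.
    mkdir -p "$link_dir" || return 1

    # -s: symbolic link, -f: replace an existing entry,
    # -n: do not follow an existing link that points to a directory.
    ln -sfn "$target" "$link_dir/$link_name"
}
```

Inside the container, calling link_gpu_proc /dev/nvidia0 /proc/driver/nvidia/gpus 0000:01:00.0 as root from a boot-time systemd unit should restore the path after each restart. Whether the unit runs late enough in the boot order for a given setup is an assumption that would need checking.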

But I’m puzzled why the path can no longer be mounted automatically from the host into the container after upgrading Incus to version 6.1.
My guess is that the directory is attached to the container before /proc/driver/nvidia has been created inside it, whereas the regular data path is mounted normally.