Container not starting when nvidia.runtime=true

Hello to the Incus community!

I am trying to run a Debian container (systemd) on a Gentoo (OpenRC) host, and the container fails to start when ‘nvidia.runtime=true’ is set. If I set it to false, the container starts, but of course without the GPU. I have another container that has the driver installed inside the container as well (without nvidia-container-toolkit), and it works fine, so I suspect the error is related to nvidia-container-toolkit.

Since the host uses OpenRC and the guest uses systemd, I followed the instructions in the Incus article on the Gentoo wiki and set ‘rc_cgroup_mode=unified’. I did not modify ‘/etc/init.d/incus’, because it already contains this:

# Create necessary systemd paths in order for systemd containers to work on openrc host.
# /etc/rc.conf should have following values:
#   rc_cgroup_mode="hybrid"
if [ -d /sys/fs/cgroup/unified ] &&
	[ ! -d /sys/fs/cgroup/systemd ]; then
	install -d /sys/fs/cgroup/systemd --group incus-admin --owner root
	mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd
fi
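The init script block above only acts when /sys/fs/cgroup/unified exists, which with OpenRC corresponds to the hybrid layout rather than the unified one. A quick way to confirm which layout the host actually ended up with is to check the filesystem type mounted at /sys/fs/cgroup. This is a minimal sketch, assuming GNU coreutils `stat`; the function name `cgroup_mode` is hypothetical and not part of Incus or OpenRC.

```shell
#!/bin/sh
# Hypothetical helper: report the cgroup layout mounted at a given root.
# Assumptions: cgroup2fs at the root means "unified"; a tmpfs root with a
# "unified" subdirectory means "hybrid"; a tmpfs root without it means "legacy".
cgroup_mode() {
    cg_root=${1:-/sys/fs/cgroup}
    # stat -f reports on the mounted filesystem; %T prints its type name.
    fstype=$(stat -fc %T "$cg_root" 2>/dev/null) || { echo unknown; return; }
    case $fstype in
        cgroup2fs) echo unified ;;
        tmpfs)
            if [ -d "$cg_root/unified" ]; then
                echo hybrid
            else
                echo legacy
            fi ;;
        *) echo unknown ;;
    esac
}
```

Running `cgroup_mode` on the host should print `unified` if ‘rc_cgroup_mode=unified’ took effect, in which case the systemd mount in the init script is skipped.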

The installed versions:

nvidia-container-toolkit-1.17.8

app-containers/lxc 6.0.4-r1

app-containers/incus 6.0.4-r1

nvidia driver: 565.57.01

I checked that the Gentoo lxc 6.0.4-r1 package contains this patch: "start: Re-introduce first SET_DUMPABLE call" by stgraber (lxc/lxc pull request #4536 on GitHub).

The error message:

aszakal@fj-aszakal ~ $ incus start fj-aszakal-i31
Error: Failed to run: /usr/bin/incusd forkstart fj-aszakal-i31 /var/lib/incus/containers /run/incus/fj-aszakal-i31/lxc.conf: exit status 1
Try incus info --show-log fj-aszakal-i31 for more info

aszakal@fj-aszakal ~ $ incus info --show-log fj-aszakal-i31
Name: fj-aszakal-i31
Description:
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2025/11/26 15:07 CET
Last Used: 2025/11/26 22:25 CET

Log:

lxc fj-aszakal-i31 20251126212555.337 WARN cgfsng - ../lxc-6.0.4/src/lxc/cgroups/cgfsng.c:fchowmodat:1907 - No such file or directory - Failed to fchownat(46, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc fj-aszakal-i31 20251126212555.337 WARN cgfsng - ../lxc-6.0.4/src/lxc/cgroups/cgfsng.c:fchowmodat:1907 - No such file or directory - Failed to fchownat(46, memory.reclaim, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc fj-aszakal-i31 20251126212555.426 ERROR utils - ../lxc-6.0.4/src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc fj-aszakal-i31 20251126212555.427 ERROR conf - ../lxc-6.0.4/src/lxc/conf.c:lxc_setup:3948 - Failed to run mount hooks
lxc fj-aszakal-i31 20251126212555.427 ERROR start - ../lxc-6.0.4/src/lxc/start.c:do_start:1273 - Failed to setup container “fj-aszakal-i31”
lxc fj-aszakal-i31 20251126212555.427 ERROR sync - ../lxc-6.0.4/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc fj-aszakal-i31 20251126212555.430 WARN network - ../lxc-6.0.4/src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from “eth0” to its initial name “veth7b9fb08d”
lxc fj-aszakal-i31 20251126212555.430 ERROR start - ../lxc-6.0.4/src/lxc/start.c:__lxc_start:2119 - Failed to spawn container “fj-aszakal-i31”
lxc fj-aszakal-i31 20251126212555.430 WARN start - ../lxc-6.0.4/src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 47 for process 971
lxc fj-aszakal-i31 20251126212555.430 ERROR lxccontainer - ../lxc-6.0.4/src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state “ABORTING” instead of “RUNNING”
lxc fj-aszakal-i31 20251126212555.514 WARN cgfsng - ../lxc-6.0.4/src/lxc/cgroups/cgfsng.c:cgroup_tree_remove:489 - No such file or directory - Failed to destroy 24(lxc.payload.fj-aszakal-i31)

My incus configuration:

aszakal@fj-aszakal ~ $ incus config show fj-aszakal-i31
architecture: x86_64
config:
  image.description: Build/dev image 'master-linux-x64-3.1' [Debian/trixie/20250222T205704Z]
  image.os: debian
  image.release: trixie
  nvidia.runtime: "true"
  volatile.base_image: c3906f302e1928cbac986ebd9113ef344df169ba566c7c21716c41881a8592ee
  volatile.cloud-init.instance-id: 31d18c59-14cb-4e34-a369-cb2ee82a9ab5
  volatile.eth0.hwaddr: 10:66:6a:05:4d:97
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: fd2fab81-2bd0-4a17-8bd0-9db7cb06b36b
  volatile.uuid.generation: fd2fab81-2bd0-4a17-8bd0-9db7cb06b36b
devices:
  gpu0:
    type: gpu
ephemeral: false
profiles:
- devMachine31Profile
stateful: false
description: ""
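For anyone trying to reproduce this, the GPU-related parts of the configuration above (`nvidia.runtime` and the `gpu0` device) can be applied to a stopped instance with the standard `incus config` subcommands. This is a sketch only; the wrapper function `enable_nvidia_runtime` is hypothetical, and it assumes the incus client is installed and can reach the daemon.

```shell
#!/bin/sh
# Hypothetical helper: apply the GPU-related settings shown above to an instance.
enable_nvidia_runtime() {
    inst=$1
    if [ -z "$inst" ]; then
        echo "usage: enable_nvidia_runtime INSTANCE" >&2
        return 2
    fi
    if ! command -v incus >/dev/null 2>&1; then
        echo "incus client not found in PATH" >&2
        return 127
    fi
    # Enable the NVIDIA runtime integration, then pass a GPU through as "gpu0".
    incus config set "$inst" nvidia.runtime=true &&
        incus config device add "$inst" gpu0 gpu
}
```

Usage: `enable_nvidia_runtime fj-aszakal-i31`, followed by `incus start fj-aszakal-i31`.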

The used profile:

aszakal@fj-aszakal ~ $ incus profile show devMachine31Profile
config: {}
description: Default Incus profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: incusStorage-31
    type: disk
name: devMachine31Profile
used_by:
- /1.0/instances/fj-aszakal-i31
- /1.0/instances/fj-aszakal-i31-nonvidia
project: default

I have seen many possible causes for this error; please help me narrow down the problem.

Best regards,

Alex

I updated Incus and LXC to 6.0.5 by unmasking the packages in the Gentoo ebuild repository, but the error remains.

I found a post here:

It says that I should be able to run ‘nvidia-container-cli info’ as root. The command works as a normal user, but as root it fails with ‘nvidia-container-cli: initialization error: nvml error: insufficient permissions’. Could this be the source of my problem?
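To compare the two cases side by side with more detail, nvidia-container-cli can write a debug trace with `-d FILE`. This is a minimal sketch; the function name `nvcli_info` is hypothetical, and it assumes `sudo` is configured on the host (otherwise run the second command from a root shell).

```shell
#!/bin/sh
# Hypothetical helper: run `nvidia-container-cli info` with a debug trace,
# first as the current user and then as root, so the outputs can be compared.
nvcli_info() {
    if ! command -v nvidia-container-cli >/dev/null 2>&1; then
        echo "nvidia-container-cli not found in PATH" >&2
        return 127
    fi
    echo "== as $(id -un) =="
    nvidia-container-cli -d /dev/tty info
    echo "== as root =="
    sudo nvidia-container-cli -d /dev/tty info
}
```

The debug trace from the root run should show at which step the NVML initialization fails.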

Thank you for your help,

Alex