Great. The GPU works fine after following the steps mentioned previously.
libnvidia-container hardcodes the expectation that nvidia-smi lives in /usr/bin, which is not a valid assumption on NixOS. There have been some recent tweaks to our libnvidia-container package, but I'm not sure yet whether they'll fix the problem of libnvidia-container failing to find the binaries. I'll check back in when I can confirm either way, but the changes are unlikely to be backported to stable 24.11.
@stgraber One thing I noticed when debugging this is that the nvidia hook was failing to create /var/lib/incus/storage-pools/default/containers/noble-molly/hook. When I created it myself and opened up the permissions, I noticed that the hook runs as the container's root UID, not the host's. This prevents libnvidia-container from writing its log file.
Have you seen this before?
I suspect that's normal; the hook was written for LXC and so expects a path like /var/lib/lxc/NAME where it has write access.
Under Incus we've tightened permissions a fair bit more, and that's what is causing this issue.
Is that fatal, though, or does it just prevent logging?
It only prevents logging from what I've seen. The logging was helpful for some of the troubleshooting I'm doing, but I can just mkdir/chown the directory for that.
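For reference, a sketch of that mkdir/chown workaround. The path matches the container mentioned above; the UID 1000000 is an assumption based on the default Incus idmap (the Hostid shown in volatile.idmap.current later in this thread) and should be adjusted to your container's mapping.

```shell
# Pre-create the hook directory and hand it to the container's
# root UID so libnvidia-container can write its log there.
# 1000000 assumes the default Incus idmap base; check
# `incus config get <name> volatile.idmap.current` for yours.
d=/var/lib/incus/storage-pools/default/containers/noble-molly/hook
mkdir -p "$d"
chown 1000000:1000000 "$d"
```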
@stgraber @adamcstephens I am trying to set up a container on another host. If I specify nvidia.runtime: "true", the container doesn't start.
$ incus start dockerblr
Error: Failed to run: /nix/store/2ypj6mwrs14wzwf18avqx0nm5n8r41vg-incus-6.11.0/bin/incusd forkstart dockerblr /var/lib/incus/containers /run/incus/dockerblr/lxc.conf: exit status 1
Try `incus info --show-log dockerblr` for more info
$ incus info --show-log dockerblr
Error: Invalid PID '�'
My incus is set up as follows:
#incus
virtualisation.incus.package = pkgs.incus;
virtualisation.incus.enable = true;
systemd.services.incus.environment.INCUS_LXC_HOOK =
"${config.virtualisation.incus.lxcPackage}/share/lxc/hooks";
Once I remove nvidia.runtime, the container starts up fine.
Sorry, I don't have the bandwidth to look into this further right now. I don't use this feature, and it's difficult or impossible for us to write NixOS tests for it given the hardware requirement.
I'd invite you to file an issue on the nixpkgs repo to track the problem, preferably with any more detail you can provide. Unfortunately, unless you're willing and able to do the deep investigation yourself, I suspect little progress will be made.
@adamcstephens This issue is back in unstable.
I defined the following container:
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu noble amd64 (20250829_07:42)
  image.os: Ubuntu
  image.release: noble
  image.requirements.cgroup: v2
  image.serial: "20250829_07:42"
  image.type: squashfs
  image.variant: default
  nvidia.driver.capabilities: all
  nvidia.runtime: "true"
  volatile.base_image: 9e6510296ae2a03601e0eeffaaab0bf990ff13cb15fb328d216fb87b4910936c
  volatile.cloud-init.instance-id: b25968d3-af72-4242-8fa9-f0df01fec03e
  volatile.eth0.hwaddr: 10:66:6a:81:c6:73
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 062a171d-a647-4d6e-b81c-28c099a8f506
  volatile.uuid.generation: 062a171d-a647-4d6e-b81c-28c099a8f506
devices:
  gpu:
    id: "0"
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
$ incus start c2
Error: Failed to run: /nix/store/vjn8j2smqpib6g6bfdyvk0dvcqqsc2al-incus-6.15.0/bin/incusd forkstart c2 /var/lib/incus/containers /run/incus/c2/lxc.conf: exit status 1
Try `incus info --show-log c2` for more info
$ incus info --show-log c2
Name: c2
Description:
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2025/08/30 05:38 IST
Last Used: 2025/08/30 05:38 IST
Log:
lxc c2 20250830000831.469 ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc c2 20250830000831.469 ERROR conf - ../src/lxc/conf.c:lxc_setup:3933 - Failed to run mount hooks
lxc c2 20250830000831.469 ERROR start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "c2"
lxc c2 20250830000831.469 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc c2 20250830000831.478 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth350a2ff1"
lxc c2 20250830000831.478 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:832 - Received container state "ABORTING" instead of "RUNNING"
lxc c2 20250830000831.478 ERROR start - ../src/lxc/start.c:__lxc_start:2119 - Failed to spawn container "c2"
lxc c2 20250830000831.478 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 553206
lxc 20250830000831.577 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250830000831.577 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
incus is defined as follows:
virtualisation.incus.package = pkgs.incus;
virtualisation.incus.enable = true;
systemd.services.incus.environment.INCUS_LXC_HOOK =
"${config.virtualisation.incus.lxcPackage}/share/lxc/hooks";
I noticed that generation 649 works with nvidia but 650 does not. Both generations have incus 6.15:
Generation Build-date NixOS version Kernel Configuration Revision Specialisation Current
650 2025-08-30 05:11:29 25.11.20250828.dfb2f12 6.12.43 Unknown [] True
649 2025-08-29 04:05:10 25.11.20250819.2007595 6.12.42 Unknown [] False
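A quick way to narrow down what changed between the two generations is to diff their closures (a sketch; the profile paths assume the default NixOS system profile location and the generation numbers above):

```shell
# Compare the package closures of the working (649) and broken
# (650) generations, filtering for lxc-related packages.
nix store diff-closures \
  /nix/var/nix/profiles/system-649-link \
  /nix/var/nix/profiles/system-650-link | grep -i lxc
```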
lxc may have been upgraded to 6.0.5 in that timeframe, but you'll need to look at the commits and see if this is in that range. lxc: 6.0.4 -> 6.0.5 · NixOS/nixpkgs@db7c9dc · GitHub
I removed the patch because it no longer applied, assuming it landed in 6.0.5. If it's not in 6.0.5, it will need to be rebased; if it is, then perhaps there is another regression. Pull requests are accepted, but I don't have the bandwidth to continually fix these nvidia runtime issues.
I looked at the package version when nvidia IS working, and it is 6.0.5:
$ nix-store --query --requisites /run/current-system | cut -d- -f2- | sort -u | grep lxc
lxc-6.0.5
lxcfs-6.0.5
unit-lxcfs.service
So maybe it is another regression.
Manually linking the binaries to /usr/bin makes it work… I'm not really sure where the issue should be opened or addressed…
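For anyone wanting to try the same stopgap, a sketch of the linking (the source paths are illustrative; on NixOS the driver userland tools typically sit in the current system profile):

```shell
# Workaround sketch: expose the NVIDIA userland tools at the path
# libnvidia-container hardcodes. Source paths are assumptions;
# adjust to wherever your driver package actually installs them.
ln -sf /run/current-system/sw/bin/nvidia-smi /usr/bin/nvidia-smi
ln -sf /run/current-system/sw/bin/nvidia-container-cli /usr/bin/nvidia-container-cli
```

Note this is not a durable fix on NixOS, since /usr/bin is not managed by the system configuration and the links can go stale after an update.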
Seems related: I have issues starting a container on Void Linux, and the error also points to the nvidia hook. After a normal restart it's fine, but on a cold boot it can't start the container.
Here are the startup logs:
lxc llama 20260112134425.985 INFO lxccontainer - ../src/lxc/lxccontainer.c:do_lxcapi_start:959 - Set process title to [lxc monitor] /var/lib/incus/containers llama
lxc llama 20260112134425.986 INFO start - ../src/lxc/start.c:lxc_check_inherited:326 - Closed inherited fd 4
lxc llama 20260112134425.986 INFO start - ../src/lxc/start.c:lxc_check_inherited:326 - Closed inherited fd 5
lxc llama 20260112134425.986 INFO start - ../src/lxc/start.c:lxc_check_inherited:326 - Closed inherited fd 9
lxc llama 20260112134425.986 INFO lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver nop
lxc llama 20260112134425.986 INFO utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/proc/1563/exe callhook /var/lib/incus "default" "llama" start" for container "llama"
lxc llama 20260112134426.125 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:unpriv_systemd_create_scope:1498 - Running privileged, not using a systemd unit
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "reject_force_umount # comment this to allow umount -f; not recommended"
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "kexec_load errno 38"
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[246:kexec_load] action[327718:errno] arch[0]
lxc llama 20260112134426.127 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741827]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327718:errno] arch[1073741886]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "open_by_handle_at errno 38"
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[304:open_by_handle_at] action[327718:errno] arch[0]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741827]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327718:errno] arch[1073741886]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "init_module errno 38"
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[175:init_module] action[327718:errno] arch[0]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741827]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327718:errno] arch[1073741886]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "finit_module errno 38"
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[313:finit_module] action[327718:errno] arch[0]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741827]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327718:errno] arch[1073741886]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "delete_module errno 38"
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[176:delete_module] action[327718:errno] arch[0]
lxc llama 20260112134426.128 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741827]
lxc llama 20260112134426.129 INFO seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327718:errno] arch[1073741886]
lxc llama 20260112134426.129 INFO seccomp - ../src/lxc/seccomp.c:parse_config_v2:1036 - Merging compat seccomp contexts into main context
lxc llama 20260112134426.129 INFO start - ../src/lxc/start.c:lxc_init:882 - Container "llama" is initialized
lxc llama 20260112134426.130 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1669 - The monitor process uses "lxc.monitor.llama" as cgroup
lxc llama 20260112134426.325 INFO cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1777 - The container process uses "lxc.payload.llama" as inner and "lxc.payload.llama" as limit cgroup
lxc llama 20260112134426.443 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUSER
lxc llama 20260112134426.443 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWNS
lxc llama 20260112134426.443 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWPID
lxc llama 20260112134426.443 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUTS
lxc llama 20260112134426.443 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWIPC
lxc llama 20260112134426.443 INFO start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWCGROUP
lxc llama 20260112134426.499 INFO idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:176 - Caller maps host root. Writing mapping directly
lxc llama 20260112134426.500 NOTICE utils - ../src/lxc/utils.c:lxc_drop_groups:1477 - Dropped supplimentary groups
lxc llama 20260112134426.512 INFO start - ../src/lxc/start.c:do_start:1105 - Unshared CLONE_NEWNET
lxc llama 20260112134426.512 NOTICE utils - ../src/lxc/utils.c:lxc_drop_groups:1477 - Dropped supplimentary groups
lxc llama 20260112134426.512 NOTICE utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1453 - Switched to gid 0
lxc llama 20260112134426.512 NOTICE utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1462 - Switched to uid 0
lxc llama 20260112134426.737 INFO conf - ../src/lxc/conf.c:setup_utsname:683 - Set hostname to "llama"
lxc llama 20260112134426.740 INFO network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4064 - Finished setting up network devices with caller assigned names
lxc llama 20260112134426.741 INFO conf - ../src/lxc/conf.c:mount_autodev:1027 - Preparing "/dev"
lxc llama 20260112134426.742 INFO conf - ../src/lxc/conf.c:mount_autodev:1088 - Prepared "/dev"
lxc llama 20260112134426.173 INFO utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/usr/share/lxc/hooks/nvidia" for container "llama"
lxc llama 20260112134426.196 ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc llama 20260112134426.196 ERROR conf - ../src/lxc/conf.c:lxc_setup:3944 - Failed to run mount hooks
lxc llama 20260112134426.196 ERROR start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "llama"
lxc llama 20260112134426.196 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc llama 20260112134426.199 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth0223f602"
lxc llama 20260112134426.199 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc llama 20260112134426.199 ERROR start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "llama"
lxc llama 20260112134426.199 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 2203
lxc llama 20260112134426.199 INFO utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "llama" stopns" for container "llama"
lxc 20260112134426.269 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20260112134426.269 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20260112134426.269 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20260112134426.269 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc llama 20260112134426.269 INFO utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/usr/libexec/incus/incusd callhook /var/lib/incus "default" "llama" stop" for container "llama"
We've seen reports that on some distros, a run of nvidia-smi on the host is needed to get some things initialized.
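One way to automate that on Void Linux is a one-shot at boot (a sketch; /etc/rc.local is the conventional hook there, but verify it is enabled on your system):

```shell
# Add to /etc/rc.local: run nvidia-smi once at host boot so the
# driver creates its /dev nodes before any container starts.
nvidia-smi > /dev/null 2>&1 || true
```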
Thank you. Seems to have indeed solved my issues.