Hello, I have seen a few posts in which people report that existing containers with nvidia.runtime=true stopped working after an upgrade.
I believe my containers now suffer from a similar problem, but the error I am getting is very strange, and it made me wonder whether something else is going on in my case.
As soon as I set nvidia.runtime=true, I get: Failed getting os.UserHomeDir(): $HOME is not defined
Currently, I am on:
Name : nvidia-container-toolkit
Version : 1.17.8-1
For simplicity, here is a test with a new container (my existing containers have the same problem):
❯ incus launch images:archlinux/current/amd64 test-gpu
Launching test-gpu
❯ incus exec test-gpu -- sh
sh-5.3# echo $HOME
/root
sh-5.3# exit
exit
❯ incus config set test-gpu nvidia.runtime=true
❯ incus restart test-gpu
Error: Failed to run: /usr/bin/incusd forkstart test-gpu /var/lib/incus/containers /run/incus/test-gpu/lxc.conf: exit status 1 (Failed getting os.UserHomeDir(): $HOME is not defined)
Try `incus info --show-log test-gpu` for more info
❯ incus info --show-log test-gpu
Name: test-gpu
Description:
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2025/09/06 16:04 PDT
Last Used: 2025/09/06 16:11 PDT
Log:
lxc test-gpu 20250906231122.511 ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc test-gpu 20250906231122.511 ERROR conf - ../src/lxc/conf.c:lxc_setup:3933 - Failed to run mount hooks
lxc test-gpu 20250906231122.511 ERROR start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "test-gpu"
lxc test-gpu 20250906231122.511 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc test-gpu 20250906231122.513 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth980bf50a"
lxc test-gpu 20250906231122.513 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:832 - Received container state "ABORTING" instead of "RUNNING"
lxc test-gpu 20250906231122.513 ERROR start - ../src/lxc/start.c:__lxc_start:2119 - Failed to spawn container "test-gpu"
lxc test-gpu 20250906231122.513 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 244061
lxc 20250906231122.633 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250906231122.633 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250906231122.633 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20250906231122.633 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
Also, please note that my existing containers that run Xorg work once I set nvidia.runtime=false.