Container fails to start after adding `nvidia.runtime=true`

Hello, I have seen a few posts from people whose existing containers with nvidia.runtime=true stopped working after an upgrade.

I believe my containers are suffering from a similar problem now, but the error I am getting is strange enough that I wonder whether something else is going on in my case.

Once I set nvidia.runtime=true, the container fails to start with: Failed getting os.UserHomeDir(): $HOME is not defined

Currently, I am on

Name            : nvidia-container-toolkit
Version         : 1.17.8-1

For simplicity, I am showing my test with a new container, but my existing containers suffer from this problem as well.

❯ incus launch images:archlinux/current/amd64 test-gpu
Launching test-gpu

❯ incus exec test-gpu -- sh                                                                
sh-5.3# echo $HOME
/root
sh-5.3# exit
exit

:~ took 22s
❯ incus config set test-gpu nvidia.runtime=true                                            

:~
❯ incus restart test-gpu                                                                   
Error: Failed to run: /usr/bin/incusd forkstart test-gpu /var/lib/incus/containers /run/incus/test-gpu/lxc.conf: exit status 1 (Failed getting os.UserHomeDir(): $HOME is not defined)
Try `incus info --show-log test-gpu` for more info

❯ incus info --show-log test-gpu
Name: test-gpu
Description:
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2025/09/06 16:04 PDT
Last Used: 2025/09/06 16:11 PDT

Log:

lxc test-gpu 20250906231122.511 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc test-gpu 20250906231122.511 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3933 - Failed to run mount hooks
lxc test-gpu 20250906231122.511 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "test-gpu"
lxc test-gpu 20250906231122.511 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc test-gpu 20250906231122.513 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth980bf50a"
lxc test-gpu 20250906231122.513 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:832 - Received container state "ABORTING" instead of "RUNNING"
lxc test-gpu 20250906231122.513 ERROR    start - ../src/lxc/start.c:__lxc_start:2119 - Failed to spawn container "test-gpu"
lxc test-gpu 20250906231122.513 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 244061
lxc 20250906231122.633 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250906231122.633 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250906231122.633 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20250906231122.633 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

Also, please note that my existing containers that run Xorg work again once I set nvidia.runtime=false.

Can you also share which OS and Incus version this is running on?

Take a look at this issue: "'Failed getting os.UserHomeDir(): $HOME is not defined' when starting proxy device" (Issue #2439, lxc/incus, GitHub). Although the root cause is different, it might be related to yours. Try the workaround provided there and report back on whether it solves the problem.
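
For context, that message is the error Go's os.UserHomeDir() returns on Linux when the HOME environment variable is unset in the calling process, so it may be worth checking what environment the daemon runs with. Here is a minimal shell sketch, assuming /proc is available and that the daemon process is named incusd (as the forkstart line in your output suggests). has_home_var is a hypothetical helper, not part of Incus:

```shell
# Hypothetical helper (not part of Incus): succeeds if the given
# NUL-separated environment file (e.g. /proc/<pid>/environ) defines HOME.
has_home_var() {
    tr '\0' '\n' < "$1" | grep -q '^HOME='
}

# Assumed usage, run as root on the host. The process name "incusd" is
# taken from the forkstart error above; pgrep -o picks the oldest match.
# has_home_var "/proc/$(pgrep -o incusd)/environ" \
#     && echo "HOME is set for incusd" \
#     || echo "HOME is not set for incusd"
```

If HOME turns out to be missing there, that would be consistent with the error, though it would not by itself confirm the cause.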

Apologies for the late response, and thank you for the reference. I have bookmarked it for follow-up if needed.

Since this problem came up, I have been running containers with nvidia.runtime: false and installing the drivers manually.

Today, I finally got some time to look into this problem. I removed the manually installed NVIDIA drivers and set nvidia.runtime: true again, and everything works.

I am not sure why it broke; maybe the latest updates fixed the issue.

If the issue happens again, I’ll follow your link and post an update here.

Thanks again.