Don't update nvidia-container-toolkit to 1.17.7! (was: Can't start containers using nvidia.runtime: "true")

Hello, after an OS update, I’m getting this error:

lxc ollama 20250522124246.594 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc ollama 20250522124246.594 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3944 - Failed to run mount hooks
lxc ollama 20250522124246.594 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "ollama"
lxc ollama 20250522124246.594 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc ollama 20250522124246.600 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth3d95dde7"
lxc ollama 20250522124246.600 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc ollama 20250522124246.600 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "ollama"
lxc ollama 20250522124246.600 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 80014
lxc 20250522124246.679 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250522124246.679 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

This happens on one of my computers, running Rocky 9.5 and Incus 6.12.

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060        Off |   00000000:02:00.0 Off |                  N/A |
| 42%   32C    P8              9W /  184W |      14MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2060        Off |   00000000:03:00.0 Off |                  N/A |
| 29%   38C    P8              9W /  184W |      10MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1793      G   /usr/libexec/Xorg                        11MiB |
+-----------------------------------------------------------------------------------------+

The NVIDIA drivers and container toolkit are installed. It was working fine before the update.

nvidia-container-runtime is 3.14.0…
lxc package is 6.0.3.

I’ve seen some posts about similar issues but no solution seems to work.

What could be happening?

Can you post your container config? incus config show <container>

You might need to add a few environment variables:

  environment.NVIDIA_DRIVER_CAPABILITIES: compute,utility
  environment.NVIDIA_VISIBLE_DEVICES: all
  environment.OLLAMA_HOST: 0.0.0.0:11434

These are the ones I have in my ollama container to get it working, alongside nvidia.runtime: true
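
In case it helps, here is a minimal Python sketch that builds the `incus config set` commands for the keys listed above. The instance name "ollama" and the helper name `incus_set_commands` are only placeholders for illustration; the script prints the commands and does not run them, so adjust the values to your setup before using any of them.

```python
import shlex

# Keys and values copied from the post above; nothing here is specific to a given host.
GPU_CONFIG = {
    "nvidia.runtime": "true",
    "environment.NVIDIA_DRIVER_CAPABILITIES": "compute,utility",
    "environment.NVIDIA_VISIBLE_DEVICES": "all",
    "environment.OLLAMA_HOST": "0.0.0.0:11434",
}


def incus_set_commands(instance, config=None):
    """Return one `incus config set <instance> <key>=<value>` argv list per key.

    `instance` is a hypothetical container name; the commands are only built,
    never executed.
    """
    config = GPU_CONFIG if config is None else config
    return [
        ["incus", "config", "set", instance, f"{key}={value}"]
        for key, value in config.items()
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for argv in incus_set_commands("ollama"):
        print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to try on any machine, including one without Incus installed.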

Hello! Thank you.

It is not a container config issue, it is an NVIDIA bug! nvidia-container-toolkit 1.17.7 is broken.

see:

So you have to revert back to 1.17.6, or wait for the next release.
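
A quick way to tell whether a host is on the affected release is to compare the installed package version against 1.17.7. The following is a rough Python sketch, assuming an RPM-based system such as Rocky; the set `BROKEN_VERSIONS` only reflects what is reported in this thread, and the function names are made up for illustration.

```python
import subprocess

# Versions reported as broken in this thread (an assumption, not an official list).
BROKEN_VERSIONS = {"1.17.7"}


def is_affected(version):
    """Return True if `version` is a reported-broken release.

    Accepts a bare version ("1.17.7") or one with an RPM release
    suffix ("1.17.7-1"); only the part before the first "-" is compared.
    """
    upstream = version.strip().split("-", 1)[0]
    return upstream in BROKEN_VERSIONS


def installed_version(package="nvidia-container-toolkit"):
    """Ask rpm for the installed version; return None if unavailable."""
    try:
        result = subprocess.run(
            ["rpm", "-q", "--queryformat", "%{VERSION}", package],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # rpm is not present on this system.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nvidia-container-toolkit not found via rpm")
    elif is_affected(version):
        print(f"{version}: reported broken, consider reverting to 1.17.6")
    else:
        print(f"{version}: not in the reported-broken list")
```

The check is kept separate from the rpm query so the comparison can be reused with a version string obtained some other way.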

Best regards

You are my hero! I’ve spent a whole day trying to downgrade this and that, but nothing worked until I saw your post. Thanks!

I’m glad to hear it. Good luck!