Incus 6.11 Update: Containers with (Nvidia) GPU passthrough will not start

We use Incus (LXC) containers for all our server release testing (some 20+ containers).

Under Incus 6.10, with the Nvidia GPU and the host processor (Quick Sync Video) GPU passed into the container, along with the Nvidia runtime libraries, everything worked flawlessly.

In preparation for updating our server hosts to Incus 6.11, I updated my workstation (Ubuntu 22.04).

Now, any container with the Nvidia runtime library passed into it refuses to launch.
If I remove the nvidia.* configuration statements, the container will launch, but the Nvidia GPU will not function.
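
("Remove" here just means unsetting the nvidia.* keys shown in the config further down, e.g.:)

incus config unset plex nvidia.runtime
incus config unset plex nvidia.driver.capabilities
incus config unset plex nvidia.require.cuda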

Reading the documentation implies that Nvidia-related changes in 6.11 are responsible, but I can't figure out what's actually needed.

I would appreciate some help working out which changes are required.

Steps to reproduce:

  1. Create a generic images:ubuntu/22.04 container
  2. Add the host processor GPU (QSV) and the Nvidia GPU (discrete)
  3. Attempt to launch (see the sketch and output below)
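
A rough sketch of what those steps amount to (the incus-gpu helper used below is shown further down):

# Step 1: create a stock Ubuntu 22.04 container named "plex"
incus launch images:ubuntu/22.04 plex

# Steps 2-3: add the GPUs, set the nvidia.* keys, and restart (done by the helper)
incus-gpu plex

The actual attempt: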
[chuck@lizum ~.2012]$ incus-gpu plex
Device GPUs added to plex
GPU configuration added to 'plex'
Restarting plex
Error: Failed to run: /opt/incus/bin/incusd forkstart plex /var/lib/incus/containers /run/incus/plex/lxc.conf: exit status 1
Try `incus info --show-log plex` for more info
[chuck@lizum ~.2013]$ incus info --show-log plex
Name: plex
Description: 
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2025/04/04 15:56 EDT
Last Used: 2025/04/04 15:57 EDT

Log:

lxc plex 20250404195744.224 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc plex 20250404195744.224 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3948 - Failed to run mount hooks
lxc plex 20250404195744.224 ERROR    start - ../src/lxc/start.c:do_start:1268 - Failed to setup container "plex"
lxc plex 20250404195744.224 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc plex 20250404195744.228 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth53c800f8"
lxc plex 20250404195744.228 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc plex 20250404195744.228 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "plex"
lxc plex 20250404195744.228 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 2815984
lxc 20250404195744.290 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250404195744.290 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

[chuck@lizum ~.2014]$ 

This is the entire container config:

[chuck@lizum ~.2014]$ incus config show plex
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20250402_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20250402_07:42"
  image.type: squashfs
  image.variant: default
  nvidia.driver.capabilities: all
  nvidia.require.cuda: "true"
  nvidia.runtime: "true"
  volatile.base_image: 6367014c9f0e92a6508be10660abb609e23fd0e9e03497d23f4f57679ede93ac
  volatile.cloud-init.instance-id: a3da4afe-f7fa-4cb4-ac9b-fe29f2846e92
  volatile.eth0.hwaddr: 10:66:6a:52:b3:05
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: c8c51e09-9c4c-48fe-a8ac-c42d33a08e1c
  volatile.uuid.generation: c8c51e09-9c4c-48fe-a8ac-c42d33a08e1c
devices:
  GPUs:
    gid: "110"
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
description: ""
[chuck@lizum ~.2015]$ 

The setup script for our containers is very basic (kept that way for uniformity).

This is the part of the script that adds both the Nvidia and host GPUs to the container:


Gid="$(stat -c %g /dev/dri/renderD128)"

# Add it (Both Intel and Nvidia)
incus config device add "$1" GPUs gpu gid=$Gid

# Add Nvidia runtime 
incus config set "$1" nvidia.driver.capabilities all
incus config set "$1" nvidia.require.cuda true
incus config set "$1" nvidia.runtime true

Additional info:

Existing containers which use the Nvidia GPU through a profile work as expected.
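
Those containers simply have a shared nvidia profile attached (its contents are shown after the config dump below); attaching it is just:

incus profile add plexdev nvidia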

[chuck@lizum ~.2038]$ incus config show plexdev
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20250222_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20250222_07:42"
  image.type: squashfs
  image.variant: default
  volatile.base_image: 9aab5e0a0a792348c8e8dc60e4d61aedd4360e0f89de9969c081721100f6fdcc
  volatile.cloud-init.instance-id: 6e8f49f9-c81e-4786-b9f6-3421ee1e4ad5
  volatile.eth0.host_name: veth5571e4e7
  volatile.eth0.hwaddr: 00:16:3e:c0:72:21
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 93ec99db-15bb-42e8-91cc-c7f7dad6e95b
  volatile.uuid.generation: 93ec99db-15bb-42e8-91cc-c7f7dad6e95b
devices:
  git:
    path: /git
    source: /glock/git/plex-media-server/postproc/
    type: disk
  postproc:
    path: /postproc
    source: /glock/git/plex-media-server/postproc
    type: disk
ephemeral: false
profiles:
- default
- nvidia
stateful: false
description: ""
[chuck@lizum ~.2039]$ incus shell plexdev
root@plexdev:~# nvidia-smi
Fri Apr  4 21:13:38 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    Off |   00000000:01:00.0  On |                  Off |
| 30%   43C    P8              8W /   70W |    1815MiB /  16380MiB |     16%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@plexdev:~# 
[chuck@lizum ~.2016]$ incus profile show nvidia
config:
  nvidia.driver.capabilities: all
  nvidia.require.cuda: "true"
  nvidia.runtime: "true"
description: Nvidia GPU profile
devices:
  gpus:
    gid: "110"
    type: gpu
name: nvidia
used_by:
- /1.0/instances/plexdev
project: default
[chuck@lizum ~.2017]$ 
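
For anyone wanting to replicate that profile, it can be recreated with the standard profile commands, roughly (the gid matches the render group on this host):

incus profile create nvidia
incus profile set nvidia nvidia.driver.capabilities=all
incus profile set nvidia nvidia.require.cuda=true
incus profile set nvidia nvidia.runtime=true
incus profile device add nvidia gpus gpu gid=110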

Greetings, ChuckY. I’m in a similar situation due to the 6.11 update.
My containers with nvidia.runtime: "true" refuse to start; if I change the value to "false" they start, but obviously without using the graphics card. I've also tried the profile-based approach from your recent post, but it doesn't work for me. I'd appreciate any guidance.

Hey, I ran into this issue yesterday, and I'm still experiencing it after updating to incus/noble,now 1:6.11-ubuntu24.04-202504052021 amd64 and rebooting. It appears to be the same issue on this setup.

Configuration:

gage@r730:~$ incus config show openwebui1 --expanded
architecture: x86_64
config:
  environment.ANONYMIZED_TELEMETRY: "false"
  environment.DO_NOT_TRACK: "true"
  environment.DOCKER: "true"
  environment.ENV: prod
  environment.GPG_KEY: A035C8C19219BA821ECEA86B64E628F8D684696D
  environment.HF_HOME: /app/backend/data/cache/embedding/models
  environment.HOME: /root
  environment.LANG: C.UTF-8
  environment.OLLAMA_BASE_URL: /ollama
  environment.PATH: /usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  environment.PORT: "8080"
  environment.PYTHON_SHA256: 2a9920c7a0cd236de33644ed980a13cbbc21058bfdc528febb6081575ed73be3
  environment.PYTHON_VERSION: 3.11.11
  environment.RAG_EMBEDDING_MODEL: sentence-transformers/all-MiniLM-L6-v2
  environment.SCARF_NO_ANALYTICS: "true"
  environment.SENTENCE_TRANSFORMERS_HOME: /app/backend/data/cache/embedding/models
  environment.TERM: xterm
  environment.TIKTOKEN_CACHE_DIR: /app/backend/data/cache/tiktoken
  environment.TIKTOKEN_ENCODING_NAME: cl100k_base
  environment.USE_CUDA_DOCKER: "false"
  environment.USE_CUDA_DOCKER_VER: cu121
  environment.USE_EMBEDDING_MODEL_DOCKER: sentence-transformers/all-MiniLM-L6-v2
  environment.USE_OLLAMA_DOCKER: "true"
  environment.WEBUI_BUILD_VERSION: 04799f1f95f958674d35ba4854ef62754a4d332e
  environment.WHISPER_MODEL: base
  environment.WHISPER_MODEL_DIR: /app/backend/data/cache/whisper/models
  image.architecture: x86_64
  image.description: ghcr.io/open-webui/open-webui (OCI)
  image.id: open-webui/open-webui:ollama
  image.type: oci
  limits.cpu: "30"
  limits.memory: 762GiB
  nvidia.driver.capabilities: all
  nvidia.require.cuda: "true"
  nvidia.runtime: "true"
  oci.cwd: /app/backend
  oci.entrypoint: bash start.sh
  oci.gid: "0"
  oci.uid: "0"
  volatile.base_image: db1cbb159ee2074b3beb00bd5f958660c4e1cc27dce66b355c7b1dfaf076956b
  volatile.cloud-init.instance-id: 8a40c341-5d07-4da1-af1e-b074ef20dae3
  volatile.container.oci: "true"
  volatile.eth30.hwaddr: 10:66:6a:8d:d5:38
  volatile.eth30.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: c06f3e08-b9c0-490e-8145-3f753e14c8c8
  volatile.uuid.generation: c06f3e08-b9c0-490e-8145-3f753e14c8c8
devices:
  data:
    path: /app/backend/data
    pool: default
    source: openwebui1-data
    type: disk
  eth30:
    mtu: "1500"
    nictype: bridged
    parent: br10
    type: nic
    vlan: "30"
  gpu:
    type: gpu
  ollama:
    path: /root/.ollama
    pool: default
    source: openwebui1-ollama
    type: disk
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- defaultContainer
- vlan30
stateful: false
description: ""

Output of incus info --show-log

gage@r730:~$ incus info --show-log openwebui1
Name: openwebui1
Description: 
Status: STOPPED
Type: container (application)
Architecture: x86_64
Created: 2025/04/05 05:39 UTC
Last Used: 2025/04/06 01:23 UTC

Log:

lxc openwebui1 20250406012341.173 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc openwebui1 20250406012341.173 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3948 - Failed to run mount hooks
lxc openwebui1 20250406012341.173 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "openwebui1"
lxc openwebui1 20250406012341.173 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc openwebui1 20250406012341.184 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth28643b19"
lxc openwebui1 20250406012341.184 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc openwebui1 20250406012341.184 ERROR    start - ../src/lxc/start.c:__lxc_start:2119 - Failed to spawn container "openwebui1"
lxc openwebui1 20250406012341.184 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 5455
lxc 20250406012341.286 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250406012341.286 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

^ That’s the problem.

In the past we were basically overriding the PATH variable for you; now we respect it.
Because that variable doesn't include /opt/incus/bin/, where the nvidia tools are located, your container doesn't see them and startup fails.

Either add :/opt/incus/bin to the variable or unset it altogether; either way, that should fix it.
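
Something along these lines should do it for your openwebui1 container (a sketch; the PATH below is just the value from your config with /opt/incus/bin appended):

# Option 1: append the Incus tools directory to the PATH override
incus config set openwebui1 environment.PATH="/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/incus/bin"

# Option 2: drop the PATH override entirely
incus config unset openwebui1 environment.PATH

# Then start the container again
incus start openwebui1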

That did it, thank you!

@stgraber Is there going to be a new release with this patch? Or is it possible to downgrade to a previous working version (on NixOS)?

We're not currently planning to do a release just for that single commit, as we normally coordinate point releases across all our projects.

We've been encouraging distribution maintainers to pull that one extra commit into their packages. Maybe @adamcstephens can help get that into the Nix one?

It will be a couple of days before I can get to adding a patch to the NixOS package, and then another few days after that for it to make it into a release branch. I'd be happy to review a PR if someone else has the time. Otherwise, it's relatively simple to use overrideAttrs to add the patch to the package locally in your config.

@adamcstephens any luck with this?