We use Incus (LXC) containers for all of our server release testing (20+ containers).
Under Incus 6.10, with the Nvidia GPU and the host processor's GPU (Quick Sync Video) passed into the container, plus the Nvidia runtime libraries, everything worked flawlessly.
In preparation for updating our server hosts to Incus 6.11, I first updated my workstation (Ubuntu 22.04).
Now any container with the Nvidia runtime passed into it refuses to launch.
If I remove the nvidia.driver.xxxx statements, the container launches, but the Nvidia GPU does not function.
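For reference, "removing" here means unsetting the keys, roughly as follows (plex is our test container):

incus config unset plex nvidia.driver.capabilities
incus config unset plex nvidia.require.cuda
incus config unset plex nvidia.runtime
incus restart plex   # starts fine, but no Nvidia libraries inside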
The documentation implies that Nvidia-related changes in 6.11 are responsible, but I can't figure out what's needed.
I would appreciate help working out which changes are required.
- Create a generic images:ubuntu/22.04 container
- Add the host processor GPU (QSV) and the discrete Nvidia GPU
- Attempt to launch (the equivalent commands are sketched below)
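Spelled out as plain incus commands, this is roughly what our incus-gpu wrapper does (the relevant script fragment is at the bottom of this post):

incus launch images:ubuntu/22.04 plex
incus config device add plex GPUs gpu gid="$(stat -c %g /dev/dri/renderD128)"
incus config set plex nvidia.driver.capabilities all
incus config set plex nvidia.require.cuda true
incus config set plex nvidia.runtime true
incus restart plex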
[chuck@lizum ~.2012]$ incus-gpu plex
Device GPUs added to plex
GPU configuration added to 'plex'
Restarting plex
Error: Failed to run: /opt/incus/bin/incusd forkstart plex /var/lib/incus/containers /run/incus/plex/lxc.conf: exit status 1
Try `incus info --show-log plex` for more info
[chuck@lizum ~.2013]$ incus info --show-log plex
Name: plex
Description:
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2025/04/04 15:56 EDT
Last Used: 2025/04/04 15:57 EDT
Log:
lxc plex 20250404195744.224 ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc plex 20250404195744.224 ERROR conf - ../src/lxc/conf.c:lxc_setup:3948 - Failed to run mount hooks
lxc plex 20250404195744.224 ERROR start - ../src/lxc/start.c:do_start:1268 - Failed to setup container "plex"
lxc plex 20250404195744.224 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc plex 20250404195744.228 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth53c800f8"
lxc plex 20250404195744.228 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc plex 20250404195744.228 ERROR start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "plex"
lxc plex 20250404195744.228 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 2815984
lxc 20250404195744.290 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250404195744.290 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
[chuck@lizum ~.2014]$
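The "Failed to run mount hooks" line appears to point at the NVIDIA LXC hook, which (as I understand it) calls nvidia-container-cli on the host. Running that tool directly is one way I've tried to check whether the host-side tooling itself is still healthy after the upgrade:

# On the host; if this fails, the container's NVIDIA mount hook will fail too
nvidia-container-cli info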
This is the entire container config:
[chuck@lizum ~.2014]$ incus config show plex
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20250402_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20250402_07:42"
  image.type: squashfs
  image.variant: default
  nvidia.driver.capabilities: all
  nvidia.require.cuda: "true"
  nvidia.runtime: "true"
  volatile.base_image: 6367014c9f0e92a6508be10660abb609e23fd0e9e03497d23f4f57679ede93ac
  volatile.cloud-init.instance-id: a3da4afe-f7fa-4cb4-ac9b-fe29f2846e92
  volatile.eth0.hwaddr: 10:66:6a:52:b3:05
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: c8c51e09-9c4c-48fe-a8ac-c42d33a08e1c
  volatile.uuid.generation: c8c51e09-9c4c-48fe-a8ac-c42d33a08e1c
devices:
  GPUs:
    gid: "110"
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
description: ""
[chuck@lizum ~.2015]$
The setup script for our containers is deliberately basic (for uniformity).
This is the part of the script that adds both the Nvidia and host GPUs to the container:
# Group that owns the render node on the host
Gid="$(stat -c %g /dev/dri/renderD128)"
# Add the GPU device (covers both the Intel and Nvidia GPUs)
incus config device add "$1" GPUs gpu gid="$Gid"
# Enable the Nvidia runtime
incus config set "$1" nvidia.driver.capabilities all
incus config set "$1" nvidia.require.cuda true
incus config set "$1" nvidia.runtime true
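For what it's worth, under 6.10 a check like the following confirmed both GPUs were usable inside the container (illustrative commands, not part of the script):

incus exec plex -- ls -l /dev/dri    # host render nodes visible
incus exec plex -- nvidia-smi        # Nvidia runtime libraries mapped in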