After upgrading from Incus 6.0 to 6.5, my containers fail to start

Hi,
I recently upgraded one of my servers to the latest Incus version and also upgraded from Ubuntu 22.04 to 24.04, and now I can't start my containers/VMs anymore. Can anyone help me figure out what is wrong?
Here is some information.

indiana@incusrv01:~$ incus config show ollama
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20240209_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20240209_07:42"
  image.type: squashfs
  image.variant: cloud
  nvidia.runtime: "true"
  volatile.base_image: 1ee628cabf20f5284b3a105bf629691071120e3bb340312ca6675a1a546c4b7d
  volatile.cloud-init.instance-id: 2763f81a-f24e-4faa-a34e-bb9eeb696cd6
  volatile.eth0.hwaddr: 00:16:3e:bb:b4:78
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 63b43525-74c2-4fa8-a5f7-c5d26e2e9996
  volatile.uuid.generation: 63b43525-74c2-4fa8-a5f7-c5d26e2e9996
devices:
  gpu:
    gputype: physical
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
description: ""
indiana@incusrv01:~$ incus start ollama
Error: Failed to run: /opt/incus/bin/incusd forkstart ollama /var/lib/incus/containers /run/incus/ollama/lxc.conf: exit status 1
Try `incus info --show-log ollama` for more info
indiana@incusrv01:~$ 
indiana@incusrv01:~$ incus info --show-log ollama
Name: ollama
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/02/10 09:12 UTC
Last Used: 2024/09/15 06:54 UTC

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| snap0 | 2024/03/20 18:12 UTC |            | NO       |
+-------+----------------------+------------+----------+

Log:

lxc ollama 20240915065425.305 WARN     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc ollama 20240915065425.305 WARN     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc ollama 20240915065425.306 WARN     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc ollama 20240915065425.306 WARN     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc ollama 20240915065425.378 ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc ollama 20240915065425.379 ERROR    conf - ../src/lxc/conf.c:lxc_setup:3940 - Failed to run mount hooks
lxc ollama 20240915065425.379 ERROR    start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "ollama"
lxc ollama 20240915065425.379 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc ollama 20240915065425.384 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth6fd8a646"
lxc ollama 20240915065425.384 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc ollama 20240915065425.384 ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "ollama"
lxc ollama 20240915065425.384 WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 5171
lxc 20240915065425.455 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240915065425.455 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

Hi all,
I have found the answer: the NVIDIA drivers were missing on the host, and running ubuntu-drivers autoinstall solved the issue.
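For anyone hitting the same thing, the fix on the host was roughly the following (a sketch assuming a standard Ubuntu driver setup; nvidia-smi is only there as a sanity check):

# Let Ubuntu select and install the recommended NVIDIA driver packages
sudo ubuntu-drivers autoinstall
# Reboot so the new kernel modules actually get loaded
sudo reboot
# After the reboot, verify that the driver is up on the host
nvidia-smi
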
Regards.

nvidia.runtime=true should have provided the necessary NVIDIA container runtime.
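For context, that key is just regular instance configuration, e.g.:

# Enable NVIDIA runtime pass-through on the instance
incus config set ollama nvidia.runtime=true
# Confirm the current value
incus config get ollama nvidia.runtime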

Does this mean there is a corner case where, when you upgrade both Ubuntu and Incus at the same time, the NVIDIA container runtime may not get updated as necessary?

The workaround, when you do not automatically get the correct NVIDIA container runtime, is to install the full set of NVIDIA drivers in the container; the instance will then pick up the libraries it requires. Of course, by doing so you are also installing kernel modules and the like that are never actually used.
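A minimal sketch of that workaround, assuming an Ubuntu container named ollama and using nvidia-driver-535 purely as an example package name:

# Inside the container: install the full driver stack to get the userspace libraries
incus exec ollama -- apt update
incus exec ollama -- apt install -y nvidia-driver-535
# The kernel modules shipped by the package are never loaded inside the container;
# only the bundled libraries (libcuda, libnvidia-ml, ...) end up being used.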

Hi Simos,
I don't think the problem is related to Incus. I realized later on that the Ubuntu 22.04 → 24.04 upgrade did not actually go well: after checking the modules, I found that the NVIDIA kernel modules were not loaded on the host. So I reinstalled the NVIDIA drivers.
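In case it helps someone else, a quick way to check this on the host is something like:

# No output here means the NVIDIA kernel modules are not loaded
lsmod | grep nvidia
# DKMS shows whether the modules were rebuilt for the new kernel
dkms status
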
Regards.