Migrating from LXD 6.1?

EDIT: using --ignore-version-check. I will try tonight and see

Hello. Rocky 9.4 here. LXD was installed via snap; I thought it was version 5.21.2, but it appears to be 6.1 (lxd version returns this).

I installed Incus using:

dnf -y install epel-release
dnf config-manager --enable crb
dnf copr enable neil/incus
dnf install incus
dnf install incus-tools

lxd-to-incus says it is not possible to migrate because of the version.

How can I do it?

lxd-to-incus has a flag, --ignore-version-check, which ignores the version check.
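
Concretely, the invocation (shown in full in the test session below) is:

sudo lxd-to-incus --ignore-version-check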

When you ignore the version check, you enter uncharted territory.
Let’s chart the territory.

We launch a test VM, set up LXD in it, create a few containers, refresh LXD to 6.1, then run lxd-to-incus with --ignore-version-check.

Well, it worked for me. The system migrated to Incus could list the containers.
Of course, I do not know what side effects could appear later.
Stéphane could shed some light here.

$ incus launch images:ubuntu/24.04/cloud lxd-to-incus --vm
Launching lxd-to-incus
$ incus ubuntu lxd-to-incus
Error: Instance is not running
$ incus ubuntu lxd-to-incus
Error: VM agent isn't currently running
$ incus ubuntu lxd-to-incus
Error: VM agent isn't currently running
$ incus ubuntu lxd-to-incus
sudo: unknown user ubuntu
sudo: error initializing audit plugin sudoers_audit
$ incus ubuntu lxd-to-incus
sudo: unknown user ubuntu
sudo: error initializing audit plugin sudoers_audit
$ incus ubuntu lxd-to-incus
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@lxd-to-incus:~$ sudo apt install snapd
...
ubuntu@lxd-to-incus:~$ snap info lxd
...
channels:
  5.21/stable:      5.21.2-22f93f4 2024-08-22 (29948) 109MB -   #### DEFAULT
...
ubuntu@lxd-to-incus:~$ sudo snap install lxd
lxd (5.21/stable) 5.21.2-22f93f4 from Canonical✓ installed
ubuntu@lxd-to-incus:~$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (ceph, dir, lvm, powerflex, zfs, btrfs) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GiB of the new loop device (1GiB minimum) [default=5GiB]: 3
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
ubuntu@lxd-to-incus:~$ lxc launch ubuntu:24.04 mycontainer
Creating mycontainer
Starting mycontainer                        
ubuntu@lxd-to-incus:~$ lxc launch images:alpine/edge myalpine
Creating myalpine
Starting myalpine                           
ubuntu@lxd-to-incus:~$ lxc list
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
|    NAME     |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| myalpine    | RUNNING | 10.103.93.250 (eth0) | fd42:e979:169b:69d1:216:3eff:fe25:2086 (eth0) | CONTAINER | 0         |
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| mycontainer | RUNNING | 10.103.93.216 (eth0) | fd42:e979:169b:69d1:216:3eff:fe21:3519 (eth0) | CONTAINER | 0         |
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
ubuntu@lxd-to-incus:~$ sudo apt install incus
...
ubuntu@lxd-to-incus:~$ incus --version
6.0.0
ubuntu@lxd-to-incus:~$ sudo apt install incus-tools
...
ubuntu@lxd-to-incus:~$ sudo lxd-to-incus 
=> Looking for source server
==> Detected: snap package
=> Looking for target server
==> Detected: systemd
=> Connecting to source server
=> Connecting to the target server
=> Checking server versions
==> Source version: 5.21.2
==> Target version: 6.0.0
=> Validating version compatibility
=> Checking that the source server isn't empty
=> Checking that the target server is empty
=> Validating source server configuration

Source server uses obsolete features:
 - Required command "zfs" is missing for storage pool "default"
Error: Source server is using incompatible configuration
ubuntu@lxd-to-incus:~$ sudo apt install zfsutils-linux
...
ubuntu@lxd-to-incus:~$ sudo lxd-to-incus 
=> Looking for source server
==> Detected: snap package
=> Looking for target server
==> Detected: systemd
=> Connecting to source server
=> Connecting to the target server
=> Checking server versions
==> Source version: 5.21.2
==> Target version: 6.0.0
=> Validating version compatibility
=> Checking that the source server isn't empty
=> Checking that the target server is empty
=> Validating source server configuration

The migration is now ready to proceed.
At this point, the source server and all its instances will be stopped.
Instances will come back online once the migration is complete.
Proceed with the migration? [default=no]: no
ubuntu@lxd-to-incus:~$ lxd --version
5.21.2 LTS
ubuntu@lxd-to-incus:~$ sudo snap refresh --channel latest/stable lxd
lxd 6.1-efad198 from Canonical✓ refreshed
ubuntu@lxd-to-incus:~$ sudo lxd-to-incus 
=> Looking for source server
==> Detected: snap package
=> Looking for target server
==> Detected: systemd
=> Connecting to source server
=> Connecting to the target server
=> Checking server versions
==> Source version: 6.1
==> Target version: 6.0.0
=> Validating version compatibility
Error: LXD version is newer than maximum version "5.21.99"
ubuntu@lxd-to-incus:~$ sudo lxd-to-incus --ignore-version-check
=> Looking for source server
==> Detected: snap package
=> Looking for target server
==> Detected: systemd
=> Connecting to source server
=> Connecting to the target server
=> Checking server versions
==> Source version: 6.1
==> Target version: 6.0.0
=> Validating version compatibility
==> WARNING: User asked to bypass version check
=> Checking that the source server isn't empty
=> Checking that the target server is empty
=> Validating source server configuration

The migration is now ready to proceed.
At this point, the source server and all its instances will be stopped.
Instances will come back online once the migration is complete.
Proceed with the migration? [default=no]: yes
=> Stopping the source server
=> Stopping the target server
=> Wiping the target server
=> Migrating the data
=> Migrating database
=> Writing database patch
=> Cleaning up target paths
=> Starting the target server
=> Checking the target server
Uninstall the LXD package? [default=no]: yes
=> Uninstalling the source server
ubuntu@lxd-to-incus:~$ sudo incus list
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
|    NAME     |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| myalpine    | RUNNING | 10.103.93.250 (eth0) | fd42:e979:169b:69d1:216:3eff:fe25:2086 (eth0) | CONTAINER | 0         |
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| mycontainer | RUNNING | 10.103.93.216 (eth0) | fd42:e979:169b:69d1:216:3eff:fe21:3519 (eth0) | CONTAINER | 0         |
+-------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
ubuntu@lxd-to-incus:~$ 

Thank you Simos!

I will try tonight and I'll report the result here.

Hello! I migrated and it seems OK. Something weird is happening with my uid/gid mapping: only my privileged containers are running; the others cannot start.

incus start u1

Error: Failed to run: /usr/libexec/incus/incusd forkstart u1 /var/lib/incus/containers /run/incus/u1/lxc.conf: exit status 1
Try `incus info --show-log u1` for more info

incus info --show-log u1

Name: u1
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/02/07 12:47 CET
Last Used: 2024/08/28 23:21 CEST

Log:

lxc u1 20240828212102.787 ERROR    conf - conf.c:lxc_map_ids:3668 - newuidmap failed to write mapping "newuidmap: uid range [1000-1001) -> [1000-1001) not allowed": newuidmap 37066 0 1000000 1000 1000 1000 1 1001 1001001 999998999
lxc u1 20240828212102.787 ERROR    start - start.c:lxc_spawn:1791 - Failed to set up id mapping.
lxc u1 20240828212102.787 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc u1 20240828212102.788 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "u1"
lxc u1 20240828212102.788 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 37066
lxc 20240828212102.836 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response

cat /etc/subuid

miguel:100000:65536
root:1000000:1000000000

cat /etc/subgid

miguel:100000:65536
root:1000000:1000000000

I’ve tried with subuid and subgid containing only the root line, but the same thing happens. I restarted the incus daemon after every subuid/subgid change.

It was working in LXD without the root line in subuid/subgid; I added it because of this: Incus: No uid/gid allocation configured - #11 by stgraber

But it does not fix it for me.
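
For reference, that newuidmap error usually means host uid/gid 1000 itself is not delegated to root. With raw.idmap: both 1000 1000 in the container config, the usual fix (my own assumption about the cause, not necessarily what the linked post describes) looks like this:

# allow root (the incus daemon) to map host uid/gid 1000 in addition to the big range
echo "root:1000:1" | sudo tee -a /etc/subuid /etc/subgid

sudo systemctl restart incus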

EDIT: I've found this

Containers are now starting. I’m too tired to test if they work OK now :)
Let’s see tomorrow.


Hello again. I have some Docker containers inside my Incus containers. Some of them are running OK, but others aren't. I'm receiving this message when running docker compose up:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: unable to join session keyring: unable to create session key: disk quota exceeded: unknown

It is fixed using

echo 200000 | sudo tee /proc/sys/kernel/keys/maxkeys

I’ve faced this problem before. It is strange that it was working with LXD with a maxkeys limit of 200!
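
To make that survive a reboot, I would put it in a sysctl drop-in on the host (the file name is my own choice):

# hypothetical file name; persists kernel.keys.maxkeys across reboots
echo "kernel.keys.maxkeys = 200000" | sudo tee /etc/sysctl.d/99-keys.conf
sudo sysctl --system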

The other problem I’m facing happens in only one container. It runs Rocky 9.4. The Docker daemon doesn't run, and dockerd --debug returns:

dockerd --debug

INFO[2024-08-29T15:45:55.524465674Z] Starting up                                  
DEBU[2024-08-29T15:45:55.524958975Z] Listener created for HTTP on unix (/var/run/docker.sock) 
DEBU[2024-08-29T15:45:55.540630308Z] Golang's threads limit set to 922050         
DEBU[2024-08-29T15:45:55.540992017Z] metrics API listening on /var/run/docker/metrics.sock 
DEBU[2024-08-29T15:45:55.546536378Z] Using default logging driver json-file       
DEBU[2024-08-29T15:45:55.546632353Z] No quota support for local volumes in /var/lib/docker/volumes: Filesystem does not support, or has not enabled quotas 
DEBU[2024-08-29T15:45:55.546719068Z] processing event stream                       module=libcontainerd namespace=plugins.moby
INFO[2024-08-29T15:45:55.557498528Z] [graphdriver] trying configured driver: fuse-overlayfs 
DEBU[2024-08-29T15:45:55.557555432Z] Initialized graph driver fuse-overlayfs      
DEBU[2024-08-29T15:45:55.568921836Z] Max Concurrent Downloads: 3                  
DEBU[2024-08-29T15:45:55.568937583Z] Max Concurrent Uploads: 5                    
DEBU[2024-08-29T15:45:55.568943050Z] Max Download Attempts: 5                     
INFO[2024-08-29T15:45:55.568955397Z] Loading containers: start.                   
DEBU[2024-08-29T15:45:55.569145589Z] processing event stream                       module=libcontainerd namespace=moby
DEBU[2024-08-29T15:45:55.570405947Z] loaded container                              container=ab07a9bd5e87c32b1081d15bdae564366185075850ac872644a7e98b1a304629 paused=false running=false
DEBU[2024-08-29T15:45:55.570409096Z] loaded container                              container=36c080f1cbabc749893086dc65a9c9b4179abd8cf2cb127ca41eef49c3b79b33 paused=false running=false
DEBU[2024-08-29T15:45:55.570488080Z] loaded container                              container=da2937c0041a0e3ccfbc10bed871d33134bf561e7c95069dc4634a5b175dbd54 paused=false running=false
DEBU[2024-08-29T15:45:55.593659476Z] restoring container                           container=da2937c0041a0e3ccfbc10bed871d33134bf561e7c95069dc4634a5b175dbd54 paused=false restarting=false running=false
DEBU[2024-08-29T15:45:55.594235335Z] done restoring container                      container=da2937c0041a0e3ccfbc10bed871d33134bf561e7c95069dc4634a5b175dbd54 paused=false restarting=false running=false
DEBU[2024-08-29T15:45:55.594771909Z] restoring container                           container=ab07a9bd5e87c32b1081d15bdae564366185075850ac872644a7e98b1a304629 paused=false restarting=false running=false
DEBU[2024-08-29T15:45:55.595084026Z] done restoring container                      container=ab07a9bd5e87c32b1081d15bdae564366185075850ac872644a7e98b1a304629 paused=false restarting=false running=false
DEBU[2024-08-29T15:45:55.596864311Z] restoring container                           container=36c080f1cbabc749893086dc65a9c9b4179abd8cf2cb127ca41eef49c3b79b33 paused=false restarting=false running=false
DEBU[2024-08-29T15:45:55.597158919Z] done restoring container                      container=36c080f1cbabc749893086dc65a9c9b4179abd8cf2cb127ca41eef49c3b79b33 paused=false restarting=false running=false
DEBU[2024-08-29T15:45:55.597195554Z] Option DefaultDriver: bridge                 
DEBU[2024-08-29T15:45:55.597208991Z] Option DefaultNetwork: bridge                
DEBU[2024-08-29T15:45:55.597215125Z] Network Control Plane MTU: 1500              
WARN[2024-08-29T15:45:55.598666794Z] Running modprobe bridge br_netfilter failed with message: modprobe: WARNING: Module bridge not found in directory /lib/modules/5.14.0-427.31.1.el9_4.x86_64
modprobe: WARNING: Module br_netfilter not found in directory /lib/modules/5.14.0-427.31.1.el9_4.x86_64
, error: exit status 1 
INFO[2024-08-29T15:45:55.601894278Z] unable to detect if iptables supports xlock: 'iptables --wait -L -n': `modprobe: FATAL: Module ip_tables not found in directory /lib/modules/5.14.0-427.31.1.el9_4.x86_64
iptables v1.8.10 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.`  error="exit status 3"
DEBU[2024-08-29T15:45:55.602002634Z] /usr/sbin/iptables, [-t filter -C FORWARD -j DOCKER-ISOLATION] 
DEBU[2024-08-29T15:45:55.603910782Z] /usr/sbin/iptables, [-t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[2024-08-29T15:45:55.607944359Z] /usr/sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER] 
DEBU[2024-08-29T15:45:55.611988888Z] /usr/sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[2024-08-29T15:45:55.615973541Z] /usr/sbin/iptables, [-t nat -D PREROUTING]   
DEBU[2024-08-29T15:45:55.617706497Z] /usr/sbin/iptables, [-t nat -D OUTPUT]       
DEBU[2024-08-29T15:45:55.619462535Z] /usr/sbin/iptables, [-t nat -F DOCKER]       
DEBU[2024-08-29T15:45:55.621272519Z] /usr/sbin/iptables, [-t nat -X DOCKER]       
DEBU[2024-08-29T15:45:55.623076557Z] /usr/sbin/iptables, [-t filter -F DOCKER]    
DEBU[2024-08-29T15:45:55.624820471Z] /usr/sbin/iptables, [-t filter -X DOCKER]    
DEBU[2024-08-29T15:45:55.626648527Z] /usr/sbin/iptables, [-t filter -F DOCKER-ISOLATION-STAGE-1] 
DEBU[2024-08-29T15:45:55.628377140Z] /usr/sbin/iptables, [-t filter -X DOCKER-ISOLATION-STAGE-1] 
DEBU[2024-08-29T15:45:55.630208887Z] /usr/sbin/iptables, [-t filter -F DOCKER-ISOLATION-STAGE-2] 
DEBU[2024-08-29T15:45:55.632041867Z] /usr/sbin/iptables, [-t filter -X DOCKER-ISOLATION-STAGE-2] 
DEBU[2024-08-29T15:45:55.633830406Z] /usr/sbin/iptables, [-t filter -F DOCKER-ISOLATION] 
DEBU[2024-08-29T15:45:55.635569401Z] /usr/sbin/iptables, [-t filter -X DOCKER-ISOLATION] 
DEBU[2024-08-29T15:45:55.637321391Z] /usr/sbin/iptables, [-t nat -n -L DOCKER]    
DEBU[2024-08-29T15:45:55.639065114Z] /usr/sbin/iptables, [-t nat -N DOCKER]       
DEBU[2024-08-29T15:45:55.641000152Z] daemon configured with a 15 seconds minimum shutdown timeout 
DEBU[2024-08-29T15:45:55.641027274Z] start clean shutdown of all containers with a 15 seconds timeout... 
DEBU[2024-08-29T15:45:55.641839178Z] Cleaning up old mountid : start.             
INFO[2024-08-29T15:45:55.641886122Z] stopping event stream following graceful shutdown  error="<nil>" module=libcontainerd namespace=moby
DEBU[2024-08-29T15:45:55.642044012Z] Cleaning up old mountid : done.              
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to register "bridge" driver: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: modprobe: FATAL: Module ip_tables not found in directory /lib/modules/5.14.0-427.31.1.el9_4.x86_64
iptables v1.8.10 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
 (exit status 3)

Once again, it was working before the migration… do you know what could be happening?

Oh, and the container config:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Rockylinux 9 amd64 (20240205_02:06)
  image.os: Rockylinux
  image.release: "9"
  image.serial: "20240205_02:06"
  image.type: squashfs
  image.variant: default
  raw.idmap: both 1000 1000
  security.nesting: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"
  volatile.base_image: d30a41f3c51c5bc88d3ee1497a82d8e5cd8844f836d7eeb6f51e2beecc1d78e1
  volatile.cloud-init.instance-id: 419ec603-ace1-4ae4-b80f-38d3b9e8a117
  volatile.eth0.host_name: vethab51f2a6
  volatile.eth0.hwaddr: 00:16:3e:7a:68:e5
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":64535}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":64535}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 82599408-a2cd-47d1-85ac-be975c08d47f
  volatile.uuid.generation: 82599408-a2cd-47d1-85ac-be975c08d47f
devices:
  eth0:
    ipv4.address: 192.168.5.108
    name: eth0
    network: lxdbr0
    type: nic
  gpu:
    id: "0"
    type: gpu
  mainconf:
    path: /mnt/main/conf/u8
    source: /mnt/main/conf/u8
    type: disk
  maindata:
    path: /mnt/main/data/u8/
    source: /mnt/main/data/u8/
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

It says here that the kernel module is not available.

I saw that. All the other containers (the Docker daemons inside them) are working, and this container (well, the container works, it's the Docker daemon inside it that doesn't) was working before migrating.

On the host the modules are available in /lib/modules/5.14.0-427.31.1.el9_4.x86_64/kernel/net/bridge and /lib/modules/5.14.0-427.31.1.el9_4.x86_64/kernel/net/netfilter, but inside the container, in /lib/modules, I only see:

[root@u8r modules]# ls -la
total 28
drwxr-xr-x.  7 root root 4096 Aug 29 07:17 .
dr-xr-xr-x. 31 root root 4096 Jun 21 08:40 ..
lrwxrwxrwx.  1 root root   11 Aug  9 16:33 kabi-current -> kabi-rhel94
drwxr-xr-x.  2 root root 4096 Aug 29 07:17 kabi-rhel90
drwxr-xr-x.  2 root root 4096 Aug 29 07:17 kabi-rhel91
drwxr-xr-x.  2 root root 4096 Aug 29 07:17 kabi-rhel92
drwxr-xr-x.  2 root root 4096 Aug 29 07:17 kabi-rhel93
drwxr-xr-x.  2 root root 4096 Aug 29 07:17 kabi-rhel94

On the host:

ls -la
total 72
drwxr-xr-x. 18 root root 4096 Aug 29 10:07 .
dr-xr-xr-x. 46 root root 4096 Jun 20 20:57 ..
drwxr-xr-x.  3 root root 4096 Nov 24  2023 5.14.0-284.11.1.el9_2.x86_64
drwxr-xr-x.  2 root root 4096 Feb  4  2024 5.14.0-284.25.1.el9_2.x86_64
drwxr-xr-x.  2 root root 4096 Apr  5 09:42 5.14.0-284.30.1.el9_2.x86_64
drwxr-xr-x.  3 root root 4096 Jun 20 20:54 5.14.0-362.18.1.el9_3.x86_64
drwxr-xr-x.  2 root root 4096 Aug 29 10:08 5.14.0-362.24.1.el9_3.0.1.x86_64
drwxr-xr-x.  3 root root 4096 Jul 24 17:29 5.14.0-362.24.1.el9_3.x86_64
drwxr-xr-x.  2 root root 4096 May  8 18:26 5.14.0-362.8.1.el9_3.x86_64
drwxr-xr-x.  8 root root 4096 Aug 29 10:10 5.14.0-427.22.1.el9_4.x86_64
drwxr-xr-x.  8 root root 4096 Aug 29 10:10 5.14.0-427.24.1.el9_4.x86_64
drwxr-xr-x.  8 root root 4096 Aug 29 16:03 5.14.0-427.31.1.el9_4.x86_64
drwxr-xr-x.  3 root root 4096 Jun 20 20:50 5.14.0-427.el9.x86_64
lrwxrwxrwx.  1 root root   11 Aug  9 18:33 kabi-current -> kabi-rhel94
drwxr-xr-x.  2 root root 4096 Aug 29 10:07 kabi-rhel90
drwxr-xr-x.  2 root root 4096 Aug 29 10:07 kabi-rhel91
drwxr-xr-x.  2 root root 4096 Aug 29 10:07 kabi-rhel92
drwxr-xr-x.  2 root root 4096 Aug 29 10:07 kabi-rhel93
drwxr-xr-x.  2 root root 4096 Aug 29 10:07 kabi-rhel94

Why was this working before? Shouldn't the Docker daemon use the kernel modules of the host?

Now that I think about it, something is keeping the container from being able to run certain commands. The message tries to be helpful for those who install on the host, but you should read it as saying: you are running something in a container, which provides isolation, and due to that isolation it no longer has access to a resource. I would try to rebuild the container from scratch in Incus.
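
If rebuilding is not enough or not desirable, a hedged alternative is to pre-load the modules Docker needs on the Rocky host, so the nested daemon does not have to modprobe them itself (the exact module list here is an assumption):

# on the host, not in the container
sudo modprobe -a ip_tables iptable_nat br_netfilter overlay

# make it persistent across reboots (the file name is my own choice)
printf 'ip_tables\niptable_nat\nbr_netfilter\noverlay\n' | sudo tee /etc/modules-load.d/nested-docker.conf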

This is at least curious: depending on the order of installation, it works or it doesn't.
If I install the NVIDIA driver first, Docker fails to start. If I install Docker first and then the NVIDIA driver, it does work.
Here is what I have done:

incus launch images:rockylinux/9 rt
incus config set rt security.syscalls.intercept.mknod true
incus config set rt security.syscalls.intercept.setxattr true
incus config set rt security.nesting true
incus restart rt
incus exec rt bash

Inside the container:

dnf install 'dnf-command(config-manager)'
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl --now enable docker

And now Docker is running OK.
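
As a quick smoke test (not something I ran in the original session):

docker run --rm hello-world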

The twist comes when I try to add the NVIDIA driver to the container.

incus config device add rt gpu gpu id=0

And in the container I followed the instructions to install the NVIDIA drivers from RPM Fusion (I used these on the host and in here without problems before the migration). I do this because I want to run a Docker container that needs CUDA support.

Install the RPM Fusion repositories:

sudo dnf install --nogpgcheck https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(rpm -E %rhel).noarch.rpm
sudo dnf install --nogpgcheck https://mirrors.rpmfusion.org/free/el/rpmfusion-free-release-$(rpm -E %rhel).noarch.rpm https://mirrors.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-$(rpm -E %rhel).noarch.rpm

Install the NVIDIA drivers:

sudo dnf update -y # and reboot if you are not on the latest kernel
sudo dnf install akmod-nvidia # rhel/centos users can use kmod-nvidia instead
sudo dnf install xorg-x11-drv-nvidia-cuda #optional for cuda/nvdec/nvenc support
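
At this point nvidia-smi inside the container should already list the GPU (assuming the host driver is loaded and the gpu device added above is attached):

nvidia-smi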

Then install the NVIDIA Container Toolkit:

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit

Then, on Rocky, I have to set no-cgroups = true in the NVIDIA container runtime config file, /etc/nvidia-container-runtime/config.toml.
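
One way to flip it (a sketch; the stock config.toml ships the key commented out as no-cgroups = false, so check your file first):

sudo sed -i 's/^#\?no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml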

And configure /etc/docker/daemon.json:

{
  "storage-driver": "fuse-overlayfs",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}

Restart Docker.
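
For example (the second command is just a sanity check that nvidia became the default runtime):

sudo systemctl restart docker
docker info | grep -i runtime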

Now it seems to work. Tomorrow I will complete the configuration and test my Docker container.

Thank you for your help, time and knowledge


Thank you for this. I will note, however, that this worked with the Zabbly packages (6.7) but not with the Ubuntu 24.04 ones (6.0.x).

I am also curious, since I am about to convert my production host to Incus, whether this is reversible. If I snapshot my storage pools, can I revert to LXD in the event of a catastrophe?