Incus container won't stop or allow itself to be deleted

I have rebuilt a 4-server cluster from LXD to Incus, and I have one instance that is stuck. I can’t stop or remove it via the Incus command line. I'm wondering whether it is safe to manually delete it off disk, or whether there is a better way. Everything else seems to run fine. Any ideas?

I'm also getting this on startup in some new containers.
These were brought in from LXD:
root@Q3:/home/ic2000# incus info --show-log WP-SPACEWATCH2024
Name: WP-SPACEWATCH2024
Status: STOPPED
Type: container
Architecture: x86_64
Location: Q1
Created: 2024/01/01 03:00 EST
Last Used: 2024/01/01 03:01 EST

Log:

lxc WP-SPACEWATCH2024 20240101080101.541 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc WP-SPACEWATCH2024 20240101080101.565 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc WP-SPACEWATCH2024 20240101080101.566 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc WP-SPACEWATCH2024 20240101080101.566 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc WP-SPACEWATCH2024 20240101080101.566 WARN cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1619 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

root@Q3:/home/ic2000# incus start WP-SPACEWATCH2024
Error: Failed to run: /opt/incus/bin/incusd forkstart WP-SPACEWATCH2024 /var/lib/incus/containers /var/log/incus/WP-SPACEWATCH2024/lxc.conf: exit status 1
Try incus info --show-log WP-SPACEWATCH2024 for more info
root@Q3:/home/ic2000# incus info --show-log WP-SPACEWATCH2024
Name: WP-SPACEWATCH2024
Status: STOPPED
Type: container
Architecture: x86_64
Location: Q1
Created: 2024/01/01 03:00 EST
Last Used: 2024/01/01 04:04 EST

Log:

lxc WP-SPACEWATCH2024 20240101090455.639 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.639 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.640 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.640 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.641 WARN cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1619 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc WP-SPACEWATCH2024 20240101090455.770 ERROR conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 32
lxc WP-SPACEWATCH2024 20240101090455.770 ERROR conf - ../src/lxc/conf.c:lxc_setup:4437 - Failed to run mount hooks
lxc WP-SPACEWATCH2024 20240101090455.771 ERROR start - ../src/lxc/start.c:do_start:1272 - Failed to setup container "WP-SPACEWATCH2024"
lxc WP-SPACEWATCH2024 20240101090455.771 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc WP-SPACEWATCH2024 20240101090455.778 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vetha73f8f3e"
lxc WP-SPACEWATCH2024 20240101090455.778 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc WP-SPACEWATCH2024 20240101090455.778 ERROR start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "WP-SPACEWATCH2024"
lxc WP-SPACEWATCH2024 20240101090455.778 WARN start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 41 for process 27343
lxc 20240101090455.930 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240101090455.930 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

What storage pool driver is in use?

Anything useful in /var/log/incus/incusd.log?

cat /var/log/incus/incusd.log
time="2024-01-01T02:36:31-05:00" level=warning msg="Dqlite: attempt 1: server 84.17.40.18:8443: no known leader"
time="2024-01-01T02:36:31-05:00" level=warning msg="Dqlite: attempt 1: server 84.17.40.18:8443: no known leader"
time="2024-01-01T03:47:39-05:00" level=error msg="Failed starting instance" action=start created="2024-01-01 07:45:48.24563635 +0000 UTC" ephemeral=false instance=WPZ-DATABASE2024 instanceType=container project=default stateful=false used="2024-01-01 07:45:51.043636476 +0000 UTC"
time="2024-01-01T03:53:24-05:00" level=error msg="Failed starting instance" action=start created="2023-12-31 06:46:34.087557759 +0000 UTC" ephemeral=false instance=WP-WARHAPPENS2024 instanceType=container project=default stateful=false used="2023-12-31 06:46:41.133819429 +0000 UTC"
time="2024-01-01T03:53:42-05:00" level=error msg="Failed starting instance" action=start created="2023-12-31 06:46:34.087557759 +0000 UTC" ephemeral=false instance=WP-WARHAPPENS2024 instanceType=container project=default stateful=false used="2024-01-01 08:53:24.443278498 +0000 UTC"
time="2024-01-01T03:54:12-05:00" level=error msg="Failed starting instance" action=start created="2023-12-31 06:46:34.087557759 +0000 UTC" ephemeral=false instance=WP-WARHAPPENS2024 instanceType=container project=default stateful=false used="2024-01-01 08:53:42.039945382 +0000 UTC"
time="2024-01-01T04:03:05-05:00" level=error msg="Failed starting instance" action=start created="2023-12-31 06:46:34.087557759 +0000 UTC" ephemeral=false instance=WP-WARHAPPENS2024 instanceType=container project=default stateful=false used="2024-01-01 08:54:12.063983709 +0000 UTC"
time="2024-01-01T04:04:55-05:00" level=error msg="Failed starting instance" action=start created="2024-01-01 08:00:58.641996981 +0000 UTC" ephemeral=false instance=WP-SPACEWATCH2024 instanceType=container project=default stateful=false used="2024-01-01 08:01:01.482134493 +0000 UTC"
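
When several instances fail like this, a tally per instance shows at a glance which ones are affected. This is only a minimal sketch, assuming the log format shown above; the function name failed_starts_by_instance is made up for illustration and is not an Incus command.

```shell
#!/bin/sh
# Count "Failed starting instance" entries per instance in an incusd.log.
# failed_starts_by_instance is a hypothetical helper, not part of Incus.
failed_starts_by_instance() {
    grep 'Failed starting instance' "$1" |
        # pull the value of the instance= field out of each matching line
        sed -n 's/.* instance=\([^ ]*\).*/\1/p' |
        # group identical names, count them, most frequent first
        sort | uniq -c | sort -rn
}

# Example:
#   failed_starts_by_instance /var/log/incus/incusd.log
```

Each output line is a count followed by an instance name.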

cluster list
+------+--------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| NAME | URL                      | ROLES            | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  | MESSAGE           |
+------+--------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| Q1   | https://84.17.40.18:8443 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+------+--------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| Q2   | https://84.17.40.19:8443 | database         | x86_64       | default        |             | ONLINE | Fully operational |
+------+--------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| Q3   | https://84.17.40.20:8443 | database         | x86_64       | default        |             | ONLINE | Fully operational |
+------+--------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| Q4   | https://84.17.40.21:8443 | database-leader  | x86_64       | default        |             | ONLINE | Fully operational |
|      |                          | database         |              |                |             |        |                   |
+------+--------------------------+------------------+--------------+----------------+-------------+--------+-------------------+

Some more info. These are containers copied from LXD, so maybe there is something wrong with them.

cat console.log
systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Ubuntu 18.04.6 LTS!

Set hostname to <WP-SPACEWATCH2024>.
Initializing machine ID from random generator.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
File /lib/systemd/system/systemd-journald.service:36 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Reached target Swap.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[UNSUPP] Starting of Arbitrary Executable Fi…tem Automount Point not supported.
[ OK ] Reached target Remote File Systems.
system.slice: Failed to reset devices.list: Operation not permitted
[ OK ] Created slice System Slice.
[ OK ] Listening on udev Kernel Socket.
[ OK ] Listening on Journal Socket.
systemd-sysctl.service: Failed to reset devices.list: Operation not permitted
Starting Apply Kernel Variables…
keyboard-setup.service: Failed to reset devices.list: Operation not permitted
Starting Set the console keyboard layout…
[ OK ] Listening on Network Service Netlink Socket.
[ OK ] Listening on Syslog Socket.
user.slice: Failed to reset devices.list: Operation not permitted
[ OK ] Created slice User and Session Slice.
[ OK ] Reached target Slices.
[ OK ] Reached target Paths.
[ OK ] Listening on udev Control Socket.
systemd-udev-trigger.service: Failed to reset devices.list: Operation not permitted
Starting udev Coldplug all Devices…
[ OK ] Listening on Journal Socket (/dev/log).
systemd-journald.service: Failed to reset devices.list: Operation not permitted
Starting Journal Service…
systemd-tmpfiles-setup-dev.service: Failed to reset devices.list: Operation not permitted
Starting Create Static Device Nodes in /dev…
[ OK ] Listening on /dev/initctl Compatibility Named Pipe.
[ OK ] Started Apply Kernel Variables.
[ OK ] Started Create Static Device Nodes in /dev.
systemd-udevd.service: Failed to reset devices.list: Operation not permitted
Starting udev Kernel Device Manager…
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage…
[ OK ] Started udev Kernel Device Manager.
[ OK ] Started Set the console keyboard layout.
[ OK ] Reached target Local File Systems (Pre).
[ OK ] Reached target Local File Systems.
Starting Set console font and keymap…
Starting Network Service…
[ OK ] Started Set console font and keymap.
[ OK ] Started Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories…
[ OK ] Started Network Service.
[ OK ] Started Create Volatile Files and Directories.
[ OK ] Reached target System Time Synchronized.
Starting Update UTMP about System Boot/Shutdown…
Starting Network Name Resolution…
[ OK ] Started Update UTMP about System Boot/Shutdown.
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Network.
[ OK ] Reached target Host and Network Name Lookups.
[ OK ] Started udev Coldplug all Devices.
[ OK ] Reached target System Initialization.
[ OK ] Started Daily apt download activities.
[ OK ] Started Message of the Day.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
Starting Dispatcher daemon for systemd-networkd…
[ OK ] Started Regular background program processing daemon.
[ OK ] Started Daily apt upgrade and clean activities.
[ OK ] Started D-Bus System Message Bus.
Starting System Logging Service…
Starting Permit User Sessions…
Starting Login Service…
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Timers.
[ OK ] Started System Logging Service.
[ OK ] Started Permit User Sessions.
[ OK ] Created slice system-getty.slice.
[ OK ] Started Console Getty.
[ OK ] Reached target Login Prompts.
Starting Hostname Service…
[ OK ] Started Login Service.
[ OK ] Started Hostname Service.
[ OK ] Started Dispatcher daemon for systemd-networkd.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes…
[ OK ] Started Update UTMP about System Runlevel Changes.

Ubuntu 18.04.6 LTS WP-SPACEWATCH2024 console

WP-SPACEWATCH2024 login: [ OK ] Stopped target Host and Network Name Lookups.
[ OK ] Stopped target Timers.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped Message of the Day.
[ OK ] Removed slice system-getty.slice.
[ OK ] Stopped Daily Cleanup of Temporary Directories.
[ OK ] Stopped Daily apt upgrade and clean activities.
[ OK ] Stopped Daily apt download activities.
[ OK ] Stopped target System Time Synchronized.
[ OK ] Stopped target Graphical Interface.
[ OK ] Stopped target Multi-User System.
[ OK ] Stopped target Login Prompts.
Stopping Console Getty…
Stopping System Logging Service…
Stopping D-Bus System Message Bus…
Stopping Dispatcher daemon for systemd-networkd…
Stopping Regular background program processing daemon…
Stopping Login Service…
[ OK ] Stopped Dispatcher daemon for systemd-networkd.
[ OK ] Stopped Regular background program processing daemon.
[ OK ] Stopped System Logging Service.
[ OK ] Stopped Login Service.
[ OK ] Stopped D-Bus System Message Bus.
[ OK ] Stopped Console Getty.
Stopping Permit User Sessions…
[ OK ] Stopped Permit User Sessions.
[ OK ] Stopped target Basic System.
[ OK ] Stopped target Sockets.
[ OK ] Closed D-Bus System Message Bus Socket.
[ OK ] Closed Syslog Socket.
[ OK ] Stopped target Paths.
[ OK ] Stopped target Slices.
[ OK ] Removed slice User and Session Slice.
[ OK ] Stopped target System Initialization.
[ OK ] Stopped target Local Encrypted Volumes.
[ OK ] Stopped Forward Password Requests to Wall Directory Watch.
[ OK ] Stopped Dispatch Password Requests to Console Directory Watch.
Stopping Update UTMP about System Boot/Shutdown…
[ OK ] Stopped target Swap.
[ OK ] Stopped target Network.
Stopping Network Name Resolution…
[ OK ] Stopped target Remote File Systems.
[ OK ] Stopped Network Name Resolution.
Stopping Network Service…
[ OK ] Stopped Update UTMP about System Boot/Shutdown.
[ OK ] Stopped Create Volatile Files and Directories.
[ OK ] Stopped target Local File Systems.
[ OK ] Stopped target Local File Systems (Pre).
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Network Service.
[ OK ] Stopped Apply Kernel Variables.
[ OK ] Reached target Shutdown.
[ OK ] Reached target Final Step.
Starting Halt…

root@Q1:/var/log/incus/WP-SPACEWATCH2024# cat lxc.conf
lxc.log.file = /var/log/incus/WP-SPACEWATCH2024/lxc.log
lxc.log.level = warn
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/log/incus/WP-SPACEWATCH2024/console.log
lxc.mount.auto = proc:rw sys:rw cgroup:mixed
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional 0 0
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/config sys/kernel/config none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/tracing sys/kernel/tracing none rbind,create=dir,optional 0 0
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional 0 0
lxc.include = /opt/incus/share/lxc/config//common.conf.d/
lxc.arch = linux64
lxc.hook.version = 1
lxc.hook.pre-start = /proc/2738/exe callhook /var/lib/incus "default" "WP-SPACEWATCH2024" start
lxc.hook.stop = /opt/incus/bin/incusd callhook /var/lib/incus "default" "WP-SPACEWATCH2024" stopns
lxc.hook.post-stop = /opt/incus/bin/incusd callhook /var/lib/incus "default" "WP-SPACEWATCH2024" stop
lxc.tty.max = 0
lxc.uts.name = WP-SPACEWATCH2024
lxc.mount.entry = /var/lib/incus/guestapi dev/incus none bind,create=dir 0 0
lxc.apparmor.profile = incus-WP-SPACEWATCH2024_</var/lib/incus>//&:incus-WP-SPACEWATCH2024_:
lxc.seccomp.profile = /var/lib/incus/security/seccomp/WP-SPACEWATCH2024
lxc.idmap = u 0 1000000 1000000000
lxc.idmap = g 0 1000000 1000000000
lxc.mount.auto = shmounts:/var/lib/incus/shmounts/WP-SPACEWATCH2024:/dev/.incus-mounts
lxc.net.0.type = phys
lxc.net.0.name = eth0
lxc.net.0.flags = up
lxc.net.0.link = vetha73f8f3e
lxc.net.0.hwaddr = 00:16:3e:61:27:49
lxc.rootfs.path = dir:/var/lib/incus/storage-pools/default/containers/WP-SPACEWATCH2024/rootfs
root@Q1:/var/log/incus/WP-SPACEWATCH2024# cat lxc.log
lxc WP-SPACEWATCH2024 20240101090455.639 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.639 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.640 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.640 WARN conf - ../src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc WP-SPACEWATCH2024 20240101090455.641 WARN cgfsng - ../src/lxc/cgroups/cgfsng.c:fchowmodat:1619 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc WP-SPACEWATCH2024 20240101090455.770 ERROR conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 32
lxc WP-SPACEWATCH2024 20240101090455.770 ERROR conf - ../src/lxc/conf.c:lxc_setup:4437 - Failed to run mount hooks
lxc WP-SPACEWATCH2024 20240101090455.771 ERROR start - ../src/lxc/start.c:do_start:1272 - Failed to setup container "WP-SPACEWATCH2024"
lxc WP-SPACEWATCH2024 20240101090455.771 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc WP-SPACEWATCH2024 20240101090455.778 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vetha73f8f3e"
lxc WP-SPACEWATCH2024 20240101090455.778 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc WP-SPACEWATCH2024 20240101090455.778 ERROR start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "WP-SPACEWATCH2024"
lxc WP-SPACEWATCH2024 20240101090455.778 WARN start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 41 for process 27343
lxc 20240101090455.930 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240101090455.930 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

OK, found the reason the containers moved from LXD did not want to start: LXD and LXCFS were still present in /var/lib. I deleted these and the containers came up perfectly.

Ah, that’s interesting, it’s not something I ran into before and I’m a bit unsure how those would cause conflicts, but I’m glad you got those back online.

I was getting the exact same set of errors on a machine migrated from lxd (snap 5.20) to incus (0.6), and a container moved using “incus copy” from another machine. There were no lxd-related files in /var/lib.

The first error/warning (newuidmap/newgidmap binary is missing) was easy:

That one was fixable by “apt-get install uidmap”. I wonder if the incus-base package should declare uidmap as a dependency, or at least a “recommends”, to alert users that they ought to install it.
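
A quick way to confirm whether the helpers are present before starting a container is sketched below. The function name missing_idmap_helpers is made up for illustration; newuidmap and newgidmap are the binaries named in the warnings, and uidmap is the Debian/Ubuntu package that ships them.

```shell
#!/bin/sh
# Print the name of each ID-mapping helper that cannot be found on PATH.
# missing_idmap_helpers is a hypothetical helper name, not part of Incus.
missing_idmap_helpers() {
    for bin in newuidmap newgidmap; do
        # command -v succeeds only if the name resolves to something runnable
        command -v "$bin" >/dev/null 2>&1 || echo "$bin"
    done
}

# Report anything that is missing; prints nothing when both are installed.
for bin in $(missing_idmap_helpers); do
    echo "$bin not found; on Debian/Ubuntu it is provided by the uidmap package"
done
```

No output means both binaries were found.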

However, the container was then still failing to start, with the same error @Tony_Anytime saw:

root@nuc3:~# ls -l /var/log/incus/nfsen/
total 4
-rw-r--r-- 1 root root    0 Mar 21 07:31 forkstart.log
-rw-r----- 1 root root 1407 Mar 21 07:31 lxc.log
-rw-r----- 1 root root    0 Mar 21 07:31 lxc.log.old
root@nuc3:~# cat /var/log/incus/nfsen/lxc.log
lxc nfsen 20240321073142.807 ERROR    conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 1
lxc nfsen 20240321073142.807 ERROR    conf - ../src/lxc/conf.c:lxc_setup:4437 - Failed to run mount hooks
lxc nfsen 20240321073142.807 ERROR    start - ../src/lxc/start.c:do_start:1272 - Failed to setup container "nfsen"
lxc nfsen 20240321073142.807 ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc nfsen 20240321073142.825 WARN     network - ../src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vethf37a0873"
lxc nfsen 20240321073142.825 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc nfsen 20240321073142.825 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "nfsen"
lxc nfsen 20240321073142.825 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 54991
lxc 20240321073142.963 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240321073142.963 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

and /var/log/incus/incusd.log only showed

time="2024-03-21T07:31:42Z" level=error msg="Failed starting instance" action=start created="2024-03-20 21:49:26.189561727 +0000 UTC" ephemeral=false instance=nfsen instanceType=container project=default stateful=false used="2024-03-21 07:28:28.177445948 +0000 UTC"

The host “nuc3” is Ubuntu 22.04, with hwe kernel:

root@nuc3:~# uname -a
Linux nuc3 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

The container was from a zfs pool, both on its original host (nuc1) and the new host I copied it to (nuc3).

root@nuc3:~# ls -l /var/lib/incus/containers
total 0
lrwxrwxrwx 1 root root 53 Mar 20 22:50 nfsen -> /var/lib/incus/storage-pools/default/containers/nfsen
root@nuc3:~# ls /var/lib/incus/storage-pools/default/
buckets  containers  containers-snapshots  custom  custom-snapshots  images  virtual-machines  virtual-machines-snapshots
root@nuc3:~# ls /var/lib/incus/storage-pools/default/containers
nfsen
root@nuc3:~# ls /var/lib/incus/storage-pools/default/containers/nfsen/
root@nuc3:~# zfs list | grep nfsen
zfs/lxd/containers/nfsen          32.6G   198G     32.6G  legacy
root@nuc3:~# zfs get all zfs/lxd/containers/nfsen | grep mount
zfs/lxd/containers/nfsen  mounted               no                     -
zfs/lxd/containers/nfsen  mountpoint            legacy                 local
zfs/lxd/containers/nfsen  canmount              noauto                 local
root@nuc3:~#

Incidentally, there was a problem running the container on the original host (nuc1) too, and I know there was a zfs data error somewhere inside the container filesystem. I copied it using incus copy nfsen nuc3:nfsen --instance-only --allow-inconsistent in an attempt to fix this problem. On the target, zfs seems happy enough with the filesystem:

root@nuc3:~# zfs mount zfs/lxd/containers/nfsen
cannot mount 'zfs/lxd/containers/nfsen': legacy mountpoint
use mount(8) to mount this filesystem
root@nuc3:~# mount -t zfs zfs/lxd/containers/nfsen /mnt
root@nuc3:~# ls /mnt
backup.yaml  metadata.yaml  rootfs  templates
root@nuc3:~# umount /mnt
root@nuc3:~#
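
The legacy mountpoint is expected here: the strace output further down shows Incus itself setting mountpoint=legacy and canmount=noauto on the dataset, so "zfs mount" refuses it and mount(8) has to be used instead. A small sketch of that decision follows; the function name zfs_mount_command is made up for illustration, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Print (do not run) the command that would mount a ZFS dataset,
# depending on the value of its mountpoint property.
# zfs_mount_command is a hypothetical helper name.
zfs_mount_command() {
    dataset=$1
    mountpoint=$2   # value of the dataset's mountpoint property
    target=$3       # directory to mount on; only used for legacy datasets
    if [ "$mountpoint" = "legacy" ]; then
        # legacy datasets are mounted with mount(8), as in the session above
        echo "mount -t zfs $dataset $target"
    else
        echo "zfs mount $dataset"
    fi
}

# Example on a host with ZFS:
#   mp=$(zfs get -H -o value mountpoint zfs/lxd/containers/nfsen)
#   zfs_mount_command zfs/lxd/containers/nfsen "$mp" /mnt
```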

Now trying to start the container again, this time with strace -f -p <incusd-pid> -s 256 2>/var/tmp/strace.out

root@nuc3:~# egrep 'execve\(|exited with [^0]' /var/tmp/strace.out
[pid 57807] execve("/usr/sbin/zfs", ["zfs", "set", "mountpoint=legacy", "canmount=noauto", "zfs/lxd/containers/nfsen"], 0xc0005ad440 /* 17 vars */ <unfinished ...>
[pid 57813] execve("/usr/sbin/zfs", ["zfs", "get", "-H", "-p", "-o", "property,value", "atime,relatime", "zfs/lxd/containers/nfsen"], 0xc0005ad4d0 /* 17 vars */ <unfinished ...>
[pid 57814] execve("/usr/sbin/ip", ["ip", "link", "add", "name", "veth4f3a7752", "mtu", "1500", "txqueuelen", "1000", "up", "type", "veth", "peer", "name", "veth61e41102", "mtu", "1500", "address", "00:16:3e:ce:dc:18", "txqueuelen", "1000"], 0xc000791d40 /* 17 vars */ <unfinished ...>
[pid 57823] execve("/usr/sbin/tc", ["tc", "qdisc", "del", "dev", "veth4f3a7752", "root"], 0xc000791dd0 /* 17 vars */ <unfinished ...>
[pid 57823] +++ exited with 2 +++
[pid 57838] execve("/usr/sbin/tc", ["tc", "qdisc", "del", "dev", "veth4f3a7752", "ingress"], 0xc0005470e0 /* 17 vars */ <unfinished ...>
[pid 57838] +++ exited with 2 +++
[pid 57839] execve("/usr/sbin/ip", ["ip", "link", "set", "dev", "veth4f3a7752", "master", "br255"], 0xc00088e7e0 /* 17 vars */ <unfinished ...>
[pid 57841] execve("/opt/incus/bin/incusd", ["/opt/incus/bin/incusd", "forkstart", "nfsen", "/var/lib/incus/containers", "/run/incus/nfsen/lxc.conf"], 0xc00088f170 /* 17 vars */) = 0
[pid 57850] execve("/bin/sh", ["sh", "-c", "exec /proc/1082/exe callhook /var/lib/incus \"default\" \"nfsen\" start"], 0x42c37f0 /* 27 vars */) = 0
[pid 57850] execve("/proc/1082/exe", ["/proc/1082/exe", "callhook", "/var/lib/incus", "default", "nfsen", "start"], 0x5f4b846aa970 /* 27 vars */) = 0
[pid 57857] execve("/usr/sbin/apparmor_parser", ["apparmor_parser", "--version"], 0xc000718b40 /* 17 vars */ <unfinished ...>
[pid 57858] execve("/usr/sbin/apparmor_parser", ["apparmor_parser", "-rWL", "/var/lib/incus/security/apparmor/cache", "/var/lib/incus/security/apparmor/profiles/incus-nfsen"], 0xc0002eaf30 /* 17 vars */ <unfinished ...>
[pid 57861] execve("/bin/sh", ["sh", "-c", "newuidmap 57860 0 1000000 1000000000"], 0x42c37f0 /* 28 vars */) = 0
[pid 57862] execve("/usr/bin/newuidmap", ["newuidmap", "57860", "0", "1000000", "1000000000"], 0x5f1957b348d8 /* 28 vars */ <unfinished ...>
[pid 57863] execve("/bin/sh", ["sh", "-c", "newgidmap 57860 0 1000000 1000000000"], 0x42c37f0 /* 28 vars */) = 0
[pid 57864] execve("/usr/bin/newgidmap", ["newgidmap", "57860", "0", "1000000", "1000000000"], 0x5c4ba15ed8d8 /* 28 vars */ <unfinished ...>
[pid 57869] execve("/bin/sh", ["sh", "-c", "exec /opt/incus/share/lxcfs/lxc.mount.hook"], 0x42c37f0 /* 27 vars */) = 0
[pid 57869] execve("/opt/incus/share/lxcfs/lxc.mount.hook", ["/opt/incus/share/lxcfs/lxc.mount.hook"], 0x60a3ceb4e8b8 /* 27 vars */) = 0
[pid 57870] execve("/usr/bin/readlink", ["readlink", "-f", "/opt/incus/lib/lxc/rootfs"], 0x59d1240af7f8 /* 27 vars */) = 0
[pid 57871] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/cpuinfo"], 0x59d1240afe78 /* 27 vars */) = 0
[pid 57872] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/cpuinfo", "/opt/incus/lib/lxc/rootfs/proc/cpuinfo"], 0x59d1240afed0 /* 27 vars */ <unfinished ...>
[pid 57873] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/diskstats"], 0x59d1240afe98 /* 27 vars */) = 0
[pid 57874] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/diskstats", "/opt/incus/lib/lxc/rootfs/proc/diskstats"], 0x59d1240afed0 /* 27 vars */ <unfinished ...>
[pid 57875] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/loadavg"], 0x59d1240afe98 /* 27 vars */) = 0
[pid 57876] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/loadavg", "/opt/incus/lib/lxc/rootfs/proc/loadavg"], 0x59d1240afed0 /* 27 vars */ <unfinished ...>
[pid 57877] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/meminfo"], 0x59d1240afe98 /* 27 vars */ <unfinished ...>
[pid 57878] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/meminfo", "/opt/incus/lib/lxc/rootfs/proc/meminfo"], 0x59d1240afed0 /* 27 vars */ <unfinished ...>
[pid 57879] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/slabinfo"], 0x59d1240afe98 /* 27 vars */) = 0
[pid 57880] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/slabinfo", "/opt/incus/lib/lxc/rootfs/proc/slabinfo"], 0x59d1240afed0 /* 27 vars */ <unfinished ...>
[pid 57882] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/stat"], 0x59d1240afe98 /* 27 vars */) = 0
[pid 57883] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/stat", "/opt/incus/lib/lxc/rootfs/proc/stat"], 0x59d1240afe98 /* 27 vars */ <unfinished ...>
[pid 57884] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/swaps"], 0x59d1240afe98 /* 27 vars */) = 0
[pid 57885] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/swaps", "/opt/incus/lib/lxc/rootfs/proc/swaps"], 0x59d1240afe98 /* 27 vars */ <unfinished ...>
[pid 57886] execve("/usr/bin/basename", ["basename", "/var/lib/incus-lxcfs/proc/uptime"], 0x59d1240afe98 /* 27 vars */) = 0
[pid 57887] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/proc/uptime", "/opt/incus/lib/lxc/rootfs/proc/uptime"], 0x59d1240afed0 /* 27 vars */ <unfinished ...>
[pid 57888] execve("/usr/bin/mount", ["mount", "-n", "--bind", "/var/lib/incus-lxcfs/sys/devices/system/cpu", "/opt/incus/lib/lxc/rootfs/sys/devices/system/cpu"], 0x59d1240b0190 /* 27 vars */ <unfinished ...>
[pid 57889] execve("/usr/bin/rm", ["rm", "-Rf", "/opt/incus/lib/lxc/rootfs/var/lib/lxcfs"], 0x59d1240afc70 /* 27 vars */ <unfinished ...>
[pid 57889] +++ exited with 1 +++
[pid 57869] +++ exited with 1 +++
[pid 57860] write(8, "lxc nfsen 20240321075726.437 ERROR    conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 1\n", 108) = 108
[pid 57860] +++ exited with 1 +++
[pid 57847] +++ exited with 1 +++
[pid 57846] +++ exited with 1 +++
[pid 57845] +++ exited with 1 +++
[pid 57844] +++ exited with 1 +++
[pid 57843] +++ exited with 1 +++
[pid 57842] +++ exited with 1 +++
[pid 57841] +++ exited with 1 +++
[pid 57894] execve("/bin/sh", ["sh", "-c", "exec /opt/incus/bin/incusd callhook /var/lib/incus \"default\" \"nfsen\" stopns"], 0x42c7f00 /* 36 vars */ <unfinished ...>
[pid  1138] <... read resumed>"lxc nfsen 20240321075726.437 ERROR    conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 1\nlxc nfsen 20240321075726.438 ERROR    conf - ../src/lxc/conf.c:lxc_setup:4437 - Failed to run mount hooks\nlxc nfsen 20240321075726.438 ERROR    star"..., 1100) = 1099
[pid 57894] execve("/opt/incus/bin/incusd", ["/opt/incus/bin/incusd", "callhook", "/var/lib/incus", "default", "nfsen", "stopns"], 0x619320508c78 /* 36 vars */) = 0
[pid 57902] execve("/usr/sbin/nft", ["nft", "--json", "-nn", "list", "ruleset"], 0xc0002ebc20 /* 17 vars */ <unfinished ...>
[pid 57903] execve("/usr/sbin/ip", ["ip", "link", "set", "dev", "veth4f3a7752", "nomaster"], 0xc000abe480 /* 17 vars */ <unfinished ...>
[pid 57905] execve("/usr/sbin/ip", ["ip", "link", "delete", "dev", "veth4f3a7752"], 0xc000abe510 /* 17 vars */ <unfinished ...>
[pid 57907] execve("/bin/sh", ["sh", "-c", "exec /opt/incus/share/lxcfs/lxc.reboot.hook"], 0x42c7f00 /* 36 vars */) = 0
[pid 57907] execve("/opt/incus/share/lxcfs/lxc.reboot.hook", ["/opt/incus/share/lxcfs/lxc.reboot.hook"], 0x5740ac06ba68 /* 36 vars */) = 0
[pid 57908] execve("/usr/bin/sleep", ["sleep", "0.5s"], 0x592f7b839728 /* 36 vars */ <unfinished ...>
[pid 57909] execve("/bin/sh", ["sh", "-c", "exec /opt/incus/bin/incusd callhook /var/lib/incus \"default\" \"nfsen\" stop"], 0x42c7f00 /* 36 vars */) = 0
[pid 57909] execve("/opt/incus/bin/incusd", ["/opt/incus/bin/incusd", "callhook", "/var/lib/incus", "default", "nfsen", "stop"], 0x603f50de1c78 /* 36 vars */) = 0
[pid 57849] +++ exited with 1 +++
[pid 57918] execve("/usr/sbin/apparmor_parser", ["apparmor_parser", "-RWL", "/var/lib/incus/security/apparmor/cache", "/var/lib/incus/security/apparmor/profiles/incus-nfsen"], 0xc000665170 /* 17 vars */ <unfinished ...>

Ignoring the initial tidying up errors from tc, the first real error comes from pid 57889. Grepping for this pid:

[pid 57889] execve("/usr/bin/rm", ["rm", "-Rf", "/opt/incus/lib/lxc/rootfs/var/lib/lxcfs"], 0x59d1240afc70 /* 27 vars */ <unfinished ...>
...
[pid 57889] unlinkat(AT_FDCWD, "/opt/incus/lib/lxc/rootfs/var/lib/lxcfs", AT_REMOVEDIR) = -1 EACCES (Permission denied)
...
[pid 57889] write(2, "rm: ", 4)         = 4
[pid 57889] write(2, "cannot remove '/opt/incus/lib/lxc/rootfs/var/lib/lxcfs'", 55 <unfinished ...>
[pid 57889] <... write resumed>)        = 55
...
[pid 57889] write(2, ": Permission denied", 19) = 19
[pid 57889] write(2, "\n", 1 <unfinished ...>

I believe this comes from the script /opt/incus/share/lxcfs/lxc.mount.hook, which contains:

rm -Rf "${LXC_ROOTFS_MOUNT}/var/lib/lxcfs"

Oddly, when I check it now, that doesn’t exist.

root@nuc3:~# ls /opt/incus/lib/lxc/rootfs
README
root@nuc3:~# ls /opt/incus/lib/lxc/rootfs/var/lib/lxcfs
ls: cannot access '/opt/incus/lib/lxc/rootfs/var/lib/lxcfs': No such file or directory

The script starts with #!/bin/sh -e, so it should terminate on the first error. The second error exit I see is from pid 57869, which is indeed "/bin/sh", ["sh", "-c", "exec /opt/incus/share/lxcfs/lxc.mount.hook"] (aside: two levels of shell seem a bit superfluous).

Next, I added this to the top of /opt/incus/share/lxcfs/lxc.mount.hook

exec 2>/tmp/incus.err
set -x
printenv 1>&2

Result:

+ printenv
LXC_HOOK_TYPE=mount
LXC_CGNS_AWARE=1
LXC_CONSOLE_LOGPATH=/var/log/incus/nfsen/console.log
LD_LIBRARY_PATH=/opt/incus/lib/
SYSTEMD_EXEC_PID=1085
INCUS_UI=/opt/incus/ui/
JOURNAL_STREAM=8:28965
LXC_LOG_LEVEL=WARN
INCUS_OVMF_PATH=/opt/incus/share/qemu/
INCUS_LXC_HOOK=/opt/incus/share/lxc/hooks/
LXC_HOOK_SECTION=lxc
LVM_SUPPRESS_FD_WARNINGS=1
PATH=/opt/incus/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
INVOCATION_ID=3d42018a8e6740d5846104fd51d8c650
LXC_ROOTFS_MOUNT=/opt/incus/lib/lxc/rootfs
LISTEN_FDNAMES=incus.socket
INCUS_DOCUMENTATION=/opt/incus/doc/
LXC_CONFIG_FILE=/run/incus/nfsen/lxc.conf
LANG=en_GB.UTF-8
INCUS_OPTS=
INCUS_LXC_TEMPLATE_CONFIG=/opt/incus/share/lxc/config/
LXCFS_OPTS=
PWD=/
INCUS_AGENT_PATH=/opt/incus/agent/
LXC_HOOK_VERSION=1
LXC_ROOTFS_PATH=/var/lib/incus/storage-pools/default/containers/nfsen/rootfs
LXC_NAME=nfsen
+ [ ! 0 -eq 0 ]
+ readlink -f /opt/incus/lib/lxc/rootfs
+ LXC_ROOTFS_MOUNT=/opt/incus/lib/lxc/rootfs
+ [ -d /var/lib/incus-lxcfs/proc/ ]
+ basename /var/lib/incus-lxcfs/proc/cpuinfo
+ DEST=cpuinfo
+ [ -e /opt/incus/lib/lxc/rootfs/proc/cpuinfo ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/cpuinfo /opt/incus/lib/lxc/rootfs/proc/cpuinfo
+ basename /var/lib/incus-lxcfs/proc/diskstats
+ DEST=diskstats
+ [ -e /opt/incus/lib/lxc/rootfs/proc/diskstats ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/diskstats /opt/incus/lib/lxc/rootfs/proc/diskstats
+ basename /var/lib/incus-lxcfs/proc/loadavg
+ DEST=loadavg
+ [ -e /opt/incus/lib/lxc/rootfs/proc/loadavg ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/loadavg /opt/incus/lib/lxc/rootfs/proc/loadavg
+ basename /var/lib/incus-lxcfs/proc/meminfo
+ DEST=meminfo
+ [ -e /opt/incus/lib/lxc/rootfs/proc/meminfo ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/meminfo /opt/incus/lib/lxc/rootfs/proc/meminfo
+ basename /var/lib/incus-lxcfs/proc/slabinfo
+ DEST=slabinfo
+ [ -e /opt/incus/lib/lxc/rootfs/proc/slabinfo ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/slabinfo /opt/incus/lib/lxc/rootfs/proc/slabinfo
+ basename /var/lib/incus-lxcfs/proc/stat
+ DEST=stat
+ [ -e /opt/incus/lib/lxc/rootfs/proc/stat ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/stat /opt/incus/lib/lxc/rootfs/proc/stat
+ basename /var/lib/incus-lxcfs/proc/swaps
+ DEST=swaps
+ [ -e /opt/incus/lib/lxc/rootfs/proc/swaps ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/swaps /opt/incus/lib/lxc/rootfs/proc/swaps
+ basename /var/lib/incus-lxcfs/proc/uptime
+ DEST=uptime
+ [ -e /opt/incus/lib/lxc/rootfs/proc/uptime ]
+ mount -n --bind /var/lib/incus-lxcfs/proc/uptime /opt/incus/lib/lxc/rootfs/proc/uptime
+ [ -d /var/lib/incus-lxcfs/sys/devices/system/cpu ]
+ [ -d /opt/incus/lib/lxc/rootfs/sys/devices/system/cpu ]
+ [ -f /var/lib/incus-lxcfs/sys/devices/system/cpu/uevent ]
+ mount -n --bind /var/lib/incus-lxcfs/sys/devices/system/cpu /opt/incus/lib/lxc/rootfs/sys/devices/system/cpu
+ [ -d /opt/incus/lib/lxc/rootfs/var/lib/incus-lxcfs/ ]
+ [ -d /opt/incus/lib/lxc/rootfs/var/lib/lxcfs/ ]
+ rm -Rf /opt/incus/lib/lxc/rootfs/var/lib/lxcfs
rm: cannot remove '/opt/incus/lib/lxc/rootfs/var/lib/lxcfs': Permission denied

I added ls -lAR "${LXC_ROOTFS_MOUNT}/var/lib/lxcfs/" 1>&2 just before rm -Rf, and I get:

+ [ -d /opt/incus/lib/lxc/rootfs/var/lib/lxcfs/ ]
+ ls -lAR /opt/incus/lib/lxc/rootfs/var/lib/lxcfs/
/opt/incus/lib/lxc/rootfs/var/lib/lxcfs/:
total 0
+ rm -Rf /opt/incus/lib/lxc/rootfs/var/lib/lxcfs
rm: cannot remove '/opt/incus/lib/lxc/rootfs/var/lib/lxcfs': Permission denied

Then I added mount | grep "${LXC_ROOTFS_MOUNT}" 1>&2, which gave:

+ mount
+ grep /opt/incus/lib/lxc/rootfs
zfs/lxd/containers/nfsen on /opt/incus/lib/lxc/rootfs type zfs (rw,relatime,xattr,posixacl,casesensitive)
none on /opt/incus/lib/lxc/rootfs/dev type tmpfs (rw,relatime,size=492k,mode=755,uid=1000000,gid=1000000,inode64)
proc on /opt/incus/lib/lxc/rootfs/proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /opt/incus/lib/lxc/rootfs/sys type sysfs (rw,relatime)
udev on /opt/incus/lib/lxc/rootfs/dev/fuse type devtmpfs (rw,nosuid,relatime,size=8003052k,nr_inodes=2000763,mode=755,inode64)
udev on /opt/incus/lib/lxc/rootfs/dev/net/tun type devtmpfs (rw,nosuid,relatime,size=8003052k,nr_inodes=2000763,mode=755,inode64)
efivarfs on /opt/incus/lib/lxc/rootfs/sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /opt/incus/lib/lxc/rootfs/sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
pstore on /opt/incus/lib/lxc/rootfs/sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
configfs on /opt/incus/lib/lxc/rootfs/sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
debugfs on /opt/incus/lib/lxc/rootfs/sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /opt/incus/lib/lxc/rootfs/sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /opt/incus/lib/lxc/rootfs/sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
binfmt_misc on /opt/incus/lib/lxc/rootfs/proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
mqueue on /opt/incus/lib/lxc/rootfs/dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
tmpfs on /opt/incus/lib/lxc/rootfs/dev/incus type tmpfs (rw,relatime,size=100k,mode=755,inode64)
/dev/mapper/vg0-root on /opt/incus/lib/lxc/rootfs/var/lib/extrausers type ext4 (rw,relatime)
tmpfs on /opt/incus/lib/lxc/rootfs/dev/.incus-mounts type tmpfs (rw,relatime,size=100k,mode=711,inode64)
none on /opt/incus/lib/lxc/rootfs/sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/cpuinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/diskstats type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/loadavg type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/meminfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/slabinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/stat type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/swaps type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/proc/uptime type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /opt/incus/lib/lxc/rootfs/sys/devices/system/cpu type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)

So my conclusion is:

  • The directory /opt/incus/lib/lxc/rootfs/var/lib/lxcfs/ exists at the time the script is run (even though it doesn’t exist normally). I guess it comes from zfs/lxd/containers/nfsen mounted on /opt/incus/lib/lxc/rootfs
  • The directory is empty
  • An attempt to rm -Rf it fails with “Permission denied”, which kills the whole script.

Additional experiments:

  • If I add || true at the end of the rm -Rf line, the container starts successfully, yay!
  • If I stop the container, remove || true and try to restart, it fails to start.
  • If I reboot the entire host and try to start the container, it fails

Does the container actually work? Kind of.

root@nuc3:~# incus exec nfsen bash
bash: /root/.bashrc: Permission denied
root@nfsen:~#

Hmm:

root@nfsen:~# ls -l /root/.bashrc
ls: cannot access '/root/.bashrc': Permission denied
root@nfsen:~# pwd
/root
root@nfsen:~# ls -l
ls: cannot open directory '.': Permission denied
root@nfsen:~# pwd
/root
root@nfsen:~# cd /
root@nfsen:/# ls -l
total 88
drwxr-xr-x   2 nobody nogroup 172 Apr 20  2023 bin
drwxr-xr-x   2 nobody nogroup   2 Jun 10  2020 boot
drwxr-xr-x   8 root   root    500 Mar 21 08:50 dev
drwxr-xr-x  92 nobody nogroup 181 Oct 12 08:18 etc
drwxr-xr-x   3 nobody nogroup   3 Jun 11  2020 home
drwxr-xr-x  19 nobody nogroup  23 Apr 20  2023 lib
drwxr-xr-x   2 nobody nogroup   3 Apr 20  2023 lib64
drwxr-xr-x   2 nobody nogroup   2 Jun 10  2020 media
drwxr-xr-x   2 nobody nogroup   2 Jun 10  2020 mnt
drwxr-xr-x   2 nobody nogroup   2 Jun 10  2020 opt
dr-xr-xr-x 315 nobody nogroup   0 Mar 21 08:50 proc
drwx------   5 nobody nogroup  22 May  4  2023 root
drwxr-xr-x  15 root   root    580 Mar 21 08:50 run
drwxr-xr-x   2 nobody nogroup 216 Oct 12 08:18 sbin
drwxr-xr-x   2 nobody nogroup   3 Jun 11  2020 snap
drwxr-xr-x   2 nobody nogroup   2 Jun 10  2020 srv
dr-xr-xr-x  13 nobody nogroup   0 Mar 21 08:50 sys
drwxrwxrwt   9 nobody nogroup   9 Mar 21 08:50 tmp
drwxr-xr-x  11 nobody nogroup  11 Feb 21  2022 usr
drwxr-xr-x  15 nobody nogroup  17 Feb 16  2023 var
root@nfsen:/# ls -l /root
ls: cannot open directory '/root': Permission denied

Hmm… that seems rather broken. And yet the application is running happily, and writing its own files:

root@nfsen:/# ls -l /var/nfsen/profiles-data/live/gw2/2024/03/21/
total 118
-rw-r--r-- 1 netflow www-data     90 Mar 21 08:45 nfcapd.202403210840
-rw-r--r-- 1 netflow www-data   7918 Mar 21 08:45 nfcapd.202403210845
-rw-r--r-- 1 netflow www-data 144117 Mar 21 08:55 nfcapd.202403210850

So whilst I have a workaround patch, I’m not sure this is correct - and perhaps there is something fundamentally wrong with this zfs filesystem after all. Any suggestions for how I can investigate this further?

And more generally speaking: stderr output from the hook script doesn’t seem to find its way into the incus logs. If the shell error rm: cannot remove '/opt/incus/lib/lxc/rootfs/var/lib/lxcfs': Permission denied was in the logs, debugging this could have been a lot quicker.

Sorry for the long ramble!

Regards,

Brian.

There was a simple fix when I thought about it. Stop the container, then get rid of the lxcfs directory:

root@nuc3:~# mount -t zfs zfs/lxd/containers/nfsen /mnt
root@nuc3:~# ls /mnt/
backup.yaml  metadata.yaml  rootfs  templates
root@nuc3:~# ls /mnt/rootfs/var/lib
AccountsService  command-not-found  git              man-db   php       sudo                     unattended-upgrades  vim
apache2          dbus               initramfs-tools  misc     plymouth  systemd                  update-manager
apport           dhcp               landscape        mlocate  polkit-1  ubuntu-advantage         update-notifier
apt              dpkg               logrotate        mrtg     python    ubuntu-release-upgrader  ureadahead
cloud            extrausers         lxcfs            pam      snapd     ucf                      usbutils
root@nuc3:~# ls /mnt/rootfs/var/lib/lxcfs
root@nuc3:~# rmdir /mnt/rootfs/var/lib/lxcfs
root@nuc3:~# umount /mnt
root@nuc3:~#

And this goes back to what @Tony_Anytime said: it was a question of removing /var/lib/lxcfs in the container itself (not the host as I’d originally thought).

Of course, what would have happened if I did actually want to run nested lxc or incus inside this container is a different matter. But in this case I don’t.

There’s still definitely something wrong with filesystem permissions though: attempting to run as a privileged container fails.

root@nuc3:~# incus stop nfsen
root@nuc3:~# incus config set nfsen security.privileged=on
root@nuc3:~# incus start nfsen
Error: Failed to handle idmapped storage: invalid argument - Failed to change ACLs on /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal
Try `incus info --show-log nfsen` for more info
root@nuc3:~# incus config set nfsen security.privileged=off
root@nuc3:~# incus start nfsen
root@nuc3:~#

Nothing in incus info --show-log nfsen apart from the same error message shown; nothing in /var/log/incus/incusd.log apart from a line saying failed to start the container; and all files in /var/log/incus/nfsen/ are zero bytes, apart from console.log from previous boot.

When the container is running, on the host I see an odd mix of uids:

root@nuc3:~# ls -l /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var
total 96
drwxr-xr-x  2 root    root    38 Mar 20 06:25 backups
drwxr-xr-x 11 root    root    12 Apr 20  2023 cache
drwxrwxrwt  2 root    root     3 Mar 21 08:44 crash
drwxr-xr-x 37 root    root    37 Mar 21 09:11 lib
drwxrwsr-x  2 root    staff    2 Apr 24  2018 local
lrwxrwxrwx  1 root    root     9 Jun 10  2020 lock -> /run/lock
drwxrwxr-x  9 root    input   84 Mar 20 06:25 log
drwxrwsr-x  2 root    mail     2 Jun 10  2020 mail
drwxr-xr-x  9 root    root     9 Sep 10  2020 nfsen
drwxr-xr-x  2 1000000 1000000  2 Jun 10  2020 opt
lrwxrwxrwx  1 1000000 1000000  4 Jun 10  2020 run -> /run
drwxr-xr-x  2 1000000 1000000  2 Oct 30  2019 snap
drwxr-xr-x  4 1000000 1000000  5 Jun 10  2020 spool
drwxrwxrwt  3 1000000 1000000  3 Mar 21 10:02 tmp
drwxr-xr-x  3 1000000 1000000  3 Jun 11  2020 www

which inside the container shows as:

root@nfsen:/# ls -l /var
total 96
drwxr-xr-x  2 nobody nogroup 38 Mar 20 06:25 backups
drwxr-xr-x 11 nobody nogroup 12 Apr 20  2023 cache
drwxrwxrwt  2 nobody nogroup  3 Mar 21 08:44 crash
drwxr-xr-x 37 nobody nogroup 37 Mar 21 09:11 lib
drwxrwsr-x  2 nobody nogroup  2 Apr 24  2018 local
lrwxrwxrwx  1 nobody nogroup  9 Jun 10  2020 lock -> /run/lock
drwxrwxr-x  9 nobody nogroup 84 Mar 20 06:25 log
drwxrwsr-x  2 nobody nogroup  2 Jun 10  2020 mail
drwxr-xr-x  9 nobody nogroup  9 Sep 10  2020 nfsen
drwxr-xr-x  2 root   root     2 Jun 10  2020 opt
lrwxrwxrwx  1 root   root     4 Jun 10  2020 run -> /run
drwxr-xr-x  2 root   root     2 Oct 30  2019 snap
drwxr-xr-x  4 root   root     5 Jun 10  2020 spool
drwxrwxrwt  3 root   root     3 Mar 21 10:02 tmp
drwxr-xr-x  3 root   root     3 Jun 11  2020 www

On host:

root@nuc3:~# ls -l /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/nfsen/profiles-data/live/
total 17
drwxrwxr-x 9 lxd     www-data 10 Mar 20 20:14 gw1
drwxrwxr-x 7 1000999  1000033  9 Mar 21 10:05 gw2

In container:

root@nfsen:/# ls -ln /var/nfsen/profiles-data/live/
total 17
drwxrwxr-x 9 65534 65534 10 Mar 20 20:14 gw1
drwxrwxr-x 7   999    33  9 Mar 21 10:05 gw2

So it looks to me like I have a mixture of mapped and unmapped uids which needs sorting out with a bunch of chown/chmod.
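
The pattern in these listings is consistent with a plain offset mapping. Here is a minimal sketch of that reading, assuming the container's id map starts at host id 1000000 (as the 1000999 to 999 pair suggests) and that ids with no mapping are displayed as the overflow id 65534 (nobody/nogroup). The map size is an assumption for illustration, not something read from this host's configuration, and `host_to_container` is a hypothetical helper name.

```python
OVERFLOW_ID = 65534  # id shown for owners that have no mapping (nobody/nogroup)


def host_to_container(host_id, base=1000000, size=65536):
    """Return the id seen inside the container for an on-disk (host) id.

    base and size are assumptions for illustration; the real values
    come from the container's id map.
    """
    if base <= host_id < base + size:
        return host_id - base
    return OVERFLOW_ID


# Ids taken from the listings above
for host_id in (0, 1000000, 1000999, 1000033):
    print(host_id, "->", host_to_container(host_id))
```

Under this reading, anything still owned by host ids in the 0-65535 range was never shifted, which is why it appears as nobody/nogroup inside the container.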

As for /var/log/journal which was reported as an error when trying to run privileged, there is an ACL on that:

root@nfsen:/# ls -l /var/log/journal
total 41
drwxr-sr-x+ 2 nobody nogroup 102 Mar 20 05:47 08d432c8b863425cbea5dfaad760dc2e
root@nfsen:/# getfacl /var/log/journal
getfacl: Removing leading '/' from absolute path names
# file: var/log/journal
# owner: nobody
# group: nogroup
# flags: -s-
user::rwx
group::r-x
group:4294967295:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:4294967295:r-x
default:mask::r-x
default:other::r-x

EDIT: I’ve now been able to fix this.

The basic file permissions cleanup turned out to be pretty simple, so I am recording it here for reference. I ran this on the host, while the container itself was already running (so its filesystem was mounted on the host).

#!/usr/bin/python3
# Shift owner ids still in the unmapped 0-65535 range into the container's mapped range.
import os

ROOTFS = '/var/lib/incus/storage-pools/default/containers/nfsen/rootfs'
OFFSET = 1000000  # base of this container's id map on the host


def shifted(i):
    # -1 tells os.chown to leave that id unchanged
    return (OFFSET + i) if 0 <= i <= 65535 else -1


for root, dirnames, filenames in os.walk(ROOTFS):
    for name in dirnames + filenames:
        fullpath = os.path.join(root, name)
        st = os.lstat(fullpath)  # lstat: examine symlinks themselves, not their targets
        uid = shifted(st.st_uid)
        gid = shifted(st.st_gid)
        if uid != -1 or gid != -1:
            os.chown(fullpath, uid, gid, follow_symlinks=False)
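
Before changing ownership it can help to see what would be touched. The following is a dry-run sketch using the same rule (ids 0-65535 get 1000000 added). `plan_shift` is a hypothetical name, and the offset is assumed to match this container's id map. Nothing on disk is modified.

```python
import os
import sys


def plan_shift(rootfs, base=1000000, limit=65535):
    """Yield (path, old_uid, old_gid, new_uid, new_gid) for entries under
    rootfs that still carry unshifted ids. Read-only: no chown is done."""
    for root, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat: report symlinks themselves
            new_uid = base + st.st_uid if st.st_uid <= limit else st.st_uid
            new_gid = base + st.st_gid if st.st_gid <= limit else st.st_gid
            if (new_uid, new_gid) != (st.st_uid, st.st_gid):
                yield path, st.st_uid, st.st_gid, new_uid, new_gid


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the rootfs path as the only argument
    for path, uid, gid, new_uid, new_gid in plan_shift(sys.argv[1]):
        print(f"{path}: {uid}:{gid} -> {new_uid}:{new_gid}")
```

Running it against the rootfs path and checking the list first makes it easier to spot anything unexpected before running the real fix.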

However, if I then try to set privileged mode and restart the container, it breaks again. It stops remapping permissions once it gets to /var/log/journal:

root@nuc3:~# incus config set nfsen security.privileged=on
root@nuc3:~# incus start nfsen
Remapping container filesystem
Error: Failed to handle idmapped storage: invalid argument - Failed to change ACLs on /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal
Try `incus info --show-log nfsen` for more info

Then removing security.privileged and restarting the container, I find the broken perms again:

root@nfsen:/# ls -l /var
total 96
drwxr-xr-x  2 nobody nogroup 38 Mar 20 06:25 backups
drwxr-xr-x 11 nobody nogroup 12 Apr 20  2023 cache
drwxrwxrwt  2 nobody nogroup  3 Mar 21 08:44 crash
drwxr-xr-x 37 nobody nogroup 37 Mar 21 09:11 lib
drwxrwsr-x  2 nobody nogroup  2 Apr 24  2018 local
lrwxrwxrwx  1 nobody nogroup  9 Jun 10  2020 lock -> /run/lock
drwxrwxr-x  9 nobody nogroup 84 Mar 20 06:25 log
drwxrwsr-x  2 root   mail     2 Jun 10  2020 mail
drwxr-xr-x  9 root   root     9 Sep 10  2020 nfsen
drwxr-xr-x  2 root   root     2 Jun 10  2020 opt
lrwxrwxrwx  1 root   root     4 Jun 10  2020 run -> /run
drwxr-xr-x  2 root   root     2 Oct 30  2019 snap
drwxr-xr-x  4 root   root     5 Jun 10  2020 spool
drwxrwxrwt  3 root   root     3 Mar 21 10:25 tmp
drwxr-xr-x  3 root   root     3 Jun 11  2020 www

Clearly it would be good if incus would either skip over the problematic file(s) and finish the job, or undo what it has done before terminating; leaving a half-broken container isn’t great.

I see there are ACLs set on /var/log/journal and all files within it. After running the fix script again:

root@nuc3:~# getfacl -Rsp /var/lib/incus/storage-pools/default/containers/nfsen/rootfs
# file: /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal
# owner: 1000000
# group: 1000101
# flags: -s-
user::rwx
group::r-x
group:adm:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:adm:r-x
default:mask::r-x
default:other::r-x

# file: /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal/08d432c8b863425cbea5dfaad760dc2e
# owner: 1000000
# group: 1000101
# flags: -s-
user::rwx
group::r-x
group:adm:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:adm:r-x
default:mask::r-x
default:other::r-x

# file: /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal/08d432c8b863425cbea5dfaad760dc2e/system@e56b453e0c444b66895b1a16c14305ef-0000000000000001-0005ff417a02fd74.journal
# owner: 1000000
# group: 1000101
user::rw-
group::r-x			#effective:r--
group:adm:r--
mask::r--
other::---

... etc

Ah: all these have an ACL for “group:adm” (unmapped gid 4) instead of “group:1000004”. Hairy script to fix:

getfacl -Rsp /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal | grep '^# file:' |
while read a b f; do getfacl "$f" | sed 's/:adm:/:1000004:/g' | setfacl --set-file=- "$f"; done
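
A more general variant would work on numeric ids rather than substituting one group name. The sketch below assumes the ACL text comes from `getfacl -n` (numeric ids) and that the same 1000000 offset applies. `shift_acl_text` is a hypothetical helper; it only rewrites the text, and feeding the result back through setfacl is left out.

```python
def shift_acl_text(acl_text, base=1000000, limit=65535):
    """Add base to numeric user/group qualifiers in getfacl-style text.

    Comment lines and entries without a qualifier (user::, group::,
    mask::, other::) are returned unchanged. Trailing "#effective"
    comments are dropped.
    """
    shifted = []
    for line in acl_text.splitlines():
        if not line.strip() or line.startswith("#"):
            shifted.append(line)
            continue
        entry = line.split("#", 1)[0].rstrip()
        fields = entry.split(":")
        # "default:" entries carry the tag and qualifier one field later
        q = 2 if fields[0] == "default" else 1
        if fields[q - 1] in ("user", "group") and fields[q].isdigit():
            if int(fields[q]) <= limit:
                fields[q] = str(base + int(fields[q]))
        shifted.append(":".join(fields))
    return "\n".join(shifted)


sample = "\n".join([
    "# file: var/log/journal",
    "user::rwx",
    "group::r-x",
    "group:4:r-x",
    "mask::r-x",
    "other::r-x",
    "default:group:4:r-x",
])
print(shift_acl_text(sample))
```

Working numerically avoids depending on how the host happens to resolve gid 4 to a name.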

And hey presto, I can start the container in privileged mode! (Not that I really needed to, but I don’t like things being broken). At least I now have the tools to fix things in future if necessary.