Incus container won't stop or allow itself to be deleted

When the container is running, on the host I see an odd mix of uids:

root@nuc3:~# ls -l /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var
total 96
drwxr-xr-x  2 root    root    38 Mar 20 06:25 backups
drwxr-xr-x 11 root    root    12 Apr 20  2023 cache
drwxrwxrwt  2 root    root     3 Mar 21 08:44 crash
drwxr-xr-x 37 root    root    37 Mar 21 09:11 lib
drwxrwsr-x  2 root    staff    2 Apr 24  2018 local
lrwxrwxrwx  1 root    root     9 Jun 10  2020 lock -> /run/lock
drwxrwxr-x  9 root    input   84 Mar 20 06:25 log
drwxrwsr-x  2 root    mail     2 Jun 10  2020 mail
drwxr-xr-x  9 root    root     9 Sep 10  2020 nfsen
drwxr-xr-x  2 1000000 1000000  2 Jun 10  2020 opt
lrwxrwxrwx  1 1000000 1000000  4 Jun 10  2020 run -> /run
drwxr-xr-x  2 1000000 1000000  2 Oct 30  2019 snap
drwxr-xr-x  4 1000000 1000000  5 Jun 10  2020 spool
drwxrwxrwt  3 1000000 1000000  3 Mar 21 10:02 tmp
drwxr-xr-x  3 1000000 1000000  3 Jun 11  2020 www

which inside the container shows as:

root@nfsen:/# ls -l /var
total 96
drwxr-xr-x  2 nobody nogroup 38 Mar 20 06:25 backups
drwxr-xr-x 11 nobody nogroup 12 Apr 20  2023 cache
drwxrwxrwt  2 nobody nogroup  3 Mar 21 08:44 crash
drwxr-xr-x 37 nobody nogroup 37 Mar 21 09:11 lib
drwxrwsr-x  2 nobody nogroup  2 Apr 24  2018 local
lrwxrwxrwx  1 nobody nogroup  9 Jun 10  2020 lock -> /run/lock
drwxrwxr-x  9 nobody nogroup 84 Mar 20 06:25 log
drwxrwsr-x  2 nobody nogroup  2 Jun 10  2020 mail
drwxr-xr-x  9 nobody nogroup  9 Sep 10  2020 nfsen
drwxr-xr-x  2 root   root     2 Jun 10  2020 opt
lrwxrwxrwx  1 root   root     4 Jun 10  2020 run -> /run
drwxr-xr-x  2 root   root     2 Oct 30  2019 snap
drwxr-xr-x  4 root   root     5 Jun 10  2020 spool
drwxrwxrwt  3 root   root     3 Mar 21 10:02 tmp
drwxr-xr-x  3 root   root     3 Jun 11  2020 www

On host:

root@nuc3:~# ls -l /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/nfsen/profiles-data/live/
total 17
drwxrwxr-x 9 lxd     www-data 10 Mar 20 20:14 gw1
drwxrwxr-x 7 1000999  1000033  9 Mar 21 10:05 gw2

In container:

root@nfsen:/# ls -ln /var/nfsen/profiles-data/live/
total 17
drwxrwxr-x 9 65534 65534 10 Mar 20 20:14 gw1
drwxrwxr-x 7   999    33  9 Mar 21 10:05 gw2

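So it looks to me like I have a mixture of mapped and unmapped uids/gids: anything owned on the host by an id below 1000000 shows up as nobody/nogroup inside the container. That needs sorting out with a bunch of chown/chmod.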
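To make the mapping concrete, here is a minimal sketch of how a host id appears inside an unprivileged container. The base of 1000000 is taken from the listings above; the range size of 65536 and the function name `host_to_container` are assumptions for illustration, not values read from this container's config. 65534 is the kernel's default overflow id, which is what `nobody`/`nogroup` resolve to.

```python
OVERFLOW_ID = 65534  # kernel default overflow uid/gid, displayed as nobody/nogroup


def host_to_container(host_id, base=1000000, size=65536):
    """Translate a host uid/gid to what the container sees.

    base and size describe the assumed id map: host ids in
    [base, base + size) map to container ids [0, size).
    Anything outside the map shows up as the overflow id.
    """
    if base <= host_id < base + size:
        return host_id - base
    return OVERFLOW_ID
```

With these assumptions, gw2 (owned by 1000999:1000033 on the host) comes out as 999:33, while gw1 (owned by host ids below the base) comes out as 65534:65534, matching the `ls -ln` output above.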

As for /var/log/journal which was reported as an error when trying to run privileged, there is an ACL on that:

root@nfsen:/# ls -l /var/log/journal
total 41
drwxr-sr-x+ 2 nobody nogroup 102 Mar 20 05:47 08d432c8b863425cbea5dfaad760dc2e
root@nfsen:/# getfacl /var/log/journal
getfacl: Removing leading '/' from absolute path names
# file: var/log/journal
# owner: nobody
# group: nogroup
# flags: -s-
user::rwx
group::r-x
group:4294967295:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:4294967295:r-x
default:mask::r-x
default:other::r-x

EDIT: I’ve now been able to fix this.

The basic file permissions cleanup turned out to be pretty simple, so I am recording it here for reference. I ran this on the host, while the container itself was already running (so its filesystem was mounted on the host).

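#!/usr/bin/python3
# Shift unmapped owners (0-65535) into the container's mapped range (1000000+).
# Note: os.walk() never yields ROOTFS itself, so the top-level directory is left as-is.
import os

ROOTFS = '/var/lib/incus/storage-pools/default/containers/nfsen/rootfs'
BASE = 1000000


def shifted(n):
    # -1 tells os.chown() to leave that id unchanged (already mapped or out of range).
    return BASE + n if 0 <= n <= 65535 else -1


for root, dirnames, filenames in os.walk(ROOTFS):
    for name in dirnames + filenames:
        fullpath = os.path.join(root, name)
        st = os.lstat(fullpath)  # lstat so symlinks are examined, not their targets
        uid = shifted(st.st_uid)
        gid = shifted(st.st_gid)
        if uid != -1 or gid != -1:
            os.chown(fullpath, uid, gid, follow_symlinks=False)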
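Before and after a bulk chown like the one above, it can help to list what is still outside the mapped range without changing anything. The following is a read-only sketch; the function name `find_unmapped` and the default range (base 1000000, size 65536) are my assumptions, chosen to mirror the script above.

```python
import os


def find_unmapped(rootfs, base=1000000, size=65536):
    """Return paths under rootfs whose uid or gid lies outside [base, base + size).

    Read-only: nothing is modified. Like the fix script, this does not
    examine the rootfs directory itself, only what is beneath it.
    """
    def mapped(n):
        return base <= n < base + size

    unmapped = []
    for root, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            fullpath = os.path.join(root, name)
            st = os.lstat(fullpath)  # inspect symlinks themselves, not their targets
            if not (mapped(st.st_uid) and mapped(st.st_gid)):
                unmapped.append(fullpath)
    return unmapped
```

Calling `find_unmapped('/var/lib/incus/storage-pools/default/containers/nfsen/rootfs')` on the host should return an empty list once every entry has been shifted. It only looks at file owners, so it will not notice ACL entries that still carry unmapped ids.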

However, if I then try to set privileged mode and restart the container, it breaks again. Incus stops remapping ownership once it gets to /var/log/journal:

root@nuc3:~# incus config set nfsen security.privileged=on
root@nuc3:~# incus start nfsen
Remapping container filesystem
Error: Failed to handle idmapped storage: invalid argument - Failed to change ACLs on /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal
Try `incus info --show-log nfsen` for more info

Then removing security.privileged and restarting the container, I find the broken perms again:

root@nfsen:/# ls -l /var
total 96
drwxr-xr-x  2 nobody nogroup 38 Mar 20 06:25 backups
drwxr-xr-x 11 nobody nogroup 12 Apr 20  2023 cache
drwxrwxrwt  2 nobody nogroup  3 Mar 21 08:44 crash
drwxr-xr-x 37 nobody nogroup 37 Mar 21 09:11 lib
drwxrwsr-x  2 nobody nogroup  2 Apr 24  2018 local
lrwxrwxrwx  1 nobody nogroup  9 Jun 10  2020 lock -> /run/lock
drwxrwxr-x  9 nobody nogroup 84 Mar 20 06:25 log
drwxrwsr-x  2 root   mail     2 Jun 10  2020 mail
drwxr-xr-x  9 root   root     9 Sep 10  2020 nfsen
drwxr-xr-x  2 root   root     2 Jun 10  2020 opt
lrwxrwxrwx  1 root   root     4 Jun 10  2020 run -> /run
drwxr-xr-x  2 root   root     2 Oct 30  2019 snap
drwxr-xr-x  4 root   root     5 Jun 10  2020 spool
drwxrwxrwt  3 root   root     3 Mar 21 10:25 tmp
drwxr-xr-x  3 root   root     3 Jun 11  2020 www

Clearly it would be good if incus would either skip over the problematic file(s) and finish the job, or undo what it has done before terminating; leaving a half-broken container isn’t great.

I see there are ACLs set on /var/log/journal and all files within it. After running the fix script again:

root@nuc3:~# getfacl -Rsp /var/lib/incus/storage-pools/default/containers/nfsen/rootfs
# file: /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal
# owner: 1000000
# group: 1000101
# flags: -s-
user::rwx
group::r-x
group:adm:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:adm:r-x
default:mask::r-x
default:other::r-x

# file: /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal/08d432c8b863425cbea5dfaad760dc2e
# owner: 1000000
# group: 1000101
# flags: -s-
user::rwx
group::r-x
group:adm:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:adm:r-x
default:mask::r-x
default:other::r-x

# file: /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal/08d432c8b863425cbea5dfaad760dc2e/system@e56b453e0c444b66895b1a16c14305ef-0000000000000001-0005ff417a02fd74.journal
# owner: 1000000
# group: 1000101
user::rw-
group::r-x			#effective:r--
group:adm:r--
mask::r--
other::---

... etc

Ah: all these have an ACL for “group:adm” (unmapped gid 4) instead of “group:1000004”. Hairy script to fix:

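# List every path under the journal directory that carries an extended ACL,
# then rewrite its "adm" entries to the mapped gid 1000004.
getfacl -Rsp /var/lib/incus/storage-pools/default/containers/nfsen/rootfs/var/log/journal | grep '^# file:' |
while read -r _ _ f; do getfacl -p "$f" | sed 's/:adm:/:1000004:/g' | setfacl --set-file=- "$f"; done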

And hey presto, I can start the container in privileged mode! (Not that I really needed to, but I don’t like things being broken). At least I now have the tools to fix things in future if necessary.
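For reference, the sed substitution above only handles the adm group. A more general approach would be to dump the ACLs numerically (`getfacl -n`) and shift every named user/group entry that is still in the unmapped range. Below is a sketch of just the text transformation; the function name `shift_acl_text` and the default range are assumptions, and I have not run this against a real container. Its output is meant to be piped to `setfacl --set-file=-` in the same way as the shell loop.

```python
import re

# Matches named ACL entries with a numeric qualifier, e.g.
# "group:4:r-x" or "default:user:33:rwx". Entries with an empty
# qualifier ("group::r-x"), mask/other lines and "# ..." comments
# do not match and are passed through untouched.
ENTRY = re.compile(r'^(?P<prefix>(?:default:)?(?:user|group):)(?P<id>\d+)(?P<rest>:.*)$')


def shift_acl_text(text, base=1000000, size=65536):
    """Shift unmapped numeric ids in getfacl -n style text by base."""
    out = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m and int(m.group('id')) < size:
            new_id = base + int(m.group('id'))
            line = f"{m.group('prefix')}{new_id}{m.group('rest')}"
        out.append(line)
    return "\n".join(out)
```

Ids that are already at or above the range size (such as 1000004) are left alone, so running it twice does no harm.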