As we can see, overlayfs does not get mounted on top of idmapped btrfs, which is also correct (for old kernel versions).
Okay, then how does your setup work at all? (-: I’ve read your old reports about collabora-online. Are you using Docker to deploy it? Could you check which Docker storage driver you are using? I assume that your Docker uses the btrfs storage driver instead of overlayfs. This may explain how Docker with idmapped mounts works for you at all on such an old kernel version.
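The storage driver in use can be read straight from the Docker daemon. The sketch below wraps `docker info --format '{{.Driver}}'` in a small helper; the function name `docker_storage_driver` is made up for illustration, and it assumes the `docker` CLI is on the PATH and the daemon is running.

```shell
# Hypothetical helper (the name is ours, not part of Docker):
# print the storage driver the Docker daemon is currently using,
# for example "btrfs", "overlay2" or "vfs".
docker_storage_driver() {
    # `docker info` accepts a Go template; .Driver is the storage driver.
    docker info --format '{{.Driver}}'
}
```

Running `docker_storage_driver` inside the LXD container after each change to /etc/docker/daemon.json (and a restart of docker.service) shows which driver the daemon actually picked, which may differ from the one requested if it had to fall back.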
@amikhalitsyn I can confirm setting the storage driver to btrfs works indeed! I did also try overlay, overlay2 and fuse-overlayfs and none of those worked so it seems btrfs is the only one working.
I reported an issue with security.syscalls.intercept.mknod misbehaving/not functioning as intended with anything beyond 5.15 LTS some time ago. I’ve now tested it again with the new 6.1 LTS and it still doesn’t work, so I thought I would mention it again.
As far as I understand, something broke for you after the 5.15 LTS kernel, correct? But now you are writing that btrfs worked before and is working now (with fresh kernels), correct? Then what’s the problem? Where is the kernel regression?
So mknod interception does not seem to work with overlay2; I did not know this until you mentioned it.
I am not sure whether this is a kernel regression, a Docker issue, or something else. I am only certain that it occurs after 5.19; prior to that, Docker seems to select the btrfs storage driver instead of overlay2. I also tested this with ext4, in which case Docker opts for the vfs driver, and mknod works there too.
uname -a
# Linux archlinux 6.1.15-1-lts #1 SMP PREEMPT_DYNAMIC Fri, 03 Mar 2023 12:22:08 +0000 x86_64 GNU/Linux
truncate -s 10GiB btrfspool.img
losetup -f btrfspool.img
lxc storage create btrfspool btrfs source=/dev/loop0
lxc init images:archlinux docker-btrfs --storage=btrfspool
lxc config set docker-btrfs security.{nesting=true,syscalls.intercept.mknod=true}
lxc start docker-btrfs
lxc exec docker-btrfs -- su -l
pacman -S vim docker
mkdir -p /etc/docker
echo -e '{\n\t"storage-driver":"btrfs"\n}' > /etc/docker/daemon.json
systemctl enable --now docker.service
docker run -it --rm busybox
mknod /root/null c 1 3
exit
sed -i 's/btrfs/overlay2/' /etc/docker/daemon.json
systemctl restart docker.service
docker run -it --rm busybox
mknod /root/null c 1 3
# mknod: /root/null: Operation not permitted
Probably it’s because before 5.19 overlayfs failed to mount on top of an idmapped mount. If the container’s rootfs mount was idmapped, Docker used btrfs (or vfs) as a fallback storage driver. And yes, that explains why you get the vfs driver on ext4 but the btrfs driver on btrfs.
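The two conditions in this explanation (kernel older than 5.19, root filesystem idmapped) can be checked from inside the container. The following is a minimal sketch: the function names are hypothetical, the 5.19 threshold is taken from the explanation above, and the idmapped check is a plain substring match on a line of `mount` output like the ones quoted in this thread.

```shell
# Return 0 if the kernel release string $1 (e.g. "6.1.15-1-lts") is at
# least version $2.$3, otherwise return 1. Hypothetical helper.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}      # text before the first dot
    rest=${release#*.}        # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Return 0 if a line of `mount` output mentions the "idmapped" option.
# This is a simple substring check, not a full option parser.
mount_line_is_idmapped() {
    case $1 in
        *idmapped*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `kernel_at_least "$(uname -r)" 5 19` can be combined with `mount_line_is_idmapped "$(mount | grep ' on / ')"` to see whether a given container falls into the case where Docker is expected to fall back to btrfs or vfs.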
# mknod: /root/null: Operation not permitted
That’s weird, because mknod interception on overlayfs doesn’t lead to an EPERM error ("Operation not permitted"); it just falls back to bind-mounting a device node from the host. That is bad, but not as bad as EPERM. Did you really execute this command listing yourself, and can you confirm that the error is reproducible?
uname -a
Linux 44b0b877ba2f 6.1.15-1-lts #1 SMP PREEMPT_DYNAMIC Fri, 03 Mar 2023 12:22:08 +0000 x86_64 GNU/Linux
mount | grep idmap
/dev/disk/by-uuid/d8183cdb-d608-4130-888e-87b91f7e0d68 on / type btrfs (rw,relatime,idmapped,space_cache=v2,user_subvol_rm_allowed,subvolid=295,subvol=/containers/docker-overlayfs)
mount | grep idmap
mkdir {work,upper,lower,ovl}
mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work ovl
mknod /root/ovl/null c 1 3
stat /root/ovl/null
mount | grep null
What happens if you run these commands inside the same container where you did this experiment with Docker?
I just want to sort out and classify all the problems, so we can analyze this internally and decide on importance/priorities.
@amikhalitsyn inside the docker container (docker run -it --rm busybox) or the LXD container nesting docker?
Inside LXD container (works fine):
/dev/disk/by-uuid/d8183cdb-d608-4130-888e-87b91f7e0d68 on / type btrfs (rw,relatime,idmapped,space_cache=v2,user_subvol_rm_allowed,subvolid=295,subvol=/containers/docker-overlayfs)
dev on /dev/null type devtmpfs (rw,nosuid,relatime,size=8169348k,nr_inodes=2042337,mode=755,inode64)
/dev/sda2 on /root/ovl/null type ext4 (rw,relatime)
Yep, and as you can see from the result, mknod works, but (!) it creates a bind mount in place of /root/ovl/null rather than a device node (compare with your previous experiments on btrfs).
Has there been any movement on this? It’s been a pretty large issue within my org, preventing software from working correctly. I am using an unprivileged LXC container running Docker, and the overlay2/overlayfs drivers are still not working. Reformatting all of my servers to use btrfs is not an option.
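The difference between a real device node and a bind mount sitting at the same path can be checked without reading the `mount` output by eye. Below is a rough sketch under stated assumptions: `classify_node` is a hypothetical helper, and taking the mountinfo file as an optional second argument is our own choice so the check can also be run against saved output.

```shell
# Classify PATH as "mount point", "device node" or "other".
# $2 is the mountinfo file to consult (default: /proc/self/mountinfo).
# For a single file such as /root/ovl/null, being a mount point means
# something was bind-mounted there. Mount points containing spaces are
# escaped in mountinfo and are not handled by this sketch.
classify_node() {
    path=$1
    mountinfo=${2:-/proc/self/mountinfo}
    # Field 5 of each mountinfo line is the mount point.
    if awk -v p="$path" '$5 == p { found = 1 } END { exit !found }' "$mountinfo"; then
        echo "mount point"
    elif [ -c "$path" ] || [ -b "$path" ]; then
        echo "device node"   # a character or block special file
    else
        echo "other"
    fi
}
```

The mount check comes first on purpose: a device node bind-mounted from the host still looks like a character device to `[ -c ]`, so only the mountinfo lookup tells the two cases apart.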
Docker version 24.0.2, build cb74dfc
lxc version 5.0.2
Linux 5.15.108-1-pve #1 SMP PVE 5.15.108-1 (2023-06-17T09:41Z) x86_64 GNU/Linux