Unprivileged instance cannot use Docker after updating the node kernel from 5.4 to 5.15

Hi, I have a 4-node LXD cluster using Ceph as the storage backend.
I upgraded lxd1's kernel from 5.4 to 5.15 (Ubuntu 20.04).

I created an instance on lxd1 using lxc launch images:ubuntu/22.04/cloud base-image-builder1 --target lxd1.
Then I installed R and Docker in base-image-builder1.
Finally, I published it as a base image using lxc publish base-image-builder1 --alias xiyou_base_image.

But when I create an instance from the base image using lxc launch xiyou_base_image base-image-builder2 --target lxd1 and run dockerd in base-image-builder2, I get the error failed to mount overlay: invalid argument" storage-driver=overlay2.

When I create an instance from the same base image on lxd2 (which runs kernel 5.4), it works.

How can I use the docker command in an unprivileged instance running on a node with the 5.15 kernel? :face_holding_back_tears:

When I switch the storage driver from overlay2 to vfs, the docker command works.
But can I use Docker in an unprivileged instance with overlay2 as the storage driver on the 5.15 kernel?
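
For reference, the vfs workaround mentioned above could be sketched as a small shell helper. This assumes Docker reads its daemon configuration from /etc/docker/daemon.json and that the file holds no other settings, since the helper overwrites it. The function name write_storage_driver is made up for illustration. Note that vfs does not share layers between images the way overlay2 does, so it tends to use more disk space.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Docker daemon config selecting a storage driver.
# WARNING: this overwrites the target file; merge by hand if it already has other keys.
write_storage_driver() {
    driver="$1"                              # e.g. vfs or overlay2
    target="${2:-/etc/docker/daemon.json}"   # assumed default config location
    printf '{\n  "storage-driver": "%s"\n}\n' "$driver" > "$target"
}

# Usage inside the container (as root), then restart Docker:
#   write_storage_driver vfs
#   systemctl restart docker
```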

Did you apply the container config keys required to run Docker with overlay2 in the container? If not, you can follow this video: https://www.youtube.com/watch?v=_fCSSEyiGro

Yes, security.nesting is true.

You also need the following ones for overlay2:

  • security.syscalls.intercept.mknod to true
  • security.syscalls.intercept.setxattr to true

As explained in the video, these config keys are required to run Docker with the overlay2 driver inside a container (except on top of ZFS storage) :wink:
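
As a rough sketch, the three keys named in this thread could be applied with lxc config set. The function name below is made up for illustration, and the instance name in the usage comment is just the one from the original post. Restarting the container afterwards is an assumption about when the settings take effect.

```shell
#!/bin/sh
# Sketch: apply the three config keys mentioned in this thread to one container.
# "$1" is the instance name; the function name is hypothetical.
enable_docker_overlay2_keys() {
    instance="$1"
    lxc config set "$instance" security.nesting true
    lxc config set "$instance" security.syscalls.intercept.mknod true
    lxc config set "$instance" security.syscalls.intercept.setxattr true
}

# Usage (on the LXD host):
#   enable_docker_overlay2_keys base-image-builder2
#   lxc restart base-image-builder2   # assumed necessary for the keys to apply
```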

Thank you. I added these config keys and tried again, but I got the same error.
I use Ceph RBD as the LXD cluster storage.
After I ran sudo apt install --install-recommends linux-generic-hwe-20.04 to upgrade the kernel and rebooted, I started getting this error when I use the docker command in the container.

Weird :face_with_raised_eyebrow: I also sometimes use Docker in containers with the overlay2 driver, but I use ZFS in block mode (with ext4 inside a zvol) and kernel 5.19, so it's not the same environment.

Are you perhaps using shiftfs (shiftfs sometimes causes me storage issues)?

$ lxc info | grep shiftfs
    shiftfs: "false"

My shiftfs value is false.

I am sure the kernel upgrade caused the error, because I can reproduce it after upgrading the kernel.
But I cannot downgrade to the old kernel, because I need the newer kernel to fix another problem: Why owner of most file in container is 1000000 - #16 by stgraber

I got some messages in the system log. Could they cause the docker command error in the unprivileged instance?

My instance has only one block device; I did not add another device mounted at /var/lib/docker as in the video you shared. It worked for me on the 5.4 kernel.

Hi, I ran these commands and then installed Docker in the container:

lxc storage volume create remote demo
lxc config device add shpc-153-instance-F9uaEfo2 remote disk pool=remote source=demo path=/var/lib/docker

Now I can use Docker with the overlay2 storage driver in the container.
But I would like to know whether I can use Docker with the overlay2 storage driver when I only have the root disk device (without mounting another disk device at /var/lib/docker).

thanks

What storage pool driver is the remote pool using?

ceph rbd

@libinkai the problem here is that you have security.shifted enabled on your storage pool. overlayfs doesn't support mounting on top of idmapped mounts on kernels < 5.19.
So, you can:

  • update to 5.19+
    or
  • disable security.shifted on storage pool
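
To see which side of the 5.19 threshold a node falls on, a minimal sketch like the following could compare the kernel release string. The function name and argument order are made up for illustration, and it only looks at the major and minor numbers.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if a kernel release string is at least MAJOR.MINOR.
# Function name and argument order are hypothetical.
kernel_at_least() {
    release="$1"; want_major="$2"; want_minor="$3"
    major="${release%%.*}"          # "5.15.0-71-generic" -> "5"
    rest="${release#*.}"            # -> "15.0-71-generic"
    minor="${rest%%[!0-9]*}"        # -> "15"
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Usage on the host:
#   if kernel_at_least "$(uname -r)" 5 19; then
#       echo "kernel is 5.19 or newer"
#   else
#       echo "kernel is older than 5.19"
#   fi
```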

thank you! I will try it

Hi, it seems that security.shifted on the storage pool is already disabled, because it defaults to false.

Please show the output of cat /proc/1/mountinfo from the host and from the container where the issue is reproducible.

In the container:

root@test4:~# cat /proc/1/mountinfo
1435 387 252:32 /rootfs / rw,relatime,idmapped shared:273 master:272 - ext4 /dev/rbd2 rw,discard,stripe=16
1436 1435 0:78 / /dev rw,relatime shared:275 - tmpfs none rw,size=492k,mode=755,uid=1000000,gid=1000000,inode64
1437 1435 0:79 / /proc rw,nosuid,nodev,noexec,relatime shared:292 - proc proc rw
1438 1435 0:80 / /sys rw,relatime shared:335 - sysfs sysfs rw
1439 1436 0:5 /fuse /dev/fuse rw,nosuid,relatime shared:276 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1440 1436 0:5 /net/tun /dev/net/tun rw,nosuid,relatime shared:277 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1441 1437 0:38 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:293 master:61 - binfmt_misc binfmt_misc rw
1442 1438 0:34 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:336 master:18 - fusectl fusectl rw
1443 1438 0:30 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:337 master:10 - pstore pstore rw
1444 1438 0:21 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:338 master:19 - configfs configfs rw
1445 1438 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:339 master:16 - debugfs debugfs rw
1446 1438 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:340 master:8 - securityfs securityfs rw
1447 1438 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:341 master:17 - tracefs tracefs rw
1448 1436 0:20 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:278 master:15 - mqueue mqueue rw
1449 1436 0:81 / /dev/.lxc/proc rw,relatime shared:279 - proc proc rw
1450 1436 0:80 / /dev/.lxc/sys rw,relatime shared:280 - sysfs sys rw
1451 1436 0:55 / /dev/lxd rw,relatime shared:281 - tmpfs tmpfs rw,size=100k,mode=755,inode64
1452 1436 0:54 /test4 /dev/.lxd-mounts rw,relatime master:347 - tmpfs tmpfs rw,size=100k,mode=711,inode64
1453 1438 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:342 - cgroup2 none rw
1454 1437 0:53 /proc/cpuinfo /proc/cpuinfo rw,nosuid,nodev,relatime shared:294 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1455 1437 0:53 /proc/diskstats /proc/diskstats rw,nosuid,nodev,relatime shared:295 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1456 1437 0:53 /proc/loadavg /proc/loadavg rw,nosuid,nodev,relatime shared:296 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1457 1437 0:53 /proc/meminfo /proc/meminfo rw,nosuid,nodev,relatime shared:297 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1458 1437 0:53 /proc/slabinfo /proc/slabinfo rw,nosuid,nodev,relatime shared:298 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1459 1437 0:53 /proc/stat /proc/stat rw,nosuid,nodev,relatime shared:299 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1460 1437 0:53 /proc/swaps /proc/swaps rw,nosuid,nodev,relatime shared:308 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1461 1437 0:53 /proc/uptime /proc/uptime rw,nosuid,nodev,relatime shared:323 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1462 1438 0:53 /sys/devices/system/cpu /sys/devices/system/cpu rw,nosuid,nodev,relatime shared:343 master:346 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
1463 1436 0:5 /full /dev/full rw,nosuid,relatime shared:283 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1464 1436 0:5 /null /dev/null rw,nosuid,relatime shared:284 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1465 1436 0:5 /random /dev/random rw,nosuid,relatime shared:285 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1466 1436 0:5 /tty /dev/tty rw,nosuid,relatime shared:286 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1467 1436 0:5 /urandom /dev/urandom rw,nosuid,relatime shared:287 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1468 1436 0:5 /zero /dev/zero rw,nosuid,relatime shared:288 master:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
1469 1436 0:82 / /dev/pts rw,nosuid,noexec,relatime shared:289 - devpts devpts rw,gid=1000005,mode=620,ptmxmode=666,max=1024
1470 1436 0:82 /ptmx /dev/ptmx rw,nosuid,noexec,relatime shared:290 - devpts devpts rw,gid=1000005,mode=620,ptmxmode=666,max=1024
1471 1436 0:82 /0 /dev/console rw,nosuid,noexec,relatime shared:291 - devpts devpts rw,gid=1000005,mode=620,ptmxmode=666,max=1024
1472 1437 0:78 /.lxc-boot-id /proc/sys/kernel/random/boot_id ro,nosuid,nodev,noexec,relatime shared:275 - tmpfs none rw,size=492k,mode=755,uid=1000000,gid=1000000,inode64
415 1436 0:83 / /dev/shm rw,nosuid,nodev shared:282 - tmpfs tmpfs rw,uid=1000000,gid=1000000,inode64
443 1435 0:84 / /run rw,nosuid,nodev shared:344 - tmpfs tmpfs rw,size=398320k,mode=755,uid=1000000,gid=1000000,inode64
475 443 0:85 / /run/lock rw,nosuid,nodev,noexec,relatime shared:355 - tmpfs tmpfs rw,size=5120k,uid=1000000,gid=1000000,inode64
1846 443 0:87 / /run/user/0 rw,nosuid,nodev,relatime shared:982 - tmpfs tmpfs rw,size=398316k,mode=700,uid=1000000,gid=1000000,inode64

On the host:

24 30 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
25 30 0:23 / /proc rw,nosuid,nodev,noexec,relatime shared:12 - proc proc rw
26 30 0:5 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=1933228k,nr_inodes=483307,mode=755,inode64
27 26 0:24 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
28 30 0:25 / /run rw,nosuid,nodev,noexec,relatime shared:5 - tmpfs tmpfs rw,size=398320k,mode=755,inode64
30 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw
31 24 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
32 26 0:27 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw,inode64
33 28 0:28 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k,inode64
34 24 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw
35 24 0:30 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:10 - pstore pstore rw
36 24 0:31 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:11 - bpf bpf rw,mode=700
37 25 0:32 / /proc/sys/fs/binfmt_misc rw,relatime shared:13 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=30928
38 26 0:33 / /dev/hugepages rw,relatime shared:14 - hugetlbfs hugetlbfs rw,pagesize=2M
39 26 0:20 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:15 - mqueue mqueue rw
40 24 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:16 - debugfs debugfs rw
41 24 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:17 - tracefs tracefs rw
42 24 0:34 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:18 - fusectl fusectl rw
43 24 0:21 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:19 - configfs configfs rw
65 28 0:35 / /run/credentials/systemd-sysusers.service ro,nosuid,nodev,noexec,relatime shared:20 - ramfs none rw,mode=700
89 30 7:0 / /snap/core20/1879 ro,nodev,relatime shared:30 - squashfs /dev/loop0 ro,errors=continue
95 30 7:2 / /snap/core22/634 ro,nodev,relatime shared:47 - squashfs /dev/loop2 ro,errors=continue
98 30 7:3 / /snap/lxd/24643 ro,nodev,relatime shared:49 - squashfs /dev/loop3 ro,errors=continue
101 30 7:4 / /snap/lxd/24846 ro,nodev,relatime shared:51 - squashfs /dev/loop4 ro,errors=continue
104 30 7:5 / /snap/snapd/18357 ro,nodev,relatime shared:53 - squashfs /dev/loop5 ro,errors=continue
107 30 7:6 / /snap/snapd/19122 ro,nodev,relatime shared:55 - squashfs /dev/loop6 ro,errors=continue
184 37 0:38 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:61 - binfmt_misc binfmt_misc rw
327 28 0:25 /snapd/ns /run/snapd/ns rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=398320k,mode=755,inode64
348 327 0:4 mnt:[4026532647] /run/snapd/ns/lxd.mnt rw - nsfs nsfs rw
747 30 0:47 / /var/snap/lxd/common/ns rw,relatime - tmpfs tmpfs rw,size=1024k,mode=700,inode64
768 747 0:4 mnt:[4026532650] /var/snap/lxd/common/ns/shmounts rw - nsfs nsfs rw
727 747 0:4 mnt:[4026532647] /var/snap/lxd/common/ns/mntns rw - nsfs nsfs rw
353 30 7:7 / /snap/core20/1891 ro,nodev,relatime shared:263 - squashfs /dev/loop7 ro,errors=continue
92 28 0:46 / /run/user/1000 rw,nosuid,nodev,relatime shared:45 - tmpfs tmpfs rw,size=398316k,nr_inodes=99579,mode=700,uid=1000,gid=1000,inode64

1435 387 252:32 /rootfs / rw,relatime,idmapped shared:273 master:272 - ext4 /dev/rbd2 rw,discard,stripe=16

As you can see, the rootfs mount in the container is idmapped. This is what I was talking about.
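
The same check can be done without reading the whole listing by eye. The sketch below reads mountinfo lines on stdin and looks for the idmapped option on the mount at /. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: read mountinfo lines on stdin and succeed (exit 0) if the mount at "/"
# carries the "idmapped" mount option. In mountinfo, field 5 is the mount point
# and field 6 holds the per-mount options. The function name is hypothetical.
rootfs_is_idmapped() {
    awk '$5 == "/" {
             n = split($6, opts, ",")
             for (i = 1; i <= n; i++) if (opts[i] == "idmapped") found = 1
         }
         END { exit !found }'
}

# Usage inside the container:
#   if rootfs_is_idmapped < /proc/1/mountinfo; then echo "rootfs is idmapped"; fi
```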

So, you are using a Ceph RBD device with ext4 on top of it. Please check the container's disk device options, especially the shift option.

You can try:
lxc config device override ct_name root shift=false

Alternatively, until you upgrade to kernel 5.19+, you can disable idmapped mounts entirely:

  • systemctl edit snap.lxd.daemon.service
  • add
[Service]
Environment=LXD_IDMAPPED_MOUNTS_DISABLE=1
  • systemctl reload snap.lxd.daemon


Wow, thank you! I will try it! :100: